
Python Find All File Names In Folder That Follows A Pattern

I am trying to find all file names in a folder that follow this pattern: 'index_YYYYMMDD.csv'. The 'YYYYMMDD' part represents the date of the data file. Some of the files names a

Solution 1:

import glob
glob.glob('index_[0-9]*.csv')

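This will match any file name in which 'index_' is followed by at least one digit; the * then matches anything up to '.csv', so the number of digits is not checked.

John's solution matches exactly 8 digits.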
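
To see the difference between the loose pattern and an exact eight-digit one without touching the disk, here is a small sketch using fnmatch.filter, which applies the same shell-style wildcard rules that glob relies on. The file names are made up for illustration.

```python
import fnmatch

# Hypothetical file names, invented for this illustration.
names = [
    'index_20091101.csv',
    'index_2009.csv',
    'index_2009_backup.csv',
    'index_old.csv',
]

# One digit followed by anything: also accepts short or decorated names.
loose = fnmatch.filter(names, 'index_[0-9]*.csv')

# Eight explicit digit classes: only a full YYYYMMDD-shaped name passes.
strict = fnmatch.filter(names, 'index_' + '[0-9]' * 8 + '.csv')

print(loose)
print(strict)
```

Only 'index_old.csv' is rejected by the loose pattern, while the strict one keeps just 'index_20091101.csv'.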


Solution 2:

If you want to match exactly 8 digits with glob, you need to write them all out like this:

import glob
glob.glob('index_[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9].csv')

Help on function glob in module glob:

glob(pathname) Return a list of paths matching a pathname pattern.

The pattern may contain simple shell-style wildcards a la fnmatch. However, unlike fnmatch, filenames starting with a dot are special cases that are not matched by '*' and '?' patterns.
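
The remark about dotfiles can be checked with a quick experiment. This is only a sketch: it creates two throwaway files (names invented here) in a temporary directory and compares what glob returns with what fnmatch says about the same name.

```python
import fnmatch
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Two hypothetical files: one regular, one whose name starts with a dot.
    for name in ('index_20091101.csv', '.index_20091102.csv'):
        open(os.path.join(tmp, name), 'w').close()

    # glob.escape protects any special characters in the temp path itself.
    pattern = os.path.join(glob.escape(tmp), '*.csv')
    matched = sorted(os.path.basename(p) for p in glob.glob(pattern))

# fnmatch has no special rule for leading dots.
fnmatch_hit = fnmatch.fnmatch('.index_20091102.csv', '*.csv')

print(matched)      # the dotfile is missing from glob's result
print(fnmatch_hit)  # fnmatch alone would have accepted it
```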

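If you want a real regex, use os.listdir and filter the result. Escape the dot and anchor the end so that only names of the form index_YYYYMMDD.csv pass:

import os, re
[x for x in os.listdir('.') if re.match(r'index_[0-9]{8}\.csv$', x)]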
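
A regex can state the digit count and the literal dot directly, which glob's wildcards cannot do concisely. Below is a minimal sketch of that idea; the names INDEX_RE and matching_files are invented here and are not part of any library.

```python
import os
import re

# Exactly eight digits, with the dot escaped so it matches only a literal '.'.
INDEX_RE = re.compile(r'index_[0-9]{8}\.csv')

def matching_files(directory='.'):
    """Return the sorted names in directory shaped like index_YYYYMMDD.csv."""
    # fullmatch requires the whole name to fit, so no explicit anchors are needed.
    return sorted(name for name in os.listdir(directory)
                  if INDEX_RE.fullmatch(name))
```

Note that this only checks the shape of the name; eight digits such as 20099999 would still pass even though they are not a real date.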

Solution 3:

I would take the following approach. You can define a simple file filter factory.

import os
import time

def make_time_filter(start, end, time_format, file_format='index_{time_format:}.csv'):
    t_start = time.strptime(start, time_format)
    t_end = time.strptime(end, time_format)
    ft_fmt = file_format.format(time_format=time_format)

    def filt(fname):
        try:
            return t_start <= time.strptime(fname, ft_fmt) <= t_end
        except ValueError:
            return False

    return filt

Now you can simply make a predicate that selects the date range you want:

time_filt = make_time_filter('20091101', '20091201', '%Y%m%d')

Then pass this to filter (wrapped in list, since filter returns an iterator in Python 3):

list(filter(time_filt, os.listdir(your_dir)))

Or put it in a comprehension of some sort:

(fname for fname in os.listdir(your_dir) if time_filt(fname))

A regex will be more general, but you don't need one in your case since your file names all follow a simple pattern which you know must contain a date. For more on the time module see the docs.
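
The trick this approach relies on is that strptime raises ValueError both when a name does not fit the format and when the digits are not a real calendar date. Here is a small sketch of that behaviour using datetime instead of time; the helper parse_index_date and the sample names are made up for illustration.

```python
from datetime import date, datetime

def parse_index_date(fname, file_format='index_%Y%m%d.csv'):
    """Return the date encoded in fname, or None if it does not fit file_format."""
    try:
        return datetime.strptime(fname, file_format).date()
    except ValueError:
        return None

# Hypothetical names: one too early, one in range, one impossible date, one unrelated.
names = ['index_20091031.csv', 'index_20091115.csv',
         'index_20090231.csv', 'notes.txt']
start, end = date(2009, 11, 1), date(2009, 12, 1)

in_range = []
for name in names:
    parsed = parse_index_date(name)
    if parsed is not None and start <= parsed <= end:
        in_range.append(name)

print(in_range)
```

Be aware that strptime is somewhat lenient about digit counts for %m and %d, so if names with fewer than eight digits can occur, an explicit length or regex check is the safer choice.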


Solution 4:

This will get you where you want to be and allows you to provide start and end dates:

import os
import re
import datetime

start_date = datetime.datetime.strptime('20071102', '%Y%m%d')
end_date = datetime.datetime.strptime('20071103', '%Y%m%d')

files_in_range = []
for fl in os.listdir('.'):
    # Capture the eight date digits; the raw string keeps \d and \. intact.
    match = re.match(r'index_(\d{8})\.csv$', fl)
    if match:
        date = datetime.datetime.strptime(match.group(1), '%Y%m%d')
        if start_date <= date <= end_date:
            files_in_range.append(fl)

print(files_in_range)
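
If you need this in more than one place, the same logic can be wrapped in a function. The following is one possible sketch, not the answer's own code: the name files_in_range and its parameters are invented here, it uses pathlib rather than os.listdir, and it skips names whose eight digits are not a valid date instead of raising.

```python
from datetime import datetime
from pathlib import Path
import re

INDEX_RE = re.compile(r'index_([0-9]{8})\.csv')

def files_in_range(directory, start, end):
    """Return names of index_YYYYMMDD.csv files dated within [start, end], oldest first."""
    start_date = datetime.strptime(start, '%Y%m%d')
    end_date = datetime.strptime(end, '%Y%m%d')
    dated = []
    for path in Path(directory).iterdir():
        m = INDEX_RE.fullmatch(path.name)
        if m is None:
            continue
        try:
            file_date = datetime.strptime(m.group(1), '%Y%m%d')
        except ValueError:
            continue  # eight digits that are not a real calendar date
        if start_date <= file_date <= end_date:
            dated.append((file_date, path.name))
    # Sorting the (date, name) pairs orders the result chronologically.
    return [name for _, name in sorted(dated)]
```

A call such as files_in_range('.', '20071102', '20071103') would then return the matching names from the current directory.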
