2

I currently have a folder that contains multiple files with similar names that I am trying to read from. For example: Folder contains files:

apple_2019_08_26_23434.xls
apple_2019_08_25_55345.xls
apple_2019_08_24_99345.xls

the name format of the file is very simple:

 apple_<date>_<5 random numbers>.xls

How can I read the excel file into a pandas df if I do not care about the random 5 digits at the end?

e.g.

df = pd.read_excel('e:\Document\apple_2019_08_26_<***wildcard***>.xls')

Thank you!

OPM_OK
  • 190
  • 1
  • 1
  • 11
  • First list the files that match your specifications and then open the path of every item in that list? Like: https://stackoverflow.com/questions/2225564/get-a-filtered-list-of-files-in-a-directory – spijs Aug 26 '19 at 13:54

3 Answers3

3

You could use unix style pathname expansions via glob.

import glob

# get .txt files in current directory
txt_files = glob.glob('./*.txt')

# get .xls files in some_dir
xls_files = glob.glob('some_dir/*.xls')

# do stuff with files
# ...

Here, * basically means "anything".

Example with pandas:

import glob

for xls_file in glob.glob('e:/Document/apple_2019_08_26_*.xls'):
    df = pd.read_excel(xls_file)

    # do stuff with df
    # ...
Tankred
  • 196
  • 1
  • 9
2

Change your directory with os.chdir then import all files which startwith the correct name:

import os
os.chdir(r'e:\Document')
dfs = [pd.read_excel(file) for file in os.listdir() if file.startswith('apple_2019_08')]

Now you can access each dataframe by index:

print(dfs[0])

print(dfs[1])

Or combine them to one large dataframe if they have the same format

df_all = pd.concat(dfs, ignore_index=True)
Erfan
  • 40,971
  • 8
  • 66
  • 78
1

If you want the 5-digit part to be changeable in the code, you could try something like this:

from os import listdir
from os.path import isfile, join
import pandas as pd

mypath = '/Users/username/aPath'
onlyfiles = [f for f in listdir(mypath) if isfile(join(mypath, f))]

fiveDigitNumber = onlyfiles[0][17:22]
filename = onlyfiles[0][:17]+fiveDigitNumber+onlyfiles[0][22:]

df = pd.read_excel(filename)
Nora_F
  • 431
  • 5
  • 17