1

I'm stuck again! my story is:

I need to find files named "tv.sas7bdat" that exists in different folders in a directory and save the content of all files found into a single excel file on my desktop. With my actual Code I can get all paths for that file and transfer their content to a dataframe. But, I can't append all dataframes into one single Excel file.

In my excel I find only the last dataframe !!

Here is my Code,

import pandas as pd
from sas7bdat import SAS7BDAT
import os

path = "\\"
newpath = "\\"

files = []
# r=root, d=directories, f = files
for r, d, f in os.walk(path):
    for file in f:
        if 'tv.sas7bdat' in file:
            files.append(os.path.join(r,  file))

lenf = range(len(files))
for f in files:
    print(f)

for df in lenf:

    with SAS7BDAT(f) as file:
        df = file.to_data_frame()
        print(df)

    group =pd.concat([df], axis=0, sort=True, ignore_index = True)
    df.to_excel(newpath + 'dataframes_tv.xlsx',index=False)
Shahine Greene
  • 196
  • 1
  • 3
  • 15

2 Answers2

2

You may want to use the shutil module, it allows you to copy directories and the files within them using the copytree function. Example:

import shutil
shutil.copytree('/path/to/source', 'path/to/destination')
3NiGMa
  • 545
  • 1
  • 9
  • 24
1

if don't want to change your code, you can enumerate your list of files to split the process by getting the first file from your list to assign the initial dataframe as a placeholder on one side, then the remaining files in the list to append all the rest of dataframes to the initial one

EDIT a snippet of your code with enumerate and your files list

    # save the first dataframe from 1st list element
    df = SAS7BDAT(files[0]).to_data_frame()

    # enumerate the list to access greater elements
    for k, f in enumerate(files):
        # from 2nd element onward
        if k > 0:
            with SAS7BDAT(f[k]) as file:
                # append all elements to the 1st
                df = df.append(file.to_data_frame())

    group = pd.concat([df], axis=0, sort=True, ignore_index=True)
    df.to_excel('dataframes_tv.xlsx', index=False)
jslipknot
  • 435
  • 2
  • 5
  • 14
  • thank you jonathan , actually the replace function does not replace the old path @p = x.replace(newpath, path) the old path still appears – Shahine Greene Jul 09 '19 at 06:52
  • yw, @ShahineGreene. please, try now with this update and let me know if you need further iterations. any error you see in your OS, please, show it here so it can be addressed. please, note that I'm avoiding sas7bdat here for convenience since it's irrelevant for what you really need in your question. I'm generating my own Excel. adding back the model into the code is straightforward – jslipknot Jul 09 '19 at 10:21
  • I succeded to run the first part [def rename_files()] and it changes the extension from "sas7bdat" to "xlsx" in the source file permanently. – Shahine Greene Jul 09 '19 at 16:02
  • are you getting errors when copying files to your target directory? if so, let us know what it is to be able to adapt the solution to your scenario (: we can even generalise a solution from there – jslipknot Jul 09 '19 at 16:12
  • Hi jonathan I changed the code a bit and i think with the new code we are very close to a solution please take a look at the story too. – Shahine Greene Jul 09 '19 at 19:27
  • cool! @ShahineGreene try the updated code and please, let me know if that gets you all dataframes. for me it's working. I edited what you are no longer using from the original code (: – jslipknot Jul 09 '19 at 21:08
  • I took off "[k]" : with SAS7BDAT(f) as file: otherwise an error appears OSError: [Errno 22] Invalid argument: ':' – Shahine Greene Jul 10 '19 at 07:36
  • oh! okay. glad it helped you (: if you want me to update it with this detail that worked for you, please, let me know so others can benefit – jslipknot Jul 11 '19 at 00:17