-2

I need to move files from one folder to another based on a condition. For example, if a file has 100 or more rows, it should be moved to another folder.

I've tried different variations of shutil calls; however, I have been unable to move any files at all.

src = 'path1'
dest = 'path2'

for files in src.glob('*.csv'):
   df = pd.read_csv(files)
   if len(df) >= 100:
      shutil.move(src, os.path.join(dest, files))

or

src = 'path1'
dest = 'path2'

for files in src.glob('*.csv'):
   if len(files) >= 100:
      shutil.move(src, os.path.join(dest, files))

I would like to know why it's not working at all. It seems pretty straightforward with the shutil command.

I've updated the code above based on the comments. Neither version works.

** Funny I get down points for an honest question. I'm new to python and I'm struggling to code. This isn't a nice community as far as my experience.

solo1111
  • There are no files like `'\*.csv'`, so your loop executes correctly for each of the zero matched files. Try `'*.csv'` instead. – Amadan Jul 04 '23 at 02:24
  • if `df = pd.read_csv` then `df` is the read function itself, not a dataframe you'd get by calling it. And `len(df)` would crash. [ask] and [mre] – Julien Jul 04 '23 at 02:27
  • for files in output_dir.glob('\*.csv'): df = pd.read_csv # Here df is a function, not a file -> pd.read_csv(filepath) – Xiaomin Wu Jul 04 '23 at 02:35
  • Thanks for pointing all these out. I've adjusted the code. Unfortunately, it still wouldn't work. – solo1111 Jul 04 '23 at 04:53
  • `len(files)` would be the length of the file name. And you probably don't want to move `src`, the entire directory. – Amadan Jul 04 '23 at 04:58
  • @Amadan I still don't understand why it doesn't work when I use `len(df)`. And isn't it, since I'm calling a dataframe, it should read `len(df)` as the number of rows? – solo1111 Jul 04 '23 at 04:59
  • @XiaominWu I have changed it to `pd.read_csv(files)` but it still wouldn't work – solo1111 Jul 04 '23 at 05:02
  • Why don't you check if the control is entering the `if block` using a simple print statement? Try to print the `os.path.join(dest, files)` part too. Edit: what is output_dir.glob(...)? – Sai Suman Chitturi Jul 04 '23 at 05:25
  • @ChitturiSaiSuman I think the problem is with the if statement. The `os.path.join(dest, files)` is okay. I'm not so sure how to frame the `if condition` so it counts the number of rows of each file and check if it is greater than or equal to 100. – solo1111 Jul 04 '23 at 05:39
  • @solo1111, Can you try using `df.shape[0]` instead of `len(df)`? I found this here - https://datascienceparichay.com/article/get-the-number-of-rows-in-a-pandas-dataframe/ – Sai Suman Chitturi Jul 05 '23 at 05:21

3 Answers

2

pd.read_csv is a function, so df = pd.read_csv just assigns the function itself to df, not the CSV file's content. You need to call the function and pass the file name as a parameter to read the file. So change df = pd.read_csv to df = pd.read_csv(files).
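To make the difference concrete, here is a minimal sketch, assuming pandas is installed. The in-memory CSV text and the name `not_called` are made up for illustration and are not from the question.

```python
import io

import pandas as pd

# Made-up CSV content: one header line and two data rows.
csv_text = "a,b\n1,2\n3,4\n"

# Without parentheses, this is only another name for the function.
not_called = pd.read_csv

# With parentheses, the function runs and returns a DataFrame.
df = pd.read_csv(io.StringIO(csv_text))

print(callable(not_called))   # True
print(type(df).__name__)      # DataFrame
print(len(df))                # 2

# A function has no length, so len() raises TypeError here.
try:
    len(not_called)
except TypeError as exc:
    print("len() on the function fails:", exc)
```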

OneMadGypsy
zikcheng
1
from pathlib import Path
import pandas as pd
import shutil

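src = Path('./')
dest = Path('./ttt/')
dest.mkdir(parents=True, exist_ok=True)  # make sure the destination folder exists

# glob() yields one Path per matching file, already prefixed with src
for files in src.glob('*.csv'):
    print(files)
    df = pd.read_csv(files)
    if len(df) >= 100:  # number of data rows; the header is not counted
        # the source of the move must be the file itself (files), not the folder src
        shutil.move(files, dest / files.name)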
Xiaomin Wu
0

You can use df.shape[0] to check the number of rows (excluding the header).

len(df) returns the same number: pandas reads the header line as column names, not as a data row, so neither expression counts it.
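Since the row count decides whether a file is moved, it can help to check on a small sample what `len(df)` and `df.shape[0]` each report. A minimal sketch, assuming pandas is installed; the sample data is made up for illustration:

```python
import io

import pandas as pd

# Made-up CSV content: one header line followed by three data rows.
csv_text = "name,score\nann,1\nbob,2\ncat,3\n"
df = pd.read_csv(io.StringIO(csv_text))

# The header line becomes the column names, so it is not a row in the frame.
print(list(df.columns))       # ['name', 'score']
print(len(df), df.shape[0])   # 3 3
```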

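Edit: the source argument of shutil.move must be the path of the file itself, not the folder src. If files is a bare file name (for example, one returned by os.listdir(src)), it must be appended to src, not only to dest.

So, the move statement changes to shutil.move(os.path.join(src, files), os.path.join(dest, files))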
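Putting that move statement into a full loop might look like the sketch below. It is an illustration under assumptions, not the only way to do this: the function names `count_data_rows` and `move_large_csvs` are hypothetical, and rows are counted with the standard-library `csv` module instead of pandas so that the example has no third-party dependency.

```python
import csv
import os
import shutil


def count_data_rows(path):
    """Count the lines after the header (hypothetical helper).

    Unlike pandas, csv.reader also counts blank lines as rows.
    """
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the header line, if there is one
        return sum(1 for _ in reader)


def move_large_csvs(src, dest, min_rows=100):
    """Move each CSV in src with at least min_rows data rows into dest."""
    os.makedirs(dest, exist_ok=True)
    moved = []
    for name in sorted(os.listdir(src)):
        # os.listdir returns bare names, so join each one onto src.
        src_path = os.path.join(src, name)
        if not name.endswith(".csv") or not os.path.isfile(src_path):
            continue
        if count_data_rows(src_path) >= min_rows:
            # Move the file itself, never the src folder.
            shutil.move(src_path, os.path.join(dest, name))
            moved.append(name)
    return moved
```

Calling `move_large_csvs('path1', 'path2')` with the placeholder paths from the question returns the list of file names that were moved, which makes it easy to print and check what happened.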

  • Thank you so much! This explains why it is not moving any files to the destination folder. I'm really learning from you and @XiaominWu – solo1111 Jul 06 '23 at 00:36