
I am using a loop to open consecutive files and then a second loop to calculate the average of y over specific row numbers (x). Why does the second loop show the average of only the last file? I would like to append the average from each file to one new dataframe.

path = '...../'

for file in os.listdir(path):
    if file.endswith('.txt'):
       with open(os.path.join(path, file)) as f:
        df = pd.read_csv(f, sep="\t", header=0,usecols=[0,11])
        df.columns = ["x", "y"]

average_PAR=[]
list=[]

for (x, y) in df.iteritems():
   average_PAR = sum(y.iloc[49:350]) / len(y.iloc[49:350])
   list.append(average_PAR)
print(list)

Thank you!

  • Because the second loop is not nested within the first one. – BigBen Mar 25 '21 at 17:07
  • Because you first read all your files, saving each one to the same variable, and only after that has finished does your second loop run, so it is only executed on the last `df`. You can greatly simplify your code with a few statements. – Umar.H Mar 25 '21 at 17:08
  • @BigBen ah true! – Martina Lazzarin Mar 25 '21 at 17:08
  • If you add some sample data showing what you're trying to do, as well as your expected output, I'll update my answer. – Umar.H Mar 25 '21 at 17:13

1 Answer


Your main issue is indentation, together with the fact that you're not saving each df to a dictionary or list, so every file overwrites the previous one.

Additionally, you're first opening the file and then passing it to pandas. There is no need for this, as pandas handles the I/O for you.

A simplified version of your code would be:

from pathlib import Path
import pandas as pd 


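# Read every .txt file in the folder into a dict keyed by file name (without extension).
# pandas opens each file itself, so no explicit open() is needed.
dfs = {f.stem: pd.read_csv(f, sep="\t", header=0, usecols=[0, 11])
       for f in Path('.../').glob('*.txt')}

# Each file now has its own dataframe, so the loop sees all of them.
for each_csv, dataframe in dfs.items():
    dataframe.iloc[35:450]  # do stuff.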
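
To fill in the "do stuff" step for the averaging described in the question, something along these lines might work. This is only a sketch under assumptions: the function name `average_per_file` and the output column names `file` and `average_PAR` are made up for illustration, the row range 49:350 and the x/y column names are taken from the question, and the folder is passed in as an argument because the real path is not given.

```python
from pathlib import Path

import pandas as pd


def average_per_file(folder, start=49, stop=350):
    """Return one row per .txt file with the mean of y over rows start:stop.

    Hypothetical helper: the name and the output columns are illustrative.
    """
    rows = []
    for f in sorted(Path(folder).glob("*.txt")):
        # Same read settings as in the question: tab-separated, columns 0 and 11.
        df = pd.read_csv(f, sep="\t", header=0, usecols=[0, 11])
        df.columns = ["x", "y"]
        # Compute the average inside the loop, so every file contributes a row.
        rows.append({"file": f.stem,
                     "average_PAR": df["y"].iloc[start:stop].mean()})
    # Collect the per-file averages into one new dataframe.
    return pd.DataFrame(rows, columns=["file", "average_PAR"])
```

Calling `average_per_file(path)` with your own folder should then give one average per file rather than only the last one, because the calculation happens inside the loop over the files.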
Umar.H