4

I am interacting through a number of csv files and want to append the mean temperatures to a blank csv file. How do you create an empty csv file with pandas?

for EachMonth in MonthsInAnalysis:
    TheCurrentMonth = pd.read_csv('MonthlyDataSplit/Day/Day%s.csv' % EachMonth)
    MeanDailyTemperaturesForCurrentMonth = TheCurrentMonth.groupby('Day')['AirTemperature'].mean().reset_index(name='MeanDailyAirTemperature')
    with open('my_csv.csv', 'a') as f:
        df.to_csv(f, header=False)

So in the above code how do I create the my_csv.csv prior to the for loop?

Just a note I know you can create a data frame then save the data frame to csv but I am interested in whether you can skip this step.

In terms of context I have the following csv files:

enter image description here

Each of which have the following structure:

enter image description here

The Day column reads up to 30 days for each file.

I would like to output a csv file that looks like this:

enter image description here

But obviously includes all the days for all the months.

My issue is that I don't know which months are included in each analysis hence I wanted to use a for loop that used a list that has that information in it to access the relevant csvs, calculate the mean temperature then save it all into one csv.

Input as text:

    Unnamed: 0  AirTemperature  AirHumidity SoilTemperature SoilMoisture    LightIntensity  WindSpeed   Year    Month   Day Hour    Minute  Second  TimeStamp   MonthCategorical    TimeOfDay
6   6   18  84  17  41  40  4   2016    1   1   6   1   1   10106   January Day
7   7   20  88  22  92  31  0   2016    1   1   7   1   1   10107   January Day
8   8   23  1   22  59  3   0   2016    1   1   8   1   1   10108   January Day
9   9   23  3   22  72  41  4   2016    1   1   9   1   1   10109   January Day
10  10  24  63  23  83  85  0   2016    1   1   10  1   1   10110   January Day
11  11  29  73  27  50  1   4   2016    1   1   11  1   1   10111   January Day
PaulBarr
  • 919
  • 6
  • 19
  • 33

5 Answers5

5

Just open the file in write mode to create it.

with open('my_csv.csv', 'w'):
    pass

Anyway I do not think you should be opening and closing the file so many times. You'd better open the file once, write several times.

with open('my_csv.csv', 'w') as f:
    for EachMonth in MonthsInAnalysis:
        TheCurrentMonth = pd.read_csv('MonthlyDataSplit/Day/Day%s.csv' % EachMonth)
        MeanDailyTemperaturesForCurrentMonth = TheCurrentMonth.groupby('Day')['AirTemperature'].mean().reset_index(name='MeanDailyAirTemperature')
        df.to_csv(f, header=False)
Stop harming Monica
  • 12,141
  • 1
  • 36
  • 56
3

Creating a blank csv file is as simple as this one

import pandas as pd

pd.DataFrame({}).to_csv("filename.csv")
Shinto Joseph
  • 2,809
  • 27
  • 25
1

I would do it this way: first read up all your CSV files (but only the columns that you really need) into one DF, then make groupby(['Year','Month','Day']).mean() and save resulting DF into CSV file:

import glob
import pandas as pd

fmask = 'MonthlyDataSplit/Day/Day*.csv'
df = pd.concat((pd.read_csv(f, sep=',', usecols=['Year','Month','Day','AirTemperature']) for f in glob.glob(fmask)))
df.groupby(['Year','Month','Day']).mean().to_csv('my_csv.csv')

and if want to ignore the year:

import glob
import pandas as pd

fmask = 'MonthlyDataSplit/Day/Day*.csv'
df = pd.concat((pd.read_csv(f, sep=',', usecols=['Month','Day','AirTemperature']) for f in glob.glob(fmask)))
df.groupby(['Month','Day']).mean().to_csv('my_csv.csv')

Some details:

(pd.read_csv(f, sep=',', usecols=['Month','Day','AirTemperature']) for f in glob.glob('*.csv'))

will generate tuple of data frames from all your CSV files

pd.concat(...)

will concatenate them into resulting single DF

df.groupby(['Year','Month','Day']).mean()

will produce wanted report as a data frame, which might be saved into new CSV file:

.to_csv('my_csv.csv')
MaxU - stand with Ukraine
  • 205,989
  • 36
  • 386
  • 419
  • The csv's are in a subdirectory `MonthlyDataSplit/Day` I don't quite understand in this example how I would direct it. Would i use `glob.glob('MonthlyDataSplit/Day/*.csv')`? – PaulBarr Mar 10 '16 at 13:36
  • Thank you I think this approach is very clean and also more flexible. I appreciate your help – PaulBarr Mar 10 '16 at 13:41
  • 2
    I'm happy to help. Please next time asking 'Pandas' questions post sample input and desired output (as text) - it helps to better understand what OP wants and also helps to develop a solution. :) – MaxU - stand with Ukraine Mar 10 '16 at 13:43
0

The problem is a little unclear, but assuming you have to iterate month by month, and apply the groupby as stated just use:

 #Before loops
 dflist=[]

Then in each loop do something like:

 dflist.append(MeanDailyTemperaturesForCurrentMonth)

Then at the end:

 final_df = pd.concat([dflist], axis=1)

and this will join everything into one dataframe.

Look at:

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.concat.html

http://pandas.pydata.org/pandas-docs/stable/merging.html

Chris
  • 957
  • 5
  • 10
  • IMO doing `pd.concat()` in loop is not the best idea - you may want to collect data frames into list and concatenate them in one short, of course if they are not huge. – MaxU - stand with Ukraine Mar 10 '16 at 13:08
0

You could do this to create an empty CSV and add columns without an index column as well.

import pandas as pd
df=pd.DataFrame(columns=["Col1","Col2","Col3"]).to_csv(filename.csv,index=False)
JazzyJ
  • 331
  • 1
  • 2
  • 9