Create empty csv file with pandas

Question

I am iterating through a number of csv files and want to append the mean temperatures to a blank csv file. How do you create an empty csv file with pandas?

for EachMonth in MonthsInAnalysis:
    TheCurrentMonth = pd.read_csv('MonthlyDataSplit/Day/Day%s.csv' % EachMonth)
    MeanDailyTemperaturesForCurrentMonth = TheCurrentMonth.groupby('Day')['AirTemperature'].mean().reset_index(name='MeanDailyAirTemperature')
    with open('my_csv.csv', 'a') as f:
        MeanDailyTemperaturesForCurrentMonth.to_csv(f, header=False)

So in the above code how do I create the my_csv.csv prior to the for loop?

Just a note: I know you can create a data frame and then save it to csv, but I am interested in whether this step can be skipped.

In terms of context I have the following csv files:

Each of which have the following structure:

The Day column reads up to 30 days for each file.

I would like to output a csv file that looks like this:

But obviously includes all the days for all the months.

My issue is that I don't know which months are included in each analysis. Hence I wanted a for loop that uses a list holding that information to access the relevant csvs, calculate the mean temperature, then save it all into one csv.

Input as text:

    Unnamed: 0  AirTemperature  AirHumidity SoilTemperature SoilMoisture    LightIntensity  WindSpeed   Year    Month   Day Hour    Minute  Second  TimeStamp   MonthCategorical    TimeOfDay
6   6   18  84  17  41  40  4   2016    1   1   6   1   1   10106   January Day
7   7   20  88  22  92  31  0   2016    1   1   7   1   1   10107   January Day
8   8   23  1   22  59  3   0   2016    1   1   8   1   1   10108   January Day
9   9   23  3   22  72  41  4   2016    1   1   9   1   1   10109   January Day
10  10  24  63  23  83  85  0   2016    1   1   10  1   1   10110   January Day
11  11  29  73  27  50  1   4   2016    1   1   11  1   1   10111   January Day

Why do you need to create it first? Surely creating it from scratch at save time is equivalent to appending to an already existing, empty csv? — Chris, Mar 10 '16 at 12:34
Because I don't know which csv's are present before the grouping occurs so I figure it is easier to create first and fill with whatever is present. How would you approach this? — PaulBarr, Mar 10 '16 at 12:38
So you want to overwrite 'my_csv.csv' file `len(MonthsInAnalysis)` times - is that what you want? ;-) — MaxU - stand with Ukraine, Mar 10 '16 at 12:54
Well, not overwrite. The `for` loop will run `len(MonthsInAnalysis)` times and each time I get a new groupby object I want to append it to the csv. I thought that's what the `with open` part achieved. — PaulBarr, Mar 10 '16 at 12:56
@PaulBarr, I guess it would be easier to help you if you explained a bit more: what is your source data and what do you want to achieve (i.e. what the output should look like). There might be another, more elegant solution where you won't need any loops... — MaxU - stand with Ukraine, Mar 10 '16 at 13:02
Please post sample input data (5 rows would be enough) and expected output for that input data — MaxU - stand with Ukraine, Mar 10 '16 at 13:04
Could you post 5 rows, shown in your input [sample](http://i.stack.imgur.com/tHsy2.png) _as_text_ so we could use it, please? — MaxU - stand with Ukraine, Mar 10 '16 at 13:14
@PaulBarr, do you want to ignore `year` when grouping your data? — MaxU - stand with Ukraine, Mar 10 '16 at 13:29
I don't think any of the data will span more than a year so should be fine to ignore. Thank you. — PaulBarr, Mar 10 '16 at 13:33

Stop harming Monica · Answer 1 · 2016-03-10T13:17:22.880

5

Just open the file in write mode to create it.

with open('my_csv.csv', 'w'):
    pass

That said, I do not think you should be opening and closing the file on every iteration. It is better to open the file once and write to it several times.

with open('my_csv.csv', 'w') as f:
    for EachMonth in MonthsInAnalysis:
        TheCurrentMonth = pd.read_csv('MonthlyDataSplit/Day/Day%s.csv' % EachMonth)
        MeanDailyTemperaturesForCurrentMonth = TheCurrentMonth.groupby('Day')['AirTemperature'].mean().reset_index(name='MeanDailyAirTemperature')
        MeanDailyTemperaturesForCurrentMonth.to_csv(f, header=False)
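
To illustrate why one open file object accumulates rows while a file name does not, here is a minimal sketch. The frames, the temporary directory and the file names `by_handle.csv` and `by_name.csv` are made up for the demonstration and are not part of the question's data.

```python
import os
import tempfile

import pandas as pd

# Three small made-up frames standing in for the per-month results.
frames = [
    pd.DataFrame({"Day": [1, 2], "MeanDailyAirTemperature": [20.5 + i, 21.0 + i]})
    for i in range(3)
]

tmpdir = tempfile.mkdtemp()
handle_path = os.path.join(tmpdir, "by_handle.csv")  # hypothetical file name
name_path = os.path.join(tmpdir, "by_name.csv")      # hypothetical file name

# Same file object every time: each frame is written after the previous one.
with open(handle_path, "w", newline="") as f:
    for frame in frames:
        frame.to_csv(f, header=False, index=False)

# File name every time: each call reopens the file in write mode and replaces it.
for frame in frames:
    frame.to_csv(name_path, header=False, index=False)

with open(handle_path) as f:
    rows_by_handle = sum(1 for _ in f)
with open(name_path) as f:
    rows_by_name = sum(1 for _ in f)

print(rows_by_handle, rows_by_name)
```

The first count covers all three frames; the second covers only the last frame written.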

edited Mar 10 '16 at 13:17

answered Mar 10 '16 at 13:10

Stop harming Monica


Thank you, this makes a lot more sense than what I was doing. I will accept in a few minutes. – PaulBarr Mar 10 '16 at 13:21
this will overwrite CSV file `len(MonthsInAnalysis)` times – MaxU - stand with Ukraine Mar 10 '16 at 13:37
@MaxU no it won't. – Stop harming Monica Mar 10 '16 at 13:56
@Goyo, OK run the following test: `[pd.DataFrame(np.random.randn(4, 4)).to_csv('out.csv') for i in range(5)]` and tell us how many rows do you have in the `out.csv` at the end! Following your logic there must be 5*4 = 20 rows in the CSV file. Please test – MaxU - stand with Ukraine Mar 10 '16 at 14:11
@MaxU That has nothing to do with my suggestion. It's more like `[pd.DataFrame(np.random.randn(4, 4)).to_csv(f) for i in range(5)]` where `f` is a writeable file object, not a file name. – Stop harming Monica Mar 10 '16 at 14:23
@MaxU That's the point. If you pass a file name, a new file object is created for each call to `.to_csv()` and the file gets overwritten. If you always pass the same file object, each dataframe is written to it, one after another. Or just test. – Stop harming Monica Mar 10 '16 at 14:28
sorry, you are right, as the file object `f` will still be open in the loop. So ignore my comments, please – MaxU - stand with Ukraine Mar 10 '16 at 14:28

score 3 · Answer 2 · answered Apr 12 '20 at 05:08


Creating a blank csv file is as simple as this:

import pandas as pd

pd.DataFrame({}).to_csv("filename.csv")

answered Apr 12 '20 at 05:08

Shinto Joseph


MaxU - stand with Ukraine · Accepted Answer · 2016-03-10T13:40:04.700

I would do it this way: first read all your CSV files (but only the columns that you really need) into one DF, then call groupby(['Year','Month','Day']).mean() and save the resulting DF to a CSV file:

import glob
import pandas as pd

fmask = 'MonthlyDataSplit/Day/Day*.csv'
df = pd.concat((pd.read_csv(f, sep=',', usecols=['Year','Month','Day','AirTemperature']) for f in glob.glob(fmask)))
df.groupby(['Year','Month','Day']).mean().to_csv('my_csv.csv')

And if you want to ignore the year:

import glob
import pandas as pd

fmask = 'MonthlyDataSplit/Day/Day*.csv'
df = pd.concat((pd.read_csv(f, sep=',', usecols=['Month','Day','AirTemperature']) for f in glob.glob(fmask)))
df.groupby(['Month','Day']).mean().to_csv('my_csv.csv')

Some details:

(pd.read_csv(f, sep=',', usecols=['Year','Month','Day','AirTemperature']) for f in glob.glob(fmask))

will lazily generate one data frame per CSV file (it is a generator expression, not a tuple)

pd.concat(...)

will concatenate them into a single resulting DF

df.groupby(['Year','Month','Day']).mean()

will produce the wanted report as a data frame, which can be saved into a new CSV file:

.to_csv('my_csv.csv')
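
For anyone who wants to try the steps above without the original data, here is a self-contained sketch. The two sample files, their contents and the temporary directory are invented for the illustration; only the column names come from the question. It also names the output column `MeanDailyAirTemperature`, which the snippets above do not do.

```python
import glob
import os
import tempfile

import pandas as pd

# Write two tiny, made-up monthly files into a temporary directory.
tmpdir = tempfile.mkdtemp()
samples = {
    "DayJanuary.csv": (
        "Year,Month,Day,AirTemperature,AirHumidity\n"
        "2016,1,1,18,84\n2016,1,1,20,88\n2016,1,2,23,1\n"
    ),
    "DayFebruary.csv": (
        "Year,Month,Day,AirTemperature,AirHumidity\n"
        "2016,2,1,10,50\n2016,2,1,14,60\n"
    ),
}
for name, text in samples.items():
    with open(os.path.join(tmpdir, name), "w") as f:
        f.write(text)

# Read only the needed columns from every matching file, then stack them.
fmask = os.path.join(tmpdir, "Day*.csv")
cols = ["Year", "Month", "Day", "AirTemperature"]
df = pd.concat(pd.read_csv(path, usecols=cols) for path in sorted(glob.glob(fmask)))

# One mean per (Year, Month, Day), written once with a header row.
report = (
    df.groupby(["Year", "Month", "Day"])["AirTemperature"]
    .mean()
    .reset_index(name="MeanDailyAirTemperature")
)
out_path = os.path.join(tmpdir, "my_csv.csv")
report.to_csv(out_path, index=False)
print(report)
```

Because the output is written in a single call, there is no need to create the file beforehand.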

The csv's are in a subdirectory `MonthlyDataSplit/Day`. I don't quite understand how I would point to it in this example. Would I use `glob.glob('MonthlyDataSplit/Day/*.csv')`? — PaulBarr, Mar 10 '16 at 13:36
Thank you I think this approach is very clean and also more flexible. I appreciate your help — PaulBarr, Mar 10 '16 at 13:41
I'm happy to help. Next time you ask 'Pandas' questions, please post sample input and desired output (as text). It helps to better understand what the OP wants and also helps to develop a solution. :) — MaxU - stand with Ukraine, Mar 10 '16 at 13:43

Chris · Answer 4 · 2016-03-10T13:12:46.657

0

The problem is a little unclear, but assuming you have to iterate month by month and apply the groupby as stated, just use:

 #Before loops
 dflist=[]

Then in each loop do something like:

 dflist.append(MeanDailyTemperaturesForCurrentMonth)

Then at the end:

 final_df = pd.concat(dflist)

and this will join everything into one dataframe.

Look at:

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.concat.html

http://pandas.pydata.org/pandas-docs/stable/merging.html
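
Put together, the collect-then-concatenate pattern might look like the sketch below. The `monthly_frames` dictionary is a made-up stand-in for the files read in the question's loop, and the added `Month` column is an assumption about what the combined output needs.

```python
import pandas as pd

# Hypothetical stand-in for the per-month CSV files, keyed by month number.
monthly_frames = {
    1: pd.DataFrame({"Day": [1, 1, 2], "AirTemperature": [18, 20, 23]}),
    2: pd.DataFrame({"Day": [1, 2, 2], "AirTemperature": [10, 12, 14]}),
}

# Before the loop
dflist = []

for month, frame in monthly_frames.items():
    means = (
        frame.groupby("Day")["AirTemperature"]
        .mean()
        .reset_index(name="MeanDailyAirTemperature")
    )
    means.insert(0, "Month", month)  # keep track of which month each row came from
    dflist.append(means)

# One concat at the end stacks the monthly results row-wise.
final_df = pd.concat(dflist, ignore_index=True)
print(final_df)
```

Calling `final_df.to_csv('my_csv.csv', index=False)` afterwards writes everything in one go.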

edited Mar 10 '16 at 13:12

answered Mar 10 '16 at 13:06

Chris


IMO doing `pd.concat()` in a loop is not the best idea - you may want to collect the data frames into a list and concatenate them in one shot, provided they are not huge. – MaxU - stand with Ukraine Mar 10 '16 at 13:08

score 0 · Answer 5 · answered Dec 06 '22 at 23:05


You could do this to create an empty CSV that has column headers but no index column.

import pandas as pd
pd.DataFrame(columns=["Col1","Col2","Col3"]).to_csv("filename.csv", index=False)
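
As a rough sketch of how a header-only file like this could then be filled from a loop, the example below appends with `mode="a"` and `header=False`. The column names, the `chunks` list and the temporary path are assumptions chosen to match the question, not something taken from it.

```python
import os
import tempfile

import pandas as pd

out_path = os.path.join(tempfile.mkdtemp(), "my_csv.csv")  # hypothetical location
columns = ["Month", "Day", "MeanDailyAirTemperature"]

# Create the file up front with only the header row.
pd.DataFrame(columns=columns).to_csv(out_path, index=False)

# Made-up per-month results standing in for what the loop would compute.
chunks = [
    pd.DataFrame({"Month": [1, 1], "Day": [1, 2], "MeanDailyAirTemperature": [19.0, 23.0]}),
    pd.DataFrame({"Month": [2], "Day": [1], "MeanDailyAirTemperature": [12.0]}),
]

# Append each result without repeating the header.
for chunk in chunks:
    chunk.to_csv(out_path, mode="a", header=False, index=False)

combined = pd.read_csv(out_path)
print(combined)
```

Each appended frame has to list its columns in the same order as the header, since nothing checks that for you.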

answered Dec 06 '22 at 23:05

JazzyJ

