Read, Transform, and Write Column Contents of a CSV file

Question

I'm trying to change a column of dates in my .csv file from month/day/year, to timestamp format.

I would like to change this...

VIX Close,Date,VIX High,VIX Low,VIX Open  
18.22,1/2/2004,18.68,17.54,17.96  
17.49,1/5/2004,18.49,17.44,18.45  
16.73,1/6/2004,17.67,16.19,17.66  
15.5,1/7/2004,16.75,15.5,16.72

to this...

VIX Close,Date,VIX High,VIX Low,VIX Open  
18.22,1073023200.0,18.68,17.54,17.96  
17.49,1073282400.0,18.49,17.44,18.45  
16.73,1073368800.0,17.67,16.19,17.66  
15.5,1073455200.0,16.75,15.5,16.72

I've tried many things, and I can get it to print out my desired reformatted date but below is one of the many ways I've tried to write it to the csv column.
What's the most pythonic way to get it to read, and immediately write the desired conversion in place without too many steps, which is where things lead for my when I'm trying to make this happen.

with open('testvix.csv', 'wb') as myFile:
    reader = csv.writer(myFile, delimiter=',')
    reader.writerow([(time.mktime(time.strptime(row[1], "%m/%d/%Y")))])

returns

Traceback (most recent call last):
File "<stdin>", line 3, in <module>
TypeError: a bytes-like object is required, not 'str'

I'm sure I'm missing some kind of iterating through the csv somehow. I haven't been able to source this solution to my problem online yet

Just out of curiosity, why do you call the writer `reader`, as if you have no other things to worry about? — DYZ, Mar 03 '18 at 21:56
Possible duplicate of [python 3.5: TypeError: a bytes-like object is required, not 'str' when writing to a file](https://stackoverflow.com/questions/33054527/python-3-5-typeerror-a-bytes-like-object-is-required-not-str-when-writing-t) — DYZ, Mar 03 '18 at 22:02
"why do you call the writer reader" just from pasting my attempt at the solution from different places online. — Ant, Mar 03 '18 at 22:06
DyZ, my problem is more than 'bytes-like' object problem. I would not have found my solution from the question or answer in the post. My problem was more to do with reading and writing a column in a csv. — Ant, Mar 03 '18 at 22:28
I appreciate the scruteny. I want Stack Overflow to be the best it can be, but I needed a bit more help on this problem than what was in the post you provided. — Ant, Mar 03 '18 at 22:32

Evya · Accepted Answer · 2018-03-03T22:25:42.450

2

Using pandas

import pandas as pd
import time

df = pd.read_csv('file.csv', sep=',')
df['Date'] = df['Date'].apply(lambda x: time.mktime(time.strptime(x, "%m/%d/%Y")))

# Save to a new file or overwrite your file
df.to_csv('file2.csv', sep=',')

,VIX Close,Date,VIX High,VIX Low,VIX Open
0,18.22,1075586400.0,18.68,17.54,17.96
1,17.49,1083358800.0,18.49,17.44,18.45
2,16.73,1086037200.0,17.67,16.19,17.66
3,15.5,1088629200.0,16.75,15.5,16.72

edited Mar 03 '18 at 22:25

answered Mar 03 '18 at 22:02

Evya

2,325
3
11
22

Thank you. Though I can do it without importing datetime, without 'date_format = "%d/%m/%Y"' and shortening the longest line to df['Date'] = df['Date'].apply(lambda x: time.mktime(time.strptime(x, "%m/%d/%Y"). Do you want to change your answer and I'll give it the green check? – Ant Mar 03 '18 at 22:22
What does 'sep=','' and 'lambda x' do? – Ant Mar 03 '18 at 22:23
1

@Renoldus I've edited my answer. `sep` determines the csv column separator and lambda is an anonymous function which is applied for each `['Date']` value in each row. It could have been a regular function receives `x` as a date string and returns `time.mktime(time.strptime(x, "%m/%d/%Y")))` just as well. – Evya Mar 03 '18 at 22:29
Is this 'sep' thing the same thing as the 'delimiter' from the csv module? I think I was close to figuring this out myself with 'delimiter', though I still don't understand why it's needed. I wish I knew. Sorry to nag about it. Why is sep=',' needed when reading and saving? – Ant Mar 03 '18 at 22:39
1

CSV files can use different separators (or delimiters). commas, semicolons, pipes, tabs etc. This just specifies how pandas (or the csv module) should read/write the file. – Evya Mar 03 '18 at 22:41
I've noticed your code put a lone comma in the header at the front, and numbered all of the rows. Is that something you do standard on csv files? Is there an option to stop this from adding on? Also. I noticed if the date is in 12/1/93 (example) format it won't change it unless it's the long line of code you had up before... df['Date'] = df['Date'].apply(lambda x: time.mktime(datetime.datetime.strptime(x, "%d/%m/%Y").timetuple())) – Ant Mar 04 '18 at 03:35
1

You can pass `index=False` to `df.to_csv` to remove column indexing. If there are inconsistencies in your date format your should parse each date using `datetime` like before. – Evya Mar 04 '18 at 06:10

Read, Transform, and Write Column Contents of a CSV file

1 Answers1