0

I have a column in my Pandas dataframe called "date" that contains unix timestamps (int64). I am trying to iterate over the entire frame and extract month and year from the timestamps and add them to my dataframe. Once I have the month and year, I want to be able to create masks so that I can save to CSV new dataframes based on months and years Here is the code that I have written:

# import useful libraries
from datetime import datetime
import pandas as pd

# read csv as dataframe
df=pd.read_csv('./ct.csv')

# function to get year
def get_year(x):
    return datetime.fromtimestamp(x).strftime("%Y")

# function to get month
def get_month(x):
    return datetime.fromtimestamp(x).strftime("%m")

# add month and year to new dataframe columns
df['year'] = df['date'].apply(get_year)
df['month'] = df['date'].apply(get_month)

# set the beginning and end date for mask
beginning = datetime(2002, 1, 1)
end = datetime(2003, 1, 1)

# get datetime from timestamp
def to_datetime(x):
    print(x)
    return datetime.fromtimestamp(x)

# create datetime series
df['datetime'] = df['date'].apply(to_datetime)

# create dataframe mask
msk = (df['datetime'] > beginning) & (df['datetime'] < end)

# apply mask
df_range = df[msk]

# write dataframe to csv
df_range.to_csv('ct_2002.csv', index=False)

I am getting the following error when trying to run this:

    runfile('C:/Users/x/Desktop/Wine/daterange.py', wdir='C:/Users/x/Desktop/Wine')
Traceback (most recent call last):

  File "C:\Users\x\Desktop\Wine\daterange.py", line 17, in <module>
    df['year'] = df['date'].apply(get_year)

  File "C:\Users\x\Anaconda3\lib\site-packages\pandas\core\series.py", line 3848, in apply
    mapped = lib.map_infer(values, f, convert=convert_dtype)

  File "pandas\_libs\lib.pyx", line 2329, in pandas._libs.lib.map_infer

  File "C:\Users\x\Desktop\Wine\daterange.py", line 10, in get_year
    return datetime.fromtimestamp(x).strftime("%Y")

OSError: [Errno 22] Invalid argument

Any help would be much appreciated.

youngguv
  • 103
  • 1
  • 7
  • Have you seen [this issue](https://stackoverflow.com/questions/37494983/python-fromtimestamp-oserror/49773492)? Seems like the script is getting hung up when it calls `get_year` on one of your timestamps. Maybe try running it on a single timestamp pulled from your data, else check if that column has an improperly formatted stamp – Tom Jun 09 '20 at 23:35
  • How does your input data look like? I general, I'd suggest to have a look at datetime methods built into pandas – FObersteiner Jun 10 '20 at 05:28

1 Answers1

0

I am not sure what your dataframe looks like, but here is how I would proceed. As MrFuppes pointed out, Pandas has built-in datetime functionality.

import pandas as pd

df = pd.read_csv('./ct.csv')

#Convert unix timestamps to datetime.
df['datetime'] = pd.to_datetime(df['date'],unit='s')

#Set the 'datetime' field to be the index.
df.set_index('datetime',inplace=True)

#Extract the month from the index.
df['month'] = df.index.month

#Extract the year from the index.
df['year'] = df.index.year

#Apply a temporal slice.
df = df['2002-1-1':'2003-1-1']

df.to_csv('ct_2002.csv')

Let me know if this doesn't solve your problem.

etotheipi
  • 771
  • 9
  • 12