1

I am trying to learn Python, coming from a SAS background.
I have imported a SAS dataset, and one thing I noticed was that I have multiple date columns that are coming through as SAS dates (I believe). In looking around, I found a link which explained how to perform this (here):

The code is as follows:

alldata['DateFirstOnsite'] = pd.to_timedelta(alldata.DateFirstOnsite, unit='s') + pd.datetime(1960, 1, 1)

However, I'm wondering how to do this for multiple columns. If I have multiple date fields, rather than repeating this line of code multiple times, can I create a list of fields I have, and then run this code on that list of fields? How is that done?

Thanks in advance

Community
  • 1
  • 1
BPC
  • 109
  • 6

2 Answers2

1

Yes, it's possible to create a list and iterate through that list to convert the SAS date fields to pandas datetime. However, I'm not sure why you're using a to_timedelta method, unless the SAS date fields are represented by seconds after 1960/01/01. If you plan on using the to_timedelta method, then its simply a case of creating a function that takes your df and your field and passing those two into your function:

def convert_SAS_to_datetime(df, field):
    df[field] = pd.to_timedelta(df[field], unit='s') + pd.datetime(1960, 1, 1)
    return df

Now, let's suppose you have your list of fields that you know should be converted to a datetime field (along with your df):

my_list = ['field1','field2','field3','field4','field5']
my_df = pd.read_sas('mySASfile.sas7bdat')  # your SAS data that's converted to a pandas DF

You can now iterate through your list with a for loop while passing those fields and your df to the function:

for field in my_list:
    my_df = convert_SAS_to_datetime(my_df, field)

Now, the other method I would recommend is using the to_datetime method, but this assumes that you know what the SAS format of your date fields are.

e.g. 01Jan2016 # date9 format

This is when you might have to look through the documentation here to determine the directive to converting the date. In the case of a date9 format, then you can use:

df[field] = pd.to_datetime(df[date9field], format="%d%b%Y")
Scratch'N'Purr
  • 9,959
  • 2
  • 35
  • 51
0

If i read your question correctly, you want to apply your code to multiple columns? to do that simple do this:

alldata[['col1','col2','col3']] = 'your_code_here'

Exmaple:

import pandas as pd
import numpy as np
df = pd.DataFrame({'A' : [np.NaN,np.NaN,3,4,5,5,3,1,5,np.NaN], 
                    'B' : [1,0,3,5,0,0,np.NaN,9,0,0], 
                    'C' : ['Pharmacy of IDAHO','Access medicare arkansas','NJ Pharmacy','Idaho Rx','CA Herbals','Florida Pharma','AK RX','Ohio Drugs','PA Rx','USA Pharma'], 
                    'D' : [123456,123456,1234567,12345678,12345,12345,12345678,123456789,1234567,np.NaN],
                    'E' : ['Assign','Unassign','Assign','Ugly','Appreciate','Undo','Assign','Unicycle','Assign','Unicorn',]})

df[['E', 'D']] = 1 # <---- notice double brackets
print(df)

     A    B                         C  D  E
0  NaN  1.0         Pharmacy of IDAHO  1  1
1  NaN  0.0  Access medicare arkansas  1  1
2  3.0  3.0               NJ Pharmacy  1  1
3  4.0  5.0                  Idaho Rx  1  1
4  5.0  0.0                CA Herbals  1  1
5  5.0  0.0            Florida Pharma  1  1
6  3.0  NaN                     AK RX  1  1
7  1.0  9.0                Ohio Drugs  1  1
8  5.0  0.0                     PA Rx  1  1
9  NaN  0.0                USA Pharma  1  1

Notice the double brackets in the beginning. Hope this helps!

MattR
  • 4,887
  • 9
  • 40
  • 67
  • Thanks MattR. Not quite what I was looking for, but the response below worked great. Thanks for taking the time to help out. – BPC Mar 09 '17 at 21:49