0

I have two columns of data in a pandas dataframe, UTC timestamp and 'Timezone' for user data. I am trying to create a column for the 'local' timestamp of that user. For example, I have a UTC timestamp:2019-10-27 17:18:31.034, and a corresponding Timezone as "Timezone:"Asia/Singapore". I would like to create a new column that gives that persons timestamp in the Asia Singapore timezone, so I can do analysis on what time of day the event occurred. There are lots of other Timezones such as America/New_York, America/Chicago, etc.

Is there a way to do this efficiently? I have just over 2 million rows in the dataset.

Example Data is as follows:

Event Timestamp | Timezone | Local Time  
2019-10-23 18:48:36.291 | Timezone:"America/Anchorage" | ''  
2019-10-04 07:55:34.964 | Timezone:"Asia/Jerusalem" | ''

Expected output is to generate Local Time column for the Event Timestamp in that timezone. As mentioned, Event Timestamp is in UTC.

I now have my local time as the UTC time + or - an amount of hours to convert it. How can I get the new local column to just read as a local timestamp?

Current Data:

Event Timestamp | Timezone | Local Time  
2019-10-23 18:48:36.291 | Timezone:"America/Anchorage" | '2019-10-23 18:48:36.291000-08:00'  

Expected Output:

Event Timestamp | Timezone | Local Time  
2019-10-23 18:48:36.291 | Timezone:"America/Anchorage" | '2019-10-23 10:48:36.291000'  
Roy2012
  • 11,755
  • 2
  • 22
  • 35
acroft08
  • 1
  • 1
  • Please add a sample data along with expected output – some_programmer Jun 09 '20 at 15:01
  • It would be helpful to see the data, please paste a sample of your DataFrame in your question – NYC Coder Jun 09 '20 at 15:12
  • Does this answer your question? [Efficiently convert timezones in pandas dataframe](https://stackoverflow.com/questions/47822265/efficiently-convert-timezones-in-pandas-dataframe) – Roy2012 Jun 09 '20 at 15:14
  • I posted an update. I was able to get the local time as UTC timestamp + or - hours, but not as the local timestamp. Please see above for current data and expected output. – acroft08 Jun 09 '20 at 19:02

1 Answers1

0

Here's a solution, starting with the original dataframe:

Source data:

Event Timestamp | Timezone | Local Time  
2019-10-23 18:48:36.291 | Timezone:"America/Anchorage" | ''  
2019-10-04 07:55:34.964 | Timezone:"Asia/Jerusalem" | ''

Code:

def get_localtime(row):
    ts = row["Event Timestamp"]
    tz = row["Timezone"]
    return ts.astimezone(tz)

df["Timezone"] = df.Timezone.str.replace("Timezone:\"", "").str.replace('"', "")
df["Event Timestamp"] = pd.to_datetime(df["Event Timestamp"])
df["Event Timestamp"] = pd.Series(df.set_index("Event Timestamp").index.tz_localize("utc"))    

df["Local Time"] = df.apply(get_localtime, axis = 1)

Result:

Event Timestamp                     Timezone            Local Time 
0 2019-10-23 18:48:36.291000+00:00  America/Anchorage   2019-10-23 10:48:36.291000-08:00   
1 2019-10-04 07:55:34.964000+00:00  Asia/Jerusalem      2019-10-04 10:55:34.964000+03:00  
Roy2012
  • 11,755
  • 2
  • 22
  • 35