
I have a dataset with UTC timestamps and latitude/longitude coordinates. I want to compute the solar position for each row of this dataset, but I'm having trouble handling the timezone.

So far, I've managed to make the UTC data timezone-aware with:

# library for timezone computations
from timezonefinder import TimezoneFinder
from pytz import timezone
import pytz

# scientific python add-ons
import numpy as np
import pandas as pd
import pvlib


tf = TimezoneFinder()
litteralTimeZone = tf.timezone_at(lng=longitude, lat=latitude)
print(litteralTimeZone)
tz = pytz.timezone(litteralTimeZone)
# Adjust date Time, currently in CSV like: 20070101:0000
Data['time(LOC)'] = pd.DatetimeIndex(
    pd.to_datetime(Data['time(UTC)'], format='%Y%m%d:%H%M')
).tz_localize(tz, ambiguous=True, nonexistent='shift_forward')
Data = Data.set_index('time(LOC)')

Now, when I pass the data to the solar position function with

pvlib.solarposition.get_solarposition(
    Data.index, metadata['latitude'], metadata['longitude'])

The solar positions are computed from the UTC portion of the timestamps, ignoring the localized part.

Any thoughts?

Mark Mikofski
Snick
  • What latitude and longitude are you using? What packages are you importing? – Adam R. Jensen Apr 17 '20 at 15:26
  • I've added the imports. Latitude and longitude are variable, and I've been running it with a few examples from each timezone (London: 51, 0; Rome: 45, 10; Bucharest: 44, 28, etc.). – Snick Apr 18 '20 at 05:38

1 Answer


Thanks for using pvlib!

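I believe your issue is that you have UTC timestamps, but you are mixing them with the local timezone. UTC is a timezone. Calling tz_localize with the local timezone tells pandas that the naive values are local wall-clock times, which shifts every instant by the local UTC offset. Therefore, you should first localize the naive timestamps with 'UTC'.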

# make time-zone aware timestamps from string format in UTC
>>> Data['time(TZ-UTC)'] = pd.DatetimeIndex(
...     pd.to_datetime(Data['time(UTC)'], format='%Y%m%d:%H%M')).tz_localize('UTC')
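To make the difference concrete, here is a minimal sketch comparing the two ways of localizing the same naive values. The two sample timestamps and the 'Europe/Rome' zone are assumptions picked for illustration (Rome is one of the locations mentioned in the comments); they are not taken from the actual dataset.

```python
import pandas as pd

# Naive timestamps parsed from CSV-style strings; the values are UTC.
naive = pd.DatetimeIndex(
    pd.to_datetime(pd.Series(['20070101:0000', '20070701:1200']),
                   format='%Y%m%d:%H%M'))

# Correct: declare that the wall-clock values are already UTC.
as_utc = naive.tz_localize('UTC')

# Mistake from the question: declare the same values to be local time.
# 'Europe/Rome' is only an example zone chosen for this sketch.
as_rome = naive.tz_localize('Europe/Rome')

# Gap between the two interpretations, measured as instants in time.
shift = as_utc - as_rome.tz_convert('UTC')
print(shift)
```

The gap is one hour for the January sample and two hours for the July sample, which is the local UTC offset in each season, so only the tz_localize('UTC') timestamps represent the original instants.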

Then you can use these directly in pvlib.solarposition.get_solarposition.

# imports used in this session
>>> import pandas as pd
>>> from pvlib import solarposition

# mimic OP data
>>> Data = pd.DataFrame(
...     {'time(UTC)': ['20200420:2030', '20200420:2130', '20200420:2230']})
>>> Data
#        time(UTC)
# 0  20200420:2030
# 1  20200420:2130
# 2  20200420:2230

# apply the UTC timezone to the naive timestamps after parsing the string format
>>> Data['time(TZ-UTC)'] = pd.DatetimeIndex(
...     pd.to_datetime(Data['time(UTC)'], format='%Y%m%d:%H%M')).tz_localize('UTC')
>>> Data
#        time(UTC)              time(TZ-UTC)
# 0  20200420:2030 2020-04-20 20:30:00+00:00
# 1  20200420:2130 2020-04-20 21:30:00+00:00
# 2  20200420:2230 2020-04-20 22:30:00+00:00

# now call pvlib.solarposition.get_solarposition with the TZ-aware timestamps
>>> lat, lon = 39.74,-105.24
>>> solarposition.get_solarposition(Data['time(TZ-UTC)'], latitude=lat, longitude=lon)
#                            apparent_zenith     zenith  apparent_elevation  elevation     azimuth  equation_of_time
# time(TZ-UTC)
# 2020-04-20 20:30:00+00:00        34.242212  34.253671           55.757788  55.746329  221.860950          1.249402
# 2020-04-20 21:30:00+00:00        43.246151  43.261978           46.753849  46.738022  240.532481          1.257766
# 2020-04-20 22:30:00+00:00        53.872320  53.895328           36.127680  36.104672  254.103959          1.266117

You don't need to convert them to the local timezone. If desired, use pd.DatetimeIndex.tz_convert to convert them from UTC to the local (e.g. Golden, CO) timezone. Note: it may be more convenient to use a fixed offset like Etc/GMT+7 (which, despite the plus sign, means UTC-7), because daylight saving time transitions can cause pandas to raise ambiguous or nonexistent time errors when naive local timestamps are localized.

>>> Data['time(LOC)'] = pd.DatetimeIndex(Data['time(TZ-UTC)']).tz_convert('Etc/GMT+7')
>>> Data = Data.set_index('time(LOC)')
>>> Data
#                                time(UTC)              time(TZ-UTC)
# time(LOC)
# 2020-04-20 13:30:00-07:00  20200420:2030 2020-04-20 20:30:00+00:00
# 2020-04-20 14:30:00-07:00  20200420:2130 2020-04-20 21:30:00+00:00
# 2020-04-20 15:30:00-07:00  20200420:2230 2020-04-20 22:30:00+00:00
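As a rough illustration of how a fixed offset differs from a DST-aware zone, the sketch below converts one winter and one spring timestamp to both. 'America/Denver' (the zone that covers Golden, CO) and the January date are assumptions added for comparison; they do not appear in the example above.

```python
import pandas as pd

# One winter and one spring instant, both declared as UTC.
utc = pd.DatetimeIndex(
    ['2020-01-20 20:30', '2020-04-20 20:30']).tz_localize('UTC')

# Fixed offset: Etc/GMT+7 is always seven hours behind UTC.
fixed = utc.tz_convert('Etc/GMT+7')

# DST-aware zone: the offset changes between winter and summer.
denver = utc.tz_convert('America/Denver')

fixed_offsets = [t.utcoffset() for t in fixed]
denver_offsets = [t.utcoffset() for t in denver]
print(fixed_offsets)
print(denver_offsets)
```

The fixed zone keeps the same offset all year, while the DST-aware zone is one hour closer to UTC in April. Either conversion only changes the wall-clock label; the underlying instants stay the same.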

The solar position results should be exactly the same with either local (e.g. Golden, CO) time or UTC time, because both indexes describe the same instants:

>>> solarposition.get_solarposition(Data.index, latitude=lat, longitude=lon)
#                            apparent_zenith     zenith  apparent_elevation  elevation     azimuth  equation_of_time
# time(LOC)
# 2020-04-20 13:30:00-07:00        34.242212  34.253671           55.757788  55.746329  221.860950          1.249402
# 2020-04-20 14:30:00-07:00        43.246151  43.261978           46.753849  46.738022  240.532481          1.257766
# 2020-04-20 15:30:00-07:00        53.872320  53.895328           36.127680  36.104672  254.103959          1.266117
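One way to convince yourself of this without calling pvlib is to check that the UTC and local indexes hold identical instants. This is a small sketch using the same three sample timestamps; the variable names are made up for illustration.

```python
import pandas as pd

# Same sample strings as above, parsed and declared as UTC.
utc = pd.DatetimeIndex(
    pd.to_datetime(
        pd.Series(['20200420:2030', '20200420:2130', '20200420:2230']),
        format='%Y%m%d:%H%M')).tz_localize('UTC')

# Relabel the same instants with a fixed UTC-7 offset.
local = utc.tz_convert('Etc/GMT+7')

# Elementwise comparison of tz-aware values compares instants, not labels.
same_instants = bool((utc == local).all())

# POSIX timestamps (seconds since the epoch) are independent of timezone.
utc_epoch = [t.timestamp() for t in utc]
local_epoch = [t.timestamp() for t in local]
print(same_instants, utc_epoch == local_epoch)
```

Since the instants match, any calculation that depends only on the instant and the site coordinates should give the same answer for both indexes, as the two tables above show.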

Does this help? Happy to answer more questions! Cheers!

Mark Mikofski
  • Mark, thank you so much for your comment. That is exactly what I needed. And the penny dropped here: 'you have UTC timestamps, but you are mixing them with the local timezone. UTC is a timezone'. Shameless plug for another question related to pvlib (but honestly more about data manipulation with pandas): https://stackoverflow.com/questions/61366516/read-csv-malformed-3-csv-concatenated-in-a-single-url-call – Snick Apr 23 '20 at 11:27