I'm trying to use `pyarrow.compute.assume_timezone`, but I get this error:

pyarrow.lib.ArrowInvalid: Cannot locate timezone 'UTC': Unable to get Timezone database version from C:\Users\Nick\Downloads\tzdata\

I tried downloading the database from https://www.iana.org/time-zones, without success.

Has anyone got this to work?

import pyarrow
import pyarrow.compute as pc

import numpy
dt = pyarrow.array([numpy.datetime64("2022-10-10T12:00:12.123456789")], pyarrow.timestamp("ns"))
print(pc.assume_timezone(dt, "UTC"))
Devyl
  • Could you try pc.assume_timezone(dt, "Etc/UTC")? My assumption here is only names in the "TZ database name" column in the table here https://en.wikipedia.org/wiki/List_of_tz_database_time_zones will work. – Rok Oct 31 '22 at 18:17
  • It's working for me with python 3.9, numpy==1.23.4 and pyarrow==10.0.0 – 0x26res Oct 31 '22 at 18:37
  • @Rok - `"UTC"` is a valid alias for `"Etc/UTC"` – Matt Johnson-Pint Oct 31 '22 at 23:24
  • @Devyl - Does [this](https://arrow.apache.org/docs/developers/cpp/windows.html#downloading-the-timezone-database) or [this](https://arrow.apache.org/docs/cpp/build_system.html#download-timezone-database) help? – Matt Johnson-Pint Oct 31 '22 at 23:34
  • You're right @MattJohnson-Pint. I forgot that tzdb is not set up for pyarrow on Windows yet. Here are the relevant tickets: [ARROW-16054](https://issues.apache.org/jira/browse/ARROW-16054), [ARROW-13168](https://issues.apache.org/jira/browse/ARROW-13168). – Rok Nov 02 '22 at 00:05

1 Answer

Indeed, the Arrow documentation explains how to install the timezone database. Thanks, @Matt Johnson-Pint.

I wrote a script to install it, in case anyone wants it:

def download_tzdata_windows(
    base_dir=None,
    year=2022,
    name="tzdata"
):
    import os
    import tarfile
    import urllib3  # third-party: pip install urllib3

    http = urllib3.PoolManager()
    # Default to ~\Downloads, the folder named in the error message above.
    folder = base_dir if base_dir else os.path.join(os.path.expanduser('~'), "Downloads")
    os.makedirs(folder, exist_ok=True)
    tz_path = os.path.join(folder, "tzdata.tar.gz")

    # Download the IANA release archive (the release letter "f" is hardcoded).
    with open(tz_path, "wb") as f:
        f.write(http.request('GET', f'https://data.iana.org/time-zones/releases/tzdata{year}f.tar.gz').data)

    # Extract the archive into <folder>/<name>.
    folder = os.path.join(folder, name)
    os.makedirs(folder, exist_ok=True)
    with tarfile.open(tz_path) as tar:
        tar.extractall(folder)

    # The Windows zone mapping file goes into the same folder.
    with open(os.path.join(folder, "windowsZones.xml"), "wb") as f:
        f.write(http.request('GET', 'https://raw.githubusercontent.com/unicode-org/cldr/master/common/supplemental/windowsZones.xml').data)


download_tzdata_windows(year=2022)
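
If the error persists after running the script, it may help to check that the folder actually contains what the loader needs. The sketch below is only a sanity check under assumptions: `missing_tzdata_files` is a hypothetical helper (not part of pyarrow), and the list of required files is my guess based on the error message (which mentions the database version) and on the script above, namely the `version` file shipped in the IANA archive and `windowsZones.xml`.

```python
import os


def missing_tzdata_files(tzdata_dir, required=("version", "windowsZones.xml")):
    """Return the names in `required` that are not files inside `tzdata_dir`.

    Hypothetical helper; the default `required` tuple is an assumption,
    not an official list from the Arrow documentation.
    """
    return [
        name
        for name in required
        if not os.path.isfile(os.path.join(tzdata_dir, name))
    ]


if __name__ == "__main__":
    # Folder used by the script above when base_dir is not given.
    default_dir = os.path.join(os.path.expanduser("~"), "Downloads", "tzdata")
    missing = missing_tzdata_files(default_dir)
    print("missing:", missing if missing else "nothing")
```

An empty result only means the files are present; it does not verify their contents.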
Devyl