I am trying to read some Canadian census data from Statistics Canada (the XML option for the "Canada, provinces and territories" geographic level). I see that the XML file is in the SDMX format and that a structure file is provided, but I cannot figure out how to read the data from the XML file.

It seems there are two options in Python, pandasdmx and sdmx1, both of which say they can read local files. When I try

import sdmx

datafile = '~/Documents/Python/Generic_98-401-X2016059.xml'

canada = sdmx.read_sdmx(datafile)

it appears to read the first 903 lines and then produces the following:

Traceback (most recent call last):
  File "/home/username/.local/lib/python3.10/site-packages/sdmx/reader/xml.py", line 238, in read_message
    raise NotImplementedError(element.tag, event) from None
NotImplementedError: ('{http://www.SDMX.org/resources/SDMXML/schemas/v2_0/message}GenericData', 'start')

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/username/.local/lib/python3.10/site-packages/sdmx/reader/__init__.py", line 126, in read_sdmx
    return reader().read_message(obj, **kwargs)
  File "/home/username/.local/lib/python3.10/site-packages/sdmx/reader/xml.py", line 259, in read_message
    raise XMLParseError from exc
sdmx.exceptions.XMLParseError: NotImplementedError: ('{http://www.SDMX.org/resources/SDMXML/schemas/v2_0/message}GenericData', 'start')

Is this happening because I have not loaded the structure of the SDMX file (Structure_98-401-X2016059.xml in the zip file from the StatsCan link above)? If so, how do I load it and tell sdmx to use it when reading datafile?

The documentation for sdmx and pandasdmx only shows examples of loading data from online providers, not from local files, so I'm stuck. I have limited familiarity with Python, so any help is much appreciated.

For reference, I can read the file in R using the instructions from the rsdmx GitHub repository. I would like to be able to do the same, or something similar, in Python.

Thanks in advance.

ramesesjd

2 Answers

According to the sdmx1 developer, StatsCan is using an older version of SDMX (v2.0) that sdmx1 does not support. The current version is 2.1, and sdmx1 supports only that version (development effort is also going towards the upcoming v3).
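The version can be confirmed from the file itself, since the traceback shows it encoded in the root element's namespace (`.../schemas/v2_0/message`). Below is a minimal sketch using only the standard library: one helper reads the version from that namespace, and a second streams observations out of a v2.0 generic data file. The function names are hypothetical, and the element names (`Series`, `SeriesKey`, `Value`, `Obs`, `Time`, `ObsValue`) are assumed from the SDMX-ML 2.0 generic schema; I have not verified them against the StatsCan file, so treat this as a starting point rather than a drop-in reader.

```python
import re
import xml.etree.ElementTree as ET
from pathlib import Path


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def sdmx_ml_version(path):
    """Return e.g. '2.0' or '2.1' from the root element's namespace, else None."""
    path = Path(path).expanduser()  # open() does not expand '~' by itself
    with open(path, "rb") as handle:
        # Only the first 'start' event (the root element) is needed.
        for _event, element in ET.iterparse(handle, events=("start",)):
            match = re.search(r"/schemas/v(\d+)_(\d+)/", element.tag)
            return f"{match.group(1)}.{match.group(2)}" if match else None
    return None


def iter_generic_observations(path):
    """Yield one dict per <Obs>: series key values, 'time' and 'obs_value'.

    Assumes the SDMX-ML 2.0 generic layout (hypothetical for this file):
    Series > SeriesKey > Value[@concept, @value], and
    Series > Obs > Time / ObsValue[@value].
    """
    path = Path(path).expanduser()
    series_key = {}
    in_series_key = False
    with open(path, "rb") as handle:
        for event, element in ET.iterparse(handle, events=("start", "end")):
            name = local_name(element.tag)
            if event == "start":
                if name == "Series":
                    series_key = {}
                elif name == "SeriesKey":
                    in_series_key = True
                continue
            # From here on we are handling 'end' events.
            if name == "SeriesKey":
                in_series_key = False
            elif name == "Value" and in_series_key:
                series_key[element.get("concept")] = element.get("value")
            elif name == "Obs":
                row = dict(series_key)
                for child in element:
                    child_name = local_name(child.tag)
                    if child_name == "Time":
                        row["time"] = child.text
                    elif child_name == "ObsValue":
                        row["obs_value"] = child.get("value")
                yield row
                element.clear()  # free memory; census files can be large
            elif name == "Series":
                element.clear()
```

Given the namespace in the traceback, `sdmx_ml_version(datafile)` should return `'2.0'` for the file in the question. If the layout assumption holds, `pandas.DataFrame(iter_generic_observations(datafile))` would collect the rows into a table; the codes in each row would still need to be looked up in the structure file.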

ramesesjd

From a cursory inspection of the documentation, it seems that Statistics Canada is not one of the sources included by default. There is, however, an sdmx.add_source function. I suggest you try that before loading the data.

Roland Smith
  • Thank you. I am not sure this will work, as the Statistics Canada site provides zip files containing the data and structure files. – ramesesjd Jan 23 '22 at 18:55