pd.read_json() to read all json files in a folder

Asked Mar 14 '19 at 20:58

Active Mar 14 '19 at 21:10

Viewed 635 times

I have 100 large JSON files in GCS and want to load them into a pandas DataFrame. In Dask I've used something like the following:

 dd.read_json('gs://dask_poc/2018-04-18/data-*.json')

But when I used:

 pd.read_json('gs://dask_poc/2018-04-18/data-*.json')

I got the below error: ValueError: Expected object or value

Is pandas unable to aggregate all the files together the way Dask does?

edited Mar 14 '19 at 21:10

asked Mar 14 '19 at 20:58

MT467

This may sound like a silly question and I probably already know the answer, but where are you running this code? – cs95 Mar 14 '19 at 21:01
@coldspeed locally in my jupyterlab – MT467 Mar 14 '19 at 21:01
You could probably use a for loop to open each file in that folder and run whatever code you have on each JSON file. – dataviews Mar 14 '19 at 21:02
Unfortunately, pandas does not have native GCP support, nor can it be expected to magically understand GCP links. – cs95 Mar 14 '19 at 21:02
Does the answer to this question help? https://stackoverflow.com/questions/46885631/loading-multiple-files-from-google-cloud-storage-into-a-single-pandas-dataframe – dim_user Mar 14 '19 at 21:04
@coldspeed I thought the same, but when I checked this link today, the docs list gcs as a valid path too! https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_json.html – MT467 Mar 14 '19 at 21:04
Well, I'm floored. How is that possible without any authentication from your part? – cs95 Mar 14 '19 at 21:05
@saul cruz yeah, I saw similar answers, but my files are huge, so neither concat nor a for loop looks promising. – MT467 Mar 14 '19 at 21:07
@coldspeed lol, not sure – MT467 Mar 14 '19 at 21:08
Learned about gcs support in pandas today, thanks guys! See [`pandas.io.gcs.py`](https://github.com/pandas-dev/pandas/blob/v0.24.2/pandas/io/gcs.py) and the [`gcsfs` documentation](https://gcsfs.readthedocs.io/en/latest/) for some details. Not sure what causes the described error. – FabienP Mar 14 '19 at 21:28
[This answer](https://stackoverflow.com/a/52106361/6914989) could help. Looks like glob aggregation has to be done manually for pandas. – FabienP Mar 14 '19 at 21:33
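
To illustrate the manual glob aggregation mentioned in the last comment, here is a minimal sketch rather than a definitive solution. The helper name `read_json_glob` and its `expand` parameter are hypothetical, introduced only for this example. It assumes every matching file can be parsed by `pd.read_json` with the same keyword arguments and that the combined result fits in memory, which may not hold for very large files.

```python
import glob

import pandas as pd


def read_json_glob(pattern, expand=glob.glob, **read_json_kwargs):
    """Expand a wildcard pattern, read every match, and concatenate the results.

    `expand` is a hypothetical hook: any callable that turns a pattern into a
    list of paths. It defaults to the standard library's glob.glob, which only
    understands local paths.
    """
    paths = sorted(expand(pattern))
    if not paths:
        raise FileNotFoundError(f"no files match {pattern!r}")
    # pd.read_json reads one file per call, so the wildcard is expanded here.
    frames = [pd.read_json(path, **read_json_kwargs) for path in paths]
    return pd.concat(frames, ignore_index=True)
```

For local files, `read_json_glob('data-*.json')` should be enough. For GCS, one option (an assumption, not tested against the bucket in the question) is to pass an `expand` callable built on `gcsfs.GCSFileSystem().glob`, adding the `gs://` prefix back to each returned path before it reaches `pd.read_json`. Since this loads everything into a single in-memory DataFrame, Dask may remain the better fit if the files are too large.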

0 Answers