1

I have a notepad file which I save as a json file and I'm trying to read it in pandas dataframe.

My json file looks like this:

{
  "date" : "2000-01-01",
  "i" : "1387",
  "xxx" : "aaaa",
}, 
{
  "fecha" : "2000-01-02",
  "indicativo" : "1387",
  "xxx" : "aaaa",
}, 
{
  "data" : "2000-01-03",
  "indicativo" : "1387",
}, 
{
  "date" : "2000-01-04",
  "i" : "1387",
  "xxx" : "aaaa",
}, 
{
  "fecha" : "2000-01-05",
  "indicativo" : "1387",
  "xxx" : "aaaa",
}

How can I change this to the correct json format using code? (Keep in mind that I just posted some lines, the actual json file is hundreds of hundreds of lines so it is impractical for me to do it manually)

And then once I have that file the code would be:

import pandas as pd
from pandas.io.json import json_normalize
name = pd.read_json(r"file.json", lines=True, orient='records')

I tried running the above code with the json file but kept getting :

ValueError: Expected object or value.

After much trial and error I believe it is due to the fact the it is not in correct json format so I would appreciate if someone helps me with at least the first part.

vvvvv
  • 25,404
  • 19
  • 49
  • 81
mariancatholic
  • 81
  • 1
  • 1
  • 6

3 Answers3

1
  • The question addresses How can I change this to the correct json format using code?
  • Given what is shown in the file as rows of comma and \n separated dictionaries.
  • Read and fix the file by adding [ to the beginning of a file and ] to the end of the file.
    • Once the file is fixed, it doesn't need to be fixed again.
  • Read the file back in with pandas.read_json
    • The list of dictionaries can be loaded into pandas, but there are different keys in each dict, so some additional cleaning may be necessary.
import json
import pandas as pd
from pathlib import Path

# path to file
p = Path('e:/PythonProjects/stack_overflow/test.json')

# read and fix the file
with p.open('r+') as f:
    file = f.read()  # reads the file in as a long string
    file = '[' + file + ']'  # add characters to beginning and end of string
    f.seek(0)  # find the beginning of the file
    f.write(file)  # write the new data back to the file
    f.truncate()  # remove the old data

# after fixing the file with code 
df = pd.read_json(p)

# display(df)
         date     i   xxx       fecha indicativo        data
0  2000-01-01  1387  aaaa         NaN        NaN         NaN
1         NaN   NaN  aaaa  2000-01-02       1387         NaN
2         NaN   NaN   NaN         NaN       1387  2000-01-03
3  2000-01-04  1387  aaaa         NaN        NaN         NaN
4         NaN   NaN  aaaa  2000-01-05       1387         NaN
Trenton McKinney
  • 56,955
  • 33
  • 144
  • 158
0

I think your json file should have [] in the beginning and end.

Farid Jafri
  • 346
  • 1
  • 9
  • Thanks I added that but I get this error: JSONDecodeError: Expecting property name enclosed in double quotes: line 5 column 1 (char 62) – mariancatholic Jul 24 '20 at 22:37
  • @mariancatholic I copied the same json and put `[]` around it in my file `abcd.json` and read the file like `name = pd.read_json("abcd.json")`. It worked and the `name` object had the dataframe Also putting the json directly into code like so `pd.DataFrame(json.loads('[{ "date" : "2000-01-01", "i" : "1387", "xxx" : "aaaa"},{ "date" : "20200-01-01", "i" : "1387", "xxx" : "aa2aa"}]'))` worked for me. – Farid Jafri Jul 25 '20 at 00:09
0

I think your data file is a list of dictionaries, but with opening and closing square brackets missing. (The file is not JSON, as there are dictionaries (values), but no keys).

The response above shows how to add the '[' and ']'.

After you do this, you can call the DataFrame constructor directly:

data = [
    {
      "date" : "2000-01-01",
      "i" : "1387",
      "xxx" : "aaaa",
    }, 
    {
      "fecha" : "2000-01-02",
      "indicativo" : "1387",
      "xxx" : "aaaa",
    }, 
    # remaining dictionaries, omitted, to save space
]

pd.DataFrame(data)
jsmart
  • 2,921
  • 1
  • 6
  • 13