
I am trying to read a JSON file (BioRelEx dataset: https://github.com/YerevaNN/BioRelEx/releases/tag/1.0alpha7) in Python. The JSON file is a list of objects, one per sentence. This is how I try to do it:

def _read(self, file_path):
    with open(cached_path(file_path), "r") as data_file:
        for line in data_file.readlines():
            if not line:
                continue
            items = json.loads(line)
            text = items["text"]
            label = items.get("label")

My code is failing on items = json.loads(line). It looks like the data is not formatted as the code expects it to be, but how can I change it?

Thanks in advance for your time!

Best,

Julia

2 Answers


With json.load() you don't need to read the file line by line. You can do either of these:

import json

def open_json(path):
    with open(path, 'r') as file:
        return json.load(file)

data = open_json('./1.0alpha7.dev.json')

Or you can fetch the JSON directly from GitHub with a GET request:

import json
import requests

url = 'https://github.com/YerevaNN/BioRelEx/releases/download/1.0alpha7/1.0alpha7.dev.json'
response = requests.get(url)
data = response.json()

Both give the same result: data is a list of dictionaries that you can iterate over in a for loop for further processing.
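As a rough illustration of that iteration step, here is a minimal sketch. The `sample` list and the `collect_texts` helper are made up for this example and stand in for the parsed file; only the "text" key comes from the question.

```python
import json

# Hypothetical stand-in for the parsed file: a list of dicts, one per sentence.
sample = json.loads('[{"text": "A binds B."}, {"text": "C inhibits D."}]')

def collect_texts(records):
    """Return the "text" field of every record that has one."""
    return [record["text"] for record in records if "text" in record]

texts = collect_texts(sample)
```

With the real dataset you would pass `data` instead of `sample`.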

bherbruck

Your code reads one line at a time and parses each line individually as JSON. Unless the file was deliberately written with one JSON document per line (unlikely, given the .json extension), that won't work, because JSON does not use line breaks to mark the end of an object.
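To make the failure concrete, here is a small sketch, assuming the file is a pretty-printed array spread over several lines. The `pretty` string and the `parse_line_by_line` helper are invented for illustration and are not taken from the dataset.

```python
import json

# Hypothetical pretty-printed JSON array, spread over several lines.
pretty = '[\n  {\n    "text": "A binds B."\n  }\n]'

def parse_line_by_line(content):
    """Try to parse each line on its own; return the lines that fail."""
    failed = []
    for line in content.splitlines():
        try:
            json.loads(line)
        except json.JSONDecodeError:
            failed.append(line)
    return failed

# No single line is a complete JSON document, so every line fails.
failed_lines = parse_line_by_line(pretty)

# Parsing the whole string at once succeeds.
whole = json.loads(pretty)
```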

Load the whole file content as JSON instead, then process the resulting items in the array.

def _read(self, file_path):
    with open(cached_path(file_path), "r") as data_file:
        data = json.load(data_file)
    for item in data:
        text = item["text"]

The label appears to be nested under item["interaction"].
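If you are unsure where a field lives, one option is to look at the top-level keys of the first record. This is only a sketch: the `sample` record and the `describe` helper are hypothetical, and the real BioRelEx field names may differ.

```python
import json

# Made-up record for illustration only; the real BioRelEx fields may differ.
sample = json.loads('[{"text": "A binds B.", "interaction": {"label": 1}}]')

def describe(record):
    """Map each top-level key to the type name of its value."""
    return {key: type(value).__name__ for key, value in record.items()}

summary = describe(sample[0])
```

Running `describe(data[0])` on the loaded file should show which keys hold nested structures worth inspecting further.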

Pete Kirkham