0

I'm looking for a way to merge multiple JSONL files from one folder using a Python script, something like the script below, which works for JSON files.

import json
import glob

result = []
for f in glob.glob("*.json"):
    # Each input file holds a single JSON document
    with open(f) as infile:
        result.append(json.load(infile))

# json.dump writes str, so open the output in text mode
with open("merged_file.json", "w") as outfile:
    json.dump(result, outfile)

Below is a sample of my JSONL file (only one line):

{"date":"2021-01-02T08:40:11.378000000Z","partitionId":"0","sequenceNumber":"4636458","offset":"1327163410568","iotHubDate":"2021-01-02T08:40:11.258000000Z","iotDeviceId":"text","iotMsg":{"header":{"deviceTokenJwt":"text","msgType":"text","msgOffset":3848,"msgKey":"text","msgCreation":"2021-01-02T09:40:03.961+01:00","appName":"text","appVersion":"text","customerType":"text","customerGroup":"Customer"},"msgData":{"serialNumber":"text","machineComponentTypeId":"text","applicationVersion":"3.1.4","bootloaderVersion":"text","firstConnectionDate":"2018-02-20T10:34:47+01:00","lastConnectionDate":"2020-12-31T12:05:04.113+01:00","counters":[{"type":"DurationCounter","id":"text","value":"text"},{"type":"DurationCounter","id":"text","value":"text"},{"type":"DurationCounter","id":"text","value":"text"},{"type":"IntegerCounter","id":"text","value":2423},{"type":"IntegerCounter","id":"text","value":9914},{"type":"DurationCounter","id":"text","value":"text"},{"type":"IntegerCounter","id":"text","value":976},{"type":"DurationCounter","id":"text","value":"PT0S"},{"type":"IntegerCounter","id":"text","value":28},{"type":"DurationCounter","id":"text","value":"PT0S"},{"type":"DurationCounter","id":"text","value":"PT0S"},{"type":"DurationCounter","id":"text","value":"text"},{"type":"IntegerCounter","id":"text","value":1}],"defects":[{"description":"ProtocolDb.ProtocolIdNotFound","defectLevelId":"Warning","occurrence":3},{"description":"BridgeBus.CrcError","defectLevelId":"Warning","occurrence":1},{"description":"BridgeBus.Disconnected","defectLevelId":"Warning","occurrence":6}],"maintenanceEvents":[{"interventionId":"Other","comment":"text","appearance_display":0,"intervention_date":"2018-11-29T09:52:16.726+01:00","intervention_counterValue":"text","intervention_workerName":"text"},{"interventionId":"Other","comment":"text","appearance_display":0,"intervention_date":"2019-06-04T15:30:15.954+02:00","intervention_counterValue":"text","intervention_workerName":"text"}]}}}

Does anyone know how I can load this?

jps
MFatn

2 Answers

2

Since each line in a JSONL file is a complete JSON object, you don't actually need to parse the JSONL files at all in order to merge them into another JSONL file. Instead, merge them by simply concatenating them. The caveat is that the JSONL format does not mandate a newline character at the end of the file. You therefore have to keep track of the last line read from each file and check whether it ends with a newline character; if it does not, write one explicitly so that the last record of that file is separated from the first record of the next file:

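import glob

output_name = "merged_file.json"
with open(output_name, "w") as outfile:
    for filename in glob.glob("*.json"):
        if filename == output_name:
            continue  # the output file also matches the pattern; skip it
        line = "\n"  # so that an empty input file adds nothing
        with open(filename) as infile:
            for line in infile:
                outfile.write(line)
        # JSONL does not require a trailing newline; add one if missing
        if not line.endswith("\n"):
            outfile.write("\n")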
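
The concatenation approach can also be wrapped in a function that takes explicit paths, which makes the trailing-newline case easy to check. This is only a minimal sketch: the function name `merge_jsonl_files`, the `.jsonl` file names, and the sample records are hypothetical and not taken from the question.

```python
import os
import tempfile


def merge_jsonl_files(input_paths, output_path):
    """Concatenate JSONL files, adding a newline where a file lacks one."""
    with open(output_path, "w", encoding="utf-8") as outfile:
        for path in input_paths:
            last_line = "\n"  # an empty input file contributes nothing
            with open(path, encoding="utf-8") as infile:
                for last_line in infile:
                    outfile.write(last_line)
            # Separate this file's last record from the next file's first one
            if not last_line.endswith("\n"):
                outfile.write("\n")


# Demonstration with two temporary files; the first has no trailing newline.
with tempfile.TemporaryDirectory() as tmp:
    first = os.path.join(tmp, "a.jsonl")
    second = os.path.join(tmp, "b.jsonl")
    merged = os.path.join(tmp, "merged.jsonl")
    with open(first, "w", encoding="utf-8") as f:
        f.write('{"id": 1}\n{"id": 2}')
    with open(second, "w", encoding="utf-8") as f:
        f.write('{"id": 3}\n')
    merge_jsonl_files([first, second], merged)
    with open(merged, encoding="utf-8") as f:
        merged_lines = f.read().splitlines()

print(merged_lines)
```

Passing the input paths explicitly also avoids the output file being picked up by a glob pattern that matches it.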
blhsing
  • Thanks @blhsing for your answer, but unfortunately I don't get the correct data(missing rows)! – MFatn Oct 01 '21 at 15:03
  • Please update the question with a sample of your input files then. – blhsing Oct 02 '21 at 23:03
  • I like this answer but if any of the jsonl files lack a newline at the end, you'll get missing rows. You likely need to process line by line to see if there is a final newline there. – tdelaney Oct 02 '21 at 23:12
  • @tdelaney Ahh thanks. I did not realize that JSONL allows a file to end without a newline character. Updated the answer accordingly then. – blhsing Oct 04 '21 at 04:29
  • @Arvind Unless I misunderstood the OP's question I think the OP means to merge the JSONL files into another JSONL file, rather than into a JSON file. – blhsing Oct 04 '21 at 04:37
  • @blhsing,@tdelaney please find above my question update with one line sample of my JSONL file – MFatn Oct 05 '21 at 13:32
1

You can update a main dict with every JSON object you load, like this:

import json
import glob

result = {}
for f in glob.glob("*.json"):
    with open(f) as infile:
        for line in infile:  # one JSON object per line
            if line.strip():  # skip blank lines
                result.update(json.loads(line))  # merge the dicts

with open("merged_file.json", "w") as outfile:
    json.dump(result, outfile)

But this will overwrite any keys that appear in more than one object!
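
To illustrate the overwriting, here is a small sketch that compares `dict.update` with collecting the objects in a list. The two records are made-up stand-ins that reuse the key names `date` and `iotDeviceId` from the question's sample; they are not real data.

```python
import json

# Two hypothetical JSONL lines that share the same top-level keys.
lines = [
    '{"date": "2021-01-02", "iotDeviceId": "a"}',
    '{"date": "2021-01-03", "iotDeviceId": "b"}',
]

merged_dict = {}
records = []
for line in lines:
    obj = json.loads(line)
    merged_dict.update(obj)  # values from later lines replace earlier ones
    records.append(obj)      # every record is kept as its own entry

print(merged_dict)
print(len(records))
```

Since every line of a JSONL file usually has the same keys, the dict ends up holding only the last record, whereas the list keeps all of them.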

Kris