0

I have a file containing many dictionaries whose entries are in the following format:

"string":integer

For example:

{"1":1, "2":4, "3":24}{"1":1, "2":6, "3":50}

This file is saved as a .json. Since it is not in proper json format, when I try to read the file:

with open('filename.json', 'r') as f:
    data = json.load(f)

I get

ValueError: Extra data: line 1 column 101 - line 1 column 748538 (char 100 - 748537)

I saw this answer, but it seems to pertain only to strings. How can I load my file, and subsequently count the number of dictionaries it contains?

EDIT:

In response to a comment, here is some more of the file:

{"1": 1, "2": 4, "3": 24, "4": 57, "5": 184, "6": 166, "7": 115, "8": 33, "9": 13, "10": 3, "11": 1}{"1": 1, "2": 2, "3": 8, "4": 47, "5": 129, "6": 208, "7": 127, "8": 48, "9": 28, "10": 7, "11": 1}{"1": 1, "2": 2, "3": 11, "4": 56, "5": 146, "6": 204, "7": 139, "8": 33, "9": 9, "10": 2, "11": 3}{"1": 1, "2": 1, "3": 0, "4": 6, "5": 16, "6": 69, "7": 196, "8": 153, "9": 107, "10": 36, "11": 16, "12": 3}{"1": 1, "2": 5, "3": 40, "4": 128, "5": 200, "6": 151, "7": 59, "8": 10, "9": 3}{"1": 1, "2": 5, "3": 25, "4": 77, "5": 178, "6": 147, "7": 64, "8": 52, "9": 27, "10": 10, "11": 4}{"1": 1, "2": 1, "3": 12, "4": 37, "5": 132, "6": 210, "7": 144, "8": 50, "9": 11, "10": 5}{"1": 1, "2": 5, "3": 21, "4": 52, "5": 137, "6": 223, "7": 121, "8": 35, "9": 3, "10": 1, "11": 2}{"1": 1, "2": 3, "3": 11, "4": 35, "5": 71, "6": 168, "7": 154, "8": 85, "9": 46, "10": 20, "11": 8, "12": 6, "13": 1}{"1": 1, "2": 10, "3": 43, "4": 120, "5": 217, "6": 151, "7": 45, "8": 8, "9": 4, "10": 1}{"1": 1, "2": 3, "3": 22, "4": 78, "5": 223, "6": 182, "7": 67, "8": 19, "9": 2}{"1": 1, "2": 0, "3": 3, "4": 3, "5": 10, "6": 35, "7": 124, "8": 210, "9": 150, "10": 46, "11": 10, "12": 2, "13": 1}{"1": 1, "2": 4, "3": 22, "4": 69, "5": 206, "6": 206, "7": 69, "8": 15, "9": 4, "10": 1}

Yes, it is NOT valid json. But it is saved as a .json file. I want to read in the dictionaries and count them.

StatsSorceress
  • I don't think that is valid json. – roymustang86 Dec 13 '16 at 14:31
  • You could just read it all in and do a `.split()` on `}{` or something – reptilicus Dec 13 '16 at 14:35
  • Can you include first 5 lines of your file so people don't have to guess what might be in there? – Mohammad Yusuf Dec 13 '16 at 14:36
  • Is counting the dictionaries all you want to do? – Patrick Haugh Dec 13 '16 at 14:39
  • If your json file is the output of another self-made program, change it like this: [{"1":1, "2":4, "3":24},{"1":1, "2":6, "3":50}]. This is valid json and json.load() does not fail. – Humbalan Dec 13 '16 at 14:44
  • Thanks @Humbalan, but it takes on the order of 5 hours to run the program that results in these dictionaries, and I don't want to have to re-run it! Can I not just read in the dictionaries and count them somehow? – StatsSorceress Dec 13 '16 at 14:46
  • Well, it's *invalid* JSON. If you cannot regenerate it, then you will need to do some preprocessing so that it becomes valid JSON. @niemmi's answer seems pretty reasonable, given the constraints. – Haroldo_OK Dec 13 '16 at 14:54

4 Answers

3

Assuming that none of the JSON strings in your data contain }{, you could insert the missing commas, wrap everything in an array, and then parse it:

>>> import json
>>> s = '{"1":1, "2":4, "3":24}{"1":1, "2":6, "3":50}'
>>> res = json.loads('[' + s.replace('}{', '},{') + ']')
>>> res
[{u'1': 1, u'3': 24, u'2': 4}, {u'1': 1, u'3': 50, u'2': 6}]
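
To address the follow-up about working from the file rather than a string literal, here is a minimal sketch of the same replace trick applied to the file's contents. The helper names `parse_concatenated_dicts` and `load_dicts` are made up for illustration, and the same assumption holds: `}{` must never appear inside a JSON string.

```python
import json


def parse_concatenated_dicts(text):
    """Parse back-to-back JSON objects such as '{"1": 1}{"1": 2}'.

    Assumption: the sequence '}{' never occurs inside a JSON string value.
    """
    # Insert the missing commas and wrap everything in a JSON array.
    return json.loads('[' + text.replace('}{', '},{') + ']')


def load_dicts(path):
    """Read the whole file as one string and parse it into a list of dicts."""
    with open(path, 'r') as f:
        return parse_concatenated_dicts(f.read())
```

With the file from the question, `len(load_dicts('filename.json'))` would then give the number of dictionaries.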
niemmi
  • The file contains dictionaries, not strings, so I don't know how I would make this type of loading work. I would need to have the string first, and then load the json, I think? – StatsSorceress Dec 13 '16 at 14:57
  • @StatsSorceress The lines will be read as strings. And once you reformat them into proper json format, json.loads will successfully parse those lines. – Mohammad Yusuf Dec 13 '16 at 14:58
3

Why use json if you don't have json and just want to know how many dicts you have? Try this:

with open( 'filename.json', 'r' ) as f :
    data = f.read()

count = len( data.split('}{') )
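
Note that the split-based count assumes the dictionaries sit back to back with nothing between them, and it reports 1 for an empty file. If that might matter, one alternative (a different technique from the split above) is to let the standard library's `json.JSONDecoder.raw_decode` consume one object at a time. This is only a sketch; the function names `iter_json_objects` and `count_json_objects` are invented for the example.

```python
import json


def iter_json_objects(text):
    """Yield each top-level JSON value found back to back in text."""
    decoder = json.JSONDecoder()
    pos = 0
    end = len(text)
    while pos < end:
        # raw_decode does not skip leading whitespace, so skip it here.
        if text[pos].isspace():
            pos += 1
            continue
        # Returns the parsed value and the index just past it.
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


def count_json_objects(text):
    """Count the top-level JSON values in text."""
    return sum(1 for _ in iter_json_objects(text))
```

Calling `count_json_objects(data)` on the string read above should give the number of dictionaries, and `list(iter_json_objects(data))` gives the dictionaries themselves.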
Humbalan
1

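You have to specify the encoding:

# Open the file as UTF-8 text and count the closing braces
# (one per dictionary, since the dictionaries are not nested).
with open('filename.json', 'r', encoding='utf-8') as f:
    data = f.read()
print(data.count('}'))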
khelili miliana
  • This is a great start, in that it reads the file, but it parses it as a string and then I can't count the number of dictionaries. – StatsSorceress Dec 13 '16 at 14:51
1

You can also do it like this:

with open('filename', 'r') as f:
    a = f.read()
# Each '}{' marks a boundary between two dictionaries.
print(a.count('}{') + 1)

Sometimes Bash tools are very powerful. Do consider using them. You can also count like this:

import subprocess

print(int(subprocess.check_output("grep -o '}{' /home/yusuf/Desktop/c21 | wc -l", shell=True)) + 1)
Mohammad Yusuf