
I was working with a large dataset of approximately a million items and dumped it from a database using streams. However, I mistakenly left out the opening and closing square brackets that mark a JSON array and just pushed the objects without them.

Now I want to fix the JSON file so that other software can process it. However, because of the file size (20.01 GB), I am running into buffer and memory issues (expected at this size). Is there a way to fix this file?

PS: I don't want to run such a big and expensive query on the database again for a fresh dump.

Syntax of data in current file:

{ "name": "aaron", age: 21 },{ "name": "jen", age: 26 }

Expected Syntax of data in file:

[{ "name": "aaron", age: 21 },{ "name": "jen", age: 26 }]
MehulB

1 Answer


If you just need to slap brackets on the front and end, use cat:

echo '[' > fixed.json
cat broken.json >> fixed.json
echo ']' >> fixed.json

You could do the same thing with Node, obviously, by reading in the file stream, and prefixing and postfixing the output accordingly. Since it sounds like this is a one-off mistake, a quick fix is likely the more appropriate approach.
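For reference, the Node variant could look roughly like the sketch below. It assumes a Node version that ships `stream/promises` (15 or later), and `wrapInArray` is a made-up helper name, not something from the question. It only adds the brackets; it does not check that the objects in between are valid JSON.

```javascript
// Sketch: wrap a file of comma-separated JSON objects in [ ... ] using streams,
// so the 20 GB file is never held in memory at once.
const fs = require('node:fs');
const { Readable } = require('node:stream');
const { pipeline } = require('node:stream/promises');

// wrapInArray is a hypothetical helper name used for illustration only.
async function wrapInArray(inputPath, outputPath) {
  // Async generator: yields '[', then every chunk of the source file, then ']'.
  async function* bracketed() {
    yield '[';
    for await (const chunk of fs.createReadStream(inputPath)) {
      yield chunk;
    }
    yield ']';
  }

  // pipeline() applies backpressure and rejects if the read or the write fails.
  await pipeline(Readable.from(bracketed()), fs.createWriteStream(outputPath));
}
```

Calling `wrapInArray('broken.json', 'fixed.json')` returns a promise that resolves once the output file has been fully written. For a one-off fix the `cat` commands are still less work.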

tadman
  • Hi @tadman, thanks for the awesome suggestion. I tried this, but it also failed with a buffer error because the file is too big. Any other suggestions? – MehulB Mar 24 '23 at 19:32
  • If you got a specific failure, can you describe a specific error message? Is this during the read? – tadman Mar 24 '23 at 19:33
  • Unless your machine has enough memory to read in the file and convert it into JavaScript data structures this is going to be pretty punishing to process. If you're strapped for memory, consider using a more minimal representation, like tuples (`[["aaron",21],["jen",26],...]`) instead of this more formal structure. – tadman Mar 24 '23 at 19:34
  • There is no error message. However, after running the command `cat broken.json >> fixed.json`, the terminal moved to a new line and the new file is only 1 KB. – MehulB Mar 24 '23 at 19:34
  • Are you possibly out of disk space? – tadman Mar 24 '23 at 19:35
  • I have around 64 GB of RAM and a 20-core CPU; the disk is 8 TB. – MehulB Mar 24 '23 at 19:35
  • Then you should be fine. A 20GB file is not going to be a problem if you have sufficient free disk space, but keep in mind you may be on a partition that's close to full. – tadman Mar 24 '23 at 19:36
  • It's a new system, with plenty of partition space. – MehulB Mar 24 '23 at 19:37
  • Try on a test file to see if the procedure works, then try on your larger file. I've never seen `cat` fail but for lack of disk space. It's an old, very reliable tool. – tadman Mar 24 '23 at 19:37
  • Thanks a lot, the second run worked. The file is fixed. – MehulB Mar 24 '23 at 19:45