I am currently developing a project that uses the Twint Python library to scrape Twitter. The problem is that the file it saves the scraped data to is not valid JSON. Each scraped tweet is saved as a separate object in the JSON file, and when I try to parse the file I get an error, since the objects aren't separated by commas and aren't wrapped in an array.

{'key1': value1, 'key2': value2,}
{'key1': value1, 'key2': value2,}
{'key1': value1, 'key2': value2,}

as opposed to:

[
  {'key1': value1, 'key2': value2,},
  {'key1': value1, 'key2': value2,},
  {'key1': value1, 'key2': value2,}
]

My question is: can I fix this by writing a script that wraps the list of objects in an array and separates the objects with commas?

Mvvvxie
  • Assuming there are no "unescaped" line breaks anywhere inside the data, replacing every `}` that is followed by a line break with `},` plus line break, and then adding [ and ] around the whole thing, would probably already do ... – CBroe May 04 '22 at 11:29

3 Answers

I think this formatting is deliberate: it lets you stream the downloaded data into your app instead of loading and parsing it all at once (I assume it could get quite large if you are scraping Twitter). Based on your node.js tag, I assume you want to parse the JSON in the backend. There are a variety of packages you could use for that; it can also be done with Rx/observables, or you could implement it yourself by streaming the data until a line break (\n), parsing what you have so far, and then continuing to stream. For your own research, start by searching for "JSON streaming" on npm, GitHub, and the web.
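
The "implement it yourself" option could look roughly like the sketch below. It assumes the file holds exactly one JSON object per line and that chunks arrive as strings. `createLineParser`, `onObject`, and the sample tweets are hypothetical names and data made up for illustration; they are not part of Twint or Node.js.

```javascript
// Hypothetical helper: feed it text chunks in arrival order and it calls
// onObject once for every complete line it has seen.
function createLineParser(onObject) {
  let buffer = ""; // text received so far that does not yet end in "\n"

  function parseLine(line) {
    const trimmed = line.trim(); // also drops a trailing "\r" from Windows files
    if (trimmed !== "") onObject(JSON.parse(trimmed)); // skip blank lines
  }

  return {
    // Call for every chunk of text that arrives.
    push(chunk) {
      buffer += chunk;
      let newlineIndex;
      while ((newlineIndex = buffer.indexOf("\n")) !== -1) {
        parseLine(buffer.slice(0, newlineIndex));
        buffer = buffer.slice(newlineIndex + 1);
      }
    },
    // Call once at the end to handle a last line without a trailing "\n".
    end() {
      parseLine(buffer);
      buffer = "";
    },
  };
}

// Usage: chunk boundaries may fall anywhere, even in the middle of a value.
const tweets = [];
const parser = createLineParser((tweet) => tweets.push(tweet));
parser.push('{"id": 1, "tweet": "fir');
parser.push('st"}\n{"id": 2, ');
parser.push('"tweet": "second"}\n');
parser.end();
console.log(tweets.length);
```

With a Node.js read stream, `push` would presumably be called from the `data` event and `end` from the `end` event, with the stream's encoding set to `utf8` so that chunks arrive as strings.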

exside

Replace } with },? Or is that too simple a solution?
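
A minimal sketch of that replacement, assuming each object sits on its own line and that a `}` directly before a line break only ever closes a top-level object (pretty-printed nested objects would break this). `wrapLinesAsArray` and the sample data are hypothetical, for illustration only.

```javascript
// Hypothetical helper: turns one-object-per-line text into the text of a JSON array.
function wrapLinesAsArray(text) {
  // trim() removes the final line break, so the last "}" gets no trailing comma.
  const body = text.trim().replace(/\}\r?\n/g, "},\n");
  return "[\n" + body + "\n]";
}

const raw = '{"id": 1, "tweet": "a"}\n{"id": 2, "tweet": "b"}\n';
const parsed = JSON.parse(wrapLinesAsArray(raw));
console.log(parsed.length);
```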

mafloh

This should give you an array of parsed JavaScript objects:

const results = data.split("\n").map(line => JSON.parse(line))
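
One thing to watch for with the one-liner above: if the file ends with a line break or contains blank lines, `split` produces empty strings, and `JSON.parse("")` throws `SyntaxError: Unexpected end of JSON input`. A slightly more defensive sketch, assuming one JSON object per non-empty line, skips those lines first. `parseJsonLines` and the sample data are hypothetical names used only for illustration.

```javascript
// Hypothetical helper: parses one-object-per-line text into an array of objects.
function parseJsonLines(text) {
  return text
    .split(/\r?\n/) // handles both Unix and Windows line endings
    .map((line) => line.trim())
    .filter((line) => line !== "") // drop blank lines, including the trailing one
    .map((line) => JSON.parse(line));
}

const sample = '{"id": 1}\n{"id": 2}\n\n';
const parsedLines = parseJsonLines(sample);
console.log(parsedLines.length);
```
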
Dainius Lukša
  • This looked like it would work, but it threw the error: undefined:1 SyntaxError: Unexpected end of JSON input at JSON.parse () at file:///C:/dir/project/test.mjs:26:61 at Array.map () at file:///C:/dir/project//test.mjs:26:44 at FSReqCallback.readFileAfterClose [as oncomplete] (node:internal/fs/read_file_context:68:3) – Mvvvxie May 04 '22 at 13:25