Questions tagged [jsonlines]

JSON Lines is a convenient format for storing structured data that may be processed one record at a time: each line of the file is a complete JSON value. It works well with Unix-style text processing tools and shell pipelines.

This text format is documented at http://jsonlines.org/.
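
As a rough illustration of the "one record at a time" idea, here is a minimal sketch using only Python's standard `json` module. The helper names `dump_jsonl` and `iter_jsonl` are hypothetical, chosen for this example, and are not part of any library; the in-memory buffer stands in for a real `.jsonl` file.

```python
import io
import json


def dump_jsonl(records, fp):
    """Write each record as one compact JSON value per line (hypothetical helper)."""
    for record in records:
        # json.dumps without indent never emits a raw newline, so each
        # record stays on exactly one line.
        fp.write(json.dumps(record, ensure_ascii=False) + "\n")


def iter_jsonl(fp):
    """Yield one parsed record per non-blank line (hypothetical helper)."""
    for line in fp:
        line = line.strip()
        if line:  # skip blank lines, e.g. a trailing newline at end of file
            yield json.loads(line)


records = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]

# An in-memory text buffer stands in for a file opened in text mode.
buffer = io.StringIO()
dump_jsonl(records, buffer)

buffer.seek(0)
loaded = list(iter_jsonl(buffer))
```

Because `iter_jsonl` is a generator, a large file can be streamed record by record without loading it all into memory, which a single top-level JSON array would not allow with `json.load`.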

156 questions
0
votes
1 answer

Determine from an input string if in json_newline format

I have a string of json data, though sometimes it is 'regular' json and sometimes the data is in json-lines format. Here is the current way I'm testing to see which format it is: json_newlines = all([_line.strip()[-1].endswith((']', '}')) for _line…
user10503628
0
votes
1 answer

SageMaker DeepAR forecasting file parsing error

I am trying out the DeepAR Forecasting training algorithm. My training job keeps failing with the following error while parsing the jsonlines file: row: 1) Failure reason ClientError: Error when parsing json (source:…
0
votes
1 answer

How do I read from a json-lines file into a Dataset with an immutable.HashMap?

I have the following classes, case class myClass (a: String, b: Boolean, c: Double, d: HashMap[String, E]) case class E (f: String, g: Int) the following code to load into this from a json file into a…
Exec
  • 407
  • 6
  • 18
0
votes
0 answers

normalizing json lines in pandas

I have a json line file, where each line has some structure which I am trying to (mostly) flatten, thus: with open("/home/igor/data/feed.jsonl") as json_file: thelist2 = [] for line in json_file: …
Igor Rivin
  • 4,632
  • 2
  • 23
  • 35
0
votes
1 answer

Keeping proper JSON structure when using JSONlines to scrape large amounts of data

Recently I've been having to scrape significantly larger amounts of data and changed from using the feed format 'json' to 'jsonlines' to avoid having it all scrambled and duplicated. The issue is that now none of my programs recognize the exported…
0
votes
1 answer

Struggling to reassemble jsonl from stream

I am trying to process jsonlines from an API and I am running into an issue where requests.iter_lines() is not timely. I have to now try to incorporate requests.iter_content(chunk_size=1024*1024). I am trying to work through the logic I would need…
0
votes
1 answer

How to write dataframe data into json file as json non array objects?

I have data in Pandas dataframe and I am able to write the dataframe data into JSON file by calling: df.to_json('filepath', orient='records') This writes data into json file as an array of JSON objects. [{"col 1":"a","col 2":"b"},{"col 1":"c","col…
A Baldino
  • 178
  • 1
  • 11
0
votes
1 answer

Json lines (.jsonl file) & SQL Server 2016

I’ve been going backwards and forwards over this but I'm stumped. I have a file that has multiple JSON lines in it across multiple objects. I've put two lines below. { "mental_health_act_legal_status": [ {"legal_status_classification":…
user1663003
  • 149
  • 1
  • 10
0
votes
1 answer

Matching JSONlines from listings into new JSON list

I am trying to match listings of products in a JSON lines format with products in another file also in JSON format. This is sometimes called Record Linkage, Entity Resolution, Reference Reconciliation, or just matching. The goal is to match product…
0
votes
1 answer

Process and query a large number of large files in JSON Lines format

Which technology would be best for importing a large number of large JSON Lines files (approx. 2 GB per file)? I am thinking about Solr. Once the data is imported, it will have to be queryable. Which technology would you suggest to import and…
DamianPawski
  • 375
  • 1
  • 10
0
votes
0 answers

Read in Java a text file containing multiple JSON objects with newline separators

I need some help to read in Java a large text file that contains 25000 JSON objects that are delimited by newline. Example: {"productName":"Latitude E2000","Make":"Dell"} {"productName":"Latitude E2500","Make":"Dell"}…
k.t.
  • 1
  • 1
0
votes
1 answer

define parent in elasticsearch-dsl-py

I'm trying to use Elasticsearch-dsl-py to index some data from a jsonl file with many fields. ignoring the less general parts, the code looks like this: es = Elasticsearch() for id,line in enumerate(open(jsonlfile)): jline = json.loads(line) …
Ori5678
  • 499
  • 2
  • 5
  • 15
0
votes
1 answer

Is it problematic to have a JSON Lines file that has mixed JSON structures?

I would like to know whether, if a JSON Lines file is structured like this: {"structure1":"some kind of file header"} {"structure2": [ {"key1":"aaa", "key2":"bbb"}, {"key1":"one", "key2":"two"}] {"structure2": [ {"key1":"xxx", "key2":"yyy"},…
rstruck
  • 1,174
  • 4
  • 17
  • 27
-1
votes
1 answer

cannot import name '_jsonl' from partially initialized module 'jsonl'

I am trying to install the python jsonl module on an Ubuntu system. Tried on two different computers and get the same exact error: pip install jsonl Collecting jsonl Downloading jsonl-1.6.tar.gz (9.0 kB) ERROR: Command errored out with exit…
SomebodySysop
  • 141
  • 1
  • 1
  • 4
-1
votes
1 answer

TypeError: list indices must be integers or slices, not str with a jsonl file

I have a jsonl file, which looks like this: with open('myfile.jsonl', 'r') as f: dicts = json.load(f) # it's a list of dictionaries [{'date': '2018-12-11', 'base_currency': 'EUR', 'target_currency': 'USD', 'exchange_rate': 1.1379}, …
Bluetail
  • 1,093
  • 2
  • 13
  • 27