Questions tagged [jsonlines]

JSON Lines is a convenient format for storing structured data that may be processed one record at a time. It works well with Unix-style text processing tools and shell pipelines.

This text format is documented at http://jsonlines.org/.
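
As a minimal sketch of what "one record at a time" means in practice, the Python snippet below writes a few records to a JSON Lines file and reads them back line by line. The file name `records.jsonl`, the sample records, and the helper names `write_jsonl` and `read_jsonl` are assumptions made for illustration; they are not part of the format.

```python
import json
import os
import tempfile


def write_jsonl(path, records):
    """Write each record as one JSON value per line."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


def read_jsonl(path):
    """Yield records one at a time without loading the whole file."""
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)


# Hypothetical file name and sample data, used only for this example.
path = os.path.join(tempfile.mkdtemp(), "records.jsonl")
write_jsonl(path, [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}])
records = list(read_jsonl(path))
```

Because `read_jsonl` is a generator, a caller can stop early or stream a very large file without holding every record in memory.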

156 questions
0
votes
1 answer

How to index JSON files in Azure Cognitive Search having more than 40 complex fields

I have 1 GB of JSON files which I am trying to index using Azure Cognitive Search. On the very last step, while creating an indexer, I am getting an error saying, "The request is invalid. Details : Invalid index: The index contains 54 complex…
Khan
0
votes
2 answers

Extract nested array from JSONL file

I am extracting extra fields from a JSONL file using json2csv.py (compiled using twarc), and am having trouble extracting some text fields that are held within an array. This is the array, and I want to be able to pull out just the hashtag…
0
votes
2 answers

gzip a list of nested dictionaries

I have a group of .jsonl.gz files. I can read them using the script: import json import gzip with gzip.open(filepath, "r") as read_file: # file path ends with .jsonl.gz try: # read gzip file which contains a list of json files…
0
votes
0 answers

GCP: is there any API exposed to create a JSONL file?

Please help with a GCP AutoML API that can be used in C# code, such that we can pass a GCP bucket location and the output received is JSONL (the same JSONL received when we import a document into a dataset using the Google console). Thanks
0
votes
1 answer

Speed up parsing of gzipped jsonlines files

I have about 5,000 .gzip files (~1MB each). Each of these files contains data in a jsonlines format. Here's what it looks like: {"category_id":39,"app_id":12731} {"category_id":45,"app_id":12713} {"category_id":6014,"app_id":13567} I want to parse…
Superbman
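
For the gzipped files described in the question above, one possible reading approach is to open each file in text mode with the standard `gzip` module and decode one line at a time. This is only a sketch, not a benchmarked speed-up; the helper name `iter_gzipped_jsonl` and the temporary file `sample.jsonl.gz` are assumptions, while the three sample records are the ones quoted in the question.

```python
import gzip
import json
import os
import tempfile


def iter_gzipped_jsonl(path):
    """Yield one decoded record per non-empty line of a gzipped JSON Lines file."""
    # "rt" opens the compressed stream as text, so lines arrive as str.
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)


# Records copied from the question excerpt.
sample = [
    {"category_id": 39, "app_id": 12731},
    {"category_id": 45, "app_id": 12713},
    {"category_id": 6014, "app_id": 13567},
]

# Hypothetical file, created here only so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "sample.jsonl.gz")
with gzip.open(path, "wt", encoding="utf-8") as f:
    for row in sample:
        f.write(json.dumps(row) + "\n")

rows = list(iter_gzipped_jsonl(path))
```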
0
votes
0 answers

Convert JSON lines file to R dataframe

I have a sample extract in the form of JSON lines that contains a single object and around 100 rows. There are about 800 items per row. Here is a sample of the data: Row 1 - {"Id":"User1","OwnerId":"OwnerID1","IsDeleted":false,"Name":"SampleName1",…
0
votes
4 answers

Extracting text from json file and saving into text file

import json file= open('webtext.txt','a+') with open('output-dataset_v1_webtext.test.jsonl') as json_file: data= json.load(json_file) for item in data: file.write(item) print(item) >>>…
Faisal
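
The snippet quoted in the question above calls `json.load` on a `.jsonl` file; when such a file holds more than one line, that call raises `json.JSONDecodeError`, because the file as a whole is not a single JSON document. A minimal sketch of the usual alternative is to decode each line separately. The helper name `extract_text`, the `"text"` key, and the sample file contents are assumptions for illustration; the real dataset's field names are not shown in the excerpt.

```python
import json
import os
import tempfile


def extract_text(jsonl_path, txt_path, key="text"):
    """Append the value stored under `key` in each record to a text file."""
    count = 0
    with open(jsonl_path, encoding="utf-8") as src, \
            open(txt_path, "a", encoding="utf-8") as dst:
        for line in src:
            if not line.strip():
                continue  # ignore blank lines
            record = json.loads(line)  # one JSON document per line
            dst.write(str(record[key]) + "\n")
            count += 1
    return count


# Hypothetical sample data; the "text" key is an assumption.
workdir = tempfile.mkdtemp()
jsonl_path = os.path.join(workdir, "sample.jsonl")
txt_path = os.path.join(workdir, "sample.txt")
with open(jsonl_path, "w", encoding="utf-8") as f:
    f.write('{"id": 1, "text": "first"}\n{"id": 2, "text": "second"}\n')

written = extract_text(jsonl_path, txt_path)
with open(txt_path, encoding="utf-8") as f:
    lines = f.read().splitlines()
```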
0
votes
1 answer

Get all values of a specific key based on another key specific value

I have a jsonlines format file with more than 1 million lines (let's say BIG.json). I want to filter this file based on some key/value dependencies (explained below). All lines are, of course, structured in the same way; here are 5 consecutive lines of…
0
votes
2 answers

Write filtered json values to csv

I am looping through a JSON Lines file where I am just filtering for sender id and status and outputting this to the terminal. There are multiple_sender id which are within a list, whilst the sender id is just a string. I want to be able to write the…
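
One way to approach the task described above is to pick the wanted keys from each line and hand them to `csv.DictWriter`. The sketch below assumes the keys are named `sender_id` and `status`; the helper name `jsonl_to_csv`, the sample records, and the file names are hypothetical, and the question's exact filtering rules are not visible in the excerpt.

```python
import csv
import json
import os
import tempfile

FIELDS = ["sender_id", "status"]  # assumed key names


def jsonl_to_csv(jsonl_path, csv_path, fields=FIELDS):
    """Copy the selected keys of every JSON Lines record into a CSV file."""
    with open(jsonl_path, encoding="utf-8") as src, \
            open(csv_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=fields)
        writer.writeheader()
        for line in src:
            if not line.strip():
                continue
            record = json.loads(line)
            # .get() leaves a blank cell when a key is missing from a record.
            writer.writerow({name: record.get(name, "") for name in fields})


# Hypothetical sample input, made up for this example.
workdir = tempfile.mkdtemp()
jsonl_path = os.path.join(workdir, "messages.jsonl")
csv_path = os.path.join(workdir, "messages.csv")
with open(jsonl_path, "w", encoding="utf-8") as f:
    f.write('{"sender_id": "a1", "status": "sent", "extra": 1}\n')
    f.write('{"status": "failed"}\n')

jsonl_to_csv(jsonl_path, csv_path)
with open(csv_path, newline="", encoding="utf-8") as f:
    rows_out = list(csv.DictReader(f))
```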
0
votes
2 answers

Jsonlines file resulting in KeyError Python

I have a JSON file which I am loading in order to filter on a certain key called "sender_id". I can filter on any other key, but filtering for "sender_id" results in a KeyError: 'sender_id'. My Python script is as…
0
votes
0 answers

Writing Numpy Arrays into a jsonlines file

I want to save numpy arrays to a jsonlines file. Using the code below: import jsonlines with jsonlines.open('output.jsonl', mode='w') as writer: writer.write({'a':np.array([2,3])}) with jsonlines.open('output.jsonl', mode='a') as writer: …
pouria babvey
0
votes
1 answer

JsonLinesItemExporter outputs an array in each field

I'm using JsonLinesItemExporter to export some data and instead of {"name": "Color TV", "price": "1200"} {"name": "DVD player", "price": "200"} scrapy is writing the following to file: {"name": ["Color TV"], "price": ["1200"]} {"name": ["DVD…
Fernando César
0
votes
1 answer

How to parse jsonlines file using pandas

I am new to Python and trying to parse data from a file that contains millions of lines. I tried to go old school and parse it using Excel, but it fails. How can I parse the information efficiently and export it into an Excel file so that it is easier…
Erene
0
votes
2 answers

reshape jq nested file and make csv

I've been struggling for the whole day with this one, which I want to turn into a CSV. It represents the officers attached to the company whose number is "OC418979" in the UK Companies House API. I've already truncated the JSON to contain just 2 objects…
Tytire Recubans
0
votes
1 answer

Python: How to write jsonline without overwriting?

I have a piece of code that processes thousands of files in a directory. For each file, it generates an object (dictionary) with part of its key-value pairs as: { ........ 'result': [...a very long list...] } if I process all the files, save result…
Jie Hu
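
For the question above, a common way to avoid overwriting is to open the output file in append mode (`"a"`) and write one record per processed file, so results never need to be held in memory together. The sketch below is written under assumptions: the helper name `append_record`, the file name `results.jsonl`, and the sample records are hypothetical.

```python
import json
import os
import tempfile


def append_record(path, record):
    """Append one record as a single line; existing lines are left untouched."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


# Hypothetical output file and records, used only for this example.
path = os.path.join(tempfile.mkdtemp(), "results.jsonl")
append_record(path, {"file": "a.txt", "result": [1, 2, 3]})
append_record(path, {"file": "b.txt", "result": [4, 5]})

with open(path, encoding="utf-8") as f:
    stored = [json.loads(line) for line in f]
```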