119

Here's my code, really simple stuff...

import csv
import json

csvfile = open('file.csv', 'r')
jsonfile = open('file.json', 'w')

fieldnames = ("FirstName","LastName","IDNumber","Message")
reader = csv.DictReader( csvfile, fieldnames)
out = json.dumps( [ row for row in reader ] )
jsonfile.write(out)

Declare some field names; the reader uses the csv module to read the file, and the field names to dump the file to a JSON format. Here's the problem...

Each record in the CSV file is on a different row. I want the JSON output to be the same way. The problem is it dumps it all on one giant, long line.

I've tried using something like `for line in csvfile:` and then running my code below that with `reader = csv.DictReader(line, fieldnames)`, which loops through each line, but it dumps the entire file on one line, then loops through the entire file again on another line, and continues until it runs out of lines.

Any suggestions for correcting this?

Edit: To clarify, currently I have: (every record on line 1)

[{"FirstName":"John","LastName":"Doe","IDNumber":"123","Message":"None"},{"FirstName":"George","LastName":"Washington","IDNumber":"001","Message":"Something"}]

What I'm looking for: (2 records on 2 lines)

{"FirstName":"John","LastName":"Doe","IDNumber":"123","Message":"None"}
{"FirstName":"George","LastName":"Washington","IDNumber":"001","Message":"Something"}

Not each individual field indented/on a separate line, but each record on its own line.

Some sample input.

"John","Doe","001","Message1"
"George","Washington","002","Message2"
BeanBagKing
  • I'm not sure your code does *exactly* what you say; it should produce `[{..row..},{..row..},...]`, not `{..row..}{..row..}..`. That is to say, the output looks like it will be a JSON array of JSON objects, not a stream of unconnected JSON objects. – SingleNegationElimination Oct 31 '13 at 12:42

12 Answers

174

The problem with your desired output is that it is not a valid JSON document; it's a stream of JSON documents!

That's okay if it's what you need, but it means that for each document you want in your output, you'll have to call json.dump.

Since the newline you want separating your documents is not contained in those documents, you're on the hook for supplying it yourself. So we just need to pull the loop out of the call to json.dump and write a newline after each document.

import csv
import json

csvfile = open('file.csv', 'r')
jsonfile = open('file.json', 'w')

fieldnames = ("FirstName","LastName","IDNumber","Message")
reader = csv.DictReader( csvfile, fieldnames)
for row in reader:
    json.dump(row, jsonfile)
    jsonfile.write('\n')
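Since the result is a stream of JSON documents (one per line, often called JSON Lines) rather than a single array, reading it back means parsing line by line. A minimal sketch, with inline data standing in for the `file.json` written above:

```python
import json

# Inline stand-in for the file.json produced above (one JSON document per line)
ndjson = (
    '{"FirstName": "John", "LastName": "Doe", "IDNumber": "123", "Message": "None"}\n'
    '{"FirstName": "George", "LastName": "Washington", "IDNumber": "001", "Message": "Something"}\n'
)

# Parse each non-empty line as its own JSON document
records = [json.loads(line) for line in ndjson.splitlines() if line.strip()]
print(records[0]["FirstName"])  # John
print(len(records))             # 2
```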
SingleNegationElimination
  • Perfect! Sorry you had to do a bit of mind reading to get it, and thanks for the corrections/clarifications. This is exactly what I was looking for. – BeanBagKing Oct 31 '13 at 13:03
  • But the problem is the outfile is not valid JSON – MONTYHS Mar 07 '14 at 14:48
  • @MONTYHS: The first sentence of this answer explains that the outfile is not a JSON document, and what it is instead. Are you having a different problem from the person who asked this question? – SingleNegationElimination Mar 07 '14 at 16:44
  • @SingleNegationElimination But the output `json` file also includes JSON of the headers from the `csv` if they are present. What can be done to remove it? – abhi1610 Aug 09 '16 at 10:55
  • @abhi1610: if you're expecting a header in the input, you should construct the `DictReader` without giving a `fieldnames` argument; it will then read the first line to get the fieldnames from the file. – SingleNegationElimination Aug 09 '16 at 14:44
  • And it is good to add encoding for your files: `csvfile = open('file.csv', 'r', encoding='utf-8')` and `jsonfile = open('file.json', 'w', encoding='utf-8')` – Marek Bernád Oct 11 '18 at 10:15
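The `DictReader` behavior discussed in the comments can be seen side by side; a small sketch with inline data standing in for a real file:

```python
import csv
import io

data = "FirstName,LastName\nJohn,Doe\n"

# Without fieldnames, DictReader takes the first row as the header
with_header = list(csv.DictReader(io.StringIO(data)))

# With fieldnames, the first row is treated as data, so the header
# line shows up as a record of its own
as_data = list(csv.DictReader(io.StringIO(data), fieldnames=("FirstName", "LastName")))

print(len(with_header))  # 1
print(len(as_data))      # 2
```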
40

You can use a pandas DataFrame to achieve this, as in the following example:

import pandas as pd

df = pd.read_csv("path/to/file.csv", sep=",", header=0, index_col=False)
df.to_json("/path/to/new/file.json", orient="records", date_format="epoch", double_precision=10, force_ascii=True, date_unit="ms", default_handler=None)
Naufal
16
import csv
import json

file = 'csv_file_name.csv'
json_file = 'output_file_name.json'

# Read CSV file
def read_CSV(file, json_file):
    csv_rows = []
    with open(file) as csvfile:
        reader = csv.DictReader(csvfile)
        field = reader.fieldnames
        for row in reader:
            csv_rows.extend([{field[i]: row[field[i]] for i in range(len(field))}])
        convert_write_json(csv_rows, json_file)

# Convert CSV data into JSON
def convert_write_json(data, json_file):
    with open(json_file, "w") as f:
        # Write the data once, pretty-printed; writing it a second time
        # would produce an invalid JSON file
        f.write(json.dumps(data, sort_keys=False, indent=4, separators=(',', ': ')))


read_CSV(file, json_file)

Documentation of json.dumps()

Laxman
12

I took @SingleNegationElimination's response and simplified it into a three-liner that can be used in a pipeline:

import csv
import json
import sys

for row in csv.DictReader(sys.stdin):
    json.dump(row, sys.stdout)
    sys.stdout.write('\n')
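Saved to a file (the name `csv2json.py` here is just illustrative), the filter drops straight into a shell pipeline:

```shell
# Write the three-line filter to a file (hypothetical name) ...
cat > csv2json.py <<'EOF'
import csv
import json
import sys

for row in csv.DictReader(sys.stdin):
    json.dump(row, sys.stdout)
    sys.stdout.write('\n')
EOF

# ... and pipe a CSV with a header row through it
printf 'FirstName,LastName\nJohn,Doe\n' | python3 csv2json.py
```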
Lawrence I. Siden
  • Does anyone remember awful perl one-liners? `python -c 'import json;import csv;out=open("jsonfilename","w");c=csv.DictReader(open("csvfilename"));[(json.dump(x,out),out.write("\n")) for x in c]'` – John Jul 28 '21 at 22:13
7

You can try this, using the third-party `csvmapper` module:

import csvmapper

# how does the object look
mapper = csvmapper.DictMapper([ 
  [ 
     { 'name' : 'FirstName'},
     { 'name' : 'LastName' },
     { 'name' : 'IDNumber', 'type':'int' },
     { 'name' : 'Messages' }
  ]
 ])

# parser instance
parser = csvmapper.CSVParser('sample.csv', mapper)
# conversion service
converter = csvmapper.JSONConverter(parser)

print(converter.doConvert(pretty=True))

Edit:

Simpler approach

import csvmapper

fields = ('FirstName', 'LastName', 'IDNumber', 'Messages')
parser = csvmapper.CSVParser('sample.csv', csvmapper.FieldMapper(fields))

converter = csvmapper.JSONConverter(parser)

print(converter.doConvert(pretty=True))
Snork S
    I think you should have, at least, explicitly mention that you're using a third-party module, `csvmapper`, to do this (and maybe where to get it) as opposed to something built-in. – martineau Jun 30 '18 at 16:54
4

I see this is old, but I needed the code from SingleNegationElimination; however, I had an issue with the data containing non-UTF-8 characters. These appeared in fields I was not overly concerned with, so I chose to ignore them. That took some effort, though. I am new to Python, so with some trial and error I got it to work. The code is a copy of SingleNegationElimination's with extra handling of UTF-8. I tried to do it with https://docs.python.org/2.7/library/csv.html but in the end gave up. The code below (Python 2) worked.

import csv, json

csvfile = open('file.csv', 'r')
jsonfile = open('file.json', 'w')

fieldnames = ("Scope","Comment","OOS Code","In RMF","Code","Status","Name","Sub Code","CAT","LOB","Description","Owner","Manager","Platform Owner")
reader = csv.DictReader(csvfile , fieldnames)

code = ''
for row in reader:
    try:
        print('+' + row['Code'])
        for key in row:
            row[key] = row[key].decode('utf-8', 'ignore').encode('utf-8')      
        json.dump(row, jsonfile)
        jsonfile.write('\n')
    except:
        print('-' + row['Code'])
        raise
Mark Channing
3

Add the indent parameter to json.dumps

 data = {'this': ['has', 'some', 'things'],
         'in': {'it': 'with', 'some': 'more'}}
 print(json.dumps(data, indent=4))

Also note that you can simply use json.dump with the open jsonfile:

json.dump(data, jsonfile)
Wayne Werner
  • Not quite what I'm looking for. I edited my original question to clarify and show the desired output. Thank you for the tip though, this may come in handy later. – BeanBagKing Oct 31 '13 at 12:43
3

Use pandas and the json library:

import pandas as pd
import json
filepath = "inputfile.csv"
output_path = "outputfile.json"

df = pd.read_csv(filepath)

# Create a multiline json
json_list = json.loads(df.to_json(orient = "records"))

with open(output_path, 'w') as f:
    for item in json_list:
        # Use json.dumps rather than str(), so each line is valid JSON
        # (double quotes) instead of a Python dict repr
        f.write(json.dumps(item) + "\n")
Galuoises
2

How about using pandas to read the csv file into a DataFrame (`pd.read_csv`), then manipulating the columns if you want (dropping them or updating values), and finally converting the DataFrame back to JSON (`pd.DataFrame.to_json`)?

Note: I haven't checked how efficient this will be but this is definitely one of the easiest ways to manipulate and convert a large csv to json.
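A minimal sketch of that idea, with `io.StringIO` and the asker's sample rows standing in for a real file; `lines=True` makes `to_json` emit one record per line, matching the output the asker wanted:

```python
import io
import json
import pandas as pd

# Inline stand-in for the CSV file (no header row; dtype=str keeps "001" intact)
csv_text = '"John","Doe","001","Message1"\n"George","Washington","002","Message2"\n'
df = pd.read_csv(
    io.StringIO(csv_text),
    names=["FirstName", "LastName", "IDNumber", "Message"],
    dtype=str,
)

# orient="records" with lines=True writes one JSON object per line
out = df.to_json(orient="records", lines=True)
print(out)
```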

impiyush
0

As a slight improvement to @MONTYHS's answer, iterating through a tuple of fieldnames:

import csv
import json

csvfilename = 'filename.csv'
jsonfilename = csvfilename.split('.')[0] + '.json'
csvfile = open(csvfilename, 'r')
jsonfile = open(jsonfilename, 'w')

fieldnames = ('FirstName', 'LastName', 'IDNumber', 'Message')
# Pass the fieldnames explicitly, since the sample input has no header row
reader = csv.DictReader(csvfile, fieldnames)

output = []

for each in reader:
  row = {}
  for field in fieldnames:
    row[field] = each[field]
  output.append(row)  # append inside the loop, so every record is kept

json.dump(output, jsonfile, indent=2, sort_keys=True)
0
import csv
import json

def read():
    noOfElem = 200  # no of data you want to import
    csv_file_name = "hashtag_donaldtrump.csv"  # csv file name
    json_file_name = "hashtag_donaldtrump.json"  # json file name

    with open(csv_file_name, mode='r') as csv_file:
        csv_reader = csv.DictReader(csv_file)
        with open(json_file_name, 'w') as json_file:
            json_file.write("[")

            for i, row in enumerate(csv_reader):
                if i >= noOfElem:
                    break
                if i > 0:
                    json_file.write(",")
                json_file.write(json.dumps(row))

            # Close the array even when the file has fewer than noOfElem rows
            json_file.write("]")

Change the above three parameters, and everything will be done.

Himal
-2
import csv
import json
csvfile = csv.DictReader(open('filename.csv', 'r'))
output =[]
for each in csvfile:
    row ={}
    row['FirstName'] = each['FirstName']
    row['LastName']  = each['LastName']
    row['IDNumber']  = each['IDNumber']
    row['Message']   = each['Message']
    output.append(row)
json.dump(output,open('filename.json','w'),indent=4,sort_keys=False)
MONTYHS
  • When I try to use this I get "KeyError: 'FirstName'". It doesn't seem like the key is being added. I'm not sure exactly what you're trying to do here, but I don't think the output matches what I'm looking for since you use the same indent=4 as Wayne. What output should I expect? I edited my original post to clarify what I'm looking for. – BeanBagKing Oct 31 '13 at 12:41
  • The key error is most likely because this code does not pass a headers argument to `DictReader`, so it's guessing the field names from the first line of the input file: John, Doe, 5, "None" instead of "FirstName, lastname," and so on... – SingleNegationElimination Oct 31 '13 at 12:45
  • Better option, this one actually parses the CSV for the desired fields (not just in order, as in the marked answer) – GarciadelCastillo Mar 05 '14 at 19:41
  • I get an error saying `TypeError: expected string or buffer` – CodyBugstein Jun 25 '15 at 15:41