
How do I convert a large CSV into JSON arrays of a fixed record count (JSON arrays of 100 records each) through a shell script or the command line?

Example of the input CSV file:

identifier,type,locale
91617676848,MSISDN,es_ES
91652560975,MSISDN,es_ES
91636563675,MSISDN,es_ES

Expected output:

1.json (a JSON array holding 100 records)
  [
  {
    "identifier": "91617676848",
    "type": "MSISDN",
    "locale": "es_ES"
  },
  .
  .
  .
  .
  {
    "identifier": "91652560975",
    "type": "MSISDN",
    "locale": "es_ES"
  }
  ]


  2.json (a JSON array holding 100 records)
  [
  {
    "identifier": "91636563675",
    "type": "MSISDN",
    "locale": "es_ES"
  },
  .
  .
  .
  .

  {
    "identifier": "91636563999",
    "type": "MSISDN",
    "locale": "es_ES"
  }
  ]
Mayur Kadam

3 Answers


I created a simple PHP script (I called it converter.php).

You can call it as is: php converter.php test.csv, where test.csv contains the CSV data with the first line as the header.

<?php
    // Open the file passed as parameter
    // Ex: php converter.php test.csv
    // where test.csv contains the data posted in the question
    if (($handle = fopen($argv[1], 'r')) !== false) {
        $count = 0;
        $lines = [];
        while (($data = fgetcsv($handle, 0, ',', '\'')) !== false) {
            if ($count == 0) {
                // First line holds the column names
                $headers = $data;
            } else {
                // Map each data row onto the header names
                $lines[] = array_combine($headers, $data);
            }
            $count++;
        }
        fclose($handle);
        // Split into arrays of 100 elements each.
        // On test I used 2 as the second parameter of array_chunk
        // to try it with your toy data.
        $groups = array_chunk($lines, 100);
        foreach ($groups as $key => $group) {
            file_put_contents('json_data-' . $key . '.json', json_encode($group));
        }
    }

I ran it locally with a chunk size of two elements to test it, and it produced two files saved locally, named json_data-<key>.json.
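
A minimal run sketch (the file name test.csv and the lowered chunk size of 2 are just the test setup described above, not part of the script itself):

    # run the converter on the sample CSV, then list the generated chunk files
    php converter.php test.csv
    ls json_data-*.json
    # json_data-0.json  json_data-1.json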

And the results are here:

  • json_data-0.json:

    [ {"identifier":"91617676848","type":"MSISDN","locale":"es_ES"},{"identifier":"91652560975","type":"MSISDN","locale":"es_ES"} ]

  • json_data-1.json:

    [ {"identifier":"91636563675","type":"MSISDN","locale":"es_ES"} ]

William Prigol Lopes

Would you please try the following awk solution:

awk -v bs=10 '
    # remember the column names from the header line
    NR == 1 {
        cols = split($0, header, ",")
        next
    }
    {
        # start a new output file every bs records
        if ((NR - 1) % bs == 1) {
            file = sprintf("%d.json", ++n)
            print "[\n  {" > file
        } else {
            print ",\n  {" >> file
        }
        split($0, a, ",")
        # emit one "name": "value" pair per column
        for (i = 1; i <= cols; i++) {
            printf("    \"%s\": \"%s\"", header[i], a[i]) >> file
            print (i < cols) ? "," : "" >> file
        }
        printf "%s", "  }" >> file
    }
    # close the JSON array once the file holds bs records
    (NR - 1) % bs == 0 {
        print "\n]" >> file
        close(file)
    }
    END {
        # close the last file if it holds fewer than bs records
        if ((NR - 1) % bs != 0) print "\n]" >> file
    }
' input.csv

The variable bs holds the number of records per output file; it is set to 10 above, so change it to 100 to match the question.
The script processes the input file line by line and needs quite a few conditional branches to produce proper JSON files. Sigh.
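
A quick sanity check after running the one-liner with -v bs=100 might look like this (jq is not part of the answer; it is only assumed here to be installed):

    # after running the awk program above with -v bs=100
    ls *.json          # 1.json  2.json  ...
    jq length 1.json   # prints the record count; 100 for every file except possibly the last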

tshiono

Using a bash implementation, the task can be completed by repeatedly slicing line ranges from the file (2-101, 102-201, ...) until the end of the file is reached. The code below uses sed to extract the lines and csvjson to format each block as JSON.

You can substitute your favorite tools (there are a few CSV-to-JSON alternatives).

The code is slightly more verbose than needed.

#!/bin/bash
csv=$1
lines=$(wc -l < "$csv")
# number of data rows is (lines - 1); round up to whole 100-record blocks
blocks=$(( (lines + 98) / 100 ))
for (( i = 1; i <= blocks; i++ )); do
    # line 1 is the header; lines (i*100-98)..(i*100+1) are the block's 100 records
    sed -ne "1p;$((i*100-98)),$((i*100+1))p" "$csv" | csvjson -i 2 > "$i.json"
done

Assuming a reasonable file size, reprocessing the input file will not incur much overhead.
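
A minimal usage sketch, assuming the script above is saved as csv2json.sh (a hypothetical name) and csvkit's csvjson is on the PATH:

    chmod +x csv2json.sh
    ./csv2json.sh input.csv   # writes 1.json, 2.json, ... to the current directory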

dash-o