37

While working with Elasticsearch on AWS EC2, I hit an issue with bulk indexing. The ES _bulk endpoint requires newline-delimited JSON, i.e., each JSON object serialized on a single line and terminated by \n, whereas what I have built using various web APIs and file pre-processing is pretty-printed JSON, i.e., easily human-readable.

Is there a simple shell script method to collapse all the pretty-printed JSON into single-line strings, without loading up some Java libraries or whatever? I can add tokens to the basic file during pre-processing to tag the desired \n breaks if that helps parsing, but if anyone has a tip on the toolset I would be grateful. I have a feeling there are scripts out there, and I know there are some libraries, but so far I have not found any simple command-line tools to do the un-pretty printing.

GreenGiant
sidgeeder

4 Answers

63

You can try the great jq tool for parsing JSON in the shell. To de-pretty print with jq, you can use either method below:

cat pretty-printed.json | jq -c .
jq -c . pretty-printed.json

The -c (or --compact-output) option tells jq not to pretty-print; pretty-printing is its default behavior. The "." filter tells it to return the JSON content as is, unmodified other than the reformatting. The result is written to stdout, so you can redirect it to a file or pipe it to something else.
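
As a rough sketch of how this fits the bulk-indexing case: jq reads a stream of JSON values, and with -c it writes each one on its own line, so a file holding several pretty-printed documents back to back comes out as one document per line. The function name compact_json and the file names below are made up for illustration, and jq is assumed to be installed.

```shell
# compact_json is a hypothetical helper name, not part of jq.
# Reads one or more JSON documents from the given file(s), or from stdin
# when no file is given, and writes each document on a single line
# terminated by a newline.
compact_json() {
    jq -c . "$@"
}

# Example usage (file names are placeholders):
#   compact_json pretty-printed.json > compact.ndjson
```

Because jq parses the JSON instead of editing text, whitespace inside string values is left alone.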

P.S. I was looking to address the same problem and came to this option.

David
4

The answer from D_S_toowhite was not a direct answer, but it set me thinking in the right way, i.e., the problem is to remove all the whitespace. I found a very simple way to remove all whitespace using the command-line tool tr:

tr -d '[:space:]' < inputfile

The [:space:] character class matches all whitespace: spaces, tabs, newlines, vertical tabs, etc. So a pretty JSON input like this:

{
    "version" : "4.0",
    "success" : true,
    "result" :
    {
            "Focus" : 0.000590008,
            "Arc" : 12
    }
}

becomes this JSON serial string:

{"version":"4.0","success":true,"result":{"Focus":0.000590008,"Arc":12}}

I still have to add the \n terminator, but I think that is trivial now, at least in my special case: just append it after the closing bracket pair using sed.
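
For a file that holds exactly one JSON document, the terminator can also be added without sed. A minimal sketch under that assumption; the function name strip_ws_line and the file names are hypothetical:

```shell
# strip_ws_line is a hypothetical helper name.
# Assumes stdin holds exactly one JSON document and that no string value
# contains whitespace that matters, because tr deletes that as well.
strip_ws_line() {
    tr -d '[:space:]'   # remove every whitespace character, newlines included
    printf '\n'         # add back a single terminating newline
}

# Example usage (file names are placeholders):
#   strip_ws_line < pretty.json > one-line.json
```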

Alexander Farber
sidgeeder
  • 5
    The problem with this approach is that if the data contains any significant whitespace (e.g., one of the values is a string that contains a sentence), that whitespace gets stripped as well. – benkc Oct 02 '18 at 19:07
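
To make that concern concrete, here is a small demonstration with a made-up sample document (not taken from the question's data):

```shell
# The sample document is invented for illustration.
doc='{ "title" : "hello big world" }'

# Delete every whitespace character, as in the answer above.
stripped=$(printf '%s\n' "$doc" | tr -d '[:space:]')

printf '%s\n' "$stripped"
# prints: {"title":"hellobigworld"}
```

The structural whitespace is gone, but so are the spaces inside the string value.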
1

You can try find/replace using regexp:

  1. Find "^\s{2,}" and replace with ""
  2. Find "\n" and replace with ""

See this: https://github.com/dzhibas/SublimePrettyJson/issues/17
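
The two replacements above can be approximated on the command line. This is only a sketch, assuming a POSIX sed and tr and one JSON document on stdin; the function name unindent_join is hypothetical. It removes indentation and line breaks only, so spaces around colons and inside string values are kept.

```shell
# unindent_join is a hypothetical helper name.
# Assumes stdin holds exactly one pretty-printed JSON document.
unindent_join() {
    sed 's/^[[:space:]]\{2,\}//' |  # step 1: drop leading runs of 2+ whitespace characters
    tr -d '\n'                      # step 2: delete every newline
    printf '\n'                     # terminate the single resulting line
}

# Example usage (file names are placeholders):
#   unindent_join < pretty.json > one-line.json
```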

D_S_toowhite
1

jsonlint is easy to get up and running in the command line with the help of npm, and a simple way to print out 'no fluff' JSON is to give it an indentation character of "".

jsonlint -t ""

As a bonus for command line users, I use this all the time to take paste buffers (on a Mac) and convert them into something else, for instance:

Swap buffer contents for a JSON linted 'compressed' format:

pbpaste | jsonlint -t "" | pbcopy

Swap buffer contents for a pretty printed JSON linted format:

pbpaste | jsonlint | pbcopy

You could also pipe file contents to an ugly (and JSON linted) version of the file:

cat data-pretty.json | jsonlint -t "" > data-ugly.json
ryanm