
How do I convert these lists of text strings into JSON?

Text strings:

start filelist:
/download/2017/download_2017.sh
/download/2017/log_download_2017.json
/download/2017/log_download_2017.txt
start wget:
2017-05-15 20:42:00 URL:http://web.site.com/downloads/2017/file_1.zip [1024/1024] -> "file_1.zip" [1]
2017-05-15 20:43:21 URL:http://web.site.com/downloads/2017/file_2.zip [2048/2048] -> "file_2.zip" [1]

Desired JSON output:

{
  "start filelist": [
    "download_2017.sh",
    "log_download_2017.txt",
    "log_download_2017.json"
  ]
}
{
  "start wget": [
    "2017-05-15 20:42:00 URL:http://web.site.com/downloads/2017/file_1.zip [1024/1024] -> \"file_1.zip\" [1]",
    "2017-05-15 20:43:21 URL:http://web.site.com/downloads/2017/file_2.zip [2048/2048] -> \"file_2.zip\" [1]"
  ]
}

I'd appreciate any options and approaches.

  • Starting one copy of `tee` per line of your script is **crazy** inefficient, and moreover, it means that no single program can be responsible for generating a single, consistent, syntactically-valid JSON document. – Charles Duffy May 16 '17 at 01:26
  • @CharlesDuffy thank you, this is good to know. Historically, for relatively simple logging operations where log overhead was not a big consideration, I would just redirect the output to a txt log file. Here, I would now like to redirect the output to two log files: txt and json. Possibly with the next iteration involving a third log file: xml. Could you please explain what the inefficiency is? Is the overhead of tee so high that it should be avoided? What would be alternative approaches? – Gabe May 16 '17 at 01:30
    It costs literally hundreds or thousands of times the performance cost of an `echo` to set up a pipeline running external commands. Every pipeline consists of `mkfifo()`s, `fork()`s, and -- if external commands are being run -- `exec()`s. Moreover, any time you run `>>file`, that command opens the file for output before it starts, and flushes and closes it when it ends -- much more expensive than just opening the file once and leaving it open for multiple command executions. – Charles Duffy May 16 '17 at 02:53
  • BTW, would you consider splitting the follow-on question about how to stream the output from multiple commands into `jq` into a separate question? If the answer by @peak adequately addresses the core of the issue, then that should be accepted; and the content outside its scope should have somewhere else to be addressed. – Charles Duffy May 16 '17 at 03:30
  • Got it. Thanks again, pretty sure @peak addressed the core issue of the question and you addressed the implementation of the shell script to support this. Just testing now will confirm soon. – Gabe May 16 '17 at 03:36
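The redirection overhead described in the comments above can be sketched in plain shell: a command group lets the shell open the log file once for the whole block, instead of once per command (the file names here are illustrative, not from the question):

```shell
#!/bin/sh
# Per-command redirection: the shell opens, appends to, and
# closes slow.log once for every single echo.
rm -f slow.log fast.log
echo "line 1" >> slow.log
echo "line 2" >> slow.log

# Grouped redirection: fast.log is opened once for the whole
# block, and no extra processes (tee, pipelines) are spawned.
{
  echo "line 1"
  echo "line 2"
} >> fast.log
```

Both logs end up byte-identical; the difference is only in how many times the file is opened and closed, which is what makes a per-line `tee` pipeline comparatively expensive.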

1 Answer


Here's a jq-only solution, which produces valid JSON along the lines of your example:

foreach (inputs,null) as $line ({};
   if $line == null then .emit = {(.key): .value}
   elif $line[-1:] == ":"
   then (if .key then {emit: {(.key): .value}} else null end)
        + { key : $line[0:-1] }
   else {key, value: (.value + [$line])}
   end;
   .emit // empty )

Invocation:

jq -n -R -f program.jq input.txt

Please note the `-n` option in particular.
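As a hedged end-to-end sketch of that invocation (the file names `program.jq`, `input.txt`, and `output.json` are illustrative, and the `jq` step is guarded in case jq is not installed):

```shell
#!/bin/sh
# Save the filter from the answer as program.jq.
cat > program.jq <<'EOF'
foreach (inputs,null) as $line ({};
   if $line == null then .emit = {(.key): .value}
   elif $line[-1:] == ":"
   then (if .key then {emit: {(.key): .value}} else null end)
        + { key : $line[0:-1] }
   else {key, value: (.value + [$line])}
   end;
   .emit // empty )
EOF

# A shortened version of the sample input from the question.
cat > input.txt <<'EOF'
start filelist:
/download/2017/download_2017.sh
start wget:
2017-05-15 20:42:00 URL:http://web.site.com/downloads/2017/file_1.zip [1024/1024] -> "file_1.zip" [1]
EOF

# -n stops jq from consuming the first line before `inputs` runs;
# -R reads each input line as a raw string instead of JSON.
if command -v jq >/dev/null 2>&1; then
  jq -n -R -f program.jq input.txt > output.json
fi
```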

Caveats

If the input does not begin with a "key" line, then the above jq program will report an error and terminate. If more fault-tolerance is required, then the following variant might be of interest:

foreach (inputs,null) as $line ({};
   if $line == null then .emit = {(.key|tostring): .value}
   elif $line[-1:] == ":"
   then (if .key then {emit: {(.key): .value}} else null end)
        + { key : $line[0:-1] }
   else {key, value: (.value + [$line])}
   end;
   .emit // empty )
peak
  • You might consider a shebang (i.e. `#!/usr/bin/env jq -nRf`), to let this just be run as `./program input.txt` or `./program – Charles Duffy May 15 '17 at 22:21
  • Thank you. This is an elegant solution. Using the preceding write to the text file enables the loop to process the text file for the write to JSON. It works well for the first array 'start filelist': `ls -1 "$(pwd)"/* | tee -a $logfilename.txt | jq -n -R -f json_log_array.jq $logfilename.txt >> $logfilename.json`; however, it seems to be having a problem with the second array 'start wget': `wget -nv http://web.site.com/downloads/2017/file_1.zip 2>&1 | tee -a $logfilename.txt | jq -n -R -f json_log_array.jq $logfilename.txt >> $logfilename.json`, and I am receiving some strange output in the JSON file. – Gabe May 16 '17 at 01:17
  • @Gabe, don't do multiple separate redirections. Pipe all your input commands into a *single* instance of the script given in this answer. – Charles Duffy May 16 '17 at 01:18
  • @CharlesDuffy thank you, got it now. It took a little research and reviewing, but it is making sense now. Your suggestion worked nicely in combination with the program.jq – Gabe May 16 '17 at 04:39
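Putting the comments together, the single-pipeline shape described above might look like the sketch below (the section generators and file names are illustrative; the `jq` stage from the answer would be appended where noted):

```shell
#!/bin/sh
# One compound command produces every section, so a single tee
# writes the text log and one downstream process can see the
# whole stream as a single, consistent input.
log=log_download_2017
{
  echo "start filelist:"
  printf '%s\n' download_2017.sh log_download_2017.txt
  echo "start wget:"
  echo '2017-05-15 20:42:00 URL:http://web.site.com/downloads/2017/file_1.zip [1024/1024] -> "file_1.zip" [1]'
} | tee "$log.txt" > /dev/null
# With jq available, extend the pipeline rather than re-reading the log:
#   { ... } | tee -a "$log.txt" | jq -n -R -f program.jq > "$log.json"
```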