
I'd like to parse this JSON file to get output like the following, with the Canonical SMILES in the 2nd column and the Isomeric SMILES in the 3rd column.

5317139<TAB><TAB>CCCC=C1C2=C(C3C(O3)CC2)C(=O)O1<TAB>CCC/C=C\\1/C2=C(C3C(O3)CC2)C(=O)O1

Could anybody show me the best way to do this in jq?

peak
user1424739

1 Answer


The following jq script (run with the -r command-line option) meets the stated requirements, assuming that the occurrence of <TAB><TAB> is a typo:

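# For every object whose TOCHeading matches, emit the first String
# of its first Information entry.
def getString($TOCHeading):
  .. | objects | select(.TOCHeading == $TOCHeading)
  | .Information[0].Value.StringWithMarkup[0].String;

# Build one tab-separated row: record number, Canonical SMILES, Isomeric SMILES.
.Record
| [.RecordNumber,
   getString("Canonical SMILES"),
   getString("Isomeric SMILES")]
| @tsv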

This script produces:

5317139 CCCC=C1C2=C(C3C(O3)CC2)C(=O)O1  CCC/C=C\\1/C2=C(C3C(O3)CC2)C(=O)O1
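
To make the run reproducible without the original file, here is a minimal sketch that feeds the same script a heavily trimmed stand-in for the record. The nesting of `Section` objects and the surrounding field names are assumptions about the input shape, kept only as far as the script needs them. It also suggests where the doubled backslash comes from: the JSON string here holds a single backslash (`\\` in JSON source), and `@tsv` escapes a backslash as `\\`.

```shell
#!/bin/sh
# Hypothetical, trimmed stand-in for the input record (assumed shape).
input=$(cat <<'EOF'
{"Record": {
  "RecordType": "CID",
  "RecordNumber": 5317139,
  "Section": [{
    "TOCHeading": "Names and Identifiers",
    "Section": [{
      "TOCHeading": "Computed Descriptors",
      "Section": [
        {"TOCHeading": "Canonical SMILES",
         "Information": [{"Value": {"StringWithMarkup": [{"String": "CCCC=C1C2=C(C3C(O3)CC2)C(=O)O1"}]}}]},
        {"TOCHeading": "Isomeric SMILES",
         "Information": [{"Value": {"StringWithMarkup": [{"String": "CCC/C=C\\1/C2=C(C3C(O3)CC2)C(=O)O1"}]}}]}
      ]}]}]}}
EOF
)

# Same jq program as in the answer; -r prints the @tsv string raw.
out=$(printf '%s' "$input" | jq -r '
  def getString($TOCHeading):
    .. | objects | select(.TOCHeading == $TOCHeading)
    | .Information[0].Value.StringWithMarkup[0].String;

  .Record
  | [.RecordNumber,
     getString("Canonical SMILES"),
     getString("Isomeric SMILES")]
  | @tsv
')

printf '%s\n' "$out"
```

Because `..` walks the whole record, the script does not depend on how deeply the two SMILES sections are nested.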
peak
  • Both "StringWithMarkup" values are arrays. How can the code be made robust in case there is more than one such string (for example, by showing an error and returning a non-zero exit status)? – user1424739 Oct 31 '19 at 12:38
  • The jq solution is already robust in a certain sense. Please clarify the new requirements, or maybe ask a new SO question. – peak Oct 31 '19 at 12:41
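
One possible reading of the request in the comments is "fail unless each heading yields exactly one string". The following is only a sketch under that assumption, not the answerer's method: it collects every matching string into an array and calls jq's `error` when the count is not 1, which makes jq print a message to stderr and exit with a non-zero status. The two sample records (`good`, `bad`) are hypothetical and trimmed to the fields the script reads.

```shell
#!/bin/sh
# Strict variant (an assumption about the intended behaviour):
# each heading must yield exactly one string, otherwise jq raises an error.
prog='
def getString($TOCHeading):
  [ .. | objects | select(.TOCHeading == $TOCHeading)
    | .Information[]?.Value.StringWithMarkup[]?.String ]
  | if length == 1 then .[0]
    else error("\($TOCHeading): expected exactly 1 string, found \(length)")
    end;

.Record
| [.RecordNumber,
   getString("Canonical SMILES"),
   getString("Isomeric SMILES")]
| @tsv
'

# Hypothetical record with exactly one string per heading.
good='{"Record":{"RecordNumber":1,"Section":[
  {"TOCHeading":"Canonical SMILES","Information":[{"Value":{"StringWithMarkup":[{"String":"CCO"}]}}]},
  {"TOCHeading":"Isomeric SMILES","Information":[{"Value":{"StringWithMarkup":[{"String":"CCO"}]}}]}]}}'

# Hypothetical record in which "Canonical SMILES" carries two strings.
bad='{"Record":{"RecordNumber":1,"Section":[
  {"TOCHeading":"Canonical SMILES","Information":[{"Value":{"StringWithMarkup":[{"String":"CCO"},{"String":"OCC"}]}}]},
  {"TOCHeading":"Isomeric SMILES","Information":[{"Value":{"StringWithMarkup":[{"String":"CCO"}]}}]}]}}'

out=$(printf '%s' "$good" | jq -r "$prog")
printf '%s\n' "$out"

# The second run is expected to fail; capture jq's exit status.
printf '%s' "$bad" | jq -r "$prog"
status=$?
echo "jq exit status for the ambiguous record: $status"
```

A heading that is missing entirely is also reported as an error here, since the collected array is then empty.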