I have the following text file to process:

 **parent**
 father = erik
 mother = rita
 *son*
 name = john
 age = 13
 *daughter*
 name = lili
 age = 24
 status = student

 **parent**
 father = boby
 mother = christa
 *son*
 name = tim
 age = 2

 **parent**
 father = leo
 mother = victoria
 *daughter*
 name = kim
 age = 36
 occupation = singer
 haircolor = blond

I need to convert it to JSON in the following format:

{"parent": [
             { "father": "erik",
               "mother": "rita", 
               "son": {
                   "name": "john",
                   "age": "13"
               },
               "daughter": {
                   "name": "lili",
                   "age": "24",
                   "status": "student"
               }
             },
             { "father": "boby",
               "mother": "christa",
               "son": {
                   "name": "tim",
                   "age": "2"
               }
             },
             { "father": "leo",
               "mother": "victoria",
               "daughter": {
                   "name": "kim",
                   "age": "36",
                   "occupation": "singer",
                   "haircolor": "blond"
               }
             }
            ]
  }

My question is how to write code in nawk or awk to do this. Points to consider:

  • not every pair of parents (father and mother) has a son or a daughter
  • a son or daughter may have parameters that other children lack, e.g. occupation, weight, haircolor
6axter82
  • Note that your desired output precludes a family from having more than one son or more than one daughter: the son and daughter elements should be lists. – glenn jackman Aug 14 '15 at 13:45

1 Answer

I'd use a language like Perl instead, where I can build up a data structure in the native language and then encode it as JSON:

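perl -MJSON -ne '
  # Top-level structure: a "parent" list holding one hash per family
  BEGIN {$root = {parent=>[]}}
  # Section headers select which hash the following key = value lines fill
  if (/^\s*[*][*]parent/) {$unit = "family"; $family = {}; next;}
  if (/^\s*[*]son/)       {$unit = "son"; $son = {}; next;}
  if (/^\s*[*]daughter/)  {$unit = "daughter"; $daughter = {}; next;}
  # $unit is used as a symbolic reference to $family, $son or $daughter
  if (/(\w+)\s*=\s*(\w+)/) {${$unit}->{$1} = $2;}
  sub add_family {
    $family->{son} = $son if $son;
    $family->{daughter} = $daughter if $daughter;
    push @{$root->{parent}}, $family;
    undef $son;
    undef $daughter;
    undef $family;
  }
  # A blank line ends the current family; skip it if no family is open
  if (/^\s*$/) {add_family if $family}
  END {
    add_family if $family;
    print to_json($root, {pretty=>1}), "\n";
  }
' file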
{
   "parent" : [
      {
         "son" : {
            "name" : "john",
            "age" : "13"
         },
         "daughter" : {
            "name" : "lili",
            "status" : "student",
            "age" : "24"
         },
         "father" : "erik",
         "mother" : "rita"
      },
      {
         "son" : {
            "name" : "tim",
            "age" : "2"
         },
         "father" : "boby",
         "mother" : "christa"
      },
      {
         "father" : "leo",
         "daughter" : {
            "age" : "36",
            "occupation" : "singer",
            "name" : "kim",
            "haircolor" : "blond"
         },
         "mother" : "victoria"
      }
   ]
}
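
Since the question asks specifically for awk, here is a minimal sketch of the same idea using only POSIX awk features. It is not a general JSON encoder: it writes the JSON text directly while reading the file, and it assumes that keys and values contain no double quotes or backslashes, so nothing is escaped. The function name `families_to_json` and the awk variable names are made up for this example.

```shell
#!/bin/sh
# Sketch: convert the "**parent** / *child* / key = value" text format to JSON
# using only POSIX awk features (no gawk extensions).
# Assumption: keys and values contain no double quotes or backslashes,
# so no JSON escaping is done.
families_to_json() {
  awk '
    # Close the currently open child object, if any.
    function close_child() {
      if (in_child) { printf "\n      }"; in_child = 0 }
    }
    # Close the currently open family object, if any.
    function close_family() {
      close_child()
      if (in_family) { printf "\n    }"; in_family = 0 }
    }
    # Strip surrounding whitespace from a string.
    function trim(s) {
      gsub(/^[ \t\r]+|[ \t\r]+$/, "", s)
      return s
    }

    BEGIN { printf "{\n  \"parent\": [" }

    { line = trim($0) }

    # "**parent**" starts a new family object.
    line ~ /^[*][*]parent[*][*]$/ {
      close_family()
      printf "%s\n    {", (nfam++ ? "," : "")
      in_family = 1; nkeys = 0
      next
    }

    # "*son*", "*daughter*", ... starts a nested child object.
    in_family && line ~ /^[*][^*]+[*]$/ {
      close_child()
      name = substr(line, 2, length(line) - 2)
      printf "%s\n      \"%s\": {", (nkeys++ ? "," : ""), name
      in_child = 1; nchild = 0
      next
    }

    # "key = value" goes into the innermost open object.
    in_family && (eq = index(line, "=")) > 0 {
      key = trim(substr(line, 1, eq - 1))
      val = trim(substr(line, eq + 1))
      if (in_child)
        printf "%s\n        \"%s\": \"%s\"", (nchild++ ? "," : ""), key, val
      else
        printf "%s\n      \"%s\": \"%s\"", (nkeys++ ? "," : ""), key, val
    }

    END { close_family(); print "\n  ]\n}" }
  ' "$@"
}
```

Call it as `families_to_json file`. Unlike the Perl version, keys come out in input order, because nothing is stored in a hash. A family with two `*son*` sections would produce a duplicate "son" key, which is the limitation pointed out in the comment on the question.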
glenn jackman