
I am trying to convert LDIF to JSON using awk.

I can't figure out how to print something before and after each multi-line record. I can print once in BEGIN and once in END, or before and after each line, but never before and after each record.

The actual LDIF input to awk is:

dn: CN=foo
objectClass: top

dn: CN=bar
objectClass: top

To convert it to JSON, the awk output needs to look like this:

{
  "dn": "CN=foo",
  "objectClass": "top"
}
{
  "dn": "CN=bar",
  "objectClass": "top"
}
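The target output above is two JSON objects back to back rather than one JSON document. Tools such as jq read a stream of concatenated JSON values without complaint, but a parser expecting a single document would reject it. If a single document is wanted, the sketch below wraps the records in an array with commas between them. It assumes blank-line-separated records and values with no quotes or backslashes, and the function name `ldif_to_json_array` is hypothetical.

```shell
#!/bin/sh
# ldif_to_json_array (hypothetical name): same "tag: value" split per line,
# but all records are emitted inside one JSON array.
# Assumes blank-line-separated records and values without quotes or backslashes.
ldif_to_json_array() {
  awk '
  BEGIN { RS = ""; FS = "\n"; printf "[" }
  {
    # a comma goes before every object except the first one
    printf "%s\n  {\n", (NR > 1 ? "," : "")
    for (i = 1; i <= NF; i++) {
      tag = val = $i
      sub(/:.*/, "", tag)              # text before the first colon
      sub(/^[^:]*:[ \t]*/, "", val)    # text after the first colon and spaces
      printf "    \"%s\": \"%s\"%s\n", tag, val, (i < NF ? "," : "")
    }
    printf "  }"
  }
  END { print (NR > 0 ? "\n]" : "]") }
  '
}

printf 'dn: CN=foo\nobjectClass: top\n\ndn: CN=bar\nobjectClass: top\n' | ldif_to_json_array
```

With no input records at all, the END rule still closes the array, so the output is `[]`.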

Script 1 wraps each line of the record with curly braces.

BEGIN {
        RS="\n\n#";
        FS=": ";
}
print "{"
{
        print "\""$1"\": \""$2"\",";
}
print "}"
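In awk every statement has to sit inside an action (`{ ... }`), so the bare `print "{"` and `print "}"` lines in Script 1 are syntax errors rather than per-record hooks. Rules without a pattern run once per record, in the order they appear, so the structure Script 1 is reaching for works once each print is its own rule and RS actually separates the records. A minimal sketch, assuming blank-line-separated records (it swaps in `RS=""` for the script's RS) and a hypothetical function name `wrap_records`:

```shell
#!/bin/sh
# wrap_records (hypothetical name): print "{" before and "}" after each record.
wrap_records() {
  awk '
  BEGIN { RS = "" }   # paragraph mode: records are separated by empty lines
  { print "{" }       # rule 1 runs first for every record
  { print }           # rule 2 prints the record itself
  { print "}" }       # rule 3 runs last for every record
  '
}

printf 'dn: CN=foo\nobjectClass: top\n\ndn: CN=bar\nobjectClass: top\n' | wrap_records
```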

Script 2 wraps the set of all records with one set of curly braces:

BEGIN {
        RS="\n\n#";
        FS=": ";
        print "{"
}
{
        print "\""$1"\": \""$2"\",";
}
END{
        print "}"
}
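One way to see what both scripts are working with is to count how many records awk actually reads. The sketch below uses a hypothetical helper `count_records` that passes RS in with `-v`, which expands `\n` escapes just like an assignment in BEGIN. POSIX leaves a multi-character RS unspecified; gawk and mawk treat it as a regular expression, so `"\n\n#"` only matches an empty line followed by `#` and the sample input comes back as a single record (it prints 1). Awks that use only the first character of RS split on every newline instead, so either way it is not two records. `RS=""` (paragraph mode) splits on runs of empty lines and ignores leading and trailing ones, so it prints 2 even with extra empty lines.

```shell
#!/bin/sh
# count_records (hypothetical name): read stdin with the RS given as $1
# and print NR at the end, i.e. how many records awk saw.
count_records() {
  awk -v RS="$1" 'END { print NR }'
}

sample='dn: CN=foo
objectClass: top

dn: CN=bar
objectClass: top'

# The RS used in both scripts (a regex in gawk and mawk)
printf '%s\n' "$sample" | count_records '\n\n#'

# Paragraph mode, with extra empty lines before, between and after records
printf '\n\ndn: CN=foo\nobjectClass: top\n\n\n\ndn: CN=bar\nobjectClass: top\n\n' | count_records ''
```

At least in gawk, a line holding only spaces does not count as empty for paragraph mode, so trailing whitespace in an export can merge records.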

It seems like awk only has BEGIN, END and an implicit loop over records (single- or multi-line). I can't figure out how to print something before and after each multi-line record. Is this possible in awk? Is there a better way to convert LDIF to JSON?

What would an awk script (not a one-liner) that does the LDIF to JSON conversion look like?

user261502

1 Answer


I don't know what you thought that awk script was going to do, but to get the expected output you originally posted from the input you originally posted, all you need is:

$ awk '!/^[{}]/{print ( (NR-1)%2 ? "{" ORS $0 : $0 ORS "}" )}' file
{
record1 line1
record1 line2
}
{
record2 line1
record2 line2
}

Update: given your updated input, printing before/after a record is even simpler:

$ awk -v RS= '{print "{" ORS $0 ORS "}"}' file
{
dn: CN=foo
objectClass: top
}
{
dn: CN=bar
objectClass: top
}

and the script to get the output you showed in your question would be:

$ cat tst.awk
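BEGIN { RS=""; FS="\n" }    # paragraph mode: blank-line-separated records, one field per line
{
    print "{"                                   # before each record
    for (i=1; i<=NF; i++) {
        tag = val = $i
        sub(/:.*/,"",tag)                       # tag = text before the first ":"
        sub(/[^:]+:[[:space:]]*/,"",val)        # val = text after the first ":" and any spaces
        printf "  \"%s\": \"%s\"%s\n", tag, val, (i<NF ? "," : "")    # comma on all but the last line
    }
    print "}"                                   # after each record
}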

$ awk -f tst.awk file
{
  "dn": "CN=foo",
  "objectClass": "top"
}
{
  "dn": "CN=bar",
  "objectClass": "top"
}
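The first comment below says this did not work on real LDIF output. Without seeing that data it is only a guess, but common culprits are LDIF line folding (RFC 2849 continues a long line on the next line, which starts with a single space), `#` comment lines, and values containing `"` or `\`, which break the JSON string. Here is a sketch under those assumptions that keeps the output format above. The names `ldif_to_json` and `json_escape` are hypothetical; base64 (`::`) values are passed through undecoded, and repeated attributes such as several `objectClass` lines still produce duplicate keys.

```shell
#!/bin/sh
# ldif_to_json (hypothetical name): LDIF on stdin -> one JSON object per record.
# Assumptions: records are separated by empty lines; a line starting with one
# space continues the previous line (RFC 2849 folding); lines starting with
# "#" are comments. Base64 ("::") values are not decoded and repeated
# attributes are not merged.
ldif_to_json() {
  awk '
  # Escape backslash and double quote for a JSON string
  # (other control characters are not handled in this sketch).
  function json_escape(s,    out, i, c) {
    out = ""
    for (i = 1; i <= length(s); i++) {
      c = substr(s, i, 1)
      if (c == "\\" || c == "\"") out = out "\\"
      out = out c
    }
    return out
  }
  BEGIN { RS = ""; FS = "\n" }
  {
    # Unfold continuation lines into line[1..n], dropping comments.
    n = 0; incomment = 0
    for (i = 1; i <= NF; i++) {
      if ($i ~ /^ /) {                     # folded continuation line
        if (!incomment && n > 0) line[n] = line[n] substr($i, 2)
        continue
      }
      incomment = ($i ~ /^#/)              # comments and their folds are dropped
      if (!incomment) line[++n] = $i
    }
    print "{"
    for (j = 1; j <= n; j++) {
      tag = val = line[j]
      sub(/:.*/, "", tag)                  # attribute name
      sub(/^[^:]*::?[ \t]*/, "", val)      # value after ":" or "::"
      printf "  \"%s\": \"%s\"%s\n", tag, json_escape(val), (j < n ? "," : "")
    }
    print "}"
  }
  '
}
```

Usage would be `ldif_to_json < file`. Character-by-character escaping is used instead of `gsub` because backslash handling in `gsub` replacement strings differs between awk implementations.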
Ed Morton
  • I love a good one-liner as much as the next guy, but when you're trying to learn a tool, they often end up being unreadable. Also, I guess I should have posted more realistic data, as this answer doesn't work with the LDIF output. So I guess my question was less goal oriented and more philosophical in nature about how to write an awk script that prints before and after a record. – user261502 Jun 09 '19 at 21:27
  • Then the script I posted answered your question as it showed how to do that, right? I updated it to print { before and } after each record given your new input. – Ed Morton Jun 10 '19 at 01:40