awk remove unwanted records and consolidate multiline fields to one line record in specific order

Question

I have an output file that I am trying to process into a formatted csv for our audit team.

I thought I had this mastered until I stumbled across bad data within the output. As such, I want to be able to handle this using awk.

MY OUTPUT FILE EXAMPLE

Enter password ==>
o=hoster

ou=people,o=hoster

ou=components,o=hoster

ou=websphere,ou=components,o=hoster

cn=joe-bloggs,ou=appserver,ou=components,o=hoster
cn=joe
sn=bloggs
cn=S01234565
uid=bloggsj

cn=john-blain,ou=appserver,ou=components,o=hoster
cn=john
uid=blainj
sn=blain

cn=andy-peters,ou=appserver,ou=components,o=hoster
cn=andy
sn=peters
uid=petersa
cn=E09876543

THE OUTPUT I WANT AFTER PROCESSING

joe,bloggs,s01234565;uid=bloggsj,cn=joe-bloggs,ou=appserver,ou=components,o=hoster
john,blain;uid=blainj;cn=john-blain,ou=appserver,ou=components,o=hoster
andy,peters,E09876543;uid=E09876543;cn=andy-peters,ou=appserver,ou=components,o=hoster

As you can see:

we always have a cn= variable that contains o=hoster
uid can have any value
we may have multiple cn= variables without o=hoster

I have achieved the following so far:

cat output | awk '!/^o.*/ && !/^Enter.*/{print}' | awk '{getline a; getline b; getline c; getline d;  print  $0,a,b,c,d}' | awk -v srch1="cn=" -v repl1="" -v srch2="sn=" -v repl2="" '{ sub(srch1,repl1,$2); sub(srch2,repl2,$3); print $4";"$2" "$3";"$1 }'

Any pointers or guidance on doing this with awk would be greatly appreciated. Or should I give up and fall back on the age-old, long-winded method of a large looping script to process the file?

Akshay Hegde · Accepted Answer · 2014-01-07T13:24:16.553

You may try the following awk code:

$ cat file
Enter password ==>
o=hoster

ou=people,o=hoster

ou=components,o=hoster

ou=websphere,ou=components,o=hoster

cn=joe-bloggs,ou=appserver,ou=components,o=hoster
cn=joe
sn=bloggs
cn=S01234565
uid=bloggsj

cn=john-blain,ou=appserver,ou=components,o=hoster
cn=john
uid=blainj
sn=blain

cn=andy-peters,ou=appserver,ou=components,o=hoster
cn=andy
sn=peters
uid=petersa
cn=E09876543

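Awk code:

awk '
    # Print one record as "names;uid;dn", then reset the per-record state.
    function out() {
        print s, u, last
        i = 0; s = u = ""
    }
    # Range: from a line starting with "cn" through the next blank line.
    /^cn/, !NF {
        ++i
        if (i == 1)
            last = $0                            # first line of the block is the full DN
        else if (/^uid=/)
            u = $0                               # keep the uid line verbatim
        else if (NF)
            s = (s == "" ? $NF : s "," $NF)      # append the value after "="
    }
    # A blank line closes the block that is currently open.
    i && !NF { out() }
    # Flush the last block when the file does not end with a blank line.
    END { if (i) out() }
' FS="=" OFS=";" file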

Result:

joe,bloggs,S01234565;uid=bloggsj;cn=joe-bloggs,ou=appserver,ou=components,o=hoster
john,blain;uid=blainj;cn=john-blain,ou=appserver,ou=components,o=hoster
andy,peters,E09876543;uid=petersa;cn=andy-peters,ou=appserver,ou=components,o=hoster

If you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk, /usr/xpg6/bin/awk, or nawk.
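As a rough illustration of the range pattern this answer relies on, the sketch below prints only the lines that fall between a line starting with cn and the next blank line (inclusive). The function name show_range and the shortened sample input are made up for this illustration and are not part of the original answer.

```shell
# Hypothetical helper: show which input lines the range pattern /^cn/,!NF selects.
show_range() {
    awk '/^cn/, !NF { print NR ": " ($0 == "" ? "<blank>" : $0) }'
}

# Lines 1 and 2 come before any "cn" line, and line 7 comes after the blank
# line that closed the range, so only lines 3 through 6 are printed.
printf '%s\n' 'o=hoster' '' 'cn=a-b,o=hoster' 'cn=a' 'sn=b' '' 'ou=x,o=hoster' | show_range
```

The blank line itself is still inside the range, which is why the main script can use it as the signal to print the collected record.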

Fantastic and straight forward. I've tried this against a variety of reports and it works perfectly. My awk needs some improving! Many thanks. — maddop, Jan 07 '14 at 13:25

score 1 · Answer 2 · answered Jan 07 '14 at 12:48

This awk script works for your sample and produces the sample output:

BEGIN { delete cn[0]; OFS = ";" }
function print_info() {
    if (length(cn)) {
        names = cn[1] "," sn
        for (i=2; i <= length(cn); ++i) names = names "," cn[i]
        print names, uid, dn
        delete cn
    }
}
/^cn=/ {
    if ($0 ~ /o=hoster/) dn = $0
    else {
        cn[length(cn)+1] = substr($0, index($0, "=") + 1)
        uid = $0; sub("cn", "uid", uid)
    }
}
/^sn=/ { sn = substr($0, index($0, "=") + 1) }
/^uid=/ { uid = $0 }
/^$/ { print_info() }
END { print_info() }

This should help you get started.

Dimitre Radoulov · Answer 3 · 2014-01-07T13:07:27.977

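awk '$1 ~ /^cn/ {
    # RS= selects paragraph mode: each blank-line-separated block is one
    # record, and every line without spaces is one field. $1 is the DN line.
    for (i = 2; i <= NF; i++) {
        if ($i ~ /^uid/) {
            u = $i                        # keep the uid line verbatim
            continue
        }
        sub(/^[^=]*=/, x, $i)             # strip "attr=" (x is unset, so empty)
        r = length(r) ? r OFS $i : $i
    }
    print r, u, $1
    r = u = x                             # reset for the next record
}' OFS=, RS= infile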

I assume that there is an error in your sample output: in the third record the uid should be petersa and not E09876543.
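The script above separates every output field with a comma. If the semicolon layout from the question (names;uid;dn) is wanted, a minimal sketch of the same paragraph-mode idea might look like the following. The wrapper name ldap_to_csv and the variable names are assumptions made for this example, and it has only been reasoned through against the sample data shown in the question.

```shell
# Hypothetical wrapper: reads LDAP-style blocks from the named files or stdin
# and prints one "names;uid;dn" line per block whose first line starts with cn=.
ldap_to_csv() {
    awk -v RS= -v FS='\n' '
        $1 ~ /^cn=/ {
            names = uid = ""
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^uid=/) { uid = $i; continue }   # uid line kept verbatim
                val = $i
                sub(/^[^=]*=/, "", val)                    # drop the "attr=" prefix
                names = (names == "" ? val : names "," val)
            }
            print names ";" uid ";" $1
        }' "$@"
}

# Example call on one block taken from the question.
printf '%s\n' 'ou=people,o=hoster' '' \
    'cn=john-blain,ou=appserver,ou=components,o=hoster' \
    'cn=john' 'uid=blainj' 'sn=blain' | ldap_to_csv
```

Setting FS to a newline keeps each line as a single field even if a value contains spaces, and copying the field into val before calling sub leaves the record itself untouched.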

edited Jan 07 '14 at 13:07

answered Jan 07 '14 at 13:01


score 0 · Answer 4 · answered Jan 08 '14 at 11:51

You might want to look at some of the "already been there and done that" solutions to accomplish the task.

Apache Directory Studio for example, will do the LDAP query and save the file in CSV or XLS format.

-jim

jwilleke
