
I have a huge text file in the following format, and I want to count how many times each value of the department field occurs. Each section has a field called department. The result should be a CSV file like the one shown in the Expected output section. I would appreciate a solution that uses sed, head/tail, or awk. The file is really huge, with 50,000+ lines, so an efficient method is much appreciated.

Input format:


# Person1 Perosn2, AADDC Users, dummydata.somecompany.com
dn: CN=Person1 Perosn2,OU=AADDC Users,DC=dummydata,DC=somecompany,DC=com
objectClass: top
department: 234ABC
name: Person1 Perosn2
objectGUID:: MbCDVZpKbEWRxDUA5iN5IA==
userPrincipalName: abcdef@dummydata.somecompany.com
objectCategory: CN=Person,CN=Schema,CN=Configuration,DC=dummydata,DC=somecompany
 ,DC=com
dSCorePropagationData: 16010101000000.0Z
lastLogonTimestamp: 132173602593105876
preferredLanguage: en-US
msDS-AzureADMailNickname: abcdef


# Person1 Perosn2, AADDC Users, dummydata.somecompany.com
dn: CN=Person1 Perosn2,OU=AADDC Users,DC=dummydata,DC=somecompany,DC=com
objectClass: top
department: 234ABC
name: Person1 Perosn2
objectGUID:: MbCDVZpKbEWRxDUA5iN5IA==
userPrincipalName: abcdef@dummydata.somecompany.com
objectCategory: CN=Person,CN=Schema,CN=Configuration,DC=dummydata,DC=somecompany
 ,DC=com
dSCorePropagationData: 16010101000000.0Z
lastLogonTimestamp: 132173602593105876
preferredLanguage: en-US
msDS-AzureADMailNickname: abcdef

# Person3 Perosn4, AADDC Users, dummydata.somecompany.com
dn: CN=Person1 Perosn2,OU=AADDC Users,DC=dummydata,DC=somecompany,DC=com
objectClass: top
department: XYZ012
name: Person1 Perosn2
objectGUID:: MbCDVZpKbEWRxDUA5iN5IA==
userPrincipalName: abcdef@dummydata.somecompany.com
objectCategory: CN=Person,CN=Schema,CN=Configuration,DC=dummydata,DC=somecompany
 ,DC=com
dSCorePropagationData: 16010101000000.0Z
lastLogonTimestamp: 132173602593105876
preferredLanguage: en-US
msDS-AzureADMailNickname: abcdef


Expected output

234ABC,2
XYZ012,1

What I did:

I used this command to grep the file:

grep '^department: *' file.txt

But I am not sure whether the expected output can be produced with a single command such as sed, grep, or awk.

2 Answers


Could you please try the following.

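awk '
BEGIN {
  OFS = ","                    # join output fields with a comma (CSV)
}
{
  gsub(/\r/, "")               # strip carriage returns in case of Windows line endings
}
/^department:/ {
  dept = $NF                   # last whitespace-separated field is the department value
  if (!(dept in count)) {
    order[++n] = dept          # remember departments in first-seen order
  }
  count[dept]++                # tally occurrences
}
END {
  for (i = 1; i <= n; i++) {
    print order[i], count[order[i]]
  }
}
' Input_file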
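
One caveat: $NF keeps only the last whitespace-separated word, so a department value containing spaces would be truncated. A minimal sketch of a variant that keeps the whole value is below, assuming every field line has the form "department: value". The function name count_departments is hypothetical and used only for illustration.

```shell
# Hypothetical helper: counts department values, keeping values that contain spaces.
# Reads the files given as arguments, or standard input if none are given.
count_departments() {
  awk '
    { sub(/\r$/, "") }                          # drop a trailing carriage return, if any
    /^department: / {
      dept = substr($0, length("department: ") + 1)   # everything after the key
      sub(/[ \t]+$/, "", dept)                  # trim trailing blanks
      if (!(dept in count)) order[++n] = dept   # remember first-seen order
      count[dept]++
    }
    END {
      for (i = 1; i <= n; i++) print order[i] "," count[order[i]]
    }
  ' "$@"
}
```

It could be called as count_departments file.txt > departments.csv. The sketch does not quote values, so a department name that itself contains a comma would need extra handling to stay valid CSV.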
RavinderSingh13

This might work for you (GNU sed):

sed -En 's/^department: //;T;G;/^(\S+\n)(\S+\n)*\1/!P;h' file

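This ignores lines that do not begin with department:. For the remaining lines, it strips the prefix, appends the hold space (which accumulates the values seen so far), and prints the value only if it does not already appear there. The result is each distinct department listed once, in order of first appearance; it does not produce the per-department counts.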
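
If the counts are needed as well, one option is to let sed do only the extraction and hand the counting to sort and uniq. This is a sketch of a different technique from the command above, assuming the output may be sorted by department rather than kept in first-seen order. The function name count_departments_sorted is hypothetical.

```shell
# Hypothetical helper: extract department values with sed, count them with uniq -c,
# then reshape "  COUNT VALUE" into "VALUE,COUNT".
count_departments_sorted() {
  sed -n 's/^department: //p' "$@" |
    LC_ALL=C sort |
    uniq -c |
    awk '{ n = $1; sub(/^ *[0-9]+ /, ""); print $0 "," n }'
}
```

It could be called as count_departments_sorted file > departments.csv. LC_ALL=C is set only to make the sort order reproducible across locales.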

potong