
INPUT FILE:

5,,OR1,1000,Nawras,OR,20160105T05:30:17+0400,20181231T23:59:59+0400,,user,,aaa8016058f008ddceae6329f0c5d551,50293277591,,,30001,C
5,,OR1,1000,Nawras,OR,20160105T05:30:17+0400,20181231T23:59:59+0400,20160217T01:45:18+0400,,user,aaa8016058f008ddceae6329f0c5d551,50293277591,,,30001,H
5,,OR2,2000,Nawras,OR,20160216T06:30:18+0400,20191231T23:59:59+0400,,user,,f660818af5625b3be61fe12489689601,50328589469,,,30002,C
5,,OR2,2000,Nawras,OR,20160216T06:30:18+0400,20191231T23:59:59+0400,20160216T06:30:18+0400,,user,f660818af5625b3be61fe12489689601,50328589469,,,30002,H
5,,OR1,1000,Nawras,OR,20150328T03:00:13+0400,20171230T23:59:59+0400,,user,,22bf18b024e1d4f42ac79943062cf576,50212935879,,,10001,C
5,,OR1,1000,Nawras,OR,20150328T03:00:13+0400,20171230T23:59:59+0400,20150328T03:00:13+0400,,user,22bf18b024e1d4f42ac79943062cf576,50212935879,,,10001,H
0,,OR5,5000,Nawras,OR,20160421T02:45:16+0400,20191231T23:59:59+0400,,user,,c7c501ac92d85a04bb26c575929e9317,50329769192,,,11001,C
0,,OR5,5000,Nawras,OR,20160421T02:45:16+0400,20191231T23:59:59+0400,20160421T02:45:16+0400,,user,c7c501ac92d85a04bb26c575929e9317,50329769192,,,11001,H
0,,OR1,1000,Nawras,OR,20160330T02:00:14+0400,20181231T23:59:59+0400,,user,,d4ea749306717ec5201d264fc8044201,50285524333,,,11001,C

DESIRED OUTPUT:

5,,OR1,1000,UY,OR,20160105T05:30:17+0400,20181231T23:59:59+0400,20160217T01:45:18+0400,,user,aaa8016058f008ddceae6329f0c5d551,50293277591,,,30001,H
5,,OR2,2000,UY,OR,20160216T06:30:18+0400,20191231T23:59:59+0400,20160216T06:30:18+0400,,user,f660818af5625b3be61fe12489689601,50328589469,,,30002,H
5,,OR1,1000,UY,OR,20150328T03:00:13+0400,20171230T23:59:59+0400,20150328T03:00:13+0400,,user,22bf18b024e1d4f42ac79943062cf576,50212935879,,,10001,H
0,,OR5,5000,UY,OR,20160421T02:45:16+0400,20191231T23:59:59+0400,20160421T02:45:16+0400,,user,c7c501ac92d85a04bb26c575929e9317,50329769192,,,11001,H
0,,OR1,1000,UY,OR,20160330T02:00:14+0400,20181231T23:59:59+0400,,user,,d4ea749306717ec5201d264fc8044201,50285524333,,,11001,C

CODE USED:

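# For every distinct value of column 13, rescan the whole file with grep and
# keep the last matching line: one full pass over the file per distinct value.
for i in $(cut -d, -f13 file | sort -u)
do
    # ",$i," matches the value as a whole comma-separated field
    # (it can still match another column that holds the same value).
    grep -F -- ",$i," file | tail -n 1 >> TESTINGGGGGGG_SV
done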

This takes a very long time, because the file has 300 million records with 65 million unique values in the 13th column.

So I need a faster way to output, for each distinct value in the 13th column, the last line in the file that contains it.

choroba
Govins

2 Answers


awk to the rescue!

awk -F, 'p!=$13 && p0 {print p0} {p=$13; p0=$0} END{print p0}' file

This expects input that is already sorted on field 13.
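Here is a minimal sketch of the full pipeline, assuming GNU or BSD sort. The sort should be stable (-s). Without it, lines with the same field 13 are ordered by a last-resort whole-line comparison, so the line printed for a group may not be the last one from the file. LC_ALL=C makes sort compare raw bytes, which is usually faster on a file this size. The three-line sample.csv and the output name last.csv are hypothetical.

```shell
# Hypothetical sample: field 13 holds the key, field 1 tags each line.
cat > sample.csv <<'EOF'
old,2,3,4,5,6,7,8,9,10,11,12,K1
one,2,3,4,5,6,7,8,9,10,11,12,K2
new,2,3,4,5,6,7,8,9,10,11,12,K1
EOF

# -s keeps lines with equal keys in their original file order, so the last
# line of each sorted group is still the last occurrence in the file.
LC_ALL=C sort -s -t, -k13,13 sample.csv |
  awk -F, 'p != $13 && p0 {print p0} {p = $13; p0 = $0} END {print p0}' > last.csv

cat last.csv
```

The stable sort keeps old before new within K1, so the output is the new line for K1 followed by the one line for K2. Without -s, sort would fall back to comparing whole lines and put new before old, and the awk step would then print old.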

Please post the timing if you can successfully run the script.

If sorting is not possible, another option is:

tac file | awk -F, '!a[$13]++' | tac

Reverse the file, keep the first line seen for each $13 value (which is its last occurrence in the original file), then reverse the result back.
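The run below applies the same pipeline to a hypothetical three-line sample.csv. tac comes from GNU coreutils; on BSD/macOS, tail -r reverses a file instead. Unlike the sort-based version, the output keeps the original file order of the last occurrences. The a array still holds one entry per distinct $13 value, so memory grows with the number of unique keys.

```shell
# Hypothetical sample: field 13 holds the key, field 1 tags each line.
cat > sample.csv <<'EOF'
old,2,3,4,5,6,7,8,9,10,11,12,K1
one,2,3,4,5,6,7,8,9,10,11,12,K2
new,2,3,4,5,6,7,8,9,10,11,12,K1
EOF

# Reverse, keep the first line per key (= last in the original), reverse back.
tac sample.csv | awk -F, '!a[$13]++' | tac > last.csv

cat last.csv
```

K2 last appears on line 2 and K1 on line 3, so the one line is printed before the new line.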

karakfa

Here's a solution that should work:

awk -F, '{rows[$13]=$0} END {for (i in rows) print rows[i]}' file

Explanation:

  • rows is an associative array indexed by field 13 ($13); its value is the whole line ($0). The element for a given $13 is overwritten every time that value shows up again, so at the end each element holds the last line with that value.

But this is inefficient in terms of memory, because the array has to hold one complete line for every unique value of field 13.
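In awk, for (i in rows) visits keys in an unspecified order, so the lines do not necessarily come out in file order. The sketch below assumes first-appearance order is acceptable. It also records each new key in an extra array (order, a hypothetical name) and prints in that order. This costs one more array entry per key on top of the memory noted above. sample.csv and last.csv are hypothetical.

```shell
# Hypothetical sample: field 13 holds the key, field 1 tags each line.
cat > sample.csv <<'EOF'
old,2,3,4,5,6,7,8,9,10,11,12,K1
one,2,3,4,5,6,7,8,9,10,11,12,K2
new,2,3,4,5,6,7,8,9,10,11,12,K1
EOF

# rows[key]: last line seen for that key (overwritten on duplicates)
# order[n]:  keys in the order they first appear (hypothetical helper array)
awk -F, '
  !($13 in rows) { order[++n] = $13 }
  { rows[$13] = $0 }
  END { for (k = 1; k <= n; k++) print rows[order[k]] }
' sample.csv > last.csv

cat last.csv
```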

An improvement to the rows[$13]=$0 solution that still avoids sorting is to save only the line numbers in the associative array:

awk -F, '{rows[$13]=NR}END {for(i in rows) print rows[i]}' file|while read lN; do sed "${lN}q;d" file; done

Explanation:

  • rows is as before, but its values are line numbers (NR) rather than whole lines
  • awk -F, '{rows[$13]=NR}END {for(i in rows) print rows[i]}' file outputs the line numbers of the sought lines
  • sed "${lN}q;d" fetches line number lN from file: d discards every earlier line, and q prints line lN and exits
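Each sed call in that loop reads the file from the top down to line lN, so the loop can end up scanning the file once per unique key. The sketch below keeps the same line-number idea but reads the file exactly twice, using the common awk NR == FNR idiom. The first pass records the last line number per key, and the second prints the lines whose number matches. It assumes the input is a regular file rather than a pipe, since it is read twice; sample.csv and last.csv are hypothetical.

```shell
# Hypothetical sample: field 13 holds the key, field 1 tags each line.
cat > sample.csv <<'EOF'
old,2,3,4,5,6,7,8,9,10,11,12,K1
one,2,3,4,5,6,7,8,9,10,11,12,K2
new,2,3,4,5,6,7,8,9,10,11,12,K1
EOF

# Same file given twice: NR == FNR is true only while reading the first copy.
awk -F, '
  NR == FNR { last[$13] = FNR; next }   # pass 1: last line number per key
  FNR == last[$13]                      # pass 2: print only that line
' sample.csv sample.csv > last.csv

cat last.csv
```

Only the keys and their line numbers are held in memory. The output comes out in the file order of the last occurrences, the same order the tac pipeline gives.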
user2314737
    Have you thought about how much memory your program will use? 65 million unique records. If each record is 50 bytes it will become around 3 GB of raw data, without counting what AWK needs to keep the array structured. Calculate it yourself `perl -le 'print 65_000_000 * 50 / 1024 / 1024 / 1024'` – Andreas Louv Jun 10 '16 at 14:36