I read the question "Compare consecutive rows in awk/(or python) and random select one of duplicate lines". Now I have an additional question: how should I change the code if I want to do this comparison not only for the x-value, but also for the y-value, or for even more columns? Maybe something like

if ($1 != prev) && ($2 != prev)  ???

In other words: I want to check whether the x-value AND the y-value of the current line are both the same as the x-value AND the y-value of the following consecutive lines.

The data:

#x   y     z
1    1    11        
10   10   12       
10   10   17       
4    4    14
20   20   15        
20   88   16     
20   99   17
20   20   22
5    5    19
10   10   20

The output should look like:

#x   y     z
1    1    11        
10   10   17       
4    4    14
20   20   15        
20   88   16        
20   99   17    
20   20   22    
5    5    19
10   10   20

or (due to random selection)

#x   y     z
1    1    11        
10   10   12       
4    4    14
20   20   15        
20   88   16        
20   99   17    
20   20   22    
5    5    19
10   10   20

The code from the question mentioned above, which does this for the x-values but NOT for the y-values in an AND condition:

$ cat tst.awk
function prtBuf(        idx) {
    if (cnt > 0) {
        idx = int((rand() * cnt) + 1)
        print buf[idx]
    }
    cnt = 0
}

BEGIN { srand() }
$1 != prev { prtBuf() }
{ buf[++cnt]=$0; prev=$1 }
END { prtBuf() }
Cœur
Jojo

1 Answer

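This should do it. A new group starts as soon as either column differs from the previous line, so the two tests are joined with || rather than &&: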

function prtBuf(idx) {
    if (cnt > 0) {
        idx = int((rand() * cnt) + 1)
        print buf[idx]
    }
    cnt = 0
}

BEGIN { srand() }
$1 != prev1 || $2 != prev2 { prtBuf() }
{ buf[++cnt]=$0; prev1=$1; prev2=$2 }
END { prtBuf() }
Andrzej Pronobis
  • Yes, this does it! Good job! It is also easy to change if somebody wants to do this comparison for even more columns. Example for 3 columns: BEGIN { srand() } $1 != prev1 || $2 != prev2 || $3 != prev3 { prtBuf() } { buf[++cnt]=$0; prev1=$1; prev2=$2; prev3=$3 } END { prtBuf() } – Jojo Jul 23 '16 at 18:16
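
Adding one prevN variable per column works, but gets repetitive as the column count grows. A possible variation, sketched here as an assumption and not taken from the answer, is to join the first N fields into one composite key and compare that single key instead. The file names data.txt and dedup.awk, the variable ncols and the helper key() are hypothetical names chosen for this illustration. One difference to be aware of: the joined key is compared as a string, so 10 and 10.0 would count as different values, whereas the per-column version compares numeric-looking fields numerically.

```shell
# Sample data from the question (header line included).
cat > data.txt <<'EOF'
#x   y     z
1    1    11
10   10   12
10   10   17
4    4    14
20   20   15
20   88   16
20   99   17
20   20   22
5    5    19
10   10   20
EOF

# dedup.awk: hypothetical generalisation of the answer's script.
# "ncols" (number of leading columns to compare) is an assumed name.
cat > dedup.awk <<'EOF'
# Print one randomly chosen line from the buffered group, then reset it.
function prtBuf(        idx) {
    if (cnt > 0) {
        idx = int((rand() * cnt) + 1)
        print buf[idx]
    }
    cnt = 0
}

# Join the first ncols fields with SUBSEP into a single string key.
function key(        i, k) {
    k = $1 ""
    for (i = 2; i <= ncols; i++)
        k = k SUBSEP $i
    return k
}

BEGIN { srand(); if (ncols == "") ncols = 2 }   # default: compare x and y
{ cur = key() }
cur != prev { prtBuf() }                        # key changed: flush the group
{ buf[++cnt] = $0; prev = cur }
END { prtBuf() }
EOF

awk -v ncols=2 -f dedup.awk data.txt
```

With ncols=2 this should behave like the two-column answer on the sample data: only the two consecutive "10 10" lines form a group, and one of them is picked at random. Setting ncols=3 would compare x, y and z together.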