uniq is a tool that enables one to filter the lines of a file so that only unique lines are shown. uniq has some support for specifying when two lines are "equivalent", but the options are limited.
I'm looking for a tool, or an extension of uniq, that allows one to enter a regex. If the captured group is the same for two lines, then the two lines are considered "equivalent". Only the first matching line is returned for each equivalence class.
Example:
file.dat:
foo!bar!baz
!baz!quix
!bar!foobar
ID!baz!
Using grep -P '(!\w+!)' -o, one can extract the "unique parts":
!bar!
!baz!
!bar!
!baz!
This means that the first line is considered to be "equivalent" with the third and the second with the fourth. Thus only the first and the second are printed (the third and fourth are ignored).
Then uniq '(!\w+!)' < file.dat should return:
foo!bar!baz
!baz!quix
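To pin down the intended semantics, here is a minimal Python sketch of the behavior described above. It is only an illustration, not an existing tool: the function name uniq_by_regex is hypothetical, and it assumes that lines with no match are dropped and that equivalent lines need not be adjacent (as in the example, where lines one and three are equivalent).

```python
import re


def uniq_by_regex(lines, pattern):
    """Yield the first line seen for each distinct regex key."""
    regex = re.compile(pattern)
    seen = set()
    for line in lines:
        match = regex.search(line)
        if match is None:
            # Assumption: a line without a match has no key and is skipped.
            continue
        # The key is the first capture group if the pattern has one,
        # otherwise the whole match.
        key = match.group(1) if regex.groups else match.group(0)
        if key not in seen:
            seen.add(key)
            yield line


sample = ["foo!bar!baz", "!baz!quix", "!bar!foobar", "ID!baz!"]
result = list(uniq_by_regex(sample, r"(!\w+!)"))
print("\n".join(result))
```

Run on the four lines of file.dat, this prints the first and second lines only, which is the output requested above.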