My file contains data as indicated below:

{ "any1", "aaa.bbb.ccc.1.ddd", "var1" }
{ "any2", "aaa.bbb.ccc.1.eee", "toto" }
{ "an42", "aaa.bbb.ccc.1.fff", "titi" }
{ "an47", "aaa.bbb.ccc.2.eee", "var3" }
{ "any7", "aaa.bbb.ccc.2.ddd", "var12" }
{ "a789", "aaa.bbb.ccc.2.fff", "var14" }
{ "any1", "xxx.yyy.zzz.1.ddd", "var1" }
{ "any2", "xxx.yyy.zzz.1.eee", "toto" }
{ "an42", "xxx.yyy.zzz.1.fff", "titi" }

I want to extract all the indexes that follow the prefix "aaa.bbb.ccc".

So the command should return:

linux# command
1
2

How can I do that with sed, awk, grep, or sort?

fedorqui
MOHAMED

3 Answers

For example, you can use:

$ grep -Po '(?<=aaa\.bbb\.ccc\.)\d*' file | sort -u
1
2

Step by step

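Get the digits that follow aaa.bbb.ccc. (the dots are escaped so that each one matches a literal dot, not any character). -P enables Perl-compatible regular expressions, which the lookbehind (?<=...) needs, and -o prints only the matched part of each line: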

$ grep -Po '(?<=aaa\.bbb\.ccc\.)\d*' file
1
1
1
2
2
2

Sort them and keep only the unique values:

$ grep -Po '(?<=aaa\.bbb\.ccc\.)\d*' file | sort -u
1
2
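
One caveat: sort -u compares the values as text, so once the indexes reach 10 or more, 10 would be listed before 2. A minimal sketch with a numeric sort instead, wrapped in a helper function. The name list_indexes is only illustrative, and -P assumes a GNU grep built with PCRE support:

```shell
# list_indexes FILE
# Print each distinct index that follows the prefix aaa.bbb.ccc. in FILE,
# in ascending numeric order.
# (list_indexes is a hypothetical name; grep -P needs GNU grep with PCRE.)
list_indexes() {
    # \d+ instead of \d* so that only non-empty runs of digits are matched;
    # sort -n compares numerically, -u drops the duplicates.
    grep -Po '(?<=aaa\.bbb\.ccc\.)\d+' "$1" | sort -nu
}
```

It would be called as list_indexes file.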

Alternative with sed

If you don't have the -P option in your grep, you can use sed:

$ sed -nr 's/^.*aaa\.bbb\.ccc\.([0-9]+).*$/\1/p' file
1
1
1
2
2
2
$ sed -nr 's/^.*aaa\.bbb\.ccc\.([0-9]+).*$/\1/p' file | sort -u
1
2
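
The -r flag is not specified by POSIX. As a sketch, the same substitution can be written with basic regular expressions, which should work with any POSIX sed. The function name list_indexes_sed is only illustrative:

```shell
# list_indexes_sed FILE
# Same idea as the sed command above, without the -r flag.
# (list_indexes_sed is a hypothetical name.)
list_indexes_sed() {
    # In basic regular expressions the group is written \( \) and
    # "one or more digits" is spelled [0-9][0-9]*.
    # -n plus the p flag prints only the lines where the substitution matched.
    sed -n 's/^.*aaa\.bbb\.ccc\.\([0-9][0-9]*\).*$/\1/p' "$1" | sort -u
}
```

It would be called as list_indexes_sed file.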
fedorqui
sed -n '/.*aaa\.bbb\.ccc\.\([0-9]\{1,\}\).*/ {s//\1/;H;}
   $!d
   s/.*//;H;x
:a
   s/\(\n[^[:cntrl:]]*\)\(.*\)\1\n/\1\2\
/
   ta
   s/.\(.*\)./\1/p' YourFile

For fun, this does everything in a single (POSIX) sed invocation; the output is not sorted. (GNU sed allows a one-line version.)

NeronLeVelu

An awk alternative:

$ awk -F. '/aaa\.bbb\.ccc\.[0-9]+/{b=$(NF-1); if (!(b in a)) print b; a[b]++}' infile

Steps:

  1. Set the field separator (FS) to a dot.
  2. Select the lines that match the wanted pattern.
  3. Store the index value, which is the next-to-last field, in the variable b.
  4. If b is not yet a key of the associative array a, print it.
  5. Record b in a so that it is not printed again.
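
The steps above can be sketched as a commented script. The function name list_indexes_awk is only illustrative, and the sketch assumes that the last quoted value on each line contains no dot, since $(NF-1) relies on that:

```shell
# list_indexes_awk FILE
# Print each index that follows aaa.bbb.ccc. the first time it is seen.
# (list_indexes_awk is a hypothetical name; assumes no dot after the index
# other than the one that separates it from the last path component.)
list_indexes_awk() {
    awk -F. '
        # Step 1 is the -F. option: fields are split on literal dots.
        /aaa\.bbb\.ccc\.[0-9]+/ {       # step 2: only lines with the prefix
            b = $(NF-1)                 # step 3: index is the next-to-last field
            if (!(b in a)) print b      # step 4: print an index not seen before
            a[b]++                      # step 5: remember it
        }
    ' "$1"
}
```

Unlike the sort -u pipelines, this prints the indexes in the order in which they first appear in the file.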
Juan Diego Godoy Robles