How to extract indexes from strings?

Question

My file contains data as indicated below:

{ "any1", "aaa.bbb.ccc.1.ddd", "var1" }
{ "any2", "aaa.bbb.ccc.1.eee", "toto" }
{ "an42", "aaa.bbb.ccc.1.fff", "titi" }
{ "an47", "aaa.bbb.ccc.2.eee", "var3" }
{ "any7", "aaa.bbb.ccc.2.ddd", "var12" }
{ "a789", "aaa.bbb.ccc.2.fff", "var14" }
{ "any1", "xxx.yyy.zzz.1.ddd", "var1" }
{ "any2", "xxx.yyy.zzz.1.eee", "toto" }
{ "an42", "xxx.yyy.zzz.1.fff", "titi" }

I want to extract all the indexes that follow the prefix "aaa.bbb.ccc".

So the command should return

linux# command
1
2

How can I do that with sed, awk, grep, or sort?

do you also need to check the leading double quote or just `aaa.bbb.ccc.`? — fedorqui, Apr 30 '15 at 09:23

fedorqui · Accepted Answer · 2015-04-30T09:32:48.003

4

You can for example say:

$ grep -Po '(?<=aaa\.bbb\.ccc\.)\d*' file | sort -u
1
2

Step by step

Get the digits that follow aaa\.bbb\.ccc\. (the dots are escaped so that each one matches a literal dot rather than any character):

$ grep -Po '(?<=aaa\.bbb\.ccc\.)\d*' file
1
1
1
2
2
2

Sort them and keep only the unique values:

$ grep -Po '(?<=aaa\.bbb\.ccc\.)\d*' file | sort -u
1
2

Alternative with `sed`

If your grep does not have the -P option, you can use sed (-r is the GNU sed flag for extended regular expressions):

$ sed -nr 's/^.*aaa\.bbb\.ccc\.([0-9]+).*$/\1/p' file
1
1
1
2
2
2
$ sed -nr 's/^.*aaa\.bbb\.ccc\.([0-9]+).*$/\1/p' file | sort -u
1
2
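
If neither `grep -P` nor GNU sed's `-r` is available, a similar result can probably be obtained with `grep -o` and `cut`. This is only a minimal sketch, assuming a grep that supports `-o` and `-E` (GNU and BSD grep do). The temporary file and the `result` variable are not part of the original answer; they are there to make the example self-contained. The field number given to `cut` assumes a prefix with exactly three dot-separated parts.

```shell
# Sample lines copied from the question, written to a temporary file.
file=$(mktemp)
cat > "$file" <<'EOF'
{ "any1", "aaa.bbb.ccc.1.ddd", "var1" }
{ "any2", "aaa.bbb.ccc.1.eee", "toto" }
{ "an47", "aaa.bbb.ccc.2.eee", "var3" }
{ "any1", "xxx.yyy.zzz.1.ddd", "var1" }
EOF

# grep -o prints only the matched text, e.g. "aaa.bbb.ccc.1".
# cut keeps the 4th dot-separated field, which is the index.
# sort -un sorts numerically and drops duplicates.
result=$(grep -oE 'aaa\.bbb\.ccc\.[0-9]+' "$file" | cut -d. -f4 | sort -un)
printf '%s\n' "$result"

rm -f "$file"
```

`sort -un` is used here rather than `sort -u` so that an index such as 10 would be listed after 2 instead of before it.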

edited Apr 30 '15 at 09:32

answered Apr 30 '15 at 09:17

fedorqui



`(?<=aaa\.bbb\.ccc\.)` ? – Kent Apr 30 '15 at 09:17
grep: The -P option is not supported: libpcre.so.3 is not available – MOHAMED Apr 30 '15 at 09:24

Thanks a lot. It works with sed, and I used `sort -u` instead of `sort | uniq` – MOHAMED Apr 30 '15 at 09:30
@MOHAMED that is a good addition. I just edited the answer to use `sort -u`, thanks! – fedorqui Apr 30 '15 at 09:33
Now suppose "aaa.bbb.ccc" is stored in a variable, `var="aaa.bbb.ccc"`. What should the `sed` command look like when using `var`? – MOHAMED Apr 30 '15 at 09:33
If we use `sed -nr "s/^.*$var([0-9]+).*$/\1/p"`, are the `'.'` characters in `var` treated as literal dots, or as the regexp "any character"? – MOHAMED Apr 30 '15 at 09:39
@MOHAMED why don't you do a test with some data? I did and yes, you have to escape them: `var="aaa\.bbb\.ccc\."`. – fedorqui Apr 30 '15 at 10:08
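
The escaping discussed in these comments could also be done automatically instead of by hand. The following is a minimal sketch, not the answerer's own command: the `esc` variable, the temporary file, and the extra "trap" line are assumptions added for illustration. It only escapes dots; a prefix containing other regex metacharacters would need more work.

```shell
# Sample data: two lines from the question plus a made-up "trap" line
# that an unescaped pattern would wrongly match.
file=$(mktemp)
cat > "$file" <<'EOF'
{ "any1", "aaa.bbb.ccc.1.ddd", "var1" }
{ "an47", "aaa.bbb.ccc.2.eee", "var3" }
{ "trap", "aaaXbbbXccc.9.ddd", "var9" }
EOF

var="aaa.bbb.ccc"

# Turn every "." into "\." so sed matches a literal dot.
esc=$(printf '%s\n' "$var" | sed 's/\./\\./g')

# -E selects extended regular expressions (GNU sed also accepts -r).
# Double quotes let the shell expand ${esc}; \$ keeps the end-of-line anchor.
result=$(sed -nE "s/^.*${esc}\.([0-9]+).*\$/\1/p" "$file" | sort -u)
printf '%s\n' "$result"

rm -f "$file"
```

With the unescaped `$var`, the "trap" line would also match and 9 would show up in the output.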

score 0 · Answer 2 · answered Apr 30 '15 at 09:38

0

sed -n '/.*aaa\.bbb\.ccc\.\([0-9]\{1,\}\).*/ {s//\1/;H;}
   $!d
   s/.*//;H;x
:a
   s/\(\n[^[:cntrl:]]*\)\(.*\)\1\n/\1\2\
/
   ta
   s/.\(.*\)./\1/p' YourFile

Just for fun: a single (POSIX) sed invocation, output not sorted. (GNU sed allows a one-line version.)

answered Apr 30 '15 at 09:38

NeronLeVelu


Juan Diego Godoy Robles · Answer 3 · 2015-05-04T07:13:23.127

0

An awk alternative:

$ awk -F\. '/aaa\.bbb\.ccc\.[0-9]+/{b=$(NF-1); if (!(b in a)) print b; a[b]++}' infile

Steps:

Set the field separator (FS) to a dot.
Select the lines that match the wanted pattern.
Store the index value (the next-to-last field) in the variable b.
Use an associative array a to record the indexes already printed.
If b is not yet in a, print it.
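
The steps above can also be sketched with the prefix passed in as a plain string, which avoids regex escaping altogether. This is an illustrative variant, not the answerer's code: the names `prefix`, `p`, `rest`, `idx` and `seen` are made up, and the sample file includes an invented "trap" line.

```shell
# Sample data: lines from the question plus a made-up "trap" line.
file=$(mktemp)
cat > "$file" <<'EOF'
{ "any1", "aaa.bbb.ccc.1.ddd", "var1" }
{ "an47", "aaa.bbb.ccc.2.eee", "var3" }
{ "trap", "aaaXbbbXccc.9.ddd", "var9" }
{ "any2", "aaa.bbb.ccc.1.eee", "toto" }
EOF

prefix="aaa.bbb.ccc."

result=$(awk -v p="$prefix" '
{
    pos = index($0, p)                    # literal substring search, no regex
    if (pos == 0) next                    # prefix not on this line
    rest = substr($0, pos + length(p))    # text right after the prefix
    if (match(rest, /^[0-9]+/)) {         # leading run of digits is the index
        idx = substr(rest, 1, RLENGTH)
        if (!(idx in seen)) {             # print each index only once
            print idx
            seen[idx] = 1
        }
    }
}' "$file")
printf '%s\n' "$result"

rm -f "$file"
```

Indexes are printed in order of first appearance, so pipe the output through `sort -n` if a sorted list is needed.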

edited May 04 '15 at 07:13

answered Apr 30 '15 at 09:38

Juan Diego Godoy Robles

