Count total number of pattern between two pattern (using sed if possible) in Linux

Question

I have to count all '=' characters between two patterns, i.e. '{' and '}'. Sample:

{
100="1";
101="2";
102="3";
}; 
{
104="1,2,3";
};
{
105="1,2,3";
};

Expected Output:

3
1
1

Can there be more than one group of braces per line? If so, what would the expected output be? — Benjamin W., Jan 01 '16 at 23:06

glenn jackman · Accepted Answer · 2016-01-02T13:37:49.520

2

A very cryptic perl answer:

perl -nE 's/\{(.*?)\}/ say ($1 =~ tr{=}{=}) /ge'

The tr function returns the number of characters transliterated.

With the new requirements, we can make a couple of small changes:

perl -0777 -nE 's/\{(.*?)\}/ say ($1 =~ tr{=}{=}) /ges'

-0777 reads the entire file/stream into a single string.
The s flag on s/// lets . match newlines as well, so a match can span several lines.

edited Jan 02 '16 at 13:37

answered Jan 01 '16 at 23:15

glenn jackman


Look out for `=` inside the quotes: `{105="1,2=,3";}`. – Walter A Jan 02 '16 at 00:09
That's not part of the requirements: "count all '=' between '{' and '}'" – glenn jackman Jan 02 '16 at 00:27
Thanks @glennjackman, i am not getting expected output if a new line comes between these pattern. I have updated my question . – Shashwat Shekhar Shukla Jan 02 '16 at 09:19
i add one extra pipe and it worked thanks tr -d '\n'|perl -nE 's/\{(.*?)\}/ say ($1 =~ tr{=}{=}) /ge' – Shashwat Shekhar Shukla Jan 02 '16 at 10:48

choroba · Answer 2 · 2016-01-01T23:19:09.713

Perl to the rescue:

perl -lne '$c = 0; $c += ("$1" =~ tr/=//) while /\{(.*?)\}/g; print $c' < input

-n reads the input line by line
-l adds a newline to each print
/\{(.*?)\}/g is a regular expression. The ? makes the asterisk frugal, i.e. matching the shortest possible string.
The (...) parentheses create a capture group, referred to as $1.
tr is normally used to transliterate (i.e. replace one character by another), but here it just counts the number of equal signs.
+= adds the number to $c.

repzero · Answer 3 · 2016-01-02T00:11:16.743

0

Awk is here too

grep -o '{[^}]\+}'|awk -v FS='=' '{print NF-1}'

example

echo '{100="1";101="2";102="3";}; 
{104="1,2,3";};
{105="1,2,3";};'|grep -o '{[^}]\+}'|awk -v FS='=' '{print NF-1}'

output

3
1
1

edited Jan 02 '16 at 00:11

answered Jan 01 '16 at 23:56

repzero



How about `= {100="x=x=x";} =` ? I think this is 1 pattern. – Walter A Jan 02 '16 at 00:01
thanks @repzero but it seems -v is not valid command in mac. Error: awk: invalid -v option – Shashwat Shekhar Shukla Jan 02 '16 at 09:24

score 0 · Answer 4 · answered Jan 02 '16 at 09:41

First, some test input: a line with = both outside the curly brackets and inside the quoted content, a line without brackets, a line with a single = inside brackets, and a line with only the 2 brackets.

echo '== {100="1";101="2";102="3=3=3=3";} =; 
a=b
{c=d}
{}'

Handle line without brackets (put a dummy char so you will not end up with an empty string)

sed -e 's/^[^{]*$/x/'

Handle line without equal sign (put a dummy char so you will not end up with an empty string)

sed -e 's/{[^=]*}/x/'

Remove stuff outside the brackets

sed -e 's/.*{\(.*\)}.*/\1/'

Remove stuff inside the double quotes (do not count fields there)

sed -e 's/"[^"]*"//g'
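A small sketch of what this step does to one line of input. The function name `strip_quoted` is only an illustrative wrapper around the same sed expression.

```shell
# Hypothetical wrapper: delete every double-quoted string, quotes included.
strip_quoted() {
  sed -e 's/"[^"]*"//g'
}

printf '%s\n' '100="1";101="2=2=2=2";102="3";' | strip_quoted
```

This prints `100=;101=;102=;`, so the `=` characters that sat inside quotes can no longer be counted.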

Use @repzero's method to count equal signs

awk -F "=" '{print NF-1}'
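The following sketch shows why the dummy character matters for this counting step. The wrapper name `count_fields` is hypothetical. With = as the field separator, a line holding n equal signs has n+1 fields, but an empty line has zero fields.

```shell
# Hypothetical wrapper: number of '=' on each line, computed as fields minus one.
count_fields() {
  awk -F '=' '{ print NF - 1 }'
}

printf 'a=b=c\nx\n\n' | count_fields
```

This prints 2, 0 and -1. The -1 for the empty line is the case the dummy x avoids.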

Combine stuff

echo -e '{100="1";101="2";102="3";};\na=b\n{c=d}\n{}' | 
   sed -e 's/^[^{]*$/x/' -e 's/{[^=]*}/x/' -e 's/.*{\(.*\)}.*/\1/' -e 's/"[^"]*"//g' | 
   awk -F "=" '{print NF-1}'

The ugly temp fields x and replacing {} can be solved inside awk:

echo -e '= {100="1";101="2=2=2=2";102="3";};\na=b\n{c=d}\n{}' | 
   sed -e 's/^[^{]*$//' -e 's/.*{\(.*\)}.*/\1/' -e 's/"[^"]*"//g' | 
   awk -F "=" '{if (NF>0) c=NF-1; else c=0; print c}'

or shorter

echo -e '= {100="1";101="2=2=2=2";102="3";};\na=b\n{c=d}\n{}' |
   sed -e 's/^[^{]*$//' -e 's/.*{\(.*\)}.*/\1/' -e 's/"[^"]*"//g' |
   awk -F "=" '{print (NF>0) ? NF-1 : 0; }'

ekim · Answer 5 · 2021-10-31T20:46:55.550

No harder sed than done ... in.

Restricting this answer to the environment as tagged, namely:
linux shell unix sed wc
will actually not require the use of wc (or awk, perl, or any other app.).

Though echo is used, a file source can easily exclude its use.
As for bash, it is the shell.

The actual environment used is documented at the end.

NB. Exploitation of GNU specific extensions has been used for brevity
    but appropriately annotated to make a more generic implementation.
    Also brace bracketed { text } will not include braces in the text.
    It is implicit that such braces should be present as {} pairs but
    the text src. dangling brace does not directly violate this tenet.
This is a foray into the world of `sed`'ng to gain some fluency in its use for other purposes.
The ideas expounded upon here are used to cross pollinate another SO problem solution in order
to acquire more familiarity with vetting vagaries of vernacular version variances. Consequently
this pedantic exercise hopefully helps with the pedagogy of others beyond personal edification.

To test easily, at least in the environment noted below, judiciously highlight the appropriate 
code section, carefully excluding a dangling pipe |, and then, to a CLI command line interface
drag & drop, copy & paste or use middle click to enter the code.

The other SO problem. linux - Is it possible to do simple arithmetic in sed addresses?

# _______________________________ always needed ________________________________

echo -e '\n
\n   = = = {\n              }  = = =                each = is outside the braces
\na\nb\n   {                }                       so therefore are not counted
\nc\n      { = = = = = = =  }                       while the ones here do count
           {\n100="1";\n101="2";\n102="3";\n};                  
\n         {\n104="1,2,3";\n};                                  
a\nb\nc\n  {\n105="1,2,3";\n};                                  
           {   dangling brace ignored junk =  = = \n'  |

# ____________ preparatory conditioning needed for final solutions ____________

sed                ' s/{/\n{\n/g;                               
                     s/}/\n}\n/g; '       |  # guarantee but one brace to a line

sed -n '/{/               h;                 # so sed addressing can "work" here
        /{/,/}/           H;                 # use hHold buffer for only { ... }
            /}/   { x; s/[^=]*//g; p } '  |  # then make each {} set a line of =
# ____ stop code hi-lite selection in  ^--^ here  include quote not pipe ____

# ____ outputs the following exclusive of the shell " # " comment quotes _____
   # 
   # 
   # =======
   # ===
   # =
   # =
# _________________________________________________________________________

# ____________________________ "simple" GNU solution ____________________________

sed -e '/^$/  { s//0/;b };                    # handle null data as 0 case: next!
                s/=/\n/g;                     # to easily count an = make it a nl
                s/\n$//g;                     # echo adds an extra nl - delete it
                s/.*/echo "&" | sed -n $=/;   # sed = command w/ $ counts last nl
                e '                           # who knew only GNU say you ah phoo

   # 0
   # 0
   # 7
   # 3
   # 1
   # 1
# _________________________________________________________________________

# ________________________ generic incomplete "solution" ________________________

sed -e '/^$/ { s//echo 0/;b };                # handle null data as 0 case: next!
               s/=$//g;                       # echo adds an extra nl - delete it
               s/=/\\\\n/g;                   # to easily count an = make it a nl
               s/.*/echo -e & | sed -n $=/; '                 

# _______________________________________________________________________________

The paradigm used for the algorithm is instigated by the prolegomena study below.
The idea is to isolate groups of = signs between { } braces for counting.
These are found and each group is put on a separate line with ALL other adorning characters removed.
It is noted that sed can easily "count", actually enumerate, nl or \n line ends via =.
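A minimal sketch of that counting idea, under one loud assumption: `tr` stands in here for the answer's `s/=/\n/g` substitution, purely to keep the example short and portable. The function name `count_eq_via_lines` is hypothetical.

```shell
# Hypothetical helper: turn each '=' into a line end, then let sed report
# the number of the last line with the $= address/command pair.
count_eq_via_lines() {
  tr '=' '\n' | sed -n '$='
}

printf '%s' '=======' | count_eq_via_lines
```

For the seven equal signs this prints 7. On empty input there is no last line, so nothing is printed, which is why the answer handles the empty case separately.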

The first "solution" uses these sed commands:

print
branch w/o label starts a new cycle
h/Hold for filling this sed buffer
exchange to swap the hold and pattern buffers
= to enumerate the current sed input line
substitute s/.../.../; with global flag s/.../.../g;

and most particularly the GNU specific

e, which executes the pattern space as a shell command and replaces it with that command's output (thought of here as "evaluate"; the two mnemonics are effectively synonymous)

The GNU specific execute command is avoided in the generic code. It does not print the answer but instead produces code that will print the answer. Run it to observe. To fully automate this, many mechanisms can be used, not the least of which is the sed write command to put these lines in a shell file to be executed, or even embedding the output in a bash command substitution $( ), etc.
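One way the "generate code, then run it" idea could look is sketched below. It is not the answer's script: it assumes the input is already one line of = per brace group, it emits `printf` commands rather than `echo -e`, and it pipes the generated text into `sh`. The function name `gen_count_cmds` is hypothetical.

```shell
# Hypothetical generator: for each line of '=' emit a shell command that,
# when run, prints how many '=' the line held.
gen_count_cmds() {
  # An empty line becomes "echo 0"; t then skips the remaining commands.
  # Otherwise each '=' becomes a printf argument and sed -n '$=' counts the lines.
  sed -e 's/^$/echo 0/' -e 't' \
      -e 's/=/ x/g' \
      -e 's/.*/printf "%s\\n"& | sed -n "\\$="/'
}

printf '=======\n\n=\n' | gen_count_cmds | sh
```

Run as shown, this prints 7, 0 and 1. Dropping the final `| sh` shows the generated commands themselves.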

Note also that various sed example scripts can "count" and these too can be used efficaciously.

The interested reader can entertain these other pursuits.

prolegomena: concept from counting # of lines between braces

sed -n '/{/=;/}/=;'

to

sed -n  '/}/=;/{/=;'              | 
sed -n  'h;n;G;s/\n/ - /;
         2s/^/ Between sets of {} \n the nl # count is\n     /;
         2!s/^/     /;
         p'

testing "done in":

linuxuser@ubuntu:~$  lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 18.04.2 LTS
Release:        18.04
Codename:       bionic

linuxuser@ubuntu:~$  sed --version   ----->  sed (GNU sed) 4.4

score -2 · Answer 6 · answered Oct 29 '21 at 21:34


And for giggles an awk-only alternative:

echo '{
100="1";
101="2";
102="3";
}; 
{
104="1,2,3";
};
{
105="1,2,3";
};' | awk 'BEGIN{RS="\n};";FS="\n"}{c=gsub(/=/,""); if(NF>2){print c}}'
3
1
1

answered Oct 29 '21 at 21:34

tink

