
I have to search a file of 300 000 lines of code with grep for several constructions.

First Question

I need to find codes like the first one below. I am looking for an alternating + - construction; all other characters are treated as delimiters, and +-+ or -+- is also correct. The search starts after the ], as in the examples below.

++[>++>+++>+<<<-]>++++++++.---.+.>.<------.+.>.>. ∈ γ, (correct, it is alternating)

++[>++>+++>+<<<-]>+++++.>++++++.>++.++++.-----.>. not ∈ γ (incorrect: +* is followed by +*)

Second Question

I need to find codes like the first one below. I am looking for an odd number of occurrences of - between a consecutive pair of < and >. An empty <> counts as an even number, so it is incorrect.

++[>++>+++>+<<<-]>+.>++++++++.<-.----.+++++++.>>. ∈ δ (correct, odd times - between <> )

++[>++>+++>+<<<-]>+++.>++++++.<<-.-.>>--.<---.>>. not ∈ δ (incorrect, even times - between <> )

Note: only grep is allowed. We may not use a text editor, which is what I tried first.

Cyrus
fangio

2 Answers


I got this:

sed 's/.*]//' file | tr -d "><" | tr -s "+-" | tr -d "." | egrep "\+\+|\-\-"

That does this:

  1. ignore everything up to and including the closing square bracket (the greedy .*] strips through the last ] on the line)

  2. delete all > and < since your description accords them no significance

  3. squeeze all + and - to single occurrence

  4. delete all dots

  5. look for either ++ or -- in what's left; a hit means the line is not alternating
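
As a rough check, the steps above can be run on the two sample lines from the question. This is only a sketch: `non_alternating` is a hypothetical wrapper name, it reads stdin instead of `file`, and it uses `grep -E` in place of `egrep`.

```shell
# Hypothetical helper: the same pipeline as above, reading from stdin.
non_alternating() {
  sed 's/.*]//' |        # 1. drop everything through the last ]
    tr -d '><' |         # 2. delete > and <
    tr -s '+-' |         # 3. squeeze runs of + and runs of -
    tr -d '.' |          # 4. delete the dots
    grep -E '\+\+|--'    # 5. report what still has ++ or --
}

printf '%s\n' \
  '++[>++>+++>+<<<-]>++++++++.---.+.>.<------.+.>.>.' \
  '++[>++>+++>+<<<-]>+++++.>++++++.>++.++++.-----.>.' |
  non_alternating
# prints: ++++-
```

Only the second line produces output, and what is printed is the squeezed remainder rather than the original line, so this flags the non-alternating codes instead of listing the alternating ones.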

Mark Setchell
  • That is indeed correct, but I may only use egrep. I had the same solution, but it was rejected because sed and tr are used besides egrep. – fangio Jan 03 '15 at 17:31

For both regexes, use grep with the Perl option (-P) if available; the (?: ) and (?! ) constructs are not part of egrep's extended syntax.
You can probably remove the \r\n from the classes if you expect a single line.

Question 1:

 #  \][^-+\]\r\n]*(?:[-]+[^-+\]\r\n]*)?[+]+[^-+\]\r\n]*[-]+(?:[^-+\]\r\n]*[+]+[^-+\]\r\n]*[-]+)*(?:[^-+\]\r\n]*[+]+)?[^-+\]\r\n]*$

 \]                                 # ]

 [^-+\]\r\n]*                       # Not - + ] or newline

 (?: [-]+  [^-+\]\r\n]* )?          # Optional - .

 [+]+ [^-+\]\r\n]* [-]+             # Required + . -

 (?:
      [^-+\]\r\n]* 
      [+]+ [^-+\]\r\n]* [-]+        # Repeated . + . -
 )*

 (?: [^-+\]\r\n]* [+]+ )?           # Optional trailing . + (a - must follow every + inside the loop, so + . + cannot slip through)

 [^-+\]\r\n]*                       # Not - + ] or newline
 $
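
Since the comment above says only egrep is accepted, the alternation check can also be sketched in plain extended syntax, with no (?: ) groups. This assumes that a delimiter is any character other than ], + and -, that at least one + or - run must follow the last ], and that a + run directly followed by a - run (no delimiter in between) still counts as alternating. The names `alt` and `alternating` are made up for the example.

```shell
# Delimiter class used throughout: [^]+-] = anything except ], + and -.
# Shape: ]  delims  (+ run, then (- run, + run)*, optional - run
#                   |- run, then (+ run, - run)*, optional + run)  delims  end
alt='[]][^]+-]*(\++([^]+-]*-+[^]+-]*\++)*([^]+-]*-+)?|-+([^]+-]*\++[^]+-]*-+)*([^]+-]*\++)?)[^]+-]*$'

# Hypothetical wrapper: print the lines whose tail after the last ] alternates.
alternating() { grep -E "$alt"; }

printf '%s\n' \
  '++[>++>+++>+<<<-]>++++++++.---.+.>.<------.+.>.>.' \
  '++[>++>+++>+<<<-]>+++++.>++++++.>++.++++.-----.>.' |
  alternating
# prints only the first line
```

Because the delimiter class excludes ], the leading []] can only match the last ] on the line, which is what anchors the search to the part after the loop.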

Question 2:

 # ^(?![^\r\n]*<(?:[^-<>\r\n]*[-][^-<>\r\n]*[-])*[^-<>\r\n]*>)[^\r\n]*<[^-<>\r\n]*[-](?:[^-<>\r\n]*[-][^-<>\r\n]*[-])*[^-<>\r\n]*>

 ^ 
 (?!               # Not an even sequence
      [^\r\n]* 
      <      
      (?:
           [^-<>\r\n]* 
           [-] 
           [^-<>\r\n]* 
           [-] 
      )*
      [^-<>\r\n]* 
      >
 )

 [^\r\n]*     

 <                 # First odd sequence
 [^-<>\r\n]* 
 [-] 
 (?:
      [^-<>\r\n]* 
      [-] 
      [^-<>\r\n]* 
      [-] 
 )*
 [^-<>\r\n]* 
 >
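
If -P is not available, the negative lookahead can be approximated with two grep -E calls: first drop every line that has an even (or empty) <...> pair, then keep the lines that still have an odd one. This is a sketch under the assumption that chaining two greps still counts as "only grep"; `even`, `odd` and `odd_only` are names invented for the example.

```shell
# A <...> pair with no < or > inside and an even number of - (zero included).
even='<([^-<>]*-[^-<>]*-)*[^-<>]*>'
# A <...> pair with no < or > inside and an odd number of -.
odd='<[^-<>]*-([^-<>]*-[^-<>]*-)*[^-<>]*>'

# Hypothetical wrapper: keep lines with no even pair and at least one odd pair.
odd_only() { grep -Ev "$even" | grep -E "$odd"; }

printf '%s\n' \
  '++[>++>+++>+<<<-]>+.>++++++++.<-.----.+++++++.>>.' \
  '++[>++>+++>+<<<-]>+++.>++++++.<<-.-.>>--.<---.>>.' |
  odd_only
# prints only the first line
```

Like the Perl version, this looks at the whole line, so the `<-]>` pair that spans the closing bracket is counted as well.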