Deleting multiple lines from recursive pcregrep search

Question

In the header of every file in my project I have a block of comments that were generated by VCC with revision history for the file. We have moved away from VCC and no longer want to have the revision history in the file since it is obsolete.

I currently have a search pcregrep search that returns the exact results I'm looking for:

pcregrep -rM '(\$Rev)(?s)(.+?(?=\*\*))' *

I have tried piping the results into an xargs sed along with many other attempts to remove all the returned lines from the files but I get various errors including "File name too long"

I want to delete the entire block

Show sample input and your desired output for that sample input. — Cyrus, May 19 '15 at 19:24
I don't know sed/awk/grep etc.. But why don't you just use a simple prog via a macro lang or C# that you can just plop down a regex find/replace? This `I have a block of comments` says it all. That's the foundation. You have to remove the entire block, not just lines containing the content of interest. If you do that, you end up deleting the trailing `*/` part, leaving it naked forcing all the following code to be commented until the next `*/` it finds. — , May 19 '15 at 23:22

score 0 · Answer 1 · 2015-05-20T00:46:19.993

Since you are talking about C++ files, you can't just find comments,
you have to parse comments because literal strings could contain comment
delimiters.

This has been done before, no use re-inventing the wheel.
A simple grep is not going to cut it. You need a simple macro or C# console app
that has better capabilities.

If you want to go this route, below is a regex for you.
Each match will either match group 1 (comments block) or group 2 (non-comments).

You need to either rewrite a new string, via appending the results of each match.
Or, use a callback function to do the replacement.

Each time it matches group 2 just append that (or return it if callback) unchanged.

When it matches group 1, you have to run another regular expression on the
contents to see if the comment block contains the Revision information.
If it does contain it, don't append (or return "" if callback) its contents.
If it doesn't, just append it unchanged.

So, its a 2 step process.

Pseudo-code:

// here read in the source sting from file.
string strSrc = ReadFile( name );
string strNew = "";
Matcher CmtMatch, RevMatch;

while ( GloballyFind( CommentRegex, strSrc, CmtMatch ) )  
{
   if ( CmtMatch.matched(1) )
   {
       string strComment = Match.value(1);
       if ( FindFirst( RevisionRegex, strComment, RevMatch ) )
           continue;
        else
            strNew += strComment;
    }
    else
       strNew += Match.value(2);
 }
 // here write out the new string.

The same could be done via a ReplaceAll() using a callback function, if
using a Macro language. The logic goes in the callback.

Its not as hard as it looks, but if you want to do it right I'd do it this way.
And then, hey, you got a nifty utility to be used again.

Here is the regex expanded=, formatted and compressed.
(constructed with RegexFormat 6 (Unicode))

 # raw:  ((?:(?:^\h*)?(?:/\*[^*]*\*+(?:[^/*][^*]*\*+)*/(?:\h*\n(?=\h*(?:\n|/\*|//)))?|//(?:[^\\]|\\\n?)*?(?:\n(?=\h*(?:\n|/\*|//))|(?=\n))))+)|("(?:\\[\S\s]|[^"\\])*"|'(?:\\[\S\s]|[^'\\])*'|[\S\s][^/"'\\\s]*)
 # delimited:  /((?:(?:^\h*)?(?:\/\*[^*]*\*+(?:[^\/*][^*]*\*+)*\/(?:\h*\n(?=\h*(?:\n|\/\*|\/\/)))?|\/\/(?:[^\\]|\\\n?)*?(?:\n(?=\h*(?:\n|\/\*|\/\/))|(?=\n))))+)|("(?:\\[\S\s]|[^"\\])*"|'(?:\\[\S\s]|[^'\\])*'|[\S\s][^\/"'\\\s]*)/     
 # Dbl-quoted: "((?:(?:^\\h*)?(?:/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/(?:\\h*\\n(?=\\h*(?:\\n|/\\*|//)))?|//(?:[^\\\\]|\\\\\\n?)*?(?:\\n(?=\\h*(?:\\n|/\\*|//))|(?=\\n))))+)|(\"(?:\\\\[\\S\\s]|[^\"\\\\])*\"|'(?:\\\\[\\S\\s]|[^'\\\\])*'|[\\S\\s][^/\"'\\\\\\s]*)"     
 # Sing-quoted: '((?:(?:^\h*)?(?:/\*[^*]*\*+(?:[^/*][^*]*\*+)*/(?:\h*\n(?=\h*(?:\n|/\*|//)))?|//(?:[^\\\]|\\\\\n?)*?(?:\n(?=\h*(?:\n|/\*|//))|(?=\n))))+)|("(?:\\\[\S\s]|[^"\\\])*"|\'(?:\\\[\S\s]|[^\'\\\])*\'|[\S\s][^/"\'\\\\\s]*)'


    (                                # (1 start), Comments 
         (?:
              (?: ^ \h* )?                     # <- To preserve formatting
              (?:
                   /\*                              # Start /* .. */ comment
                   [^*]* \*+
                   (?: [^/*] [^*]* \*+ )*
                   /                                # End /* .. */ comment
                   (?:                              # <- To preserve formatting 
                        \h* \n                                      
                        (?=
                             \h*                  
                             (?: \n | /\* | // )
                        )
                   )?
                |  
                   //                               # Start // comment
                   (?: [^\\] | \\ \n? )*?           # Possible line-continuation
                   (?:                              # End // comment
                        \n                               
                        (?=                              # <- To preserve formatting
                             \h*                          
                             (?: \n | /\* | // )
                        )
                     |  (?= \n )
                   )
              )
         )+                               # Grab multiple comment blocks if need be
    )                                # (1 end)

 |                                 ## OR

    (                                # (2 start), Non - comments 
         "
         (?: \\ [\S\s] | [^"\\] )*        # Double quoted text
         "
      |  '
         (?: \\ [\S\s] | [^'\\] )*        # Single quoted text
         ' 
      |  [\S\s]                           # Any other char
         [^/"'\\\s]*                      # Chars which doesn't start a comment, string, escape,
                                          # or line continuation (escape + newline)
    )                                # (2 end)

Incase you want something simpler -
This is the same regex without multiple comment block capture or format preserving. Same grouping and replacement principle applies.

 # Raw:  (/\*[^*]*\*+(?:[^/*][^*]*\*+)*/|//(?:[^\\]|\\\n?)*?\n)|("(?:\\[\S\s]|[^"\\])*"|'(?:\\[\S\s]|[^'\\])*'|[\S\s][^/"'\\]*)


     (                                # (1 start), Comments 
          /\*                              # Start /* .. */ comment
          [^*]* \*+
          (?: [^/*] [^*]* \*+ )*
          /                                # End /* .. */ comment
       |  
          //                               # Start // comment
          (?: [^\\] | \\ \n? )*?           # Possible line-continuation
          \n                               # End // comment
     )                                # (1 end)
  |  
     (                                # (2 start), Non - comments 
          "
          (?: \\ [\S\s] | [^"\\] )*        # Double quoted text
          "
       |  '
          (?: \\ [\S\s] | [^'\\] )*        # Single quoted text
          ' 
       |  [\S\s]                           # Any other char
          [^/"'\\]*                        # Chars which doesn't start a comment, string, escape,
                                           # or line continuation (escape + newline)
     )                                # (2 end)

Deleting multiple lines from recursive pcregrep search

1 Answers1