
I'm trying to strip C comments out of our patch files and have looked at numerous regexes, but removing lines from the patches would break them.

How would you write a regex or sed command that searches diff patch files for comments and replaces the comment text with blank space, so that the line count stays intact?

This sed regex works for C files, but for patches I need something different:

sed '/^\/\*/,/\*\//d'

An example patch excerpt would be:

@@ -382,7 +391,109 @@
        return len;
 }

+/**********************************************************************************
+ * Some patch
+ * Author: Mcdoomington
+ * Do somethimg
+ * 
+ * fix me
+ **********************************************************************************/

Anyone have ideas?

Edit:

Using this filter:

sed -e 's,^+ \*.*,+ \/\/Comment removed,' mypatch.patch > output

I get:

+/**********************************************************************************
+ //Comment removed
+ //Comment removed
+ //Comment removed

How do I add a condition so that a line is skipped if it ends with \?

Edit: Solution

While it is not the cleanest way, I used sed with a jury-rigged regex.

sed -e '/[^\*\/]$/{N;s,^+ \* .*,+ \* Comment removed,;}' patch > output
sed -e '/[^\*\/]$/{N;s,^+\\\* .*,+ \/\* Comment removed,;}' patch > output

Note that the second command can be a bit too greedy, but for the purposes of sanitizing comments, this works!

How it works:

1.) First command: the address /[^\*\/]$/ selects lines that do not end in \, * or /, i.e. lines that do not close a comment. For those lines, {N;s,^+ \* .*,+ \* Comment removed,;} appends the next line with N, then finds "+ * (whatever)" and replaces it with "+ * Comment removed".

2.) Second command: the same address selects lines that do not close a comment. For those lines, {N;s,^+\\\* .*,+ \/\* Comment removed,;} appends the next line with N, then finds "+\* (whatever)" and replaces it with "+ /* Comment removed".
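
For comparison, the same per-line idea can be sketched in Python. This is only an illustration, not the commands above translated one to one: the function name blank_comment_bodies and the replacement text are assumptions, and it only recognises added lines of the form "+ * text". It never joins lines, so every input line yields exactly one output line, and it leaves a line that ends in */ alone.

```python
import re

# An added patch line that is the body of a block comment: "+ * text".
BODY = re.compile(r"^\+ \* .*$")


def blank_comment_bodies(patch_text: str) -> str:
    """Replace the text of added comment-body lines, one line in, one line out."""
    out = []
    for line in patch_text.split("\n"):
        # Skip the closing line of a comment (anything ending in "*/").
        if BODY.match(line) and not line.rstrip().endswith("*/"):
            line = "+ * Comment removed"
        out.append(line)
    return "\n".join(out)
```

Because the hunk header counts lines, keeping the number of lines unchanged is what lets the patch still apply afterwards.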

mcdoomington
  • Are the patch files incoming, or are you generating them? Is it all comments that you want to move, or just block comments between functions? – Jonathan Leffler May 11 '12 at 17:15
  • The patch files have been already created and I was looking for an easy way to remove the comments or at least blank them out. Creating new patches is a huge task due to the staging environment and they number about 30+. – mcdoomington May 11 '12 at 17:27
  • There's a point where regexes simply aren't smart or flexible enough to do what you want; this is one of those cases. You need to be able to recognize `/*`, `*/`, and `//` tokens and parse the file accordingly. Personally, I'd just hand-hack my own filter for a job like this; shouldn't take more than a couple of hours. – John Bode May 11 '12 at 17:31

3 Answers


Regular expressions are wonderful, but not that wonderful.

I would remove the comments before creating the patch.

If you can't do this, I would apply the patch, remove the comments from both the patched and unpatched files, and then re-create the patch.

So starting with x.h we edit it to x1.h and create a patch:

diff -u x.h x1.h > patch

Then we publish the patch to someone who has x.h.

cp x.h xnc.h
sed -e '/^\/\*/,/\*\//d' -i xnc.h
patch x.h patch
cp x.h xnc2.h
sed -e '/^\/\*/,/\*\//d' -i xnc2.h
diff -u xnc.h xnc2.h > patchnc

should create the comment-free patch.

But if I have patched and unpatched source trees, then

find unpatched -type f -exec sed -e '\:^/\*:,\:\*/:d' -i "{}" \;
find patched -type f -exec sed -e '\:^/\*:,\:\*/:d' -i "{}" \;
diff -urN unpatched patched > patch
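
If scripting this with sed and diff is awkward, the strip-then-diff step can also be sketched in Python with the standard difflib module. This is a rough sketch under assumptions: strip_block_comments and comment_free_patch are hypothetical names, and the stripping only roughly mirrors the sed range above (it drops every line from one that starts with /* through the next line that contains */, including a comment that opens and closes on the same line).

```python
import difflib


def strip_block_comments(source: str) -> str:
    """Drop each line from one starting with '/*' through the next containing '*/'."""
    kept, in_comment = [], False
    for line in source.splitlines(keepends=True):
        if not in_comment and line.startswith("/*"):
            in_comment = True
        if in_comment:
            if "*/" in line:
                in_comment = False
            continue  # the line belongs to a comment, so it is dropped
        kept.append(line)
    return "".join(kept)


def comment_free_patch(old: str, new: str, name: str = "x.h") -> str:
    """Unified diff between the comment-stripped versions of old and new."""
    a = strip_block_comments(old).splitlines(keepends=True)
    b = strip_block_comments(new).splitlines(keepends=True)
    return "".join(difflib.unified_diff(a, b, fromfile=name, tofile=name))
```

Note that the resulting patch applies to the comment-stripped original, just like patchnc above.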
Julian
  • This is a bit of a time-consuming process, unfortunately. Do you know of a way that could at least automate a good portion of this? – mcdoomington May 11 '12 at 17:28
  • You should be able to script it. Why not do it before making the patch? – Julian May 11 '12 at 17:32
  • @mcdoomington: Do you want all comments removed, or just the ones that appear in the patch? What do you want to do about a patch that only affects part of a block comment? There are C comment stripper programs (I have a couple that I wrote; I won't be the only person with such), and I'd be inclined to revise the patch generation process so that the old and new source don't have the unwanted comments before the patches are created. – Jonathan Leffler May 11 '12 at 17:35
  • I agree it is scriptable, and you can use numerous available tools to do so. The problem lies with the source files, which are in a kernel src tarball, so it's a matter of working my way through them. I need something quick and dirty ;) Edit: The reason is that this is for a GPL release: someone has requested material and it has to be cleaned. – mcdoomington May 11 '12 at 17:37
  • So you just have to walk the two trees with your sed and then create the patch – Julian May 11 '12 at 17:47

I just used a quick and dirty hackjob that canned most of the comments using

sed -e '/[^\*\/]$/{N;s,^+ \* .*,+ \* Comment removed,;}' patch > output
sed -e '/[^\*\/]$/{N;s,^+\\\* .*,+ \/\* Comment removed,;}' patch > output
mcdoomington

I would not use regular expressions. In general they work within a single line, and your file will contain comments that run over multiple lines.

I would write a simple parser in C/C++ or Java.

Start with state 0.

In state 0 just read character by character (and output it) until you find a sequence of /*

Then switch to state 1.

In state 1 just read character by character (and DO NOT output it) until you find a sequence of */

Then switch back to state 0.
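
The two states described above could look roughly like this. It is a minimal sketch in Python rather than C/C++ or Java, the function name strip_comments is an assumption, and it deliberately ignores // comments and comment markers inside string literals. One deviation is also an assumption: newlines inside a comment are still output, so the line count does not change.

```python
def strip_comments(text: str) -> str:
    """Remove /* ... */ comments with a two-state scanner."""
    out = []
    state = 0  # 0 = ordinary code, 1 = inside a /* ... */ comment
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if state == 0:
            if pair == "/*":
                state = 1  # comment opens: stop copying
                i += 2
            else:
                out.append(text[i])
                i += 1
        else:
            if pair == "*/":
                state = 0  # comment closes: resume copying
                i += 2
            else:
                # Assumption: keep newlines so line numbers stay the same.
                if text[i] == "\n":
                    out.append("\n")
                i += 1
    return "".join(out)
```

This works on plain C source. Run directly on a patch it would also swallow the leading "+" of lines inside a comment, so a patch-aware version would need to keep each line's first character.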

the.real.gruycho
stefan bachert