groovy: How to remove lines from a file that begin with a non unique string

Question

*** Edited example to show order is not a factor

I have a file with the content:

ABC-123 BLA bla
ABC-123 lala lala
ABC-234 AAA
ABC-123 CCC
ABC-567 ddd
ABC-234 BBB

I would like to remove the lines that have a preceding line that begins with the same string and end up with a file containing (or a String containing):

ABC-123 BLA bla 
ABC-234 AAA
ABC-567 ddd

Currently my code just saves the contents of the file into a string:

if (new File('description.txt').length() > 0 ) {
    description = new File('description.txt').text
}

I'd like to either update the file or save the first 'non unique' lines plus unique lines in the description string.

score 0 · Answer 1 · answered Mar 17 '18 at 04:08

Groovy uses the Java regex engine, so my example on regex101 is almost identical to what you'll need; at most you'd have to double the backslashes if you write the pattern in a quoted string rather than a slashy string. You'll also need the multi-line modifier (?m) at the beginning so that ^ matches at the start of each line.

/(^\S+ )([^\n]*\n)\1([^\n]*\n)/

Explanation:

(^\S+ ) matches one or more non-whitespace characters at the beginning of a line, followed by a space. It's captured as group #1

([^\n]*\n) matches the rest of that line, up to and including the newline. It's captured as group #2

\1 is a backreference that matches whatever was captured in group #1

([^\n]*\n) matches the rest of the second line, again up to and including the newline

Then you replace what was matched with $1$2, the first and second capture groups.
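
As a rough sketch of how that replacement might look in Groovy (the variable names `text`, `pattern` and `result` are only for illustration, and the input is inlined rather than read from description.txt):

```groovy
// Sample input from the question, inlined so the snippet runs on its own.
def text = '''ABC-123 BLA bla
ABC-123 lala lala
ABC-234 AAA
ABC-123 CCC
ABC-567 ddd
ABC-234 BBB
'''

// (?m) lets ^ match at the start of every line.
// A slashy string keeps the backslashes as-is, so no doubling is needed.
def pattern = /(?m)(^\S+ )([^\n]*\n)\1([^\n]*\n)/

// Single quotes stop Groovy from treating $1 and $2 as GString interpolation;
// they are passed through to the regex engine as group references.
def result = text.replaceAll(pattern, '$1$2')

print result
```

On this input the pattern only collapses the two adjacent ABC-123 lines; "ABC-123 CCC" and "ABC-234 BBB" stay, because the line sharing their prefix is not directly above them.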

answered Mar 17 '18 at 04:08

Brian Stephens

Although it works for the provided example, it wouldn't work for cases like [THIS](https://regex101.com/r/SGTgBW/3) and [THIS](https://regex101.com/r/SGTgBW/4) – Gurmanjot Singh Mar 17 '18 at 07:07
@Gurman: you're right. I was interpreting "preceding line" as "immediately preceding line". Depending on the desired outcome, you could always sort the lines first, then run through multiple iterations of the regex until they're unique. But then there's not much point in using regex in the first place. And if the original order matters, I don't think this can be done with regex. – Brian Stephens Mar 17 '18 at 14:23
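
For the case discussed in these comments, where the matching lines are not adjacent and the original order should be kept, a non-regex pass may be simpler. The following is a minimal sketch rather than a tested solution for every input: it assumes the "beginning string" is everything before the first space, and `firstToken`, `seen` and `kept` are hypothetical names chosen for illustration.

```groovy
// Sample input from the question, inlined so the snippet runs on its own.
// With a real file, new File('description.txt').readLines() could be used instead.
def text = '''ABC-123 BLA bla
ABC-123 lala lala
ABC-234 AAA
ABC-123 CCC
ABC-567 ddd
ABC-234 BBB
'''

// Assumption: the key is the part of the line before the first space.
def firstToken = { String line -> line.split(' ', 2)[0] }

// Set.add returns true only the first time a key is added,
// so findAll keeps the first line for each key, in original order.
def seen = new HashSet<String>()
def kept = text.readLines().findAll { line -> seen.add(firstToken(line)) }

def description = kept.join('\n')
println description
```

This leaves the three lines the question asks for in `description`. To update the file as well, the same string could be assigned to `new File('description.txt').text`, which overwrites the file's contents.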
