
I am trying to migrate data that consists of many separate text files. One step is to delete all lines in the text files that are no longer used. The lines are key-value pairs. I want to delete everything in a file except the lines with certain keys. I do not know the order of the keys inside the file.

The keys I want to keep are e.g. version, date and number.

I found the question "Remove all lines except matching pattern line best practice (sed)" and tried its accepted answer. My sed command is

sed '/^(version=.*$)|(date=.*$)|(number=.*$)/!d' file.txt

with a !d after the address to delete all lines NOT matching the pattern.

Example of the regex: https://regex101.com/r/LKfxpP/2

However, the command deletes every line in my file. Where is my mistake? I assume my regex is wrong, but what is the error?
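A minimal reproduction of the problem, assuming a hypothetical sample input (the real files are not shown). In sed's default basic regular expression (BRE) syntax, unescaped ( ) and | are ordinary characters, so the address can only match a line that begins with a literal "(version=". No key=value line does, so !d deletes every line. Regex flavors such as PCRE, which regex101 offers, treat ( ) and | as metacharacters, which is likely why the pattern looked correct there.

```shell
#!/bin/sh
# Hypothetical sample input (the real files are not shown).
input='version=1.2
name=foo
date=2018-11-09
number=42'

# The question's command in sed's default BRE mode: ( ) and | are
# literal characters here, so no key=value line matches the address
# and !d deletes every line.
result=$(printf '%s\n' "$input" | sed '/^(version=.*$)|(date=.*$)|(number=.*$)/!d')
printf 'kept lines: [%s]\n' "$result"
```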

htz

2 Answers


You may use

sed '/^\(version\|date\|number\)=/!d' file.txt > newfile.txt

The pattern here uses BRE syntax (the \| alternation is a GNU sed extension, not part of POSIX BRE) and matches:

  • ^ - start of a line
  • \(version\|date\|number\) - a group matching
    • version - a version string
    • \| - or
    • date - a date string
    • \| - or
    • number - a number string
  • = - a literal = character.
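A quick check of the BRE command on hypothetical sample input (assuming GNU sed, since \| is a GNU extension). The trailing = in the pattern means a key such as numbers is not kept just because it starts with number.

```shell
#!/bin/sh
# Hypothetical sample input, not taken from the question.
input='version=1.2
name=foo
date=2018-11-09
numbers=7
number=42'

# \( \) groups and \| alternates (GNU BRE); ^ and = anchor the whole key,
# so "numbers=7" is deleted while "number=42" is kept.
result=$(printf '%s\n' "$input" | sed '/^\(version\|date\|number\)=/!d')
printf '%s\n' "$result"
```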

Or, use POSIX ERE syntax, enabled with the -E option:

sed -E '/^(version|date|number)=/!d' file.txt > newfile.txt

Here, the alternation operator | and capturing parentheses do not need escaping.
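A sketch comparing the ERE command with the question's original pattern run under -E, on hypothetical sample input. With -E the original pattern no longer deletes everything, but ^ binds only to the first alternative, so (date=.*$) also matches a key such as olddate. Grouping the alternatives inside ^( ... ) anchors all of them.

```shell
#!/bin/sh
# Hypothetical sample input, not taken from the question.
input='version=1.2
olddate=2001-01-01
date=2018-11-09
number=42'

# The answer's ERE command: ^ applies to the whole group.
fixed=$(printf '%s\n' "$input" | sed -E '/^(version|date|number)=/!d')

# The question's pattern with -E: ^ binds only to the first alternative,
# so "olddate=..." is kept because it contains "date=".
original=$(printf '%s\n' "$input" | sed -E '/^(version=.*$)|(date=.*$)|(number=.*$)/!d')

printf 'fixed:\n%s\n\noriginal with -E:\n%s\n' "$fixed" "$original"
```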

See an online demo.

Wiktor Stribiżew
    Thank you for your answer. The different kinds of pattern are the part I was missing. The POSIX ERE syntax is exactly what I can use in my case. – htz Nov 09 '18 at 13:54

Using awk:

awk -F= '$1 ~ /^(version|date|number)$/' file.txt

The field separator is set to =, so the first field is the key. Only lines whose key exactly matches one of the given names are printed.
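A minimal sketch of the awk key-matching approach on hypothetical sample input. It prints the lines whose key matches, with the regex anchored by ^ and $ so that a key such as subversion is not kept; using !~ instead selects the complementary set, that is, the lines that should be removed.

```shell
#!/bin/sh
# Hypothetical sample input, not taken from the question.
input='version=1.2
subversion=9
date=2018-11-09
name=foo
number=42'

# -F= splits each line at "=", so $1 is the key.
# ~ keeps lines whose key matches; ^...$ requires the whole key to match.
kept=$(printf '%s\n' "$input" | awk -F= '$1 ~ /^(version|date|number)$/')

# !~ selects the opposite set: the lines that would be removed.
dropped=$(printf '%s\n' "$input" | awk -F= '$1 !~ /^(version|date|number)$/')

printf 'kept:\n%s\n\ndropped:\n%s\n' "$kept" "$dropped"
```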

oliv