How to remove multi-line blocks of text of varying sizes from a file given the first and last lines and a substring?

Question

I have an xml file listing several games and their metadata, like so:

<?xml version="1.0"?>
<gameList>
    <game>
        <path>./Besiege.desktop</path>
        <name>Besiege</name>
        <desc>Long description of game</desc>
        <releasedate>20150128T000000</releasedate>
        <developer>Spiderling Studios</developer>
        <publisher>Spiderling Studios</publisher>
        <genre>Strategy</genre>
        <players>1</players>
    </game>
<A bunch of other entries>
    <game>
        <path>./67000.The Polynomial.txt</path>
        <name>The Polynomial - Space of the music</name>
        <desc>Long description of game</desc>
        <releasedate>20101015T000000</releasedate>
        <developer>Dmytry Lavrov</developer>
        <publisher>Dmitriy Uvarov</publisher>
        <genre>Shooter, Music</genre>
        <players>1</players>
        <favorite>true</favorite>
    </game>
<Another bunch of entries>
</gameList>

I want to remove every entry whose path contains the substring ".desktop" and leave all the rest. Removing only the line that contains this string isn't enough; I want to remove the whole block from <game> to </game>.

I know that in Linux, with bash, there are several ways to remove a fixed number of lines before or after a given string. But by comparing the two entries above, you can see that they don't always have the same number of fields. The descriptions inside the "<desc>" tags also vary from one to four paragraphs separated by empty lines. I have not found any solutions that deal with a variable number of lines around a target substring.

I thought there would be an easy way to split the text into blocks from the opening <game> tag to the closing </game> tag so that I could operate on them the way one normally operates on lines. In that case a simple while loop that tested each block for the substring and deleted the block on a match, or something similar, would solve my problem. I've been banging my head against grep, sed and awk, and I've tried setting IFS so that lines would only end at "</game>". I am growing increasingly frustrated, because I'm almost at the point where it would have been faster to do this manually. But then I'd remain ignorant.

I'm only just beginning to learn Bash, so there is a lot I don't know. I feel like this is the sort of thing that someone more knowledgeable could do with a one-liner, but I'm completely stumped. So thank you for your time, and please point me in the right direction.

please update the question to show your (`sed`, `grep`, `awk`) coding attempts and the (wrong) output generated by your code; also update the question to show the (correct) expected result — markp-fuso, Nov 19 '22 at 20:00
Try something like xmlstarlet, it's a command line XML/XSLT toolkit — micke, Nov 19 '22 at 20:07
@markp-fuso I was overwriting the same script file with new attempts so I don't have an actual log of what was done. I could try to update the post with some examples from memory, but I don't know how useful that would be; I only got as far as trying to divide the output of reading the file into blocks in different ways. If you think I should edit the post anyway I will, but I don't really see it being a benefit to others with the same problem. — Calibre, Nov 20 '22 at 15:06
@micke xmlstarlet worked, as per your answer and [KamilCuk's](https://stackoverflow.com/questions/74503169/how-to-remove-multi-line-blocks-of-text-of-varying-sizes-from-a-file-given-the-f/74504024#74504024). Thank you very much! — Calibre, Nov 20 '22 at 15:08

KamilCuk · Accepted Answer · 2022-11-20T09:50:04.533

3

Do not use line-oriented tools to edit XML files. Do not use Bash to edit XML files. Use XML tools to edit XML files. Write a program in Python, Perl, or another capable programming language with an XML library to edit XML.

With xmlstarlet, the following is quite simple:

$ xmlstarlet ed -d '/gameList/game[ contains(path, ".desktop") ]' input.xml
<?xml version="1.0"?>
<gameList>
  <game>
    <path>./67000.The Polynomial.txt</path>
    <name>The Polynomial - Space of the music</name>
    <desc>Long description of game</desc>
    <releasedate>20101015T000000</releasedate>
    <developer>Dmytry Lavrov</developer>
    <publisher>Dmitriy Uvarov</publisher>
    <genre>Shooter, Music</genre>
    <players>1</players>
    <favorite>true</favorite>
  </game>
</gameList>
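
For the "write a program with an XML library" route mentioned above, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. It is only an illustration, not part of the original answer: the function name `remove_games`, the `needle` parameter, and the `SAMPLE` document are made up for this example, and it assumes that every `<game>` is a direct child of `<gameList>`, as in the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample input, shortened from the question's file.
SAMPLE = """<?xml version="1.0"?>
<gameList>
    <game>
        <path>./Besiege.desktop</path>
        <name>Besiege</name>
    </game>
    <game>
        <path>./67000.The Polynomial.txt</path>
        <name>The Polynomial - Space of the music</name>
        <favorite>true</favorite>
    </game>
</gameList>
"""


def remove_games(xml_text, needle=".desktop"):
    """Return the XML with every <game> whose <path> contains `needle` removed."""
    root = ET.fromstring(xml_text)  # root is the <gameList> element
    # Loop over a copy of the list, because children are removed while looping.
    for game in list(root.findall("game")):
        path = game.findtext("path", default="")
        if needle in path:
            root.remove(game)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(remove_games(SAMPLE))
```

Like the xmlstarlet command, this only prints the filtered document. To process a real file you would read it with `ET.parse("input.xml")` and save it with the tree's `write()` method. Note that ElementTree does not emit the `<?xml ...?>` declaration unless asked to, so the output may differ cosmetically from the xmlstarlet result.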

edited Nov 20 '22 at 09:50

answered Nov 19 '22 at 21:48

KamilCuk


Thank you! xmlstarlet worked perfectly. But could I ask you why there isn't a way to do this in bash (other than calling xmlstarlet or similar in a script)? Is it intentional by the shell's creators, like it simply falls out of the scope of what they intended bash to be capable of, or is it a limitation of how it was designed? Thanks again, in any case. – Calibre Nov 20 '22 at 15:03
@Calibre assume `bash` had a builtin XML parser ... why not add parsers for HTML, JSON, YAML, CSV, text, etc ... where does it end? what about other shells (`csh`, `sh`, `dash`) ... should they have builtin support for a multitude of parsers, too? shells are good at managing processes, IO, and variables but for 'specialized' processing (eg, parsers) why re-invent the wheel when you can call a binary (`xmlstarlet`) designed for a specific task? ... – markp-fuso Nov 20 '22 at 19:22
... switch between `bash`, `csh`, `sh` and `dash`? go ahead ... no need to worry about each shell's XML parser implementation because they all have the ability to call a common XML parser (eg, `xmlstarlet`) – markp-fuso Nov 20 '22 at 19:24
`Is it intentional by the shell's creators, like it simply falls out of the scope of what they intended bash to be capable` Yes. It's a "shell". Literally, it's meant to run other programs. It should be lightweight, small and work. But answering a question "why is there no XML parser _library_ written in bash", the answer is that no one has written it. Give it a shot, you'll know why. – KamilCuk Nov 20 '22 at 21:02
