I have XML files containing scrolling lyrics for karaoke songs, which we are acquiring from another company. I need to remove every <pg> element that contains a multiline phrase like:
8
BAR
INSTRUMENTAL
BREAK
These phrases always sit on a separate page of their own, inside a <pg> element. The company told us that the words appearing in every one of them are BAR and BREAK, so matching on both should (hopefully) keep the actual lyrics in the remaining pages from being deleted. A single XML file may contain several of these pages, and I need to find and delete all of them.
In Notepad++, I can select an opening <pg and everything up to the next opening <pg, one page at a time, with this regex:
(<pg)(.+?)(?=<pg)
Is there a way to extend the regex above so that it finds only the full <pg> elements containing both BAR and BREAK and deletes them (multiple times within a file)? I could then switch to Find in Files for a bulk search and replace.
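One way the search might be extended, sketched with Python's re module so it can be tried outside the editor. It assumes a page to delete is a single <pg …>…</pg> element in which BAR and BREAK both occur as whole words, in either order; BREAK_PAGE and SAMPLE are illustrative names. The original (<pg)(.+?)(?=<pg) only spans lines when ". matches newline" is on, and it can never match the last page of a file because no <pg follows it; this pattern instead ends at the page's own </pg>. The Notepad++ spelling in the comment uses \R in place of Python's (?:\r?\n)? and has not been tested in Notepad++, so try it on a copy of a file first.

```python
import re

# Hypothetical pattern: one complete <pg ...>...</pg> element that contains BOTH
# whole words BAR and BREAK (in either order). (?:(?!</pg>).) is a "tempered"
# dot: any character that does not begin "</pg>", so a match can never run on
# into the next page.
#
# Untested Notepad++ equivalent (Regular expression mode, empty "Replace with"):
#   (?s-i)<pg\b(?=(?:(?!</pg>).)*?\bBAR\b)(?=(?:(?!</pg>).)*?\bBREAK\b)(?:(?!</pg>).)*</pg>\R?
BREAK_PAGE = re.compile(
    r"<pg\b"
    r"(?=(?:(?!</pg>).)*?\bBAR\b)"    # BAR occurs before this page's </pg>
    r"(?=(?:(?!</pg>).)*?\bBREAK\b)"  # BREAK occurs before this page's </pg>
    r"(?:(?!</pg>).)*</pg>"           # consume the page up to and including </pg>
    r"(?:\r?\n)?",                    # plus the line break that follows, if any
    re.DOTALL,                        # let . cross line breaks
)

# Cut-down copy of the three-page example in this post.
SAMPLE = """<pg id="lyrics.16" t="157.09,15.88">
<ln>
<lyr s="OWN " t="162.32,1.05"/>
</ln>
</pg>
<pg id="lyrics.17" t="172.97,7.93">
<ln>
<lyr s="8 " t="174.16,.21"/>
<lyr s="BAR " t="174.37,.24"/>
</ln>
<ln>
<lyr s="INSTRUMENTAL " t="174.61,4.52"/>
</ln>
<ln>
<lyr s="BREAK " t="179.13,1.67"/>
</ln>
</pg>
<pg id="lyrics.18" t="180.9,9.72">
<ln>
<lyr s="WOAH " t="186.55,.25"/>
</ln>
</pg>
"""

cleaned, removed = BREAK_PAGE.subn("", SAMPLE)
print(removed)  # 1: only lyrics.17 is removed
```

Because only the presence of the two words is tested, a genuine lyric page that happened to contain both BAR and BREAK would be deleted too, so it is worth checking the 24 test files before any bulk run.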
Below is an example of three consecutive <pg> elements. I need the second one found and deleted in full, and the search should then carry on, deleting every further matching <pg> element until it reaches the end of the file (rinse and repeat).
I have about 24 files to test with, and 7,000 more to follow. I’m hoping the words common to every page I need to select are always BAR and BREAK.
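For the 7,000-file run, a small script may be easier to check than Find in Files, since it can report how many pages it removed from each file and keep a backup. This is a sketch under stated assumptions: the files are UTF-8 and sit somewhere under one folder, and strip_break_pages, the dry-run default and the .bak naming are hypothetical choices, not anything the supplier specified. BREAK_PAGE matches a whole <pg>…</pg> element containing both whole words BAR and BREAK.

```python
import re
from pathlib import Path

# Hypothetical pattern: a whole <pg ...>...</pg> element containing both whole
# words BAR and BREAK, plus the line break after it. The tempered dot
# (?:(?!</pg>).) keeps every part of the match inside the page's own </pg>.
BREAK_PAGE = re.compile(
    r"<pg\b"
    r"(?=(?:(?!</pg>).)*?\bBAR\b)"
    r"(?=(?:(?!</pg>).)*?\bBREAK\b)"
    r"(?:(?!</pg>).)*</pg>(?:\r?\n)?",
    re.DOTALL,
)


def strip_break_pages(folder: Path, dry_run: bool = True) -> dict[str, int]:
    """Remove BAR/BREAK pages from every *.xml file under `folder` (recursively).

    Returns {relative path: number of pages removed}. With dry_run=True (the
    default) files are only counted, never written. Otherwise each changed file
    is first copied to '<name>.bak'. Assumes UTF-8 files.
    """
    report: dict[str, int] = {}
    for path in sorted(folder.rglob("*.xml")):
        raw = path.read_bytes()
        # Decoding bytes directly (no text-mode newline translation) keeps
        # existing \r\n line endings intact when the file is written back.
        text = raw.decode("utf-8")
        cleaned, removed = BREAK_PAGE.subn("", text)
        report[path.relative_to(folder).as_posix()] = removed
        if removed and not dry_run:
            path.with_name(path.name + ".bak").write_bytes(raw)
            path.write_bytes(cleaned.encode("utf-8"))
    return report
```

Calling strip_break_pages(Path("test_files")) on the 24 test files (the folder name is only an example) returns the per-file counts without writing anything; once they look right, dry_run=False applies the change.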
Thank you so much for any help and advice.
<pg id="lyrics.16" t="157.09,15.88">
<ln>
<lyr s="I’M " t="161.28,.24"/>
<lyr s="ON " t="161.52,.43"/>
<lyr s="MY " t="161.95,.37"/>
<lyr s="OWN " t="162.32,1.05"/>
</ln>
<ln>
<lyr s="I’M " t="164.57,.26"/>
<lyr s="ON " t="164.83,.42"/>
<lyr s="MY " t="165.25,.43"/>
<lyr s="OWN " t="165.68,1.07"/>
</ln>
<ln>
<lyr s="I’M " t="167.91,.24"/>
<lyr s="ON " t="168.15,.38"/>
<lyr s="MY " t="168.53,.42"/>
<lyr s="OWN " t="168.95,.62"/>
</ln>
<ln>
<lyr s="NO " t="169.57,.48"/>
<lyr s="NO " t="170.05,.19"/>
<lyr s="NO " t="170.24,.41"/>
<lyr s="NO " t="170.65,.43"/>
<lyr s="NO " t="171.08,.56"/>
</ln>
<ln>
<lyr s="YEAH " t="171.64,.23"/>
<lyr s="EH " t="171.87,.42"/>
<lyr s="YEAH " t="172.29,.58"/>
</ln>
</pg>
<pg id="lyrics.17" t="172.97,7.93">
<ln>
<lyr s="8 " t="174.16,.21"/>
<lyr s="BAR " t="174.37,.24"/>
</ln>
<ln>
<lyr s="INSTRUMENTAL " t="174.61,4.52"/>
</ln>
<ln>
<lyr s="BREAK " t="179.13,1.67"/>
</ln>
</pg>
<pg id="lyrics.18" t="180.9,9.72">
<count c="pt.1" t="184.92,1.27" n="4"/>
<ln>
<lyr s="WOAH " t="186.55,.25"/>
<lyr s="OH " t="186.8,.39"/>
<lyr s="WOAH " t="187.19,.41"/>
</ln>
<ln>
<lyr s="I " t="187.6,.21"/>
<lyr s="CAN’T " t="187.81,.38"/>
<lyr s="LET " t="188.19,.28"/>
<lyr s="YOU " t="188.47,.38"/>
<lyr s="GO " t="188.85,.6"/>
</ln>
<ln>
<lyr s="MY " t="189.45,.44"/>
<lyr s="LITTLE " t="189.89,.6"/>
<lyr s="GIRL " t="190.49,.03"/>
</ln>
</pg>
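Since the files are XML, a parser-based check is another option, swapped in here for the regex: rather than searching raw text, it collects the words from each page's <lyr s="…"> attributes and deletes the <pg> only if both BAR and BREAK are among them. This sketch uses Python's standard xml.etree.ElementTree; the <song> root element is a hypothetical stand-in for whatever wraps the pages in the real files, and writing the tree back with ElementTree will not necessarily reproduce the original file byte for byte (comments and the XML declaration, for example, may be lost).

```python
import xml.etree.ElementTree as ET


def is_break_page(pg: ET.Element) -> bool:
    """True if the words sung on this page include both BAR and BREAK."""
    # Gather the whitespace-separated words from every <lyr s="..."> in the page.
    words = {w for lyr in pg.iter("lyr") for w in lyr.get("s", "").split()}
    return {"BAR", "BREAK"} <= words


def remove_break_pages(root: ET.Element) -> int:
    """Delete every <pg> element (at any depth) that is_break_page() flags."""
    removed = 0
    # ElementTree elements do not know their parent, so visit each element and
    # inspect its direct children instead. list() snapshots both iterations so
    # the tree can be modified while looping.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == "pg" and is_break_page(child):
                parent.remove(child)
                removed += 1
    return removed


# Cut-down version of the example; <song> is a hypothetical root element.
SAMPLE = """<song>
<pg id="lyrics.16" t="157.09,15.88"><ln><lyr s="OWN " t="162.32,1.05"/></ln></pg>
<pg id="lyrics.17" t="172.97,7.93">
<ln><lyr s="8 " t="174.16,.21"/><lyr s="BAR " t="174.37,.24"/></ln>
<ln><lyr s="INSTRUMENTAL " t="174.61,4.52"/></ln>
<ln><lyr s="BREAK " t="179.13,1.67"/></ln>
</pg>
<pg id="lyrics.18" t="180.9,9.72"><ln><lyr s="WOAH " t="186.55,.25"/></ln></pg>
</song>"""

root = ET.fromstring(SAMPLE)
print(remove_break_pages(root))                  # 1
print([pg.get("id") for pg in root.iter("pg")])  # ['lyrics.16', 'lyrics.18']
```

To save the result, ET.ElementTree(root).write(path, encoding="utf-8") is one option, subject to the round-trip caveat above.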
I haven’t been able to work out the additional part of the Notepad++ search, so I’m asking for advice.