I have a few hundred .txt files in a directory, each with the following format:
<DOC>
<DOCNO> 33 </DOCNO>
<SOURCE> URL v.01 </SOURCE>
<URL> www.url.com/extension.html </URL>
<DATE> 2019/12/29/ </DATE>
<TIME> </TIME>
<AUTHOR> </AUTHOR>
<HEADLINE>
The title is here
</HEADLINE>
<TEXT>
Text that I want
</TEXT>
</DOC>
I would like to modify every file in place so that it contains only the text between the <TEXT> and </TEXT> tags (i.e. Text that I want).
I have tried the following code but it does not seem to do what I need:
find /root/Desktop/data/data -type f | xargs sed -n '/<TEXT/,/<\/TEXT/p'
How can I do this using a bash script (preferably using sed)?
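
For reference, here is a minimal sketch of one possible approach. It relies on several assumptions: GNU sed is available (so that -i edits each file in place without a backup suffix), the <TEXT> and </TEXT> tags each sit on their own line as in the sample above, and only *.txt files should be touched. The function name extract_text_sections is hypothetical, chosen for illustration. Note that the command in the attempt above only prints the matching range to standard output, tag lines included, and never writes anything back to the files, which may be why it does not appear to work.

```shell
#!/usr/bin/env bash
# Sketch only: assumes GNU sed and tags on their own lines.
# extract_text_sections is a hypothetical helper name, not an existing tool.

extract_text_sections() {
    local dir="$1"
    # -i  : edit each file in place (GNU sed treats every file separately)
    # -n  : print nothing unless told to with 'p'
    # Within the <TEXT>..</TEXT> range, drop the two tag lines and print the rest.
    find "$dir" -type f -name '*.txt' \
        -exec sed -i -n '/<TEXT>/,/<\/TEXT>/{/<TEXT>/d;/<\/TEXT>/d;p}' {} +
}

# Run only when a directory is passed on the command line.
if [ "$#" -gt 0 ]; then
    extract_text_sections "$1"
fi
```

Saved as a script, this could be invoked with the directory as its only argument, for example /root/Desktop/data/data. Because the edit overwrites the originals, it is probably wise to try it on a copy of the directory first.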