How to find information inside an XML tag using grep?

Question

I am working on a bash script to extract some information from an XML file, and I'm using grep for this.

To find the information I need, I run:

grep -oP "<title>(.*)</title>" temp.xml

I get a list of matches, but each match includes the <title> tags.

How can I get only the text inside the title tag, without the tag itself, using grep?

It has to be a quick scripting job, I wouldn't like to spend ages on it. Can you recommend a good xpath command line tool? — filype, May 28 '12 at 09:30
Looks like I've got xpath5.12 installed here already. No manual entry though — filype, May 28 '12 at 09:32
Any of them will suffice. Your XPath would be as simple as possible '//title/text()' — toniedzwiedz, May 28 '12 at 09:33
possible duplicate of [Extraction of data from a simple XML file](http://stackoverflow.com/questions/2222150/extraction-of-data-from-a-simple-xml-file) — tripleee, Jun 10 '15 at 07:14

score 43 · Answer 1 · answered May 28 '12 at 10:50


Since you already use grep -P, why don't you use its features?

grep -oP '(?<=<title>).*?(?=</title>)' temp.xml

In the general case, XPath is the correct solution, but for toy scenarios, yes Virginia, it can be done.
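A minimal sketch of the lookaround approach on a small input, assuming GNU grep built with PCRE support (`-P`). The file name `sample.xml` and its contents are made up for illustration. The second command uses `\K`, a PCRE feature that discards everything matched so far, as an alternative to the lookbehind.

```shell
#!/usr/bin/env bash
# Hypothetical sample input; each <title> element sits on its own line.
workdir=$(mktemp -d)
cat > "$workdir/sample.xml" <<'EOF'
<feed>
  <title>First post</title>
  <title>Second post</title>
</feed>
EOF

# Lookbehind and lookahead assert the tags without including them in the match.
titles=$(grep -oP '(?<=<title>).*?(?=</title>)' "$workdir/sample.xml")

# \K resets the start of the reported match, so the opening tag is dropped.
titles_k=$(grep -oP '<title>\K.*?(?=</title>)' "$workdir/sample.xml")

printf '%s\n' "$titles"
rm -r "$workdir"
```

Both variables should hold the two title strings, one per line. Like the original command, this only works while each element opens and closes on the same line.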

answered May 28 '12 at 10:50

tripleee


1

but now grep -P is obsolete – Bharat Pahalwani Jul 07 '14 at 06:32
2

@Bharat Obsolete?? Can you provide a reference? – tripleee Jul 07 '14 at 07:48
I found that [here](http://stackoverflow.com/questions/16658333/grep-p-no-longer-works-how-can-i-rewrite-my-searches) – Bharat Pahalwani Jul 07 '14 at 07:55
4

The fact that OSX chose to remove useful functionality hardly indicates that the feature is obsolete. There is no indication that it will be removed from GNU `grep` which is easy to install on OSX if you need it, and standard on most other platforms these days. – tripleee Jun 10 '15 at 07:13
Are the ?<= called look behind or something in regex? I need to learn that – filype Jul 22 '16 at 21:24
1

[`man perlre`](http://perldoc.perl.org/perlre.html#Extended-Patterns) - `(?<=pattern)` is a lookbehind assertion and `(?=pattern)` is a lookahead assertion. – tripleee Jul 23 '16 at 06:04
1

I also don't have access to XPath on the unix system I'm using, so this is the best answer for me – deccles Mar 24 '20 at 02:37

toniedzwiedz · Accepted Answer · 2018-03-02T10:25:31.727

9

I can't see why you'd want to use grep for this when it can be solved with a trivial XPath expression:

//title/text()

There are many command line tools for XPath and they're usually bundled with the OS.

Answers to this question on Stack Overflow list a number of such tools.

The problem with grep here is that it's a generic text-processing tool with no awareness of XML structure. For a very simple scenario, you can get it working. If the document is complex, or if the script is meant to survive months or years rather than serve a one-off job, you may come to regret the choice.

XPath makes it easy to tell the difference between similarly named tags that appear in different contexts in a document.

<article>
    <author>
        <name>Jon Doe</name>
        <title>Chief Editor</title>
    </author>
    <title>On the Benefits of grep</title>
    <publicationDate>2018-02-12</publicationDate>
    <text>blah blah blah</text>
</article>

Extracting the title of the article from this document with any of the grep answers posted here would fail, because they would also return the author's title, "Chief Editor". You could technically write a regular expression that gets what you need, but it's a lot easier with XPath:

/article/title/text()

If you know you're dealing with a trivial document and the format doesn't change or if it's a one time job where you can quickly validate the results, you can go for grep as explained by others.
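To make the difference concrete, here is a rough sketch that runs a grep pattern from this thread against the article document above and, where available, an XPath query through `xmllint`. Using `xmllint` (from libxml2) is an assumption on my part, not something the answer prescribes; any XPath command line tool would do, and the block skips that step if the tool is not installed.

```shell
#!/usr/bin/env bash
# The sample document from the answer, written to a temporary file.
workdir=$(mktemp -d)
cat > "$workdir/article.xml" <<'EOF'
<article>
    <author>
        <name>Jon Doe</name>
        <title>Chief Editor</title>
    </author>
    <title>On the Benefits of grep</title>
    <publicationDate>2018-02-12</publicationDate>
    <text>blah blah blah</text>
</article>
EOF

# grep works line by line and knows nothing about nesting,
# so it returns the author's job title as well as the article title.
grep_titles=$(grep -oP '(?<=<title>).*?(?=</title>)' "$workdir/article.xml")
printf 'grep:\n%s\n' "$grep_titles"

# An XPath tool can address only the <title> directly under <article>.
# xmllint is an assumed tool here; the step is skipped if it is missing.
if command -v xmllint >/dev/null 2>&1; then
    xpath_title=$(xmllint --xpath 'string(/article/title)' "$workdir/article.xml")
    printf 'xpath:\n%s\n' "$xpath_title"
fi
rm -r "$workdir"
```

The grep result contains both "Chief Editor" and "On the Benefits of grep", while the XPath query is scoped to the article's own title.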

edited Mar 02 '18 at 10:25

answered May 28 '12 at 09:55

toniedzwiedz


Examples of commands that support XPath are xgrep (http://wohlberg.net/public/software/xml/xgrep), xmlgrep (http://search.cpan.org/dist/XML-Twig/tools/xml_grep/xml_grep) or sgrep (http://www.cs.helsinki.fi/u/jjaakkol/sgrep.html). – Claudi Sep 05 '14 at 06:49
6

What didn't you understand in the (clear) question that ends with : "using grep" ? – Moonchild Feb 12 '15 at 16:46
3

What did you not understand in the answer providing a useful answer to a question that addresses the core of the problem as opposed to assumptions made by the OP. Why is it bothering you? – toniedzwiedz Feb 12 '15 at 17:18
See also http://stackoverflow.com/questions/15461737/how-to-execute-xpath-one-liners-from-shell for a catalog of XPath tools for U*x. – tripleee Jun 10 '15 at 08:48
3

Ask a question about oranges and the accepted answer is about bananas. Nice. Here's a tip: _tips go in comments_, not answers. – Christian Mar 01 '18 at 16:44
1

I'm working on a server that doesn't have xpath, nor xmlstarlet, but it has grep. This is why I'm looking for a grep answer and cannot use xpath. – Katie Mar 01 '18 at 23:44
@Kayvar then feel free to use any of the other answers to this question or install the right tool for the job on your server. Sure you can hack this with `grep`, I just think it's valuable to point out that it's not a robust solution and, depending on the XML format in question and the use case, such a solution may blow up in your face. The OP seems to agree. – toniedzwiedz Mar 02 '18 at 10:15

score 6 · Answer 3 · answered May 28 '12 at 09:10


It's not the best solution; I would look for an XML library that can be used from bash. But you can do:

grep -oP "<title>(.*)</title>" temp.xml | cut -d ">" -f 2 | cut -d "<" -f 1

answered May 28 '12 at 09:10

hovanessyan


That's my solution for it too. – filype May 28 '12 at 09:29

score 3 · Answer 4 · answered Jan 16 '19 at 06:17


grep -oP "<foo>(.*)</foo>" "XML.xml" | sed -n 's/.*<foo>\([^<]*\)<\/foo>.*/\1/p' >> "foo.txt"
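The `grep` stage in that pipeline is arguably redundant, because `sed -n` with the `p` flag already prints only the lines where the substitution succeeded. A sketch using `sed` alone, which may help on systems whose `grep` lacks `-P`; the sample contents of `XML.xml` are invented for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sample input with one <foo> element per line.
workdir=$(mktemp -d)
cat > "$workdir/XML.xml" <<'EOF'
<root>
  <foo>alpha</foo>
  <bar>ignored</bar>
  <foo>beta</foo>
</root>
EOF

# -n suppresses the default output; the trailing p prints a line only
# when the substitution matched, so lines without <foo> are dropped.
foo_values=$(sed -n 's/.*<foo>\([^<]*\)<\/foo>.*/\1/p' "$workdir/XML.xml")
printf '%s\n' "$foo_values"
rm -r "$workdir"
```

Because of the leading `.*`, this keeps only the last `<foo>` element on a line that contains several.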

answered Jan 16 '19 at 06:17

NoviceSoundz


score 1 · Answer 5 · edited May 23 '17 at 12:25


You could install xgrep and use XPath, as suggested in Tom's answer:

man xgrep

edited May 23 '17 at 12:25

Community


answered Feb 11 '13 at 15:25

Yannick


score 1 · Answer 6 · answered Feb 27 '21 at 02:01


You can use any one of the commands below to get the values between the tags.

grep -oP '(>).*?(?=</title>)' test.xml | cut -d ">" -f 2
grep -oP '(?<=title>).*(?=</title)' test.xml
awk -F "[><]" '/title/{print $3}' test.xml
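These commands behave alike when each line holds a single element, but the greedy `.*` in the second one differs from a lazy `.*?` once two elements share a line. A small sketch of that difference, assuming GNU grep with `-P`; the one-line input is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical input: two title elements on the same line.
line='<title>A</title><title>B</title>'

# Greedy .* runs to the last closing tag on the line,
# so the match swallows the markup between the two elements.
greedy=$(printf '%s\n' "$line" | grep -oP '(?<=title>).*(?=</title)')

# Lazy .*? stops at the first closing tag, so each element
# is reported as a separate match.
lazy=$(printf '%s\n' "$line" | grep -oP '(?<=<title>).*?(?=</title>)')

printf 'greedy: %s\n' "$greedy"
printf 'lazy:\n%s\n' "$lazy"
```

The greedy form yields the single string `A</title><title>B`, while the lazy form yields `A` and `B` on separate lines.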

answered Feb 27 '21 at 02:01

ARGStackOvaFlo


score 0 · Answer 7 · edited Feb 17 '23 at 09:40


Use gawk, for example:

gawk 'BEGIN { RS="<[^>]+>" } { print RT, $0 }' myfile

edited Feb 17 '23 at 09:40

S.B


answered Feb 14 '23 at 21:40

Javarious Madison
