Searching for a tag, then saving the text between the tags as a variable

Question

I'm quite new to Java. How would I go about searching a file for a tag and assigning everything between the opening and closing tags, such as a string of text, to a variable?

For example, given <title>THE TITLE</title>, I want to save the string "THE TITLE" to a variable called title1, or something similar.

How should I go about doing that? Thank you.

I edited the question for you; is that what you meant? – amit Aug 17 '11 at 13:38

score 5 · Accepted Answer · answered Aug 17 '11 at 13:40

If you use regular expressions, then you just use a capture group:

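import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Capture group 1 holds whatever sits between <title> and </title>.
Pattern p = Pattern.compile("<title>([^<]*)</title>", Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(theText);
if (m.find()) {
    String thisIsTheTextYouWant = m.group(1);
    // ... use the captured text here
}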
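
Since the question asks about searching a file, here is a minimal sketch of the capture-group approach wrapped in a small helper class. The class name `TitleExtractor`, its method names, and the UTF-8 encoding are assumptions made for illustration; none of them come from the question.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative only.
class TitleExtractor {
    // Same pattern as in the snippet above, compiled once and reused.
    private static final Pattern TITLE =
            Pattern.compile("<title>([^<]*)</title>", Pattern.CASE_INSENSITIVE);

    // Returns the text of the first <title>...</title> match, or empty if none is found.
    static Optional<String> extractTitle(String text) {
        Matcher m = TITLE.matcher(text);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    // Reads the whole file into memory (assumed to be UTF-8) and extracts the title.
    static Optional<String> extractTitleFromFile(Path file) throws IOException {
        String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        return extractTitle(text);
    }
}
```

With this in place, something like `String title1 = TitleExtractor.extractTitleFromFile(Paths.get("page.html")).orElse("");` would put the title text into a variable, where `page.html` is a placeholder file name. Returning an `Optional` makes the "no match" case explicit instead of relying on `null`.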

answered Aug 17 '11 at 13:40

Ernest Friedman-Hill

score 2 · Answer 2 · edited May 23 '17 at 11:55

You should not use regex to parse HTML; see the question "RegEx match open tags except XHTML self-contained tags".

Try jsoup: http://jsoup.org/cookbook/extracting-data/attributes-text-html

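import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

String html = "<title>THE TITLE</title>";
Document doc = Jsoup.parse(html);
Element title = doc.select("title").first(); // null if the document has no <title>
String result = title.text();                // "THE TITLE"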

edited May 23 '17 at 11:55

Community

answered Aug 17 '11 at 13:50

bpgergo

Note that he's not parsing the whole document; he's grabbing the text of specific elements. Using a regex is going to be way more efficient if he's, say, indexing web pages by their titles. If he's writing a web browser, then yeah, he needs a parser. But people are way too quick to introduce dependencies like this when they're not necessary. – Ernest Friedman-Hill Aug 17 '11 at 14:04
@Ernest, I partly agree: a regex is going to be way more efficient in special cases, e.g. if OP wants to process HTML files from one specific source at one specific time. But if OP will process HTML files from many different sources or over a longer period of time, then a regexp solution will fail sooner or later - there is so much tumbler out there. It is not just my _opinion_, it is my experience; I have done a lot of screen scraping. You want something quick and dirty? Go for regexp. Want something robust and long-lasting? Go for an HTML parser. – bpgergo Aug 17 '11 at 14:14
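
To make the trade-off in these comments concrete, here is a small sketch of one way the strict pattern can miss real-world markup: an attribute on the opening tag, or line breaks inside the title. The class `TolerantTitleRegex` and the looser `TOLERANT` pattern are hypothetical and are not taken from either answer; the looser pattern is still a regex and would itself break on, for example, an attribute value that contains `>`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical comparison class; the names are illustrative only.
class TolerantTitleRegex {
    // Strict pattern from the accepted answer.
    static final Pattern STRICT =
            Pattern.compile("<title>([^<]*)</title>", Pattern.CASE_INSENSITIVE);

    // Looser variant (an assumption, not from the thread): allows attributes on
    // the opening tag and line breaks inside the title. Not an HTML parser.
    static final Pattern TOLERANT =
            Pattern.compile("<title\\b[^>]*>(.*?)</title\\s*>",
                    Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the trimmed first capture group, or null when the pattern does not match.
    static String firstGroup(Pattern p, String html) {
        Matcher m = p.matcher(html);
        return m.find() ? m.group(1).trim() : null;
    }
}
```

For input such as `<title lang="en">`, a line break, `THE TITLE`, another line break, then `</title>`, the strict pattern finds nothing while the looser one still returns the title text. Each such patch handles one more variation, which is roughly the point at which a parser like jsoup starts to pay for the extra dependency.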
