Suppose I have the following tags:

<td class="cit-borderleft cit-data">437</td>
<td class="cit-borderleft cit-data">394</td>
<td class="cit-borderleft cit-data">12</td>
<td class="cit-borderleft cit-data">**12**</td>

I need to match only the number 12 in the last tag. I am using the regular expression "<td class=\"cit-borderleft cit-data\">(.*?)</td>", but it matches all four tags.

  • Is `**` really part of your input, or did you try to make that `12` bold in your example? Also, is there any reason why you decided to use regex instead of a proper HTML/XML parser? – Pshemo Jul 30 '16 at 15:20
  • Parsing HTML with regular expressions will just end in tears; use a proper HTML parser as suggested in one of the answers. – greg-449 Jul 30 '16 at 15:36

4 Answers

Don't use regex. Use a proper XML/HTML parser such as jsoup. If you simply want the text of the last td element with the classes cit-borderleft and cit-data, you can use:

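import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

String html = 
        "<table>" +
        "<td class=\"cit-borderleft cit-data\">437</td>\r\n" + 
        "<td class=\"cit-borderleft cit-data\">394</td>\r\n" + 
        "<td class=\"cit-borderleft cit-data\">12</td>\r\n" + 
        "<td class=\"cit-borderleft cit-data\">**12**</td>" +
        "</table>";
// Parse the markup into a DOM tree
Document doc = Jsoup.parse(html);
// Select every td that carries both classes and keep only the last one
Element last = doc.select("td.cit-borderleft.cit-data").last();
System.out.println(last.text());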

Output: **12**

If you then want to remove the asterisks, call replace("*", "") on that string; it returns a new string without them.

Pshemo

Try this:

<td class=\"cit-borderleft cit-data\">\*\*(.*?)\*\*<\/td>

or, more simply:

\*\*(\d+)\*\*
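
As a rough sketch, the two patterns above could be applied from Java with java.util.regex along these lines. The class name AsteriskCellDemo and the helper extractStarred are made up for illustration, and the sample input assumes the asterisks really are part of the markup.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AsteriskCellDemo {
    // Longer pattern: a td with the given classes whose content is wrapped in ** ... **
    static final Pattern FULL = Pattern.compile(
            "<td class=\"cit-borderleft cit-data\">\\*\\*(.*?)\\*\\*</td>");

    // Shorter pattern: any run of digits wrapped in ** ... **
    static final Pattern SHORT = Pattern.compile("\\*\\*(\\d+)\\*\\*");

    // Hypothetical helper: returns the first captured group, or null if nothing matches
    static String extractStarred(Pattern pattern, String html) {
        Matcher m = pattern.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String html =
                "<td class=\"cit-borderleft cit-data\">437</td>\n" +
                "<td class=\"cit-borderleft cit-data\">394</td>\n" +
                "<td class=\"cit-borderleft cit-data\">12</td>\n" +
                "<td class=\"cit-borderleft cit-data\">**12**</td>";

        System.out.println(extractStarred(FULL, html));  // prints 12
        System.out.println(extractStarred(SHORT, html)); // prints 12
    }
}
```

Both patterns depend on the literal asterisks. If the `**` in the question was only an attempt at bold formatting, neither pattern would find a match and the helper would return null.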
Aleksandr Podkutin

Based on your attempt:

<td class=\"cit-borderleft cit-data\">(.*?)<\/td>(?![\s\S]*<\/td>)

The added part is the negative lookahead (?![\s\S]*<\/td>), which rejects a match if any further </td> follows it:

(?!             # Negative Look-Ahead
  [\s\S]        # Character in [\s\S] Character Class
  *             # (zero or more)(greedy)
  <             # "<"
  \/            # "/"
  td>           # "td>"
)               # End of Negative Look-Ahead
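
A minimal sketch of this lookahead pattern in Java, assuming each cell sits on its own line as in the question. The class name LastCellDemo and the method lastCell are hypothetical names chosen for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LastCellDemo {
    // The asker's pattern plus a negative lookahead that forbids any later </td>
    static final Pattern LAST_CELL = Pattern.compile(
            "<td class=\"cit-borderleft cit-data\">(.*?)</td>(?![\\s\\S]*</td>)");

    // Hypothetical helper: returns the captured cell content, or null if nothing matches
    static String lastCell(String html) {
        Matcher m = LAST_CELL.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String html =
                "<td class=\"cit-borderleft cit-data\">437</td>\n" +
                "<td class=\"cit-borderleft cit-data\">394</td>\n" +
                "<td class=\"cit-borderleft cit-data\">12</td>\n" +
                "<td class=\"cit-borderleft cit-data\">**12**</td>";

        System.out.println(lastCell(html)); // prints **12**
    }
}
```

One caveat: without the DOTALL flag, `.` does not match line terminators, which is what keeps `(.*?)` inside a single cell here. If all the cells were on one line, the lazy group could stretch from the first opening tag to the final `</td>` and capture several cells at once.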
alpha bravo