
I am trying to scrape table content from a URL using Java, but the scraper is not working correctly. I read the Java docs on InputStreamReader and other online examples but could not figure out what the problem is. The InputStreamReader appears to skip two columns of every even row in the table, returning only the last column. Every odd row produces the desired results. Below are my code and output (image).

The source table looks like this: (image)

Lastly, the output looks like this: (image)

In HTML terms, each column in a row is a td tag, and each tag is read in as a line. Since two columns are skipped, does that mean the InputStreamReader is skipping two lines? I thought it might be a regex problem, but I ruled that out because the rest of the output is correct. I want to read in and output all rows and columns correctly so that I can proceed.

user3422517
  • Double check your regular expressions. Be sure they take into account variations in syntax for each table entry (e.g., plausible spaces). – copeg Jun 05 '15 at 23:49

1 Answer


The price cells are formatted differently in the odd and even rows: the space comes after the trailing &nbsp; in odd rows and before it in even rows.

Odd rows:

    <tr>
        <td>16:00:52</td>
        <td>$&nbsp;82.14&nbsp; </td>
        <td>763</td>
    </tr>

Even rows:

    <tr>
        <td>16:00:52 </td>
        <td>$&nbsp;82.14 &nbsp;</td>
        <td>8,116</td>
    </tr>

The pattern that matches both cases is:

    String preicePattern = "<td>\\$&.+;(\\d{1,4}\\.\\d{1,4}) *&";
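To check the pattern against both row variants, here is a minimal sketch. The class name PricePatternDemo and the method extractPrice are hypothetical names made up for illustration, and the test strings are just the price cells from the two rows above, not the asker's real page. If the real page's td tags carry attributes or other text, the leading `<td>` may need adjusting.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the asker's scraper.
class PricePatternDemo {
    // Same regex as above: " *" allows zero or more spaces
    // between the digits and the trailing &nbsp; entity.
    private static final Pattern PRICE =
            Pattern.compile("<td>\\$&.+;(\\d{1,4}\\.\\d{1,4}) *&");

    // Returns the captured price, or null when the line is not a price cell.
    static String extractPrice(String line) {
        Matcher m = PRICE.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String oddRow = "<td>$&nbsp;82.14&nbsp; </td>";   // space after &nbsp;
        String evenRow = "<td>$&nbsp;82.14 &nbsp;</td>";  // space before &nbsp;
        System.out.println(extractPrice(oddRow));
        System.out.println(extractPrice(evenRow));
    }
}
```

With these two sample cells, both calls should capture the same price, while the time and volume cells do not match and return null.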
  • Hi Saka1029, your example didn't work for me, but I was able to solve the problem by using: String preicePattern = "\\$&.+;(\\d{1,4}\\.\\d{1,4}) *&"; – user3422517 Jun 06 '15 at 03:21