I am using Jsoup to scrape some data. In my document, I have something like:

  <script type="text/javascript">
ta.store('mapsv2.geoName', 'Marseille');
ta.store('mapsv2.map_addressnotfound', 'Address not found');         ta.store('mapsv2.map_addressnotfound3', 'We couldn\'t find that location near {0}.  Please try another search.');       </script> 
  <script type="text/javascript">
window.mapDivId = 'map0Div';
window.map0Div = {
lat: 43.295246,
lng: 5.364188,
zoom: null,
locId: 5039388,
geoId: 187253,

My code:

   Document attractionDoc = Jsoup.connect(url).timeout(100000).get();
   System.out.println("attractionDoc "+attractionDoc.toString());

But I don't know how to extract the numbers after lat: and lng:.

Thanks for your help!

Jose
  • I believe you'll have to write up a regex for that. e.g. Retrieve the textual contents of the script tags, check if the contents contain the words "lat" and "lng" and then parse them out via regex. I'd write up an answer myself but I'm not that comfortable with regex unfortunately. – Ceiling Gecko Feb 02 '15 at 10:04

1 Answer

Jsoup does not parse embedded JavaScript, so there is no direct way to read the members lat and lng of the window.map0Div object.

But as indicated by @Ceiling Gecko, you can parse the contents of the script tag with other techniques, e.g. regular expressions.

Assuming you have the script contents as a String called content (with Jsoup, Element.data() on the script element returns it), you may use something like:

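// Requires: import java.util.regex.Matcher; import java.util.regex.Pattern;
// DOTALL lets '.' match the line breaks between the fields; '.*?' stops at the first lat/lng.
Pattern p = Pattern.compile(
        "window\\.map0Div\\s*=\\s*\\{.*?lat:\\s*(-?[0-9.]+),.*?lng:\\s*(-?[0-9.]+),",
        Pattern.DOTALL);
Matcher m = p.matcher(content);
if (m.find()) {
    String lat = m.group(1);
    String lng = m.group(2);
    // do whatever you need to do with the info
}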

Here is a fiddle with the regex: http://fiddle.re/1p0yd6
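
For reference, here is a minimal self-contained sketch of the same idea that can be run without Jsoup. The class name LatLngExtractor, the method extract, and the sample string in main are made up for illustration; the sample just mirrors the snippet from the question. It uses a variant of the pattern that enables Pattern.DOTALL, because the fields sit on separate lines and '.' does not match line breaks by default. It also assumes plain decimal numbers (optionally negative), which may not cover every page.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of Jsoup.
class LatLngExtractor {
    // DOTALL: '.' also matches newlines. Reluctant '.*?' stops at the first lat/lng after the brace.
    private static final Pattern LAT_LNG = Pattern.compile(
            "window\\.map0Div\\s*=\\s*\\{.*?lat:\\s*(-?\\d+(?:\\.\\d+)?),"
                    + ".*?lng:\\s*(-?\\d+(?:\\.\\d+)?),",
            Pattern.DOTALL);

    /** Returns {lat, lng}, or null when the pattern does not match. */
    static double[] extract(String content) {
        Matcher m = LAT_LNG.matcher(content);
        if (!m.find()) {
            return null;
        }
        return new double[] {
            Double.parseDouble(m.group(1)),
            Double.parseDouble(m.group(2))
        };
    }

    public static void main(String[] args) {
        // Sample input copied from the question; in practice this would be the script contents.
        String content = "window.mapDivId = 'map0Div';\n"
                + "window.map0Div = {\n"
                + "lat: 43.295246,\n"
                + "lng: 5.364188,\n"
                + "zoom: null,\n";
        double[] latLng = extract(content);
        if (latLng != null) {
            System.out.println("lat=" + latLng[0] + " lng=" + latLng[1]);
        }
    }
}
```

Returning null on a failed match keeps the sketch short; an Optional would work just as well if you prefer to avoid null checks.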

luksch