
I'd like to get the URLs inside anchor tag definitions from HTML strings. The HTML is fairly well structured, but the part I'm trying to collect contains Google Maps addresses, which can vary a lot. I'm trying to get all matching URLs using preg_match_all.

<tr><td><a href="http://maps.google.com/maps?q=4165 E LIVE OAK AVE,">map</a></td></tr>
<tr><td><a href="http://maps.google.com/maps?q=8000 SUNSET BLVD, LOS ANGELES,">map</a></td></tr>
<tr><td><a href="http://maps.google.com/maps?q=30600 THOUSAND OAKS BLVD, AGOURA,">map</a></td></tr>
<tr><td><a href="http://maps.google.com/maps?q=9090 19TH ST, ALTA LOMA,">map</a></td></tr>
<tr><td><a href="http://maps.google.com/maps?q=185 W ALTADENA DR, ALTADENA,">map</a></td></tr>
<tr><td><a href="http://maps.google.com/maps?q=620 E MOUNT CURVE AVE,">map</a></td></tr>
Cœur
Bruce Lim

1 Answer


Try the following regular expression:

/http:\/\/maps\.google\.com\/maps\?q[^"]+(?=")/
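A minimal sketch of applying this pattern with preg_match_all, assuming the six rows from the question are held in one string. The dots in the host name are escaped so they match only a literal `.`; the variable names `$html`, `$pattern` and `$urls` are just illustrative. With the default PREG_PATTERN_ORDER, `$matches[0]` holds the full matches.

```php
<?php
// The rows from the question, in a nowdoc string (no variable interpolation).
$html = <<<'HTML'
<tr><td><a href="http://maps.google.com/maps?q=4165 E LIVE OAK AVE,">map</a></td></tr>
<tr><td><a href="http://maps.google.com/maps?q=8000 SUNSET BLVD, LOS ANGELES,">map</a></td></tr>
<tr><td><a href="http://maps.google.com/maps?q=30600 THOUSAND OAKS BLVD, AGOURA,">map</a></td></tr>
<tr><td><a href="http://maps.google.com/maps?q=9090 19TH ST, ALTA LOMA,">map</a></td></tr>
<tr><td><a href="http://maps.google.com/maps?q=185 W ALTADENA DR, ALTADENA,">map</a></td></tr>
<tr><td><a href="http://maps.google.com/maps?q=620 E MOUNT CURVE AVE,">map</a></td></tr>
HTML;

// Fixed URL prefix, then everything up to (but not including) the closing quote.
$pattern = '/http:\/\/maps\.google\.com\/maps\?q[^"]+(?=")/';

// preg_match_all returns the number of matches; $matches[0] lists them.
$count = preg_match_all($pattern, $html, $matches);
$urls = $matches[0];

foreach ($urls as $url) {
    echo $url, PHP_EOL;
}
```

Because `[^"]+` cannot cross a double quote, the spaces and commas in the addresses don't matter; each match simply stops at the end of the href value.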

If the page may also contain similar URLs outside the HTML structure you've shown, it's safer to use a stricter regexp that also requires the surrounding markup:

/(?<=<tr><td><a href=")http:\/\/maps\.google\.com\/maps\?q[^"]+(?=">map<\/a><\/td><\/tr>)/
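To see the difference, here is a sketch run against a hypothetical string containing one row in the question's format plus a stray link to the same host inside a `<p>` (the second link and its address are made up for illustration). The dots are escaped as in the first pattern.

```php
<?php
// Hypothetical input: one row in the question's format, plus a link to the
// same host that is NOT wrapped in <tr><td>...>map</a></td></tr>.
$html = '<tr><td><a href="http://maps.google.com/maps?q=9090 19TH ST, ALTA LOMA,">map</a></td></tr>'
      . '<p><a href="http://maps.google.com/maps?q=SOMEWHERE ELSE">directions</a></p>';

// The lookbehind and lookahead require the exact surrounding markup but are
// zero-width, so $matches[0] still contains only the URL itself.
$pattern = '/(?<=<tr><td><a href=")http:\/\/maps\.google\.com\/maps\?q[^"]+(?=">map<\/a><\/td><\/tr>)/';

$count = preg_match_all($pattern, $html, $matches);
print_r($matches[0]);
```

Only the table row matches. The trade-off is brittleness: an extra attribute, single-quoted attributes, or whitespace between the tags makes the lookbehind or lookahead fail, and that row is silently skipped.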
Aleksei Zyrianov
  • Thanks, what you gave me was what I needed to shorten the pattern I was using before to grab all URLs. Here's what I have now:
    /\b(?:(?:https?|ftp):\/\/|www\.)[^"]+(?=")/
    – Bruce Lim Apr 21 '13 at 21:17
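A quick check of the broader pattern from the comment above, run on hypothetical markup (the ftp and www links below are invented for illustration, not taken from the question):

```php
<?php
// Hypothetical markup mixing one of the question's map links with other URL kinds.
$html = '<a href="http://maps.google.com/maps?q=185 W ALTADENA DR, ALTADENA,">map</a>'
      . '<a href="ftp://example.com/file.txt">file</a>'
      . '<a href="www.example.org/page">page</a>';

// An http/https/ftp scheme or a leading "www.", then everything up to the
// next double quote (the quote itself is only checked by the lookahead).
$pattern = '/\b(?:(?:https?|ftp):\/\/|www\.)[^"]+(?=")/';

$count = preg_match_all($pattern, $html, $matches);
print_r($matches[0]);
```

Because `[^"]+` stops only at a double quote, this relies on the URLs sitting inside double-quoted attributes; a bare URL in running text would extend to whatever `"` comes next in the document.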