I'm not validating emails. What I want to do is find (and then change) three separate types of "email" content in an HTML string:

  1. a plain email: eg user@test.com
  2. a mailto href: eg <a href="mailto:user@test.com">user@test.com</a>
  3. an aliased href: eg <a href="mailto:user@test.com">user's email</a>

I'm then going to transform each example into a custom html string that will then be modified by JS (anti-spam harvesting via Spamspan):

<span class="spamspan">
<span class="u">user</span>
@
<span class="d">example.com</span>
(<span class="t">Spam Hater</span>)
</span>

So you can see I have to find each of these input types and parse the email into user, domain and (optionally) a display value. At the moment I'm struggling with the regexes to find these emails; parsing them should be straightforward in PHP.
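A minimal sketch of the find-and-transform step described above, written in Python purely for illustration (the question targets PHP4, and similar patterns should carry over to `preg_match_all()` with minor changes). The `EMAIL` pattern is a deliberately loose assumption for locating addresses, not validating them, and the names `spamspan()` and `convert()` are hypothetical. It assumes double-quoted `href` attributes and no `?subject=` query part on the mailto link.

```python
import re

# Loose pattern for *finding* addresses (not validating them); an assumption.
EMAIL = r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+"

# Types 2 and 3: an anchor with a mailto href.
# Group 1 is the address, group 2 is the link text.
MAILTO_LINK = re.compile(
    r'<a\s[^>]*?href="mailto:(' + EMAIL + r')"[^>]*>(.*?)</a>',
    re.IGNORECASE | re.DOTALL,
)

# Type 1: a bare address anywhere in the string.
PLAIN = re.compile(EMAIL)


def spamspan(address, display=None):
    """Build the spamspan markup for one address (hypothetical helper)."""
    user, domain = address.split("@", 1)
    lines = [
        '<span class="spamspan">',
        '<span class="u">%s</span>' % user,
        "@",
        '<span class="d">%s</span>' % domain,
    ]
    if display:
        # The link text is already HTML, so it is passed through unchanged.
        lines.append('(<span class="t">%s</span>)' % display)
    lines.append("</span>")
    return "\n".join(lines)


def convert(html):
    """Replace mailto links first, then any plain addresses that remain."""

    def link(match):
        address, text = match.group(1), match.group(2).strip()
        # Type 2: the link text is the address itself, so no display value.
        display = None if text.lower() == address.lower() else text
        return spamspan(address, display)

    html = MAILTO_LINK.sub(link, html)
    return PLAIN.sub(lambda m: spamspan(m.group(0)), html)
```

Replacing the links before the plain addresses matters: once a link has been converted, the user and domain sit in separate spans, so the plain pattern no longer matches them. Note that the plain pattern will also match addresses inside attribute values, so those would need to be excluded separately.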

Edit: At the moment, I'm locked into PHP4. Will take a look at http://php-html.sourceforge.net/ for parsing HTML.

starmonkey

1 Answer

You need an HTML parser and an email regex.
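As a rough illustration of the "HTML parser" half of that advice, here is a sketch using Python's standard-library `html.parser` rather than any PHP library. The class name `MailtoCollector` is hypothetical. It only collects the mailto anchors (address plus link text); plain addresses in text would still need a regex.

```python
from html.parser import HTMLParser


class MailtoCollector(HTMLParser):
    """Collect (address, link text) pairs for every mailto: anchor."""

    def __init__(self):
        super().__init__()
        self.links = []        # finished (address, text) pairs
        self._address = None   # address of the anchor currently open, if any
        self._text = []        # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            if href.lower().startswith("mailto:"):
                # Drop the scheme and any ?subject=... query part.
                self._address = href[len("mailto:"):].split("?", 1)[0]
                self._text = []

    def handle_data(self, data):
        if self._address is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._address is not None:
            self.links.append((self._address, "".join(self._text).strip()))
            self._address = None
```

Because the parser only reports an address when it sees a real `<a>` element, addresses sitting in other attributes (for example an `<input>` value) are not picked up by this step.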

Ignacio Vazquez-Abrams
  • I had to use http://php-html.sourceforge.net/ (which is what simplehtmldom is based off) due to the server running PHP4 (alas!). Key points: preg_match_all(), substr_replace() and some regexes. – starmonkey Jan 25 '10 at 05:41
  • Just ran into a bug - my regex for "plain" emails means that emails inside form fields are converted... I'll need to skip these :) – starmonkey Jan 27 '10 at 21:53
  • My solution was to use a regex to pull input fields out of the string and replace them with a string "token" (which is left alone by email regexes), then re-sub the original content back in after my email processing is completed. I really need to upgrade the server to PHP5! :) – starmonkey Jan 27 '10 at 23:29
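The token approach from the last comment can be sketched as follows, again in Python for illustration only. The function name `protect_inputs()` and the token format are assumptions; the token just has to be something the email regexes will never match.

```python
import re

# Matches a whole <input ...> tag; other form fields would need their own pattern.
INPUT_TAG = re.compile(r"<input\b[^>]*>", re.IGNORECASE)


def protect_inputs(html, process):
    """Run process() over html with every <input> tag masked by a token."""
    saved = []

    def mask(match):
        # Remember the original tag and leave a numbered placeholder behind.
        saved.append(match.group(0))
        return "\x00INPUT%d\x00" % (len(saved) - 1)

    masked = INPUT_TAG.sub(mask, html)
    processed = process(masked)
    # Put the original tags back once the email processing has finished.
    return re.sub(
        r"\x00INPUT(\d+)\x00",
        lambda m: saved[int(m.group(1))],
        processed,
    )
```

Here `process` stands for whatever function performs the email conversion; it receives the masked string and never sees the contents of the input fields.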