Sanitizing email addresses: filter_var() / FILTER_SANITIZE_EMAIL vs htmlentities()

Question

If a user enters an email address (and it validates), I believe that, like all user-entered data, it should be sanitized before being output in HTML in case it contains malicious code. I am confused as to whether using filter_var() with the FILTER_SANITIZE_EMAIL filter is considered a good way of doing this or whether that filter is intended for some other purpose.

I would have thought that if an email address validates but then changes when sanitized this way, there is a problem: the displayed address would differ from the valid address the user entered. If someone used the displayed version to send an email, it would presumably not reach the intended person.

In view of this, why not just use htmlentities($email) to display validated email addresses? I think this will display the address as it was entered, but safely, because key characters like < are encoded as HTML entities.

If it were this simple I imagine filter_var and FILTER_SANITIZE_EMAIL would not be used so I would like to know if I am misunderstanding the situation or have missed some aspect which I should know about.

There is a related question, although about URLs, which is relevant and interesting but which does not address the issue about sanitization changing the actual value of the entered data. It is also very old and thinking on the subject may have changed.

score 2 · Answer 1 · answered Feb 15 '23 at 00:07

Instead of using FILTER_SANITIZE_EMAIL, consider using FILTER_VALIDATE_EMAIL, see: Validate filters. This will tell you reasonably accurately whether the e-mail address is well-formed, that is, whether it could actually exist. If it is not, you reject the email address.
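
As a minimal sketch of that check: filter_var() with FILTER_VALIDATE_EMAIL returns the address unchanged when it passes and false when it does not. The helper name isAcceptableEmail() and the sample inputs are made up for illustration. The check covers syntax only and says nothing about whether the mailbox exists.

```php
<?php
// Hypothetical helper: true when the address passes PHP's syntax check.
function isAcceptableEmail(string $inputEmail): bool
{
    // filter_var() returns the filtered value on success and false on failure.
    return filter_var($inputEmail, FILTER_VALIDATE_EMAIL) !== false;
}

var_dump(isAcceptableEmail('user@example.com')); // a plain, well-formed address
var_dump(isAcceptableEmail('not an email'));     // no "@" and contains spaces
```

Comparing against false with !== avoids treating any other return value as a failure.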

Another thing I sometimes do, after the previous check of course, is to check the domain in the email address. It goes something like this:

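// Take everything after the last "@" as the domain.
$domain = substr($inputEmail, strrpos($inputEmail, '@') + 1);
// Convert an internationalized domain name to ASCII (intl extension); false on failure.
$domain = idn_to_ascii($domain);
// Accept the domain if it has an MX record or, failing that, an A record.
$validDomain = $domain !== false && (checkdnsrr($domain, 'MX') || checkdnsrr($domain, 'A'));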

This will return the correct result most of the time, but it does need a properly functioning DNS system and network.

My main point is that you shouldn't try to sanitize an email address; you should validate it before you accept it as input, store it, use it or output it.
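
For the output side of the question, a rough sketch of escaping a validated address at the point where it is written into HTML. It uses htmlspecialchars() rather than the htmlentities() mentioned in the question; both encode the HTML-significant characters, and htmlentities() additionally converts every other character that has a named entity. The helper name e() and the sample address are assumptions made for illustration.

```php
<?php
// Hypothetical helper: escape a value for HTML text or a quoted attribute.
function e(string $value): string
{
    // ENT_QUOTES also encodes single quotes, which a valid address may contain.
    return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// Assumed to have passed FILTER_VALIDATE_EMAIL already.
$email = "o'brien@example.com";

echo '<p>Contact: ' . e($email) . '</p>', PHP_EOL;
echo '<input type="email" value="' . e($email) . '">', PHP_EOL;
```

The stored value stays exactly as the user entered it; only the copy written into the page is encoded, and the browser shows it in its original form.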

KIKO Software

I am validating them as you say with `FILTER_VALIDATE_EMAIL` and rejecting if they don't validate. It seems theoretically possible that a valid email could still have malicious code in it given the range of characters emails are allowed to include. So I thought I needed to sanitize the valid emails before outputting to HTML. It may be overkill but then why does PHP have FILTER_SANITIZE_EMAIL if there is not a use case for it? – user3425506 Feb 15 '23 at 00:38

@user3425506 `FILTER_SANITIZE_EMAIL` is used to remove all illegal characters from an email address. I don't use it, there's no need, so I don't know how it copes with "strange" characters that could be valid. If you are afraid of malicious code you can use `htmlentities()` in webpages and always [use prepared statements with bound variables](https://www.php.net/manual/en/security.database.sql-injection.php) in database queries. – KIKO Software Feb 15 '23 at 00:53
That is good to hear as it is what I was beginning to think was the best plan. – user3425506 Feb 15 '23 at 09:53
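
On how FILTER_SANITIZE_EMAIL treats unusual characters, a small sketch may help. According to the PHP manual, the filter removes every character except letters, digits and !#$%&'*+-=?^_`{|}~@.[] and does nothing else. The sample inputs below are made up for illustration.

```php
<?php
// Illustrative input only: parentheses and slashes are outside the allowed set.
$entered   = 'john(.doe)@exa//mple.com';
$sanitized = filter_var($entered, FILTER_SANITIZE_EMAIL);

var_dump($sanitized === $entered); // false: characters were removed

// Single quotes are inside the allowed set, so they pass through untouched.
$withQuote = "o'brien@example.com";
var_dump(filter_var($withQuote, FILTER_SANITIZE_EMAIL) === $withQuote);
```

So the filter can silently turn the entered value into a different address, and because characters such as ' and & survive it, its output still needs escaping before it goes into HTML.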
