Encoder.HtmlEncode encodes Farsi characters

Question

I want to use the Microsoft AntiXSS library for my project. When I use the Microsoft.Security.Application.Encoder.HtmlEncode(str) function to safely show a value in my web page, it encodes Farsi characters, which I consider to be safe. For instance, it converts لیست to &#1604;&#1740;&#1587;&#1578;. Am I using the wrong function? How can I print user input in my page safely?

I'm currently using it like this:

<h2>@Encoder.HtmlEncode(ViewBag.UserInput)</h2>

score 1 · Answer 1 · answered Jul 04 '14 at 15:35

I think I messed up! A Razor view HTML-encodes values by default unless you use @Html.Raw, right? Well, I encoded the string myself and then Razor encoded it again. So in the end it got encoded twice, hence the weird-looking characters (the numeric character codes shown as literal text)!
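To make the double encoding concrete, here is a minimal sketch in plain C#. EncodeNonAsciiAsEntities is a hypothetical stand-in written for this illustration, assuming the first encoder emits decimal numeric entities for non-ASCII characters; it is not the AntiXSS implementation. The second pass uses System.Net.WebUtility.HtmlEncode to play the role of Razor's automatic encoding.

```csharp
using System;
using System.Linq;
using System.Net;

public static class DoubleEncodingDemo
{
    // Hypothetical stand-in for the first encoding pass (NOT the AntiXSS code):
    // non-ASCII characters become decimal numeric entities, ASCII is HTML-encoded.
    public static string EncodeNonAsciiAsEntities(string s) =>
        string.Concat(s.Select(c =>
            c > 127 ? "&#" + (int)c + ";" : WebUtility.HtmlEncode(c.ToString())));

    public static void Main()
    {
        string input = "\u0644\u06CC\u0633\u062A"; // the word لیست

        // First pass: what the manual HtmlEncode call is assumed to produce.
        string once = EncodeNonAsciiAsEntities(input);

        // Second pass: the automatic encoding escapes each '&' again.
        string twice = WebUtility.HtmlEncode(once);

        Console.WriteLine(once);  // &#1604;&#1740;&#1587;&#1578;
        Console.WriteLine(twice); // &amp;#1604;&amp;#1740;&amp;#1587;&amp;#1578;
    }
}
```

A browser renders the first string as لیست, but renders the second one as the literal text &#1604;&#1740;&#1587;&#1578;, which matches what the question describes.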

Alireza Noori

score 0 · Answer 2 · answered Jul 04 '14 at 16:31

If your encoding (let's assume it's Unicode by default) supports Farsi, it's almost always safe to use Farsi in ASP.NET MVC without any additional effort.

First of all, escape-on-input is just wrong: you've taken some input and applied a transformation that is totally irrelevant to that data. It's generally wrong to encode your data immediately after you receive it from the user. You should store the data in its raw form in your database and encode it only when you display it to the user, according to the vulnerabilities of the system that displays it. For example, the 'dangerous' HTML characters are not 'dangerous' for SQL, Android, etc., and that's one of the main reasons why you shouldn't encode the data when you store it on the server. One more reason: an HTML-encoded string can take 6-7 times more characters than the original, which can be a problem if the server limits string length. When you store the data in SQL Server, you should escape, validate, and sanitize it only for SQL Server and guard only against its vulnerabilities (like SQL injection).

Now, for ASP.NET MVC and Razor, you don't need to HTML-encode your strings because that's done by default, unless you use Html.Raw(), which you should generally avoid (or HTML-encode the value yourself when you do use it). Also, if you double-encode your data, you'll end up with corrupted output :)
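As a rough sketch of the three cases in a Razor view, reusing the ViewBag.UserInput value and the Encoder.HtmlEncode call from the question (the last line is only an illustration of the Html.Raw case, assuming the value has not been encoded anywhere else):

```cshtml
@* Razor HTML-encodes the result of an @ expression by default, so this is enough. *@
<h2>@ViewBag.UserInput</h2>

@* Encoding manually as well means Razor encodes the already-encoded string again. *@
<h2>@Encoder.HtmlEncode(ViewBag.UserInput)</h2>

@* Html.Raw turns off Razor's encoding, so the value has to be encoded explicitly. *@
<h2>@Html.Raw(Encoder.HtmlEncode(ViewBag.UserInput))</h2>
```

The first form is usually the simplest choice, since it relies on one encoding pass and leaves nothing to forget.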

I hope this helps clear things up.

Well, your first assumption is wrong. I didn't apply it immediately after input. As you can see, I'm trying to encode before *outputting* the user input, which is the exact usage of this method. You should encode users' input; otherwise you're going to encode your own code, which is silly! As for the second part, I mentioned why I was wrong to do this in my own answer, but thank you very much for yours. - Alireza Noori, Jul 04 '14 at 19:21
