1

I have an XML file with a tag like this:

<Question>dzia&amp;#322;owa</Question>

I'm reading this file using XmlTextReader, and for this tag I get something like this:

dzia&#322;owa

How can I replace the HTML numeric entities inside my XML to get something like this: "działowa"?

Przemysław Michalski
  • Why is the content of your tag escaped twice? Fix the problem, if possible. – dtb Oct 19 '10 at 11:28
  • I'll probably have to unescape the named entities first. After that I'll have text like "dzia&#322;owa", and then how do I change the entity number "&#322;" to the valid character 'ł'? – Przemysław Michalski Oct 19 '10 at 11:36

2 Answers

1

The only HTML entity in your sample is &amp;. You've then got some normal text that says #322;. You either want

<Question>dzia&amp;&#322;owa</Question>

which would give "dzia&łowa" (probably not what you want)

or

<Question>dzia&#322;owa</Question>

which would give "działowa"
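
To illustrate the difference between the two forms, here is a minimal sketch that reads each one with XmlReader. The class name EntityDemo and the helper ReadQuestion are made up for this example and do not come from the question.

```csharp
using System;
using System.IO;
using System.Xml;

public static class EntityDemo
{
    // Hypothetical helper: returns the text content of the root element.
    public static string ReadQuestion(string xml)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent();                      // position on <Question>
            return reader.ReadElementContentAsString();  // parser undoes one level of escaping
        }
    }

    public static void Main()
    {
        // Double-escaped, as in the question: only &amp; is resolved,
        // so the returned string still contains the text &#322;
        string twice = ReadQuestion("<Question>dzia&amp;#322;owa</Question>");
        Console.WriteLine(twice == "dzia&#322;owa");

        // Escaped once: the parser resolves the character reference itself.
        string once = ReadQuestion("<Question>dzia&#322;owa</Question>");
        Console.WriteLine(once == "dzia\u0142owa");
    }
}
```

Both comparisons should print True, which suggests the cleanest fix is to stop the ampersand from being escaped a second time when the file is written.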

Graham Clark
  • &#322; is the entity number for the char 'ł'. I would like to get 'działowa'. – Przemysław Michalski Oct 19 '10 at 11:30
  • @UGEEN: Yes, but in your question you *don't* have this entity. Something has correctly encoded the special character as the HTML entity number, but then the ampersand (&) in the entity has been encoded again. You don't want this double encoding; you just need `&#322;`, *not* `&amp;#322;`. – Graham Clark Oct 19 '10 at 12:41
  • I need first to decode &amp;#322; to &#322; and then decode &#322; to the 'ł' char. Two-step decoding, I think; I don't see a better way. – Przemysław Michalski Oct 19 '10 at 13:51
0

I think I solved part of the problem (decoding &#number; to a char):

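using System.Text.RegularExpressions;

// Replaces decimal numeric character references such as &#322; with the characters they denote.
public static string EntityNumbersToEntityValues(string s)
{
    // One Regex.Replace pass with a match evaluator avoids re-scanning the string for every match.
    return Regex.Replace(s, @"&#(\d+);", m =>
    {
        int codePoint;
        // Leave the reference untouched if the number is not a valid Unicode code point.
        if (!int.TryParse(m.Groups[1].Value, out codePoint)
            || codePoint > 0x10FFFF
            || (codePoint >= 0xD800 && codePoint <= 0xDFFF))
            return m.Value;
        return char.ConvertFromUtf32(codePoint);
    });
}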
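
As a possible alternative to a hand-written regex, the second decoding step could be left to the framework. This is only a sketch, assuming .NET 4.0 or later, where System.Net.WebUtility.HtmlDecode is available (on older versions HttpUtility.HtmlDecode in System.Web plays a similar role). The class and method names DecodeDemo and DecodeEntities are invented for the example.

```csharp
using System;
using System.Net;

public static class DecodeDemo
{
    // Hypothetical wrapper: resolves named and numeric (decimal or hex)
    // HTML character references in text already read from the XML file.
    public static string DecodeEntities(string textFromReader)
    {
        return WebUtility.HtmlDecode(textFromReader);
    }

    public static void Main()
    {
        // The string the XML reader returns for the double-escaped element.
        string fromReader = "dzia&#322;owa";
        string decoded = DecodeEntities(fromReader);
        Console.WriteLine(decoded == "dzia\u0142owa"); // True: &#322; is U+0142 (ł)
    }
}
```

Unlike the regex above, this also covers hexadecimal references such as &#x142; and named entities such as &amp;.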
Przemysław Michalski