0

This might not be a programming question, but I could not find any answer for it on Google.

I am working on a text mining task and am currently cleaning the data. Far too often I come across mystery character sequences that are not in a readable format.

These sequences look like this: &#x003b2, &#x00025 and so on.

All of them start with the same pattern, so I believe they represent some encoding that Excel cannot display.

Is there any way to convert them? I need to know what exactly these characters mean in order to know if I should remove them or not.

Tunaki
Keval Shah

2 Answers

3

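Those are probably Unicode characters written as HTML numeric character references (often loosely called HTML entities) in hexadecimal format. The digits after &#x are the character's Unicode code point in hex; the standard form also ends with a semicolon, which is missing here.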

  • &#x003b2 is the "GREEK SMALL LETTER BETA" (β).
  • &#x00025 is the "PERCENT SIGN" (%).
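If the data can be run through a script before it reaches Excel, the references can be turned back into the characters they stand for. Below is a minimal sketch in Python, assuming the text is available as ordinary strings; the function name `decode_char_refs` and the sample sentence are made up for illustration. It relies on the standard library's `html.unescape`, which also accepts numeric references that lack the trailing semicolon.

```python
import html


def decode_char_refs(text: str) -> str:
    """Replace HTML character references in text with the characters they encode."""
    # html.unescape handles hexadecimal references such as &#x003b2,
    # with or without the closing semicolon. It also decodes named
    # entities such as &amp;, which may or may not be wanted.
    return html.unescape(text)


# Hypothetical sample line, not taken from the asker's data.
sample = "dose 5 &#x003b2 units, up 10&#x00025 overall"
print(decode_char_refs(sample))
```

One caveat: without a semicolon the end of a reference is ambiguous. If a reference is immediately followed by a character that is also a valid hex digit (0-9, a-f), that character is read as part of the code point, so it is worth spot-checking the decoded output.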
Tunaki
  • I am not sure about Unicode, but the percent sign and beta character make perfect sense in my data, since these characters always appear after numbers. – Keval Shah Nov 23 '15 at 17:53
2

They look like formatted hexadecimal values (possibly Unicode code points, if you're working with characters). You may have seen the same values written as 0x003B2 and 0x00025, or in many other notations.
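To decide whether such values should be kept or removed, it can help to list every distinct one in the data along with what it means. The sketch below is one possible way to do that in Python, assuming the references always have the `&#x` plus hex digits shape shown in the question; the regular expression, the function name `inventory` and the sample string are assumptions for illustration, not part of the original data.

```python
import re
import unicodedata
from collections import Counter

# Matches hexadecimal references like &#x003b2, semicolon optional.
CHARREF = re.compile(r"&#[xX]([0-9a-fA-F]+);?")


def inventory(text):
    """Return (hex code, character, Unicode name, count) for each reference found."""
    # Count by integer value so that 003b2 and 3B2 are treated as the same code point.
    counts = Counter(int(m.group(1), 16) for m in CHARREF.finditer(text))
    rows = []
    for code_point, n in counts.most_common():
        if code_point > 0x10FFFF:
            # Not a valid Unicode code point; report it instead of failing.
            rows.append((f"0x{code_point:X}", None, "INVALID", n))
            continue
        char = chr(code_point)
        rows.append((f"0x{code_point:04X}", char, unicodedata.name(char, "UNKNOWN"), n))
    return rows


# Hypothetical sample text.
for row in inventory("12&#x00025 and 3 &#x003b2, 40&#x00025"):
    print(row)
```

Each row shows the same value in the `0x` notation, the character itself, and its official Unicode name, which should be enough to judge whether the character carries meaning in the data.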

Kayaman