
I am processing textual data. Some words are wrapped in special characters I do not understand. Their hex codes are D5 and E5:

xxd output:

0000b60: 6520 5375 7072 656d 6520 436f 7572 7420  e Supreme Court 
0000b70: 616e 6420 736f 6d65 2073 7461 7465 7320  and some states 
0000b80: 6861 7665 2072 6563 6f67 6e69 7a65 6420  have recognized 
0000b90: 7468 6174 2022 7468 6520 d546 6966 7468  that "the .Fifth    # Here
0000ba0: e520 416d 656e 646d 656e 7420 646f 6573  . Amendment does    # and here.
0000bb0: 206e 6f74 2070 7265 636c 7564 6520 7468   not preclude th
0000bc0: 6520 696e 6665 7265 6e63 6520 7768 6572  e inference wher
0000bd0: 6520 7468 6520 7072 6976 696c 6567 6520  e the privilege 
0000be0: 6973 2063 6c61 696d 6564 2062 7920 6120  is claimed by a 
0000bf0: 7061 7274 7920 746f 2061 2063 6976 696c  party to a civil

Does anyone have an idea what encoding these characters could come from and what they could mean?
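
One way to narrow this down is to locate every non-ASCII byte and see which character each candidate encoding assigns to it. The following is only a minimal sketch, assuming Python 3: the helper names `find_non_ascii` and `decode_candidates` and the list of candidate encodings are illustrative choices, not a complete survey, and the sample bytes are copied from the dump above. If the bytes are program-specific markup rather than text, no encoding will give a sensible answer.

```python
import unicodedata

# Candidate single-byte encodings to try (an assumed, non-exhaustive list).
CANDIDATES = ["latin-1", "cp1252", "mac_roman", "cp437", "cp850", "koi8_r"]


def find_non_ascii(data: bytes):
    """Return (offset, byte value) pairs for every byte >= 0x80."""
    return [(i, b) for i, b in enumerate(data) if b >= 0x80]


def decode_candidates(byte_value: int, encodings=CANDIDATES):
    """Map each encoding name to the character it assigns to byte_value."""
    result = {}
    for enc in encodings:
        try:
            result[enc] = bytes([byte_value]).decode(enc)
        except UnicodeDecodeError:
            result[enc] = None  # the byte is undefined in this encoding
    return result


# Bytes around the two unknown characters, taken from the xxd dump.
sample = b'that "the \xd5Fifth\xe5 Amendment'

for offset, value in find_non_ascii(sample):
    print(f"offset {offset}: 0x{value:02X}")
    for enc, ch in decode_candidates(value).items():
        if ch is None:
            print(f"  {enc:10} undefined")
        else:
            # Print the code point and Unicode name so the output stays ASCII.
            print(f"  {enc:10} U+{ord(ch):04X} {unicodedata.name(ch, '?')}")
```

For a whole file, `find_non_ascii(open(path, "rb").read())` would list every suspicious offset, where `path` stands for the data file in question.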

choroba
    Any info about the origin of the text? Was it really plain text? It’s difficult to imagine what text characters could appear there. I thought of single quotes, presumably munged by an incorrect character code conversion, but that does not sound plausible. They might instead be internal codes used, e.g., to indicate the start and end of bolding. This would be program-specific. – Jukka K. Korpela May 24 '13 at 16:44
  • @JukkaK.Korpela: The data come from http://www.ldc.upenn.edu/Catalog/catalogEntry.jsp?catalogId=LDC2008T05 . We own the original texts, too, and they do not contain any characters at those positions. – choroba May 24 '13 at 17:50

0 Answers