I'm trying to tweak a subtitle file by removing certain parts. To do that, I need to be able to read and write every symbol that the file (.txt) contains. Most of the characters are easy to recognize, but there is one that I am unable to detect: the curly (smart) apostrophe.

[Image]

You can see it being used in the 3rd line's first word.

If anyone has an idea how to recognize this character, please share it. Any help would be appreciated.

BuderBrodas
  • There are many tools that will dump out the contents of the file in hexadecimal. Use that to figure out the byte sequence of the character in question. Mission accomplished. – Sam Varshavchik May 22 '21 at 16:16
  • Could be this one (0x2019): https://www.fileformat.info/info/unicode/char/2019/index.htm. That page also shows the encoding in UTF-8. – Paul Sanders May 22 '21 at 16:22
  • @PaulSanders Could you please provide a code sample showing what the comparison is supposed to look like? My attempt was: `if(First_Word[5] == '\u2019') cout<<"Found it";` Am I doing it wrong? – BuderBrodas May 22 '21 at 16:34
  • @SamVarshavchik Could you please give an example of such software? In addition, let's say I get the byte sequence of that character: how am I supposed to use it? Should I just put the bytes between a pair of single quotation marks? – BuderBrodas May 22 '21 at 16:42
  • Why, [it's `od`, of course](https://man7.org/linux/man-pages/man1/od.1.html). And the way that you're "supposed to use it" is that once you know which character sequence it is, you write the appropriate logic for it. You must be aware that, for example, instead of comparing something to `'A'` you can compare it with 65, the ASCII code for `'A'`. Well, all this would be is a sequence of one or more bytes that you will simply need to replace when reading the file. Every C++ textbook gives several examples of these kinds of things; which textbook are you using? – Sam Varshavchik May 22 '21 at 17:05
  • You could cut + paste a smart quote into your code. Not sure if that would work. Or you can write a mini-program that dumps the hex value of a file that contains only a smart quote, then you'll know what to compare against. – Joseph Larson May 22 '21 at 20:56
  • If you're handling extended characters, you should probably be using wide strings (`std::wstring`). For some inspiration, see [here](https://stackoverflow.com/a/3322458/5743288). – Paul Sanders May 22 '21 at 21:02
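
To make the suggestions in the comments concrete, here is a minimal sketch of the byte-sequence approach. It assumes the file is UTF-8 and that the character really is U+2019, in which case it is stored as the three bytes `E2 80 99`. That is also why a comparison of one `char` against `'\u2019'` cannot match: the character does not fit in a single byte. If the file turns out to be Windows-1252 instead, the curly apostrophe is the single byte `0x92` and the constant below would need to change. The names `kCurlyApostrophe`, `to_hex`, `contains_curly_apostrophe`, and `replace_curly_apostrophes` are made up for this illustration.

```cpp
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>

// Assumption: the text is UTF-8, where U+2019 (RIGHT SINGLE QUOTATION MARK)
// is encoded as the three bytes E2 80 99.
const std::string kCurlyApostrophe = "\xE2\x80\x99";

// Returns the bytes of `text` as space-separated two-digit hex values,
// similar to what a hex dump tool such as `od` would show.
std::string to_hex(const std::string& text) {
    std::ostringstream out;
    out << std::hex << std::uppercase << std::setfill('0');
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (i != 0) out << ' ';
        // Go through unsigned char so bytes >= 0x80 are not sign-extended.
        out << std::setw(2)
            << static_cast<int>(static_cast<unsigned char>(text[i]));
    }
    return out.str();
}

// True if `text` contains the UTF-8 byte sequence for U+2019.
bool contains_curly_apostrophe(const std::string& text) {
    return text.find(kCurlyApostrophe) != std::string::npos;
}

// Returns a copy of `text` with every curly apostrophe replaced by a
// plain ASCII apostrophe (').
std::string replace_curly_apostrophes(std::string text) {
    std::size_t pos = 0;
    while ((pos = text.find(kCurlyApostrophe, pos)) != std::string::npos) {
        text.replace(pos, kCurlyApostrophe.size(), "'");
        pos += 1;  // continue searching after the inserted apostrophe
    }
    return text;
}
```

Running `to_hex` on a line read with `std::getline` is a quick way to check which bytes the file actually contains before relying on the constant. Searching for a substring with `std::string::find` sidesteps the indexing problem in `First_Word[5]`, since in UTF-8 the index of a byte is not the index of a character.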
