I'm testing an SDK that extracts text from a searchable PDF. One of the SDK's dependencies was recently updated, and the update causes an existing test on Hebrew text to fail. I don't know Hebrew, nor do I know enough about how the technologies involved represent right-to-left languages.
The NUnit test asserts that the extracted text matches the C# string "מנבוצץז ".
    string hebrewText = reader.ReadToEnd();
    Assert.AreEqual("מנבוצץז ", hebrewText);
The rasterized PDF has what I believe are the same characters, but in the opposite order.
The unit test fails with this message:
    Expected: "מנבוצץז "
    But was: " זץצובנמ"
Although the actual result more closely matches what I see in the rasterized PDF, I'm not completely sure the original test is wrong.
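Part of my uncertainty is that editors, consoles, and browsers may reorder Hebrew characters for display, so what I see on screen may not reflect the order in memory. A minimal sketch of how the two strings could be compared independently of rendering is to dump their UTF-16 code units in logical (in-memory) order. `CodePointDump` and `Dump` are hypothetical names I made up for illustration, not part of the SDK or NUnit, and the two literals are simply copied from the failure message above:

```csharp
using System;
using System.Linq;

public static class CodePointDump
{
    // Formats each UTF-16 code unit as U+XXXX, in the order the
    // characters are stored in the string (index 0 first).
    public static string Dump(string s) =>
        string.Join(" ", s.Select(c => $"U+{(int)c:X4}"));

    public static void Main()
    {
        // Literals copied from the NUnit failure message.
        string expected = "מנבוצץז ";
        string actual = " זץצובנמ";

        Console.WriteLine("Expected: " + Dump(expected));
        Console.WriteLine("Actual:   " + Dump(actual));
    }
}
```

Because the output contains only ASCII hex values, it should not be affected by any right-to-left display logic, which would at least show which character really comes first in each string.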
- Are Hebrew characters in a C# string supposed to be read right-to-left like printed Hebrew text?
- Does any part of the .NET stack change the order of the characters in a Hebrew string?
- Does NUnit, for example when it compares the strings or formats the failure message?
- Are the Hebrew characters embedded in a searchable PDF normally stored in the same order as they appear in the rasterized text?
- Anything else I should know before deciding whether to "fix" this unit test?