I've run OCR on a PDF image and extracted the text. For some reason, the OCR has converted each single space into a double carriage return/line feed.

e.g.

"\r\n\r\n"

The following doesn't work, I think because my 4 characters are not really a string but 4 non-printable characters.

DocumentData = DocumentData.Replace(@"\r\n\r\n", "");

I only want to replace those 4 non-printable characters with a space when they occur together.

How can this be achieved without too much fuss?

Ghasem
ErickTreetops

3 Answers

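The problem is the "@". Prefixing a string literal with it makes it a verbatim string, so escape sequences are not processed: @"\r\n\r\n" is eight literal characters (backslash, r, backslash, n, and so on) rather than the four control characters CR, LF, CR, LF, so it never matches the OCR output. Just use: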

DocumentData = DocumentData.Replace("\r\n\r\n", " ");
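A minimal sketch of the difference between the two literals, assuming an ordinary .NET project; the class `OcrCleanup` and its members are hypothetical names for illustration. The regular literal holds four control characters and the verbatim one holds eight printable characters, so only the regular literal matches the OCR output:

```csharp
using System;

// Hypothetical helper illustrating the fix described above.
public static class OcrCleanup
{
    // Regular literal: \r and \n are escape sequences -> 4 chars (CR, LF, CR, LF).
    public const string DoubleCrlf = "\r\n\r\n";

    // Verbatim literal: backslashes are kept as-is -> 8 chars (\, r, \, n, \, r, \, n).
    public const string VerbatimText = @"\r\n\r\n";

    // Replaces each CR LF CR LF sequence with a single space.
    // A lone "\r\n" is left untouched because it is not the full 4-char sequence.
    public static string CollapseDoubleCrlf(string text)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        return text.Replace(DoubleCrlf, " ");
    }
}
```

For example, `OcrCleanup.CollapseDoubleCrlf("Hello\r\n\r\nworld")` returns `"Hello world"`, while calling `Replace(OcrCleanup.VerbatimText, " ")` on the same input leaves it unchanged.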
Chris Tomich

Is this what you want?

DocumentData = DocumentData.Replace("\r\n\r\n", " "); // <-- change "" to " ", remove @ char
NoName

If you want to ensure it doesn't matter what system you (or the sender) are running on, and that you always catch the non-printable characters, I would use regular expressions:

DocumentData = Regex.Replace(DocumentData, @"\r\n?|\n|\r|\s+", " ");

Edit: Made the expression a touch more robust: it now also matches extra whitespace and replaces it with a single space, to avoid excessive spacing after the replacement, which makes it more specific to this question. My bad.
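To see what this pattern does to the question's input, here is a small sketch; the wrapper class `GabeRegexDemo` is a hypothetical name, and the pattern is copied unchanged from the answer. Because .NET tries alternatives left to right, `\r\n?` wins at each line break, so a doubled CRLF becomes two spaces and every single line break also becomes a space. The `\s+` branch only gets a chance when a whitespace run starts with something other than CR or LF, such as a space or tab:

```csharp
using System.Text.RegularExpressions;

// Hypothetical wrapper around the pattern from the answer above (copied unchanged).
public static class GabeRegexDemo
{
    // Alternatives are tried left to right at each position:
    // \r\n? (CR with optional LF), \n, \r, then \s+ (any run of whitespace).
    private static readonly Regex LineBreakOrWhitespace =
        new Regex(@"\r\n?|\n|\r|\s+");

    // Replaces every match with a single space.
    public static string Normalize(string text) =>
        LineBreakOrWhitespace.Replace(text, " ");
}
```

With this input, `GabeRegexDemo.Normalize("Hello\r\n\r\nworld")` yields `"Hello  world"` (two spaces), whereas `Normalize("a  \t b")` yields `"a b"`.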

Gabe
  • Thank you, I did end up using regex, but I'm no expert in it. My pattern was slightly different: it was @"\r\n\r\n", which only replaces where all 4 non-printable characters occur together. Won't yours also replace single "\r\n", "\r" and "\n" occurrences? What does the "\s+" do? – ErickTreetops Feb 25 '16 at 23:52
  • @user1413844 - it collapses multiple whitespace characters in a string. It's there because this regex finds each line-break character and replaces it with a space, so you can end up with several spaces in a row. Adding `\s+` resolves this, as the same pattern also matches runs of whitespace and replaces them with a single space – Gabe Feb 26 '16 at 13:17
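The pattern @"\r\n\r\n" from the comment above works with `Regex.Replace` even though it is a verbatim string, because there the regex engine, not the C# compiler, interprets `\r` and `\n`. A minimal sketch that replaces only a doubled line break; the class and method names are hypothetical, and the LF-tolerant variant is an assumption beyond the question, for text that may have passed through a system using `\n` line endings:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helpers; names are illustrative only.
public static class DoubleBreakCleanup
{
    // Verbatim pattern as in the asker's comment: the regex engine turns
    // \r and \n into CR and LF, so this matches exactly CR LF CR LF
    // and leaves a single "\r\n" alone.
    public static string ReplaceExact(string text) =>
        Regex.Replace(text, @"\r\n\r\n", " ");

    // Assumed variant: also accepts LF-only line endings ("\n\n") or a mix,
    // but still requires two consecutive line breaks.
    public static string ReplaceTolerant(string text) =>
        Regex.Replace(text, @"(?:\r\n|\n){2}", " ");
}
```

Unlike the catch-all pattern, neither method touches a single line break, which matches the asker's requirement of replacing the four characters only when they occur together.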