I am parsing various .docx documents but the part of my code that splits paragraphs when it encounters "\n" is adding a new line when it encounters this weird symbol (circled in yellow):

[Screenshot: Word document with the symbol circled in yellow]

Could someone tell me which non-printable character this is, and how I can replace it with a normal " " space?

(I can't just copy and paste it into a replace() call, because when I do, the character gets interpreted as a \n. But as you can see when I click the "show non-printable characters" button in Word, Word is not treating it as a real Enter: if it were, it would show the inverted P character instead of the weird enter sign. I hope I explained myself. Thanks so much for the help!)

1 Answer

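I believe you'll find this character is a line break, that is, a manual break inside a paragraph rather than the end of the paragraph, which is why Word shows it with a different mark than the inverted P. In python-docx, the str value of paragraph.text represents a line break with "\n". You can have those mapped to a space (" ") instead using: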

paragraph_text = paragraph.text.replace("\n", " ")
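
If the same cleanup is needed for every paragraph, the one-liner above could be wrapped in a small helper. The following is only a sketch: the function names `flatten_line_breaks` and `paragraphs_as_single_lines` are made up for illustration, and handling "\v" and U+2028 in addition to "\n" is an assumption for text that reaches your code by some route other than paragraph.text (for example, text pasted from Word). Unlike a plain replace(), it also collapses a run of consecutive breaks into a single space.

```python
import re

# Characters treated as in-paragraph line breaks. "\n" is what
# paragraph.text yields for a line break; "\v" (U+000B) and U+2028
# are extra candidates assumed here, not something python-docx emits.
_LINE_BREAKS = re.compile(r"[\n\v\u2028]+")


def flatten_line_breaks(text):
    """Replace each run of line-break characters with one space."""
    return _LINE_BREAKS.sub(" ", text)


def paragraphs_as_single_lines(paragraphs):
    """Return the text of each paragraph with its line breaks flattened.

    `paragraphs` is any iterable of objects with a `.text` attribute,
    such as the `paragraphs` list of a python-docx Document.
    """
    return [flatten_line_breaks(p.text) for p in paragraphs]
```

With python-docx installed, this could be called as paragraphs_as_single_lines(Document("example.docx").paragraphs), where "example.docx" is a placeholder file name. Splitting on "\n" afterwards would then no longer break a paragraph in two at a manual line break.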
scanny