I am reading a table from a Word file. Below is the code I use to read the file:

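  import string  # needed for the string.printable filter used further down

  import win32com.client as win32

  word = win32.Dispatch("Word.Application")
  word.Visible = 0
  word.Documents.Open(SigLexiconFilePath)
  doc = word.ActiveDocument
  table = doc.Tables(1)

  # Word table rows and columns are 1-based; the loop starts at row 2
  for i in range(2, table.Rows.Count + 1):
      commandName = table.Cell(Row=i, Column=1).Range.Text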

All commandName values have German characters and 2 non-printable characters at the end of the string. For example:

commandName = Prüf\r\x07

I used the code below to remove the non-printable characters, but it also removes the German characters from the string.

commandName = ''.join(filter(lambda x: x in string.printable, commandName))
commandName = commandName.strip()

Is there a Pythonic way to remove the unwanted characters from the string? The final output I want is:

commandName = Prüf

Anudocs
  • "All commandName has [...] and 2 non-printable characters ..." If that's always the case, just always remove the last 2 characters. – Mike Scotty Nov 06 '19 at 09:23
  • The reason for these characters: This content comes from a table cell. Those are the structural control characters for the table cell. – Cindy Meister Nov 06 '19 at 10:35
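
To illustrate the two comments above: the example in the question shows the cell text ending with a carriage return (`\r`) followed by the end-of-cell marker (`\x07`). Below is a minimal sketch, assuming those two trailing characters are the only ones that need to go. `clean_cell_text` is a hypothetical helper name, not part of pywin32 or Word.

```python
def clean_cell_text(text):
    """Strip the trailing control characters from a Word table cell's text."""
    # rstrip removes any trailing run of the listed characters only,
    # so non-ASCII letters such as "ü" are left untouched.
    return text.rstrip("\r\x07")


cleaned = clean_cell_text("Prüf\r\x07")  # "Prüf"
```

Unlike slicing off the last two characters, this leaves the string unchanged if the markers happen to be absent.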

1 Answer

This simple line worked for me:

commandName = commandName.split('\r')[0]
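
Note that `split` returns a list, so the cleaned text is its first element. As a more general alternative, here is a sketch that filters with `str.isprintable()`, which is Unicode-aware, whereas `string.printable` contains only ASCII characters (which is why the filter in the question dropped "ü"). `remove_control_chars` is a hypothetical helper name used for illustration.

```python
def remove_control_chars(text):
    """Keep only the characters that Python considers printable."""
    # str.isprintable() is Unicode-aware: "ü" counts as printable,
    # while "\r" and "\x07" do not. Note that it also drops "\n" and "\t".
    return "".join(ch for ch in text if ch.isprintable())


raw = "Prüf\r\x07"
first_part = raw.split("\r")[0]       # split-based approach from this answer
filtered = remove_control_chars(raw)  # filter-based alternative
```

The split-based version is enough when the text contains no other carriage returns. The filter is safer if a cell may hold several paragraphs, at the cost of joining them without a separator.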
Anudocs