9

I have a .rtf file and I want to read it and store its strings in a list using Python 3. Any package is fine, but it must work on both Windows and Linux.

I have tried striprtf, but read_rtf is not working.

from striprtf.striprtf import rtf_to_text
from striprtf.striprtf import read_rtf
rtf = read_rtf("file.rtf")
text = rtf_to_text(rtf)
print(text)

But this code fails with the error: cannot import name 'read_rtf'

Can anyone suggest a way to get the strings from a .rtf file in Python 3?

khelwood
RajAt SiNha

4 Answers

6

rtf_to_text is enough to convert RTF into a string in Python. Read the content of the RTF file and pass it to rtf_to_text:

from striprtf.striprtf import rtf_to_text

with open("yourfile.rtf") as infile:
    content = infile.read()
    text = rtf_to_text(content)
print(text)
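
Since the question asks for a list of strings, here is a minimal sketch that builds on the snippet above. The helper name rtf_to_lines and its convert parameter are my own, not part of striprtf. The converter is passed in so that any function that turns RTF source into plain text can be used. Reading as UTF-8 with errors="replace" is an assumption that keeps the behaviour identical on Windows and Linux; adjust it if your file uses a different encoding.

```python
from pathlib import Path


def rtf_to_lines(path, convert):
    """Return the non-empty text lines of an RTF file as a list of strings.

    `convert` is any callable that turns RTF source into plain text,
    for example striprtf's rtf_to_text (hypothetical helper, not a striprtf API).
    """
    # An explicit encoding avoids depending on the platform default.
    rtf_source = Path(path).read_text(encoding="utf-8", errors="replace")
    plain_text = convert(rtf_source)
    # Split on line breaks and drop lines that are empty after stripping.
    return [line.strip() for line in plain_text.splitlines() if line.strip()]


# Usage, assuming striprtf is installed:
#   from striprtf.striprtf import rtf_to_text
#   lines = rtf_to_lines("yourfile.rtf", rtf_to_text)
```

Blank lines are dropped here; remove the `if line.strip()` filter if you need to keep them.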
Jeremy Caney
user17725480
5

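Have you tried reading the file directly? Note that this gives you the raw RTF source, control words included, not the plain text: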

with open('yourfile.rtf', 'r') as file:
    text = file.read()
print(text)

For a very large file, iterate over it line by line instead of reading it all at once:

with open("yourfile.rtf") as infile:
    for line in infile:
        do_something_with(line)
Binh
3

Try using this:

from striprtf.striprtf import rtf_to_text

sample_text = "any RTF content as a string"
text = rtf_to_text(sample_text)
Yang Yushi
1

Reading an RTF file and manipulating the data inside it is tricky, and what works depends on the file you have. I tried all of the above and nothing worked; finally, the following code worked for me. Note that it relies on win32com and Microsoft Word, so it runs on Windows only. I hope it helps those who are hunting for a solution.

from win32com.client import Dispatch
 
word = Dispatch('Word.Application') # Open word application
 # word = DispatchEx('Word.Application') # start a separate process
word.Visible = 0 # Run in the background, no display
word.DisplayAlerts = 0 # No warning
 
path = r'C:\Projects\10.1\power.rtf' 
doc = word.Documents.Open(FileName=path, Encoding='gbk')
 
for para in doc.paragraphs:
    print(para.Range.Text)
 
doc.Close()
word.Quit()

If you want to store the text in a single variable, the following code does that.

from win32com.client import Dispatch
 
word = Dispatch('Word.Application') # Open word application
 # word = DispatchEx('Word.Application') # start a separate process
word.Visible = 0 # Run in the background, no display
word.DisplayAlerts = 0 # No warning
 
path = r'C:\Projects\10.1\output_5.rtf' # Use an absolute path; a relative path will fail
doc = word.Documents.Open(FileName=path, Encoding='gbk')

#for para in doc.paragraphs:
#    print(para.Range.Text)


content = '\n'.join([para.Range.Text for para in doc.paragraphs])

print(content)

doc.Close()
word.Quit()
Buddhadeb Mondal