
I want to delete text from multiple .docx files using Python.

Let's say the contents of a file are:

This is line 1
This is line 2
This is line 3
This is line 4

I want to delete only the very last line, i.e. "This is line 4".

I've tried several approaches, but none of them work correctly.

Try 1:

with open(r"FILE_PATH.docx", 'r+', errors='ignore') as fp:
    # read and store all lines in a list
    lines = fp.readlines()
    # move file pointer to the beginning of a file
    fp.seek(0)
    # truncate the file
    fp.truncate()

    # start writing lines except the last line
    # lines[:-1] from line 0 to the second last line
    fp.writelines(lines[:-1])

The code above runs without errors, but some data in the docx file is lost.

See the relevant screenshots here and here.

1 Answer


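You will not get the correct lines from a docx with that method, because a docx is not a plain text file: it is a ZIP archive of XML files, so readlines() returns chunks of compressed binary data rather than the lines you see in Word. (The same code works as expected on a .txt file.)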

Do this and you can see what you are removing:

with open(r"FILE_PATH.docx", 'r+', errors='ignore') as fp:
    # read and store all lines in a list
    lines = fp.readlines()
    print(lines[-1]) # or print(lines) to see all the lines

You are not removing "This is line 4"; you are removing part of the docx file itself.
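
To see what a docx really contains, you can open it as a ZIP archive with the standard library. This is only a small sketch; the helper name list_docx_parts is made up for illustration.

```python
import zipfile


def list_docx_parts(path):
    """Return the names of the files stored inside a .docx archive."""
    with zipfile.ZipFile(path) as archive:
        return archive.namelist()
```

For a file saved by Word, the returned list should include entries such as [Content_Types].xml and word/document.xml; the visible text lives inside word/document.xml.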

Although there are ways to read a docx without additional libraries, using something like docx2txt or textract might be easier.
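
As a rough sketch of the "no additional libraries" route, the paragraph text can be pulled out of word/document.xml with zipfile and xml.etree.ElementTree. The function name read_docx_paragraphs is hypothetical, and the sketch makes simplifying assumptions: it only looks at the main document part and ignores headers, footers, tabs and manual line breaks.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def read_docx_paragraphs(path):
    """Return the text of each paragraph found in word/document.xml."""
    with zipfile.ZipFile(path) as archive:
        xml_bytes = archive.read("word/document.xml")
    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # A paragraph's text may be split across several <w:t> elements.
        text = "".join(node.text or "" for node in para.iter(W_NS + "t"))
        paragraphs.append(text)
    return paragraphs
```

With the example file from the question, read_docx_paragraphs(r"FILE_PATH.docx")[:-1] would give every line except the last one. Note that this only reads the text; it does not write a modified docx back to disk.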

There are other questions on Stack Overflow that cover how to read and modify a docx. Take a look at them and you should find a way to adapt your code, if a docx is still what you want to work with.

Isaac Rene
Got my answer bro. So, I just edited the string I'm passing to the docx file with one line of code: `result = text[: text.rfind('\n')]` – Nitin Kumar Oct 21 '22 at 19:35
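
The one-liner from the comment works when the text has at least two lines and no trailing newline. If the string contains no newline at all, rfind returns -1 and the slice silently drops the last character instead; if it ends with a newline, only that newline is removed. A slightly more defensive sketch is shown here, where drop_last_line is a made-up helper name:

```python
def drop_last_line(text):
    """Return text without its final line."""
    # Ignore trailing newlines so that "a\nb\n" counts as two lines.
    stripped = text.rstrip("\n")
    cut = stripped.rfind("\n")
    # rfind returns -1 when no newline is left, i.e. there is a single line.
    return "" if cut == -1 else stripped[:cut]


text = "This is line 1\nThis is line 2\nThis is line 3\nThis is line 4"
result = drop_last_line(text)
print(result)
```

For the sample text this prints the first three lines and leaves out "This is line 4".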