Remove very last character in file

Question

After looking all over the Internet, I've come to this.

Let's say I have already made a text file that reads: Hello World

Well, I want to remove the very last character (in this case d) from this text file.

So now the text file should look like this: Hello Worl

But I have no idea how to do this.

All I want, more or less, is a single backspace function for text files on my HDD.

This needs to work on Linux as that's what I'm using.

Martijn Pieters · Accepted Answer · 2020-01-24T21:19:26.063

77

Use fileobject.seek() to seek 1 position from the end, then use file.truncate() to remove the remainder of the file:

import os

with open(filename, 'rb+') as filehandle:
    filehandle.seek(-1, os.SEEK_END)
    filehandle.truncate()

This works fine for single-byte encodings. If you have a multi-byte encoding (such as UTF-16 or UTF-32) you need to seek back enough bytes from the end to account for a single codepoint.
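For a fixed-width encoding the same idea can be sketched by stepping back one code unit instead of one byte. The following is a minimal sketch, assuming the file is UTF-32 (always 4 bytes per code point); the function name `remove_last_utf32_char` is made up for illustration. UTF-16 would need extra care, because code points outside the Basic Multilingual Plane take 4 bytes there instead of 2.

```python
import os


def remove_last_utf32_char(path):
    """Drop the last code point of a UTF-32 file (4 bytes per code point).

    Hypothetical helper: it does not check for a byte-order mark, so a file
    that contains only a BOM would lose the BOM itself.
    """
    with open(path, 'rb+') as f:
        f.seek(0, os.SEEK_END)   # jump to the end to learn the size
        size = f.tell()
        if size >= 4:
            f.truncate(size - 4)  # cut exactly one 4-byte code point
```

Because UTF-32 never splits a code point across a variable number of bytes, no scanning is needed here.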

For variable-byte encodings, it depends on the codec if you can use this technique at all. For UTF-8, you need to find the first byte (from the end) where bytevalue & 0xC0 != 0x80 is true, and truncate from that point on. That ensures you don't truncate in the middle of a multi-byte UTF-8 codepoint:

with open(filename, 'rb+') as filehandle:
    # move to the last byte, then scan backward until a non-continuation byte is found
    filehandle.seek(-1, os.SEEK_END)
    # read(1) returns a bytes object in Python 3, so convert it to an int first
    while ord(filehandle.read(1)) & 0xC0 == 0x80:
        # we just read 1 byte, which moved the file position forward,
        # skip back 2 bytes to move to the byte before the current.
        filehandle.seek(-2, os.SEEK_CUR)

    # last read byte is our truncation point, move back to it.
    filehandle.seek(-1, os.SEEK_CUR)
    filehandle.truncate()

Note that UTF-8 is a superset of ASCII, so the above works for ASCII-encoded files too.
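To see why the continuation-byte check matters, here is a small sketch that applies the plain one-byte truncation to a UTF-8 file whose last character takes two bytes. The helper name `chop_last_byte` and the temporary file are only for illustration.

```python
import os
import tempfile


def chop_last_byte(path):
    """Remove exactly one byte from the end of the file."""
    with open(path, 'rb+') as f:
        f.seek(-1, os.SEEK_END)
        f.truncate()


fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, 'w', encoding='utf-8') as f:
    f.write('caf\u00e9')  # the final character is 2 bytes in UTF-8: 0xC3 0xA9

chop_last_byte(path)
with open(path, 'rb') as f:
    data = f.read()
os.remove(path)

try:
    data.decode('utf-8')
    decoded_ok = True
except UnicodeDecodeError:
    # only the lead byte 0xC3 is left, which is not valid UTF-8 on its own
    decoded_ok = False

print(data, decoded_ok)
```

The file ends up with a dangling lead byte, which is exactly the situation the backward scan for a non-continuation byte avoids.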

edited Jan 24 '20 at 21:19

answered Sep 17 '13 at 18:36


According to [1] "SEEK_END or 2: seek to the end of the stream; offset must be zero (all other values are unsupported)." 1: https://docs.python.org/3/library/io.html?highlight=newline#io.TextIOBase.seek – zvyn Mar 16 '16 at 03:07

@zvyn: You are looking at the wrong documentation. See [`io.IOBase.seek()`](https://docs.python.org/3/library/io.html?highlight=newline#io.IOBase.seek) instead. The file is opened in *binary mode*, not text mode. In text mode the offsets depend on the encoding of the text which can use variable-length bytes; which is why the `TextIOBase.seek()` method doesn't support seeking backwards. But in binary mode we seek by bytes instead and negative offsets from the end are perfectly legal. – Martijn Pieters Mar 16 '16 at 08:35
This seems to take a very long time with large files (i.e. > 10GB). There must be some file reading or copying going on. The truncate command worked better for me but maybe I did something wrong. – shrewmouse Jun 12 '20 at 12:20
@shrewmouse: I've used this on very, very large files without issue. Without details on OS or filesystem, there isn't much I can do to help you debug why you ran into issues. – Martijn Pieters Jun 18 '20 at 19:50

score 11 · Answer 2 · answered Feb 04 '17 at 13:36

Martijn's accepted answer is simple and mostly works, but it does not account for text files with:

UTF-8 encoding containing non-English characters (which is the default encoding for text files in Python 3)
one newline character at the end of the file (which is the default in Linux editors like vim or gedit)

If the text file contains non-English characters, neither of the answers provided so far would work.

What follows is an example that solves both problems and also allows removing more than one character from the end of the file:

import os


def truncate_utf8_chars(filename, count, ignore_newlines=True):
    """
    Truncates last `count` characters of a text file encoded in UTF-8.
    :param filename: The path to the text file to read
    :param count: Number of UTF-8 characters to remove from the end of the file
    :param ignore_newlines: Set to true, if the newline character at the end of the file should be ignored
    """
    with open(filename, 'rb+') as f:
        last_char = None

        size = os.fstat(f.fileno()).st_size

        offset = 1
        chars = 0
        while offset <= size:
            f.seek(-offset, os.SEEK_END)
            b = ord(f.read(1))

            if ignore_newlines:
                if b == 0x0D or b == 0x0A:
                    offset += 1
                    continue

            if b & 0b10000000 == 0 or b & 0b11000000 == 0b11000000:
                # This is the first byte of a UTF8 character
                chars += 1
                if chars == count:
                    # When `count` number of characters have been found, move current position back
                    # with one byte (to include the byte just checked) and truncate the file
                    f.seek(-1, os.SEEK_CUR)
                    f.truncate()
                    return
            offset += 1

How it works:

Reads only the last few bytes of a UTF-8 encoded text file in binary mode
Iterates the bytes backwards, looking for the start of a UTF-8 character
Once `count` characters (not counting trailing newlines) have been found, truncates the file at the first byte of the last one found

Sample text file - bg.txt:

Здравей свят

How to use:

filename = 'bg.txt'
print('Before truncate:', open(filename).read())
truncate_utf8_chars(filename, 1)
print('After truncate:', open(filename).read())

Outputs:

Before truncate: Здравей свят
After truncate: Здравей свя

This works with both UTF-8 and ASCII encoded files.

score 10 · Answer 3 · answered Oct 15 '18 at 01:45


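If you have the file open in text mode rather than binary mode (for example because you are writing it with 'w'), I can suggest the following.

f.seek(f.tell() - 1, os.SEEK_SET)
f.truncate()  # f.write('') alone would leave the file length unchanged

Here f.seek() is given an absolute position computed from f.tell(), because text-mode files do not support seeking relative to the end with a nonzero offset. That puts the cursor at the start of the last character, and f.truncate() then cuts the file there. Note that arithmetic on f.tell() in text mode is only reliable when the last character is a single byte.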
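As a rough sketch of the text-mode approach, the snippet below writes some text and then drops the character just written, for example a trailing comma. It assumes ASCII-only output so that one character equals one byte; the function name `write_dropping_last_char` is hypothetical.

```python
import os


def write_dropping_last_char(path, text):
    """Write `text` in text mode, then remove its final character.

    Assumes ASCII output, so stepping back one position equals one character.
    """
    with open(path, 'w', encoding='ascii', newline='') as f:
        f.write(text)
        if text:
            # text-mode files cannot seek backwards from the end,
            # so compute an absolute position from tell() instead
            f.seek(f.tell() - 1, os.SEEK_SET)
            f.truncate()  # cut the file at the current position
```

For example, `write_dropping_last_char('out.txt', '1,2,3,')` should leave `1,2,3` in the file.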


metinsenturk


Or cleaner to `f.truncate()` instead of `f.write('')` at the end. – Julian Feb 13 '20 at 16:23

dawg · Answer 4 · 2013-09-17T18:59:55.240

6

with open(urfile, 'rb+') as f:
    f.seek(0,2)                 # end of file
    size=f.tell()               # the size...
    f.truncate(size-1)          # truncate at that size - how ever many characters

Be sure to use binary mode on Windows, since line-ending translation in text mode may give an illegal or incorrect character count.

edited Sep 17 '13 at 18:59

answered Sep 17 '13 at 18:38


Coddy · Answer 5 · 2020-01-29T18:10:54.577

3

with open('file.txt', 'rb+') as f:  # 'w' would empty the file on open
    f.seek(0, 2)              # seek to end of file; same as f.seek(0, os.SEEK_END)
    f.seek(f.tell() - 2, 0)  # seek to the second-to-last byte; same as f.seek(f.tell() - 2, os.SEEK_SET)
    f.truncate()

Whether to step back 1 or 2 bytes depends on what the last character of the file is: it could be a newline (\n) or anything else.

edited Jan 29 '20 at 18:10

answered Jan 24 '20 at 21:56


Yes, but you haven't read the whole answer there. Look at the part labelled solution, the last code snippet. What is the *first* thing that that code does? – Martijn Pieters Jan 29 '20 at 17:36
Ah!! Got it, it has to be `f.seek(f.tell() - 2, 0)` – Coddy Jan 29 '20 at 17:44

And, more importantly, seeking to the very end first. – Martijn Pieters Jan 29 '20 at 18:07

score 1 · Answer 6 · answered Jul 26 '21 at 10:50

This may not be optimal, but if the above approaches don't work out, you could do:

with open('myfile.txt', 'r') as file:
    data = file.read()[:-1]
with open('myfile.txt', 'w') as file:
    file.write(data)

The code first opens the file and copies its content (except for the last character) to the string data. Afterwards, the file is truncated to zero length (i.e. emptied), and the content of data is saved to the file under the same name. This is basically the same as vins mv's answer, except that it doesn't use the os package and that it uses the safer 'with open' syntax. It may not be advisable if the text file is huge, since the whole file is held in memory. (I wrote this since none of the above approaches worked out too well for me in Python 3.8.)
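One variation on this read-and-rewrite idea, sketched here as an assumption rather than as part of the answer, is to write the shortened text to a temporary file and swap it in with `os.replace`, so the original is never left half-written if something fails. The function name `remove_last_char_rewrite` and the UTF-8 default are illustrative choices.

```python
import os
import tempfile


def remove_last_char_rewrite(path, encoding='utf-8'):
    """Rewrite `path` without its last character via a temporary file."""
    # newline='' keeps line endings exactly as they are in the file
    with open(path, 'r', encoding=encoding, newline='') as src:
        data = src.read()[:-1]

    # create the temp file next to the original so os.replace stays on one filesystem
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, 'w', encoding=encoding, newline='') as dst:
            dst.write(data)
        os.replace(tmp_path, path)  # swap the new file into place
    except BaseException:
        os.remove(tmp_path)  # clean up the temp file on any failure
        raise
```

Like the original, this still reads the whole file into memory, and the replaced file takes on the temporary file's permissions rather than the original's.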

score 0 · Answer 7 · answered Aug 27 '17 at 09:23


Here is a dirty way (erase and recreate). I don't advise using it, but it is possible:

import os

x = open("file").read()
os.remove("file")
open("file", "w").write(x[:-1])  # reopen for writing; the default read mode cannot write


vins mv


Manually `open`ing files is not recommended, `with open` syntax is better. – FatihAkici Jan 24 '20 at 21:24

score 0 · Answer 8 · answered Jun 12 '20 at 11:49

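On a Linux system (or Cygwin under Windows), you can use the standard truncate command. You can reduce or increase the size of your file with this command.

To shrink a file by 1G, the command is truncate -s -1G filename; the leading minus makes the size relative, whereas truncate -s 1G filename would set the size to exactly 1G. To remove only the last byte, use truncate -s -1 filename. In the following example I shrink a file called update.iso by 1G.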

Note that this operation took less than five seconds.

chris@SR-ENG-P18 /cygdrive/c/Projects
$ stat update.iso
  File: update.iso
  Size: 30802968576     Blocks: 30081024   IO Block: 65536  regular file
Device: ee6ddbceh/4000177102d   Inode: 19421773395035112  Links: 1
Access: (0664/-rw-rw-r--)  Uid: (1052727/   chris)   Gid: (1049089/Domain Users)
Access: 2020-06-12 07:39:00.572940600 -0400
Modify: 2020-06-12 07:39:00.572940600 -0400
Change: 2020-06-12 07:39:00.572940600 -0400
 Birth: 2020-06-11 13:31:21.170568000 -0400

chris@SR-ENG-P18 /cygdrive/c/Projects
$ truncate -s -1G update.iso

chris@SR-ENG-P18 /cygdrive/c/Projects
$ stat update.iso
  File: update.iso
  Size: 29729226752     Blocks: 29032448   IO Block: 65536  regular file
Device: ee6ddbceh/4000177102d   Inode: 19421773395035112  Links: 1
Access: (0664/-rw-rw-r--)  Uid: (1052727/   chris)   Gid: (1049089/Domain Users)
Access: 2020-06-12 07:42:38.335782800 -0400
Modify: 2020-06-12 07:42:38.335782800 -0400
Change: 2020-06-12 07:42:38.335782800 -0400
 Birth: 2020-06-11 13:31:21.170568000 -0400

The stat command tells you lots of info about a file including its size.
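For comparison, roughly the same size-based truncation can be sketched in Python with `os.truncate`, which sets a file's length by path without reading its content. The helper name `shrink_file` is made up for illustration, and it works in bytes, so the encoding caveats about multi-byte characters still apply.

```python
import os


def shrink_file(path, nbytes=1):
    """Cut `nbytes` bytes off the end of the file, never going below zero."""
    size = os.path.getsize(path)              # current length in bytes
    os.truncate(path, max(size - nbytes, 0))  # set the new length
```

Calling `shrink_file('update.iso', 1024**3)` would correspond to the 1G reduction shown in the transcript above.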
