12

I have a huge str of ~1GB in length:

>>> len(L)
1073741824

I need to take many pieces of the string from specific indexes until the end of the string. In C I'd do:

char* L = ...;
char* p1 = L + start1;
char* p2 = L + start2;
...

But in Python, slicing a string creates a new str instance, which uses more memory:

>>> id(L)
140613333131280
>>> p1 = L[10:]
>>> id(p1)
140612259385360

To save memory, how do I create a str-like object that is in fact a pointer into the original L?

Edit: we have buffer in Python 2 and memoryview in both Python 2.7 and Python 3, but memoryview does not expose the same interface as a str or bytes:

>>> L = b"0" * 1000
>>> a = memoryview(L)
>>> b = memoryview(L)
>>> a < b
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: memoryview() < memoryview()

>>> type(b'')
<class 'bytes'>
>>> b'' < b''
False
>>> b'0' < b'1'
True
vz0
  • Can't you just do `id(L[10:])` instead of making a new variable? – HarryCBurn Nov 19 '14 at 09:37
  • @Iplodman That still creates a new str slice, using memory, and then discards the temporary slice. – vz0 Nov 19 '14 at 09:38
  • python has a buffer type see http://stackoverflow.com/questions/3422685/what-is-python-buffer-type-for – gabber Nov 19 '14 at 09:44
  • 1
    @gabber There is no buffer() in Python 3. The replacement is memoryview, which is not exactly the same and does not behave like a bytes str. – vz0 Nov 19 '14 at 09:50
  • @vz0 I think I forgot to reply (sorry!) I'm not exactly a Python guru, so thanks! – HarryCBurn Dec 19 '14 at 23:15
  • I am baffled how such a large and "batteries-included" language doesn't provide an out-of-the-box solution to your quite basic and often encountered question! – Vorac Dec 18 '17 at 12:57
  • You asked for a "str-like object" but you've tagged both Python 2 and Python 3. Could you clarify whether you want a **Python 2 str-like object** or a **Python 3 str-like object**? – wim Feb 09 '18 at 19:31

2 Answers

7

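There is a memoryview type. The session below is from Python 2; in Python 3 you must pass a bytes-like object such as b'potato', and indexing returns an int rather than a one-character string: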

>>> v = memoryview('potato')
>>> v[2]
't'
>>> v[-1]
'o'
>>> v[1:4]
<memory at 0x7ff0876fb808>
>>> v[1:4].tobytes()
'ota'
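
For Python 3, the same idea can be sketched as follows. This is only an illustration on a small stand-in buffer (the names `data`, `view`, and `tail` are made up for the example), showing that slicing a memoryview does not copy the underlying bytes and that equality works while ordering does not:

```python
# Small stand-in for the large buffer; memoryview needs a bytes-like object.
data = b"potato" * 4
view = memoryview(data)

# Slicing a memoryview creates another view onto the same buffer, not a copy.
tail = view[2:]
assert tail.obj is data              # still backed by the original object
assert len(tail) == len(data) - 2

# Indexing yields an int in Python 3, not a one-character string.
assert tail[0] == ord("t")

# tobytes() is the point where a copy is actually made.
assert tail[:3].tobytes() == b"tat"

# Equality against bytes is supported (data[2:] copies, fine for a demo);
# ordering operators such as < raise TypeError, as the question shows.
assert tail == data[2:]
```

Note that this only applies to bytes-like data; a Python 3 str does not support the buffer protocol, so it would have to be encoded first.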
wim
4

If you need to work on a string, use iterators to access the data without duplicating the content in memory.

The tools for this are itertools.tee and itertools.islice:

>>> from itertools import tee, islice
>>> L = "Random String of data"
>>> p1, p2 = tee(L)
>>> p1 = islice(p1, 10, None)
>>> p2 = islice(p2, 15, None)
>>> ''.join(p1)  # join() builds a new str, so a copy is made here
'ing of data'
>>> ''.join(p2)  # likewise, this creates a copy
'f data'

In a loose sense this yields a pointer, but unlike a pointer in C/C++ it is only a forward pointer, that is, an iterator.

Note: of course, you need to take care when using forward iterators, namely:

  1. Save the pointer before advancing. itertools.tee is useful here, as in p1, p_saved = tee(p1).
  2. You can read a single character with next(p1) or a whole string with ''.join(p1), but because Python strings are immutable, every time you need a string view you get a copy.
  3. Since you can read one character at a time, your algorithms should work on the iterators rather than materializing strings. For example, to compare two iterators for equality, instead of comparing the contents with ''.join(p1) == ''.join(p2), use all(a == b for a, b in izip(p1, p2)) (izip is Python 2; in Python 3 the built-in zip is already lazy).
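
As a rough sketch of the third point, a lexicographic "less than" between two suffixes can be written on top of iterators without building either slice. This is written for Python 3 (zip_longest instead of izip_longest); `suffix_less_than` and `_END` are hypothetical names introduced for this example, not part of itertools:

```python
from itertools import islice, zip_longest

_END = object()  # sentinel marking that one iterator has been exhausted


def suffix_less_than(s, start_a, start_b):
    """Return True if s[start_a:] < s[start_b:], without creating the slices."""
    a = islice(s, start_a, None)
    b = islice(s, start_b, None)
    for x, y in zip_longest(a, b, fillvalue=_END):
        if x is _END:       # first suffix ended first: it is a proper prefix
            return True
        if y is _END:       # second suffix ended first
            return False
        if x != y:          # the first differing character decides
            return x < y
    return False            # both suffixes are equal


s = "Random String of data"
print(suffix_less_than(s, 15, 10))  # compares "f data" with "ing of data"
```

The comparison stops at the first differing character, so it is not the same as replacing == with < inside all(...). Keep in mind that islice reaches its start position by consuming and discarding the earlier items, so each call costs time proportional to the start offset.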
Abhijit
  • @vz0: It seems, my answer is still valid after your edit – Abhijit Nov 19 '14 at 09:54
  • 1
    No it does not. You can not compare < two tee or islice objects. – vz0 Nov 19 '14 at 09:57
  • @vz0: Yes you can. `all(a == b for a, b in izip(p1, p2))` – Abhijit Nov 19 '14 at 10:00
  • That is not the less than "<" operation. – vz0 Nov 19 '14 at 10:04
  • @vz0: For less than operator, replace `==` with `<`. If required use izip_longest instead of izip. I am not getting where you want to take this? – Abhijit Nov 19 '14 at 10:06
  • 4
    This is not a "str-like" object. As my question says, I want to use something that looks like a normal string taken as a slice of a bigger string. With your approach I have to rethink and rewrite all the string operations again in pure Python. In C it is as easy as taking a pointer offset from the start of the char*. – vz0 Nov 19 '14 at 10:09
  • 3
    @vz0: Actually you have to create a wrapper (a class with all the methods of a string) and use that wrapper everywhere instead of the string. This seems to be a trivial extension of the idea, so I am really wondering now where this is going. – Abhijit Nov 19 '14 at 10:11
  • @vz0 There is no such thing as a pointer in python, we have only namespaces (mapping names <-> objects) and of course strings are immutable. If you want pointers, use C. – wim Nov 19 '14 at 10:53
  • I think this may not be a good alternative because, literally "This itertool may require significant auxiliary storage". – gatopeich Feb 17 '19 at 23:43
  • If you're going to make a wrapper class, there's no point in using iterators at all: you can make a namedtuple with the original string and two offsets into it. – Clément Dec 12 '20 at 06:20
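
The wrapper idea from the comments above (keep the original string plus an offset) might look roughly like this. It is a minimal sketch, not a complete str replacement: `StrView` is a hypothetical class invented for this example, it only supports length, single-item indexing, and comparisons, and the per-character comparison runs in pure Python, so it is far slower than a native str comparison.

```python
from functools import total_ordering


@total_ordering
class StrView:
    """Hypothetical read-only view of base[start:] that holds no copy."""

    __slots__ = ("base", "start")

    def __init__(self, base, start=0):
        if not 0 <= start <= len(base):
            raise ValueError("start out of range")
        self.base = base      # reference to the original string, not a copy
        self.start = start

    def __len__(self):
        return len(self.base) - self.start

    def __getitem__(self, i):
        if isinstance(i, slice):
            raise TypeError("slicing is not supported in this sketch")
        n = len(self)
        if i < 0:
            i += n
        if not 0 <= i < n:
            raise IndexError("StrView index out of range")
        return self.base[self.start + i]

    def _compare(self, other):
        """Return a negative, zero, or positive int, like C's strcmp."""
        n, m = len(self), len(other)
        for i in range(min(n, m)):
            x, y = self[i], other[i]
            if x != y:
                return -1 if x < y else 1
        return (n > m) - (n < m)

    def __eq__(self, other):
        if not isinstance(other, (StrView, str)):
            return NotImplemented
        return self._compare(other) == 0

    def __lt__(self, other):
        if not isinstance(other, (StrView, str)):
            return NotImplemented
        return self._compare(other) < 0

    def __str__(self):
        # Materializing the view is the only place a copy is made.
        return self.base[self.start:]


L = "Random String of data"
p1, p2 = StrView(L, 10), StrView(L, 15)
print(p2 < p1, str(p2))
```

Because the class defines __eq__ without __hash__, instances are unhashable; functools.total_ordering fills in the remaining comparison operators from __eq__ and __lt__.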