
I ran into a strange problem recently and hope someone here can help. I'm using Python 2.7 on Ubuntu 12.04; both Python and the OS are 64-bit.

In my code, I need to keep appending an incoming data stream to a byte array. I use self.data += incomingdata to do this, where incomingdata is the data received from hardware devices. Some time later I unpack the byte array to parse the received data. The appending and parsing operations are both protected by a lock.
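
Roughly, the pattern looks like the sketch below. This is a simplified illustration rather than the actual code: the class name StreamBuffer, the method names, and the record format passed to parse are all made up for the example, and only self.data += incomingdata and the lock come from the description above.

```python
import struct
import threading


class StreamBuffer(object):
    """Hypothetical stand-in for the class that owns self.data."""

    def __init__(self):
        self.data = b""
        self.lock = threading.Lock()

    def append(self, incomingdata):
        # Called whenever the hardware delivers more bytes.
        with self.lock:
            self.data += incomingdata

    def parse(self, fmt):
        # Unpack every complete record currently buffered and keep the
        # leftover bytes for the next call. fmt is an assumed struct format.
        size = struct.calcsize(fmt)
        with self.lock:
            usable = len(self.data) - len(self.data) % size
            chunk, self.data = self.data[:usable], self.data[usable:]
        return [struct.unpack_from(fmt, chunk, offset)
                for offset in range(0, usable, size)]
```

Both methods take the same lock, so an append cannot interleave with the slice-and-reassign step in parse.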

The problem is that when I use "+=" to append the byte stream, the data seems to be corrupted at some points (it does not happen consistently). There are no memory errors, no overflow, etc. I monitored the program's memory usage and it looks fine.

When I change "+=" to cStringIO.write for the appending operation, the problem disappears completely, though it seems to be slower than "+=".

Can anyone tell me exactly what the difference is between cStringIO.write and "+=" when they operate on byte streams? Can the "+=" operation cause any problems?

ypeng

1 Answer


Instead of using +=, you might have better luck creating a list and appending each piece of data to the end of it. When all the data has been fetched, you can call ''.join(list) to create a single string. This will perform much better, since repeated string concatenation is inefficient.
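
A minimal sketch of the list-and-join approach follows. The function name collect and the sample chunks are hypothetical; b"".join is used so the same code handles byte strings on both Python 2.7 and Python 3.

```python
def collect(chunks):
    """Gather incoming chunks in a list and join them once at the end."""
    parts = []
    for chunk in chunks:
        # Appending to a list stores a reference; earlier data is not copied.
        parts.append(chunk)
    # A single join allocates the final string once.
    return b"".join(parts)


payload = collect([b"\x01\x02", b"\x03", b"\x04\x05"])
```

If the data has to be parsed while it is still arriving, the join can be done at parse time and the list reset afterwards.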

When you concatenate two strings, Python has to allocate new memory for the resulting string and copy both operands into it. If you are doing a significant number of concatenations, this can be really slow. As the string grows, each concatenation takes longer, and if you are fetching a large amount of data this way it can keep the CPU busy enough that other operations are delayed.

I had a similar issue when I built a Python process that reassembled a TCP stream. I was adding every captured packet to a string using concatenation. Once the string grew beyond a few MB, the packet capturing library I was using started dropping frames because the CPU was spending so much time on string concatenation. Once I switched to using a list and joining the result at the end, the problem went away.

The reason you do not have this problem with cStringIO.write is that it creates a virtual file in memory and appends data to that file, without having to allocate space for a new string on every write.
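
As an illustration of the in-memory file approach, here is a short sketch. It uses io.BytesIO, which is available on Python 2.6+ and Python 3, as a stand-in for cStringIO (Python 2 only); on 2.7, cStringIO.StringIO() offers the same write and getvalue calls. The function name accumulate and the sample chunks are hypothetical.

```python
import io


def accumulate(chunks):
    """Write each incoming chunk to an in-memory file and return the bytes."""
    buf = io.BytesIO()
    for chunk in chunks:
        # write() appends at the current position, like writing to a file.
        buf.write(chunk)
    # getvalue() returns everything written so far as one byte string.
    return buf.getvalue()


payload = accumulate([b"\x01\x02", b"\x03", b"\x04\x05"])
```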

Nathan Villaescusa