You have 8154741 tuples, so, assuming 8-byte pointers, your list alone already contains 62 MB of pointers to the tuples.
Assuming each tuple contains two ASCII strings on Python 2, that's another 124 MB of pointers inside the tuples.
Then you still have the overhead of the tuple and string objects themselves: each object carries a reference count, and assuming that is an 8-byte integer, that's another 186 MB of reference-count storage. In total that is already 372 MB of overhead for the 46 MB of actual data you would have with two 3-byte strings per 2-element tuple.
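You can sanity-check these numbers on your own interpreter with sys.getsizeof. A rough sketch (note that getsizeof counts the full per-object header, including the type pointer and other bookkeeping, so the total it reports comes out even larger than the pointer-plus-refcount estimate above):

import sys

n = 8154741
t = ("xxx", "yyy")

# The tuple object plus its two string objects. getsizeof does not
# include the objects a container merely points to, so sum them up.
per_tuple = sys.getsizeof(t) + sys.getsizeof(t[0]) + sys.getsizeof(t[1])

# Add the 8-byte slot the list itself uses to point at each tuple.
per_entry = per_tuple + 8

# Total in MB; the exact figure depends on the Python version, but it
# dwarfs the ~46 MB of raw string data.
print(n * per_entry / 1024.0 ** 2)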
Under Python 3 your data is Unicode and may take more than one byte per character, too.
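You can see the Python 3 per-character cost directly. A small illustration (CPython 3.3+ stores each str with 1, 2 or 4 bytes per character, depending on the widest code point it contains):

import sys

print(sys.getsizeof("abc"))       # pure ASCII: 1 byte per character plus the header
print(sys.getsizeof("ab\u20ac"))  # contains '€': 2 bytes per character and a larger header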
So yes, it is expected that this type of structure consumes a large amount of excess memory.
If your strings are all of similar length and the tuples all have the same number of elements, one way to reduce this is to use numpy string arrays. They store the strings in one contiguous memory block, avoiding the per-object overhead. But this will not work well if the string lengths vary a lot, as numpy does not support ragged arrays.
>>> d = [("xxx", "yyy") for i in range(8154741)]
>>> a = numpy.array(d)
>>> print a.nbytes/1024**2
46
>>> print a[2,1]
yyy
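If the lengths do vary, numpy pads every element to the longest string in the array, so the savings can evaporate. A quick illustration (output shown for Python 2 byte strings; under Python 3 unicode you would see dtype('<U8') and four bytes per character instead):

>>> b = numpy.array([("a", "bb"), ("ccc", "dddddddd")])
>>> b.dtype
dtype('S8')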