10

How can I manage a huge list of 100+ million strings? How can I begin to work with such a huge list?

example large list:

cards = [
            "2s","3s","4s","5s","6s","7s","8s","9s","10s","Js","Qs","Ks","As",
            "2h","3h","4h","5h","6h","7h","8h","9h","10h","Jh","Qh","Kh","Ah",
            "2d","3d","4d","5d","6d","7d","8d","9d","10d","Jd","Qd","Kd","Ad",
            "2c","3c","4c","5c","6c","7c","8c","9c","10c","Jc","Qc","Kc","Ac",
           ]

from itertools import combinations

cardsInHand = 7
hands = list(combinations(cards,  cardsInHand))

print(str(len(hands)) + " hand combinations in texas holdem poker")
scott
  • The best way to handle 100 million strings is to put them in a database. But enumerating 100 million strings here can be avoided. – Waleed Khan Mar 07 '13 at 23:29
  • Do you really need to **store** all hands, or would an iterator work? – Junuxx Mar 07 '13 at 23:29
  • I would recommend NOT using strings. It looks like you're representing cards with a 2-byte string. You can much more efficiently use an integer to represent each card. – TJD Mar 07 '13 at 23:30
  • I was thinking of converting them to MD5 and storing them in a hash table for quick lookup, but I'm kind of new to Python. There are many things I would like to do with them. Would storing them on disk in a data file work, or would I have to resort to an SQL DB? Hmm, an iterator sounds good; I need to learn more about this, thanks. Ah yes, nice idea TJD, I can use an array and that is a lot better on memory! – scott Mar 07 '13 at 23:30
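To make the integer idea from the comments concrete, here is a minimal sketch of one possible encoding. The names `RANKS`, `SUITS`, `card_to_int` and `int_to_card` are made up for this illustration, and the suit order simply follows the list in the question; any fixed order would do.

```python
RANKS = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
SUITS = ["s", "h", "d", "c"]  # assumed order, matching the question's list


def card_to_int(card):
    """Map a card string such as "10h" to an integer in range(52)."""
    rank, suit = card[:-1], card[-1]
    return SUITS.index(suit) * 13 + RANKS.index(rank)


def int_to_card(n):
    """Inverse of card_to_int: turn 0..51 back into a card string."""
    suit, rank = divmod(n, 13)
    return RANKS[rank] + SUITS[suit]


print(card_to_int("2s"), card_to_int("Ac"))  # 0 51
print(int_to_card(22))                       # Jh
```

Each card then fits in a single byte (or even 6 bits), instead of a separate Python string object.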

5 Answers

11

With lots and lots of memory. Python lists and strings are actually reasonably efficient, so provided you've got the memory, it shouldn't be an issue.

That said, if what you're storing are specifically poker hands, you can definitely come up with more compact representations. For example, you can use one byte to encode each card, which means a seven-card hand takes seven bytes and fits in a single 64-bit int. You could then store these in a NumPy array, which would be significantly more efficient than a Python list.

For example:

>>> import itertools
>>> import numpy as np
>>> cards_to_bytes = dict((card, num) for (num, card) in enumerate(cards))
>>> hands = np.zeros((133784560, 7), dtype=np.int8)  # 133784560 == 52c7, one row of 7 bytes per hand
>>> for num, hand in enumerate(itertools.combinations(cards, 7)):
...     hands[num] = [cards_to_bytes[card] for card in hand]

And to speed up that last line a bit: hands[num] = list(map(cards_to_bytes.__getitem__, hand))

This will only require 7 * 133784560 bytes, a little under 1 GB of memory… And that could be cut down further by packing each card into 6 bits instead of a full byte, since 52 values fit in 6 bits (I don't know the syntax for doing that off the top of my head…)
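As a rough sketch of the bit-packing idea, assuming each card has already been mapped to a number from 0 to 51: six bits per card means seven cards need 42 bits. The helper names `pack_hand` and `unpack_hand` are invented for this example and are not NumPy functions.

```python
def pack_hand(card_numbers):
    """Pack card numbers (each 0..51) into one int, 6 bits per card."""
    packed = 0
    for n in card_numbers:
        if not 0 <= n < 64:
            raise ValueError("card number does not fit in 6 bits")
        packed = (packed << 6) | n
    return packed


def unpack_hand(packed, count=7):
    """Recover `count` card numbers from an int produced by pack_hand."""
    numbers = []
    for _ in range(count):
        numbers.append(packed & 0x3F)  # lowest 6 bits hold the last card
        packed >>= 6
    return numbers[::-1]


hand = [0, 13, 26, 39, 12, 25, 51]
packed = pack_hand(hand)
print(unpack_hand(packed) == hand)  # True
print(packed.bit_length() <= 42)    # True
```

Whether this saves anything in practice depends on the container: a 42-bit value still occupies a full 8-byte slot in a 64-bit integer array unless the values are packed back to back.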

David Wolever
  • I have 8 GB of memory; surely that would be enough to do a len on the list in the above example? Did it run correctly for you? May I ask how much memory you have? It's a good point about the array, but I don't want to use any library other than what comes standard with Python. Maybe I will have to. – scott Mar 07 '13 at 23:39
  • Would doing it with an array be much faster than an SQL DB? Thanks kindly for your example. – scott Mar 07 '13 at 23:45
  • Quite significantly faster. If all you need to do is indexed lookups (e.g., "give me hand 10"), absolutely nothing will be faster than a large array. – David Wolever Mar 07 '13 at 23:47
  • That said, if that's all you need to do, I believe there is an algorithm for simply returning a particular combination from a set of combinations… I don't know it off the top of my head, but I'm sure you could find it or ask about it. – David Wolever Mar 07 '13 at 23:48
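One way such an algorithm can look is sketched below. It walks the candidates in order and uses binomial coefficients to skip whole groups of combinations. The function name `combination_at` is made up here, the cards are assumed to be the integers `0..n-1`, and `math.comb` needs Python 3.8 or later.

```python
from itertools import combinations, islice
from math import comb


def combination_at(index, n, k):
    """Return the index-th k-combination of range(n), in itertools.combinations order."""
    if not 0 <= index < comb(n, k):
        raise IndexError("combination index out of range")
    result = []
    candidate = 0
    while k > 0:
        # How many combinations have `candidate` as their next element?
        with_candidate = comb(n - candidate - 1, k - 1)
        if index < with_candidate:
            result.append(candidate)
            k -= 1
        else:
            index -= with_candidate  # skip that whole group
        candidate += 1
    return tuple(result)


# Cross-check against brute-force enumeration for a small index.
brute_force = next(islice(combinations(range(52), 7), 10, None))
print(combination_at(10, 52, 7) == brute_force)  # True
```

With this, "give me hand 10" needs no stored list at all, only a handful of arithmetic operations per lookup.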
9

If you just want to loop over all possible hands to count them or to find one with a certain property, there is no need to store them all in memory.

You can just use the iterator and not convert to a list:

from itertools import combinations

cardsInHand = 7
hands = combinations(cards,  cardsInHand)

n = 0
for h in hands:
    n += 1
    # or do some other stuff here

print("%d hand combinations in texas holdem poker." % n)

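With all 52 cards in the list, this prints:

133784560 hand combinations in texas holdem poker.

(A card list with missing commas between its rows silently merges neighbouring strings, leaving 49 entries, and gives 85900584 instead.)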
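If only the count is needed, the loop can be skipped entirely. As a small aside, assuming Python 3.8 or later, `math.comb` computes the binomial coefficient directly:

```python
from math import comb  # available from Python 3.8

print(comb(52, 7))  # 133784560
print(comb(49, 7))  # 85900584
```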

Junuxx
3

Another option that needs almost no memory, and that lets you create a stream of data to process however you like, is to use generator expressions. For example:

Count the total number of hands:

sum(1 for x in combinations(cards, 7))

Count the number of hands with the ace of clubs in them:

sum(1 for x in combinations(cards, 7) if 'Ac' in x)
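A quick way to sanity-check a filtered count like this is to compare it with the closed form on a smaller problem. The sketch below assumes cards are plain integers (with 51 standing in for 'Ac') and uses 3-card hands so it finishes instantly; `math.comb` needs Python 3.8 or later.

```python
from itertools import combinations
from math import comb

deck = range(52)  # cards as integers; 51 plays the role of 'Ac'
hand_size = 3     # small hands keep the enumeration tiny

# Stream the combinations and count those containing card 51.
counted = sum(1 for hand in combinations(deck, hand_size) if 51 in hand)

# Closed form: fix one card, choose the remaining ones from the other 51.
expected = comb(51, hand_size - 1)

print(counted, expected)  # 1275 1275
```

By the same reasoning, the 7-card count with a full deck should come out as `comb(51, 6)`, which is 18009460.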
Andrew Prock
1

There's often a trade-off between how long you spend coding and how long your code takes to run. If you're just trying to get something done quickly and don't expect it to run frequently, an approach like the one you're suggesting is fine. Just make the list huge: if you don't have enough RAM, your system will thrash virtual memory, but you'll probably get your answer faster than you would by learning how to write a more sophisticated solution.

But if this is a system that you expect to be used on a regular basis, you should figure out something other than storing everything in RAM. An SQL database is probably what you want. They can be very complex, but because they are nearly ubiquitous there are plenty of excellent tutorials out there.

You might look at a well-documented framework like Django, which simplifies access to a database through an ORM layer.
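For a feel of the database route without installing anything, here is a minimal sketch using the `sqlite3` module from the standard library. The table and column names are invented for the example, and only the first 1000 hands are inserted to keep it quick; storing all of them would work the same way with a file path instead of `:memory:`.

```python
import sqlite3
from itertools import combinations, islice

conn = sqlite3.connect(":memory:")  # pass a file name to keep the data on disk
conn.execute("CREATE TABLE hands (id INTEGER PRIMARY KEY, cards TEXT NOT NULL)")

ranks = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
deck = [rank + suit for suit in "shdc" for rank in ranks]

# Stream hands straight from the iterator into the table, never building a list.
sample = islice(combinations(deck, 7), 1000)
conn.executemany(
    "INSERT INTO hands (id, cards) VALUES (?, ?)",
    ((i, " ".join(hand)) for i, hand in enumerate(sample)),
)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM hands").fetchone()[0]
row = conn.execute("SELECT cards FROM hands WHERE id = ?", (10,)).fetchone()
print(count)   # 1000
print(row[0])  # the hand stored under id 10
```

Lookups by primary key stay fast as the table grows, at the cost of more disk space and slower bulk inserts than a plain in-memory array.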

Leopd
0

My public domain OneJoker library has some combinatoric functions that would be handy. It has an Iterator class that can give you information about the set of combinations without storing them or even running through them. For example:

  import onejoker as oj
  deck = oj.Sequence(52)
  deck.fill()

  hands = oj.Iterator(deck, 5)    # I want combinations of 5 cards out of that deck

  t = hands.total                 # How many are there?
  r = hands.rank("AcKsThAd3c")    # At what position will this hand appear?
  h = hands.hand_at(1000)         # What will the 1000th hand be?

  for h in hands.all():           # Do something with all of them
     dosomething(h)               

You could use the Iterator.rank() function to reduce each hand to a single int, store those in a compact array, then use Iterator.hand_at() to produce them on demand.
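The same idea can be sketched with only the standard library. To be clear, this is not OneJoker's implementation: `combination_rank` is a hypothetical helper written for this example, cards are assumed to be the integers `0..n-1`, and the ordering is the one `itertools.combinations` produces.

```python
from array import array
from itertools import combinations
from math import comb  # Python 3.8+


def combination_rank(combo, n):
    """Position of a sorted k-combination of range(n) in itertools.combinations order."""
    k = len(combo)
    rank = 0
    previous = -1
    for position, value in enumerate(combo):
        # Count every combination whose element at `position` is smaller than `value`.
        for skipped in range(previous + 1, value):
            rank += comb(n - skipped - 1, k - position - 1)
        previous = value
    return rank


# Reduce each hand to one integer and keep the integers in a compact typed array.
ranks = array("I", (combination_rank(c, 6) for c in combinations(range(6), 3)))
print(list(ranks) == list(range(comb(6, 3))))  # True
```

An `array("I")` item is usually 4 bytes, which is enough here because the largest rank for 7 cards out of 52 is below 2**32; the exact item size is platform dependent, so it is worth checking `ranks.itemsize`.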

Lee Daniel Crocker