I need to save a stream of elements in a size-limited list. The stream may contain duplicates, but I need to keep only the unique elements. When the list exceeds a specified size limit, I need to remove the oldest element and add the new one.

I have already tried set and list. A set solves the uniqueness problem, but it is not size-limited, and because it is unordered I have no way to retrieve the oldest element when I need to remove it.

A list, on the other hand, keeps the order of items, but I have to check for duplicates whenever I insert a new element, which can cost a lot of time. Like a set, a list is not size-limited either.

My third option could be collections.deque, but I don't know whether it keeps the order. Is there any way to keep the items in a collections.deque unique?

Here is my code using list:

ids = list()
for item in stream:
    if item not in ids:
        ids.append(item)
    if len(ids) > len_limit:
        del ids[0]  # drop the oldest element once the limit is exceeded

and set:

ids = set()
for item in stream:
    ids.add(item)
    if len(ids) > len_limit:
        ids.remove(list(ids)[0])  # removes an arbitrary element, not the oldest
Farzin

3 Answers


You can write your own class that keeps both a deque and a set:

import collections


class Structure:
    def __init__(self, size):
        self.deque = collections.deque(maxlen=size)
        self.set = set()

    def append(self, value):
        if value not in self.set:
            if len(self.deque) == self.deque.maxlen:
                discard = self.deque.popleft()
                self.set.discard(discard)
            self.deque.append(value)
            self.set.add(value)

s = Structure(2)
s.append(1)
s.append(2)
s.append(3)
s.append(3)
print(s.deque)  # deque([2, 3], maxlen=2)
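
As a quick check of why the class above pops from the deque explicitly instead of relying on `maxlen` alone: a bounded deque preserves insertion order and silently drops its oldest item when full, but it neither rejects duplicates nor reports what it dropped, so a companion set could not be kept in sync. A minimal sketch using only the standard library (the sample values are made up for illustration):

```python
import collections

d = collections.deque(maxlen=3)
for item in [1, 2, 2, 3]:
    d.append(item)  # when full, the leftmost (oldest) item is dropped silently

# Insertion order is preserved and 1 was evicted, but the duplicate 2 is still there.
print(list(d))  # [2, 2, 3]
```
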
sanyassh
  • I don't think this is a very good idea, because it needs more memory than one simple list – Farzin Sep 19 '19 at 10:14
  • @FarzinGhanbari but as you said `On the other hand list keeps the order of items, but I need to check for possible duplicates whenever I want to insert a new element and this can cost a lot of time`. This implementation is faster because it checks for the existence of a value in a set, not in a list. – sanyassh Sep 19 '19 at 10:17
  • @FarzinGhanbari you need to choose between speed and memory consumption. – sanyassh Sep 19 '19 at 10:18
  • Is searching for an item in a set faster than searching for it in a list in Python? (because both list and set are the same size) – Farzin Sep 19 '19 at 10:20
  • @FarzinGhanbari yes, searching in a set is faster: it is O(1) on average, while searching in a list is O(n), where n is the length of the list. – sanyassh Sep 19 '19 at 10:21
  • What about collections.deque? Do they keep the order? – Farzin Sep 19 '19 at 10:21
  • @FarzinGhanbari yes they do, but searching a deque for an element is O(n), as in a list. – sanyassh Sep 19 '19 at 10:25
  • If they do keep the order, isn't it better to use a deque and a set instead of a list and a set? – Farzin Sep 19 '19 at 10:27
  • @FarzinGhanbari you are right. I rewrote my implementation to use deque instead of list. It should be more efficient than the list version. – sanyassh Sep 19 '19 at 10:32
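
On the speed-versus-memory point raised in these comments, one alternative that is not from the thread is to keep a single `collections.OrderedDict` and use its keys as the items. That gives hash-based membership tests, insertion order, and removal of the oldest key via `popitem(last=False)` in one container. The sketch below is only an illustration: the class name `BoundedUniqueQueue` is hypothetical, it assumes hashable items and `size >= 1`, and whether it actually uses less memory than a deque plus a set would need measuring.

```python
from collections import OrderedDict


class BoundedUniqueQueue:
    """Hypothetical helper: keeps at most `size` unique items in insertion order."""

    def __init__(self, size):
        self.size = size             # assumed to be >= 1
        self._items = OrderedDict()  # keys are the items; values are unused

    def append(self, value):
        if value in self._items:     # hash lookup, O(1) on average
            return
        if len(self._items) >= self.size:
            self._items.popitem(last=False)  # evict the oldest key
        self._items[value] = None

    def __iter__(self):
        return iter(self._items)


q = BoundedUniqueQueue(2)
for item in [1, 2, 3, 3]:
    q.append(item)
print(list(q))  # [2, 3]
```
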

You may want to look into the orderedset package, which is available via pip or conda. It is a fast Cython library that implements an ordered set.

pip install orderedset

or

conda install orderedset -c conda-forge

You can subclass OrderedSet to create an object that has a maximum number of elements.

from orderedset import OrderedSet

class DequeSet(OrderedSet):
    def __init__(self, *args, maxlen=0, **kwargs):
        if not isinstance(maxlen, int):
            raise TypeError('`maxlen` must be an integer.')
        if not maxlen>=0:
            raise ValueError('`maxlen` must not be negative.')
        self._maxlen = maxlen
        if maxlen and args:
            args = (args[0][-maxlen:],)
        super().__init__(*args, **kwargs)

    @property
    def maxlen(self):
        return self._maxlen

    def _checkpop(self):
        if not self._maxlen:
            return
        while self.__len__() > self._maxlen:
            self.pop(last=False)

    def add(self, elem):
        # Evict the oldest entries whenever an insertion pushes the set
        # past maxlen.
        super().add(elem)
        self._checkpop()

# this will truncate to the last 3 elements
ds = DequeSet('abcd', maxlen=3) 
ds 
# returns:
DequeSet(['b', 'c', 'd'])

ds.add('e')
ds
# returns:
DequeSet(['c', 'd', 'e'])
James

I have created a simple queue with a list. I have also arranged the conditions so that fewer comparisons are needed.

class Queue:
  def __init__(self,size):
    self.elements = []
    self.max_size = size

  def put(self,elem):
    if elem in self.elements:
      return  # already stored; keep only unique items
    elif len(self.elements) < self.max_size:
      self.elements.append(elem)
    else:
      # full: drop the oldest item and append the new one
      self.elements = self.elements[1:]+[elem]

  def __str__(self):
    return self.elements.__str__()


q=Queue(3)
q.put(1)
print(q)
q.put(2)
print(q)
q.put(2)
print(q)
q.put(3)
print(q)
q.put(3)
print(q)
q.put(4)
print(q) 
venkata krishnan