I am using SimPy to simulate a compute cluster. I want to implement fair-share (proportional) scheduling in SimPy, in which the resource slots are distributed across the remaining tasks in proportion to their priority. For example, if we submit two jobs, each with a priority of 100 and 10 tasks, we expect the two jobs to finish at roughly the same time.

I have implemented a basic working version of this logic, but it does not scale, especially when we expect more than 10 million tasks/requests.
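
To make the intended behavior concrete, here is a minimal sketch of one way to produce a proportional dispatch order. It uses stride scheduling, a technique swapped in purely for illustration and not necessarily the logic used in the actual implementation. The function name `fair_share_order` and the layout of the `jobs` dictionary are hypothetical, and the sketch is independent of SimPy.

```python
import heapq
from fractions import Fraction

def fair_share_order(jobs):
    """Return the order in which tasks would be dispatched.

    jobs maps a (hypothetical) job name to (priority, number_of_tasks).
    Each job advances by a stride of 1/priority per dispatched task, and
    the job with the smallest accumulated "pass" value goes next.
    """
    strides = {name: Fraction(1, prio) for name, (prio, _) in jobs.items()}
    remaining = {name: n for name, (_, n) in jobs.items()}
    # Heap entries are (pass value, job name); ties fall back to the name.
    heap = [(strides[name], name) for name in jobs if remaining[name] > 0]
    heapq.heapify(heap)

    order = []
    while heap:
        pass_value, name = heapq.heappop(heap)
        order.append(name)
        remaining[name] -= 1
        if remaining[name] > 0:
            heapq.heappush(heap, (pass_value + strides[name], name))
    return order

# Two jobs with equal priority and 10 tasks each are interleaved,
# so both finish at roughly the same time.
equal = fair_share_order({"A": (100, 10), "B": (100, 10)})
```

Each pick costs O(log J) for J jobs, regardless of how many tasks are pending, which is the property the queue manipulation below lacks.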

The main bottleneck is in the manipulation of the put_queue, which involves the following steps:

  1. Selecting the next pending request to process

  2. Deleting this request from the queue

  3. Moving it to the head of the queue

I have included a dummy example below focusing on the main concerns in steps 2 and 3.

My questions are:

  1. Without step 2 above, I get a RuntimeError with the message "<Request() object at 0x1da7348a2e0> has already been triggered". How can I tell SimPy to ignore a request that has already been processed? That way I could skip step 2 and save on computational cost.

  2. Can I better implement steps 2 and 3 to make it scalable?

'''

import simpy
import random

def awesome_proportional_logic(task_queue):
    request_to_process_next = random.choice(task_queue)
    return request_to_process_next

def process_task(env, slots, task_duration=1):
    with slots.request() as req:
        yield req
        print(f"Task started at {env.now}")
        yield env.timeout(task_duration)
        print(f"Task finished at {env.now}")

        if slots.put_queue:
            # Manipulation of put queue
            # Step 1: Select next request to process
            request_to_process_next = awesome_proportional_logic(slots.put_queue)

            # Step 2: Remove it from existing queue - Takes very long
            slots.put_queue.remove(request_to_process_next)

            # Step 3: Add the selected request to head of the queue
            slots.put_queue.insert(0, request_to_process_next)

if __name__=='__main__':
    env = simpy.Environment()
    slots = simpy.Resource(env, capacity = 2)
    N_tasks = 100

    for i in range(N_tasks):
        env.process(process_task(env, slots))

    env.run(until=100)
'''
Mehtab Pathan

1 Answer

Here is a solution in which I subclass the Resource class so that it skips requests that have already been triggered. I do not like circumventing a class's checks, because I do not know what side effects this may cause. A better option might be to replace the put queue with a linked list. There is a llink package, but I have not used it, so I do not know how well it performs.

"""
    Patch to fix the "already triggered" error that occurs
    when a request is copied to the head of the request queue
    but not removed from its original position.

    The problem is that the request is being processed twice.

    The quick solution is to ignore requests that have already been triggered
    by updating the Resource class.
    Note: not sure how robust this is

    Programmer: Michael R. Gibbs

"""


import simpy
import random
from typing import Optional

from simpy.resources.base import GetType

class My_Resource(simpy.Resource):
    """
        Updated resource class that allows duplicate resource requests
        by ignoring the second request
    """

    def _trigger_put(self, get_event: Optional[GetType]) -> None:
        """
            bypass the checks to see if event has already been triggered
        """

        idx = 0
        while idx < len(self.put_queue):
            put_event = self.put_queue[idx]
            proceed = True
            if not put_event.triggered: # added line to skip processing if already triggered
                proceed = self._do_put(put_event)
            if not put_event.triggered:
                idx += 1
            elif self.put_queue.pop(idx) != put_event:
                raise RuntimeError('Put queue invariant violated')

            if not proceed:
                break


def awesome_proportional_logic(task_queue):
    request_to_process_next = random.choice(task_queue)
    return request_to_process_next

def process_task(env, slots, task_duration=1):
    with slots.request() as req:
        yield req
        print(f"Task started at {env.now}")
        yield env.timeout(task_duration)
        print(f"Task finished at {env.now}")

        if slots.put_queue:
            # Manipulation of put queue
            # Step 1: Select next request to process
            request_to_process_next = awesome_proportional_logic(slots.put_queue)

            # Step 2: Remove it from existing queue - Takes very long
            # Skipped here: My_Resource pops the stale duplicate when it reaches it
            #slots.put_queue.remove(request_to_process_next)

            # Step 3: Add the selected request to head of the queue
            slots.put_queue.insert(0, request_to_process_next)

if __name__=='__main__':
    env = simpy.Environment()
    slots = My_Resource(env, capacity = 2)
    N_tasks = 100

    for i in range(N_tasks):
        env.process(process_task(env, slots))

    env.run(until=100)
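
For question 2, a possible alternative that avoids both the patch and the linked list is to have the selection logic return an index and swap that entry with the head of the queue. This is only a sketch under two assumptions: the put queue is a plain Python list (the default for `simpy.Resource`, but not for `PriorityResource`, which keeps a sorted queue), and the relative order of the other pending requests does not matter. The helper names `pick_index` and `promote_to_head` are hypothetical.

```python
import random

def pick_index(queue, rng=random):
    """Hypothetical stand-in for the proportional logic.

    Returns the position of the chosen request instead of the request
    itself, so no O(n) search is needed afterwards.
    """
    return rng.randrange(len(queue))

def promote_to_head(queue, index):
    """Swap queue[index] with queue[0].

    This is O(1): unlike remove() followed by insert(0, ...), no other
    elements are shifted. The previous head ends up at position `index`.
    """
    queue[0], queue[index] = queue[index], queue[0]

# Demonstration on a plain list standing in for slots.put_queue.
pending = ["req0", "req1", "req2", "req3", "req4"]
promote_to_head(pending, 3)
# pending is now ['req3', 'req1', 'req2', 'req0', 'req4']
```

Inside `process_task`, steps 1 to 3 would then reduce to `promote_to_head(slots.put_queue, pick_index(slots.put_queue))`. Because each request still appears in the queue exactly once, the duplicate entry behind the "already been triggered" error should not be created, so the stock Resource class can be kept.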
Michael