Given multiple static data streams, how to design an optimal scheduling policy?

Question

I am trying to find an optimal scheduling policy when multiple static data streams are given. For example,

stream 0: 000---00--000

stream 1: 1111--11--11-11

stream 2: 22-222---2-2

...

Here, "1" means valid data processed in one cycle, "-" means one cycle stall. A dispatcher can only process valid data from one stream in each cycle. When one stream stalls, the dispatcher can always switch to another stream with valid data waiting. Or the dispatcher can make stream switching decisions with other scheduling policies.

For example, a strict round robin policy(follow order stream 0 1 2, even though the current selected stream is stalled or null) is shown below:

DispatchTimeLine: 012012012-1201201-01201201xx1

stream 0 time line:   0xx0xx0---xx0xx0--0xx0xx0xxxx  

stream 1 time line:   x1xx1xx1xx1--1xx1--1xx1-x1xx1 

stream 2 time line:   xx2xx2-x2xx2xx2---xx2-x2xxxxx

In this case, the dispatcher takes 29 cycles to process all the data. If using a greedy policy, a total of 26 cycles can be achieved. ("x" means waiting or idle.)

If the goal is the best performance (least number of total cycles), how to derive an optimal policy for the dispatcher? Is there any theoretical proof that is available for a more general case?

Below is a general description of this problem:

Assume there are N data point streams (0 to N-1), and each data point stream (Di) has its own static data point pattern (such as "valid, valid, valid...stall, stall,...", i.e., an interleaved sequence of "valid" and "stall" points. The number of "valid" is Vi, and the number of "stall" is Si. The temporal ordering within each stream is fixed.) There is no specific constraint for the value of N, Vi, and Si, and the data pattern for each data stream are also fixed during scheduling. In other words, there can be multiple, or a single data stream, and the composition and length of the data stream are not constrained.

Regarding the dispatcher, it can only process one data point from one stream in a cycle. When stream Di is stalled, the dispatcher can select other streams to go, and the stalling time of stream Di can be hidden when the dispatcher is processing other streams. Once the stalling is done for stream Di, it becomes eligible to be selected again. When all streams are in their stalls in a cycle, no data can be processed in this cycle, and a NOP will occupy this cycle in the dispatcher's time line.

The only goal here is the least total processing time (or say highest performance) of the dispatcher. There is no other requirements here, such as fairness among streams.

In intuition, I imagine greedy policy can be optimal in some cases, such as in the numerical example above. But I am not sure whether this policy is the best in all situations. I wonder if it can be proved theoretically? Or is there a systematic method that can aid the process of finding an optimal scheduling policy?

If I understand the model correctly, then the greedy policy is always optimal. Is there some other objective of interest? — David Eisenstat, Mar 25 '14 at 21:05
Thanks for your reply. I added some generalization at the end of the post to describe this problem better. I also think greedy is the best in some cases, but I am not sure if it works best in all cases, and how to prove it? Do you have any idea about it? — Blair, Mar 26 '14 at 00:48
Okay, I think I misunderstood. If stream 0 looks like 0---0, do I have to wait three cycles *after* consuming the first 0 to consume the second 0, regardless of whether the first consumption of a 0 was delayed? — David Eisenstat, Mar 26 '14 at 03:16
Yes. If there is only stream 0 in the system, the dispatcher has to wait for 3 cycles until it can pick the next valid data point to schedule. But if there are multiple streams in the system, for example, once stream 0 gets stuck, the dispatcher can select data points from other streams to process. After 3 cycles of processing data from other streams, stream 0 becomes eligible to be selected again, and that's why I say the stalling time can be hidden when there are always other eligible valid data to process. — Blair, Mar 26 '14 at 15:33
Let me rephrase that question: suppose that I have streams 0---0 and 1. If I start by consuming a 1 and then a 0, how long do I have to wait for the second 0? — David Eisenstat, Mar 26 '14 at 15:48
Sorry for the confusion. In your example, the schedule would be 10---0, i.e., the second 0 needs to wait for 3 cycles. If you start with 0, then the schedule would be like 01--0, i.e., the second 0 still waits for 3 cycles if only look at stream 0. But for the dispatcher, it seems that it only waits for 2 cycles. — Blair, Mar 27 '14 at 17:44
In general, we can say N data streams, and their lengths are just random but are known in advance. For example, I have a real case with 48 streams, and each of them are 1000 in length, but their patterns of "valid" and "stall" are different. — Blair, Mar 27 '14 at 21:32

score 0 · Accepted Answer · answered Mar 27 '14 at 21:56

I'm going to conjecture that finding an optimal schedule, in general, is NP-hard. It seems as though there ought to be a reduction from some partition problem that exploits the trickiness of scheduling when the available tasks are arriving very close to the throughput of the dispatcher. Certainly the various greedy algorithms that I've tried haven't worked out (what I didn't realize earlier is that each stream has a "buffer" of only one element).

There's a fixed-parameter algorithm for k streams each of length n that runs in time O(n^k) via dynamic programming. This is not useful to you when k = 48, so I won't describe it. I would suggest constraint programming instead, which has a chance to exploit the features of your particular input.

Here's one way to formulate your problem as a constraint program. For the j^th item in the i^th stream, let t_ij be the integer time at which processing for that item is finished. To force a stall of length k between the j^th and (j + 1)^th items in the i^th stream, constrain t_i(j+1) - t_ij > k. For all i, constrain t_i1 > 0. Put an all-different constraint on every single one of the variables and minimize their maximum.

There are various software packages that solve constraint programs. They use intelligent constraint propagation and branching strategies and should be much, much, MUCH faster than brute force. I don't feel as though I have enough experience to make specific recommendations, so find one that looks easy to use and reasonably mature.

Thanks!I checked theories,and also find it similar to some NP-hard problems.One similar abstraction may be Pm|prec,Pj=1,Mj|Cmax,i.e.,Pm means m parallel machines,1 of them processes valid points, m-1 of them process stall points(m-1 should be large).Data points from the same data stream are in-order, and thus they have precedence(prec). Every single point has a unit processing time of 1,so Pj=1.The objective function would be minimizing the max make-span time(the overall processing time),Cmax.I think you are right;I'd better go and look for heuristics and software packages to handle this. — Blair, Mar 28 '14 at 18:04

Given multiple static data streams, how to design an optimal scheduling policy?

1 Answers1