Better algorithmic approach to showing trends of data per week

Question

Suppose I have a list of projects, each with a start date and an end date. I also have a range of weeks, which varies (it could span months, years, etc.). I would like to display a graph showing 4 values per week:

projects started
projects closed
total projects started
total projects closed

I could loop over the range of weeks and, for each week, iterate through my list of projects and calculate the values for each of these 4 trends. This would have algorithmic complexity O(nm), where n is the number of weeks and m is the number of projects. That's not so great. Is there a more efficient approach, and if so, what would it be?

If it's pertinent, I'm coding in Java.

n being the number of projects, the approach you suggested would have an algorithmic complexity of n*52 = O(n), since you iterate over the list of projects once per week — yurib, Oct 30 '14 at 15:57
no because for each week I would have to iterate through my list of projects. So `O(nm)` actually, I edited my question — Bizmarck, Oct 30 '14 at 15:59
the number of weeks is 52, you can assign it any letter you want, it remains a constant — yurib, Oct 30 '14 at 16:00
it's not necessarily 52; I was trying to keep it simple, but it can vary. Edited my question again — Bizmarck, Oct 30 '14 at 16:02

score 3 · Answer 1 · answered Oct 30 '14 at 15:58

I'm not sure what the difference between "project" and "total" is, but here's a simple O(n log n) way to calculate the number of projects started and closed in each week:

For each project, add its start and end points to a list.
Sort the list in increasing order.
Walk through the list, pulling out time points until you hit a time point that occurs in a later week. At this point, "projects started" is the total number of start points you have hit, and "projects ended" is the total number of end points you have hit: report these counters, and reset them both to zero. Then continue on to process the next week.

Incidentally, if there are some weeks in which no project starts or ends, this procedure will skip them. If you want to report these weeks as "0, 0" totals, then whenever you output a week that has some nonzero total, make sure you first output as many "0, 0" weeks as it takes to fill in the gap since the last nonzero-total week. (This is easy to do just by setting a lastNonzeroWeek variable each time you output a nonzero-total week.)
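The steps above can be sketched in Java roughly as follows. This is only a minimal sketch under some assumptions: each project's start and end have already been converted to integer week indices, and the class name `WeeklySweep`, the method `sweep`, and the `{week, started, ended}` row layout are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class WeeklySweep {
    /**
     * startWeek[i] and endWeek[i] are assumed week indices for project i.
     * Returns rows of {week, started, ended}, including "0, 0" rows for gap weeks.
     */
    static List<int[]> sweep(int[] startWeek, int[] endWeek) {
        // Each event is {week, kind}, where kind 1 = start and 0 = end.
        int[][] events = new int[startWeek.length * 2][];
        for (int i = 0; i < startWeek.length; i++) {
            events[2 * i] = new int[] {startWeek[i], 1};
            events[2 * i + 1] = new int[] {endWeek[i], 0};
        }
        // The O(m log m) step: order all time points by week.
        Arrays.sort(events, Comparator.comparingInt((int[] e) -> e[0]));

        List<int[]> rows = new ArrayList<>();
        int i = 0;
        while (i < events.length) {
            int week = events[i][0];
            // Fill the gap since the previously reported week with zero rows.
            if (!rows.isEmpty()) {
                int lastNonzeroWeek = rows.get(rows.size() - 1)[0];
                for (int w = lastNonzeroWeek + 1; w < week; w++) {
                    rows.add(new int[] {w, 0, 0});
                }
            }
            int started = 0;
            int ended = 0;
            // Consume every event that falls in this week, then report and reset.
            while (i < events.length && events[i][0] == week) {
                if (events[i][1] == 1) {
                    started++;
                } else {
                    ended++;
                }
                i++;
            }
            rows.add(new int[] {week, started, ended});
        }
        return rows;
    }
}
```

For example, calling `WeeklySweep.sweep(new int[] {0, 0, 3}, new int[] {1, 3, 3})` should give four rows, with week 2 reported as a "0, 0" gap week.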

Dialecticus · Accepted Answer · 2014-10-30T17:27:34.957

While what user yurib has said is true, there is a more efficient solution. Keep two arrays in memory, projects_started and projects_ended, both of size 52. Loop through your list of projects and, for each project, increment the corresponding value in both arrays. Something like:

projects_started[projects[i].start_week]++;
projects_ended[projects[i].end_week]++;

After the loop you have all the data you need to make a graph. Complexity is O(m).

EDIT: okay, so the maximum number of weeks can vary, apparently, but as long as it stays below some ludicrous number (say, a million) this algorithm still works. Just replace 52 with n. Time complexity is O(m), space complexity is O(n).
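With a variable range, each date has to be mapped to an index in [0, n). One possible way to do that with `java.time` is sketched below. The class `WeekIndex`, the method `weekIndex`, and the idea that `rangeStart` is the first day of the first week of the graph are assumptions made for this example, not something stated in the question.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

class WeekIndex {
    /**
     * Returns the zero-based week index of date, counted in whole 7-day
     * blocks from rangeStart, or -1 if date lies before rangeStart.
     */
    static int weekIndex(LocalDate rangeStart, LocalDate date) {
        // Whole-week differences truncate toward zero, so dates shortly
        // before the range would otherwise map to week 0.
        if (date.isBefore(rangeStart)) {
            return -1;
        }
        return (int) ChronoUnit.WEEKS.between(rangeStart, date);
    }
}
```

The caller would still need to skip indices that are -1 or at least n before incrementing the arrays.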

EDIT: in order to determine the value of total projects started and ended you have to iterate through the two arrays that you now have and just add up the values. You could do this while populating the graph:

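// Running totals, accumulated across all weeks up to and including week i.
int total_started_in_this_week = 0;
int total_ended_in_this_week = 0;
for (int i = 0; i < n; i++)
{
    total_started_in_this_week += projects_started[i];
    total_ended_in_this_week += projects_ended[i];
    // add new item to the graph
}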
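Put together, the whole thing might look something like the sketch below. It assumes every project already carries a zero-based week index in [0, n) for its start and end. The names `WeeklyTrends` and `trends` and the four-column row layout are illustrative only.

```java
class WeeklyTrends {
    /**
     * startWeek[i] / endWeek[i]: assumed zero-based week index of project i, in [0, n).
     * Returns one row per week: {started, closed, totalStarted, totalClosed}.
     */
    static int[][] trends(int[] startWeek, int[] endWeek, int n) {
        int[] started = new int[n];
        int[] closed = new int[n];
        // One pass over the projects: O(m).
        for (int i = 0; i < startWeek.length; i++) {
            started[startWeek[i]]++;
            closed[endWeek[i]]++;
        }
        // One pass over the weeks to accumulate running totals: O(n).
        int[][] rows = new int[n][];
        int totalStarted = 0;
        int totalClosed = 0;
        for (int w = 0; w < n; w++) {
            totalStarted += started[w];
            totalClosed += closed[w];
            rows[w] = new int[] {started[w], closed[w], totalStarted, totalClosed};
        }
        return rows;
    }
}
```

If the number of projects still open at the end of a week is needed as well, it can be derived per row as `totalStarted - totalClosed`, assuming a project counts as closed from its end week onward.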

It's the totals per week that I am not getting here. Each week I need two things: the projects that started (or closed) that week and the total that are open (or closed) that week. To know the total open at any given week, I need to go through my list of projects completely for each week, no? — Bizmarck, Oct 30 '14 at 16:23
Ok, but one variation: to know the total open (or active) in any given week I need to also subtract the total of those closed up to that same week. But I see your point now. Thanks — Bizmarck, Oct 30 '14 at 17:41

score 1 · Answer 3 · answered Oct 30 '14 at 16:05

First of all, I guess that performance won't actually be an issue; this looks like a case of "premature optimization". You should first do it, then do it right, then do it fast.

I suggest you use maps, which will make your code more readable and outsource implementation details (like performance).

Create a HashMap from Integer (representing the week number) to Set<Project>, then iterate over your projects and, for each one, put it into the map under the right week. After that, iterate over the map's key set (= all non-empty weeks) and do your processing for each one.
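A rough sketch of that grouping step, keyed by start week only, could look like this. The `Project` class here is a hypothetical stand-in with just a name and a start week; your real class will differ, and you would build a second map keyed by end week in the same way.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class ProjectsByWeek {
    /** Hypothetical project type; only the start week matters for this sketch. */
    static final class Project {
        final String name;
        final int startWeek;

        Project(String name, int startWeek) {
            this.name = name;
            this.startWeek = startWeek;
        }
    }

    /** Groups projects by start week; weeks with no projects get no entry. */
    static Map<Integer, Set<Project>> groupByStartWeek(Iterable<Project> projects) {
        Map<Integer, Set<Project>> byWeek = new HashMap<>();
        for (Project p : projects) {
            // Create the set for this week on first use, then add the project.
            byWeek.computeIfAbsent(p.startWeek, w -> new HashSet<>()).add(p);
        }
        return byWeek;
    }
}
```

The number of projects started in week w is then `byWeek.get(w).size()` for weeks that have an entry, and 0 otherwise. Note that a HashMap's key set is not ordered, so a TreeMap may be the more convenient choice if the weeks have to be processed in order.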
