Efficient way of finding item at index in array with joined count array

Question

I have an object that contains two arrays, the first is a slope array:

double[] Slopes = new double[capacity];

The next is an array containing the counts of various slopes:

int[] Counts = new int[capacity];

The arrays are related, in that when I add a slope to the object, if the last element entered in the slope array is the same slope as the new item, instead of adding it as a new element the count gets incremented.

i.e. If I have slopes 15 15 15 12 4 15 15, I get:

Slopes = { 15, 12, 4, 15 }
Counts = {  3,  1, 1,  2 }

Is there a better way of finding the i_th item in slopes than iterating over the Counts with the index and finding the corresponding index in Slopes?

edit: Not sure if maybe my question wasn't clear. I need to be able to access the i_th Slope that occurred, so from the example the zero indexed i = 3 slope that occurs is 12, the question is whether a more efficient solution exists for finding the corresponding slope in the new structure.

Maybe this will help better understand the question: here is how I get the i_th element now:

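public double GetSlope(int index)
{
    int countIndex = 0;
    int countAccum = 0;
    foreach (int count in Counts)
    {
        countAccum += count;
        // The requested index falls inside the first run whose running total exceeds it.
        if (index < countAccum)
        {
            return Slopes[countIndex];
        }
        countIndex++;
    }
    // index lies beyond the last slope that was added
    throw new ArgumentOutOfRangeException("index");
}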

I am wondering if there is a more efficient way?

`from the example the zero indexed i = 3 slope that occurs is 12` I don't understand what you mean by that. — user482594, Feb 02 '12 at 16:49
@user482594 When I add slopes to the structure, I add 15, 15, 15, 12, 4, 15, 15; the slope at index 3 is 12 (0=15 1=15 2=15 3=12). My issue is whether there is a more efficient way requesting the original indexed slope than iterating over counts. — NominSim, Feb 02 '12 at 16:52
I bet it would be much easier if the OP would just post a sample of the code that he has and how he's currently implementing, assigning, and checking for the i_th position. — MethodMan, Feb 02 '12 at 16:57
Is performance a necessary consideration here? How many slope values do you need to track? I'm thinking that you might make your code harder to grok by premature optimization... — Paul Sasik, Feb 02 '12 at 17:03
Potentially thousands of slopes per unit, on an array of thousands of units... — NominSim, Feb 02 '12 at 22:23

score 1 · Answer 1 · answered Feb 02 '12 at 16:52

If you are loading the slopes at one time and doing many of these "i-th item" lookups, it may help to have a third (or instead of Counts, depending on what that is used for) array with the totals. This would be { 0, 3, 4, 5 } for your example. Then you don't need to add them up for each look up, it's just a matter of "is i between Totals[x] and Totals[x + 1]". If you expect to have few slope buckets, or if slopes are added throughout processing, or if you don't do many of these look-ups, it probably will buy you nothing, though. Essentially, this is just doing all those additions at one time up front.
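A minimal sketch of the totals idea, assuming the first `used` entries of Counts are filled and that the caller passes an index smaller than the total number of slopes added. The names `TotalsLookup`, `BuildTotals` and `used` are made up for illustration and are not from the question.

```csharp
using System;

public static class TotalsLookup
{
    // Builds the start offset of every run in one pass.
    // For Counts = { 3, 1, 1, 2 } this yields { 0, 3, 4, 5 }.
    public static int[] BuildTotals(int[] counts, int used)
    {
        var totals = new int[used];
        int running = 0;
        for (int run = 0; run < used; run++)
        {
            totals[run] = running;   // original index of the first item in this run
            running += counts[run];
        }
        return totals;
    }

    // Finds the run with Totals[x] <= index < Totals[x + 1] by comparison only;
    // no additions are performed per lookup.
    // Assumption: index is smaller than the total number of slopes added,
    // because the totals alone do not record where the last run ends.
    public static double GetSlope(double[] slopes, int[] totals, int index)
    {
        if (index < 0 || totals.Length == 0)
        {
            throw new ArgumentOutOfRangeException(nameof(index));
        }
        int run = 0;
        while (run + 1 < totals.Length && totals[run + 1] <= index)
        {
            run++;
        }
        return slopes[run];
    }
}
```

With the example data, `BuildTotals(new[] { 3, 1, 1, 2 }, 4)` gives `{ 0, 3, 4, 5 }`, and looking up index 3 returns 12. The scan is still linear in the number of runs; it only saves the repeated additions.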

score 1 · Accepted Answer · answered Feb 02 '12 at 16:54

You could use a third array in order to store the first index of a repeated slope

double[] Slopes = new double[capacity];
int[] Counts = new int[capacity]; 
int[] Indexes = new int[capacity];

With

Slopes  = { 15, 12, 4, 15 }
Counts  = {  3,  1, 1,  2 } 
Indexes = {  0,  3, 4,  5 }

Now you can run a binary search on Indexes to find the largest entry that is less than or equal to the index you are looking for; the slope at the same position in Slopes is the one you want.

Instead of O(n) search performance, you now have O(log(n)).
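One way the lookup could be written, as a sketch rather than the answerer's actual code. It relies on `Array.BinarySearch`, which returns the bitwise complement of the position of the next larger element when the value is not found. The parameters `used` (number of filled entries) and `total` (number of slopes added so far) are assumptions; the question does not say how the object tracks them.

```csharp
using System;

public static class StartIndexLookup
{
    // slopes/indexes: the arrays from the answer; only the first `used` entries are filled.
    // total: how many slopes have been added overall (hypothetical bookkeeping field).
    public static double GetSlope(double[] slopes, int[] indexes, int used, int total, int index)
    {
        if (index < 0 || index >= total)
        {
            throw new ArgumentOutOfRangeException(nameof(index));
        }

        // Search only the filled part of the array.
        int pos = Array.BinarySearch(indexes, 0, used, index);
        if (pos < 0)
        {
            // Not an exact hit: ~pos is the position of the first start index
            // greater than `index`, so the run containing it is one before that.
            pos = ~pos - 1;
        }
        return slopes[pos];
    }
}
```

For Indexes = { 0, 3, 4, 5 }, index 3 is an exact hit at position 1 (slope 12), while index 2 is not found and resolves to position 0 (slope 15). The search works because the start indexes are strictly increasing whenever every count is at least 1.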

answered Feb 02 '12 at 16:54

Olivier Jacot-Descombes


That's an excellent idea, I wanted something that improved performance without having to write the original slopes in an entire array again. – NominSim Feb 02 '12 at 17:02

score 1 · Answer 3 · answered Feb 02 '12 at 16:54

You can always wrap your existing arrays, plus another array (call it OriginalSlopes), in a class. When you add to Slopes, you also append to OriginalSlopes as you would with a normal array (i.e. always append). If you need the i_th slope, look it up in OriginalSlopes. O(1) operations all around.

edit adding your example data:

Slopes = { 15, 12, 4, 15 }
Counts = {  3,  1, 1,  2 }
OriginalSlopes = { 15, 15, 15, 12, 4, 15, 15 }
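A rough sketch of such a wrapper class. It uses `List<T>` instead of the fixed-capacity arrays from the question to keep the example short, and the class name `SlopeHistory` and the `RunCount` property are made up for illustration.

```csharp
using System.Collections.Generic;

public class SlopeHistory
{
    private readonly List<double> slopes = new List<double>();          // one entry per run
    private readonly List<int> counts = new List<int>();                // length of each run
    private readonly List<double> originalSlopes = new List<double>();  // every slope, in order

    public void Add(double slope)
    {
        int last = slopes.Count - 1;
        if (last >= 0 && slopes[last] == slope)
        {
            counts[last]++;          // same slope as the previous entry: extend the run
        }
        else
        {
            slopes.Add(slope);       // different slope: start a new run
            counts.Add(1);
        }
        originalSlopes.Add(slope);   // always append, so position i is the i_th slope added
    }

    // O(1) lookup of the i_th slope in the order it was added.
    public double GetSlope(int index)
    {
        return originalSlopes[index];
    }

    public int RunCount
    {
        get { return slopes.Count; }
    }
}
```

After adding 15, 15, 15, 12, 4, 15, 15, `GetSlope(3)` returns 12 and `RunCount` is 4. The price is one extra stored value per slope added, which is the redundancy this answer trades for constant-time access.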

answered Feb 02 '12 at 16:54

vlad


I kind of wanted to avoid the redundancy of storing slopes in two places, but this does increase the efficiency. There's a lot of data being stored however... – NominSim Feb 02 '12 at 17:01
@NominSim you can avoid storing doubles in `OriginalSlopes` by storing the index in `Slopes` where you find the actual value. Instead of my example in the answer, it would be `{ 0, 0, 0, 1, 2, 3, 3 }` – vlad Feb 02 '12 at 17:04
That would work; the issue I personally have is that there will be several thousand slopes entered, of which I don't expect many changes (i.e. why there is a count instead of just listing them), so in the end my solution may have `Slopes = { 15, 3, 4 }` and `Counts = { 950, 234, 232 }`. So even just the indices would take up a lot of extra space. – NominSim Feb 02 '12 at 17:08
@NominSim So you're saying that the slopes are often repeated. In this case, Olivier's answer is much more efficient in terms of space, and easily comparable in time. Note, however, that in the worst case scenario ( `Slopes = { 3, 4, 3, 4, ... }` ), you're no better off in space, and worse off in time. – vlad Feb 02 '12 at 17:14
Thanks, I don't expect the worst case scenario to happen that often if at all, the slopes are coming from a product that has been fairly extensively tested to remove that type of behaviour. – NominSim Feb 02 '12 at 17:17

user482594 · Answer 4 · 2012-02-02T23:05:17.460

In the counts object (or array, in your case), add a variable that holds the cumulative count up to and including that entry.

Using binary search with a comparator that compares the cumulative counts, you can find the slope in O(log N) time.

edit

`Data = 15 15 15 12 4 15 15`
Slopes = { 15, 12, 4, 15 }
Counts = {  3,  1, 1,  2 }
Cumulative count = { 3, 4, 5, 7}

For instance, if you are looking for the 6th element (counting from 1), a search of the cumulative counts finds the value 5 followed by 7. Since 5 < 6 <= 7, the 6th element belongs to the entry whose cumulative count is 7, so its slope is the last one, 15.

Use binary search to find element in log(N) time.
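As an illustration only, the search over cumulative counts can be written by hand as a search for the first cumulative count that is strictly greater than the zero-based index. The class name `CumulativeLookup` and the `used` parameter (number of filled entries) are assumptions, not part of the answer.

```csharp
using System;

public static class CumulativeLookup
{
    // cumulative[x] is the number of slopes added up to and including run x,
    // e.g. { 3, 4, 5, 7 } for Counts = { 3, 1, 1, 2 }.
    public static double GetSlope(double[] slopes, int[] cumulative, int used, int index)
    {
        if (index < 0 || used == 0 || index >= cumulative[used - 1])
        {
            throw new ArgumentOutOfRangeException(nameof(index));
        }

        // Binary search for the first run whose cumulative count exceeds index.
        int lo = 0;
        int hi = used - 1;
        while (lo < hi)
        {
            int mid = lo + (hi - lo) / 2;
            if (cumulative[mid] > index)
            {
                hi = mid;        // run mid could contain index; keep it in range
            }
            else
            {
                lo = mid + 1;    // run mid ends before index; discard it
            }
        }
        return slopes[lo];
    }
}
```

With cumulative counts { 3, 4, 5, 7 }, zero-based index 5 (the 6th element) lands on the last run and returns 15, and index 3 returns 12. Unlike start indexes, the last cumulative count doubles as the total, so the bounds check needs no extra field.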

edited Feb 02 '12 at 23:05

answered Feb 02 '12 at 16:56

user482594


Not sure I understand what you mean, how would one variable of total count allow me to more efficiently find the i_th slope entered? – NominSim Feb 02 '12 at 17:05
The answer that you selected has array `indexes` which works exactly same as cumulative count. Oh well... – user482594 Feb 02 '12 at 22:54
Can you explain how a cumulative count would work? The indexes works because it shows which slope is covered by the index range, having one cumulative count doesn't help me get into a particular index does it? – NominSim Feb 02 '12 at 22:58

Paul Sasik · Answer 5 · 2012-02-02T17:00:33.423

EDIT: You could use a dictionary where the key is the slope and each key's value is a list of corresponding indexes and counts. Something like:

class IndexCount
{
    public int Index { get; set; }
    public int Count { get; set; }
}

Your collection declaration would look something like:

var slopes = new Dictionary<double, List<IndexCount>>();

You could then look up a slope in the dictionary and see from the associated list what the count is at each index. This might make your code pretty interesting though. I would go with the list approach below if performance is not a primary concern.

You could use a single List<> of a type that associates the Slopes and Counts, something like:

class SlopeCount
{
    public double Slope { get; set; }
    public int Count { get; set; }
}

then:

var slopeCounts = new List<SlopeCount>();

// fill the list

edited Feb 02 '12 at 17:00

answered Feb 02 '12 at 16:29

Paul Sasik


The same efficiency problem occurs when I want the i_th item in Slopes. – NominSim Feb 02 '12 at 16:33
Paul, doesn't your example only allow for utilizing the .Add method? How would the OP determine whether he's adding a Slope or a Count? Just curious; it would work if you were to create a new instance of SlopeCount, then you could get the Slope and Count. – MethodMan Feb 02 '12 at 16:44
@DJKRAZE: The slope and count are associated with the SlopeCount class but this solution is not any more performant. Though it would make less code and better readability IMHO. – Paul Sasik Feb 02 '12 at 16:46
I got cha.. I was just trying out your example and was not sure if the OP even wanted to track both members.. your example is fine perhaps the question is a bit confusing at first.. +1 – MethodMan Feb 02 '12 at 16:51

score 0 · Answer 6 · answered Feb 02 '12 at 16:32

Why not a Dictionary<double, double> with the key being Slopes and the value being counts?

Hmm, double double? Now I need a coffee...

answered Feb 02 '12 at 16:32

Matt Grande


The issue is that I want to be able to access the i_th Slope that occurred; putting them in a Dictionary won't allow that, to my knowledge. – NominSim Feb 02 '12 at 16:38
A dictionary requires unique keys. In the OP's sample data set the slope value of 15 is repeated. – Paul Sasik Feb 02 '12 at 16:39
Oh, sorry, I misunderstood. I didn't see the 15 was repeated. I think what you're doing already is best, unless there's a way you can store it in your original array. – Matt Grande Feb 02 '12 at 16:42
