-2

I have multiple collections of identical length: one timestamp collection of type List<DateTime> and several data collections of type List<double>. The value at each index position in a List<double> corresponds to the timestamp at the same index position in the List<DateTime>.

I want to compress the data in all data collections by a TimeSpan: the timestamps in the List<DateTime> are grouped into bins of that TimeSpan, and the same grouping is then applied to each data collection.

Here is how I currently "compress" a time series of time stamps:

var someTimeStamps = new List<DateTime>();
var compression = TimeSpan.FromHours(1).Ticks;
var compressedTimeStamps = from rawData in someTimeStamps
                           group rawData by rawData.Ticks / compression
                           into tickData
                           select new DateTime(tickData.Key * compression);

How can I adjust the code so that the same groupings apply to the List<double> data collections as well? Within each group, the values should be averaged. I am aiming for computational efficiency; memory consumption is not something I am looking to optimize at this point.

For example:

List<DateTime> items (for simplicity, the order of the values below is year, month, day, hour, minute, second):

(1) 2018, 8, 14, 08, 20, 05
(2) 2018, 8, 14, 08, 45, 25
(3) 2018, 8, 14, 09, 02, 53
(4) 2018, 8, 14, 09, 34, 12
(5) 2018, 8, 14, 09, 44, 12

List<double> items:

(1) 12
(2) 15
(3) 27
(4) 03
(5) 12

Applying a compression of TimeSpan.FromHours(1), the desired outcome for both collections is:

List<DateTime> items:

(1) 2018, 8, 14, 08, 00, 00
(2) 2018, 8, 14, 09, 00, 00

List<double> items (averaging is applied to the items in each group):

(1) 13.5 (avg of 12 and 15)
(2) 14 (avg of 27, 3, and 12)

Matt
  • `Zip` allows you to project two enumerables into a single (new) enumerable. So you create a new anonymous type with two properties - one being the `DateTime` and one being the `double`. Then do whatever you need to do from there. – mjwills Aug 14 '18 at 06:10
  • @mjwills, that makes sense, thanks for elaborating. However, if I wanted to apply the grouping to multiple data series, is there a way to do that in one call, or would I have to run the projection and grouping multiple times? – Matt Aug 14 '18 at 06:16
  • https://github.com/morelinq/MoreLINQ/blob/master/MoreLinq/EquiZip.cs can be used if you want to `Zip` over more than two series. – mjwills Aug 14 '18 at 06:27
  • Thank you @mjwills, I will take a look. And yes, despite the question only using one data list, my intent is to apply it to multiple data lists with one time stamp collection of matching length. I used one data collection in my question to keep it simple, as most solutions will likely scale in this regard. – Matt Aug 14 '18 at 06:50

2 Answers

1

You can do it with the code below:

List<DateTime> dateTimes = new List<DateTime>();
dateTimes.Add(new DateTime(2018, 8, 14, 08, 20, 05));
dateTimes.Add(new DateTime(2018, 8, 14, 08, 45, 25));
dateTimes.Add(new DateTime(2018, 8, 14, 09, 02, 53));
dateTimes.Add(new DateTime(2018, 8, 14, 09, 34, 12));
dateTimes.Add(new DateTime(2018, 8, 14, 09, 44, 12));

List<int> ints = new List<int>();
ints.Add(12);
ints.Add(15);
ints.Add(27);
ints.Add(03);
ints.Add(12);

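
// Pair each timestamp (k) with its index (v), group by the hour the timestamp
// falls in, then average the values found at the grouped indices.
var averages = dateTimes.Select((k, v) => new { k, v })
                        .GroupBy(x => new DateTime(x.k.Year, x.k.Month, x.k.Day, x.k.Hour, 0, 0))
                        .ToDictionary(g => g.Key, g => g.Select(x => ints[x.v]).Average());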

Edit:

If you want the result split into two lists, a List<DateTime> and a List<double>, you can project the keys and values of the dictionary above into separate lists, like this:

 List<DateTime> dateTimeList = averages.Keys.ToList();
 List<double>  valuesList = averages.Values.ToList();

If I understood you correctly, you want to "expand that problem to one time stamp series but multiple data series":

var grouped = dateTimes
              .Zip(ints, (k, v) => new { k, v })
              .GroupBy(g => new DateTime(g.k.Year, g.k.Month, g.k.Day, g.k.Hour, 0, 0), g => g.v);

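The code above gives you the compressed timestamps as group keys, with the matching values of the data series in each group. Repeat the same Zip and GroupBy for each additional data series.

Give it a try; it may help you.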
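To get separate result lists for one timestamp series and several data series, the grouping can be computed once over the indices and then reused for every series. The following is only a minimal sketch under stated assumptions: `TimeSeriesCompression` and `Compress` are hypothetical names that do not appear in the question, all lists are assumed to have the same length, and the bin key is derived from `Ticks` as in the question's own snippet.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TimeSeriesCompression
{
    // Hypothetical helper (not from the original answer): bins the timestamps
    // by `bin` and averages every data series over those same bins.
    // Assumes every data series has the same length as `timeStamps`.
    public static (List<DateTime> Times, List<List<double>> Series) Compress(
        IReadOnlyList<DateTime> timeStamps,
        TimeSpan bin,
        params IReadOnlyList<double>[] dataSeries)
    {
        long binTicks = bin.Ticks;

        // Group the indices (not the values) by bin key, so the grouping
        // is evaluated once and can be reused for every data series.
        var groups = Enumerable.Range(0, timeStamps.Count)
            .GroupBy(i => timeStamps[i].Ticks / binTicks)
            .ToList();

        // One compressed timestamp per group: the start of the bin.
        var times = groups.Select(g => new DateTime(g.Key * binTicks)).ToList();

        // For each series, average the values at the indices of each group.
        var series = dataSeries
            .Select(s => groups.Select(g => g.Average(i => s[i])).ToList())
            .ToList();

        return (times, series);
    }
}
```

Called as `TimeSeriesCompression.Compress(dateTimes, TimeSpan.FromHours(1), seriesA, seriesB)`, this returns the compressed timestamps plus one averaged list per input series, in the order the series were passed. `GroupBy` keeps groups in order of first appearance, so the output is chronological only if the input timestamps are.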

er-sho
  • @Matt Wolf, have a look at the answer, it might help you :) – er-sho Aug 14 '18 at 06:19
  • I need the resulting series in separate collections, a resulting `List<DateTime>` and `List<double>`, and to expand that problem to one time stamp series but multiple data series – Matt Aug 14 '18 at 06:20
  • Temporarily transform the multiple collections into a sequence of elements containing the values across those collections, group that, then split it apart. You can also transform the main collection into an "index+element" object, group this by the object, extract the indices from the groups and use that to piece together the groups for the other collections. Other than that you will have to implement the grouping logic yourself. – Lasse V. Karlsen Aug 14 '18 at 06:24
  • @Lasse Vågsæther Karlsen, yes, I am trying to do that – er-sho Aug 14 '18 at 06:33
  • @Matt Wolf, please see the edit section in the answer, it might help you :) – er-sho Aug 14 '18 at 07:04
  • It does not really address my problem. I already posted code that groups time stamps by timespan and other users pointed to Enumerable.Zip. – Matt Aug 14 '18 at 08:22
  • yes `var grouped` gives you the timestamps in groups and multiple data series of each timestamp – er-sho Aug 14 '18 at 09:58
0

I decided to go with a classic loop over the data points, as it requires only a single pass regardless of the number of data collections (credit to a friend of mine who suggested profiling this approach):

public void CompressData(TimeSpan compression)
    {
        //declare generic buffer value function (bid/ask here)
        var bufferFunction = new Func<int, double>(index => (Bid[index] + Ask[index]) / 2);

        Open = new List<double>();
        High = new List<double>();
        Low = new List<double>();
        Close = new List<double>();
        var lastCompTs = -1L;
        var dataBuffer = new List<double>();
        var timeStamps = new List<DateTime>();

        for (int i = 0; i < TimeStamps.Count; ++i)
        {
            var compTs = TimeStamps[i].Ticks / compression.Ticks;
            if (compTs != lastCompTs)
            {
                //new bin -> flush the buffered values of the previous bin
                if (dataBuffer.Count > 0)
                {
                    timeStamps.Add(new DateTime(lastCompTs * compression.Ticks));
                    Open.Add(dataBuffer.First());
                    High.Add(dataBuffer.Max());
                    Low.Add(dataBuffer.Min());
                    Close.Add(dataBuffer.Last());
                }

                lastCompTs = compTs;
                dataBuffer.Clear();
            }

            //always buffer the current value, including the first one of a new bin
            dataBuffer.Add(bufferFunction(i));
        }

        if (dataBuffer.Count > 0)
        {
            timeStamps.Add(new DateTime(lastCompTs * compression.Ticks));
            Open.Add(dataBuffer.First());
            High.Add(dataBuffer.Max());
            Low.Add(dataBuffer.Min());
            Close.Add(dataBuffer.Last());
        }

        //assign time series collection
        TimeStamps = timeStamps;
    }
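
The same single-pass idea can also produce the per-bin averages the question asked for, across any number of data series, by keeping a running sum per series instead of a value buffer. This is only a sketch under assumptions: `SinglePassCompression` and `CompressAverage` are hypothetical names, the timestamps are assumed to be sorted ascending, and all lists are assumed to have equal length.

```csharp
using System;
using System.Collections.Generic;

public static class SinglePassCompression
{
    // Hypothetical helper: averages each data series per time bin in one pass.
    // Assumes timestamps are sorted ascending and all lists have equal length.
    public static (List<DateTime> Times, List<double>[] Averages) CompressAverage(
        IReadOnlyList<DateTime> timeStamps,
        TimeSpan bin,
        params IReadOnlyList<double>[] dataSeries)
    {
        long binTicks = bin.Ticks;
        var times = new List<DateTime>();
        var averages = new List<double>[dataSeries.Length];
        for (int s = 0; s < averages.Length; ++s)
            averages[s] = new List<double>();

        var sums = new double[dataSeries.Length]; // running sum per series
        int count = 0;                            // points in the current bin
        long currentBin = 0;

        // Emits the finished bin and resets the accumulators.
        void Flush()
        {
            times.Add(new DateTime(currentBin * binTicks));
            for (int s = 0; s < sums.Length; ++s)
            {
                averages[s].Add(sums[s] / count);
                sums[s] = 0;
            }
            count = 0;
        }

        for (int i = 0; i < timeStamps.Count; ++i)
        {
            long b = timeStamps[i].Ticks / binTicks;

            // A new bin starts: emit the previous one before accumulating.
            if (count > 0 && b != currentBin)
                Flush();

            currentBin = b;
            for (int s = 0; s < sums.Length; ++s)
                sums[s] += dataSeries[s][i];
            ++count;
        }

        // Emit the last, still open bin.
        if (count > 0)
            Flush();

        return (times, averages);
    }
}
```

Unlike a `GroupBy`, this loop only merges adjacent points, so unsorted timestamps would yield several entries for the same bin.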
Matt