I want to create an algorithm but I'm not sure how to start.

The algorithm will be a method that accepts an array of N objects, each with (among others) the attributes createdAt and value. I will sort the array from oldest to newest (by createdAt) and then need to find out how consistent the available data is: for every hour, do I have at least 5 records, and for every half hour, at least 2 records?

Example test code:

- (void) normalizeData:(NSArray*)records
{
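// sort the records from oldest to newest by createdAt
NSSortDescriptor* byCreatedAt = [NSSortDescriptor sortDescriptorWithKey:@"createdAt" ascending:YES];
NSArray* sortedRecords = [records sortedArrayUsingDescriptors:@[byCreatedAt]];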

// group the records by calendar hour: the key is the start of the hour
// (year-month-day-hour only), the value is the array of records created in that hour.
NSCalendar* calendar = [NSCalendar currentCalendar];
NSCalendarUnit hourUnits = NSCalendarUnitYear | NSCalendarUnitMonth | NSCalendarUnitDay | NSCalendarUnitHour;
NSMutableDictionary* dictionary = [NSMutableDictionary dictionary];
for (Record* record in sortedRecords)
    {
    NSDate* hourStart = [calendar dateFromComponents:[calendar components:hourUnits fromDate:record.createdAt]];
    NSMutableArray* recordsForHour = dictionary[hourStart];
    if (!recordsForHour)
        {
        recordsForHour = [NSMutableArray array];
        dictionary[hourStart] = recordsForHour;
        }
    [recordsForHour addObject:record];
    }

for (NSDate* keyDate in dictionary)
   {
   NSArray* hourRecords = [dictionary objectForKey:keyDate];
   Record* previousRecord = nil;
   for (Record* record in hourRecords)
      {
      // I'll have to keep the previous record and compare the time difference with the new
      NSTimeInterval secondsAfterDate = 0;
      if (previousRecord)
         {
         secondsAfterDate = [record.createdAt timeIntervalSinceDate:previousRecord.createdAt];
         // add logic that builds a model holding, for every hour, the record count, the records, and whether the data is sufficient
         // then decide whether the record count and the time span between records are sufficient.

         }
      previousRecord = record;
      }
   }
}

I would appreciate any contribution to the process in the method.

Also, the ultimate goal is to return a result (by invoking a block handler) for every set of records that is processed. The logic should end up requiring at least 5 records per hour, with a time span of under 15 minutes between consecutive records.

George Taskos
  • When you say that you need 5 records per hour, do you mean between 11pm and 12am, or that the time difference across 5 consecutive values (in "index" terms in the sorted array) has to be less than an hour? – Larme Apr 01 '15 at 20:45
  • For every hour (11pm - 12am) I need at least 5 records, with a time span of ~15 minutes between them. This way I can say I have enough data to process for a change in trend. – George Taskos Apr 01 '15 at 20:53

1 Answer

Take the total time span of the record collection (the difference between the createdAt of the first record and that of the last) and discretize it into bins. Place each object in the appropriate bin. Then use a sliding window with two window sizes (30 minutes and 60 minutes). As you walk along the array, continually evaluate whether the conditions you describe are met.

Note that for the above approach it's important to properly define the bin width as the resolution of your timestamping process. Since you don't indicate this in your post, feel free to comment if this is a problem.
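
As a rough illustration of the sliding-window idea, here is a minimal sketch. It is not a definitive implementation: the function name `EveryWindowHasEnoughRecords` and its parameters are made up for this example, and instead of building explicit bins it walks the sorted timestamps with two indices, which is another way to count the records that fall inside each window. It assumes the dates are already sorted from oldest to newest and that a trailing window that does not fit completely inside the data can be ignored.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the question): returns YES when every window of
// `windowLength` seconds, advanced by `step` seconds across the span of the data,
// contains at least `minimumCount` timestamps.
// Assumption: `sortedDates` is sorted from oldest to newest.
static BOOL EveryWindowHasEnoughRecords(NSArray<NSDate *> *sortedDates,
                                        NSTimeInterval windowLength,
                                        NSTimeInterval step,
                                        NSUInteger minimumCount)
{
    if (sortedDates.count == 0 || windowLength <= 0 || step <= 0) {
        return NO;
    }
    NSTimeInterval first = [sortedDates.firstObject timeIntervalSinceReferenceDate];
    NSTimeInterval last = [sortedDates.lastObject timeIntervalSinceReferenceDate];

    NSUInteger lower = 0; // index of the first date inside the current window
    NSUInteger upper = 0; // index one past the last date inside the current window
    NSTimeInterval windowStart = first;
    do {
        NSTimeInterval windowEnd = windowStart + windowLength;
        // Drop dates that fell out of the window on the left.
        while (lower < sortedDates.count &&
               [sortedDates[lower] timeIntervalSinceReferenceDate] < windowStart) {
            lower++;
        }
        // Take in dates that entered the window on the right (end is exclusive).
        while (upper < sortedDates.count &&
               [sortedDates[upper] timeIntervalSinceReferenceDate] < windowEnd) {
            upper++;
        }
        if (upper - lower < minimumCount) {
            return NO; // this window does not hold enough records
        }
        windowStart += step;
    } while (windowStart + windowLength <= last);
    return YES;
}
```

With the dates extracted from the sorted records (for example via `[sortedRecords valueForKey:@"createdAt"]`), the two conditions from the question could then be checked as `EveryWindowHasEnoughRecords(dates, 3600, 900, 5)` and `EveryWindowHasEnoughRecords(dates, 1800, 900, 2)`. The 15-minute step is an assumption; a smaller step checks more window positions at the cost of more iterations.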

vrume21