3

Simple question, but I haven't been able to find a simple answer.

I have a list of timestamps, in seconds, at which events occur:

[200.0 420.0 560.0 1100.0 1900.0 2700.0 3400.0 3900.0 4234.2 4800.0 etc..]

I want to count how many events occur each hour (3600 seconds) and create a new list of these counts.

I understand this is called downsampling, but all the information I can find is related to traditional time series.

For the example above the new list would look like:

[7 3 etc..]

Any help would be greatly appreciated.

Harry Munro

3 Answers

1
all_events = [
    200.0, 420.0, 560.0, 1100.0, 1900.0, 2700.0, 3400.0, 3900.0, 4234.2, 4800.0]

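def get_events_by_hour(all_events):
    # For each hour of a single day, count the events whose timestamp
    # (in seconds) falls within that hour.
    return [
        len([x for x in all_events if int(x / 3600.0) == hour])
        for hour in range(24)
    ]

# range() and print() with one argument behave the same on Python 2 and 3.
print(get_events_by_hour(all_events))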

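Note that all_events should contain events for a single day only (timestamps below 86400 seconds); anything later is not counted, because only hours 0 to 23 are checked.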
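
If the events can span more than one day, a single pass over the list avoids the 24-hour limit. The following is a minimal sketch, assuming non-negative timestamps; `count_events_per_hour` and its `seconds_per_hour` parameter are hypothetical names chosen for illustration, not part of the answer above.

```python
from collections import Counter

def count_events_per_hour(timestamps, seconds_per_hour=3600):
    """Return a list whose entry i is the number of events in hour i."""
    if not timestamps:
        return []
    # Map each timestamp to its hour index and tally the indices.
    counts = Counter(int(t // seconds_per_hour) for t in timestamps)
    # Hours without events get a zero, up to the last hour that has one.
    return [counts.get(hour, 0) for hour in range(max(counts) + 1)]

all_events = [
    200.0, 420.0, 560.0, 1100.0, 1900.0, 2700.0, 3400.0, 3900.0, 4234.2, 4800.0]
print(count_events_per_hour(all_events))  # [7, 3]
```

The output list is only as long as the last hour that contains an event, so it has no trailing zeros.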

Eugene Soldatov
  • Great, thanks for the help. Any way to remove the hour stamp in the output, that is, the "0:" "1:" part of the output? – Harry Munro Feb 10 '15 at 11:47
1

The act of sampling means taking data f_i (samples) at certain discrete times t_i. The number of samples per time unit gives the sampling rate. Downsampling is a special case of resampling, which means mapping the sampled data onto a different set of sampling points t_i', here onto one with a smaller sampling rate, making the sample more coarse.

Your first list contains the sample points t_i (in seconds) and, implicitly, the number of events n_i, which follows from the index i, for example n_i = i + 1.

If you evaluate the list periodically, once every interval T (in seconds), you are resampling to a new set n_i' at times t_i' = i * T. I did not write downsampling because nothing might happen within a given interval T; in that case you are upsampling, because you now take more data points.

For the calculation, first check whether the input list for the interval is empty; in that case n' = 0 goes into your output list. Otherwise you have m entries in your input list, measured over the time T, and you can use the equation below:

n' = m * 3600 / T

This n' goes into your output list; it is scaled to events per hour.
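
As a rough illustration of that formula, the sketch below splits the timestamps into consecutive windows of length T, counts the m events in each window, and scales each count to an hourly rate. It assumes non-negative timestamps and windows that start at t = 0; `events_per_hour` is a hypothetical name used only for this example.

```python
def events_per_hour(timestamps, T):
    """Count events per window of length T (seconds), scaled to events per hour."""
    if not timestamps:
        return []
    # One window for every interval of length T up to the last event.
    n_windows = int(max(timestamps) // T) + 1
    counts = [0] * n_windows
    for t in timestamps:
        counts[int(t // T)] += 1
    # n' = m * 3600 / T; an empty window has m = 0 and therefore n' = 0.
    return [m * 3600.0 / T for m in counts]

all_events = [200.0, 420.0, 560.0, 1100.0, 1900.0, 2700.0, 3400.0,
              3900.0, 4234.2, 4800.0]
print(events_per_hour(all_events, 1800))  # [8.0, 6.0, 6.0]
print(events_per_hour(all_events, 3600))  # [7.0, 3.0]
```

With T = 3600 the scaling factor is 1, so the result is simply the number of events in each hour.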

mvw
1

The question has the scipy tag, and scipy depends on numpy, so I assume an answer using numpy is acceptable.

To get the hour associated with a timestamp t you can take the integer part of t/3600. Then, to get the number of events in each hour, you can count the number of occurrences of these integers. The numpy function bincount can do that for you.

Here's a numpy one-liner for the calculation. I put the timestamps in a numpy array t:

In [49]: t = numpy.array([200.0, 420.0, 560.0, 1100.0, 1900.0, 2700.0, 3400.0, 3900.0, 4234.2, 4800.0, 8300.0, 8400.0, 9500.0, 10000.0, 14321.0, 15999.0, 16789.0, 17000.0])

In [50]: t
Out[50]: 
array([   200. ,    420. ,    560. ,   1100. ,   1900. ,   2700. ,
         3400. ,   3900. ,   4234.2,   4800. ,   8300. ,   8400. ,
         9500. ,  10000. ,  14321. ,  15999. ,  16789. ,  17000. ])

Here's your calculation:

In [51]: numpy.bincount((t/3600).astype(int))
Out[51]: array([7, 3, 4, 1, 3])
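
The same calculation can be written as a standalone script that makes the intermediate hour indices explicit. This is a sketch under two assumptions: the timestamps are non-negative, and a fixed-length result covering 24 hours is wanted, which is what the `minlength` argument of `bincount` is used for here.

```python
import numpy as np

t = np.array([200.0, 420.0, 560.0, 1100.0, 1900.0, 2700.0, 3400.0,
              3900.0, 4234.2, 4800.0, 8300.0, 8400.0, 9500.0, 10000.0,
              14321.0, 15999.0, 16789.0, 17000.0])

# Integer hour index of each event (floor division, then cast to int).
hours = (t // 3600).astype(int)

# bincount tallies how often each non-negative integer occurs;
# minlength pads the result with zeros so it always covers 24 hours.
counts = np.bincount(hours, minlength=24)
print(counts[:5])  # [7 3 4 1 3]
```

Without `minlength`, the result stops at the last hour that contains an event, as in the session above.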
Warren Weckesser