0

I am working huge amount of data which I have recently collected for my research. I would like to get to know what time of the day user activity happened. Here is the small amount of my dataset snapshot:

enter image description here

I applied to following code to get to know time interval login activity.

=SUMPRODUCT(--(HOUR(I2:I185143)>=K4*24),--(HOUR(I2:I185143)<L4*24))

The result is written in the Column N. Everything so far, works I believe. However my question is what if I want to analyze in every 10 min rather than every one hour.

for example how many login happened between 8.00 am-8.10 am. My current approach to find every hour interval. I would be glad if you could help me out in excel or matlab.

enter image description here

Amro
  • 123,847
  • 25
  • 243
  • 454
  • Here is another related question: [MATLAB: compute mean of each 1-minute interval of a time-series](http://stackoverflow.com/q/2323031/97160) – Amro May 24 '14 at 13:49

3 Answers3

0

this is relatively simple in excel if you use the array function Frequency()

some steps to take:

i assume that you want to sort by time of the day for every day

so the time column is the data_array, the first parameter for Frequency (check out excel help)

for the bin data you will have to get a with a start time at its head add for the following bins below 10 minutes each (= 1/24/6) to make 6th of an hour out of days

so a total of 144 bins for a full day

then select the column segment next to your bin and array edit it with =Frequency(data_array, bin_array) and don't forget to enter it with

0

Ok, I have got the answer in the following screenshoot.

enter image description here

0

Here is a possible MATLAB solution.

1) Fake data

To test my code, I first had to generate some fake data resembling your actual one. You can skip this section when working with the real data.

len = 50;

%# random dates inbetween these two
dt_start = datenum('2013-06-22 19:17:00');
dt_end = datenum('2013-06-22 19:23:00');
dt = sort(dt_start + rand(len,1)*(dt_end-dt_start));

%# create the columns
ID = repmat(int32(104885), [len 1]);
Date = cellstr(datestr(dt, 'mm/dd/yyyy'));
Time = cellstr(datestr(dt, 'HH:MM:SS'));
EE = rand(len,1);

%# combine columns into a cell array
C = [num2cell(ID), Date, Time, num2cell(EE)];

%# create a "table" for convenience, and export to CSV file
t = cell2table(C, 'VariableNames',{'ID', 'Date', 'Time', 'EE'})
writetable(t, 'data.csv')

%# cleanup
clear len dt_start dt_end dt ID Date Time EE C t

Here is an extract of the data I generated for this example:

t = 
      ID          Date           Time          EE    
    ______    ____________    __________    _________
    104885    '06/22/2013'    '19:17:19'      0.95808
    104885    '06/22/2013'    '19:17:22'      0.72305
    104885    '06/22/2013'    '19:17:31'      0.86481
    104885    '06/22/2013'    '19:17:33'      0.52325
    .
    .
    104885    '06/22/2013'    '19:22:37'       0.5167
    104885    '06/22/2013'    '19:22:39'      0.53815
    104885    '06/22/2013'    '19:22:41'      0.27151
    104885    '06/22/2013'    '19:22:54'      0.37826
    104885    '06/22/2013'    '19:22:59'      0.51215

2) Frequency count in time intervals

I read the data from CSV file, then apply a process similarly to what was used in the post you linked to. In this case, the time interval length is a parameter you can specify (10 minutes interval, 1 hour, etc..)

%# load data from CSV
t = readtable('data.csv', 'Format','%d %s %s %f', 'Delimiter',',', ...
    'ReadVariableNames',true, 'FileType','text');

%# convert date/time columns to serial date number
dt = datenum(strcat(t.Date , {' '}, t.Time), 'mm/dd/yyyy HH:MM:SS');

%# desired interval window-size (expressed in units of days).
%# Here I am using a 2 minutes interval
interval = 2/(24*60);    % (24 hours per day, 60 min per hour)

% bin datetimes into specified intervals
dt_binned = fix(dt/interval)*interval;

% count frequencies in each interval
[dt_unique,~,dt_unique_idx] = unique(dt_binned);
counts = accumarray(dt_unique_idx, 1);
freq = [cellstr(datestr(dt_unique)) num2cell(counts)]

The result:

freq = 
    '22-Jun-2013 19:16:00'    [ 7]
    '22-Jun-2013 19:18:00'    [21]
    '22-Jun-2013 19:20:00'    [12]
    '22-Jun-2013 19:22:00'    [10]

So we had 7 events occurred in the first interval (from 19:16 to 19:18), 21 events in the second event, and so on. You can easily adapt the code to change the interval length.

Community
  • 1
  • 1
Amro
  • 123,847
  • 25
  • 243
  • 454