Say that I have a dataset:

Jday = datenum('2009-01-01 00:00','yyyy-mm-dd HH:MM'):1/24:...
    datenum('2009-01-05 23:00','yyyy-mm-dd HH:MM');
DateV = datevec(Jday);
DateV(4,:) = [];
DateV(15,:) = [];
DateV(95,:) = [];

Dat = rand(size(DateV,1),1);  % one value per remaining row of DateV

How can I remove all of the days that have fewer than 24 measurements? For example, the first day has only 22 measurements, so I would need to remove that entire day. How could I repeat this for the whole array?

Oleg
Emma
  • Could you say something about what you are trying to accomplish? It's possible an entirely different approach will suit your task better. – Nigel Jul 11 '13 at 00:56
  • I'm trying to calculate the temperature range for a given day, where 'Dat' is the temperature. To keep this consistent among all of the days, I only want to keep days that have 24 hourly measurements, i.e. no missing data points. – Emma Jul 11 '13 at 01:54

2 Answers

This is a rather long answer, but I think it should be useful. I would count the measurements per date using containers.Map. There may be a faster way, but this one should do for now.

Jday = datenum('2009-01-01 00:00','yyyy-mm-dd HH:MM'):1/24:...
    datenum('2009-01-05 23:00','yyyy-mm-dd HH:MM');

DateV = datevec(Jday);
DateV(4,:) = [];
DateV(15,:) = [];
DateV(95,:) = [];


% create a map
dateMap = containers.Map();



% count measurements in each date (i.e. first three columns of DateV)
for rowi = 1:1:size(DateV,1)

    dateRow = DateV(rowi, :);
    dateStr = num2str(dateRow(1:3));

    if ~isKey(dateMap, dateStr)
        % initialize Map for a given date with 1 measurement (i.e. our
        % counter of measurements)
        dateMap(dateStr)  = 1;
        continue;
    end
    % increment measurement counter for given date
    dateMap(dateStr)  = dateMap(dateStr) + 1;
end


% get the dates
dateStrSet = keys(dateMap);




for keyi = 1:numel(dateStrSet)

    dateStrCell = dateStrSet(keyi);  
    dateStr = dateStrCell{1};

    % get number of measurements in a given date
    numOfmeasurements = dateMap(dateStr);

    % if fewer than 24, do something about it, e.g. save the date
    % for later removal from DateV
    if numOfmeasurements < 24
        fprintf(1, 'This date has fewer than 24 measurements: %s\n', dateStr);
    end
end

The result is:

This date has fewer than 24 measurements: 2009     1     1
This date has fewer than 24 measurements: 2009     1     5
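
The loop above only reports the incomplete days. One possible way to finish the job is sketched below: look up each row's date in the map and keep only the rows whose day has 24 entries. It assumes that Dat holds one value per remaining row of DateV (an assumption made here so the two stay aligned), and keepRow is a name introduced for illustration.

```matlab
% Rebuild the example data from the question
Jday = datenum('2009-01-01 00:00','yyyy-mm-dd HH:MM'):1/24:...
    datenum('2009-01-05 23:00','yyyy-mm-dd HH:MM');
DateV = datevec(Jday);
DateV(4,:) = [];
DateV(15,:) = [];
DateV(95,:) = [];
Dat = rand(size(DateV,1),1);   % assumption: one value per row of DateV

% Count measurements per date, keyed by the 'year month day' string
dateMap = containers.Map('KeyType','char','ValueType','double');
for rowi = 1:size(DateV,1)
    dateStr = num2str(DateV(rowi,1:3));
    if isKey(dateMap, dateStr)
        dateMap(dateStr) = dateMap(dateStr) + 1;
    else
        dateMap(dateStr) = 1;
    end
end

% Mark the rows that belong to a complete day (hypothetical helper variable)
keepRow = false(size(DateV,1),1);
for rowi = 1:size(DateV,1)
    dateStr = num2str(DateV(rowi,1:3));   % same key format as above
    keepRow(rowi) = dateMap(dateStr) == 24;
end

% Drop the incomplete days from both arrays
DateV = DateV(keepRow,:);
Dat   = Dat(keepRow,:);
```

With the example data this should leave only the rows for 2, 3 and 4 January.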
Marcin

A quick solution is to group by year, month and day with unique(), count the observations per day with accumarray(), and then exclude the days with fewer than 24 observations using two steps of logical indexing:

% Count observations per day
[unDate,~,subs] = unique(DateV(:,1:3),'rows');
counts = [unDate accumarray(subs,1)]
counts =
        2009           1           1          22
        2009           1           2          24
        2009           1           3          24
        2009           1           4          24
        2009           1           5          23
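
To see what the third output of unique() and the accumarray() call are doing, here is a toy illustration with made-up rows (toyDates and the other toy* names are not part of the original data):

```matlab
% Four made-up rows: three on 1 January, one on 2 January
toyDates = [2009 1 1; 2009 1 1; 2009 1 2; 2009 1 1];

% toyUn holds the distinct dates; toySubs tells, for each input row,
% which row of toyUn it matches
[toyUn, ~, toySubs] = unique(toyDates, 'rows');

% Adding up a 1 for every occurrence of each index gives the per-date count
toyCounts = accumarray(toySubs, 1);
```

Here toySubs is [1;1;2;1] and toyCounts is [3;1], i.e. three rows for the first date and one for the second.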

Then, apply the criterion to the counts and retrieve a logical index:

% index only those that meet criteria
idxC = counts(:,end) == 24
idxC =
      0
      1
      1
      1
      0

% keep those which meet criteria (optional, for visual inspection)
counts(idxC,:)
ans =
        2009           1           2          24
        2009           1           3          24
        2009           1           4          24

Finally, find the members of Dat that fall into the selected days with a second round of logical indexing through ismember():

idxDat = ismember(subs,find(idxC))
Dat(idxDat,:)
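
Since the stated goal in the comments is a temperature range per day, the steps above could be combined roughly as follows. This is only a sketch: it assumes Dat has one value per row of DateV, and nObs, dailyRange and result are names introduced here. It also uses idxC(subs), which should give the same index as ismember(subs,find(idxC)).

```matlab
% Rebuild the example data from the question
Jday = datenum('2009-01-01 00:00','yyyy-mm-dd HH:MM'):1/24:...
    datenum('2009-01-05 23:00','yyyy-mm-dd HH:MM');
DateV = datevec(Jday);
DateV(4,:) = [];
DateV(15,:) = [];
DateV(95,:) = [];
Dat = rand(size(DateV,1),1);   % assumption: one value per row of DateV

% Group rows by day and count observations per day
[unDate,~,subs] = unique(DateV(:,1:3),'rows');
nObs = accumarray(subs,1);

% Days with a full set of 24 observations, and the rows that belong to them
idxC   = nObs == 24;
idxDat = idxC(subs);           % same as ismember(subs,find(idxC))

% Range (max - min) of Dat per complete day; incomplete days are skipped
dailyRange = accumarray(subs(idxDat), Dat(idxDat), [numel(nObs) 1], ...
    @(x) max(x) - min(x));
dailyRange = dailyRange(idxC);

% One row per complete day: year, month, day, range
result = [unDate(idxC,:) dailyRange];
```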
Oleg
  • So, this gives you the individual days that have all 24 values. Then, in order to select 'Dat' for these days, should we select the rows that match these days? – Emma Jul 11 '13 at 10:29
  • I edited the answer to include a second round of logical indexing that filters out `Dat`. – Oleg Jul 11 '13 at 10:41