0

I have a cell array in which some of the entries have two data points. I want to average the two data points if the data were collected on the same day.

The first column of cell array 'site' is the date. The fourth column is the data concentration. I want to average the fourth column if the data comes from the same day.

For example, if my cell array looks like this:

01/01/2011  36-061-0069   1   10.4
01/01/2011  36-061-0069   2   10.1
01/04/2011  36-061-0069   1   7.9
01/05/2011  36-061-0069   1   13

I want to average the fourth column (10.4 and 10.1) into one row and leave everything else the same.

Help? Would an if elseif loop work? I'm not sure how to approach this issue, especially since cell arrays work a little differently than matrices.

SugaKookie
  • 780
  • 2
  • 17
  • 41

2 Answers2

3

You can do it succinctly without a loop, using a combination of unique, diff and accumarray.

Define data:

data = {'01/01/2011'  '36-061-0069'  '1'  '10.4';
        '01/01/2011'  '36-061-0069'  '2'  '10.1';
        '01/04/2011'  '36-061-0069'  '1'  '7.9';
        '01/05/2011'  '36-061-0069'  '1'  '13'};

Then:

dates = datenum(data(:,1),2); % mm/dd/yyyy format. Change "2" for other formats
[dates_sort ind_sort] = sort(dates);
[~, ii, jj] = unique(dates_sort);
n = diff([0; ii]);
result = accumarray(jj,vertcat(str2double(data(ind_sort,4))))./n;

gives the desired result:

result =

   10.2500
    7.9000
   13.0000

If needed, you can get the non-repeated, sorted dates with data(ind_sort(ii),1).

Explanation of the code: the dates are first converted to numbers and sorted. The unique dates and repeated dates are then extracted. Finally, data in repeated rows are summed and divided by the number of repetitions to obtain the averages.

Compatibility issues for Matlab 2013a onwards:

The function unique has changed in Matlab 2013a. For that version onwards, add 'legacy' flag to unique, i.e. replace the line [~, ii, jj] = unique(dates_sort) by

[~, ii, jj] = unique(dates_sort,'legacy')
Luis Mendo
  • 110,752
  • 13
  • 76
  • 147
  • What does ii and jj refer to? I've been trying to use your code since it seems to be exactly what I need, but I'm not able to do vertcat because my ii and jj are now different lengths. ii and n are the same length while jj is longer. My data array doesn't have the quotes (''). Don't know if that makes a difference. – SugaKookie Sep 27 '13 at 20:05
  • It's normal that ii and jj have different lengths. ii contains the date indices without repetitions, and jj contains all dates indicating which are repeated. Which error do you get? And how can you data not have the quotes? Isn't it a cell array with strings in the first two columns? – Luis Mendo Sep 27 '13 at 20:10
  • @user2258883 Another thing: which Matlab version do do you use? Tray changing `~` by a dummy variable name such as `aux` – Luis Mendo Sep 27 '13 at 20:16
  • I use MATLAB 2012 and the error I get is 'Error using vertcat Dimensions of matrices being concatenated are not consistent.' Using aux still gave the same error. – SugaKookie Sep 27 '13 at 20:25
  • Can you tell me exactly what is your cell array? What does it contain in its first column? Or better, what is the result of typing `whos data` (where `data` is the name of your cell array)? – Luis Mendo Sep 27 '13 at 20:27
  • Name Size Bytes Class Attributes data 232x4 115574 cell – SugaKookie Sep 27 '13 at 20:40
  • Maybe the problem is that the date strings don't all have the same width. I have updated the solution (changed first line); try now – Luis Mendo Sep 27 '13 at 20:51
  • Sorry! I wasn't on MATLAB over the weekend. I tried the updated solution and now it says 'Error using ==> vertcat, CAT argument dimensions are not consistent' – SugaKookie Sep 30 '13 at 12:47
  • Can you post a link to an example cell array (saved as a .mat file)? – Luis Mendo Sep 30 '13 at 13:50
  • I have two files here - the long named one is the one with all the data. 'data' is the file with only columns 1 through 4 because I wanted to make it match with the script you wrote. https://www.dropbox.com/sh/li3hh1nvt11vok5/4YGfwStQlo – SugaKookie Sep 30 '13 at 18:40
  • There were two problems with my answer: your last column is strings, not numbers as I thought; and the dates are not sorted. I think it works now, see updated solution. I have tested with your "data.mat" file. Your date format is "mm/dd/yyyy", right? – Luis Mendo Sep 30 '13 at 19:15
  • Thank you so much! It is working now. I didn't realize the last column was strings. I actually used another line of code to sort it, but incorporating it into this code seems more efficient. – SugaKookie Sep 30 '13 at 20:45
  • Great! We finally made it! :-) – Luis Mendo Sep 30 '13 at 22:13
  • @user2258883 I don't have the `read_mixed_csv` function or know the `netcdf` format... sorry! – Luis Mendo Oct 04 '13 at 16:29
0

It sounds like you want to do :

for it = 1:size(CellArray,1)
   sum = sum + cellArray{it}(4) % or .NameOfColumn if it a cell containing struct
end

mean(sum)
Vuwox
  • 2,331
  • 18
  • 33
  • It gives me an error saying 'Error using sum, Not enough input arguments'. – SugaKookie Sep 27 '13 at 18:19
  • Also, I don't want to find the mean of the entire 4th column. I only want to find the average of the rows where the first column (date) matches. Everything needs to be left alone. – SugaKookie Sep 27 '13 at 18:20
  • Sorry, I didn't saw that. But the LuisMendo answer seems to be good. – Vuwox Sep 27 '13 at 22:12