I need help taking the following data, which is organized in a large matrix, averaging all of the values that share an ID (the index in the first column), and outputting another matrix that contains just each ID followed by its averaged values.

File with data format:
(This is the StarData variable)
ID>>>>Values

002141865 3.867144e-03  742.000000  0.001121  16.155089  6.297494  0.001677

002141865 5.429278e-03  1940.000000  0.000477  16.583748  11.945627  0.001622

002141865 4.360715e-03  1897.000000  0.000667  16.863406  13.438383  0.001460

002141865 3.972467e-03  2127.000000  0.000459  16.103060  21.966853  0.001196

002141865 8.542932e-03  2094.000000  0.000421  17.452007  18.067214  0.002490

Do not be misled by the example I posted. The first number is repeated for about 15 lines, then the ID changes, and this continues for an entire set of different IDs; then the whole group of IDs is repeated again. Think of the first block as [1 2 3; 1 5 9; 2 5 7; 2 4 6], after which the block repeats with the same indices but different values in the other columns. The values trailing the ID are what I need to average in MATLAB, producing a clean matrix with exactly one row per ID, averaged over all occurrences of that ID. Thanks for any help given.

ImmortalxR
  • ID is first column? And for matching ID's, do you want to average a given column, or all of them? – Luis Mendo Oct 14 '13 at 15:39
  • Does this help you? http://stackoverflow.com/questions/19056905/matlab-cell-array-average-two-values-if-another-column-matches – Luis Mendo Oct 14 '13 at 15:42
  • I have seen the question you posted, but I am looking for the average of all of the columns, and every time I try to use accumarray I cannot get it to average all of the columns for me, just a single one – ImmortalxR Oct 14 '13 at 15:47
  • So for each ID you want a single value, with the average of all columns for those selected rows? – Luis Mendo Oct 14 '13 at 15:48
  • If I'm understanding correctly then yes, except I need the average for each column, not just a single value for all of the columns; so I would need the second column in my final result to be an average of everything in the second column, and so on for all of the columns. – ImmortalxR Oct 14 '13 at 15:52
  • Ok. Got it. A modification of my previous answer will do. I'm posting it now – Luis Mendo Oct 14 '13 at 15:54

1 Answer

A modification of this answer does the job, as follows:

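% Sort by ID so that rows with the same ID are contiguous
[value_sort, ind_sort] = sort(StarData(:,1));
% ii: index of the last occurrence of each ID in value_sort (pre-R2013a behaviour)
[~, ii, jj] = unique(value_sort);
n = diff([0; ii]); % number of rows for each ID
averages = NaN(length(n),size(StarData,2)); % preallocate
averages(:,1) = value_sort(ii); % ii indexes the sorted IDs, not the rows of StarData
for col = 2:size(StarData,2)
  % sum each column per ID, then divide by the row count
  averages(:,col) = accumarray(jj,StarData(ind_sort,col))./n;
end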

The result is in variable averages. Its first column contains the values used as indices, and each subsequent column contains the average for that column according to the index value.
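As a quick sanity check (a sketch, not part of the code above), the averages for a single ID can be computed directly with logical indexing and compared against the corresponding row of averages. The five sample rows from the question are used here; the variable names id, rows and check are only illustrative.

```matlab
% The five sample rows from the question (the leading zeros of the ID are
% dropped when it is stored as a number)
StarData = [2141865 3.867144e-03  742 0.001121 16.155089  6.297494 0.001677
            2141865 5.429278e-03 1940 0.000477 16.583748 11.945627 0.001622
            2141865 4.360715e-03 1897 0.000667 16.863406 13.438383 0.001460
            2141865 3.972467e-03 2127 0.000459 16.103060 21.966853 0.001196
            2141865 8.542932e-03 2094 0.000421 17.452007 18.067214 0.002490];

id = 2141865;                            % ID to check
rows = StarData(:,1) == id;              % logical mask of the rows with that ID
check = mean(StarData(rows, 2:end), 1);  % column-wise mean over those rows

% check(2) corresponds to column 3 of StarData:
% (742 + 1940 + 1897 + 2127 + 2094)/5 = 1760
disp(check)
```

If the row of averages for that ID does not match check, the grouping step (the call to unique) is the first place to look.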

Compatibility issues for Matlab 2013a onwards:

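The function unique changed in MATLAB R2013a: by default its second output now gives the first occurrence of each value rather than the last, so n = diff([0; ii]) no longer yields the row counts. From that version onwards, add the 'legacy' flag to unique, i.e. replace the call to unique with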

[~, ii, jj] = unique(value_sort,'legacy')
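An alternative that should not depend on the version of unique is sketched below. It uses only the first and third outputs of unique, which do not depend on whether the first or last occurrence is reported, and lets accumarray apply @mean directly instead of dividing sums by counts. The small StarData here is made up for illustration, and this is a variant of the approach above rather than the code that was tested in the comments.

```matlab
% Made-up example data: IDs in column 1, deliberately unsorted
StarData = [2 10 1
            1  3 4
            1  5 6
            2 20 3];

% ids: sorted unique IDs; jj: for each row, the position of its ID in ids
[ids, ~, jj] = unique(StarData(:,1));
jj = jj(:);  % make sure the subscripts are a column vector for accumarray

nCols = size(StarData, 2);
averages = NaN(numel(ids), nCols);  % preallocate
averages(:,1) = ids;
for col = 2:nCols
  % group the column by ID and take the mean of each group
  averages(:,col) = accumarray(jj, StarData(:,col), [], @mean);
end

disp(averages)  % rows: [1 4 5] and [2 15 2]
```

Passing a function handle to accumarray is typically slower than the sum-and-divide version on large inputs, so this trades some speed for not needing the sort or the 'legacy' flag.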
Luis Mendo
  • Hey, this does produce output in the format I would like; however, it is not giving me averages for some reason. – ImmortalxR Oct 14 '13 at 16:04
  • Can you post an example with easy numbers and a small `StarData`? For example: given `StarData = [1 3 4; 1 5 6; 2 3 3]` which result would you get exactly? – Luis Mendo Oct 14 '13 at 16:05
  • @ImmortalxR But also show what the result should be for that case! – Luis Mendo Oct 14 '13 at 16:13
  • Okay, sorry, I hadn't understood exactly what you wanted, but I have posted a better example now. – ImmortalxR Oct 14 '13 at 16:18
  • @ImmortalxR Huh? I am getting a third value of `1526.3` (i.e. (742+1940+1897)/3), not `463068`. Please copy my code again as currently appears in my answer and try – Luis Mendo Oct 14 '13 at 16:21
  • @Luis Mendo I tried again and still no luck. I am going to post a full set for one ID and you can see if it works for you. – ImmortalxR Oct 14 '13 at 16:38
  • @ImmortalxR You must be doing something different than I am doing. With your latest sample I get: averages(1)= 2141865, averages(2)= 0.0046, averages(3)=1905.3, which are correct. Are you _sure_ you are using the code as is _now_ in the answer? Delete your code and copy from scratch – Luis Mendo Oct 14 '13 at 16:45
  • I am posting the code in my OP exactly as I have it because even using the output I gave you with a clean copy of your code is not working on my end. – ImmortalxR Oct 14 '13 at 16:51
  • @ImmortalxR The problem must be in `StarData` then. Can you add `disp(StarData)` after the `evalin` line and post the result? – Luis Mendo Oct 14 '13 at 17:12
  • @ImmortalxR Also, are you declaring `averages` as `global` in your base workspace? – Luis Mendo Oct 14 '13 at 17:13
  • Yea I am declaring averages as global in the workspace as well, and I will post what I am getting in the OP – ImmortalxR Oct 14 '13 at 17:17
  • Please see edit at the end of my answer. I am defining `StarData` within your function and I get the correct result. Try it and tell me – Luis Mendo Oct 14 '13 at 17:24
  • I used the exact same code as you have posted above and sorry to say but I am not getting the correct values still, same incorrect value as before. I have an exact copy/paste of what you posted. – ImmortalxR Oct 14 '13 at 17:26
  • Please see updated edit. I have removed the function. Restart Matlab, paste code on command window and see `averages` – Luis Mendo Oct 14 '13 at 17:46
  • I tried your last suggestion and my values are still wrong. It is the weirdest thing at this point. By any chance, would a different MATLAB version matter? I am using MATLAB R2013a – ImmortalxR Oct 14 '13 at 17:52
  • I use 2010b. Do you have access to a lower version? Weird indeed – Luis Mendo Oct 14 '13 at 17:58
  • @ImmortalxR It seems that `unique` has changed in 2013a (see http://www.mathworks.es/es/help/matlab/ref/unique.html). Try replacing second line to `[~, ii, jj] = unique(value_sort,'legacy');` – Luis Mendo Oct 14 '13 at 18:25
  • @ImmortalxR Glad to hear that! So the changed behaviour of `unique` is to blame. I will add an update in this answer. – Luis Mendo Oct 16 '13 at 14:43
  • @ImmortalxR I also suggest that you clean up your question by removing everything under "Thanks for any help given", as that would mostly confuse future readers – Luis Mendo Oct 16 '13 at 14:49