1

I have multiple excel files like the one shown below (hourly data). I want to obtain the daily average (e.g. from 17:00 to 16:00 of next day).I only know a little Matlab. Currently my solution is below but it got some problem.

  1. Read each excel file and store the data in a variable.
  2. Merge all the data in a single variable.
  3. Find the row number (n) with 17:00. For this, I obtained the datenum of each row and then just search (find) the datenum corresponding to 17:00.
  4. Average the data between each two rows (n(i)).

It works fine if the raw data is right. But the problems are below.

  1. Some data at 17:00 are missing. So there will be error when it try to find the row with 17:00.
  2. When I have more than 100 excel file, it runs very very slow.

Can any one give some suggestion how to solve the problem? I prefer to use Matlab since it is the only tool I know a little. Many thanks!

Edit 1: Code provided

Below is the code to obtain daily average. I have combined all the data in the file "summary_file.xls" (the code of this part is not shown.

As the example table shown below.

  1. The average for the time between 17:00, 1 Sep. and 2 Sep. has no problem.
  2. For days with data missing, I would like to obtain the average for all available data before 17:00.
  3. Example: since the data of 17:00 3 Sep. is missing, the average for 2-3 Sep. cannot be obtained. For such situation, I would like to obtain the average for all data before 17:00 (i.e. untill 9:00 3 Sep.)
  4. Then for 3-4 Sep., I would like to calculate the average starting from 18:00.
  5. For days with all data missing, just make all data as 0 or marked as not available.

[num,txt,raw] = xlsread('summary_file.xls'); % read file

date_num = num(:,11); % read the column containing date number
starting = '2003/09/05-17:00'; %  starting time
starting = datenum(starting,formatIn_2); % convert starting time to date number
ending = '2003/09/09-17:00';%  ending time
ending = datenum(ending,formatIn_2); %convert ending time to date number

s = starting:1:ending; % All date number with 17:00

%% find the row number with 17:00
for i = 1:ending-starting+1
    [a(i) b(i)] = find(date_num==s(i));
end
%% Store the averaged data in variable p
for i = 1:ending-starting
    p(i,:) = mean(num(a(i):a(i+1)-1,:));
end

Sample input data -

+------+-------+----------+-------+-------+-------+-------+-------+-------+
| 2003 | 1-Sep |  15:00   | 100.2 |  29   | 70.5  | 3.903 |  728  |   0   |
+------+-------+----------+-------+-------+-------+-------+-------+-------+
| 2003 | 1-Sep | 16:00    | 100.1 | 29.31 | 70.7  | 4.328 | 611.8 | 0     |
| 2003 | 1-Sep | 17:00    | 100.1 | 29.64 | 67.06 | 3.719 | 434.8 | 0     |
| 2003 | 1-Sep | 18:00    | 100.1 | 29.67 | 64.4  | 3.005 | 172.4 | 0     |
| 2003 | 1-Sep | 19:00    | 100.1 | 29.06 | 68.22 | 2.292 | 19.89 | 0     |
| 2003 | 1-Sep | 20:00    | 100.2 | 28.43 | 74.7  | 2.436 | 0.428 | 0     |
| 2003 | 1-Sep | 21:00    | 100.2 | 27.92 | 76.2  | 1.931 | 0.006 | 0     |
| 2003 | 1-Sep | 22:00    | 100.3 | 27.67 | 77.3  | 1.825 | 0.007 | 0     |
| 2003 | 1-Sep | 23:00    | 100.4 | 27.55 | 77.9  | 1.622 | 0.007 | 0     |
| 2003 | 1-Sep | 24:00:00 | 100.4 | 27.69 | 77.8  | 0.863 | 0.008 | 0     |
| 2003 | 2-Sep | 1:00     | 100.4 | 27.55 | 78.3  | 0.879 | 0.008 | 0     |
| 2003 | 2-Sep | 2:00     | 100.3 | 27.05 | 82.1  | 1     | 0.016 | 0.762 |
| 2003 | 2-Sep | 3:00     | 100.3 | 26.41 | 86.8  | 0.805 | 0.006 | 0     |
| 2003 | 2-Sep | 4:00     | 100.2 | 26.6  | 85.5  | 0.522 | 0.011 | 0.508 |
| 2003 | 2-Sep | 5:00     | 100.2 | 25.53 | 83.8  | 2.158 | 0.011 | 0     |
| 2003 | 2-Sep | 6:00     | 100.3 | 24.5  | 86.6  | 2.711 | 0.016 | 0     |
| 2003 | 2-Sep | 7:00     | 100.4 | 24.85 | 86.9  | 2.562 | 0.016 | 4.318 |
| 2003 | 2-Sep | 8:00     | 100.6 | 21.11 | 94    | 8.15  | 9.96  | 26.67 |
| 2003 | 2-Sep | 9:00     | 100.6 | 22.23 | 91.9  | 5.065 | 31.67 | 0.254 |
| 2003 | 2-Sep | 10:00    | 100.6 | 23.51 | 88.8  | 5.742 | 39.16 | 0.254 |
| 2003 | 2-Sep | 11:00    | 100.6 | 24    | 87.7  | 4.494 | 97.8  | 0     |
| 2003 | 2-Sep | 12:00    | 100.6 | 24.69 | 85.3  | 4.709 | 142.2 | 0     |
| 2003 | 2-Sep | 13:00    | 100.5 | 25.57 | 82.8  | 5.66  | 259.1 | 0     |
| 2003 | 2-Sep | 14:00    | 100.4 | 25.69 | 81.9  | 5.634 | 157.5 | 0     |
| 2003 | 2-Sep | 15:00    | 100.3 | 26.18 | 79.1  | 5.564 | 308.2 | 0     |
| 2003 | 2-Sep | 16:00    | 100.3 | 26.08 | 78.3  | 6.283 | 135.3 | 0     |
| 2003 | 2-Sep | 17:00    | 100.3 | 25.75 | 81.2  | 4.595 | 55.68 | 0.762 |
| 2003 | 2-Sep | 18:00    | 100.3 | 25.01 | 84.5  | 4.843 | 55.21 | 1.778 |
| 2003 | 2-Sep | 19:00    | 100.3 | 25.15 | 86.1  | 1.433 | 22.43 | 0     |
| 2003 | 2-Sep | 20:00    | 100.3 | 24.98 | 86.1  | 1.985 | 0.301 | 0     |
| 2003 | 2-Sep | 21:00    | 100.3 | 24.75 | 85.1  | 0.712 | 0.009 | 0     |
| 2003 | 2-Sep | 22:00    | 100.4 | 24.76 | 85.3  | 1.546 | 0.011 | 0     |
| 2003 | 2-Sep | 23:00    | 100.5 | 24.92 | 84.5  | 1.186 | 0.008 | 0     |
| 2003 | 2-Sep | 24:00:00 | 100.5 | 24.96 | 84.9  | 1.31  | 0.007 | 0     |
| 2003 | 3-Sep | 1:00     | 100.5 | 25    | 85.3  | 0.702 | 0.012 | 0     |
| 2003 | 3-Sep | 2:00     | 100.5 | 24.99 | 86    | 0.35  | 0.017 | 0     |
| 2003 | 3-Sep | 3:00     | 100.4 | 25.07 | 86.1  | 0.69  | 0.008 | 0     |
| 2003 | 3-Sep | 4:00     | 100.3 | 24.92 | 86.5  | 1.347 | 0.011 | 0     |
| 2003 | 3-Sep | 5:00     | 100.3 | 25.27 | 85.5  | 0.834 | 0.009 | 0     |
| 2003 | 3-Sep | 6:00     | 100.3 | 24.97 | 86.9  | 0.627 | 0.012 | 0     |
| 2003 | 3-Sep | 7:00     | 100.3 | 24.8  | 87.7  | 0.755 | 0.108 | 0     |
| 2003 | 3-Sep | 8:00     | 100.4 | 25.54 | 85    | 0.202 | 37.11 | 0     |
| 2003 | 3-Sep | 9:00     | 100.4 | 26.72 | 81    | 1.853 | 219.4 | 0     |
| 2003 | 3-Sep | 18:00    | 100.2 | 29.67 | 56.39 | 2.856 | 456.2 | 0     |
| 2003 | 3-Sep | 19:00    | 100.2 | 30.17 | 53.66 | 2.204 | 266   | 0     |
+------+-------+----------+-------+-------+-------+-------+-------+-------+
Divakar
  • 218,885
  • 19
  • 262
  • 358
user2230101
  • 455
  • 3
  • 6
  • 15

3 Answers3

1

The following code might serve your requirements. Again, since you are interested in daily averages, the output would be much be smaller, as it's over a 24 hour period and I am assuming you need that. Also, it takes care of your missing data condition.

Code -

%% Setup params and data
start_hour = 17;
[num,txt,raw] = xlsread('summary_file.xls');

datenums = NaN(size(num,1),1);
for count = 1:size(num,1)
    year1 = cell2mat(raw(count,1));
    date1 = cell2mat(raw(count,2));
    time1 = cell2mat(raw(count,3));
    date_str = strcat( num2str(year1) , '-', date1 );
    datenums(count) = datenum( date_str, 'yyyy-dd-mmm')  + time1(:);
end

%% Take care of conditions
firstdata_start_hour = round(24*cell2mat(raw(1,3)));
if firstdata_start_hour > 17
    start1 =  floor(datenums(1))  + (start_hour/24);
elseif firstdata_start_hour < 17
    start1 =  floor(datenums(1))-1  + (start_hour/24);
else
    start1 = datenums(1);
end

ind1 = floor(datenums-start1) + 1;

%% Start Processing
num_items = size(num,2)-3;
num_days = max(ind1);

bins = NaN(num_days,num_items);
for count1 = 1:size(bins,2)
    for count2 = 1:size(bins,1)
        bins(count2,count1) = mean(num(find(ind1==count2),count1+3));
    end
end
bins(isnan(bins))=0;
average_nums = bins

Output of the averages for some data, compiled by OP on my request -

+-----------+---------------+------+-------------+------------+-----------+--------------+
|   Date    | Pressure(kPa) | Temp | Humidity(%) | W-spd(m/s) | Radiation | Rainfall(mm) |
+-----------+---------------+------+-------------+------------+-----------+--------------+
| 8/10/2009 | 100.1         | 25.8 | 79.1        | 1.4        | 82.6      | 1.7          |
| 8/11/2009 | 100.2         | 27.5 | 75.7        | 1.9        | 173.8     | 0.0          |
| 8/12/2009 | 100.1         | 28.4 | 73.5        | 2.1        | 177.1     | 0.0          |
| 8/13/2009 | 100.0         | 28.4 | 73.2        | 2.5        | 197.4     | 0.0          |
| 8/14/2009 | 100.0         | 28.5 | 73.5        | 2.2        | 151.2     | 0.0          |
| 8/15/2009 | 100.2         | 27.3 | 75.4        | 1.2        | 96.2      | 0.4          |
| 8/16/2009 | 100.2         | 27.1 | 75.5        | 1.4        | 122.6     | 0.0          |
| 8/17/2009 | 100.2         | 27.2 | 75.7        | 1.5        | 158.3     | 0.2          |
| 8/18/2009 | 100.2         | 27.5 | 72.2        | 1.4        | 186.4     | 0.0          |
| 8/19/2009 | 100.3         | 28.4 | 68.4        | 1.9        | 186.9     | 0.0          |
| 8/20/2009 | 100.3         | 28.1 | 69.2        | 2.0        | 184.8     | 0.0          |
| 8/21/2009 | 100.3         | 26.5 | 75.8        | 1.3        | 122.3     | 0.6          |
+-----------+---------------+------+-------------+------------+-----------+--------------+

For multiple excel files, you have to loop over all such files in a loop. There doesn't appear to have any other way out.

Divakar
  • 218,885
  • 19
  • 262
  • 358
  • Many thanks! Above code works great! But the 'time1(:)' in line 9 need to be revised to 'time1(:)/24'. Is it correct? – user2230101 Mar 03 '14 at 02:54
  • After reading into time1 - "time1 = cell2mat(raw(count,3));", check the values of time1. If they are fractions, don't use division by 24, otherwise use it. On my system, it was reading as fractions, so the division by 24 wasn't needed for me. I suspect it could be an issue of the way you have created the XLS sheet or MATLAB version itself. – Divakar Mar 03 '14 at 04:49
  • Consider voting up the answers that were useful. It could be done for not just for the accepted answer, but for others too. Read more here - http://stackoverflow.com/help/privileges/vote-up – Divakar Mar 03 '14 at 11:40
  • Thanks! Confirmed that the time 1 is not fractions so division by 24 is needed. I also want to vote the answer up but I do not have enough reputation (15) currently. I will do that after I have enough reputation. – user2230101 Mar 03 '14 at 13:16
  • Also, if you could manage to show the output mean/average values in a tabular form, similar to the way you have shown the input data, would be nice to showcase the results! If you could do that, suggest me the edit in my answer! – Divakar Mar 03 '14 at 13:23
  • I have added my output table in my question. Is it better to show it in your answer? I have also added a table head and a column of date with the code below. for count2 = 1:size(bins,1) datestrs {count2} = datestr(start1+count2,'yyyy-mm-dd'); end datestrs = datestrs' head = {'Date','Pressure(kPa)','Air Temp(°C)','Humidity(%)','Wind Speed(m/s)','Radiation(W/m2)','Rainfall(mm)','n'}; xlswrite(output.xls,head,1,'A1'); xlswrite(output.xls,datestrs,1,'A2'); xlswrite(output.xls,average_nums ,1,'B2'); – user2230101 Mar 04 '14 at 15:06
  • I have put that into my answer, and removed that from your question. Thank you so much for getting the output data! – Divakar Mar 04 '14 at 15:55
0

When you use Matlab's xlsread it opens and closes the Excel COM Server, so using this with multiple files will slow down execution. Read this post at Matlab Central which explains how can you avoid opening and closing Excel COM Server.

As for the missing data, maybe you could interpolate 17:00 with the data from find(time<17:00,'last') and find(time>17:00,'first')

R. Schifini
  • 9,085
  • 2
  • 26
  • 32
0

I love using MATLAB but if you don't mind me asking, have you considered using the R statistical software Package? Its really cool in my opinion you don't have to agree with me.

It is very easy to do the tasks that you are asking for. R can even handle missing data and easily work around it and can read excel files easily and search for the data you want.

Where to get R:

http://cran.r-project.org/

jonathan1987
  • 251
  • 1
  • 6
  • 17
  • Thanks! I am also interested in using R. But I do not have sufficient time to learn a new software for this task. May be later I can pick it up. – user2230101 Mar 01 '14 at 16:21