How to calculate the mean of only a few columns of a text file in R?

Question

I am using the following R code to calculate the mean of all columns in various text files. Now I need to modify it to calculate the mean of only a few columns, e.g. Temp [C], Press [Pa], Pow [W], etc. (All columns in the text file are floating-point numbers, except the first column, which is Date/Time.)

Could anyone please help me modify the existing code to achieve this?

xyz <- xy %>% 
    as_tibble() %>%
    group_by(time_sp = lubridate::floor_date(`Date/Time`, "15 mins")) %>% 
    summarise(across(where(is.numeric), ~ if(mean(is.na(.x)) > 0.5) NA else mean(.x, na.rm = TRUE)))

write.csv(xyz, paste0(dirlist[idx],"15row.csv"), row.names = FALSE)

Sample input data is as follows:

Date/Time Temp [C] TQ [C] Press [Pa] PQ [Pa] Pow [W] PowQ [W] ...
1990-02-01S0:00:01 27 5 298 -4 1278 1... 
1990-02-01S0:00:02 25 3 298 -4 1277 0...
....
...

You can just specify the columns in `across()`, i.e. in place of `where(is.numeric)`, put the columns you want. Though perhaps I'm not sure what you are asking here. – langtang, May 23 '23 at 17:31
Do you want `data.table` (as you tagged), `dplyr` (since it appears you're already using this), or both? – r2evans, May 23 '23 at 18:35
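
The first comment's suggestion can be sketched as follows. This is a minimal illustration, not code from the thread: the toy `xy` data frame and its values are invented, and `all_of()` is used as one way of passing a character vector of column names to `across()`. It assumes `dplyr` and `lubridate` are installed and that `Date/Time` is already a date-time column.

```r
library(dplyr)

# Toy stand-in for `xy`; the values are invented for illustration only.
xy <- data.frame(
  `Date/Time`  = as.POSIXct("1990-02-01 00:00:00", tz = "UTC") + c(0, 60, 900, 960),
  `Temp [C]`   = c(27, 25, NA, 24),
  `TQ [C]`     = c(5, 3, 4, 2),
  `Press [Pa]` = c(298, 298, 300, 302),
  `Pow [W]`    = c(1278, 1277, 1280, 1282),
  check.names  = FALSE  # keep names such as "Temp [C]" unchanged
)

# Columns to average; every other numeric column (e.g. `TQ [C]`) is dropped.
cn <- c("Temp [C]", "Press [Pa]", "Pow [W]")

xyz <- xy %>%
  as_tibble() %>%
  group_by(time_sp = lubridate::floor_date(`Date/Time`, "15 mins")) %>%
  # Same rule as in the question: NA if more than half of a bin is missing.
  summarise(across(all_of(cn),
                   ~ if (mean(is.na(.x)) > 0.5) NA_real_ else mean(.x, na.rm = TRUE)))

print(xyz)
```

Note that `all_of()` raises an error if any name in `cn` is missing from the data, which is useful when every file is expected to have the same columns.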

score 2 · Accepted Answer · answered May 23 '23 at 17:33

library(dplyr)

# Columns to average
cn <- c('Temp [C]', 'Press [Pa]', 'Pow [W]')

# Keep `Date/Time` in the subset too; group_by() needs it to build the 15-minute bins
xyz <- xy[ , c('Date/Time', cn) ] %>% 
    as_tibble() %>%
    group_by(time_sp = lubridate::floor_date(`Date/Time`, "15 mins")) %>% 
    summarise(across(where(is.numeric), ~ if(mean(is.na(.x)) > 0.5) NA else mean(.x, na.rm = TRUE)))

write.csv(xyz, paste0(dirlist[idx],"15row.csv"), row.names = FALSE)

br00t

While working on bulk data, I realized that some columns are not present in all the files. For example, files in folder 1 have Temp, Press and Pow, whereas files in folder 2 have only Press and Pow. In this case the code generates output for folder 1 but not for folder 2; it gives an error that column Temp was not found. How can I modify the existing code to handle this? Could you please help me with this? Thanks. – Alexia k Boston May 26 '23 at 15:54
Before running the pipeline you can create a flag, something like `all_cols_present <- all(cn %in% colnames(xy))`, to check whether all expected columns are present in the input. Alternatively, you may want to retain only the column names known to be in the data.frame, such as: `cn <- intersect(cn, colnames(xy))` – br00t May 26 '23 at 19:27
Sorry, I could not understand. My aim is to calculate the mean of all the wanted columns that are present in a file. If File1 has Temp, Press and Pow, I want the mean of all three columns in the output mean_File1.csv. If File2 has only Press and Pow, I want the mean of those two columns in the output mean_File2.csv. Will your suggested commands do that? – Alexia k Boston May 26 '23 at 22:53
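
One way to get the per-file behaviour asked about in the last comment is sketched below. It is an assumption-laden illustration, not the answerer's code: the helper name `mean_present()` and the toy data frames `file1` and `file2` are hypothetical, and `any_of()` is swapped in so that wanted columns missing from a file are skipped silently. The `intersect(cn, colnames(xy))` idea from the comment above should have the same effect when applied to the input data before subsetting.

```r
library(dplyr)

# Hypothetical helper (not from the thread): averages whichever of the
# wanted columns `cn` exist in `xy`, in 15-minute bins.
mean_present <- function(xy, cn) {
  xy %>%
    as_tibble() %>%
    group_by(time_sp = lubridate::floor_date(`Date/Time`, "15 mins")) %>%
    # any_of() ignores names in `cn` that are absent from this file.
    summarise(across(any_of(cn),
                     ~ if (mean(is.na(.x)) > 0.5) NA_real_ else mean(.x, na.rm = TRUE)))
}

cn <- c("Temp [C]", "Press [Pa]", "Pow [W]")
t0 <- as.POSIXct("1990-02-01 00:00:00", tz = "UTC")

# Invented stand-ins for a file from folder 1 (all three columns) ...
file1 <- data.frame(`Date/Time` = t0 + c(0, 60),
                    `Temp [C]` = c(27, 25),
                    `Press [Pa]` = c(298, 298),
                    `Pow [W]` = c(1278, 1277),
                    check.names = FALSE)

# ... and a file from folder 2 (no Temp column).
file2 <- data.frame(`Date/Time` = t0 + c(0, 60),
                    `Press [Pa]` = c(298, 300),
                    `Pow [W]` = c(1278, 1277),
                    check.names = FALSE)

out1 <- mean_present(file1, cn)  # means for Temp, Press and Pow
out2 <- mean_present(file2, cn)  # means for Press and Pow only, no error

print(names(out1))
print(names(out2))
```

Each result can then be written with `write.csv()` as in the original code, so every output file contains only the columns its input actually had.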