I am using the following code in R to calculate the mean of all columns in several text files. Now I need to modify it to calculate the mean of only a few columns, e.g. Temp [C], Press [Pa], Pow [W], etc. (All columns in the txt files are floating-point numbers except the first one, which is Date/Time.)

Could anyone please help me modify the existing code to do this?

library(dplyr)

# 15-minute means of every numeric column; NA if more than half the values are missing
xyz <- xy %>%
    as_tibble() %>%
    group_by(time_sp = lubridate::floor_date(`Date/Time`, "15 mins")) %>%
    summarise(across(where(is.numeric), ~ if (mean(is.na(.x)) > 0.5) NA else mean(.x, na.rm = TRUE)))

write.csv(xyz, paste0(dirlist[idx], "15row.csv"), row.names = FALSE)

Sample input data is as follows:

Date/Time Temp [C] TQ [C] Press [Pa] PQ [Pa] Pow [W] PowQ [W] ...
1990-02-01S0:00:01 27 5 298 -4 1278 1... 
1990-02-01S0:00:02 25 3 298 -4 1277 0...
....
...
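The anonymous function inside `across()` in the question's code returns NA for a 15-minute bin when more than half of its values are missing, and the mean of the remaining values otherwise. A minimal sketch that pulls this rule out into a named function (the name `mean_if_enough` and the `max_na_frac` argument are hypothetical, not part of dplyr) makes the behaviour easier to test on its own:

```r
library(dplyr)

# Hypothetical helper: NA when more than `max_na_frac` of the values are
# missing, otherwise the mean of the non-missing values.
mean_if_enough <- function(x, max_na_frac = 0.5) {
  if (mean(is.na(x)) > max_na_frac) NA_real_ else mean(x, na.rm = TRUE)
}

mean_if_enough(c(1, 2, NA))   # 1.5 (one third missing)
mean_if_enough(c(1, NA, NA))  # NA  (two thirds missing)
mean_if_enough(c(1, NA))      # 1   (exactly half missing is still averaged)

# Drop-in use inside summarise(); toy data, made up for illustration
toy <- tibble(g = c("a", "a", "b", "b"), v = c(1, NA, NA, NA))
toy_means <- toy %>%
  group_by(g) %>%
  summarise(across(where(is.numeric), mean_if_enough))
# toy_means$v is 1 for group "a" and NA for group "b"
```

Returning `NA_real_` instead of a plain `NA` keeps every result of type double, which avoids relying on dplyr to combine logical and numeric results across groups.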

1 Answer
library(dplyr)

cn <- c('Temp [C]', 'Press [Pa]', 'Pow [W]')
# Keep the Date/Time column as well: group_by() below needs it
xyz <- xy[ , c('Date/Time', cn) ] %>%
    as_tibble() %>%
    group_by(time_sp = lubridate::floor_date(`Date/Time`, "15 mins")) %>%
    summarise(across(where(is.numeric), ~ if (mean(is.na(.x)) > 0.5) NA else mean(.x, na.rm = TRUE)))

write.csv(xyz, paste0(dirlist[idx], "15row.csv"), row.names = FALSE)
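An alternative that leaves `xy` unsubsetted is to name the columns directly in `across()` with tidyselect's `all_of()`. The sketch below uses the first two sample rows from the question, assuming the garbled timestamps `1990-02-01S0:00:01` and `...S0:00:02` mean 1990-02-01 00:00:01 and 00:00:02 (UTC is also an assumption):

```r
library(dplyr)
library(lubridate)

# First two sample rows from the question (timestamp reading is an assumption)
xy <- tibble(
  `Date/Time`  = as.POSIXct("1990-02-01 00:00:00", tz = "UTC") + c(1, 2),
  `Temp [C]`   = c(27, 25),
  `TQ [C]`     = c(5, 3),
  `Press [Pa]` = c(298, 298),
  `PQ [Pa]`    = c(-4, -4),
  `Pow [W]`    = c(1278, 1277)
)

cn <- c("Temp [C]", "Press [Pa]", "Pow [W]")

# all_of() averages exactly the columns in cn; Date/Time stays available
# for grouping, and TQ/PQ are ignored.
xyz <- xy %>%
  group_by(time_sp = floor_date(`Date/Time`, "15 mins")) %>%
  summarise(across(all_of(cn),
                   ~ if (mean(is.na(.x)) > 0.5) NA else mean(.x, na.rm = TRUE)))
# One row (bin 1990-02-01 00:00:00): Temp 26, Press 298, Pow 1277.5
```

`all_of()` raises an error if any name in `cn` is missing from the data, which is the behaviour described in the first comment below.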
br00t
  • While working on the bulk data, I realized that some columns are not present in all the files, e.g. the files in folder 1 have Temp, Press and Pow, whereas the files in folder 2 have only Press and Pow. In this case the code generates output for folder 1 but not for folder 2: it fails with an error that column Temp is not found. How can I modify the code to handle this? Thanks. – Alexia k Boston May 26 '23 at 15:54
  • Before running the pipeline you can create a flag such as `all_cols_present <- all(cn %in% colnames(xy))` to check whether all expected columns are present. Alternatively, you can keep only the column names that actually exist in the data frame: `cn <- intersect(cn, colnames(xy))` – br00t May 26 '23 at 19:27
  • Sorry, I don't understand. My aim is to calculate the mean of whichever of these columns are present in a file. If File1 has Temp, Press and Pow, I want the means of all three columns in the output mean_File1.csv. If File2 has only Press and Pow, I want the means of those two columns in mean_File2.csv. Will your suggested commands do that? – Alexia k Boston May 26 '23 at 22:53
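For files that contain only some of the wanted columns, tidyselect's `any_of()` selects the listed columns that exist and silently skips the rest, which has the same effect as the `intersect()` suggestion in the comments. A minimal sketch, where `summarise_file` is a hypothetical helper name and the two toy files reuse the question's sample values (the second one with Temp removed):

```r
library(dplyr)
library(lubridate)

cn <- c("Temp [C]", "Press [Pa]", "Pow [W]")

# Hypothetical helper: 15-minute means of whichever columns in `cols`
# exist in `df`; any_of() skips names that are absent instead of failing.
summarise_file <- function(df, cols = cn) {
  df %>%
    group_by(time_sp = floor_date(`Date/Time`, "15 mins")) %>%
    summarise(across(any_of(cols),
                     ~ if (mean(is.na(.x)) > 0.5) NA else mean(.x, na.rm = TRUE)))
}

t0 <- as.POSIXct("1990-02-01 00:00:01", tz = "UTC")
file1 <- tibble(`Date/Time` = t0 + 0:1, `Temp [C]` = c(27, 25),
                `Press [Pa]` = c(298, 298), `Pow [W]` = c(1278, 1277))
file2 <- tibble(`Date/Time` = t0 + 0:1,  # no Temp column in this file
                `Press [Pa]` = c(298, 298), `Pow [W]` = c(1278, 1277))

names(summarise_file(file1))  # "time_sp" "Temp [C]" "Press [Pa]" "Pow [W]"
names(summarise_file(file2))  # "time_sp" "Press [Pa]" "Pow [W]"
```

Inside the existing loop, each file could then be written with `write.csv(summarise_file(xy), paste0(dirlist[idx], "15row.csv"), row.names = FALSE)`, so every output contains only the columns that file actually has.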