How to join multiple meteorological data excel files in one single dataframe in R?

Question

This is my first time posting a question here and I'm a bit of a newbie in R, so please forgive any mistakes in my post.

Basically, I have multiple Excel files with meteorological data organized like this:

C1_Maio.csv (note: I have the files C1_Maio, C1_Junho and C1_Julho; the names are Portuguese for May, June and July)

All the files contain three different temperature measurements and one humidity measurement (C1_Maio -> T1, T2, T3 and H).

I have 4 other Excel files, C2_Maio, C3_Maio, C4_Maio and C5_Maio, following the same logic.

I'm trying to join all these files in sequence.

So, to summarize, I need to join C1_Maio, C1_Junho and C1_Julho into one continuous Excel file. I tried to write some code (again, newbie here) and I don't know what I'm missing. When I merge the files, something goes wrong and the data from the csv files gets messed up. Here is the code below, with descriptions:

# Packages

#install.packages("openxlsx")
#install.packages("tidyverse")

# Libraries

library("openxlsx")
library("tidyverse")

# Load the May (Maio) data from the Cadiretes sensors

C_1_Maio <- read.csv("D:/Carreira Acadêmica/Doutorado - UB/Projetos/HoliSoils/Areas de Estudo/Cadiretes/Dados/Sensores/2022-05-11/C-1.csv", header = FALSE, sep = ";")


# Load the June (Junho) data from the Cadiretes sensors

C_1_Junho <- read.csv("D:/Carreira Acadêmica/Doutorado - UB/Projetos/HoliSoils/Areas de Estudo/Cadiretes/Dados/Sensores/2022-06-29/C-1.csv", header = FALSE, sep = ";")


# Joining the data frames from the different Cadiretes dates

C1_cad <- rbind(C_1_Maio, C_1_Junho)

# Here I need a way to join these two files based on the date-time column. Both files have a column, which R names V2, in the following format: 2021.07.02 07:45

# That way, the June observations would begin where the May observations end.

# The problem is that these files are generated automatically by a program, so the June file may contain observations that were measured in May, and, apparently because of errors in the program, the measurements may differ. See the following generic example:

#EX:
#                 A          B        C   D           E       F            G     H     I
#C1_Maio:   14   2021.07.02 11:00     4  23,1875     23,375   23,1875     461    202    0
#C1_Junho:  50   2021.07.02 11:00     4  24,7500     22,375   22,1975     461    202    0

# In short, I need the data to follow the order given by the date and time of observation, and if there are duplicate records for the same date and time, both should appear, without one replacing the other.



# Split the joined Cadiretes data into one data frame per variable, for export to Excel

cad_T1_C1 <- C1_cad[, c("V2","V4"), drop = FALSE]

cad_T2_C1 <- C1_cad[, c("V2","V5"), drop = FALSE]

cad_T3_C1 <- C1_cad[, c("V2","V6"), drop = FALSE]

cad_H_C1 <- C1_cad[, c("V2","V7"), drop = FALSE]



C1_TM_Cadiretes <- list('T1' = cad_T1_C1, 'T2' = cad_T2_C1, 'T3' = cad_T3_C1, 'Moisture' = cad_H_C1)
write.xlsx(C1_TM_Cadiretes,"C:\\Users\\Lenovo\\Desktop\\Holisoils data\\C1_TM_Cadiretes.xlsx")


# What I need is for the generated Excel file to contain the "T1", "T2", "T3" and "Humidity" data on separate sheets of the same file

Does this answer your question? [Merging of multiple excel files in R](https://stackoverflow.com/questions/46305724/merging-of-multiple-excel-files-in-r) — Andrea M, Sep 05 '22 at 14:46
Please clarify your specific problem or provide additional details to highlight exactly what you need. As it's currently written, it's hard to tell exactly what you're asking. — Community, Sep 07 '22 at 13:23

AlpacaKing · Answer 1 · 2022-09-07T05:58:09.267

0

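library(tidyverse)

# Read multiple files at once.
# list.files() returns bare file names here, so read_csv() only finds them
# if the working directory is the folder being listed.
df <- list.files(path = "/the_directory_that_contain_the_files", pattern = "*.csv") %>% 
  map_df(~read_csv(.))

# Remove duplicates (toy example): keep the first row for each value of `a`
df1 <- data.frame(a = c('a', 'b', 'c', 'c', 'd'),
                  b = c(1, 2, 3, 4, 5))
df2 <- df1 %>% 
  distinct(a, .keep_all = TRUE)

df2
#   a b
# 1 a 1
# 2 b 2
# 3 c 3
# 4 d 5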
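
Note that distinct() discards rows, whereas the question asks for repeated timestamps to be kept. A minimal base-R sketch of that alternative follows. The data frames `may` and `june` and the `source` and `datetime` columns are hypothetical stand-ins, the `V2` format is taken from the question, and the UTC time zone is an assumption.

```r
# Toy stand-ins for the May and June files (hypothetical values);
# column names follow what read.csv(header = FALSE) would produce.
may  <- data.frame(V1 = c(13, 14),
                   V2 = c("2021.07.02 10:45", "2021.07.02 11:00"),
                   V4 = c(23.0, 23.1875))
june <- data.frame(V1 = c(50, 51),
                   V2 = c("2021.07.02 11:00", "2021.07.02 11:15"),
                   V4 = c(24.75, 24.5))

# Tag each row with its origin so repeated timestamps stay distinguishable
may$source  <- "Maio"
june$source <- "Junho"

combined <- rbind(may, june)

# Parse the "YYYY.MM.DD HH:MM" text into a date-time (time zone assumed to be UTC)
combined$datetime <- as.POSIXct(combined$V2, format = "%Y.%m.%d %H:%M", tz = "UTC")

# Sort by date-time; the row index breaks ties, so duplicates keep their
# original relative order and no row is dropped
sorted <- combined[order(combined$datetime, seq_len(nrow(combined))), ]
rownames(sorted) <- NULL

sorted
```

Both 11:00 rows survive the sort, and the `source` column shows which file each one came from.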

# ------------ Update ------------
library(tidyverse)

file_list <- list.files(path = "C:/Users/Desktop/abc_folder",  # Folder that contains the files
                        pattern = "\\.csv$",  # Regular expression matching file names that end in .csv
                        full.names = TRUE)  # Return full paths so the files can be read from any working directory

df <- map_df(file_list, ~read_csv(.))  # Read every .csv file and row-bind the results
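
The files in the question are semicolon-separated and use decimal commas, so a base-R variant of the same idea might look like the sketch below. The folder `demo_dir`, the helper `read_one` and the `source` column are illustrative assumptions; the two sample rows are copied from the example in the question.

```r
# Hypothetical demo folder with two small files that mimic the logger layout
demo_dir <- file.path(tempdir(), "sensor_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines("14;2021.07.02 11:00;4;23,1875;23,375;23,1875;461;202;0",
           file.path(demo_dir, "C1_Maio.csv"))
writeLines("50;2021.07.02 11:00;4;24,7500;22,375;22,1975;461;202;0",
           file.path(demo_dir, "C1_Junho.csv"))

# Full paths, so reading does not depend on the working directory
file_list <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Read one file: no header row, ";" as separator, "," as decimal mark
read_one <- function(path) {
  d <- read.csv(path, header = FALSE, sep = ";", dec = ",",
                stringsAsFactors = FALSE)
  d$source <- basename(path)  # remember which file each row came from
  d
}

# Read every file and stack the rows; nothing is de-duplicated
all_data <- do.call(rbind, lapply(file_list, read_one))

str(all_data)
```

With `dec = ","` the temperature columns arrive as numbers instead of text, which is one plausible cause of the "messed up" values described in the question.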

edited Sep 07 '22 at 05:58

answered Sep 05 '22 at 15:17


Hello there, and thanks for sharing your answer. However, when I run df <- list.files(path = "/the_directory_that_contain_the_files", pattern = "*.csv") %>% map_df(~read_csv(.)) it doesn't return a data frame in R. What am I missing? – Eduardo Garcia Sep 06 '22 at 18:41
Perhaps the working directory was not set up properly. The code has been updated, so you may want to try it. You can inspect file_list to check that the file paths were read in properly before running the subsequent map_df() call. – AlpacaKing Sep 07 '22 at 06:01
