The following data frame is a subset of a larger one and contains duplicated information:
df<-data.frame(Caught=c(92,134,92,134),
Discarded=c(49,47,49,47),
Units=c(170,170,220,220),
Hours=c(72,72,72,72),
Colour=c("red","red","red","red"))
Using base R, I would like to get the following:
df_result<-data.frame(Caught=226,
Discarded=96,
Units=390,
Hours=72,
Colour="red")
So the result is the sum of the unique values for the columns Caught, Discarded and Units, while Hours and Colour keep their single unique value (Caught = 92 + 134, Discarded = 49 + 47, Units = 170 + 220, Hours = 72, Colour = "red").
However, I intend to do this on a much bigger data frame with several columns. My idea was to apply a function that depends on the column name, as follows:
l <- lapply(df, function(x) {
  if (names(x) %in% c("Caught", "Discarded", "Units"))
    sum(unique(x))
  else
    unique(x)
})
as.data.frame(l)
However, this does not work, because I am not sure how to access the column (vector) names inside lapply() and similar functions.
I have also tried, without success, to use the by() and apply() functions.
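For reference, here is a rough sketch of the kind of thing I am after. It loops over the column names instead of the columns themselves, so the name is available inside the function. The name `sum_cols` is just a placeholder I made up, and the sketch assumes that every column not listed in it holds a single unique value; I am not sure this is the idiomatic way to do it.

```r
df <- data.frame(Caught    = c(92, 134, 92, 134),
                 Discarded = c(49, 47, 49, 47),
                 Units     = c(170, 170, 220, 220),
                 Hours     = c(72, 72, 72, 72),
                 Colour    = c("red", "red", "red", "red"))

# Hypothetical helper vector: the columns whose unique values should be summed
sum_cols <- c("Caught", "Discarded", "Units")

# Iterate over the column names (a named character vector, so the
# resulting list keeps the names) and look each column up by name
l <- lapply(setNames(names(df), names(df)), function(nm) {
  x <- df[[nm]]
  if (nm %in% sum_cols) sum(unique(x)) else unique(x)
})

result <- as.data.frame(l)
result
```

With the sample data this gives a single row, but I do not know how well it generalises to the full data frame.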
Thanks