2

I have a dataframe like this:

df <- structure(list(A = c(2, 3, 1), B = c(3, 2, 1), C = c(4, 5, 1), 
    D = c(4, 4, 1), Genus = c("Ensifer", "Ensifer", "Ensifer"
    )), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-3L))

      A     B     C     D Genus  
  <dbl> <dbl> <dbl> <dbl> <chr>  
1     2     3     4     4 Ensifer
2     3     2     5     4 Ensifer
3     1     1     1     1 Ensifer

In this dataframe I have five columns, in which four columns have values while fifth column have names and the same name is repeated multiple times but I want that the name Ensifer become one and all the values add up and come in a single row just like this

      A     B     C     D Genus  
  <dbl> <dbl> <dbl> <dbl> <chr>  
1     6     6    10     9 Ensifer

I want to do this in R as data is too long

I have tried this code but it is taking too long

count <- read.csv("count_data.csv", header=T)
shl <- aggregate(count, by=count['Genus'], sum)
TarJae
  • 72,363
  • 6
  • 19
  • 66

3 Answers3

1

use the aggregate() function in combination with subsetting

count <- read.csv("count_data.csv", header=T)
# Subset the data to only include the columns you need
count_subset <- count[,c("Genus", "Sample1", "Sample2", "Sample3")]
# Use the aggregate() function to group the data by "Genus" and sum the values
count_agg <- aggregate(. ~ Genus, data=count_subset, sum)
colnames(count_agg) <- c("Genus", "Sample1", "Sample2", "Sample3")


count_agg

Change the sample values according to your image

allamiro
  • 58
  • 1
  • 10
1

We could do it either:

dplyr: Here we apply the sum() function across column A:D.

library(dplyr)
df %>% 
  summarise(across(A:D, sum), .by=Genus)

 Genus       A     B     C     D
  <chr>   <dbl> <dbl> <dbl> <dbl>
1 Ensifer     6     6    10     9

base R1:

result <- aggregate(df[, 1:4], by = list(df$Genus), FUN = sum)
names(result)[1] <- "Genus"

Genus A B  C D
1 Ensifer 6 6 10 9

base R2:

t(sapply(split(df[, 1:4], df$Genus), colSums))

        A B  C D
Ensifer 6 6 10 9
TarJae
  • 72,363
  • 6
  • 19
  • 66
1

If high performance is a requirement try data.table.

setDT adds indices and creates a data.table object in place, .SD accesses the variables/columns and by is the grouping.

library(data.table)

setDT(count)

count[, lapply(.SD, sum), by = Genus]
     Genus A B  C D
1: Ensifer 6 6 10 9
Andre Wildberg
  • 12,344
  • 3
  • 12
  • 29