0

So I have a large data set (50,000 rows and 500 columns). I merged the rows I wanted to by this code:

Similarities <- Home %>%
  group_by_at(c(1,2,5,9,70,26)) %>%
  summarize_all(.funs = function(x) paste(unique(x), collapse = ','))

In this code, for other the other rows that are combined and have different values, their output turns into a list separated with commas. However, now I want to sum all the values in one specific column, in which I tried this code:

Similarities <- Home %>%
  group_by_at(c(1,2,5,9,70,26)) %>%
  summarize_at(.vars = FTR, .funs = function(x) paste(sum(x))),
  summarize_all(.funs = function(x) paste(unique(x), collapse = ','))

I assumed it wouldn't work because I wasn't sure what I was doing. My goal is to have the specific column: "FTR", when I merge rows together, all the values in "FTR" be added together.

An example of the data would be:

Total  Type  Clm   FTR     Loss

300    water  2      -103  N

200    fire   3      203   Y
 
300    water  2      100   Y

What my code does now is:

Total    Type   CLM  FTR        Loss
300      water  2    -103, 100  Y, N
200      fire   3    203        Y

But what I want is:

Total   Type   CLM    FTR   Loss
300     water   2    -3     Y, N
200     fire    3    203    Y
user4157124
  • 2,809
  • 13
  • 27
  • 42
Anna
  • 3
  • 2

1 Answers1

0

The following code sums the collapsed columns, like the question asks for.

special_sum <- function(x, sep = ", ", na.rm = TRUE){
  f <- function(y, na.rm){
    y <- as.numeric(y)
    sum(y, na.rm = na.rm)
  }
  x <- as.character(x)
  x <- strsplit(x, sep)
  sapply(x, f, na.rm = na.rm)
}

WIth the second data.frame posted in the question, the function special_sum could be called as follows. The group columns are for tests purposes only.

Home <- read.table(text = "
Total    Type   CLM  FTR        Loss
300      water  2    '-103, 100'  'Y, N'
200      fire   3    203        Y
", header = TRUE)


Home %>%
  group_by(1, 2) %>% 
  summarize_at(vars('FTR'), special_sum)
## A tibble: 2 x 3
## Groups:   1, 2 [1]
#    `1`   `2`   FTR
#  <dbl> <dbl> <dbl>
#1     1     2    -3
#2     1     2   203

Note that you should probably sum first then paste the values.

Rui Barradas
  • 70,273
  • 8
  • 34
  • 66