
I have a data frame, and for each row I want to calculate the sum of the variables listed in a vector and store that sum in a new variable. The name of the new variable should be derived from the names of the variables in the vector.

For example, given this data:

Name      A_12    B_12    C_12   D_12    E_12
r1        1         5      12      21     15
r2        2         4       7      10      9
r3        5        15      16       9      6
r4        7         8       0       7     18

Let's say I have two vectors:

vector_1 <- c("A_12","B_12","C_12")
vector_2 <- c("B_12","C_12","D_12","E_12")

The result I want is:

New_data >

 Name        A_12     B_12   C_12   ABC_12     D_12    E_12   BCDE_12
    r1        1         5     12      18         21     15      54
    r2        2         4      7      13         10      9      32
    r3        5        15     16      36          9      6      45
    r4        7         8      0      15          7     18      40

I wrote a for loop to get the row sums over each vector, but I didn't get the correct result. Please tell me if you need any more information or clarification. Thank you.

r2evans
Reda

3 Answers


You can use rowSums and simple column-subsetting:

dat$ABC_12 <- rowSums(dat[,vector_1])
dat$BCDE_12 <- rowSums(dat[,vector_2])
dat
#   Name A_12 B_12 C_12 D_12 E_12 ABC_12 BCDE_12
# 1   r1    1    5   12   21   15     18      53
# 2   r2    2    4    7   10    9     13      30
# 3   r3    5   15   16    9    6     36      46
# 4   r4    7    8    0    7   18     15      33

Note that if your frames inherit from data.table, then you'll need to use either subset(dat, select=vector_1) or dat[,..vector_1] instead of simply dat[,vector_1]; if you aren't already using data.table, then you can safely ignore this paragraph.
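To check the approach end to end, here is a minimal, self-contained sketch in base R. It rebuilds the sample data from the question by hand; the name `dat` and the `data.frame()` call are assumptions, and only the values come from the question.

```r
# Sample data typed in from the question; `dat` is an assumed name.
dat <- data.frame(
  Name = c("r1", "r2", "r3", "r4"),
  A_12 = c(1, 2, 5, 7),
  B_12 = c(5, 4, 15, 8),
  C_12 = c(12, 7, 16, 0),
  D_12 = c(21, 10, 9, 7),
  E_12 = c(15, 9, 6, 18)
)

vector_1 <- c("A_12", "B_12", "C_12")
vector_2 <- c("B_12", "C_12", "D_12", "E_12")

# drop = FALSE keeps a data frame even if a vector names a single column,
# which rowSums() needs (it fails on a plain vector).
dat$ABC_12  <- rowSums(dat[, vector_1, drop = FALSE])
dat$BCDE_12 <- rowSums(dat[, vector_2, drop = FALSE])

dat$ABC_12   # 18 13 36 15
dat$BCDE_12  # 53 30 46 33
```

The `drop = FALSE` argument is not needed for the two vectors shown, but it guards against a vector that happens to contain only one column name.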

r2evans
  • Thank you for your answer, it works. The only problem is that I don't know the names of the vectors, because they are generated in a for loop, so I don't know how to get their names to build the name of the sum variable. For example, I should take the first 3 letters of each variable to form the name of the new variable: with V1 = M11T888, V2 = M22T888, V3 = M33T888, the new variable should be named M112233T888. That's why I need the first 3 letters of each element of the vector. – Reda Oct 28 '21 at 14:21
  • And you think it is okay to *assume* that they all have some common suffix? – r2evans Oct 28 '21 at 15:11
  • Actually, they don't all have the same suffix, because some of them have the exact same name with an extra _2, for example M11T888 and M11T888_2. I just checked my data and found that there are some variables like this. – Reda Oct 28 '21 at 15:14
  • You're asking for a method to combine unknown column names using a demonstrated-only heuristic (i.e., shared suffix) that is known to be fallible, is that right? – r2evans Oct 28 '21 at 15:34
  • Yes, and I also don't know the number of vectors, because they are generated in a for loop. – Reda Oct 30 '21 at 11:27
  • You've demonstrated creating new names given the sample column names in your question, okay; after that, you state that the real column names look different and defy your presumed heuristic for combining them. After your OP where you ask about sums, you are implicitly expecting us to be able to fix your naming convention when your real data is completely different and yet you do not provide clear examples of these different names. Can you see why this question is stagnant? **Update your question** to better define your naming conundrum. (Lacking that, let's close it as "needs debugging info".) – r2evans Oct 30 '21 at 11:43
  • Hello, yes, it's better to close it. I will post a new question with more information, including my attempts that didn't work. Thank you. – Reda Oct 30 '21 at 11:45
  • To confirm, are you saying that the question you asked (*"sum of rows"*) is not really your question at all? Or that you asked it, have an answer, and don't need it anymore? The typical method on SO is that you accept one of the acceptable answers and then ask another. My point about closing due to naming-convention issues is that the question does not ask for what you need, nor does it give us sufficient info for what you need. If you *needed* the row-of-sums thing and now have a new question, that's different; accept and ask another, please. – r2evans Oct 30 '21 at 12:11
  • No, it was because I didn't formulate my question well. For example, I gave sample vectors, while the real vectors are generated in a for loop. As you can see, the answers use a list of vectors, but I don't know in advance which vectors will be generated. The same goes for the variable names, because those are not the names I want to give to my new variables. Finally, I fixed some bugs in my program, so I thought it would be better to ask a new question based on the new information, and I also tried to give more explanation. Please tell me what the right thing to do is in cases like these. – Reda Oct 30 '21 at 12:20

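Like this (using dplyr/tidyverse):

library(dplyr)

df %>% 
  rowwise() %>%
  mutate(
    # all_of() makes it explicit that vector_1/vector_2 hold column names
    ABC_12 = sum(c_across(all_of(vector_1))),
    BCDE_12 = sum(c_across(all_of(vector_2)))
  ) %>%
  ungroup()  # drop the rowwise grouping afterwards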

Though I'm not sure the sums in your example are correct: the expected BCDE_12 values don't match the row sums (for r1, 5 + 12 + 21 + 15 = 53, not 54).

Edit: Here's a function to help with the naming.

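# Build a combined name: the first n_len characters of every element,
# followed by the remainder (suffix) of the first element.
ex_fun <- function(vec, n_len){
  paste0(paste(substr(vec,1,n_len), collapse = ""), substr(vec[1],n_len+1,nchar(vec[1])))
}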

Which can then be implemented like so.

df %>% 
  rowwise() %>%
  mutate(
    !!ex_fun(vector_1, 1) := sum(c_across(all_of(vector_1))),
    !!ex_fun(vector_2, 1) := sum(c_across(all_of(vector_2)))
  ) %>%
  ungroup()

Extra note:

If you put your vectors in a list, you can combine this with r2evans's answer and run it in a loop if you prefer.

vectors = list(vector_1, vector_2)

for (v in vectors){
  df[ex_fun(v, 1)] <- rowSums(df[,v])
}
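As a rough sketch of how the same loop behaves with names like those in the comments, the example below uses invented values for hypothetical columns `M11T888`, `M22T888`, and `M33T888`, and restates `ex_fun` so the block runs on its own. Note that taking the first three characters of each name yields `M11M22M33T888`, not the `M112233T888` asked for in the comment, so the helper would need adjusting for that exact scheme.

```r
# Same helper as above, restated so this block is self-contained.
ex_fun <- function(vec, n_len) {
  paste0(
    paste(substr(vec, 1, n_len), collapse = ""),
    substr(vec[1], n_len + 1, nchar(vec[1]))
  )
}

# Hypothetical data: column names follow the comment, values are invented.
df <- data.frame(
  M11T888 = c(1, 2),
  M22T888 = c(10, 20),
  M33T888 = c(100, 200)
)

# Any number of vectors can go in the list, e.g. appended inside a loop.
vectors <- list(c("M11T888", "M22T888", "M33T888"))

for (v in vectors) {
  df[[ex_fun(v, 3)]] <- rowSums(df[, v, drop = FALSE])
}

names(df)
# "M11T888" "M22T888" "M33T888" "M11M22M33T888"
```

Because the loop only iterates over the list, it does not matter how many vectors there are or what they are called, as long as each one is collected into the list as it is generated.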
Quixotic22
  • Hello, thank you for your answer, it works. The only problem is that I don't know the names of the vectors, because they are generated in a for loop, so I don't know how to get their names to build the name of the sum variable. For example, I should take the first 3 letters of each variable to form the name of the new variable: with V1 = M11T888, V2 = M22T888, V3 = M33T888, the new variable should be named M112233T888. That's why I need the first 3 letters of each element of the vector. – Reda Oct 28 '21 at 14:17
  • Edit added, hope it helps – Quixotic22 Oct 28 '21 at 14:42

I believe this might work, so long as only the leading characters of the column names differ:

library("tidyverse")

#Input dataframe.
data <- data.frame(Name =c("r1", "r2", "r3", "r4"), A_12 = c(1, 2, 5, 7), B_12 = c(5, 4, 15, 8),
           C_12 = c(12, 7, 16, 0), D_12 = c(21, 10, 9, 7), E_12 = c(15, 9, 6, 18))

#add all vectors to the "vectors" list. I have added vector_1 and vector_2, but
#there can be as many vectors as needed, they just need to be put in the list.
vector_1 <- c("A_12","B_12","C_12")
vector_2 <- c("B_12","C_12","D_12","E_12")

vector_list<-list(vector_1, vector_2)

vector_sum <- function(data, vector_list){
  output <- data |>
    dplyr::select(1, all_of(vector_list[[1]]))
  
  for (i in vector_list) {
    name1 <- substring(as.character(i), 1,1) |> paste(collapse = '')
    name2 <- substring(as.character(i[1]), 2)
    
    input_temp <- dplyr::select(data, all_of(i))
    input_temp <- mutate(input_temp, temp=rowSums(input_temp))
    # paste0() joins without a separator; paste() would give "ABC _12"
    names(input_temp)[names(input_temp) == "temp"] <- paste0(name1, name2)
    
    output = cbind(output, input_temp)
  }
  
  output[, !duplicated(colnames(output))]
}


vector_sum(data, vector_list)
Arshiya