Is there a reason rowSums(df[grep(...)]) wouldn't work accurately?

Question

I used

df$Total.P.n <- rowSums(df[grep('p.n', names(df), ignore.case = FALSE)])

to sum the count values from every column whose name contains p.n, but the values it produced are way off. The columns are counts of certain combinations of language types in a language corpus. I want a total of every time p.n was used within other combinations, but am struggling. It seems like it may be counting other occurrences, like e.sp.NR in my variable names, but shouldn't ignore.case = FALSE take care of that? I've also tried tidyverse and dplyr solutions, to no avail.

Here's an example of the df structure:

ID.	do.p.n.NP	do.p.n.SE	p.d.e.sp.SR
1510	4	6	2
1515	2	0	1

and what I need:

ID.	do.p.n.NP	do.p.n.SE	p.d.e.sp.SR	Total.P.n
1510	4	6	2	10
1515	2	0	1	2
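A minimal reproducible version of the sample above could be built like this (a sketch: only the columns shown, with values copied from the first table; the real data has more columns). Printing the matched names with `value = TRUE` shows exactly which columns grep would hand to rowSums.

```r
# Sample data copied from the table above
df <- data.frame(
  ID.         = c(1510, 1515),
  do.p.n.NP   = c(4, 2),
  do.p.n.SE   = c(6, 0),
  p.d.e.sp.SR = c(2, 1)
)

# Inspect which column names the pattern actually matches before summing
grep('p.n', names(df), ignore.case = FALSE, value = TRUE)
#> [1] "do.p.n.NP" "do.p.n.SE"

df$Total.P.n <- rowSums(df[grep('p.n', names(df), ignore.case = FALSE)])
df$Total.P.n
#> [1] 10  2
```

On this sample the original line already gives 10 and 2, so if the full data produces wrong totals, the printed list of matched names is the first thing to check.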

Please provide a proper reproducible example: paste the output of `dput(df)` into your question. – GuedesBF, Jan 28 '22 at 19:30
Is the last column supposed to be c(0,1) as in the first table, or c(2,1) as in the second one? – GuedesBF, Jan 28 '22 at 20:11
@GuedesBF, yes, sorry, I am working on this problem from COVID quarantine. I appreciate your patience with my mistakes. This is as close to a full reprex as I can get; it's restricted-use data. – Sarah P., Jan 30 '22 at 00:48

TarJae · Answer 1 · 2022-01-28T19:37:49.630


Update after the OP's update (new column names): the code is:

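# Sum every column whose name matches the regex 'p.n'
df$Total.P.n <- rowSums(df[grep('p.n', names(df), ignore.case = FALSE)])
# Replace p.d.e.sp.SR with the number of non-zero cells in columns 2 and 3
df$p.d.e.sp.SR <- rowSums(df[, 2:3] != 0)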

    ID. do.p.n.NP do.p.n.SE. p.d.e.sp.SR Total.P.n
1 1510         4          6           2        10
2 1515         2          0           1         2

First answer: the pattern you are searching for, p.n, does not exist in the column names of df. Therefore I think you mean pn; then your code works as expected:

df$Total.P.n <- rowSums(df[grep('pn', names(df), ignore.case = FALSE)])

   ID. do.pn.NP do.pn.SE. p.d.e.sp.SR Total.P.n
1 1510        4         6           0        10
2 1515        2         0           1         2
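Why `'p.n'` found nothing in that earlier version of the data can be checked directly: in a regular expression `.` must consume exactly one character, so `'p.n'` needs some character between `p` and `n`, and a name like `do.pn.NP` has none. A small check on the column names shown above:

```r
# Column names from the earlier version of the question
old_names <- c("ID.", "do.pn.NP", "do.pn.SE.", "p.d.e.sp.SR")

grepl('p.n', old_names)  # '.' requires one character between p and n
#> [1] FALSE FALSE FALSE FALSE
grepl('pn', old_names)   # literal "pn"
#> [1] FALSE  TRUE  TRUE FALSE
```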

edited Jan 28 '22 at 19:37

answered Jan 27 '22 at 20:22

TarJae

Sorry, I don't know how I missed this, but p.n. DOES exist in df. I've updated the reprex to reflect that. – Sarah P. Jan 28 '22 at 19:22
Please see my update. – TarJae Jan 28 '22 at 19:38
This regex is potentially fragile, as I suspect the OP wants to match literal dots. As is, the regex will also match "pxn", "pyn" or whatever. It works on the given example, but is likely not scalable. – GuedesBF Jan 28 '22 at 19:45
OK, I agree. Do you think `'p\\.n'` is better? I think the main issue was that column `p.d.e.sp.SR` needs its own line of code to count the cells that are not 0, thus contributing to the Total row sum. But it is a guess. What do you think? – TarJae Jan 28 '22 at 19:49
I do not know. I just realized the OP updated the data, but their tables are inconsistent with one another: they show different values for the `p.d.e.sp.SR` column. I asked for clarification. – GuedesBF Jan 28 '22 at 20:12
Anyway, neither "p\\.n" nor "p.n" matches "p.d.e.sp.SR", so both work. – GuedesBF Jan 28 '22 at 20:14
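To make the point about the unescaped dot concrete, here is a check on a few column names. `do.pxn.NP` is hypothetical, made up for illustration; `e.sp.NR` echoes the name mentioned in the question, and `Total.P.n` is the new column itself. It compares the unescaped pattern, the escaped `'p\\.n'`, and `fixed = TRUE` (which treats the pattern as literal text), and shows what `ignore.case = TRUE` would change.

```r
nms <- c("do.p.n.NP", "do.pxn.NP", "p.d.e.sp.SR", "e.sp.NR", "Total.P.n")

# Unescaped '.' matches any single character, so "pxn" matches too
grepl('p.n', nms)
#> [1]  TRUE  TRUE FALSE FALSE FALSE

# Escaped dot: only a literal "p.n"
grepl('p\\.n', nms)
#> [1]  TRUE FALSE FALSE FALSE FALSE

# fixed = TRUE skips regex interpretation entirely
grepl('p.n', nms, fixed = TRUE)
#> [1]  TRUE FALSE FALSE FALSE FALSE

# ignore.case = TRUE would also match "p.N" in e.sp.NR and "P.n" in Total.P.n
grepl('p.n', nms, ignore.case = TRUE)
#> [1]  TRUE  TRUE FALSE  TRUE  TRUE
```

`ignore.case = FALSE` is already the default for `grep()`, so it only rules out the upper-case matches; it does not stop `.` from matching other characters. The last line also shows that with `ignore.case = TRUE`, re-running the assignment would pull the previous `Total.P.n` into the new total.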

GuedesBF · Answer 2 · 2022-01-28T19:39:39.923

If we can use dplyr, I would suggest a tidy-select helper such as matches(). Also, please note that your regex is likely wrong: to match literal dots (.), we need to escape the metacharacter with a double backslash. The appropriate regex would be p\\.n.

library(dplyr)

Data:

df <- tibble(`ID.` = c(1510, 1515), `do.p.n.NP` = c(4,2), `do.p.n.SE.` = c(6,0), `p.d.e.sp.SR` = c(0,1))

Answer:

df %>%
    mutate(Total.P.n = rowSums(across(matches('p\\.n'))))

# A tibble: 2 × 5
    ID. do.p.n.NP do.p.n.SE. p.d.e.sp.SR Total.P.n
  <dbl>     <dbl>      <dbl>       <dbl>     <dbl>
1  1510         4          6           0        10
2  1515         2          0           1         2
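If dplyr is not available, the same selection can be sketched in base R. Here `fixed = TRUE` makes `grepl()` look for the literal text `p.n`, which for this purpose is equivalent to the escaped regex `'p\\.n'`; the data are the same as in the answer above.

```r
df <- data.frame(ID. = c(1510, 1515), do.p.n.NP = c(4, 2),
                 do.p.n.SE. = c(6, 0), p.d.e.sp.SR = c(0, 1))

# Logical vector: TRUE for column names containing the literal text "p.n"
is_pn <- grepl("p.n", names(df), fixed = TRUE)

# Single-bracket subsetting keeps a data frame, even if only one column matches
df$Total.P.n <- rowSums(df[is_pn])
df$Total.P.n
#> [1] 10  2
```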

edited Jan 28 '22 at 19:39

answered Jan 27 '22 at 20:48

GuedesBF

This is the error I get when I use the above code: Error in (function (cond) : error in evaluating the argument 'x' in selecting a method for function 'rowSums': `across()` must only be used inside dplyr verbs. – Sarah P. Jan 28 '22 at 19:24
It is working just fine here. Please check that you are using exactly the same syntax. – GuedesBF Jan 28 '22 at 19:41
Please see the updated answer – GuedesBF Jan 28 '22 at 19:43
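One possible cause of that error, offered only as an assumption since the thread does not confirm it, is that `mutate()` resolves to a function from another attached package (plyr, for example, also exports a `mutate()`), so `across()` ends up being evaluated outside a dplyr verb. A sketch that checks for masking and then calls dplyr's `mutate()` with an explicit namespace:

```r
library(dplyr)

# Lists every attached package that defines a function called mutate
find("mutate")

df <- tibble(`ID.` = c(1510, 1515), `do.p.n.NP` = c(4, 2),
             `do.p.n.SE.` = c(6, 0), `p.d.e.sp.SR` = c(0, 1))

# Explicit namespace: uses dplyr's mutate() even if another package masks it
df <- df %>%
  dplyr::mutate(Total.P.n = rowSums(across(matches('p\\.n'))))

df$Total.P.n
#> [1] 10  2
```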
