I have a data frame with 2 columns:

<string> <count>

for example:

qwerty 24
1qwerty 21
123456 20
qwerty123 12
abc123 10
xyz223 1
test223 2
test@123 11
xyz@123 10

I want to make a data frame with the structure

<suffix> <count>

suffix will contain the trailing digits of the string, or a symbol followed by the trailing digits. The suffix will be NA for any string that has no such suffix or that consists only of digits (in this example "qwerty", "123456", and "1qwerty" would be NA)

count will be the sum of the counts of all rows in the first data frame that have that suffix

i.e., the required output for the example would be:

NA 65
123 22
@123 21
223 3
Nikhil KR
  • I'm having a hard time correlating your expected output with the sample input data. Can you explain it better? – Tim Biegeleisen Oct 20 '19 at 06:49
  • The first 3 rows of my sample input return NA for the suffix; their total count is 24+21+20 = 65. The next 2 rows return "123"; their total count is 12+10 = 22. The next 2 rows return "223"; their total count is 1+2 = 3. The last 2 rows return "@123"; their total count is 11+10 = 21. Sorry for the horrible explanation, I'm new to this. – Nikhil KR Oct 20 '19 at 06:57

1 Answer

You can try:

tapply(df$count, gsub("^\\d.*|[A-Za-z]", "", df$string), sum)

     @123  123  223 
  65   21   22    3 
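
If you need the result as a data frame with NA instead of an empty name, here is one possible sketch. It assumes the columns are called string and count (as in the call above) and rebuilds the sample data from the question so it runs on its own; the names suffix, totals and result are just illustrative.

```r
# Sample data from the question; column names are assumed to be
# `string` and `count`, matching df$string / df$count above.
df <- data.frame(
  string = c("qwerty", "1qwerty", "123456", "qwerty123", "abc123",
             "xyz223", "test223", "test@123", "xyz@123"),
  count  = c(24, 21, 20, 12, 10, 1, 2, 11, 10),
  stringsAsFactors = FALSE
)

# Same pattern as above: drop the whole string if it starts with a digit,
# and drop every letter from the remaining strings.
suffix <- gsub("^\\d.*|[A-Za-z]", "", df$string)

# Strings with nothing left have no suffix; mark them as NA.
suffix[suffix == ""] <- NA

# Sum the counts per suffix; addNA() keeps NA as its own group
# instead of letting tapply() drop those rows.
totals <- tapply(df$count, addNA(suffix), sum)

# Turn the named vector into a two-column data frame.
result <- data.frame(suffix = names(totals),
                     count  = as.vector(totals),
                     stringsAsFactors = FALSE)
```

The row order of result follows the factor level order, which can depend on your locale, so sort it explicitly if the order matters.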
Ritchie Sacramento