0

I have a tibble structured like so:

structure(list(c("All", "Males", "Females"), `All Ages` = c("8.7\n            \n                \n                    6,315", 
"8.6\n            \n                \n                    3,306", 
"8.7\n            \n                \n                    1,136"
), `<18` = c("9.5\n            \n                \n                    19", 
"9.5\n            \n                \n                    14", 
"10.0\n            \n                \n                    3"
), `18-29` = c("8.7\n            \n                \n                    1,434", 
"8.7\n            \n                \n                    1,081", 
"8.7\n            \n                \n                    330"
), `30-44` = c("8.7\n            \n                \n                    1,746", 
"8.7\n            \n                \n                    1,219", 
"8.7\n            \n                \n                    507"
), `45+` = c("8.5\n            \n                \n                    951", 
"8.4\n            \n                \n                    730", 
"8.8\n            \n                \n                    212"
)), row.names = c(NA, -3L), class = c("tbl_df", "tbl", "data.frame"
))

which looks like this:

# A tibble: 3 x 6
  ``     `All Ages`          `<18`         `18-29`         `30-44`         `45+`        
  <chr>  <chr>               <chr>         <chr>           <chr>           <chr>        
1 All    "8.7\n            … "9.5\n      … "8.7\n        … "8.7\n        … "8.5\n      …
2 Males  "8.6\n            … "9.5\n      … "8.7\n        … "8.7\n        … "8.4\n      …
3 Femal… "8.7\n            … "10.0\n     … "8.7\n        … "8.7\n        … "8.8\n      …

I want to remove the line break characters from each cell where it appears. So I tried something like this:

df %>% mutate(`18-29` = str_replace_all(`18-29`, "\n", ""))

to see if I could get it to work for just the one column, but i've not been able to. I've tried many variations (e.g. putting .$ in front of the column names, just a ., trying to copy and paste the gap which appears between the two data points in each cell after running View(df), putting an additional escape \ before the \n, removing the tick marks ` around the column name, replacing them with single ' or double " speech marks, etc.), but none have worked. The most common error I run into is Error in initialize(...) : attempt to use zero-length variable name.

Ultimately, I want to remove the \n characters, then split each of the columns with numeric values in them into two, prefixed by cnt_ and rating_ (so i'd end up with cnt_<18, rating_<18, cnt_18-29, rating_18-29, etc.).

C.Robin
  • 1,085
  • 1
  • 10
  • 23
  • 1
    I.e., by getting rid of the **\n** , you will be having `"8.7 1,434"` in one of the lines. What will be the logic to combine them? Will it be 8.71434 ?Or? – maydin Jul 18 '21 at 13:03
  • I would want to separate on the empty space. I can handle that with `separate` though, just want to remove the `\n` beforehand – C.Robin Jul 18 '21 at 15:25

1 Answers1

3

The error is because you have a column which has empty name, tibbles don't allow that. You can use janitor::clean_names() to make the column names standard.

str_replace_all with empty replacement is same as using str_remove_all.

library(dplyr)
library(stringr)

df %>%
  janitor::clean_names() %>%
  mutate(x18_29 = str_remove_all(x18_29, "\n"))

To apply for all the columns use across.

df %>%
  janitor::clean_names() %>%
  mutate(across(.fns = ~str_remove_all(., "\n")))
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213