Extract first word from a column and insert into new column

Question

I have the dataframe below and want to extract the first word of each value and insert it into a new column.

Dataframe1:

COL1
Nick K Jones
Dave G Barros
Matt H Smith

Convert it to this:

Dataframe2:
COL1              COL2
Nick K Jones      Nick
Dave G Barros     Dave
Matt H Smith      Matt

score 49 · Answer 1 · edited Jun 24 '19 at 10:15

We can use the function stringr::word:

library(stringr)

# Take the first space-separated word of each value in COL1
Dataframe1$COL2 <- word(Dataframe1$COL1, 1)
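
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the question's data first. It assumes the stringr package is installed and that COL1 is a character column; the sample data frame is only a stand-in for the real one.

```r
library(stringr)

# Stand-in for the question's data; stringsAsFactors = FALSE keeps COL1 as character
Dataframe1 <- data.frame(
  COL1 = c("Nick K Jones", "Dave G Barros", "Matt H Smith"),
  stringsAsFactors = FALSE
)

# word() splits on a single space by default and returns the word at
# position `start`, here the first one
Dataframe1$COL2 <- word(Dataframe1$COL1, start = 1)

print(Dataframe1)
```

A negative `start` counts from the end, so `word(Dataframe1$COL1, -1)` should give the last word instead.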

edited Jun 24 '19 at 10:15 by zx8754 · answered Aug 11 '17 at 18:33 by Colibri

This works well but is very slow for larger data. I'm working with half a million rows and `str_extract(Dataframe1$COL1, '[A-Za-z]+')` (also from the `stringr` package) is at least ten times faster. – nJGL Nov 14 '19 at 14:19
clearly the best answer for a problem that is meant to be simple – Garini Feb 23 '21 at 22:47

score 32 · Accepted Answer · answered Aug 10 '15 at 17:43

You can use a regex ("([A-Za-z]+)", "([[:alpha:]]+)" or "(\\w+)") to grab the first word:

Dataframe1$COL2 <- gsub("([A-Za-z]+).*", "\\1", Dataframe1$COL1)
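
A minimal base R sketch of the same idea, using `sub()` and the `\\w+` variant so that first words containing digits are kept whole. The sample values "P1 Media" and "495 54" are illustrative stand-ins, not the asker's real data.

```r
# Stand-in data with letter-only, mixed and numeric first words
Dataframe1 <- data.frame(
  COL1 = c("Nick K Jones", "P1 Media", "495 54"),
  stringsAsFactors = FALSE
)

# sub() replaces only the first match, which is all that is needed here.
# ^ anchors at the start, (\\w+) captures letters, digits and underscores,
# and .* consumes the rest of the string so only the capture is kept.
Dataframe1$COL2 <- sub("^(\\w+).*", "\\1", Dataframe1$COL1)

# The letters-only pattern stops at the first digit, and leaves a value
# with no letters at all unchanged
letters_only <- sub("([A-Za-z]+).*", "\\1", Dataframe1$COL1)

print(Dataframe1)
```

The letters-only class fits purely alphabetic names; `\\w+` is the safer choice when the first token can contain digits.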

answered Aug 10 '15 at 17:43 by Rorschach

Why use `gsub` when you only need to replace the first occurrence? Use `sub`. – Saksham Aug 11 '15 at 18:12
@Saksham you're right `sub` would be better here, thanks – Rorschach Aug 12 '15 at 05:53
What if the first word is a number: 495 or Q1? When I try this formula it just keeps "Q" and not Q1, and for 495, it takes all the numbers after it: "495 3Be" @nongkrong – Nick Aug 12 '15 at 21:52
@Nick try the option `"(\\w+)"`, or you can add options for matching numbers inside the brackets, i.e. `[0-9A-Za-z]+` or `[[:digit:]]` – Rorschach Aug 12 '15 at 21:55
That didn't work unfortunately. I basically just want to grab the first word (whether that be characters or numbers before a space). So if I have P1 Media, in the past it would print out P. For 495 54, it would print out everything instead of just 495. @nongkrong – Nick Aug 12 '15 at 22:10
@Nick Specifically, did you try `sub("(\\w+).*", "\\1", Dataframe1$COL1)` – Rorschach Aug 12 '15 at 22:14
That worked thanks! and last question: is there a way to make all of them lowercase? @nongkrong – Nick Aug 13 '15 at 12:22
@Nick: you can set the argument `ignore.case = TRUE` to not worry about case sensitivity anymore. Or use `tolower()` – andschar Apr 11 '17 at 11:17
The above worked. I'm looking for a solution that, in addition to the one above, creates a new column with the remaining variables after the split. e.g. in the above example, we have a new column, COL3 that has values as `K Jones`, `G Barros` – andy Aug 16 '22 at 11:25
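
One possible sketch for the two follow-ups in this thread (lowercasing the first word, and keeping the remainder in a COL3), using base R only. It assumes words are separated by whitespace; the column name COL3 comes from the comment above, and the sample data is a stand-in.

```r
# Stand-in for the question's data
Dataframe1 <- data.frame(
  COL1 = c("Nick K Jones", "Dave G Barros", "Matt H Smith"),
  stringsAsFactors = FALSE
)

# First word, lowercased with tolower()
Dataframe1$COL2 <- tolower(sub("^(\\w+).*", "\\1", Dataframe1$COL1))

# Remainder: drop the first run of non-space characters plus the whitespace
# after it. A value with no whitespace is returned unchanged by sub(), so
# single-word values need separate handling if they can occur.
Dataframe1$COL3 <- sub("^\\S+\\s+", "", Dataframe1$COL1)

print(Dataframe1)
```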

score 14 · Answer 3 · answered Aug 10 '15 at 17:52

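The function strsplit can be useful. It returns a list with one vector of words per row, so take the first element of each:

Dataframe1$COL2 <- sapply(strsplit(as.character(Dataframe1$COL1), " "), `[`, 1)

You can change the final argument (the 1) to select other parts of the string too.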
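
A self-contained sketch of the `strsplit` approach applied to every row, assuming COL1 is a character column with single-space separators. `vapply()` is used here as one option because it checks that each result is a single string; the sample data is a stand-in.

```r
# Stand-in for the question's data
Dataframe1 <- data.frame(
  COL1 = c("Nick K Jones", "Dave G Barros", "Matt H Smith"),
  stringsAsFactors = FALSE
)

# strsplit() returns a list with one character vector per row
parts <- strsplit(Dataframe1$COL1, " ", fixed = TRUE)

# Take the first element of each vector
Dataframe1$COL2 <- vapply(parts, function(p) p[1], character(1))

# Indexing the list with [[1]][1] looks only at the first row, so assigning
# that single value to a column would recycle it down every row
first_row_only <- parts[[1]][1]

print(Dataframe1)
```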

answered Aug 10 '15 at 17:52 by mattbawn
