I'm new to R and practicing with the Titanic data set from Kaggle. I am trying to split the Name column into separate columns for last name, first name, salutation, and any extra information, so that I can try to categorize passengers by age (adult or child).
Here is sample data from the training data set:
head(traindf,5)
# Source: local data frame [5 x 12]
#
# PassengerId Survived Pclass
# 1 1 0 3
# 2 2 1 1
# 3 3 1 3
# 4 4 1 1
# 5 5 0 3
# Variables not shown: Name (chr), Sex (fctr), Age (dbl), SibSp (int), Parch
# (int), Ticket (fctr), Fare (dbl), Cabin (fctr), Embarked (fctr)
And here is a sample that includes the Name column:
select(traindf,Survived,Pclass,Name,Sex)
# Source: local data frame [891 x 4]
#
# Survived Pclass Name Sex
# 1 0 3 Braund, Mr. Owen Harris male
# 2 1 1 Cumings, Mrs. John Bradley (Florence Briggs Thayer) female
# 3 1 3 Heikkinen, Miss. Laina female
# 4 1 1 Futrelle, Mrs. Jacques Heath (Lily May Peel) female
# 5 0 3 Allen, Mr. William Henry male
# 6 0 3 Moran, Mr. James male
# 7 0 1 McCarthy, Mr. Timothy J male
# 8 0 3 Palsson, Master. Gosta Leonard male
# 9 1 3 Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) female
# 10 1 2 Nasser, Mrs. Nicholas (Adele Achem) female
I can use the following code to separate the last name from the rest of the Name column:
library(dplyr) # for the %>% pipe
library(tidyr) # for the separate() function
traindfnames <- traindf %>%
  separate(Name, c("Lastname", "Salutation"), sep = ",")
traindfnames
# Source: local data frame [891 x 13]
#
# PassengerId Survived Pclass Lastname
# 1 1 0 3 Braund
# 2 2 1 1 Cumings
# 3 3 1 3 Heikkinen
# 4 4 1 1 Futrelle
# 5 5 0 3 Allen
# 6 6 0 3 Moran
# 7 7 0 1 McCarthy
# 8 8 0 3 Palsson
# 9 9 1 3 Johnson
# 10 10 1 2 Nasser
# .. ... ... ... ...
# Variables not shown: Salutation (chr), Sex (fctr), Age (dbl), SibSp (int),
# Parch (int), Ticket (fctr), Fare (dbl), Cabin (fctr), Embarked (fctr)
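The first call works because `separate()` treats `sep` as a regular expression and splits each name at its single comma, so everything after the comma (the salutation plus the given names, with a leading space) lands in `Salutation`. The minimal base-R sketch below reproduces that split on three of the sample names with `strsplit()`, and uses `trimws()` to show the leading space; the variable names are illustrative only.

```r
# Three names copied from the sample output above
names_sample <- c("Braund, Mr. Owen Harris",
                  "Heikkinen, Miss. Laina",
                  "Palsson, Master. Gosta Leonard")

# Split at the comma, as separate(sep = ",") does
pieces   <- strsplit(names_sample, ",")
lastname <- vapply(pieces, `[`, character(1), 1)  # text before the comma
rest     <- vapply(pieces, `[`, character(1), 2)  # text after the comma

print(lastname)
print(rest)          # each value keeps a leading space, e.g. " Mr. Owen Harris"
print(trimws(rest))  # the same values with surrounding whitespace removed
```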
However, when I try to add a third column for the first name:
traindfnames <- traindf %>%
separate(Name, c("Lastname","Salutation","firstname"), sep =",,")
I get this error:
# Error: Values not split into 3 pieces at 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ...
Am I using incorrect syntax, or is it not possible to split one column into three?
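The error is consistent with the separator: `",,"` (two consecutive commas) never occurs in `Name`, so no value can be split into three pieces. A minimal sketch, assuming every name follows the `Lastname, Title. Given names` pattern seen in the sample, passes a regular expression that matches either the comma after the last name or the period after the title. `extra = "merge"` keeps any later `". "` matches (for example after initials) inside the last column, and `fill = "right"` pads with `NA` instead of failing if a name has fewer pieces. The column name `Firstname` and the pattern `", |\\. "` are illustrative choices, not taken from the original code.

```r
library(tidyr)

# Sample names copied from the question's output
traindf <- data.frame(
  Name = c("Braund, Mr. Owen Harris",
           "Cumings, Mrs. John Bradley (Florence Briggs Thayer)",
           "Heikkinen, Miss. Laina",
           "Palsson, Master. Gosta Leonard"),
  stringsAsFactors = FALSE
)

# ",," never matches, so separate() cannot produce three pieces
any(grepl(",,", traindf$Name))  # FALSE

# Split at ", " (after the last name) or ". " (after the title).
# extra = "merge" splits at most length(into) times, leaving the rest intact;
# fill = "right" pads with NA when fewer than three pieces are found.
traindfnames <- separate(traindf, Name,
                         into = c("Lastname", "Salutation", "Firstname"),
                         sep = ", |\\. ", extra = "merge", fill = "right")
print(traindfnames)
```

On the full data set it is worth checking `Firstname` for `NA` values afterwards, since those would mark names that do not follow the assumed pattern.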