I have an external data file like the one below. The columns are separated only by spaces, and the player names themselves contain spaces:
PLAYER TEAM STUFF1 STUFF2
Jim Smith NYY 100 200
Jerry Johnson Jr. PHI 100 200
Andrew C. James STL 200 200
A. J. Williams CWS 100 200
Felix Rodriguez BAL 100 100
How can I read this file? I am thinking of using readLines and then splitting each line before any sequence of three consecutive capital letters, but I do not know how to do that.
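A minimal base R sketch of that idea, assuming every data line ends with a three-capital-letter team code followed by exactly two numeric columns. The sample lines are copied from the file above; the names `pattern`, `parts`, and `players` are illustrative only, and in practice `lines` would come from `readLines()`.

```r
# Sample lines copied from the file above; normally: lines <- readLines("your_file.txt")
lines <- c("PLAYER TEAM STUFF1 STUFF2",
           "Jim Smith NYY 100 200",
           "Jerry Johnson Jr. PHI 100 200",
           "Andrew C. James STL 200 200",
           "A. J. Williams CWS 100 200",
           "Felix Rodriguez BAL 100 100")

# Drop the header line; column names are assigned by hand below.
body <- lines[-1]

# Capture groups: (name) (three capital letters) (number) (number).
# Assumption: the team code is always exactly three upper-case letters.
pattern <- paste0("^(.*[^[:space:]])[[:space:]]+([[:upper:]]{3})",
                  "[[:space:]]+([0-9]+)[[:space:]]+([0-9]+)[[:space:]]*$")

# regexec() finds the groups; regmatches() extracts them as a list of
# character vectors (full match first, then the four groups).
parts <- do.call(rbind, regmatches(body, regexec(pattern, body)))

players <- data.frame(PLAYER = parts[, 2],
                      TEAM   = parts[, 3],
                      STUFF1 = as.numeric(parts[, 4]),
                      STUFF2 = as.numeric(parts[, 5]),
                      stringsAsFactors = FALSE)
players
```

Capturing the fields with `regexec` rather than splitting with `strsplit` is a design choice here: it avoids relying on zero-width lookaround splits, and lines that do not fit the pattern are dropped rather than mis-split, so it may be worth comparing `nrow(players)` with `length(body)`.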
What if only the first letter of the team name were capitalized?
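If capitalization cannot identify the team, one possible fallback is to rely on position instead. The sketch below assumes the team is still a single token with no spaces and that it sits immediately before two numeric columns. The input lines are hypothetical (the team codes are re-cased versions of the ones above), and the names `x`, `pattern`, `fields`, and `result` are illustrative only.

```r
# Hypothetical lines: same layout as above, but the team code is only
# capitalized on its first letter.
x <- c("Jim Smith Nyy 100 200",
       "Jerry Johnson Jr. Phi 100 200",
       "A. J. Williams Cws 100 200")

# Capture groups: (name) (last token before the numbers) (number) (number).
# Assumption: the team is one whitespace-free token; the name is everything before it.
pattern <- paste0("^(.*[^[:space:]])[[:space:]]+([^[:space:]]+)",
                  "[[:space:]]+([0-9]+)[[:space:]]+([0-9]+)[[:space:]]*$")

# Each list element holds the full match followed by the four groups.
fields <- do.call(rbind, regmatches(x, regexec(pattern, x)))

result <- data.frame(PLAYER = fields[, 2],
                     TEAM   = fields[, 3],
                     STUFF1 = as.numeric(fields[, 4]),
                     STUFF2 = as.numeric(fields[, 5]),
                     stringsAsFactors = FALSE)
result
```

This would not work if the team name itself could contain spaces, because nothing would then separate it from the player name.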
Below is a similar file in which a team name containing spaces is followed by two columns of numbers. I can read these data with the code shown after the file:
TEAM STUFF1 STUFF2
New York Yankees 100 200
Philadelphia Phillies 100 200
Boston Red Sox 200 200
Los Angeles Angels 100 200
Chicago White Sox 100 100
Chicago Cubs 200 100
New York Mets 200 200
San Francisco Giants 100 300
Minnesota Twins 100 300
St. Louis Cardinals 200 300
Here is the code to read the second data set:
setwd('c:/users/mmiller21/simple R programs/')
my.data3 <- readLines('team.names.with.spaces.txt')
# split between desired columns
my.data4 <- do.call(rbind, strsplit(my.data3, split = "(?<=[ ])(?=[0-9])", perl = TRUE))
# returns string without leading or trailing whitespace
# This function is not mine; I found it on Stack Overflow
trim <- function (x) gsub("^\\s+|\\s+$", "", x)
my.data5 <- trim(my.data4)
# remove header
my.data6 <- my.data5[-1,]
# convert to data.frame
my.data6 <- data.frame(my.data6, stringsAsFactors = FALSE)
my.data6[,2] <- as.numeric(my.data6[,2])
my.data6[,3] <- as.numeric(my.data6[,3])
my.data6
X1 X2 X3
1 New York Yankees 100 200
2 Philadelphia Phillies 100 200
3 Boston Red Sox 200 200
4 Los Angeles Angels 100 200
5 Chicago White Sox 100 100
6 Chicago Cubs 200 100
7 New York Mets 200 200
8 San Francisco Giants 100 300
9 Minnesota Twins 100 300
10 St. Louis Cardinals 200 300
Thank you for any advice. I prefer a solution in base R.