Suppose I have a CSV file (titled "Substance Related Deaths of Females in 2014") of a bunch of data that looks like (keep in mind this is a CSV file, and just a small sample, and it's also made-up data so the numbers aren't real):
Substance Related Deaths
of Females
by country
2014
Country pregnant status alcohol opiates heroin
USA pregnant 1,230 4,844 893
not pregnant 23,440 12,773 2,005
CANADA pregnant 1,094 735 804
not pregnant 18,661 5,787 1,050
GERMANY pregnant 444 97 203
not pregnant 1,007 388 1,375
MEXICO pregnant 786 1,456 1,532
not pregnant 20,562 2,645 7,594
The original CSV file contains 30 rows (including stuff we don't want at the top and bottom) and 8 columns.
Now suppose I want to ONLY keep all the rows where each row starts with a country with capitalized letters (in other words, I only want the rows that list the country first, and only the "pregnant" data). Here's what I did:
df <- readLines("substancedeaths.csv")
linesTOkeep <- grep("^[A-Z]",df)
mydata <- df[linesTOkeep]
finaltable <- read.table(textConnection(mydata),sep=",")
The original data has 10 countries, with 8 columns (first column is "State", rest are substances). The end goal is to have a data frame with 10 rows and 8 columns. But after running my code, I end up with only 8 rows and 8 columns, it's omitting the USA and CANADA rows, looking like this:
GERMANY pregnant 444 97 203
MEXICO pregnant 786 1,456 1,532
And so forth. Germany is at the top but USA and CANADA should be. Any ideas what may be happening?