
I'm trying to import the following text file:

   "year"   "sex"   "name"       "n"    "prop"
"1" 1880    "F"     "Mary"      7065    0.0723835869064085
"2" 1880    "F"     "Anna"      2604    0.0266789611187951
"3" 1880    "F"     "Emma"      2003    0.0205214896777829
"4" 1880    "F"     "Elizabeth" 1939    0.0198657855642641
"5" 1880    "F"     "Minnie"    1746    0.0178884278469341
"6" 1880    "F"     "Margaret"  1578    0.0161672045489473
"7" 1880    "F"     "Ida"       1472    0.0150811946109318
"8" 1880    "F"     "Alice"     1414    0.0144869627580554
"9" 1880    "F"     "Bertha"    1320    0.0135238973413247
"10"1880    "F"     "Sarah"     1288    0.0131960452845653

and I don't have any problems using:

data <- read.table("~/Documents/baby_names.txt", header = TRUE, sep = "\t")

However, I haven't figured out how to do it with readr. The following command fails:

data2 <-read_tsv("~/Documents/baby_names.txt")

I know the problem is that the header row contains five fields (the headings) while every other row contains six, because each data row starts with a row number. I don't know how to tell readr to ignore the "1", "2", "3" and so on. Any suggestions?

Jamie Eltringham
    If I were working with this file, I would just add `"id"` to list of column names. Then `read.table()` with `header=TRUE` would work as expected. – Tim Biegeleisen Jun 14 '16 at 07:20

2 Answers


We can read in two steps (not tested):

# read only the header line as data, then convert it to a character vector
myNames <- unlist(read_tsv(file = "myFile.tsv", col_names = FALSE, n_max = 1), use.names = FALSE)

# read the data, skip 1st row, then drop the 1st column
myData <- read_tsv(file = "myFile.tsv", skip = 1, col_names = FALSE)[, -1]

# assign column names
colnames(myData) <- myNames
zx8754

You can read in the body and the column names separately and then combine them:

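library(readr)

# read the body; "iiccid" fixes the column types so that "F" stays a
# character value instead of being guessed as logical FALSE
df <- read_tsv("baby_names.txt", col_names = FALSE, skip = 1, col_types = "iiccid")

# read the header line as strings, not factors
col_names <- read.table("baby_names.txt", header = FALSE, sep = "\t", nrows = 1,
                        stringsAsFactors = FALSE)

# drop the row-number column, then apply the real column names
df$X1 <- NULL
names(df) <- unlist(col_names, use.names = FALSE)

Result:

> as.data.frame(head(df))
  year sex      name    n       prop
1 1880   F      Mary 7065 0.07238359
2 1880   F      Anna 2604 0.02667896
3 1880   F      Emma 2003 0.02052149
4 1880   F Elizabeth 1939 0.01986579
5 1880   F    Minnie 1746 0.01788843
6 1880   F  Margaret 1578 0.01616720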

I don't think there is an easy way of setting row names in read_tsv() as there is with the row.names argument of read.table(), but this should be a sufficient workaround.
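
The idea from the comment on the question (give the extra column a name) can also be done in a single read_tsv() call. The following is only a minimal sketch. It assumes the header line has five tab-separated names and every data row has six fields. The temporary file, the two sample rows, and the column name "id" are placeholders for illustration. "_iccid" is readr's compact column-type string, in which "_" skips a column.

```r
library(readr)

# Hypothetical stand-in for baby_names.txt: 5 header fields, 6 data fields.
path <- tempfile(fileext = ".txt")
writeLines(c(
  '"year"\t"sex"\t"name"\t"n"\t"prop"',
  '"1"\t1880\t"F"\t"Mary"\t7065\t0.0723835869064085',
  '"2"\t1880\t"F"\t"Anna"\t2604\t0.0266789611187951'
), path)

# Header line as a character vector, with the surrounding quotes removed.
header_line <- read_lines(path, n_max = 1)
header <- gsub('"', "", strsplit(header_line, "\t", fixed = TRUE)[[1]])

# Prepend a placeholder name ("id") for the row-number column and skip that
# column with "_"; the remaining types are int, chr, chr, int, dbl.
baby <- read_tsv(path, skip = 1,
                 col_names = c("id", header),
                 col_types = "_iccid")
```

Fixing the column types up front also keeps the sex column as character, so "F" is not read as FALSE.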

niczky12