I am using RStudio version 0.96.331 and knitr version 0.8.

I thought my problem had been solved by updating RStudio and my libraries. However:

Run in the R console, the following code gives me 940 unique `Table.ID` values. Run in a knitr chunk, it gives only 228 unique values and the following warning:

"invalid input found on input connection 'http://www2.census.gov/acs2010_5yr/summaryfile/Sequence_Number_and_Table_Number_Lookup.txt'"

I don't understand why the two methods give different results.

Sequence <- read.csv("http://www2.census.gov/acs2010_5yr/summaryfile/Sequence_Number_and_Table_Number_Lookup.txt",
                   stringsAsFactors=FALSE)
unique(Sequence$Table.ID)


Michael Williams
  • What version of `rstudio` and `knitr` are you using? – Maiasaura Sep 07 '12 at 18:05
  • Please update your question with output from `sessionInfo()` – Maiasaura Sep 07 '12 at 18:20
  • Try now with the `fileEncoding` argument (see updated answer below). – Maiasaura Sep 07 '12 at 18:29
  • That worked! Thanks. Now, do you know of any documentation of why the fileEncoding argument must be used in knitr and not in the console? – Michael Williams Sep 07 '12 at 18:38
  • @MichaelWilliams That is because RStudio sets `options(encoding = 'UTF-8')` before calling `knitr`. It is probably worth reporting to the RStudio developers, or you can reset to the default with `options(encoding = 'native.enc')` before you read the file. – Yihui Xie Sep 07 '12 at 18:49
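
The workaround suggested in the last comment, resetting the `encoding` option before reading, could look roughly like the sketch below. It is not taken from the thread: to stay runnable without network access it reads a small temporary file (a hypothetical stand-in for the Census lookup table, containing one Latin-1 byte) instead of the real URL, and the name `n_ids` is only illustrative.

```r
# Hypothetical stand-in for the Census file: a tiny CSV that contains one
# byte (0xE9, "é" in Latin-1) which is not valid UTF-8 on its own.
tmp <- tempfile(fileext = ".csv")
writeBin(c(charToRaw("Table.ID,Title\nB01001,Caf"), as.raw(0xE9),
           charToRaw("\nB01002,Median age\n")), tmp)

# Reset the option to R's default and remember the previous value.
old <- options(encoding = "native.enc")

# With "native.enc" the connection does no re-encoding, so the stray byte
# does not cut the input short.
Sequence <- read.csv(tmp, stringsAsFactors = FALSE)
n_ids <- length(unique(Sequence$Table.ID))

# Restore whatever the calling environment (e.g. RStudio) had set.
options(old)
```

Restoring the old value afterwards keeps the change local to this one read, so the rest of the document is knitted with the settings RStudio chose.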

1 Answer

Works fine on RStudio version 0.96.331 and knitr version 0.8 when `fileEncoding` is passed to `read.csv`.

My .Rmd file:

    knitr test for length
    ========================================================
    This should successfully return a length of 940

    ```{r}
    Sequence <- read.csv("http://www2.census.gov/acs2010_5yr/summaryfile/Sequence_Number_and_Table_Number_Lookup.txt",
                         fileEncoding = "iso8859-8", stringsAsFactors = FALSE)
    length(unique(Sequence$Table.ID))
    ```

Resulting in this:

[Screenshot of the knitted output; image not preserved]

Maiasaura
  • As you say, passing `fileEncoding` to `read.csv` is the solution. The value that works for me is `native.enc`; I found it by running `getOption("encoding")` in an interactive session. `knitr` changes it to `UTF-8`. – rescdsk Oct 11 '12 at 19:03
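
As a rough illustration of the per-call variant described in this comment, the sketch below checks the session option and then passes `fileEncoding = "native.enc"` directly to `read.csv`. It is an assumption-laden example, not code from the thread: the temporary file is a hypothetical stand-in for the Census table so the snippet runs offline, and `n_ids` is an illustrative name.

```r
# Hypothetical stand-in for the Census file, with one Latin-1 byte (0xE9).
tmp <- tempfile(fileext = ".csv")
writeBin(c(charToRaw("Table.ID,Title\nB01001,Caf"), as.raw(0xE9),
           charToRaw("\nB01002,Median age\n")), tmp)

# Inspect the session-wide setting. R's default is "native.enc"; the thread
# reports that it is "UTF-8" when knitting from RStudio.
getOption("encoding")

# Override the encoding for this one call only, leaving the option untouched.
Sequence <- read.csv(tmp, fileEncoding = "native.enc",
                     stringsAsFactors = FALSE)
n_ids <- length(unique(Sequence$Table.ID))
```

Compared with changing `options(encoding = ...)`, this keeps the fix next to the call that needs it and leaves other reads in the same document unaffected.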