
The size of my file is 335.1 MB. RStudio seems to have some difficulty reading it. I got this pop-up: [image not available]

Note that I get different results depending on which readr function I use.

read_csv

> Mydata <- read_csv("P:/Projects/Project/Data_folder/mydata/mydata.csv")
Rows: 2068023 Columns: 1
Error in nchar(x, "width") : invalid multibyte string, element 1
In addition: Warning message:
One or more parsing issues, see `problems()` for details 

read_csv2

> Mydata <- read_csv2("P:/Projects/Project/Data_folder/mydata/mydata.csv")
i Using "','" as decimal and "'.'" as grouping mark. Use `read_delim()` for more control.
Rows: 2045785 Columns: 25
Error in nchar(x, "width") : invalid multibyte string, element 1
In addition: Warning message:
One or more parsing issues, see `problems()` for details 

read_csv2 seems to be the correct option because it recognizes the columns (25 instead of 1). But should I use some other method instead?
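The `invalid multibyte string` error in both outputs above typically means that some bytes in the file are not valid in the encoding R expects (UTF-8 by default in readr). A minimal sketch for locating such lines, assuming base R only: it writes a small hypothetical semicolon-separated test file in Latin-1 to a temporary path (the columns and values are made up for illustration) and lists the line numbers that fail `validUTF8()`. On the real 335 MB file, `readLines()` will need at least as much memory as the file size.

```r
# Minimal sketch: find lines whose bytes are not valid UTF-8.
# The test file, column names and values are hypothetical.
path <- tempfile(fileext = ".csv")

# Semicolon-separated text with non-ASCII names, written to disk as Latin-1 bytes
txt <- c("id;name;value", "1;J\u00f6rg;1,5", "2;Zo\u00eb;2,25")
writeLines(iconv(txt, from = "UTF-8", to = "latin1"), path, useBytes = TRUE)

# Read the raw lines; readLines() does not re-encode them here
lines <- readLines(path, warn = FALSE)

# Line numbers that are not valid UTF-8; these are the kind of lines that
# trigger "invalid multibyte string" when the file is treated as UTF-8
bad <- which(!validUTF8(lines))
print(bad)
```

If only a handful of lines show up, inspecting them (for example `lines[bad]`) can hint at which encoding the file really uses.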

Gato
  • 5 MB seems very low. What's the return value of `memory.limit()`? – user12256545 May 26 '23 at 13:19
  • The return value of ```memory.limit()``` is 15777. What does it mean? – Gato May 26 '23 at 13:40
  • Is it 15 777 MB? – Gato May 26 '23 at 13:52
  • Yes, it's ~15.8 GB. – user12256545 May 26 '23 at 13:58
  • Can you try to load it from the console instead of the editor? – user12256545 May 26 '23 at 14:02
  • The "multibyte" error messages indicate that your CSV file contains non-ASCII characters. I am not familiar with the readr package, but in base R you can look at the read.csv function and its fileEncoding argument. – BigFinger May 26 '23 at 14:11
  • @user12256545 Yes, I tried that; it didn't work. The only thing that has somehow worked so far is ```read_tsv```, but it reads in much less data than the file actually contains. – Gato May 26 '23 at 14:50
  • The pop-up says "source editor"; did you try to open it as text in the RStudio editor (View File in the Files pane)? Other than that, it does seem more like an encoding issue. What's the source of that file? Is it combined from some other files? – margusl May 26 '23 at 15:34
  • I didn't try to open it as text. As a matter of fact, it is combined from other files. – Gato May 30 '23 at 12:00
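BigFinger's base R suggestion could look roughly like the sketch below. It is a sketch under stated assumptions: it uses `read.csv2()` (whose defaults are `sep = ";"` and `dec = ","`), a small hypothetical Latin-1 test file in place of the real data, and an R session running in a UTF-8 locale, because `fileEncoding` re-encodes the input into the session's native encoding.

```r
# Sketch of the base R route suggested in the comments (fileEncoding).
# Assumes a UTF-8 session locale; the test file below is hypothetical.
path <- tempfile(fileext = ".csv")
txt <- c("id;name;value", "1;J\u00f6rg;1,5", "2;Zo\u00eb;2,25")
writeLines(iconv(txt, from = "UTF-8", to = "latin1"), path, useBytes = TRUE)

# read.csv2() defaults: header = TRUE, sep = ";", dec = ","
# fileEncoding = "latin1" tells R how the bytes on disk are encoded
dat <- read.csv2(path, fileEncoding = "latin1")
str(dat)
```

For a file of this size, base R's reader is usually slower than readr, so this is mainly useful as a cross-check of the encoding.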

1 Answer


I got it fixed like this:

Mydata <- read_csv2("P:/projects/project/project_data/mydata/mydata.csv", locale = locale(encoding = "latin1"))

Setting `encoding = "latin1"` makes readr decode the file as Latin-1 (ISO-8859-1) instead of its default UTF-8, which appears to be the file's actual encoding. On one line in the data there was also the symbol ";" in free text, and that confused R when the encoding was "UTF-8".

Credit: Gato

Mark
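The fix in the answer relies on `read_csv2()` supplying `;` as the delimiter and `,` as the decimal mark (as its printed message in the question says), while `locale()` adds the encoding. If you prefer to spell every setting out, as that message suggests with `read_delim()`, a minimal sketch might look like this. The test file and the `col_types` are hypothetical stand-ins; for the real data you would point `path` at mydata.csv and adjust or drop `col_types`.

```r
library(readr)

# Hypothetical Latin-1, semicolon-separated test file standing in for mydata.csv
path <- tempfile(fileext = ".csv")
txt <- c("id;name;value", "1;J\u00f6rg;1,5", "2;Zo\u00eb;2,25")
writeLines(iconv(txt, from = "UTF-8", to = "latin1"), path, useBytes = TRUE)

# Explicit version of read_csv2() + locale(encoding = "latin1"):
# ";" delimiter, "," decimal mark, "." grouping mark, Latin-1 input
Mydata <- read_delim(
  path,
  delim = ";",
  locale = locale(decimal_mark = ",", grouping_mark = ".", encoding = "latin1"),
  # hypothetical column types for the test file
  col_types = cols(id = col_integer(), name = col_character(), value = col_double())
)

# Rows or fields that failed to parse (a zero-row tibble if there were none)
problems(Mydata)
```

Checking `problems()` after the read is a quick way to spot lines like the one with a stray ";" in free text, since such a line usually shows up as a row with the wrong number of columns.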