3

I am trying to make R read my CSV file (which contains numerical and categorical data). I am able to open this file on a Windows computer(I tried different ones and it always worked) without any issues, but it is not working on my Mac at all. I am using the latest version of R. Originally, the data was in Excel and then I converted it to csv.

I have exhausted all my options, I tried recommendations from similar topics but nothing works. One time I sort of succeeded but the result looked like this: ;32,0;K;;B;50;;;; I tried the advice given in this topic Import data into R with an unknown number of columns? and the result was the same. I am a beginner in R and I really know nothing about coding or programming, so I would appreciate tremendously any kind of advice on this issue.Below are my feckless attempts to fix this problem:

> file=read.csv("~/Desktop/file.csv", sep = ";")
Error in type.convert(data[[i]], as.is = as.is[i], dec = dec, na.strings = character(0L))    : 
invalid multibyte string at '<ca>110'
> file=read.csv("~/Desktop/file.csv", sep = " ")
Error in read.table(file = file, header = header, sep = sep, quote = quote,  : 
duplicate 'row.names' are not allowed
> ?read.csv
> file=read.csv2("~/Desktop/file.csv", sep = ";")
Error in type.convert(data[[i]], as.is = as.is[i], dec = dec, na.strings = character(0L)) : 
invalid multibyte string at '<ca>110'
> file=read.csv2("~/Desktop/file.csv", sep = ";", header=TRUE)
Error in type.convert(data[[i]], as.is = as.is[i], dec = dec, na.strings = character(0L)) : 
invalid multibyte string at '<ca>110'
> file=read.csv("~/Desktop/file.csv", sep=" ",row.names=1)
Error in read.table(file = file, header = header, sep = sep, quote = quote,  : 
duplicate 'row.names' are not allowed
> file=read.csv("~/Desktop/file.csv", row.names=1)
Error in read.table(file = file, header = header, sep = sep, quote = quote,  : 
more columns than column names
> file=read.csv("~/Desktop/file.csv", sep=";",row.names=1)
Error in type.convert(data[[i]], as.is = as.is[i], dec = dec, na.strings = character(0L)) : 
invalid multibyte string at '<ca>110'

This is what the header of the data looks like. So using the advice below, I saved the document in the CSV format for Mac and once I executed the View(file) function, everything looked ok, except for some rows like the row#1 (Cord Number 1) below, it was completely misplaced :

    Cord.Number Ply Attch   Knots   Length  Term    Thkns   Color   Value   
1,S,U,,37.0,K,,MB,,,"5.5 - 6.5:4, 8.0 - 8.5:2",,UR1031,unknown,             
1s1 S   U   1S(5.5/Z) 1E(11.5/S)    46.5    K   NA  W   11  
1s2 S   U   1S(5.5/Z) 5L(11.0/Z)    21.0    B   NA  W   15

This is what the spreadsheet looks like in R Studio on Windows (I don't have enough reputation to post an image): http://imgur.com/zQdJBT2

Community
  • 1
  • 1
Jen
  • 161
  • 2
  • 7
  • Did you try `quote="'"` or `quote='"'`? – Ferdinand.kraft Jun 28 '13 at 02:56
  • @Ferdinand.kraft I just tried, same result 40,0;K;;B;50;;;; – Jen Jun 28 '13 at 03:04
  • The error `invalid multibyte string at '110'` seems an encoding problem..try something like `read.delim("~/Desktop/file.csv", fileEncoding="UTF-8")` – agstudy Jun 28 '13 at 03:08
  • @agstudy just tried and it says: Warning message: In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, : invalid input found on input connection '~/Desktop/UR1031-UR1199.csv' – Jen Jun 28 '13 at 03:26
  • This means that it don't find the file.... – agstudy Jun 28 '13 at 03:31
  • and the strangest thing is, everything works perfectly fine on a Windows computer – Jen Jun 28 '13 at 03:32
  • @agstudy I worked with the files from the same directory and it was always able to locate files; what happens here is, I get the same output 40,0;K;;B;50;;;; and the Warning message. So it is apparently reading the file but then I don't understand why it's reading it in this ;;;;;;;;;;;; messed up format – Jen Jun 28 '13 at 03:39
  • How did you convert to CSV? Excel gives you options for "CSV" "CSV(MSDOS)" and "CSV(Mac)" which relates to the end-of-line characters. Next, please post some of the csv file contents, as it may well be that you've got characters in cell values which are disrupting the read. And, finally, why not load the `XLconnect` and `xlsx` packages and just read directly from the Excel workbook? – Carl Witthoft Jun 28 '13 at 11:50
  • @CarlWitthoft Thank you for your advice!I almost got R Studio to read it, but once I executed the View(file) command to be able to see it in the R Studio spreadsheet, everything looked fine except for the 1st line and some other lines too – Jen Jun 28 '13 at 18:13
  • @CarlWitthoft Here is what the header of the file looks like: Cord.Number Ply Attch 1 1,S,U,,37.0,K,,MB,,,"5.5 - 6.5:4, 8.0 - 8.5:2",,UR1031,unknown, 2 1s1 S U 3 1s2 S U 4 1s3 S U 5 1s4 S U 6 1s5 S U – Jen Jun 28 '13 at 18:17
  • @CarlWitthoft I also tried `xlsx` and `XLconnect` and the data looked like this again 94;U;U;;;B;;BB;;;; – Jen Jun 28 '13 at 18:32
  • You said that Excel on Windows can read it fine. Can you post what it looks like there? – Hong Ooi Jun 28 '13 at 18:58
  • @HongOoi Here is a link on igmur of R Studio screenshot on Windows – Jen Jun 28 '13 at 19:16
  • Right. So it looks like Excel actually wrote the file correctly (the fields in row 1 are separated by commas) but for whatever reason it failed to import. Can you open the file in a text editor? Not Word or any other word processor, just a plain simple editor that won't pull any formatting tricks. Then post an extract. – Hong Ooi Jun 28 '13 at 19:29
  • @HongOoi just to be sure, do you mean to open it in a text editor on Mac?or Windows and then send it to Mac? – Jen Jun 28 '13 at 19:57
  • Text editor on the Mac. That's where the errors seem to be occurring. – Hong Ooi Jun 28 '13 at 19:58
  • @HongOoi This is what happens when I open it in Text Editor on Mac (I got the same output when I tried to save the CSV file in a txt format on Mac): Cord Number,Ply,Attch,Knots,Length,Term,Thkns,Color,Value,Alt. Values,Subs Pos,Notes,Khipu Number,Location, 1,S,U,,37.0,K,,MB,,,"5.5 - 6.5:4, 8.0 - 8.5:2",,UR1031,unknown, 1s1,S,U,1S(5.5/Z) 1E(11.5/S),46.5,K,,W,11,,,,UR1031,unknown, 1s2,S,U,1S(5.5/Z) 5L(11.0/Z),21.0,B,,W,15,,,,UR1031,unknown, 1s3,S,U,,4.0,B,,W,,,,,UR1031,unknown, 1s4,S,U,2L(10.5/Z),48.5,K,,W,2,,,,UR1031,unknown, 1s5,S,U,1S(5.5/Z) 1E(12.0/Z),33.0,U,,W,11,,,,UR1031 – Jen Jun 28 '13 at 20:00
  • @HongOoi and this is what happens when I try to open an XLSX in Text Editor: –ѕ а°± б> ю€ и ю€€€bгd е f з h й j л l н n п pсr €€€€€э!"#$%&'()*+./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRS – Jen Jun 28 '13 at 20:07
  • Could it be because I am using my Mac in Russian?I set my R and R Studio to English, but Mac is in Russian – Jen Jun 28 '13 at 20:07
  • (Opening an xlsx file will show lots of gibberish, because it's a binary format.) The language setting could be a factor, although if there are no Russian characters in the file, it shouldn't make a difference. What you could also do is export the file as tab-delimited, and use `read.delim` to read it into R. I've had cases where R got confused by a seemingly innocuous file, and changing the separator fixed things. – Hong Ooi Jun 29 '13 at 16:39

3 Answers3

1

As a workaround, what you can do is open the csv file on a Windows machine, and then save it to a .rdata file. Rdata is R's internal storage format. You can then put the file on a USB stick, (or DropBox, Google Drive, or whatever), copy it to your Mac, and work on it there.

# on the Windows PC
dat <- read.csv("<file>", ...)

save(dat, file="<file location>/dat.rdata")


# copy the dat.rdata file over, and then on your Mac:
load("<Mac location>/dat.rdata")
Hong Ooi
  • 56,353
  • 13
  • 134
  • 187
1

fileEncoding="latin1" is a way to make R read the file, but in my case it came with loss of data and special characters. For example, the symbol € disappeared.

As a workaround that worked best for me for this issue (I'm on a mac too), I opened first the file on Sublime Text, and saved it "with encoding" UTF 8. When trying to import it after again, it could get read by R with no problem, and my special character were still present.

0

I had a similar problem, but when including , fileEncoding="latin1" after file's name it works

Lucnp
  • 13
  • 3