Read a CSV file with special characters in R

Question

I'm trying to read a CSV file in R that was exported from Survey Monkey in French. It contains garbled special characters such as "dâ€™administration", "systÃ¨me" and "vousÂ", as well as "double spaces", that I can't get rid of. This is really difficult to manage; do you have any advice? Do I have to read it as UTF-8? Thanks for your help. Best
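A minimal sketch of reading the file as UTF-8 with base R. The strings in the question look like UTF-8 bytes decoded as Windows-1252 (for example "Ã¨" is what the two UTF-8 bytes of "è", C3 A8, look like in Windows-1252), so declaring the encoding when reading is the first thing to try. The file path and the column name `question` are hypothetical: the snippet writes a tiny UTF-8 CSV to a temporary file so that it runs on its own.

```r
# Write a tiny UTF-8 CSV as raw bytes; it stands in for the Survey Monkey export
path <- tempfile(fileext = ".csv")
writeBin(c(charToRaw("question\n"),
           charToRaw("Syst"), as.raw(c(0xC3, 0xA8)),        # UTF-8 bytes of "è"
           charToRaw("me d"), as.raw(c(0xE2, 0x80, 0x99)),  # UTF-8 bytes of "’"
           charToRaw("administration\n")),
         path)

# encoding = "UTF-8" tells R to treat the input strings as UTF-8,
# so C3 A8 is kept as one "è" rather than shown as "Ã¨"
survey <- read.csv(path, encoding = "UTF-8", stringsAsFactors = FALSE)
print(survey$question)
```

If the export starts with a byte-order mark, `fileEncoding = "UTF-8-BOM"` may be worth trying instead. `readr::read_csv()` assumes UTF-8 by default.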

Have you studied [this answer](https://stackoverflow.com/a/33697504/1305688)? It's generally good to show that you have already put some effort into it, for example by sharing what you have tried and what went wrong. — Eric Fail, Jun 04 '20 at 12:06

score 0 · Answer 1 · answered Jun 04 '20 at 09:48


You can always do a search and replace, either directly in the .csv file using Excel or a text editor, or in R with stringr. You can also see this post on handling special characters (e.g. accents) in R.
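A sketch of the search-and-replace approach in base R, assuming you have already listed the garbled sequences that occur in your data. The first two sequences come from the question; treating "Â" as "Â" followed by a no-break space is an assumption. The vectors `garbled`/`wanted` and the function `fix_mojibake()` are hypothetical names. With stringr, the loop body would be `str_replace_all(x, fixed(garbled[i]), wanted[i])`.

```r
# Hypothetical lookup table: each garbled sequence and the character it should be.
# Extend both vectors together as you find more sequences in your data.
garbled <- c("\u00e2\u20ac\u2122",  # "â€™"
             "\u00c3\u00a8",        # "Ã¨"
             "\u00c2\u00a0")        # "Â" followed by a no-break space (assumed)
wanted  <- c("\u2019",              # right single quotation mark
             "\u00e8",              # "è"
             " ")                   # ordinary space

fix_mojibake <- function(x) {
  # fixed = TRUE matches each sequence literally, not as a regular expression
  for (i in seq_along(garbled)) {
    x <- gsub(garbled[i], wanted[i], x, fixed = TRUE)
  }
  x
}

fix_mojibake("Syst\u00c3\u00a8me d\u00e2\u20ac\u2122administration")
```

This only fixes the sequences in the table, so it works best once you know exactly which ones appear in the file.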

answered Jun 04 '20 at 09:48

Al3xEP


Thanks Al3xEP for the quick answer. Search and replace doesn't work very well, especially for the double spaces, which I can detect but not remove. – sebastien Jun 04 '20 at 09:57

Hmm. Not sure what platform you are on, but with LibreOffice, for instance, you can search for a regular expression, or simply type two spaces in the Find box, and Calc will find them in your CSV. You can then replace them. With stringr it's fairly easy (see the cheat sheet): you just need to match the double spaces with the right regex, in your case `"\\s+"`, which matches one or more whitespace characters, and replace each match with a single space `" "` (single spaces are matched too, but they are replaced by a single space, so that is fine). – Al3xEP Jun 04 '20 at 11:15
I'm working on a CSV file in Office 2016. I can find the double space with the search function, but I'm not able to replace it. Same problem in R: I can detect the double space, but I can't get rid of it, and it produces wrong names in the column titles and in the values of the variable. I'm going to check the encoding again when importing into R. – sebastien Jun 04 '20 at 12:12
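One way to find out what the "double space" actually is: list the Unicode code points of an affected value. A plausible culprit (an assumption, not confirmed in the thread) is a no-break space, U+00A0, which French typography commonly puts before "?", "!" or ":" and which Windows-1252 decoding shows as "Â" plus a space-like character. It looks like a space, but a search for an ordinary " " does not match it. `codepoints()` is a hypothetical helper built on base R's `utf8ToInt()` and `intToUtf8()`.

```r
# Hypothetical helper: list each character of a string with its Unicode code point,
# so look-alike characters such as a no-break space (U+00A0) become visible
codepoints <- function(s) {
  cp <- utf8ToInt(enc2utf8(s))
  data.frame(char = vapply(cp, intToUtf8, character(1)),
             code = sprintf("U+%04X", cp),
             stringsAsFactors = FALSE)
}

# Example value: "vous" followed by a no-break space and a question mark
codepoints("vous\u00a0?")
```

An ordinary space shows up as U+0020; anything else between words is a character the usual search does not target.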
You should be able to replace any whitespace, and that includes the double space: `str_replace_all(x, "\\s+", " ")`, where `x` is your character vector. – Al3xEP Jun 04 '20 at 15:25
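Two details matter when doing this in R. In an R string literal the backslash must be doubled, so the pattern is written `"\\s+"`, and the `+` is what collapses a run of spaces into one. A no-break space (U+00A0) may or may not count as whitespace depending on the regex engine and locale, so this base-R sketch converts it explicitly first. `normalize_spaces()` is a hypothetical name; stringr's `str_squish()` also collapses and trims whitespace.

```r
# Hypothetical helper: turn no-break spaces into ordinary spaces, then collapse
# every run of whitespace to a single space and trim both ends
normalize_spaces <- function(x) {
  x <- gsub("\u00a0", " ", x, fixed = TRUE)  # U+00A0 no-break space -> " "
  x <- gsub("[[:space:]]+", " ", x)          # double spaces, tabs, newlines -> " "
  trimws(x)
}

normalize_spaces(c("vous\u00a0 ?", "double  space "))
```

Applied to the column names as well (`names(df) <- normalize_spaces(names(df))`), this should also clean up the titles.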
If you're working in MS Office, you can use the special characters without any trouble, as long as you download them correctly. The SurveyMonkey API delivers them cleanly. If you are reading from a saved text file, don't just open it with the defaults: in step 1 of 3 of the Text Import Wizard, if your file contains accented characters or double-byte characters (as in Asian languages), File Origin should be 65001: Unicode (UTF-8). – sysmod Jun 04 '20 at 17:15

score 0 · Accepted Answer · answered Jun 04 '20 at 17:24

I just saw you're reading a CSV file. Here's how to read it correctly, as that other post also indicates: https://sysmod.wordpress.com/2016/08/28/excel-gene-mutation-and-curation/
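If the text has already been decoded the wrong way (for example after opening and re-saving the file), the damage can often be undone in R instead of patched sequence by sequence: encode each character back to its Windows-1252 byte, then declare the resulting bytes as UTF-8. This is a sketch that assumes the garbling came from Windows-1252 specifically. Apply it only to strings you know are garbled, because a clean "è" would be turned into an invalid lone byte, and characters whose UTF-8 bytes are undefined in Windows-1252 may come back as `NA`. `undo_cp1252_mojibake()` is a hypothetical name.

```r
# Text that was UTF-8 on disk but was decoded as Windows-1252 ("SystÃ¨me dâ€™administration")
garbled <- "Syst\u00c3\u00a8me d\u00e2\u20ac\u2122administration"

undo_cp1252_mojibake <- function(x) {
  # Map each character back to its Windows-1252 byte; this recovers the original UTF-8 bytes
  recoded <- iconv(x, from = "UTF-8", to = "CP1252")
  # Those bytes are already valid UTF-8, so only declare the encoding (no second conversion)
  Encoding(recoded) <- "UTF-8"
  recoded
}

repaired <- undo_cp1252_mojibake(garbled)
print(repaired)
```

Base R's `validUTF8()` is a quick way to check the repaired strings.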

If you really want to replace the accented characters with plain ANSI, here's a VBA function:

```vba
Function UnAccent(ByVal inputString As String) As String
    ' http://www.vbforums.com/archive/index.php/t-483965.html
    Dim index As Long, Position As Long
    ' Each accented character maps to the plain character at the same position
    Const ACCENTED_CHARS As String = "ŠŽšžŸÀÁÂÃÄÅÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖÙÚÛÜÝàáâãäåçèéêëìíîïðñòóôõöùúûüýÿøØŸœŒ"
    Const ANSICHARACTERS As String = "SZszYAAAAAACEEEEIIIIDNOOOOOUUUUYaaaaaaceeeeiiiidnooooouuuuyyoOYoO"
    For index = 1 To Len(inputString)
        Position = InStr(ACCENTED_CHARS, Mid$(inputString, index, 1))
        ' Overwrite the single character in place when it is in the accented list
        If Position > 0 Then Mid$(inputString, index) = Mid$(ANSICHARACTERS, Position, 1)
    Next
    UnAccent = inputString
End Function
```
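The same idea in R, for readers who want to stay in R: base R's `chartr()` does a one-to-one character translation, like the `InStr`/`Mid$` loop in the VBA function. This sketch keeps only a French-oriented subset of the VBA constants (an assumption; extend both strings together), and like the VBA version it does not expand ligatures such as "œ" into "oe", which would need a separate `gsub()`. `iconv(x, to = "ASCII//TRANSLIT")` is an alternative, but its output depends on the platform's iconv.

```r
# French-oriented subset of the VBA constants; both strings must have the same number of characters
accented <- paste0("\u00e0\u00e2\u00e4\u00e7\u00e8\u00e9\u00ea\u00eb",  # à â ä ç è é ê ë
                   "\u00ee\u00ef\u00f4\u00f6\u00f9\u00fb\u00fc\u00ff",  # î ï ô ö ù û ü ÿ
                   "\u00c0\u00c2\u00c4\u00c7\u00c8\u00c9\u00ca\u00cb",  # À Â Ä Ç È É Ê Ë
                   "\u00ce\u00cf\u00d4\u00d6\u00d9\u00db\u00dc")        # Î Ï Ô Ö Ù Û Ü
plain    <- "aaaceeeeiioouuuyAAACEEEEIIOOUUU"
stopifnot(nchar(accented) == nchar(plain))

# chartr() swaps each character of `accented` for the one at the same position in `plain`
unaccent <- function(x) chartr(accented, plain, x)

unaccent("Syst\u00e8me \u00c9t\u00e9")
```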

