1

I am reading an excel sheet into R using the readxl::read_excel function. While largely this has been a successful have been noticing an issue with certain columns.

Uploaded a demo file in the File dropper link

When the file is read into R using the code

df_excel <- readxl::read_excel(here::here("demo.xlsx"), sheet = 1) %>% janitor::clean_names()

certain values are read as long floating numbers. For example, in the below output 1.1000000000000001 has to be just 1.1.

attempts

  • Tried copying the excel column into notepad and repasting into excel.
  • Copy the column and paste as text
  • reading columns a text while using the read_excel function
  • if I save the excel file into csv the file reads as expected into R. However, other columns ( not provided in the demo) have certain special characters like Oranges are >= Apple which csv converts to Oranges are ? Apple. >= is converted to ≥

Is there a way I can read the columns properly in excel itself ? Or open the excel sheet to fix them all across columns manually Or any solution with csv to avoid the converting of the special characters to ?

# A tibble: 32 x 2
   numbers_1          numbers_2         
   <chr>              <chr>             
 1 1                  1                 
 2 1.1000000000000001 1.1000000000000001
 3 1.1000000000000001 1.1000000000000001
 4 1.1000000000000001 1.1000000000000001
 5 1.1000000000000001 1.1000000000000001
 6 1.1000000000000001 1.1000000000000001
 7 1.1000000000000001 1.1000000000000001
 8 3.2.100            3.2.100           
 9 3.2.100            3.2.100           
10 3.2.100            3.2.100           
# ... with 22 more rows
user5249203
  • 4,436
  • 1
  • 19
  • 45

1 Answers1

0

I'm seeing the same weirdness which I do not understand, but when I save as a CSV file from my spreadsheet program and execute this I get what seems to be the expected result:

read.csv("~/Documents/R_code/demo.csv", colClasses="character")
#---------------
   numbers numbers.1
1        1         1
2      1.1       1.1
3      1.1       1.1
4      1.1       1.1
5      1.1       1.1
6      1.1       1.1
7      1.1       1.1
8  3.2.100   3.2.100
9  3.2.100   3.2.100
10 3.2.100   3.2.100
11 3.2.100   3.2.100
12 3.2.100   3.2.100
13     2.2       2.2
14     2.2       2.2
15     2.2       2.2
16     2.2       2.2
17     2.2       2.2
18     2.3       2.3
19     2.3       2.3
20     2.3       2.3
21     2.3       2.3
22     2.3       2.3
23     2.3       2.3
24     2.3       2.3
25    2.18      2.18
26    2.18      2.18
27    2.18      2.18
28    2.18      2.18
29    2.18      2.18
30    2.18      2.18
31 2.7.3.A   2.7.3.A
32 2.7.3.B   2.7.3.B

I also tried loading pkg::readxl and running without the here() call and removing the name-cleanup without solving the mystery. I'm using readxl Version: 1.3.1 in R version 3.6.3 (2020-02-29) so I'm a bit behind the times.

IRTFM
  • 258,963
  • 21
  • 364
  • 487
  • I did not understand your answer, I already mentioned that saving as .csv worked. But would need it to work with read_excel. – user5249203 Mar 09 '21 at 08:26
  • It's easy to understand my answer if you allow to the possibility that people may not read your question test in it's entirety. My guess about why your efforts at csv transport are giving you strange characters is that the encoding assumed by your R installation is different than the encoding produced by your installation of Excel. The `read.csv` function should be able to set a parameter for the read operation. Perhaps `fileEncoding = "latin-1242"` or "latin-9". – IRTFM Mar 09 '21 at 22:38