0

This is the my code. I have an xlsx file with multiple worksheets and I am trying to load the first worksheet.

I used the RStudio Excel IDE to load the worksheet

Rstudio Excel IDE

Which generated the below code but it considered all the attributes as character but it should have identified the first and the fourth as numeric so I had changed it manually before importing.

> PharmacyMaster <- read_excel("~/Desktop/R/ABDataSet(WithDataSetMaster).xlsx", 
+     sheet = "PHRMCY MASTER", col_types = c("numeric", 
+         "text", "text", "numeric"))

The dataset got imported but I received 50 warnings, which I am not sure what are they referring to.

There were 50 or more warnings (use warnings() to see the first 50)
> View(PharmacyMaster)
> warnings()
Warning messages:
1: In read_fun(path = path, sheet = sheet, limits = limits,  ... :
  Coercing text to numeric in A2 / R2C1: '1017330163607345979'
2: In read_fun(path = path, sheet = sheet, limits = limits,  ... :
  Coercing text to numeric in D2 / R2C4: '070'
3: In read_fun(path = path, sheet = sheet, limits = limits,  ... :
  Coercing text to numeric in A3 / R3C1: '1041420479647471411'
4: In read_fun(path = path, sheet = sheet, limits = limits,  ... :
  Coercing text to numeric in D3 / R3C4: '071'
5: In read_fun(path = path, sheet = sheet, limits = limits,  ... :
  Coercing text to numeric in A4 / R4C1: '1048827871928328746'

I have checked the structure of my loaded dataset,

> str(PharmacyMaster)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame':   1097 obs. of  4 variables:
 $ PHRMCY_NBR: num  1.02e+18 1.04e+18 1.05e+18 1.06e+18 1.06e+18 ...
 $ PHRMCY_NAM: chr  "GNP PHARMACY #1" "GNP PHARMACY #2" "GNP PHARMACY #3" "GNP PHARMACY #4" ...
 $ ST_CD     : chr  "NJ" "NJ" "MA" "NJ" ...
 $ ZIP_3_CD  : num  70 71 21 70 100 11 70 193 75 70 ...

I have two questions, 1. Why couldn't readxl identify the numeric columns and load as numeric, which you can see from the IDE screenshot I had changed it manually. (Can find snapshot of the dataset below) 2. What do the warnings mean?

P.S. There are no quoted values or empty values in the numeric column of the dataset, I have checked manually all 1097 rows for that.

Dataset Worksheet snapshot

cyborg
  • 431
  • 1
  • 6
  • 20
  • In the snapshot, your numeric columns are left-aligned in excel... usually the default for numeric columns in excel is right-alignment. Maybe that's a hint that there is a subtle issue in the formatting of the excel sheet. – Alex P Jul 26 '17 at 21:54
  • The ZIP_3 column has something like 070. This is not a number but a number stored as text. If you open the file and look at one cell it probably has an apostrophe before the number making it text. The little green triangles in each cell is a dead giveaway. – Kevin Jul 27 '17 at 01:48
  • @Kevin thats right about green triangles though I couldnt find any apostrophe, the data in the column is stored as text. I feel like an idiot. I think this file was generated somehow with those columns to be text, which you clearly pointed out as green triangles and I forcing them to load as numeric by specifying the col_type as numeric might have caused all the warnings. – cyborg Jul 27 '17 at 05:36
  • @AlexP I am not sure whether the alignment is the reason to blame. – cyborg Jul 27 '17 at 05:43
  • @Kevin thank you for the clarification. – cyborg Jul 27 '17 at 05:49
  • Glad to help. Sometimes being too close the problem makes it hard to see. – Kevin Jul 28 '17 at 20:26

0 Answers0