-1

I'm trying to read in an excel file to R using read_excel(it's a xlsx file), I have columns that contain letters and numbers, for example things like P765876. These columns also have cells with just numbers i.e 234654, so when it reads in to R it reads as an Unknown (not character or numeric) but this means that it gives any cell which has a letter and number a value of NA, how can I read this in correctly?

My code at the moment is

tenant<-read_excel("C:/Users/MPritchard/Repairs Projects/May 2017/Tenant Info/R data 1.xlsx")
Mark Amery
  • 143,130
  • 81
  • 406
  • 459
MLPNPC
  • 454
  • 5
  • 18
  • Can't reproduce your error.... whether using `read_excel` from `readxl` or `read.xlsx` from `xlsx`. With the first option, my column containing `"P1", "123", "P1235", "2352154"` is read with type `chr` and with the second one as `factor`. By providing some example data it would be easier to help you. – wolf_wue May 11 '17 at 08:22
  • Also see the `col_types` argument... – Axeman May 11 '17 at 08:24
  • @wolf_wue are you reading this in as the only column? I have multiple columns of different variation some char some numbers some a mix, my problem is that the mixed ones don't read in values with Characters in them? – MLPNPC May 11 '17 at 08:44

2 Answers2

0

Would also recommend to use the col_types argument, by specifying it as "text" you should avoid getting NAs introduced by coercion. So your code would be like:

tenant<-read_excel("C:/Users/MPritchard/Repairs Projects/May 2017/Tenant Info/R data 1.xlsx", col_types = "text")

Please let me know if this solved your problem. Regards, /Michael

Michael
  • 16
  • 2
  • When I enter the code above I get the error: Error in read_xlsx_(path, sheet, col_names = col_names, col_types = col_types, : Need one name and type for each column – MLPNPC May 11 '17 at 08:47
  • That is strange. When provided only one value, the col_types argument is supposed to reiterate that over all columns. However, when given a vector, the length of that vector must match the number of columns. I assume you're also using version 1.0.0 of readxl? – Michael May 11 '17 at 09:12
  • Depending on the structure of your excel file this might work as a solution for your problem: http://stackoverflow.com/a/34015430/7932155 – Michael May 11 '17 at 09:21
0

Not really an answer but too much for a comment...

1:

> library(xlsx)
> tenant <- read.xlsx("returns.xlsx", sheetIndex = 1)
> str(tenant)
'data.frame':   9 obs. of  3 variables:
 $ only_integer: num  1 2 34 5 546931 ...
 $ int_char    : Factor w/ 9 levels "2545","2a","2d",..: 6 4 9 3 5 1 7 2 8
 $ only_char   : Factor w/ 6 levels "af","dd","e",..: 2 1 5 6 3 2 4 3 1

2:

> library(readxl)
> tenant2 <- read_excel("returns.xlsx")
> str(tenant2)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame':   9 obs. of  3 variables:
 $ only_integer: num  1 2 34 5 546931 ...
 $ int_char    : chr  "d5" "5" "ff2ad2f" "2d" ...
 $ only_char   : chr  "dd" "af" "h" "ha" ...

The column int_char is a mixture of both, starting/ending with numbers or characters

wolf_wue
  • 296
  • 1
  • 15
  • I can't really do this because I'd have to enter all the data by hand and I have 60,000+ observations. I might try and read it in separately and merge it on afterwards. Thanks for the suggestions though! – MLPNPC May 11 '17 at 09:07