
After reading a CSV file with

data <- read.table(paste0('C:/Users/data/', '30092017ARB.csv'), header = TRUE, sep = ";")

I get factor as the type for almost all of the numeric variables, especially the last column.

I tried all the suggestions here; however, each of them gives me a warning:

Warning message:
NAs introduced by coercion 
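A minimal reproduction of that warning, assuming the affected column holds text such as "12%" (the values below are hypothetical). Converting a factor through as.character() first ends up in the same place, because the labels themselves still contain the "%":

```r
# Hypothetical values: numbers stored as text with a trailing percent sign.
pct <- c("12%", "7.5%", "30%")

# as.numeric() cannot parse "12%", so every element becomes NA and R signals
# the warning "NAs introduced by coercion". The handler prints the warning
# and lets the conversion continue.
res <- withCallingHandlers(
  as.numeric(pct),
  warning = function(w) {
    message("Caught warning: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
print(res)  # NA NA NA
```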

Someone in that post even mentioned:

"Every answer in this post failed to generate results for me , NAs were getting generated."

Any idea how I can solve this problem?

Addendum: the following picture shows one possible approach suggested here:

[Screenshot of the suggested approach]

However, I always get the same NAs.

maniA
  • Can you look at which ones are NA? It is possible that you are missing that data... – Adam Warner Jun 19 '18 at 15:17
    1) The percent sign is clearly the problem. Do `data[[3]] <- sub("%", "", data[[3]])` then convert to numeric. 2) When reading, in order to avoid problems with factors use argument `stringsAsFactors = FALSE`. – Rui Barradas Jun 19 '18 at 15:26
    3) Are your data coming from a country where the decimal mark is a comma? If so, consider `read.csv2`. See `help("read.table")` for details. (`read.csv` and `read.csv2` are just `read.table` with some defaults changed.) – Rui Barradas Jun 19 '18 at 15:29
  • @RuiBarradas thanks a lot. You are basically right: the problem is just the "%". Would you post your comment as an answer so that I can accept it? – maniA Jun 20 '18 at 08:50
  • @maniA Done, glad it helped. – Rui Barradas Jun 20 '18 at 09:52
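Following the suggestion to look at which values become NA, here is a minimal sketch for listing the entries that fail to parse. The column contents are hypothetical, and a factor column would need as.character() applied first:

```r
# Hypothetical column: clean numbers, percent strings, and an empty cell.
col <- c("1.5", "20%", "3", "", "45%")

# Convert quietly, then keep the non-empty entries that still came out NA.
num <- suppressWarnings(as.numeric(col))
bad <- unique(col[is.na(num) & nzchar(col)])
print(bad)  # "20%" "45%"
```

Empty strings are left out because they are genuinely missing data rather than a formatting problem.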

1 Answer


The percent sign is clearly the problem. Replace the "%" by the empty string, "", and then convert to numeric.

data[[3]] <- sub("%", "", data[[3]]) 
data[[3]] <- as.numeric(data[[3]])

You can also do this in one line of code:

data[[3]] <- as.numeric(sub("%", "", data[[3]]))
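If the column was read in as a factor, sub() still works because it coerces its input to character. A minimal sketch wrapping the same idea in a helper; the name pct_to_num and the sample data are hypothetical, and fixed = TRUE simply makes sub() treat "%" as a literal string rather than a regular expression:

```r
# Hypothetical helper: drop the "%" and convert to numeric.
# as.character() makes it work for both factor and character columns.
pct_to_num <- function(x) {
  as.numeric(sub("%", "", as.character(x), fixed = TRUE))
}

# Stand-in for the CSV data (values assumed), with percentages as a factor.
data <- data.frame(id = 1:3,
                   name = c("a", "b", "c"),
                   share = factor(c("12%", "7.5%", "30%")))

data[[3]] <- pct_to_num(data[[3]])
str(data)
```

To convert several percentage columns at once, the helper can be applied with lapply(), e.g. data[cols] <- lapply(data[cols], pct_to_num), where cols is a hypothetical vector of column positions.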

Also, two notes on reading the data in.

First, some files use the semicolon as the column separator. This is very common in countries where the decimal mark is a comma. That is why R has two functions for reading files in the CSV format.

These functions are both calls to read.table with some defaults changed.

  • read.csv - Sets arguments header = TRUE and sep = ",".
  • read.csv2 - Sets arguments header = TRUE, sep = ";" and dec = ",".
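A minimal sketch of the difference, using inline text in place of a file (the values are hypothetical); the text argument is passed on to read.table instead of a file path:

```r
# Hypothetical semicolon-separated data with a decimal comma.
txt <- "id;value\n1;2,5\n2;3,75"

# read.table with only sep = ";" keeps the default dec = ".", so "2,5"
# is not recognised as a number and the column stays non-numeric
# (character in R >= 4.0.0, factor in earlier versions).
wrong <- read.table(text = txt, header = TRUE, sep = ";")
class(wrong$value)

# read.csv2 uses sep = ";" and dec = ",", so the column is numeric.
right <- read.csv2(text = txt)
right$value  # 2.5 3.75
```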

For a full explanation, see the read.table documentation, or run help("read.table") at the R prompt.

Second, you can avoid factor problems by passing the argument stringsAsFactors = FALSE from the start, when reading in the data. (Since R 4.0.0 this is already the default.)
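A minimal sketch of why factors cause trouble and how reading with stringsAsFactors = FALSE sidesteps it; the data are hypothetical. Calling as.numeric() directly on a factor returns the internal level codes, not the numbers shown in the labels:

```r
# as.numeric() on a factor returns level codes. Levels are sorted as
# text ("10", "20", "5"), so the codes are 1, 3, 2 rather than 10, 5, 20.
f <- factor(c("10", "5", "20"))
codes <- as.numeric(f)
values <- as.numeric(as.character(f))  # 10 5 20

# Reading with stringsAsFactors = FALSE keeps text columns as character,
# so the "%" can be stripped directly.
txt <- "id;share\n1;12%\n2;30%"
df <- read.table(text = txt, header = TRUE, sep = ";",
                 stringsAsFactors = FALSE)
df$share <- as.numeric(sub("%", "", df$share, fixed = TRUE))
print(df)
```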

Rui Barradas