How to convert a factor type into a numeric type in R after reading a csv file?

Question

After reading a csv file

data<-read.table(paste0('C:/Users/data/','30092017ARB.csv'),header=TRUE, sep=";")

almost all numeric variables come out with type factor, especially the last column.

I tried all the suggestions here. However, every one of them gives me a warning:

Warning message:
NAs introduced by coercion

Someone even mentioned this in that post:

"Every answer in this post failed to generate results for me , NAs were getting generated."

Any idea how I can solve this problem?

Addendum: the following picture shows one possible approach suggested here.

However, I always get the same NA.

Can you look at which values are NA? It is possible that you are missing that data... — Adam Warner, Jun 19 '18 at 15:17
1) The percent sign is clearly the problem. Do `data[[3]] <- sub("%", "", data[[3]])` then convert to numeric. 2) When reading, in order to avoid problems with factors use argument `stringsAsFactors = FALSE`. — Rui Barradas, Jun 19 '18 at 15:26
3) Are your data coming from countries where the decimal point is a comma? If so, consider `read.csv2`. See `help("read.table")` for details. (`read.csv` and `read.csv2` are just `read.table` with some defaults changed.) — Rui Barradas, Jun 19 '18 at 15:29
@RuiBarradas Thanks a lot. You are essentially right: the problem is just the "%". Would you write your comment as an answer, so that I can accept it? — maniA, Jun 20 '18 at 08:50

score 0 · Accepted Answer · answered Jun 20 '18 at 09:52

The percent sign is clearly the problem. Replace the "%" by the empty string, "", and then convert to numeric.

data[[3]] <- sub("%", "", data[[3]]) 
data[[3]] <- as.numeric(data[[3]])

You can do this in one line of code:

data[[3]] <- as.numeric(sub("%", "", data[[3]]))
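As a minimal sketch of the conversion above, the following reproduces the problem on a small made-up table. The column names and values are hypothetical, not the asker's file, and `stringsAsFactors = TRUE` is set explicitly so that the third column arrives as a factor, as in the question.

```r
# Hypothetical data standing in for the asker's file: semicolon-separated,
# with a percent sign in the third column.
txt <- "id;name;share
1;a;12.5%
2;b;7%
3;c;100%"

# stringsAsFactors = TRUE reproduces the factor column from the question.
data <- read.table(text = txt, header = TRUE, sep = ";",
                   stringsAsFactors = TRUE)
class(data[[3]])   # "factor"

# Converting directly yields only NAs, because "12.5%" is not a number.
direct <- suppressWarnings(as.numeric(as.character(data[[3]])))

# sub() coerces the factor to character and strips the "%";
# the cleaned strings then convert without a warning.
data[[3]] <- as.numeric(sub("%", "", data[[3]], fixed = TRUE))
data[[3]]
```

Here `fixed = TRUE` is an optional addition that treats "%" as a literal string rather than a regular expression; the result is the same either way for this pattern.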

Also, two notes on reading the data in.

First, some files use the semicolon as a column separator. This is very common in countries where the decimal mark is a comma, since the comma is then not available as a separator. That is why R has two functions to read files in the CSV format.

These functions are both calls to read.table with some defaults changed.

read.csv - Sets arguments header = TRUE and sep = ",".
read.csv2 - Sets arguments header = TRUE, sep = ";" and dec = ",".

For a full explanation see read.table or at an R prompt run help("read.table").
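To illustrate the difference, here is a small sketch that reads the same made-up text (hypothetical values, semicolon-separated with decimal commas) once with `read.csv2` and once with `read.table` and the arguments spelled out.

```r
# Hypothetical file contents: semicolon separator, comma as decimal mark.
txt <- "id;price
1;3,14
2;2,50"

# read.csv2 assumes sep = ";" and dec = ",".
d1 <- read.csv2(text = txt)

# The same column read through read.table with the arguments given explicitly.
d2 <- read.table(text = txt, header = TRUE, sep = ";", dec = ",")

class(d1$price)   # "numeric"
```

Reading this text with `read.csv` instead would put everything into a single column, because no line contains the separator that function expects.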

Second, you can avoid factor problems if you use argument stringsAsFactors = FALSE from the start, when reading in the data.
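A short sketch of why factors cause trouble and how `stringsAsFactors = FALSE` sidesteps it; the values below are made up for illustration. Note that since R 4.0.0 `stringsAsFactors = FALSE` is the default for `read.table` and its wrappers, so the argument mainly matters on older versions.

```r
# Hypothetical values, chosen only to illustrate the pitfall.
f <- factor(c("30", "10", "20"))

# as.numeric() on a factor returns the internal level codes, not the values.
codes <- as.numeric(f)                  # 3 1 2

# Converting through character recovers the values.
values <- as.numeric(as.character(f))   # 30 10 20

# Reading with stringsAsFactors = FALSE avoids the factor step altogether:
# text columns arrive as character vectors.
txt <- "id;share
1;12.5%
2;7%"
data <- read.table(text = txt, header = TRUE, sep = ";",
                   stringsAsFactors = FALSE)
class(data$share)                       # "character"
```

The "%" still has to be removed before calling `as.numeric`; reading as character only removes the risk of getting level codes by accident.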
