61

I have a data frame read from a .csv file that contains numeric and character values. I want to convert this data frame into a matrix. All the remaining data are numbers (I deleted the non-numeric rows), so it should be possible to convert the data frame into a numeric matrix. However, I get a character matrix.

The only fix I have found is to apply as.numeric to each and every column, but this is quite time-consuming. I am quite sure there is a way to do this with some kind of for (i in 1:n) loop, but I cannot figure out how it might work. Or is the only way really to start with numeric values in the first place, as proposed here (Making matrix numeric and name orders)?

Probably this is a very easy thing for most of you :P

The matrix is a lot bigger, this is only the first few rows... Here's the code:

cbind(
as.numeric(SFI.Matrix[ ,1]),
as.numeric(SFI.Matrix[ ,2]),
as.numeric(SFI.Matrix[ ,3]),
as.numeric(SFI.Matrix[ ,4]),
as.numeric(SFI.Matrix[ ,5]),
as.numeric(SFI.Matrix[ ,6]))  

# to get something like this again:

Social.Assistance Danger.Poverty GINI S80S20 Low.Edu        Unemployment 
0.147             0.125          0.34    5.5   0.149        0.135 0.18683691
0.258             0.229          0.27    3.8   0.211        0.175 0.22329362
0.207             0.119          0.22    3.1   0.139        0.163 0.07170422
0.219             0.166          0.25    3.6   0.114        0.163 0.03638525
0.278             0.218          0.29    4.1   0.270        0.198 0.27407825
0.288             0.204          0.26    3.6   0.303        0.211 0.22372633

Thank you for any help!

PikkuKatja
  • Converting numerics-stored-as-strings back to numerics is trivial. Converting other strings to numerics is impossible (unless they're factors, in which case it's a terrible practice, statistically). As to factors, you didn't mention them, but converting factors to numeric is the only interesting part of this question. – smci Jul 27 '15 at 21:27

6 Answers

67

Edit 2: See @flodel's answer. Much better.

Try:

# assuming SFI is your data.frame
as.matrix(sapply(SFI, as.numeric))  

Edit: or, as @CarlWitthoft suggested in the comments:

matrix(as.numeric(unlist(SFI)),nrow=nrow(SFI))
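
For illustration, here is a small sketch comparing the two one-liners above. The data frame is a made-up stand-in for the asker's `SFI` (numbers stored as strings), so the column names and values are assumptions, not the real data.

```r
# Hypothetical stand-in for the asker's data: numbers stored as strings.
SFI <- data.frame(
  GINI   = c("0.34", "0.27", "0.22"),
  S80S20 = c("5.5", "3.8", "3.1"),
  stringsAsFactors = FALSE
)

# Convert column by column; sapply() binds the results into a matrix
# and keeps the column names.
m1 <- sapply(SFI, as.numeric)

# Flatten to one vector (column-major), convert once, then reshape.
# matrix() drops the names, so the column names are restored explicitly.
m2 <- matrix(as.numeric(unlist(SFI)), nrow = nrow(SFI),
             dimnames = list(NULL, names(SFI)))

is.numeric(m1)      # TRUE
all.equal(m1, m2)   # both routes should give the same matrix
```

Both forms assume the columns are character, not factor; on a factor column `as.numeric` would return the level codes instead of the printed values.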
Ricardo Saporta
  • yes, SFI was the data.frame, and yes, it solved the problem! Thank you! – PikkuKatja May 13 '13 at 09:22
  • Why not simply `matrix(as.numeric(unlist(SFI)),nr=nrow(SFI))` ? – Carl Witthoft May 13 '13 at 11:26
  • @CarlWitthoft, due to doubt of how the coercion of `unlist` would affect the final result, but you might be right in that regardless of the intermediate coercion, the final coercion from `as.numeric` should produce the same results. Answer updated – Ricardo Saporta May 13 '13 at 16:37
60
data.matrix(SFI)

From ?data.matrix:

Description:

 Return the matrix obtained by converting all the variables in a
 data frame to numeric mode and then binding them together as the
 columns of a matrix.  Factors and ordered factors are replaced by
 their internal codes.
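
The last sentence of that description is the part that tends to surprise people, so here is a minimal sketch of it using a made-up factor column. How `data.matrix()` treats plain character columns has, as far as I know, differed between R versions, so this example deliberately sticks to factors; check `?data.matrix` in your own installation for the character case.

```r
# Hypothetical column: numbers that were read in as a factor.
f  <- factor(c("0.34", "0.27", "0.22"))
df <- data.frame(GINI = f)

# Factors are replaced by their internal codes, as ?data.matrix says.
# The levels sort as "0.22" < "0.27" < "0.34", so the codes are 3, 2, 1.
codes <- data.matrix(df)[, "GINI"]
codes

# To recover the printed values, go through character first.
vals <- as.numeric(as.character(f))
vals   # 0.34 0.27 0.22
```

If the data frame might contain factors, converting them with `as.numeric(as.character(...))` before building the matrix avoids silently getting level codes.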
flodel
  • This will interpret "123" as a factor and convert it to the related integer level. – antonio Mar 17 '14 at 00:29
  • @antonio. What you say is not true. If the data.frame contains characters, they are converted to numerics, try: `data.matrix(data.frame(x = "123", stringsAsFactors = FALSE))`. It is only if the data.frame contains factors that they are represented by their internal value (as quoted above), try `data.matrix(data.frame(x = "123", stringsAsFactors = TRUE))`. So everything is behaving as I would expect and as documented. – flodel Mar 17 '14 at 00:55
  • Sorry, I meant that you don't get a number straight out of a string unless you use `stringsAsFactors` or `as.is` in `read.csv`. – antonio Mar 18 '14 at 19:26
  • data.matrix(as.data.frame(SFI,stringsAsFactors = F) ) – Zhilong Jia Feb 13 '15 at 00:07
  • One more subtlety: if all values were integer (or can be interpreted as such), the end result is an integer matrix, not a numeric matrix (which e.g. cannot be clustered using `hopach`, and `as.numeric` loses the dimensions again ...). I think in this respect the documentation is unclear in that 'numeric mode' also includes integers. And now that I think about it, it is weird that `as.numeric` always returns a double; that is not very consistent, since in all other contexts `numeric` means `integer-or-double` ... – plijnzaad Jul 05 '16 at 11:52
  • converting a data.table which has factor column types to a matrix with data.matrix will result in an integer matrix, not numeric. – Ahmadov Nov 17 '16 at 19:03
  • data.matrix() is slow compared to as.matrix() – user3503711 Mar 03 '20 at 22:49
  • @user3503711, OP mentions how his data converts to a *character* matrix, so `as.matrix` is not enough. It's the necessary conversion to numeric that slows things down. – flodel Mar 15 '20 at 14:38
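
On the integer-versus-double point raised in the comments above: one way to get a double matrix without losing the dimensions is to change the storage mode instead of calling `as.numeric`. A short sketch, using a made-up all-integer data frame:

```r
# Hypothetical all-integer data frame.
df <- data.frame(a = 1:3, b = 4:6)
m  <- data.matrix(df)
type_before <- typeof(m)
type_before             # "integer"

# as.numeric(m) would return a plain vector; changing the storage mode
# converts the values but keeps dim and dimnames.
storage.mode(m) <- "double"
typeof(m)               # "double"
dim(m)                  # 3 2
```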
10

Here is an alternative way if the data frame just contains numbers.

apply(as.matrix.noquote(SFI),2,as.numeric)

However, the most reliable way to convert a data frame to a matrix is the data.matrix() function.

TPArrow
0

Another way is to set the column types at import time with the colClasses argument of read.table(), i.e. colClasses=c(*column class types*). If there are 6 columns that you want as numeric, pass the string "numeric" six times (or use rep("numeric", 6)), import the data frame, and then call as.matrix() on it. P.S. It looks like you have headers, so I set header=T.

# SFI.matrix must be the path to the .csv file here, not an R object
as.matrix(read.table(SFI.matrix, header=TRUE,
colClasses=rep("numeric", 6),
sep=","))
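
A self-contained sketch of the same idea, reading from an inline string instead of a file so it can be run as is. The CSV content and the three column names are assumptions for illustration only.

```r
# Hypothetical CSV content standing in for the asker's file.
csv_text <- "GINI,S80S20,Low.Edu
0.34,5.5,0.149
0.27,3.8,0.211"

# rep() avoids typing "numeric" once per column.
SFI <- read.table(text = csv_text, header = TRUE, sep = ",",
                  colClasses = rep("numeric", 3))

m <- as.matrix(SFI)
is.numeric(m)   # TRUE
```

One design consequence worth knowing: with `colClasses` set to "numeric", a stray non-numeric entry in the file makes the import stop with an error rather than quietly turning the column into character, which is arguably what you want here.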
Capt.Krusty
0

I had the same problem and I solved it like this, by taking the original data frame without row names and adding them later

SFIo <- as.matrix(apply(SFI[,-1],2,as.numeric))
row.names(SFIo) <- SFI[,1]
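
A runnable sketch of this pattern on made-up data, assuming the first column holds the labels (the `Country` column and its values are hypothetical). It uses `sapply()` on the data frame columns rather than `apply()`, which is a substitution on my part: `apply()` first coerces the data frame to a character matrix, which `sapply()` avoids.

```r
# Hypothetical data: first column holds labels, the rest are numeric strings.
SFI <- data.frame(
  Country = c("A", "B", "C"),
  GINI    = c("0.34", "0.27", "0.22"),
  S80S20  = c("5.5", "3.8", "3.1"),
  stringsAsFactors = FALSE
)

# Drop the label column, convert the rest, then attach the labels as row names.
SFIo <- sapply(SFI[, -1, drop = FALSE], as.numeric)
rownames(SFIo) <- SFI$Country

SFIo["B", "GINI"]   # 0.27
```

Note that with a single-row data frame `sapply()` would return a plain vector rather than a matrix, so that edge case needs separate handling.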
-2

I manually filled NAs by exporting the CSV then editing it and reimporting, as below.

Perhaps one of you experts can explain why this procedure worked so well. The first file had columns of type char, int and num (floating-point numbers), which all became char after STEP 1, but at the end of STEP 3 R correctly recognized the data type of each column.

# STEP 1:
MainOptionFile <- read.csv("XLUopt_XLUstk_v3.csv",
                            header=T, stringsAsFactors=FALSE)
# STEP 2 (str_detect() and fixed() come from the stringr package; load it with library(stringr)):
TestFrame <- subset(MainOptionFile, str_detect(option_symbol, fixed("120616P00034000")))
write.csv(TestFrame, file = "TestFrame2.csv")
# ...
# STEP 3:
# I made various amendments to `TestFrame2.csv`, including replacing all missing data cells with appropriate numbers. I then read that amended data frame back into R as follows:    
XLU_34P_16Jun12 <- read.csv("TestFrame2_v2.csv",
                            header=T,stringsAsFactors=FALSE)

On arrival back in R, all columns had their correct measurement levels automatically recognized by R!

smci
  • You replaced missing data with numbers? How'd that analysis go? – Rich Scriven Sep 03 '14 at 00:06
  • The missing data were stock price quotes in two blocks of cells, Richard, so I supplied them manually. I am guessing that the key was R writing out the file at Step 2, which must have helped R interpret every column correctly when the file was read back in at Step 3. Anyway, it was a *big* file, so I was really happy to avoid having to describe data structures for individual columns. – user3315638 Sep 03 '14 at 02:26
  • @user3315638: exporting and reimporting was totally unnecessary, all you are doing is `sapply(df[,StringColsToChangeToNumeric], as.numeric)` – smci Jul 27 '15 at 21:35
  • @RichardScriven: in real-world datasets (financial, weblog etc.), filling or imputing NAs is not only important but necessary (obviously, caveats apply). Having said that, this export-CSV-edit-reimport is unnecessary and error-prone and can be replaced with the one-liner above. – smci Jul 27 '15 at 21:36
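
The in-place conversion suggested in the comments above can be sketched as follows. The data frame and column names are invented for illustration, and the sketch uses `lapply()` rather than `sapply()` so that the result can be assigned straight back into the data frame.

```r
# Hypothetical data frame in which every column was read as character.
df <- data.frame(
  symbol = c("XLU", "XLU"),
  strike = c("34", "35"),
  bid    = c("0.45", "0.80"),
  stringsAsFactors = FALSE
)

# Convert only the named columns; lapply() returns a list, so it can
# be assigned back into the same columns of the data frame.
num_cols <- c("strike", "bid")
df[num_cols] <- lapply(df[num_cols], as.numeric)

sapply(df, class)
```

As for why the export and re-import appeared to work: `read.csv()` guesses each column's type from the file contents when it parses them, so once the non-numeric cells had been edited out of the CSV, the affected columns were most likely read as numbers on their own.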