I'm trying to help a friend, a Director of Sales, make sense of his logged call data. He is interested in one column in particular, "Disposition". This column holds string values, and I'm trying to convert them to numeric values (e.g. "Not Answered" to 1, "Answered" to 2, etc.) and remove any row with no value entered. I've created data frames, used as.numeric, created and deleted columns and rows, etc., to no avail. I'm just trying to run some simple R code to give him some insight. Any and all help is much appreciated. Thanks in advance!

P.S. I'm unsure whether I should provide any code, because the data contains a lot of sensitive information (personal phone numbers and emails).

Chaz
  • I'd say you should look at `?factor`. Otherwise, build a lookup table (as a named vector). [this post](https://stackoverflow.com/questions/37239715/convert-letters-to-numbers/37239786) might be worth reading. Beyond this, please provide a reproducible example. Otherwise, it may be difficult to help. You might put together some dummy data if you can't share actual data. – lmo May 10 '18 at 22:57
  • Some packages you can use to make fake data: [generator](https://github.com/paulhendricks/generator), [wakefield](https://github.com/trinker/wakefield), [charlatan](https://github.com/ropensci/charlatan) – neilfws May 10 '18 at 23:30

2 Answers

First off: You should always provide representative sample data; if your data is sensitive in nature, provide mock-up data.

That aside, to recode a character vector as numeric you could convert to factor and then use as.numeric. For example:

# Sample data
column <- c("Not Answered", "Answered", "Something else", "Others")

# Convert character vector to factor
column <- factor(column, levels = unique(column))

# Convert to numeric
as.numeric(column)
#[1] 1 2 3 4

The numbering can be adjusted by changing the order of the factor levels.
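
To make that last point concrete, here is a minimal sketch that spells out the level order by hand instead of relying on the order of first appearance. The labels and their order are assumptions for illustration; substitute whatever values the real Disposition column contains.

```r
# Sample data with repeated values, so the mapping is visible
column <- c("Not Answered", "Answered", "Answered", "Others", "Not Answered")

# Assumed target order: the position in `lvls` becomes the numeric code
lvls <- c("Not Answered", "Answered", "Others")

# Convert to factor with explicit levels, then to numeric
codes <- as.numeric(factor(column, levels = lvls))
codes
#[1] 1 2 2 3 1
```

Note that any value not listed in the levels is turned into NA, so it is worth checking the result with anyNA() or table() on real data.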

Maurits Evers

Alternatively, you can create a new column and fill it with the numeric values using nested ifelse() calls. To illustrate, let's assume this is your data frame:

df <- data.frame(
  Disposition = rep(c("answer", "no answer", "whatever", NA), 3),
  Anything = rnorm(12)
)
df

   Disposition    Anything
1       answer  2.54721951
2    no answer  1.07409803
3     whatever  0.60482744
4         <NA>  2.08405038
5       answer  0.31799860
6    no answer -1.17558239
7     whatever  0.94206106
8         <NA>  0.45355501
9       answer  0.01787330
10   no answer -0.07629330
11    whatever  0.83109679
12        <NA> -0.06937357

Now define a new column, say df$Analysis, and assign numbers to it based on the values in df$Disposition. Any value other than "no answer" or "answer" is coded as 3, and NA stays NA:

df$Analysis <- ifelse(df$Disposition=="no answer", 1,
                      ifelse(df$Disposition=="answer", 2, 3))
df

   Disposition    Anything Analysis
1       answer  2.54721951        2
2    no answer  1.07409803        1
3     whatever  0.60482744        3
4         <NA>  2.08405038       NA
5       answer  0.31799860        2
6    no answer -1.17558239        1
7     whatever  0.94206106        3
8         <NA>  0.45355501       NA
9       answer  0.01787330        2
10   no answer -0.07629330        1
11    whatever  0.83109679        3
12        <NA> -0.06937357       NA
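
Nested ifelse() calls get hard to read once there are more than a few categories. As a possible alternative, along the lines of the lookup table suggested in the comments, a named vector can do the mapping. This is only a sketch: the name `lookup` and the codes assigned to each label are assumptions, and unlike the ifelse version above, labels missing from the lookup become NA rather than 3.

```r
# Hypothetical mapping; adjust the names to the labels actually in the data
lookup <- c("no answer" = 1, "answer" = 2, "whatever" = 3)

# Stand-in for df$Disposition
disposition <- c("answer", "no answer", "whatever", NA, "answer")

# Index the lookup by label; as.character() guards against a factor column,
# and unname() drops the labels that would otherwise be carried along
analysis <- unname(lookup[as.character(disposition)])
analysis
#[1]  2  1  3 NA  2
```

With the data frame above, the same idea would read df$Analysis <- unname(lookup[as.character(df$Disposition)]).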

The advantage of this method is that the original column stays unchanged. If you now want to remove the rows with NA values, use na.omit. NB: this removes every row that has an NA in any column, not only the rows with NA in df$Disposition:

df_clean <- na.omit(df)
df_clean

   Disposition    Anything Analysis
1       answer  2.5472195        2
2    no answer  1.0740980        1
3     whatever  0.6048274        3
5       answer  0.3179986        2
6    no answer -1.1755824        1
7     whatever  0.9420611        3
9       answer  0.0178733        2
10   no answer -0.0762933        1
11    whatever  0.8310968        3
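
If only missing dispositions should cause a row to be dropped, a logical filter on that one column is an option. The sketch below uses a small made-up data frame and also treats empty strings as "no value entered"; that second condition is an assumption about how blanks appear in the real export.

```r
# Made-up data: one empty string, one NA, and an NA in another column
df <- data.frame(
  Disposition = c("answer", "", NA, "no answer"),
  Anything    = c(0.5, 1.0, 1.2, NA),
  stringsAsFactors = FALSE
)

# Keep rows where Disposition is neither NA nor an empty string;
# an NA in another column (here: Anything) does not drop the row
keep <- !is.na(df$Disposition) & df$Disposition != ""
df_clean <- df[keep, ]
df_clean
```

Here rows 1 and 4 are kept, even though row 4 has an NA in Anything, whereas na.omit(df) would have dropped row 4 as well.
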
Chris Ruehlemann