Extract weights from an RWeka SMOreg model

Question

I am using the awesome RWeka package to fit a SMOreg model as implemented in Weka. Everything works fine, but I am having trouble extracting the weights from the fitted model.

Like all Weka classifier objects, my model has a nice print method that shows all the features and their relative weights. However, I cannot find any way to extract these weights.

You can see for yourself by running the following code:

library(RWeka)
data("mtcars")
SMOreg_classifier <- make_Weka_classifier("weka/classifiers/functions/SMOreg")
model_SMOreg <- SMOreg_classifier(mpg ~ ., data = mtcars)

Now, if you simply print the model:

model_SMOreg

you'll see that it prints all the features used in the model with their relative weights. I would like to access those weights as a vector or, even better, as a two-column table with one column containing the feature names and the other containing the weights.

I am working on a Windows 7 x64 system, using RStudio Version 1.0.153, R 3.4.2 Short Summer and RWeka 0.4-35.

Does anyone know how to do this?

score 1 · Answer 1 · answered Nov 08 '17 at 13:30

I don't think you can get this directly in numeric form.

attr(model_SMOreg, "meta")$class                      #  "Weka_classifier"

getAnywhere("print.Weka_classifier")

Result:

A single object matching ‘print.Weka_classifier’ was found
It was found in the following places
  registered S3 method for print from namespace RWeka
  namespace:RWeka
with value

function (x, ...) 
{
    writeLines(.jcall(x$classifier, "S", "toString"))
    invisible(x)
}
<bytecode: 0x8328630>
<environment: namespace:RWeka>

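So we see: print.Weka_classifier() uses rJava's .jcall() to invoke the toString method of the underlying Java classifier object, which returns a string, and passes that string to writeLines(). The weights are only exposed as printed text.

Thus, I think you need to parse the weights yourself, perhaps by capturing the printed text with the capture.output() function.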
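As a rough illustration of what "parsing it yourself" could look like, here is a minimal base R sketch. It assumes each weight line has the shape "sign, number, `*`, feature label" and that the intercept is a line with only a sign and a number, which matches the printed output as I understand it but may differ between Weka versions. The helper name `parse_smoreg_lines` is hypothetical (it is not part of RWeka), and the example lines are made up to mimic the layout; the numbers are not real model output.

```r
# Hypothetical helper (not part of RWeka): turn printed SMOreg lines
# into a data frame of feature labels and signed weights.
parse_smoreg_lines <- function(lines) {
  # Assumed layout of a weight line: "<sign> <number> * <feature label>".
  weight_pat <- "^\\s*([+-])\\s*([0-9.eE+-]+)\\s*\\*\\s*(.*\\S)\\s*$"
  # Assumed layout of the intercept line: "<sign> <number>" and nothing else.
  intercept_pat <- "^\\s*([+-])\\s*([0-9.eE+-]+)\\s*$"

  # Keep only the lines that matched (full match plus capture groups).
  w <- regmatches(lines, regexec(weight_pat, lines))
  w <- w[lengths(w) == 4]
  i <- regmatches(lines, regexec(intercept_pat, lines))
  i <- i[lengths(i) == 3]

  # Combine the captured sign (group 1) with the captured number (group 2).
  signed <- function(m) {
    sgn <- ifelse(vapply(m, `[`, "", 2) == "-", -1, 1)
    sgn * as.numeric(vapply(m, `[`, "", 3))
  }

  data.frame(
    feature = c(vapply(w, `[`, "", 4), rep("intercept", length(i))),
    weight  = c(signed(w), signed(i)),
    stringsAsFactors = FALSE
  )
}

# Made-up lines that only mimic the printed layout (not real output).
example_lines <- c(
  "SMOreg",
  "",
  "weights (not support vectors):",
  " -       0.25 * (normalized) cyl",
  " +       0.5 * (normalized) wt",
  " +       0.75",
  "",
  "Number of kernel evaluations: 10"
)
parsed <- parse_smoreg_lines(example_lines)
```

With a real model, the input lines could come from capture.output(model_SMOreg), or from splitting the string returned by the same .jcall(x$classifier, "S", "toString") call that the print method uses. Matching on the line shape rather than on fixed line positions is a design choice meant to make the parser less sensitive to extra header or trailer lines.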

Thanks! I didn't know the `getAnywhere` command nor the `capture.output` and I see how useful they can be. — fednem, Nov 08 '17 at 16:23

score 0 · Accepted Answer · answered Nov 09 '17 at 13:09

Based on the suggestion of @knb, I have written a function that extracts the weights from a SMOreg model and returns a tibble with one column for the feature names and one for the feature weights, with the rows sorted by the absolute value of the weight in descending order.

Note that this function only works for the SMOreg classifier, as the output of other classifiers is slightly different in terms of layout. However, I think the function can be easily adapted for other classifiers.

library(stringr)
library(tidyverse)

extract_weights_from_SMOreg <- function(model) {

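  # Silence the "NAs introduced by coercion" warnings raised by as.numeric()
  # below, and restore the previous setting even if the function fails early.
  oldw <- getOption("warn")
  options(warn = -1)
  on.exit(options(warn = oldw), add = TRUE)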


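  # Capture the printed model as a character vector, one element per line.
  raw_output <- capture.output(model)
  # Drop the 3 header lines and the last 5 lines (intercept plus trailer),
  # leaving only the per-feature weight lines.
  trimmed_output <- raw_output[-c(1:3,(length(raw_output) - 4): length(raw_output))]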
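  # One row per feature plus a final row for the intercept.
  # tibble() replaces the deprecated data_frame().
  df <- tibble(features_name = vector(length = length(trimmed_output) + 1, "character"), 
               features_weight = vector(length = length(trimmed_output) + 1, "numeric"))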

  for (line in seq_along(trimmed_output)) {


    string_as_vector <- trimmed_output[line] %>%
      str_split(string = ., pattern = " ") %>%
      unlist(.)


    # Non-numeric tokens (blanks, sign, "*", feature label) become NA here.
    numeric_element <- as.numeric(string_as_vector)

    position_mul <- string_as_vector[is.na(numeric_element)] %>%
      str_detect(string = ., pattern = "[*]") %>%
      which(.)

    numeric_element <- numeric_element %>%
      `[`(., c(1:position_mul))

    text_element <- string_as_vector[is.na(numeric_element)]


    there_is_plus <- string_as_vector[is.na(numeric_element)] %>%
      str_detect(string = ., pattern = "[+]") %>%
      sum(.)

    if (there_is_plus) { sign_is <- "+"} else { sign_is <- "-"}



    feature_weight <- numeric_element[!is.na(numeric_element)]

    if (sign_is == "-") {df[line, "features_weight"] <- feature_weight * -1} else {df[line, "features_weight"] <- numeric_element[!(is.na(numeric_element))]}

    df[line, "features_name"] <- paste(text_element[(position_mul + 1): length(text_element)], collapse = " ")

  }

  intercept_line <- raw_output[length(raw_output) - 4]


  there_is_plus_intercept <- intercept_line %>%
    str_detect(string = ., pattern = "[+]") %>%
    sum(.)

  if (there_is_plus_intercept) { intercept_sign_is <- "+"} else { intercept_sign_is <- "-"}

  numeric_intercept <- intercept_line %>%
    str_split(string = ., pattern = " ") %>%
    unlist(.) %>%
    as.numeric(.) %>%
    `[`(., length(.))

  df[nrow(df), "features_name"] <- "intercept"

  if (intercept_sign_is == "-") {df[nrow(df), "features_weight"] <- numeric_intercept * -1} else {df[nrow(df), "features_weight"] <- numeric_intercept}

  options(warn = oldw)

  df <- df %>%
    arrange(desc(abs(features_weight)))

  return(df)
}

Here is an example for one model:

library(RWeka)
data("mtcars")
SMOreg_classifier <- make_Weka_classifier("weka/classifiers/functions/SMOreg")
mpg_model_weights <- extract_weights_from_SMOreg(SMOreg_classifier(data = mtcars, mpg ~ .))
mpg_model_weights
