I have a data frame with lots of NA values. My objective is to find the combination of rows and columns to keep that retains as much data as possible while minimizing the number of NA values. One solution I found is to use the ompr package and build a MIP model to solve the problem.
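For reference, here is a small made-up data frame with the same kind of structure (the real one is much bigger; the values are purely illustrative, since the model only cares about where the NAs are):

set.seed(42)
df <- as.data.frame(matrix(sample(c(NA, 1:5), 10 * 6, replace = TRUE), nrow = 10))
M  <- as.matrix(df)  # M is the matrix the model below works on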
Here's the model:
library(ompr)
library(dplyr)  # provides the %>% pipe used below

m <- +!is.na(M) # 0/1 matrix: 0 where M is NA, 1 otherwise
nr <- nrow(m)
nc <- ncol(m)
n_years <- 17
model <- MIPModel() %>%
# keep[i,j] is 1 if matrix cell [i,j] is to be kept else 0
add_variable(keep[i,j], i = 1:nr, j = 1:nc, type = "binary") %>%
# rm_row[i] is 1 if row i is selected for removal else 0
add_variable(rm_row[i], i = 1:nr, type = "binary") %>%
# rm_col[j] is 1 if column j is selected for removal else 0
add_variable(rm_col[j], j = 1:nc, type = "binary") %>%
# maximize good cells kept
set_objective(sum_expr(keep[i,j], i = 1:nr, j = 1:nc), "max") %>%
# cell can be kept only when row is not selected for removal
add_constraint(sum_expr(keep[i,j], j = 1:nc) <= nc * (1 - rm_row[i]), i = 1:nr) %>%
# cell can be kept only when column is not selected for removal
add_constraint(sum_expr(keep[i,j], i = 1:nr) <= nr * (1 - rm_col[j]), j = 1:nc) %>%
# only non-NA values can be kept
add_constraint(m[i,j] + rm_row[i] + rm_col[j] >= 1, i = 1:nr, j = 1:nc) %>%
# keep at most n columns i.e. remove at least (nc - n_years) columns
add_constraint(sum_expr(rm_col[j], j = 1:nc) >= nc - n_years)
model
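For completeness, this is roughly how I solve the model and drop the selected rows/columns afterwards. I use ompr.roi with the GLPK solver here; any ROI-supported solver should work the same way:

library(ompr.roi)
library(ROI.plugin.glpk)

result <- solve_model(model, with_ROI(solver = "glpk"))

# indices of rows/columns the solver chose to remove
rows_to_drop <- get_solution(result, rm_row[i]) %>% filter(value > 0.5) %>% pull(i)
cols_to_drop <- get_solution(result, rm_col[j]) %>% filter(value > 0.5) %>% pull(j)

# subset the original matrix, guarding against the case where nothing is dropped
M_reduced <- M[setdiff(1:nr, rows_to_drop), setdiff(1:nc, cols_to_drop), drop = FALSE]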
This model works just fine as long as the data is small, which is not the case for mine (3500 rows x 180 columns).
The model has a lot of constraints (the per-cell non-NA constraint alone contributes 3500 x 180 = 630,000 rows), which is why it takes too long to solve. Is there another way to formulate this model so it has fewer constraints and solves faster?