I'm computing a discrete wavelet transform in R using the code below. My data, stored as a data.table, is fetched from a Hive table in chunks and converted into matrices, and the wavelet transform is then applied to each matrix as follows.
library(parallel)
library(wavelets)

# Compute the row-wise discrete wavelet transform of a matrix:
# for each row, stack the detail coefficients from every level (wt@W)
# and the final smooth coefficients (wt@V[[wt@level]]) into one row
# of the result.
createWt <- function(d_matrix) {
  wtScore <- NULL
  for (i in 1:nrow(d_matrix)) {
    a <- d_matrix[i, ]
    wt <- dwt(a, filter = "haar", boundary = "periodic")
    wtScore <- rbind(wtScore, unlist(c(wt@W, wt@V[[wt@level]])))
  }
  return(wtScore)
}

# Apply the function to a list of matrices in parallel using mclapply
wtScore <- parallel::mclapply(m_score, createWt, mc.cores = 28)
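For reference, a minimal self-contained call on toy data looks like this (the dimensions here are illustrative; my real matrices have 52 columns and millions of rows):

# Toy reproducible call; dimensions are illustrative only.
set.seed(1)
m_toy <- list(matrix(rnorm(4 * 16), nrow = 4),
              matrix(rnorm(4 * 16), nrow = 4))
res <- parallel::mclapply(m_toy, createWt, mc.cores = 2)
sapply(res, dim)  # each element: one row of stacked coefficients per input row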
The discrete wavelet transform takes about 3 hours 30 minutes for a dataset of 10 million rows x 52 columns on a Linux machine with 32 cores, of which I'm using 28 as noted above. But I have to do this on datasets of 30-35 million rows x 52 columns, and the run takes about 26 hours for 30 million rows. m_score above is a list of chunked matrices converted from the data.table.
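Schematically, the chunking looks like this (the chunk size and the name dt are placeholders, not my actual values):

# Schematic only: `dt` and chunk_size stand in for the real
# data.table and chunk size.
chunk_size <- 1e6
idx <- split(seq_len(nrow(dt)), ceiling(seq_len(nrow(dt)) / chunk_size))
m_score <- lapply(idx, function(i) as.matrix(dt[i, ]))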
Any ideas on doing this faster in R?
- I'm looking for specialized libraries in R or other languages, or a way to vectorize the transform itself (see the sketch after this list).
- Since the data comes from a Hive table, I'm also open to doing the transform with a Hive UDF, but I couldn't find a UDF for the wavelet transform.
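For concreteness, here is a rough sketch of what I mean by vectorizing: with the Haar filter, each decomposition level only combines adjacent pairs of samples, so a level can be computed for all rows of a matrix at once with plain matrix arithmetic instead of a per-row loop. This is my own untested sketch, not a verified replacement; the signs and coefficient ordering may not match wavelets::dwt, and it requires the column count to be divisible by 2^n.levels (e.g. n.levels = 2 for 52 columns).

# Sketch only: a fully vectorized Haar DWT over all rows of a matrix.
# Sign conventions and coefficient ordering may differ from wavelets::dwt.
haarStep <- function(X) {
  odd  <- X[, seq(1, ncol(X), by = 2), drop = FALSE]
  even <- X[, seq(2, ncol(X), by = 2), drop = FALSE]
  list(V = (odd + even) / sqrt(2),  # smooth (scaling) coefficients
       W = (odd - even) / sqrt(2))  # detail (wavelet) coefficients
}

haarRows <- function(X, n.levels = 2) {
  W <- vector("list", n.levels)
  for (j in seq_len(n.levels)) {
    s <- haarStep(X)
    W[[j]] <- s$W
    X <- s$V                        # recurse on the smooth part
  }
  cbind(do.call(cbind, W), X)       # details at all levels, then final smooth
}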