I want to run some Python functions on data stored in an '.RData' file. I am using the 'pyreadr' Python package to read it.
Here is an example of the R code:
library(data.table)
# Example
data <- data.table(x_num=c(1,1.5,2),
x_int=c(1,2,3))
data$x_int <- as.integer(data$x_int) # Making sure the data is in integer type
data_missing <- data.table(x_num=c(1.5,2,NA,5,6),
x_int=c(1,2,3,NA,5))
data_missing$x_int <- as.integer(data_missing$x_int) # Making sure the data is in integer type
# checking the classes
sapply(data,class)
sapply(data_missing,class)
# Storing the data in RData file
save(data, file = "test_data.RData")
save(data_missing, file = "test_missing_data.RData")
I am storing the two tables in separate files because 'test_data.RData' loads correctly in Python, whereas for 'test_missing_data.RData' the integer column that contains NA values is read as object rather than as an integer datatype.
Here is the Python code:
# Working example
import pyreadr
result=pyreadr.read_r('test_data.RData')
data=result['data']
data.dtypes
# Output
# x_num float64
# x_int int32
# Example where NA values are converted to object datatype
import pyreadr
result=pyreadr.read_r('test_missing_data.RData') # No error is raised, but x_int is read as object
data=result['data_missing']
data.dtypes
# Output
# x_num float64
# x_int object
No error message is shown, but I need the column to keep an integer datatype even when it contains missing (NA) values.
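To make the goal concrete, here is a minimal sketch of the result I am after. It assumes that pandas' nullable 'Int64' extension dtype would be acceptable, and it uses a hand-built DataFrame as a stand-in for result['data_missing'] (I am assuming the missing entries come back as NaN). It is a conversion applied after loading, not a pyreadr option.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for result['data_missing']:
# x_int is an object column because of the missing value (assumed to be NaN).
data = pd.DataFrame({
    "x_num": [1.5, 2.0, np.nan, 5.0, 6.0],
    "x_int": pd.Series([1, 2, 3, np.nan, 5], dtype="object"),
})

# Convert object -> float64 first, then to the nullable integer dtype 'Int64',
# which can hold missing values alongside integers.
data["x_int"] = pd.to_numeric(data["x_int"]).astype("Int64")

print(data.dtypes)
```

With this, the missing entry is kept as a missing value instead of forcing the whole column to object, but ideally I would like to get an integer column directly when reading the file.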
Thank you for your time and help.