How can I import a matrix from R saved as RData to a pandas data frame without losing the column names of the R matrix?

For example, if I have saved this matrix in R:

A = matrix( 
     c(2, 4, 3, 1, 5, 7), # the data elements 
     nrow=2,              # number of rows 
     ncol=3,              # number of columns 
     byrow = TRUE)        # fill matrix by rows 

dimnames(A) = list( 
     c("row1", "row2"),         # row names 
     c("col1", "col2", "col3")) # column names 

A
save (A, file = 'matrix.RData')

outputs:

> A
     col1 col2 col3
row1    2    4    3
row2    1    5    7

I then load it in Python with rpy2 as follows:

from __future__ import print_function
from rpy2.robjects import pandas2ri,r
import rpy2.robjects as robjects

def main():
    pandas2ri.activate()
    r['load']('matrix.RData')
    variables = tuple(robjects.globalenv.keys())
    print('variables: {0}'.format(variables))
    matrix = robjects.globalenv['A']
    frame = pandas2ri.ri2py(matrix)
    print(frame)
    print('type(frame): {0}'.format(type(frame)))

if __name__ == "__main__":
    main()

which prints:

variables: ('A',)
[[ 2.  4.  3.]
 [ 1.  5.  7.]]
type(frame): <type 'numpy.ndarray'>

The matrix has lost its column names. I would like to keep them by loading the R matrix into a pandas data frame.

Franck Dernoncourt

2 Answers

1

The feather package saves data frames in a file format that both R and pandas can read, and it preserves column names.

In R:

library(feather)
write_feather(as.data.frame(A), 'path/df.feather')

In Python:

import pandas as pd
df = pd.read_feather('path/df.feather')

You can find more details here:

Deena
0

You could read the column names from the R matrix's colnames attribute and pass them to the pandas DataFrame constructor (tested with Python 2.7):

from __future__ import print_function
from rpy2.robjects import pandas2ri,r
import rpy2.robjects as robjects
import pandas as pd

def load_r_matrix_into_pandas_dataframe(r_matrix):
    '''
    Import a matrix from R saved as RData to a pandas data frame without losing the column names of the R matrix
    https://stackoverflow.com/q/45392308/395857
     - Input: R matrix object
     - Output: Pandas DataFrame
    '''
    numpy_matrix = pandas2ri.ri2py(r_matrix)
    frame_column_names = r_matrix.colnames
    frame = pd.DataFrame(data=numpy_matrix, columns=list(frame_column_names))
    return frame

def main():
    pandas2ri.activate()
    r['load']('matrix.RData')
    variables = tuple(robjects.globalenv.keys())
    print('variables: {0}'.format(variables))
    matrix = robjects.globalenv['A']

    frame = load_r_matrix_into_pandas_dataframe(matrix)
    print('frame: {0}'.format(frame))

if __name__ == "__main__":
    main()
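
The helper above keeps the column names but drops the row names. Below is a minimal sketch of the DataFrame-construction step that keeps both. It uses a plain NumPy array and Python lists as stand-ins for what rpy2 would return, so it runs without R. The function name build_frame and the stand-in data are hypothetical and not part of rpy2. With rpy2, the labels would presumably come from r_matrix.rownames and r_matrix.colnames.

```python
import numpy as np
import pandas as pd


def build_frame(values, row_names, column_names):
    """Build a DataFrame from a 2-D array plus explicit row and column labels."""
    return pd.DataFrame(
        data=values,
        index=list(row_names),        # row labels become the DataFrame index
        columns=list(column_names),   # column labels become the column names
    )


# Stand-in data mirroring the matrix A from the question (assumption: in real
# use these three values would be taken from the converted R matrix).
values = np.array([[2.0, 4.0, 3.0], [1.0, 5.0, 7.0]])
frame = build_frame(values, ["row1", "row2"], ["col1", "col2", "col3"])
print(frame)
```

Passing the labels explicitly means the result does not depend on whether the R-to-NumPy conversion carries the dimnames along.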
Franck Dernoncourt