I use R Markdown with the reticulate package to weave Python and R together. However, converting pandas DataFrames to R data frames doesn't appear to work consistently.

Here is a reproducible example:

---
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

```{python}
import pandas as pd
df = pd.DataFrame({'a':4, 'b':5, 'c':9}, index=[0])
print(df)
```

```{r}
library(reticulate)
df2 <- reticulate::py$df
print(df2)
print(reticulate::py$df)
```

Expected Result:
I expect a plain-text rendering of the data frame three times, as follows:

##    a  b  c
## 0  4  5  9

##    a  b  c
## 0  4  5  9

##    a  b  c
## 0  4  5  9

Actual Result:

import pandas as pd
df = pd.DataFrame({'a':4, 'b':5, 'c':9}, index=[0])
print(df)
##    a  b  c
## 0  4  5  9
library(reticulate)
df2 <- reticulate::py$df
print(df2)
##                                   a                                 b
## 1 <environment: 0x000000001dddb808> <environment: 0x000000001decdc58>
##                                   c
## 1 <environment: 0x000000001e000918>
print(reticulate::py$df)
##                                   a                                 b
## 1 <environment: 0x000000001e807f78> <environment: 0x000000001e8fd480>
##                                   c
## 1 <environment: 0x000000001e9ee608>

Note that the DataFrame prints correctly from Python. Once it is accessed from R, every column prints as an <environment: ...> reference instead of its values, as though the R data frame object were corrupt.

Here is my session information:

## R version 3.5.2 (2018-12-20)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 17134)
##  
## Matrix products: default
##  
## locale:
## [1] LC_COLLATE=English_United States.1252 
## [2] LC_CTYPE=English_United States.1252   
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.1252    
##  
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
##  
## other attached packages:
## [1] reticulate_1.10.0.9004
##  
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.0      lattice_0.20-38 digest_0.6.16   rprojroot_1.3-2
##  [5] grid_3.5.2      jsonlite_1.6    backports_1.1.2 magrittr_1.5   
##  [9] evaluate_0.11   stringi_1.1.7   Matrix_1.2-15   rmarkdown_1.10 
## [13] tools_3.5.2     stringr_1.3.1   yaml_2.2.0      compiler_3.5.2 
## [17] htmltools_0.3.6 knitr_1.20
John
  • A link to a post in the RStudio Community: https://community.rstudio.com/t/converting-pandas-dataframes-to-r-dataframe-using-reticulate-not-working-consistently/23858/5 – John Feb 14 '19 at 15:46
  • Did you ever figure out the root cause? I spent the last couple of hours looking everywhere. – ba_ul Apr 11 '19 at 22:34
  • Not sure what the root cause was, but I ended up doing a clean sweep: uninstalling and reinstalling everything (Anaconda, R, and RStudio version 1.2.1280). HTH – John Apr 13 '19 at 01:17

2 Answers

This works for me, but I do things in a slightly different order: I load the reticulate package early on, together with the other R packages.

I do the vast majority of the work in Python and then convert the result to R so I can use the DT package for a datatable view with Excel and CSV export buttons.

---
output:
    html_document:
        toc: false
        toc_depth: 1
---

```{r, loadPython, echo=F}
library(reticulate)
library(tidyverse)
library(DT)

```


```{python, echo=T}
# work with pandas DataFrame objects continues from earlier code (not shown)
predictions = ts.make_predictions(model,
                                  series + ' SARIMAX',
                                  start=len(train),
                                  end=len(train) + len(oos_exog) - 1,
                                  exog_data=oos_exog)

# make the OOS intervals
intervals = ts.get_oos_conf_interval(model=model,
                                     steps_ahead=short_horizon,
                                     exog_data=oos_exog)
# this is raw output
print(intervals)
```


```{r, echo=T}
# convert the pandas df object to R DF
r_df <- reticulate::py$intervals


# helper that renders a data frame as an interactive DT table with export buttons
makeTable <- function(df, end_col) {
    datatable(df,
              extensions = 'Buttons',
              options = list(dom = 'Bfrtip',
                             buttons = list("excel", "csv"))) %>%
        formatRound(columns = 1:end_col, digits = 0)
}
r_df

# output the table
makeTable(r_df, end_col=4)
```
Bryan Butler

I have noticed similar <environment: ...> columns when reading pandas data frames from R using reticulate. Transforming the affected column on the Python side, for example with df["x"].str.strip(), or resetting the index with df.reset_index(), helped in some cases.
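A minimal sketch of that kind of clean-up, to be run in a Python chunk before the data frame is read from R. It assumes pandas is installed. The helper name tidy_for_r and the sample data are made up for illustration, and drop=True is my own choice (plain df.reset_index() would keep the old index as a column). Whether this removes the <environment: ...> columns in a given setup is not guaranteed.

```python
import pandas as pd


def tidy_for_r(df):
    """Return a copy with a default integer index and stripped string cells."""
    # reset_index returns a new DataFrame; drop=True discards the old index.
    out = df.reset_index(drop=True)
    for col in out.columns:
        # True for object and string dtypes, False for numeric ones.
        if pd.api.types.is_string_dtype(out[col].dtype):
            # Strip only real strings; leave any other cell value untouched.
            out[col] = out[col].map(
                lambda v: v.strip() if isinstance(v, str) else v
            )
    return out


# Hypothetical sample data with a non-default index and padded strings.
raw = pd.DataFrame({"x": [" a ", "b  "], "n": [4, 5]}, index=[10, 20])
clean = tidy_for_r(raw)
print(clean)
print(clean.dtypes)
```

In the R chunk, the cleaned object would then be read as reticulate::py$clean instead of the original data frame.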

Note: the Python and R chunks you provided in your question return the expected output on my machine:

> print(reticulate::py$df)
  a b c
0 4 5 9

So these issues may be due to inconsistent versions of pandas, reticulate, or their dependencies.
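One way to compare setups is to record which package versions the Python side actually sees. The sketch below uses only the standard library (importlib.metadata, Python 3.8+). The function name installed_versions and the package list are illustrative assumptions, not something from the question.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Package names chosen for illustration; extend the list as needed.
report = installed_versions(["pandas", "numpy"])
for name, version in report.items():
    print(f"{name}: {version if version is not None else 'not installed'}")
```

Running this inside a {python} chunk reports on the interpreter reticulate is bound to. On the R side, reticulate::py_config() shows which Python installation that is, and sessionInfo() (as in the question) gives the reticulate version.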

Paul Rougieux