
Where am I going wrong here? I am working in RStudio and want to process some text data in Python, then bring it back to R for the final analysis and plots, but I get this error:

NameError: name 'df_py' is not defined

Data & Code:

text <- c("Because I could not stop for Death -",
          "He kindly stopped for me -",
          "The Carriage held but just Ourselves -",
          "and Immortality")

ID <- c(1,2,3,4)

df <- data.frame(cbind(ID, text))

library(reticulate)

df_py <- r_to_py(df)

repl_python()

df_py_clean['text'] = df_py['text'].str.replace("[^a-zA-Z]", " ")
df_py_clean['text'] = df_py['text'].str.lower()
exit
user113156
  • Isn't `paste(text, collapse = " ")` what you want to do? – pogibas Jul 14 '19 at 15:21
  • Why do you need Python for this? R has `gsub`, `trimws()`, and `tolower()` for your needs. – Parfait Jul 14 '19 at 15:58
  • I know R has awesome cleaning capabilities. I will train a Word2Vec model on the text data, and there are a lot more tutorials and documentation for w2v models in Python than in R. I will bring the data back into R after training to use `ggplot` etc. I just used a simple text-cleaning example here since it was the first few lines of Python code I needed to run. – user113156 Jul 14 '19 at 16:02

Once in the Python REPL, use the `r.` prefix to access objects from the R session:

library(reticulate)
df_py <- r_to_py(df)
repl_python()
>>> r.df_py
#  ID                                    text
#0  1    Because I could not stop for Death -
#1  2              He kindly stopped for me -
#2  3  The Carriage held but just Ourselves -
#3  4                         and Immortality

Now apply the transformation to that object:

>>> r.df_py['text'] = r.df_py['text'].str.replace("[^a-zA-Z]", " ", regex=True)
>>> r.df_py['text'] = r.df_py['text'].str.lower()
>>> exit
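To check the cleaning step outside of reticulate, here is a standalone pandas sketch. It builds the same `text` column directly, as an assumed stand-in for `r.df_py`, and passes `regex=True` explicitly: since pandas 2.0, `Series.str.replace` matches the pattern literally by default, so without it the hyphens would survive and the output would not match the one shown in this answer.

```python
import pandas as pd

# Assumed stand-in for r.df_py: the same four lines built directly in pandas.
# Inside repl_python() you would operate on r.df_py instead.
df = pd.DataFrame({
    "ID": [1.0, 2.0, 3.0, 4.0],
    "text": [
        "Because I could not stop for Death -",
        "He kindly stopped for me -",
        "The Carriage held but just Ourselves -",
        "and Immortality",
    ],
})

# regex=True: treat "[^a-zA-Z]" as a regular expression and replace every
# non-letter character with a space.
df["text"] = df["text"].str.replace("[^a-zA-Z]", " ", regex=True)
df["text"] = df["text"].str.lower()
print(df["text"].tolist())

# With regex=False (the default since pandas 2.0) the pattern is matched
# literally, so nothing is replaced in this string.
literal = pd.Series(["Death -"]).str.replace("[^a-zA-Z]", " ", regex=False)
print(literal.tolist())
```

Note the trailing double spaces: the space before each `-` is kept and the `-` itself becomes a space, which matches the output printed from R.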

Back in R, print the object:

df_py
# ID                                    text
#0  1    because i could not stop for death  
#1  2              he kindly stopped for me  
#2  3  the carriage held but just ourselves  
#3  4                         and immortality

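NOTE: `df_py` exists in the R session, not in Python, so inside the Python REPL the bare name raises `NameError`; it has to be accessed as `r.df_py`. It is also not clear where the `df_py_clean` object was supposed to be created, so instead, here the column of the same object is updated from Python.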
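If a separate cleaned object is preferred, as the question intended, a minimal pandas sketch is to copy the frame first and then chain both string operations on the copy's column. In the question, the second line reads from `df_py['text']` again, which would discard the regex replacement; chaining avoids that. The frame here is built directly as an assumed stand-in; inside `repl_python()` the first step would be `df_py_clean = r.df_py.copy()`.

```python
import pandas as pd

# Assumed stand-in for r.df_py (two of the rows, built directly in pandas).
source = pd.DataFrame({
    "ID": [1.0, 2.0],
    "text": ["Because I could not stop for Death -", "and Immortality"],
})

# Work on a copy so the original object is left unchanged.
df_py_clean = source.copy()

# Chain both steps on the same column so the lowercasing keeps the
# replacement instead of starting again from the original text.
df_py_clean["text"] = (
    df_py_clean["text"]
    .str.replace("[^a-zA-Z]", " ", regex=True)
    .str.lower()
)
print(df_py_clean)
```

Because `df_py_clean` is created in the Python session, back in R it would be reached as `py$df_py_clean`.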

NOTE2: The reverse, accessing Python objects from the R environment, uses `py$`.

Data:

text <- c("Because I could not stop for Death -",
          "He kindly stopped for me -",
          "The Carriage held but just Ourselves -",
          "and Immortality")
ID <- c(1,2,3,4)
df <- data.frame(ID, text)
Community
akrun
  • Thanks for clarifying the `r.` notation! It was so hard to find the opposite of `py$` in the documentation - is this part of the `reticulate` package? – mfg3z0 May 25 '23 at 16:47
  • @mfg3z0 you can check [here](https://rstudio.github.io/reticulate/), i.e. the Python REPL section, where the `r.` prefix is mentioned. – akrun May 26 '23 at 06:16