
I have a function that reads an xdf file with rxXdfToDataFrame and uses a local variable in the rowSelection expression. If I don't pass transformEnvir=environment(), the variable is not found. My problem is that after the call that uses transformEnvir, I can no longer reliably access .GlobalEnv from inside the function. If I hardcode a number into rowSelection, I don't need transformEnvir and everything works correctly. I tried setting the environment, but I'm not sure I was doing it correctly.

The following code reproduces my problem:

envirtest = function()
{
   require(data.table)
   df = data.frame(x=1:10)
   selectnum = 5
   rxDataFrameToXdf(df, "testxdf.xdf")
   testdf = rxXdfToDataFrame("testxdf.xdf",rowSelection=(x==selectnum),transformEnvir=environment())
   testdt = setDT(testdf)
}

The error that occurs:

Error in envirtest() : could not find function "setDT"

However, if data.table::setDT() is used instead of setDT(), the function runs without error.

Edit: I forgot to mention that I had tried it without transformEnvir set and everything worked properly. Also, I changed tables() to setDT() to avoid possible confusion.

Andrie
user3747260
  • can you try with `require(data.table)` outside the function call – Silence Dogood Jun 18 '14 at 07:21
  • It doesn't change anything. Also, objects inside the workspace are not found. – user3747260 Jun 18 '14 at 07:25
  • see if you can access the workspace objects using `get('object_name')` – Silence Dogood Jun 18 '14 at 07:44
  • Ah, so `get('object_name')` doesn't work, but I just found out that `get('object_name', envir=globalenv())` works. – user3747260 Jun 18 '14 at 07:50
  • I don't think this has anything to do with your xdf functions. The same behaviour occurs when you remove all RevoScaleR functions. I can't work on this now, but will take a look again tomorrow morning. – Andrie Jun 18 '14 at 20:47
  • I have tried it without `transformEnvir` and there is no error, whereas with `transformEnvir` there's an error. tables() was probably a poor choice for an example as it leads to an object not being found, but I was trying to demonstrate that the function wasn't being found. – user3747260 Jun 19 '14 at 00:39
  • You are correct. I have traced the problem and posted an answer. I will also contact our support team with this information, so at the very least we have better information in the manuals. – Andrie Jun 19 '14 at 07:17
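As a hedged illustration of the two workarounds found above (data.table::setDT() and get('object_name', envir=globalenv())), here is a plain-R sketch of how name lookup works. It makes no claim about what RevoScaleR does internally. R resolves an unqualified name by walking an environment's chain of enclosures, so a lookup fails for both objects and functions whenever that chain no longer reaches .GlobalEnv and the attached packages. An explicit envir= target or a pkg:: prefix bypasses the chain. The environment `isolated` and the object `object_name` are hypothetical.

```r
# Plain-R illustration of name lookup; no RevoScaleR involved.
# 'isolated' is a hypothetical environment whose enclosure is the empty
# environment, so walking its parent chain never reaches .GlobalEnv or
# any attached package.
object_name <- "found in .GlobalEnv"
isolated <- new.env(parent = emptyenv())

# Ordinary lookup through the chain fails for user objects and functions alike
lookup_failed <- !exists("object_name", envir = isolated)
fun_failed <- !exists("paste", envir = isolated)

# Explicit targets bypass the chain, mirroring the workarounds in the comments
via_globalenv <- get("object_name", envir = globalenv())
via_namespace <- base::paste("still", "works")

print(c(lookup_failed, fun_failed))  # TRUE TRUE
print(via_globalenv)
print(via_namespace)
```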

1 Answer


Here is a solution to your problem, together with a partial explanation:

  • When the transformation completes, the transformation environment gets cleared.
  • This means it is safer to create a dedicated environment and add any objects the transformation needs to it before calling the rx-function, rather than passing the function's own environment.
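To make the first point concrete, here is a minimal sketch of why a throwaway environment is the safer choice. `fake_rx_call` is a hypothetical stand-in, not RevoScaleR code: it only mimics the reported side effect of clearing whatever environment it receives. Passing environment() exposes the function's own local variables to that clearing, while a fresh new.env() confines it.

```r
# Hypothetical stand-in for an rx-function that empties the environment
# passed as transformEnvir once it has finished (mimics the reported
# behaviour only; this is NOT how RevoScaleR is implemented).
fake_rx_call <- function(transformEnvir) {
  result <- get("selectnum", envir = transformEnvir)
  rm(list = ls(transformEnvir, all.names = TRUE), envir = transformEnvir)
  result
}

with_function_env <- function() {
  df <- data.frame(x = 1:10)
  selectnum <- 5
  fake_rx_call(transformEnvir = environment())
  # The function's own locals have been wiped
  c(exists("df", envir = environment(), inherits = FALSE),
    exists("selectnum", envir = environment(), inherits = FALSE))
}

with_dedicated_env <- function() {
  df <- data.frame(x = 1:10)
  env <- new.env()
  env$selectnum <- 5
  fake_rx_call(transformEnvir = env)
  # Only the throwaway environment was emptied; df survives
  c(exists("df", envir = environment(), inherits = FALSE),
    exists("selectnum", envir = env, inherits = FALSE))
}

print(with_function_env())   # FALSE FALSE
print(with_dedicated_env())  # TRUE FALSE
```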

Concretely:

env <- new.env()
env$selectnum = 5

Set up your function like this:

envirtest = function()
{
  require(data.table)
  df = data.frame(x=1:10)
  env <- new.env()
  env$selectnum = 5

  rxDataFrameToXdf(df, "testxdf.xdf", overwrite=TRUE)
  testdf <- rxXdfToDataFrame("testxdf.xdf",
                             rowSelection=(x==selectnum),
                             transformEnvir=env
  )
  setDT(testdf)
}
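If the row selection needs several values, one option is to build the same kind of environment in a single call with base R's list2env(). The helper `make_transform_env` and the extra value `maxnum` are hypothetical, offered as a sketch rather than part of the answer or of RevoScaleR. Like new.env() inside envirtest(), it uses the caller's frame as the enclosure, so the selection expression can still reach base functions such as `==` through the parent chain.

```r
# Hypothetical helper (not part of RevoScaleR): copies the named values
# into a fresh environment whose enclosure is the caller's frame, matching
# what new.env() does by default when called inside envirtest().
make_transform_env <- function(...) {
  list2env(list(...), parent = parent.frame())
}

env <- make_transform_env(selectnum = 5, maxnum = 8)
print(sort(ls(env)))

# Inside envirtest(), the call could then read, for example:
# rxXdfToDataFrame("testxdf.xdf", rowSelection = (x == selectnum),
#                  transformEnvir = make_transform_env(selectnum = 5))
```

Whatever the rx-function clears, only this freshly created environment is affected.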

Now try it:

x <- envirtest()

Rows Read: 10, Total Rows Processed: 10, Total Chunk Time: 0.006 seconds 
Rows Processed: 1
Time to read data file: 0.00 secs.
Time to convert to data frame: less than .001 secs.

str(x)

Classes ‘data.table’ and 'data.frame':  1 obs. of  1 variable:
 $ x: int 5
 - attr(*, ".internal.selfref")=<externalptr> 
Andrie