
I have a SQL connection to a table on my SQL Server, which I have defined as a data source with the following line:

master_table <- RxSqlServerData(etc...)

Then, my goal is to import this table using rxImport and save it to an .xdf file, whose path I have stored as readTest <- 'read_test.xdf'

The table is quite large, so I have set this in my rxImport:

rxImport(master_table, outFile = readTest, rowsPerRead = 100000, reportProgress = 1)

However, it has been running for 10 minutes now, and no progress on rows being read/imported is printed on the screen. Did I do this correctly? I wanted output similar to the progress that is printed when an ML algorithm such as RxForest is run.

Thanks.

1 Answer


It's possible that the connection to your SQL Server database is relatively slow. reportProgress will only show progress when a batch of rows is complete, so if the rows are relatively large, you could see nothing printed on the terminal for quite some time.

For best performance with rxImport(), set rowsPerRead to the largest size that your local machine's memory can handle. This will make progress reports less frequent, but it will give you a faster import time. The only case where this isn't true is when importing an XDF file.
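To make the trade-off concrete, here is a minimal sketch in R. It assumes, as described above, that progress is reported once per completed batch. The helper names expected_batches and import_with_progress, and the arguments conn_str and table_name, are hypothetical placeholders and not part of RevoScaleR or the question. As far as I understand the RevoScaleR documentation, reportProgress = 2 reports rows processed plus timings.

```r
# Hypothetical helper (not part of RevoScaleR): how many batches, and so
# roughly how many progress updates, to expect for a given rowsPerRead.
expected_batches <- function(n_rows, rows_per_read) {
  ceiling(n_rows / rows_per_read)
}

expected_batches(10000000, 10000)   # 1000 batches
expected_batches(10000000, 100000)  # 100 batches

# Sketch of the import itself. It only works where RevoScaleR is installed.
# conn_str and table_name are placeholders, not values from the question.
import_with_progress <- function(conn_str, table_name, out_file,
                                 rows_per_read = 100000) {
  stopifnot(requireNamespace("RevoScaleR", quietly = TRUE))
  src <- RevoScaleR::RxSqlServerData(connectionString = conn_str,
                                     table = table_name)
  RevoScaleR::rxImport(inData = src, outFile = out_file,
                       rowsPerRead = rows_per_read,
                       reportProgress = 2,  # rows processed plus timings
                       overwrite = TRUE)
}
```

A larger rows_per_read means fewer batches and therefore longer gaps between progress messages, which may be why nothing appears for a while on a slow connection.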

  • Hi, thanks for that. Even when I run something like rxSummary() on an xdf file with 10,000,000 rows and set rowsPerRead = 10000 and reportProgress = 1, it only ever prints the final total. For example, rxImport('sampleonemil.csv',outFile=readTest,rowsPerRead=10000,reportProgress=1,overwrite=T) prints "Rows Processed: 10000000". It doesn't show rows being processed in batches of 10,000 as I specified; it just does the whole thing. Is this correct behavior? – Dr. Ikjyot Singh Kohli Jun 15 '17 at 18:00
  • I have just tested an example like this locally, and when I do not specify reportProgress, I get a report per batch of rows. If I specify reportProgress = 1, I simply get output when rxImport() completes. I believe you should specify reportProgress = 4 (which is the default, if unspecified). The documentation could be clearer on this. – Kirill Glushko - Microsoft Jun 15 '17 at 18:42
  • I think I figured it out. I installed R-Server on my machine (the developer version) and the progress messages started to appear. R-Client seems to be extremely limited in terms of chunk processing (as the documentation suggests...). Were you running on R-Server as well? – Dr. Ikjyot Singh Kohli Jun 15 '17 at 19:33
  • Yes, I am running full Microsoft R Server 9.1 – Kirill Glushko - Microsoft Jun 15 '17 at 22:03
  • Okay. That seems to be the issue then. When you are running in SQL Server or R-Server the progress shows, but R-Client tries to load the whole dataset into memory (without breaking it into chunks), so that would explain it. Wish MS would make this clearer in the initial docs! :) – Dr. Ikjyot Singh Kohli Jun 15 '17 at 22:05