
I found out that you can execute R within U-SQL. So I took an R script from one of our data scientists and built a U-SQL script based on this sample script.

The adapted script:

DECLARE @INPUT_DAT string = 
@"/Samples/Data/dat2json/validationData.dat.201805271617";
DECLARE @OUTPUT string = @"/Samples/Output/validationdata.out";

REFERENCE ASSEMBLY [ExtR];

DECLARE @myRScript = @"
datavector <- as.vector(readBin(@INPUT_DAT, "double", size = 4, n = 99000))
Size <- length(datavector)
numberOfPixels <- Size / 84
MaterialBase <- factor(rep(c("Plastic", "Aluminum"), each = (Size / 2)))
ThicknessBase <- factor(rep(c(rep(c(0, 10, 20, 30, 40, 50), times = 7), 
rep(c(0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0), each = 6)), each = numberOfPixels))
ThicknessIterated <- factor(rep(c(rep(c(0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0), 
each = 6), rep(c(0, 10, 20, 30, 40, 50), times = 7)), each = numberOfPixels))
Pixel <- rep(1:numberOfPixels, times = 84)
dflabel <- data.frame(MaterialBase, ThicknessBase, ThicknessIterated, Pixel, 
Value = datavector)
";

@RScriptOutput = REDUCE @myRScript USING new 
Extension.R.Reducer(command:@myRScript, rReturnType:"dataframe");
OUTPUT @RScriptOutput
TO @OUTPUT
USING Outputters.Tsv();

The problem is that when I build the code, Visual Studio stops on line 6, after @". IntelliSense also shows a red squiggle (~) indicating that something is wrong. The error it generates is: Expected one of: OPTION ';'
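The error position is consistent with how U-SQL parses verbatim strings: as in C#, an @"..." literal ends at the first unescaped double quote, so the string closes just before double in the readBin call and the parser then expects ';'. A literal double quote inside a verbatim string is written as two double quotes (""). A minimal sketch of the start of the script with only the quoting changed (the R lines are trimmed from the original; this does not address the REDUCE problems further down):

```usql
REFERENCE ASSEMBLY [ExtR];

// Inside a verbatim string @"...", every literal double quote is doubled ("").
// Without that, the first " closes the string and the compiler expects ';'.
// Note: U-SQL does not substitute variables inside string literals, so R
// receives the literal text @INPUT_DAT, not the value of the U-SQL variable.
DECLARE @myRScript string = @"
datavector <- as.vector(readBin(@INPUT_DAT, ""double"", size = 4, n = 99000))
Size <- length(datavector)
MaterialBase <- factor(rep(c(""Plastic"", ""Aluminum""), each = (Size / 2)))
";
```

The same caveat applies to moving the strings into U-SQL variables such as @vartype or @var1: inside the @"..." literal they are plain text to R.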

The R script works perfectly in RStudio.

Update 2018-07-19: I have narrowed it down a bit. The problem is the double quotes in the @myRScript variable. So I changed the code to the following:

DECLARE @INPUT_DAT string = 
@"/dat2json/data/validationData.dat.201805271617";
DECLARE @OUTPUT string = @"/dat2json/data/validationdata.out";

DECLARE @vartype string = "double";

DECLARE @var1 string = "Plastic";
DECLARE @var2 string = "Aluminum";

REFERENCE ASSEMBLY [ExtR];

DECLARE @myRScript string = @"
datavector <- as.vector(readBin(@INPUT_DAT, @vartype, size = 4, n = 99000))
Size <- length(datavector)
numberOfPixels <- Size / 84
MaterialBase <- factor(rep(c(@var1, @var2), each = (Size / 2)))
ThicknessBase <- factor(rep(c(rep(c(0, 10, 20, 30, 40, 50), times = 7), 
rep(c(0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0), each = 6)), each = numberOfPixels))
ThicknessIterated <- factor(rep(c(rep(c(0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0), 
each = 6), rep(c(0, 10, 20, 30, 40, 50), times = 7)), each = numberOfPixels))
Pixel <- rep(1:numberOfPixels, times = 84)
dflabel <- data.frame(MaterialBase, ThicknessBase, ThicknessIterated, Pixel, 
Value = datavector)
";

@RScriptOutput = REDUCE @myRScript ON MaterialBase USING new 
Extension.R.Reducer(command:@myRScript, rReturnType:"dataframe");
OUTPUT @RScriptOutput
TO @OUTPUT
USING Outputters.Tsv();

But now I get another error: E_CSC_USER_ROWSETVARIABLENOTFOUND: Rowset variable @myRScript was not found. Description: Rowset variables must be assigned to before they can be referenced. Resolution: Assign a rowset to the rowset variable or remove the reference.

It looks like I have to put the result of the R script into a variable and use that one in the REDUCE statement. But how do I do that?
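The error arises because REDUCE operates on a rowset, not on the script string. The script is only passed to the reducer, which (following the pattern in Microsoft's ExtR samples) hands each partition to R as a data frame named inputFromUSQL and expects a data frame back whose columns match the PRODUCE list, excluding READONLY columns. A sketch of that shape follows. It assumes, hypothetically, that the binary .dat file has already been converted to a tab-separated text file, because the built-in extractors read delimited text; the path validationData.tsv, the columns Par and Value, and the R summary are illustrative only.

```usql
REFERENCE ASSEMBLY [ExtR];

// Hypothetical text version of the data: one row per value, tab-separated.
DECLARE @INPUT_TSV string = @"/dat2json/data/validationData.tsv";
DECLARE @OUTPUT string = @"/dat2json/data/validationdata.out";

// R code: the current partition arrives as the data frame inputFromUSQL.
// Return only the non-READONLY columns listed in PRODUCE, with matching names.
DECLARE @myRScript string = @"
outputToUSQL <- data.frame(
  Pixels = nrow(inputFromUSQL),
  MeanValue = mean(inputFromUSQL$Value)
)
outputToUSQL
";

// Step 1: load the data into a rowset variable.
@input =
    EXTRACT Par int,
            Value double
    FROM @INPUT_TSV
    USING Extractors.Tsv();

// Step 2: REDUCE the rowset (not the script string), one R call per Par value.
@RScriptOutput =
    REDUCE @input
    ON Par
    PRODUCE Par int,
            Pixels int,
            MeanValue double
    READONLY Par
    USING new Extension.R.Reducer(command : @myRScript, rReturnType : "dataframe");

// Step 3: write the rowset produced by REDUCE.
OUTPUT @RScriptOutput
TO @OUTPUT
USING Outputters.Tsv();
```

Partitioning with ON Par means each group is processed independently in R, so any computation that needs all values at once would require a single partition key for the whole rowset.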

jbazelmans
I have come to the conclusion that what I want is not going to work. U-SQL needs structured input, and that's not what I have. So we need to go for another solution. – jbazelmans Jul 20 '18 at 06:59

1 Answer


Deploy the R script as a resource. The script file is deployed into the vertex workspace and is accessible from any custom code.

DECLARE @rScriptFile string = @"MyR2.R";
DECLARE @rScriptDeploy string = @"/rscripts/" + @rScriptFile;

DEPLOY RESOURCE @rScriptDeploy;

@inputQuery2 =
    REDUCE @inputQuery1
    ON Par
    PRODUCE Par,
            ...
    READONLY Par
    USING new Extension.R.Reducer(scriptFile : @rScriptFile, rReturnType : "dataframe");
Miguel Domingues