
I'm trying to create a cluster on EC2. I have an AWS account set up and validated. I have downloaded and installed the segue package and its dependencies, and set my AWS credentials. The problem starts when I try to create a cluster; I get the following:

> library(segue)
Loading required package: rJava
Loading required package: caTools
Loading required package: bitops
Segue did not find your AWS credentials. Please run the setCredentials() function.
> setCredentials('', '') #keys hidden

> myCluster <- createCluster(numInstances=5)
Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl,  : 
  com.amazonaws.AmazonClientException: Can't turn bucket name into a URI: Illegal character in authority at index 8: https://c:\users\backup~1\appdata\local\temp\rtmp4u0n8yqaaoducils-segue.s3.amazonaws.com

Any ideas?

screechOwl
  • What OS are you running? I don't think segue works on Windows yet. – Chase Nov 17 '11 at 05:47
  • I'm running Windows 7. Does that affect the local machine or just the cluster? In other words, can I run R on a local Windows machine and just use Linux instances on EC2? – screechOwl Nov 17 '11 at 06:04
  • @JD Long can tell us for certain, but when he mentioned it to me I assumed he was referring to the local machine not working on Windows due to PATH issues... which looks like what's going on above, perhaps. – Chase Nov 17 '11 at 06:10
  • That line in bold on the Segue site that says, "Segue runs on Mac or Linux. Not currently on Windows." is actually accurate :) The user has no ability to change the OS on the backend, so the local desktop environment is the only OS the user has control over. – JD Long Nov 17 '11 at 13:57
  • Someone want to slap that in an answer for some pts? – screechOwl Nov 17 '11 at 19:00

1 Answer


acesnap, I'm the author of Segue and I can say with confidence that the issue you're running into is that Segue has not been implemented to run on Windows. I suspect the cause is that Windows does funny things with file paths, temp files, and the like. The server side of Segue is always the Amazon Elastic MapReduce (EMR) service, which runs Linux, but temporary files are built on the client machine, so Segue must play nicely with the local operating system.
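To illustrate why the path in the error message is a problem, here is a minimal sketch. It is not Segue's actual code: `isValidBucketName` and `stripDirectory` are hypothetical helpers, and the naming rule is a simplified version of the S3 bucket naming constraints (lowercase letters, digits, dots and hyphens only). The string is the bucket name that appears in the error above, which suggests the full Windows temp path ended up in the name.

```r
# Hypothetical helper (not part of segue): simplified S3 bucket name check.
# 3-63 characters, lowercase letters, digits, dots and hyphens,
# starting and ending with a letter or digit.
isValidBucketName <- function(name) {
  grepl("^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$", name)
}

# Hypothetical helper: drop everything up to the last "/" or "\".
stripDirectory <- function(path) {
  sub("^.*[/\\\\]", "", path, perl = TRUE)
}

# Bucket name taken from the error message in the question.
winName <- "c:\\users\\backup~1\\appdata\\local\\temp\\rtmp4u0n8yqaaoducils-segue"

isValidBucketName(winName)                  # FALSE: contains ":", "\" and "~"
isValidBucketName(stripDirectory(winName))  # TRUE: "rtmp4u0n8yqaaoducils-segue"
```

This only shows why the request fails on Windows. It is not a patch for the package.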

There are several work-arounds I can think of:

  1. Set up Virtual Box on your local machine and get Ubuntu and R installed.

  2. Set up an EC2 machine and install R and Segue and then use that machine to fire off Segue jobs.

  3. Buy a Mac or install Linux on a desktop machine (kinda obvious, I guess)

Even though my desktop machines are Mac and Linux, I use #2 above frequently. It speeds up communication between the machine running Segue and the backend cluster, and it reduces the probability that the Segue main machine (the machine you submit jobs from) will lose connectivity to the EMR backend. This matters because if communication between Segue and the Amazon cloud is lost while a job is running, the job will still run on the cloud cluster but will have no way of returning results to the Segue main machine.
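Whichever work-around you pick, a small guard at the top of a script can make the failure more obvious than the Java exception above. This is only a sketch: `canRunSegueLocally` is a hypothetical helper, not a Segue function, and it assumes that any machine where base R's `.Platform$OS.type` is `"unix"` (Linux and Mac) is supported.

```r
# Hypothetical helper (not part of segue): TRUE on Unix-alikes (Linux, Mac),
# FALSE on Windows. The default comes from base R's .Platform list.
canRunSegueLocally <- function(osType = .Platform$OS.type) {
  identical(osType, "unix")
}

if (!canRunSegueLocally()) {
  message("Segue runs on Mac or Linux only; submit jobs from a Linux VM or an EC2 machine.")
}
```

The `osType` argument exists so the check can be exercised on any machine, for example `canRunSegueLocally("windows")`.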

Iterator
JD Long