
I have been trying to connect Google Cloud Storage to an RStudio Server instance (one I spun up in Google Cloud) so that I can access files in R and run some analysis on them. I have found three different ways to do it on the web, but so far I haven't found much clarity on any of them.

  1. Access the file using its public URL [this is not an option for me].
  2. Mount the Google Cloud Storage bucket as a disk on the RStudio server and access it like any other files on the server [I saw someone post about this method but could not find any guides or materials showing how it's done].
  3. Use the googleCloudStorageR package to get full access to the Cloud Storage bucket.

Option 3 looks like the standard way to do it, but I get the following error when I run gcs_auth():

Error in gar_auto_auth(required_scopes, new_user = new_user, no_auto = no_auto, : Cannot authenticate - options(googleAuthR.scopes.selected) needs to be set to includehttps://www.googleapis.com/auth/devstorage.full_control or https://www.googleapis.com/auth/devstorage.read_write or https://www.googleapis.com/auth/cloud-platform

The guide for connecting this way is at https://github.com/cloudyr/googleCloudStorageR, but it says a service-auth.json file is required, along with environment variables and other keys and secret keys, and it does not really specify what these are.

If someone could help me understand how this is actually set up, or point me to a good guide on setting up the environment, I would be very grateful.

Thank you.

yabtzey
  • There seems to be another way to do this, using the bigQueryR package. Just install and load the "bigQueryR" package in R, then run bqr_auth(). This generates the authentication files that gcs_auth() can use to authorize as well. – yabtzey Sep 10 '18 at 19:11

2 Answers


Before using any Google Cloud services you have to attach a payment card to your account.
I am assuming you have already created an account. Go to the Console and create a project if you have not done so, then in the sidebar find APIs & Services > Credentials.
Then create two credentials:
1) Service account key: create it and save the file as JSON. You can only download it once.
2) OAuth 2.0 client ID: give the app a name, select Web application as the type, and download the JSON file.

Now for storage: in the sidebar, find Storage and click on it.
Create a bucket and give it a name.
I added a single image to my bucket; you can add one too to follow along with the code.

Let's look at how to download this image from storage. For other operations you can follow the link you have given.

First create an environment file named .Renviron in your working directory so the JSON files are picked up automatically when the session starts.

In the .Renviron file, add the two downloaded JSON files like this:
GCS_AUTH_FILE="serviceaccount.json"  
GAR_CLIENT_WEB_JSON="Oauthclient.json"

# R part
library(googleCloudStorageR)
library(googleAuthR)

gcs_auth()   # authenticate

# set the scopes
gar_set_client(scopes = c("https://www.googleapis.com/auth/devstorage.read_write",
                          "https://www.googleapis.com/auth/cloud-platform"))

gcs_get_bucket("your_bucket_name")    # name of the bucket you created
gcs_global_bucket("your_bucket_name") # set it as the global bucket
gcs_get_global_bucket()               # check the global bucket; should return your bucket name

objects <- gcs_list_objects()  # list the objects in the bucket
names(objects)
gcs_get_object(objects$name[[1]], saveToDisk = "abc.jpeg")   # save the first object to disk
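
The error in the question suggests that the option googleAuthR.scopes.selected must contain a Cloud Storage scope before authenticating. Here is a minimal sketch of doing that with only the service account key. Note that connect_gcs is a hypothetical helper name, not part of any package, and the json_file argument of gcs_auth() is an assumption that holds for recent googleCloudStorageR releases; older versions may differ.

```r
# Hypothetical helper (not part of googleCloudStorageR): set the scope option
# first, then authenticate with a service account key and pick a bucket.
connect_gcs <- function(key_file, bucket) {
  # Fail early, before touching the package, if the key file is missing.
  if (!file.exists(key_file)) {
    stop("Service account key file not found: ", key_file)
  }
  # One of the scopes named in the error message from the question.
  options(googleAuthR.scopes.selected =
            "https://www.googleapis.com/auth/devstorage.read_write")
  # Assumption: gcs_auth() accepts json_file (recent package versions).
  googleCloudStorageR::gcs_auth(json_file = key_file)
  googleCloudStorageR::gcs_global_bucket(bucket)
  invisible(bucket)
}
```

After a call such as connect_gcs("serviceaccount.json", "your_bucket_name"), gcs_list_objects() should work against the global bucket as in the code above.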



**Note:** if the JSON files are not picked up, restart the session using .rs.restartR()
and check them using
Sys.getenv("GCS_AUTH_FILE")
Sys.getenv("GAR_CLIENT_WEB_JSON")
# both should show the file names
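
GCS_AUTH_FILE is meant to point at the service account key and GAR_CLIENT_WEB_JSON at the OAuth web client file, and mixing up the two downloads is an easy mistake. Below is a rough check in base R, assuming the usual layout of these files: service account keys contain "type": "service_account", while OAuth client files have a top-level "web" or "installed" key. The name credential_kind is hypothetical, and the text match is a crude heuristic rather than a full JSON parse.

```r
# Hypothetical helper: guess which kind of Google credential a JSON file holds,
# using base R only (a crude text match, not a full JSON parse).
credential_kind <- function(path) {
  if (!nzchar(path) || !file.exists(path) || dir.exists(path)) return("missing")
  txt <- paste(readLines(path, warn = FALSE), collapse = " ")
  if (grepl('"type"[[:space:]]*:[[:space:]]*"service_account"', txt)) {
    return("service_account")
  }
  if (grepl('^[[:space:]]*\\{[[:space:]]*"web"[[:space:]]*:', txt)) {
    return("oauth_web_client")
  }
  if (grepl('^[[:space:]]*\\{[[:space:]]*"installed"[[:space:]]*:', txt)) {
    return("oauth_desktop_client")
  }
  "unknown"
}

# Report what each variable from .Renviron currently points at.
for (v in c("GCS_AUTH_FILE", "GAR_CLIENT_WEB_JSON")) {
  cat(v, "->", credential_kind(Sys.getenv(v)), "\n")
}
```

If a variable reports "missing", the path in .Renviron is probably wrong or the session has not been restarted since the file was edited.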
geekzeus
  • Thank you! This is much clearer. I was not able to tie all these steps together by looking at the available instructions. I will let you know how it goes. – yabtzey Aug 17 '18 at 18:54
  • Glad it helped you! :) – geekzeus Aug 18 '18 at 08:08
  • Thank you for that. I followed all of the code and everything ran fine except the "# set the scope" chunk: running gar_set_client(scopes = ...) returned this error: "Error: $installed not found in JSON - have you downloaded the correct JSON file? (Service account client > Desktop, not Service Account Keys)". I typically don't like asking a question by replying to comments, but I think I am very close to the answer, so it's not worth opening a new question. Thank you again! – AofWessex Oct 20 '20 at 23:54
  • Hi! I'm running into a similar issue: I followed the instructions up until gcs_auth() and I still get an authentication error (Error 400: invalid_request Missing required parameter: client_id) when the browser window opens, and I am unable to continue. Any idea what the client_id is? Thanks! – Ana May 31 '21 at 17:49

You probably want the FUSE adaptor. It will allow you to mount your GCS bucket as a directory on your server.

  1. Install gcsfuse on the R server.
  2. Create a mount directory.
  3. Run gcsfuse your-bucket /path/to/mnt

Be aware, though, that read/write performance isn't great via FUSE.

Full documentation

https://cloud.google.com/storage/docs/gcs-fuse
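
Once the bucket is mounted, its files can be read from R like any local file. Here is a small sketch, assuming the bucket holds CSV files and is mounted at /path/to/mnt as in step 3. The helper name read_mounted_csvs is hypothetical.

```r
# Hypothetical helper: read every CSV file under a directory (for example a
# gcsfuse mount point) into a named list of data frames.
read_mounted_csvs <- function(mount_dir) {
  files <- list.files(mount_dir, pattern = "\\.csv$", full.names = TRUE)
  # Each read.csv() call copies the file contents into R's memory.
  tables <- lapply(files, read.csv, stringsAsFactors = FALSE)
  names(tables) <- basename(files)
  tables
}
```

A call such as tables <- read_mounted_csvs("/path/to/mnt") pays the FUSE read cost once per file. After that the data frames live in the R session's memory, so later operations do not go back to the bucket.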

Dan
  • You probably want to add more detail; at the moment it is a "link only" answer. – zx8754 Aug 17 '18 at 15:24
  • Thanks!! I was able to mount the cloud storage on the RStudio server using FUSE and read the files! That was great, thanks again :) On the RW performance, you are saying there will be some delay when I read files from GCS into R, correct? I guess once I've pulled the file into R's memory (with something like read.file or read.csv), it's all up to the R server's memory how further operations perform. Let me know if I am wrong. – yabtzey Aug 17 '18 at 16:26