0

I want to connect R to Athena in AWS so that I can get a table from the database into R. So I went online and I googled how to do this. I found this website here. That told me that I need to install drivers. I have a mac (which is also new to me) and I found under the section mac on this website that I need to install homebrew which I did. I then followed these next steps in the terminal.

  1. Install UnixODBC, which is required for all databases

    brew install unixodbc

  2. Install common DB drivers (optional)

    brew install freetds --with-unixodbc

    brew install psqlodbc

I dont usually work in the terminal. So Im not too familiar with it. Anyways I thought that did it so I ran the following code.

con <- DBI::dbConnect(
  odbc::odbc(),
  Driver             = "FreeTDS",
  S3OutputLocation   = " etc..",
  AwsRegion          = "etc..",
  AuthenticationType = "...",
  Schema             = "...",
  UID                = rstudioapi::askForPassword("AWS Access Key"),
  PWD                = rstudioapi::askForPassword("AWS Secret Key")
  )

When I ran this code I got the following error:

Error: nanodbc/nanodbc.cpp:983: 00000: [unixODBC][Driver Manager]Can't open lib 'FreeTDS' : file not found 

Of course I googled the error and I found some interesting stuff on stack exchange. After playing around in the terminal though I got these responses:

sudo Rscript -e 'odbc::odbcListDrivers()'
[1] name      attribute value    
<0 Zeilen> (oder row.names mit Länge 0)

Showing zero rows and row.names with a length of 0.

I also ran this

cp /etc/odbcinst.ini ~/.odbcinst.ini && Rscript -e 'odbc::odbcListDrivers()

and I get this

cmdand quote> '
cp: /etc/odbcinst.ini: No such file or directory

I don't understand why this is the case because I completed steps one and two.

Nick
  • 369
  • 1
  • 3
  • 18
  • 1
    It looks like the second link for instructions you included is for connecting to traditional ODBC databases, such as Microsoft SQL Server, which is what FreeTDS is used to connect to. Have you considered trying the RAthena package which was made for this? https://cran.r-project.org/web/packages/RAthena/readme/README.html – FlipperPA Jul 25 '20 at 11:47
  • Thank you for sending the link to me. This will probably work better for me but I just had a question about the parameters that it shows here ```con <- dbConnect(RAthena::athena(), aws_access_key_id='YOUR_ACCESS_KEY_ID', aws_secret_access_key='YOUR_SECRET_ACCESS_KEY', s3_staging_dir='s3://path/to/query/bucket/', region_name='eu-west-1')``` Is an access key id simply my user log in details or is this something special that someone has to give me access to. – Nick Jul 31 '20 at 09:04
  • I also wanted to ask if you know as well where I can see the path to query bucket is. I see for example in Athena in setting there is a path for Query result location but I don't know if this is the same thing – Nick Jul 31 '20 at 09:12
  • An `ACCESS_KEY_ID` and `SECRET_ACCESS_KEY` are generated together in Amazon Web Services IAM. It is somewhat equivalent to a username and password for an API. You'll need to generate a pair and then give it the appropriate permissions. Unfortunately, this is the end of my knowledge here; I'm not a heavy R or AWS user. :) – FlipperPA Jul 31 '20 at 15:18

1 Answers1

1

This is to extend what @FlipperPA has mentioned earlier. The s3_staging_dir is the AWS S3 bucket where AWS Athena outputs it's results. By default RAthena tries to keep this tidy by removing adhoc queries results. However this can be stopped by using query caching (https://dyfanjones.github.io/RAthena/articles/aws_athena_query_caching.html).

If you want to get the AWS S3 path of a query you can do the following:

library(DBI)

# connect to AWS Athena, credentials are stored in:
# .aws/credentials or environmental variables
con <- dbConnect(RAthena::athena(), s3_staging_dir="s3://athena/output/")

# Start caching queries
RAthena_options(cache_size = 10)

# Execute a query on AWS Athena
res <- dbExecuteQuery(con,  "select * from sampledb.elb_logs")

# AWS S3 Location of query
sprintf("%s%s.csv",res@connection@info$s3_staging, res@info$QueryExecutionId)

Linked stackoverflow question: Can you use Athena ODBC/JDBC to return the S3 location of results?

Dyfan Jones
  • 229
  • 2
  • 9
  • Thanks for the explanation and showing how one can get the s3_staging_dir. I also just wanted to add for anyone reading that the s3_staging_dir can also be found in athena where you write SQL queries. If you click on settings it will show you the path right next to where it says "Query result location" – Nick Aug 24 '20 at 10:46