I am trying to build a custom ETL data pipeline with Amazon QLDB as the source, but I don't know how to read data from Amazon QLDB using Spark with Scala or Python.

The QLDB documentation lists the driver dependencies at the link below:

https://docs.aws.amazon.com/qldb/latest/developerguide/getting-started-driver.html

Can anyone help me, please? Thanks in advance.

Mohan Kumar
  • The link you shared contains a step-by-step guide on how to use QLDB with Java/Scala and Python. What answer do you expect to receive here? – nickolay.laptev Dec 23 '19 at 20:37
  • @nickolay.laptev Hi, I need to extract data from QLDB using Spark, but Spark does not provide a qldb format option for creating a DataFrame. I am also new to ETL, so I am a bit confused. – Mohan Kumar Dec 24 '19 at 05:21
  • Do you have any code that you had written up for this? That would help people to answer better. – Aurgho Bhattacharjee Mar 15 '20 at 04:37

1 Answer

From the QLDB Python documentation, here is how you would read data from QLDB:

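from pyqldb.driver.qldb_driver import QldbDriver

# "vehicle-registration" is a placeholder; use your own ledger name.
qldb_driver = QldbDriver(ledger_name="vehicle-registration")

def read_documents(transaction_executor):
    # Run a PartiQL query inside the transaction and iterate over the result cursor.
    cursor = transaction_executor.execute_statement("SELECT * FROM Person WHERE GovId = 'TOYENC486FH'")

    for doc in cursor:
        print(doc["GovId"])  # prints TOYENC486FH
        print(doc["FirstName"])  # prints Brent

qldb_driver.execute_lambda(lambda executor: read_documents(executor))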

Does this tell you what you need to do to read from QLDB?
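
Since Spark has no qldb format option (as noted in the comments), one possible approach is to read the documents on the Spark driver with the QLDB driver and then hand them to Spark as plain rows. The following is only a rough sketch under several assumptions: it needs the pyqldb and pyspark packages, the ledger name "vehicle-registration" is a placeholder, and the helper names to_plain, fetch_rows, to_dataframe and load_person_dataframe are made up for illustration. It also pulls the whole result set through a single process, so it is only reasonable for modest data volumes.

```python
from collections.abc import Mapping


def to_plain(value):
    """Recursively convert mapping/list/scalar subclasses into plain Python types.

    Assumption: the QLDB driver returns Ion values that subclass dict-like
    mappings, lists and Python scalars; anything else is passed through as-is.
    """
    if isinstance(value, Mapping):
        return {str(k): to_plain(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_plain(v) for v in value]
    for base in (bool, int, float, str):
        if isinstance(value, base):
            return base(value)
    return value


def fetch_rows(ledger_name, statement):
    """Run a PartiQL statement against QLDB and return the results as plain dicts."""
    from pyqldb.driver.qldb_driver import QldbDriver  # requires pyqldb

    driver = QldbDriver(ledger_name=ledger_name)
    # Materialize the cursor inside the transaction, while it is still valid.
    return driver.execute_lambda(
        lambda executor: [to_plain(doc) for doc in executor.execute_statement(statement)]
    )


def to_dataframe(spark, rows):
    """Build a Spark DataFrame from a list of plain dicts (schema is inferred)."""
    return spark.createDataFrame(rows)


def load_person_dataframe():
    """Hypothetical end-to-end usage; not called automatically."""
    from pyspark.sql import SparkSession  # requires pyspark

    spark = SparkSession.builder.appName("qldb-etl-sketch").getOrCreate()
    rows = fetch_rows("vehicle-registration", "SELECT * FROM Person")
    return to_dataframe(spark, rows)
```

Calling load_person_dataframe() would return a DataFrame that you can transform and write out like any other Spark source. For large tables, exporting the journal to S3 and reading the exported files with Spark may be a better fit than querying through the driver.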

alpian