How to read data from Amazon QLDB using Spark and Scala/PySpark?

Question

I am trying to build a custom ETL data pipeline with Amazon QLDB as the source, but I don't know how to read data from Amazon QLDB using Spark with Scala or Python.

The QLDB documentation lists the driver dependencies at the link below:

https://docs.aws.amazon.com/qldb/latest/developerguide/getting-started-driver.html

Can anyone help me, please? Thanks in advance.

The link you shared contains a step-by-step guide on how to use QLDB with Java/Scala and Python. What answer do you expect to receive here? — nickolay.laptev, Dec 23 '19 at 20:37
@nickolay.laptev Hi, I need to extract data from QLDB using Spark, but Spark does not provide a QLDB format option for creating a DataFrame. I am also new to ETL, so I am a little confused. — Mohan Kumar, Dec 24 '19 at 05:21
Do you have any code that you have written for this? That would help people give a better answer. — Aurgho Bhattacharjee, Mar 15 '20 at 04:37

score 0 · Answer 1 · answered Jun 21 '21 at 22:05

From the QLDB Python documentation, here is how you would read data from QLDB:

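from pyqldb.driver.qldb_driver import QldbDriver

# The ledger name is a placeholder; use the name of your own ledger.
qldb_driver = QldbDriver(ledger_name="vehicle-registration")

def read_documents(transaction_executor):
    # Run a PartiQL query inside the transaction and iterate over the results.
    cursor = transaction_executor.execute_statement("SELECT * FROM Person WHERE GovId = 'TOYENC486FH'")

    for doc in cursor:
        print(doc["GovId"]) # prints TOYENC486FH
        print(doc["FirstName"]) # prints Brent

# The driver opens a session, runs the function in a transaction, and commits.
qldb_driver.execute_lambda(lambda executor: read_documents(executor))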

Does this tell you what you need to do to read from QLDB?
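Since Spark does not offer a `qldb` format (as noted in the comments), one possible approach is to query QLDB with the Python driver, convert the returned documents to plain Python values, and hand the resulting list to `spark.createDataFrame`. The sketch below is only an outline under some assumptions: `to_plain`, `fetch_rows`, and `to_dataframe` are hypothetical helper names, the ledger name and query are placeholders, and some Ion types (timestamps, nulls, booleans) may need extra handling beyond the fallback shown here.

```python
from collections.abc import Mapping
from decimal import Decimal


def to_plain(value):
    """Recursively convert a dict-like / list-like document to built-in types."""
    if isinstance(value, Mapping):
        return {str(k): to_plain(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_plain(v) for v in value]
    if value is None:
        return None
    if isinstance(value, bool):
        return bool(value)
    if isinstance(value, int):
        return int(value)
    if isinstance(value, float):
        return float(value)
    if isinstance(value, Decimal):
        return Decimal(value)
    # Fallback (an assumption): anything else is kept as its string form.
    return str(value)


def fetch_rows(driver, statement):
    """Run a query and return all result documents as a list of plain dicts."""
    def read(txn):
        # Materialise the results inside the transaction function.
        return [to_plain(doc) for doc in txn.execute_statement(statement)]
    return driver.execute_lambda(read)


def to_dataframe(spark, rows):
    """Build a DataFrame from the rows; the schema is inferred from the data."""
    return spark.createDataFrame(rows)


# Example wiring (requires pyqldb, pyspark, and AWS credentials):
#
# from pyqldb.driver.qldb_driver import QldbDriver
# from pyspark.sql import SparkSession
#
# driver = QldbDriver(ledger_name="my-ledger")  # placeholder ledger name
# spark = SparkSession.builder.appName("qldb-etl").getOrCreate()
# df = to_dataframe(spark, fetch_rows(driver, "SELECT * FROM Person"))
# df.show()
```

Note that every row passes through the process that runs the driver, so this suits modest result sets. Schema inference also needs at least one row, so an empty result would require an explicit schema.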
