
I want to query an SAP table from Databricks. I have installed the JDBC library for connecting to the SAP server. I am able to connect and fetch records using spark.read.jdbc(url=jdbcUrl, table=query, properties=connectionProperties). In the table parameter, I am providing the schema and table name. I want to fetch only selective records. Is there any way to provide a query?


1 Answer


You just need to use standard Spark functions such as .filter/.where - Spark will convert them into the corresponding SQL query and push it down to the database. This functionality (predicate pushdown) is enabled in Spark by default, as you can see in the documentation.

df = spark.read.jdbc(url=jdbcUrl, table=query,
                     properties=connectionProperties)
# Spark translates this predicate into a WHERE clause and pushes it
# down to the database, so only matching rows are transferred
filtered = df.filter("condition")
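If you want to push an explicit SQL query rather than a table name, Spark's JDBC reader also accepts a parenthesized subquery with an alias as the table parameter, so only the query's result set is read. A minimal sketch, with hypothetical schema, table, and column names (SAPSR3.MARA, MTART) standing in for your own:

# Hypothetical names for illustration; substitute your schema/table/columns
pushdown_query = "(SELECT MATNR, MTART FROM SAPSR3.MARA WHERE MTART = 'FERT') AS mara_subset"
df = spark.read.jdbc(url=jdbcUrl, table=pushdown_query,
                     properties=connectionProperties)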
  • Thanks for the response. But this way I am querying the dataframe; is there any way to query the table itself? The table contains a huge amount of data and I don't want to fetch all the records. – Aswad Nov 18 '21 at 05:56
  • Querying the dataframe is automatically translated into querying the table - this is the functionality called predicate pushdown. When you do .read, the table isn't read completely because Spark is lazy; you can apply filtering, etc., and the actual reading happens only when you trigger an "action". I recommend reading the first chapters of the Learning Spark book - it's freely available from the Databricks site. – Alex Ott Nov 18 '21 at 06:25
  • I have tried using a filter, but it's still taking too much time to load the data. A load that should take under a minute is taking around 30 minutes. – Aswad Nov 19 '21 at 12:53
  • What does `filtered.explain()` produce? – Alex Ott Nov 19 '21 at 13:19
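To check whether the filter is actually reaching the database, as the last comment suggests, inspect the physical plan:

# A pushed-down predicate appears in the JDBC scan node of the plan
# as a "PushedFilters: [...]" entry; if that list is empty, the
# condition is evaluated in Spark after the full table has been read
filtered.explain()

If the filter is pushed down but the load is still slow, the read may be limited to a single JDBC connection; the jdbc reader's column/lowerBound/upperBound/numPartitions arguments can split it into parallel partitioned reads.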