In SQL Server we can declare variables, e.g. declare @sparksql = '<any query/value/string>'.
What alternative can be used in Spark SQL, so that we don't need to hard-code any values/queries/strings?

5 Answers
Spark has supported variable substitution at least since version 2.1.x. It's controlled by the configuration option spark.sql.variable.substitute - in 3.0.x it's set to true by default (you can check it by executing SET spark.sql.variable.substitute).
With that option set to true, you can set a variable to a specific value with SET myVar=123, and then use it with the ${varName} syntax, like: select ${myVar} ...
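Putting those pieces together, a minimal end-to-end sketch (the variable name myVar and the value 123 are just placeholders):
SET myVar=123;
-- With substitution enabled, this is equivalent to: SELECT 123 AS myValue
SELECT ${myVar} AS myValue;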
On Databricks, the parser also recognizes that syntax and creates a field to populate the value, although it would be easier to use widgets from SQL as described in the documentation.
P.S. According to the code, besides variables themselves, it also supports reading values from environment variables and from Java system properties, like this:
select '${env:PATH}';
select '${system:java.home}';
P.S. This answer is about variables defined in Spark SQL itself. If you're looking to use variables defined in Python/Scala from Spark SQL, then please refer to this answer.

If you are using a Databricks notebook, then one easy way is to use Scala or Python to declare the variable and execute the SQL statement.
Here's a simple example in Scala:
val x = 1
// Scala string interpolation (the s"..." interpolator) embeds the variable into the SQL text
val df = spark.sql(s"select * from t where col1 = $x")
df.show()

- Yup, thanks for it. I am using Databricks, but mine is purely based on Spark SQL (only SQL queries are used), so any alternative for that would be great. I have tried the widget option, but found it still needed manual intervention. – Shrince Nov 27 '20 at 10:14
- It has nothing to do with Spark SQL but is a Scala feature called [string interpolation](https://docs.scala-lang.org/overviews/core/string-interpolation.html). Downvoting... – Jacek Laskowski Jul 25 '22 at 10:51
- Thanks @JacekLaskowski! Similar outcomes though, to what the OP wants? Which is why I posted it. – wBob Jul 25 '22 at 11:02
The following simple widget solution works well within Databricks Spark SQL (cluster running Spark 3.0.1 | Scala 2.12). Once you establish a widget, the Databricks cluster will list it at the top of the notebook and display its value. This comes in handy when you establish multiple widgets.
CREATE WIDGET TEXT tableName DEFAULT 'db.table'
SELECT * from $tableName
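If needed, the widget can also be referenced with the ${...} substitution syntax and dropped again from SQL; a minimal sketch, assuming the widget name above:
-- Reference the widget value (should behave like $tableName above)
SELECT * FROM ${tableName};
-- Drop the widget once it is no longer needed
REMOVE WIDGET tableName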

- Here is the documentation https://docs.databricks.com/notebooks/widgets.html#widgets-in-sql – Climbs_lika_Spyder Feb 28 '22 at 20:26
An example in a Databricks SQL notebook, using two commands:
Cmd 1
CREATE WIDGET TEXT myVariable DEFAULT "1234";
Cmd 2
SELECT 1234 value, 1234 = ${myVariable} comparison
UNION ALL
SELECT 4567, 4567 = ${myVariable}
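Because the substitution is purely textual, a string-valued widget has to be wrapped in quotes so it becomes a SQL string literal; a small hedged sketch (the widget, table and column names are placeholders):
CREATE WIDGET TEXT regionFilter DEFAULT "North";
-- Quote the substitution so the text value is treated as a string literal
SELECT * FROM sales WHERE region = '${regionFilter}'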

The short answer is no, Spark SQL does not currently support variables.
SQL Server uses T-SQL, which is based on the SQL standard, extended with procedural programming, local variables and other features.
Spark SQL is pure SQL, partially compatible with the SQL standard. Since Spark 3.0, Spark SQL has introduced two experimental options to comply with the SQL standard, but no variable support was introduced there.
https://spark.apache.org/docs/latest/sql-ref-ansi-compliance.html
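For reference, the two experimental options mentioned on that page can be toggled from SQL; a minimal sketch (they control casting and assignment behaviour, not variables):
-- Enable ANSI-compliant behaviour for casts and arithmetic overflow
SET spark.sql.ansi.enabled=true;
-- Control the type-coercion rules applied when inserting into a table
SET spark.sql.storeAssignmentPolicy=ANSI;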
