
I have a DataFrame userdf, created as:

val userdf = sparkSession.read.json(sparkContext.parallelize(Array("""[{"id" : 1,"name" : "user1"},{"id" : 2,"name" : "user2"}]""")))

scala> userdf.show
+---+-----+
| id| name|
+---+-----+
|  1|user1|
|  2|user2|
+---+-----+

I want to retrieve the user with id === 1, which I can do with code like:

scala> userdf.filter($"id"===1).show
+---+-----+
| id| name|
+---+-----+
|  1|user1|
+---+-----+

What I want to achieve is something like this:

val filter1 = $"id"===1
userdf.filter(filter1).show

These filters are fetched from configuration files, and I am trying to build a more complex query from these building blocks, something like

userdf.filter(filter1 OR filter2).filter(filter3).show 

where filter1, filter2, filter3, and the AND/OR conditions are all fetched from configuration.

Thanks


1 Answer


The filter method can also accept a string that is a SQL expression. This code should produce the same result:

userdf.filter("id = 1").show

So you can just get that string from your configuration.
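
For example, a minimal sketch of that (the Map here just stands in for whatever configuration source you actually use, e.g. a properties file or Typesafe Config):

// Placeholder for the real configuration source
val config = Map("userFilter" -> "id = 1")

// filter accepts a SQL expression string, so the configured value can be passed directly
userdf.filter(config("userFilter")).show
+---+-----+
| id| name|
+---+-----+
|  1|user1|
+---+-----+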

  • This solution will not work with multiple "and"/"or" conditions, i.e. userdf.filter($"name"==="user1" || $"id" === 1) works fine, but userdf.filter("id=2" || "id=1") does not work. https://stackoverflow.com/questions/35881152/multiple-conditions-for-filter-in-spark-data-frames – user811602 Oct 15 '18 at 08:04
  • As long as it is a valid SQL statement it should work: `userdf.filter("id=2 or id=1")` – lev Oct 15 '18 at 08:41
  • Thanks. The statement "as long as it is a valid SQL statement" is very helpful. – user811602 Oct 15 '18 at 09:02
  • @user811602 Any luck on resolving this issue? – JeffLL May 17 '19 at 00:11
  • @JeffLL I created a valid SQL string statement as lev commented. – user811602 May 17 '19 at 06:01
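
For completeness, a sketch of what the comments converge on: compose the configured fragments into a single valid SQL expression string and pass that to filter (the fragment values and connectors below are placeholders for whatever the configuration provides):

// Placeholder values, as they might be read from configuration
val filter1 = "id = 1"
val filter2 = "id = 2"
val filter3 = "name = 'user1'"

// Build one valid SQL expression string from the fragments...
val combined = s"($filter1 OR $filter2) AND $filter3"

// ...which is equivalent to the desired userdf.filter(filter1 OR filter2).filter(filter3)
userdf.filter(combined).show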