1

Can I join two tables using custom condition rather than equal? Eg, I have two tables in BigQuery. Table A has 3 columns start_range, end_range and grade. Table B has data coming from Storage using cloud functions and has a particular column "marks". I am creating a flow in DataPrep such that join these tables which should satisfy the condition:

SELECT data.grade FROM  data  INNER JOIN student_data ON student_data.marks BETWEEN data.start_range AND data.end_range 

and then paste the data to a new table. But I could not find any provision for custom condition. Only two columns can be matched for join. Any idea?

hamedazhar
  • 990
  • 10
  • 26

1 Answers1

1

I think that your best chance would be to implement your solution in BigQuery:

SELECT
  grade, marks
FROM
  `project.dataset.student_data` a
JOIN
  `project.dataset.data` b
ON
  a.marks > b.start_range AND a.marks < b.end_range

If you insist on Dataprep, I don't believe there's any way to join on a range. If I understood your case correctly, what you have is something like this:

enter image description here enter image description here

...and you want to get As and Bs instead of the integer marks in student_data. It's probably not the most elegant solution, but you can use a new formula with a case, something like this:

  CASE([(marks >= 0) && (marks < 50),
    'D',
    (marks >= 50) && (marks < 100),
    'C',
    (marks >= 100) && (marks < 150),
    'B',
    (marks >= 150) && (marks <= 200),
    'A'])

You'd just need to click "New step" on the recipe, chose "New formula" and then add it under "Formula". It should look like so: enter image description here

Then you can even use "Lookup" to join any other useful data you might have in the data table using "grade" as the lookup key.

Lefteris S
  • 1,614
  • 1
  • 7
  • 14
  • But grade and marks are different, right? For example `grade` is in string like "A", "B+", etc whereas `marks` in integer like 100,85,72 etc. So how can be both considered as join keys? – hamedazhar Nov 06 '18 at 14:32
  • Ok, I think I see how I missed your point, I'm going to edit my answer now. – Lefteris S Nov 06 '18 at 16:50
  • Sorry, but this looks like a switch case right? But if there are number of ranges like more than 20-25, you are saying that all conditions should be included in the CASE statements? – hamedazhar Nov 07 '18 at 04:18
  • Let us assume there are more than 100 ranges! So what will you do? All "CASE"s cannot be written right? – hamedazhar Nov 07 '18 at 04:19
  • @DeviOS I added an alternative solution in BigQuery. – Lefteris S Nov 07 '18 at 10:15