0

I have two tables, one from teradata and one from odbc, which I want to join into a new table using a key value along with a filtering in Python using Visual Studio Code. The two tables can be explained as follows:

teradata The table from teradata is data collected from a survey where each row is a question in the survey, here the columns are custID, Question, Answer, StartDate and EndDate.

odbc The table from odbc is a table with data about the customer. Here are the columns CustomerID, Name, Age, Gender, Date etc. Each row in this table corresponds to a customer, but a customer can have several rows as it is updated with a new row when a change takes place, for example a change of address, change of name or age. The Date column is the date on which the information was updated.

Hence, CustID and CustomerID in the two respective tables are the same, ie the client number, but they are named differently. This is the key value that I want to use to join these two tables. So what I want to do is the following: Using the key values ​​in the table from teradata, I want to match these with the correct customer in the list from odbc, but since there are several rows for each customer in the table with data about the customers, I only want to use the row where the Date in odbc is between the StartDate and EndDate in teradata, i.e. teradata[StartDate]<odbc[Date]<teradata[EndDate]. Then I want to merge these into a new table so that I have the customer information as well as the survey responses in the same table. This is so that I can later create a report where I can, among other things, sort the answers based on age and/or gender.

Does anyone want to help me with this? Unfortunately, I can't share the real data as it is job-related, but roughly the layout is as follows:

odbc:

CustomerID Name Age Gender Date

225 Adam 25 M June 4 2021

225 Adam 26 M Sept 6 2022

223 Bill 33 F June 10 2020

221 Carl 46 M April 14 2021

teradata:

custID Question Answer StartDate EndDate

225 "A question" "The answer" July 4 2022 July 5 2022

etc

So, for example for customer 225 we want to use the information given in the row with date June 4 2021 since he was 25 at the time he answered the survey. Note: The dates will never overlap each other since they usually finish the survey within a couple of minutes.

rioV8
  • 24,506
  • 3
  • 32
  • 49
idlatva
  • 29
  • 7

1 Answers1

1

This looks like a join on inequality conditions. Assuming that any given survey start/end date will never overlap two different customer rows:

select t.*, o.name, o.age, o.gender, o.date
from teradata t 
inner join odbc o 
    on  o.customerId = t.custId
    and o.date >= t.start_date
    and o.date <  t.end_date
GMB
  • 216,147
  • 25
  • 84
  • 135
  • Thank you! But how do I do if the two tables are on different databases? I have imported one via `pyodbc.connect` and then the other one via `teradatasql.connect` to Python – idlatva Jun 13 '23 at 19:14
  • 1
    @idlatva: ahh, I did not get that from your question. Obviously if your data is spread like that, a single query cannot execute on both databases at once (unless you can set up some kind o db link between the two databases, but that's another topic...) You might need to implement the join logic in your application code instead. – GMB Jun 14 '23 at 21:52