3

First time poster hoping to get some assistance.

Very minimal coding experience so jargon may be confusing.

I am attempting to use SQL in Zoho to clean up data.

Data consists of A) Transactional Data (premium, fees, net earnings) per policy B) Claims Data (incurred amounts)

Issue is with the unique identifier - same client may have multiple policy numbers with either A) or B) not stored together. What I have been using is the systems own client code (different to policy number) which stores all the policy numbers under the same client. Second issue is Claims data is not mapped out to this 'client code'. Excel index/match/ vlookup have been my go tos in the meantime and has worked fine, however we are moving to Zoho which functions via SQL.

e.g.

| Client Code | Policy Number  | Premium  | Claims |
| --------    | -------------- | -------- | ------ |
| C1          | 123            | 500      | 300    |
| C2          | 456            | 100      |        |
| C1          | 767            | 0        |        | <---
|             | 767            |          | 800    | <--- want these columns put all under C1

Question: How can I Fill in the bottom left blank as C1 using SQL, and then Group each of the clients (C1 & C2) with a Total Premium & Claims amount for them?

GOAL:

| Client Code |  Premium  | Claims |
| --------    | --------- | ------ | 
| C1          |  500      | 1100   |
| C2          |  100      |        |

I've thought of using a Self Join -

SELECT 
t1."Client Code", 
t1."Policy Number", 
t1."Premium", 
t1."Claims", 
t2."Client Code"
FROM table1 as t1
FULL OUTER JOIN 
   (SELECT 
   "Policy Number", 
   "Client Code" 
   FROM table1) t2 
ON t1."Policy Number" = t2."Policy Number"

which clearly does not work, not to mention when I try to include sums by premium, I start receiving group by clause error messages.

Any help would be appreciated.

result:

t1.Client Code t1.Policy Number t1.Premium t1.Claims t2.Client Code
C1 123 500 300 C1
C2 456 100 C2
C1 767 0 C1
C1 767 0
767 800 C1
767 800

Other factors to consider which I've excluded: year of policy, further lines of transaction data due to monthly/ annual payments etc.

  • Your text table and SQL is hard to read so please use mono formatting for the SQL. And fix up the table maybe it is markdown error. – Michael Currin Mar 12 '21 at 06:57
  • Where does the 300 go? That is, why are the claims 800 instead of 1100? – Gordon Linoff Mar 12 '21 at 13:33
  • Hi Gordon, apologies you are correct total is meant to be 1100. @Strawberry I will attempt this shortly - had only started looking into SQL recently due to Zoho which does not permit create table functions (only select). – insurancenoob Mar 14 '21 at 11:58

1 Answers1

1

If you have a unique identifier, that tells me you want an INNER join. Have a look for venn diagrams of inner vs outer join. An inner join will only you the intersection so no blanks. A left or right outer join will show values in one and possible blanks in the other. You probably don't need full other join. I've rarely come across that.

Regarding aggregating or group. You need GROUP BY and SUM. By grouping by the client code, that field will become unique.

In this case you only have one table of all data so don't need see the need for a join of table1 on itself. Oh I see an issue where multiple claims against the same policy are going to make the policy premium cost appear multiple times and then it would be added up incorrectly.

I am going to leave policy number out of this because your ideal table does use it and want totals across policies not by policy.

Putting that all together.

First, leaving out premium. Don't do too much at once. Build things up.

SELECT 
  "Client Code", 
  SUM("Claims") AS `Total Claims`
FROM table1
GROUP BY "Client Code"
Client code    Total claims
C1                   800
C2                   0

Now just tackling premium. I'm assuming a premium will be fixed for policy number (maybe not?) But that multiple policies for the same client might happen to have the same premium. I don't know how you add up premiums over time...? You'll have to figure that out based on business logic.

SELECT DISTINCT
  "Client Code", 
  "Policy Number",
  "Premium"
FROM table1

The result will be unique rows. Client ID will be repeated but policy number should unique if keep premium value constant.

Then you add aggregation to get the total premiums for a client across policies. But we'll leave that to the end below.

Then you join the two tables together to handle claims and premiums.

SELECT 
    table1."Client Code", 
    SUM(table1.Claims) AS `Total Claims`,
    SUM(Policy.Premium) AS `Total Premiums`
FROM table1
INNER JOIN (
    SELECT DISTINCT
        "Client Code", 
        "Policy Number",
        Premium
     FROM table1
) AS Policy ON Policy."Client Code" = table1."Client Code"
GROUP BY "Client Code"

If you want to look at the underlying data to see joins make sense, then you could remove SUM and SUM and take out the GROUP BY line.


Also your tables and fields are awkwardly named. table1 and t1 and t2 are vague. All your data comes from table1 which is bad for modeling. You rather want a table of clients and their codes and the other tables reference client by row ID like "1" or "789".

And you need field quotes all over the place.

Better structure would be like this. Maybe a table for claims (event based) and policies (contract based).

client.code
policy.client_id

policy.policy_number
policy.premium_value

claim.value
claim.policy_id

Maybe Zoho doesn't let you remodel like this. Hope you can do that.

Michael Currin
  • 615
  • 5
  • 14
  • Thanks for this Mike. Only concern I have is if I were to sum first, I would have total amounts for both prem and claim by policy number, but then certain clients may have different policy numbers for their premium and claim data (where the claim data entry has client code missing for some reason). I am currently looking for a method to tie in these different policy numbers under one client code. (Note it is only claims data that has client codes missing randomly). Hope this makes sense. – insurancenoob Mar 14 '21 at 11:49
  • 1
    Okay. If bad data is your issue, then I would suggest you go through your data and update your missing client codes. You can use a known client code for a policy number and fill it in for the same policy number where client code is missing. Also the benefit of modeling your table as multiple tables is that each claim has a policy number which has a client which has a client code. So to rename a client code you change one record only. And you can use referential integrity to give you a clear error if you make a policy without a client. – Michael Currin Mar 15 '21 at 20:02
  • 1
    There was a bit missing which I added now. `AS Policy ON Policy."Client Code" = table1."Client Code"`. – Michael Currin Mar 15 '21 at 20:03
  • 1
    I'd say don't try and make your query work with missing client code because the query becomes more complex and error prone and you have to accommodate that every time you make query compared to fixing missing data. – Michael Currin Mar 15 '21 at 20:05