-1

I'm new to MySQL and I'm currently trying to learn to become more efficient in it. In this case, I have a database that has people's incomes and I want to create a new column income_level that says low, middle or high income based on the income. I have done this in four queries, but I'm sure that it could be done more efficiently? Could someone give me tips to this? Thank you!

Here are my queries so far:

ALTER TABLE Chile
ADD COLUMN `income_level` VARCHAR(50) NULL DEFAULT NULL AFTER `income`;

UPDATE Chile SET income_level = "Low income"
WHERE income < 10000;

UPDATE Chile SET income_level = "Middle income"
WHERE income > 10000;

UPDATE Chile SET income_level = "High income"
WHERE income > 100000;
GMB
  • 216,147
  • 25
  • 84
  • 135
coderfrombiz
  • 115
  • 8
  • You wouldn't normally store derived data – Strawberry Oct 31 '20 at 09:56
  • Just out of curiosity: Do you have one table per country as `Chile`suggests? – Filburt Oct 31 '20 at 10:06
  • @Filburt , Yes that's correct. But the datasets that I'm using are purely for practice and don't contain accurate and/or real data. – coderfrombiz Oct 31 '20 at 10:16
  • 1
    @coderfrombiz If you are aware of the implications, that's of course fine - however it might help to add this context (preparing unstructured data for analysis) to your question, As you can see answers and comments have a tendency to *"But you're doing it wrong!"* that is triggered by some seemingly bad design decisions for an RDBMS. – Filburt Oct 31 '20 at 10:27

3 Answers3

3

GMB's solution is 100% correct but I still wouldn't do it that way. You don't need the extra column income_level because you can dynamically select it. This way you will also avoid the need of updating the redundant column when the definition of income levels changes due to e.g. inflation or other causes.

So this year your query might be this:

SELECT CASE WHEN income > 100000 THEN 'High income'
            WHEN income >  10000 THEN 'Middle income'
            ELSE 'Low income' 
        END AS income_level
FROM chile

In five years from now it might look like this:

SELECT CASE WHEN income > 150000 THEN 'High income'
            WHEN income >  15000 THEN 'Middle income'
            ELSE 'Low income' 
        END AS income_level
FROM chile

UPDATE: if income may be null and you don't want to consider this to be 'Low income' you can do this:

SELECT CASE WHEN income IS NULL  THEN 'Unknown income level'
            WHEN income > 100000 THEN 'High income'
            WHEN income >  10000 THEN 'Middle income'
            ELSE 'Low income' 
        END AS income_level
FROM chile
rf1234
  • 1,510
  • 12
  • 13
2

Use a case expression:

update chile
set income_level = case
    when income > 100000 then 'High income'
    when income >  10000 then 'Middle income'
    else 'Low income'
end

If income may be null and you don't want to update the corresponding rows, then you can add a where clause to the query.

GMB
  • 216,147
  • 25
  • 84
  • 135
0

Actually, although GMB's answer is correct, I disagree rather strongly with rf1234. You don't want to implement this logic in SELECT queries, because the logic gets spread across too much code. You could use a view, but there is a better solution.

The best approach for calculating income_level in my opinion is to use a generated column:

alter table chile
    add income_level varchar(255) generated always as
        (case when income > 100000 then 'High income'
              when income >  10000 then 'Middle income'
              else 'Low income'
         end);

Why is this an improvement? The income_level and income columns are always in synch, because the income_level is calculated when it is retrieved. So, if you add new rows or update existing rows, then the value automatically changes.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786