0

I have a table called "tax_info", this table stores the information of the land tax of my city, it goes like this:

taxpayer_code  |  condominium_num  |  lot_area  |  built_area
-------------------------------------------------------------
0010030078-2   |     00-0          |   143      |    130
0010030079-1   |     02-7          |   283      |    57
0010030080-1   |     02-7          |   283      |    48
0010030081-1   |     02-7          |   283      |    50

the taxpayer code first 3 numbers refer to the city district, the next 3 to the block within the district, and the next 4 can refer to the lot in a block if the condo number is 00-0, or to an apartment, or store, etc if the condo number is different than 00-0, in which case all equal condo numbers refer to the same lot within the block.

what I want to do is pass a list of "taxpayer_code" and get the "lot_area" and "built_area" for the lots. the problem is, if the person lives in a condo, her apartment is a fraction of the total built area for the lot. So, if I search for code 0010030078% (the -X number doesn't matter) the result is: Lot Area = 143 and Built Area = 130

But if I search for 0010030080%, the result I expect is: Lot Area = 283 and Built Area 155

And if I search for 0010030078%, 0010030079%, the result: Lot Area = 426 and Built Area 285

So the database should get the taxpayer codes, then look if the condominium number is different from 00-0 for each code passed, if so, it should add to the sum all the other taxpayer codes that share the same condo number within the same district and block. (ideally, if tax codes belonging to different districts or blocks are passed a warning should be returned, and if more tax codes are added to the sum, a listing with all codes added would be nice, but it's okay if that's too much of a hassle!).

I am new to SQL and can't wrap my head around this, I appreciate every help you can give me, thanks!

Calil
  • 57
  • 2
  • 7
  • "*the taxpayer code first 3 numbers refer to the city district, the next 3 to the block within the district*" - this is a really bad design. You shouldn't store multiple different things in a single atomic value –  Apr 02 '20 at 12:12
  • I agree with this, but that's how the city stores, or at least how they provide the data, you think it is best if I create a new table breaking the code in different columns? – Calil Apr 02 '20 at 12:23

1 Answers1

1

Hmmm . . . use a subquery and window functions to add up the values that you want:

select ti.*
from (select ti.*,
             (case when condominium_num <> '00-0'
                   then sum(built_area) over (partition by condominium_num)
                   else built_area
              end) as real_built_area
      from tax_info ti
     ) ti
where . . . 
Laurenz Albe
  • 209,280
  • 17
  • 206
  • 263
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • @LaurenzAlbe . . . That would not be consistent with the data shown in the question. As for scanning the whole table, the OP is using wildcards for selecting rows. Although they can be optimized for the specific patterns in the question, that suggests that multiple taxpayers might be chosen at one time. – Gordon Linoff Apr 02 '20 at 12:31
  • sorry, i get the following error with this query: ERROR: missing FROM-clause entry for table "ti" LINE 2: from (select ti.*, – Calil Apr 02 '20 at 12:34
  • also, do you think it would be better to create two different tables, one with the pre made sum, and a code like 00100300-0 and 00100302-7 etc. and a second table that makes a relation of this new table with taxpayer codes passed, then the query just have to look for the relation and return the results? I could create this pre processing with python and load the two tables to postgre, would this make the performance better? – Calil Apr 02 '20 at 12:43
  • @HelderCalil . . . As Laurence points out, your question is not about *performance* but *functionality*. This query is probably the best you can do if you want information about *all* tax payers. However, if there is a small subset, be clear on how you define the subset and ask a *new* question. That said, normalizing the data so the condos are in another table seems like a good design. – Gordon Linoff Apr 02 '20 at 13:18