I have several vectors and want to set constraints on different medians, where some of the medians are calculated over different subsets of a vector.

For example, I want constraints for

    age = IntVector('age', 10)
    male = BoolVector('male', 10)
    salary = IntVector('salary', 10)

where I want a salary median of 50 for all females older than 50 and an age median of 40 for all males with salary > 70.

So I know how to filter out the relevant data.

    And(male[i] == False, age[i] > 50)

I also know how to express the mean, e.g.:

    Sum([If(And(male[i] == False, salary[i] > 50), age[i], 0) for i in range(10)]) / (10 - NUM_MALE) == 50

However, for the median I need a sorted list, so that I can say something like:

    (age[4] + age[5]) / 2 == MEDIAN

However, I cannot model a constraint that keeps both age and salary ordered, since person_1 will not necessarily be both the youngest and the lowest paid.

So I would need to have a temporal ordering for all my vectors by either age or salary.

alias
mojado
  • What part of your data is symbolic? And what part is concrete? I.e., do you already know which people are males/females? Or are the salary and age data concrete constants? It's not clear at all what you are trying to achieve here. If you want z3 to generate data with those characteristics, I recommend fixing some of these parameters instead. For example, you can start with separate male/female vectors of fixed length (4 males, 6 females, etc.) instead of going fully symbolic. It'd be a lot easier to express these constraints if you fixed that number. – alias Nov 03 '20 at 18:26
  • Is it mandatory that persons are not in sorted order, agewise? Normally in this scenario, I would assume, just because I want, that the persons are ordered in the result. Then I can easily model my median constraint. – florian.isopp Nov 23 '20 at 10:10
  • I found this interesting approach here: https://cacm.acm.org/magazines/2019/3/234925-understanding-database-reconstruction-attacks-on-public-data/fulltext – florian.isopp Nov 23 '20 at 14:16

1 Answer

I recently came up with a slightly different approach from the one in the paper: maintain a list of indices.

In your example, let's say we have a list of indices for the persons who are female and have salary > 50:

indices = [0, 5, 7, 9]

Then adding the mean and median constraints is relatively straightforward:

# adding mean constraint
Sum([age[i] for i in indices]) / 4 == MEAN

# adding median constraint (the two middle entries of the ordered selection)
age[indices[1]] + age[indices[2]] == median * 2

The next question is how to come up with the indices above. We can achieve that by adding some more constraints on the indices:

  1. indices must be within the range (e.g., 0 to 9)
  2. indices must be distinct
  3. indices must be in order
  4. male[i] == False for i in indices
  5. salary[i] > 50 for i in indices

Once we have all the constraints in place, z3 will try to find the indices we need, and adding the mean and median constraints becomes an easy task.

MHL
  • But you use hardcoded indices [1] and [2] for the median. I don't know my indices list length, do you have a tip? How to get the middle index of the indices list with unknown list length? – trinity420 Oct 30 '22 at 17:35
  • Hi @trinity420, this approach does assume you know the length of the indices list beforehand. If you think about it, a median constraint on an indices list of unknown length is not very useful, because you can always add or remove an item at the beginning or end of the list to fulfill the constraint. – MHL Nov 04 '22 at 07:31
  • Sorry, I did not describe it well. I meant that I know the length of my list, but I don't have an ordered list of indices, and I don't know how to get the median with z3 once I have such an ordered list of known length... – trinity420 Nov 05 '22 at 11:06
  • Hi @trinity420, yes, that's right. Keep in mind that the list of indices is unknown and computed via z3, but you need to know the length of the ordered list of indices beforehand. For example, in the case above, this approach only works when you know that the number of `female` employees with `salary > 50` is *4*. – MHL Nov 09 '22 at 03:03