Let's say I have a table with 10,000 rows (representing 10,000 persons) and the following columns:
id, qualification, gender, age, income
When I select all persons with a certain qualification (say "plumber"), I get 100 rows with a certain gender, age and income distribution.
What I now want to do is select some kind of comparison group to check whether the income is influenced by the qualification or by the distribution of the other attributes.
That means (and now I come to my question) I want to get another set of 100 rows with the same gender and age distribution, but a different qualification value. These 100 rows should of course be chosen at random.
My primary problem is that I don't know how to write an SQL query that respects these distributions (which could, and maybe should, be treated as probabilities in this context) when selecting random rows.
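One approach that might fit is stratified matching: count the plumbers in each (gender, age) combination, shuffle everyone else within the same combination, and keep as many of them as there are plumbers. The sketch below is only an illustration under several assumptions: the table name `persons`, the demo data, and the helper names `build_demo_db` and `matched_sample` are all made up, and it runs on SQLite (3.25 or later, for `ROW_NUMBER()`). Other databases would need their own random function, for example `RAND()` or `NEWID()`.

```python
import random
import sqlite3
from collections import Counter

# Assumption: the table is called "persons"; the columns are those from the question.
MATCHED_SAMPLE_SQL = """
WITH target AS (        -- how many rows the reference group has per (gender, age)
    SELECT gender, age, COUNT(*) AS n
    FROM persons
    WHERE qualification = :qual
    GROUP BY gender, age
),
candidates AS (         -- everyone else, randomly numbered within each (gender, age)
    SELECT p.*,
           ROW_NUMBER() OVER (PARTITION BY p.gender, p.age
                              ORDER BY random()) AS rn
    FROM persons AS p
    WHERE p.qualification <> :qual
)
SELECT c.id, c.qualification, c.gender, c.age, c.income
FROM candidates AS c
JOIN target AS t ON t.gender = c.gender AND t.age = c.age
WHERE c.rn <= t.n       -- keep as many random rows as the reference group has
"""


def build_demo_db(seed=0):
    """Create an in-memory table with made-up data: 100 plumbers, 9,900 others."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE persons (id INTEGER PRIMARY KEY, qualification TEXT, "
        "gender TEXT, age INTEGER, income INTEGER)"
    )
    rows = [
        (
            i,
            "plumber" if i <= 100 else rng.choice(["baker", "clerk", "nurse"]),
            rng.choice(["f", "m"]),
            rng.randint(20, 60),
            rng.randint(20000, 80000),
        )
        for i in range(1, 10001)
    ]
    conn.executemany("INSERT INTO persons VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def matched_sample(conn, qualification):
    """Return random rows of other qualifications matching the (gender, age) counts."""
    return conn.execute(MATCHED_SAMPLE_SQL, {"qual": qualification}).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    sample = matched_sample(conn, "plumber")
    print(len(sample), "comparison rows")
    print(Counter((gender, age) for _, _, gender, age, _ in sample).most_common(3))
```

Note that if some (gender, age) combination has fewer candidates than plumbers, the result comes up short for that combination. Grouping ages into wider bands (for example `age / 10`) in both the `GROUP BY` and the `PARTITION BY` might help in that case, at the cost of a coarser match.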
Thank you in advance!