My problem is as follows:

I have a dataset with 3 types of variables (say, A, B and C) for every subject. For each variable separately, I can pool patients into groups with, say, high, mid and low values of that variable.

Now I want to pool subjects into classes of having either low, mid or high levels of A, B or C (i.e. combining patients into a single low/mid/high grouping for all 3 variables simultaneously).

This, however, does not work properly when I use the following code:

IF ((A <= 10) OR (B <= 15) OR (C <= 20)) pool = 1. /* low levels
IF ((A > 10 AND A <= 100) OR (B > 15 AND B <= 150) OR (C > 20 AND C <= 200)) pool = 2. /* mid levels
IF ((A > 100) OR (B > 150) OR (C > 200)) pool = 3. /* high levels
VARIABLE LABELS pool "pooled subjects (A/B/C)".
EXECUTE.

When I now run a frequency table, subjects with low levels of A, B or C are not combined into one group. It seems that only patients with low levels of all three variables are pooled. Subjects seem to be shifted into the other groups: the result is one very large group of subjects (pool = 2) and two very small groups.

What I expected (and what I want) is for each group to contain all patients who have the corresponding characteristic (e.g. low levels of variable A or B or C).

Does anybody know how to solve this problem, or can anyone see what I am doing wrong?

Thanks in advance,

A.

eli-k
  • The code is executed sequentially, so if a subject is assigned to pool=1 and then meets the condition for pool=2, he will be re-assigned. Your groups are not mutually exclusive; for example, A=5, B=150, C=150 qualifies for pools 1 and 2 (B=150 and C=150 both count as mid), so your syntax will assign it to pool 1, then overwrite that and assign it to pool 2 – horace_vr Aug 17 '17 at 10:02

1 Answer

The logic of your present syntax allows one subject to belong to more than one pool, even all three. For example, if a subject has a low level in A, a mid level in B and a high level in C, all three conditions are true. Because the IF commands run in order and each one overwrites pool when its condition is true, a subject who qualifies for more than one pool keeps the highest one. That is why, in the results you describe, only subjects with low values on all three variables keep pool=1.
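
The overwriting can be reproduced on a few made-up cases. This is only a minimal sketch: the variable names and the three IF commands are those from the question, while the three data rows are hypothetical and chosen so that each one hits a different combination of conditions.

```spss
* Hypothetical test cases, not taken from the original data.
DATA LIST FREE / A B C.
BEGIN DATA
5 50 500
5 10 50
5 10 15
END DATA.

* Same three IF commands as in the question, run in order.
IF ((A <= 10) OR (B <= 15) OR (C <= 20)) pool = 1.
IF ((A > 10 AND A <= 100) OR (B > 15 AND B <= 150) OR (C > 20 AND C <= 200)) pool = 2.
IF ((A > 100) OR (B > 150) OR (C > 200)) pool = 3.
LIST.
```

All three cases have a low A, so all three start as pool = 1. The first case (mid B, high C) ends up with pool = 3, the second (mid C) with pool = 2, and only the third, which is low on all three variables, keeps pool = 1.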

Since the pools aren't mutually exclusive (assuming you don't want to change the definition), you should define them separately, as three 0/1 variables:

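compute pool1 = ((A <= 10) OR (B <= 15) OR (C <= 20)). /* low.
compute pool2 = ((A > 10 AND A <= 100) OR (B > 15 AND B <= 150) OR (C > 20 AND C <= 200)). /* mid (same bounds as in the question, so non-integer values such as 10.5 are covered).
compute pool3 = ((A > 100) OR (B > 150) OR (C > 200)). /* high.
execute.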
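
To see how much the pools overlap, the three indicators can be tabulated and summed. This is a sketch, assuming pool1 to pool3 have been computed as above; npools is a hypothetical helper variable that is not part of the original answer.

```spss
* Frequency of each 0/1 indicator separately.
FREQUENCIES VARIABLES=pool1 pool2 pool3.

* npools is a hypothetical helper: the number of pools each subject falls into.
COMPUTE npools = SUM(pool1, pool2, pool3).
FREQUENCIES VARIABLES=npools.

* Cross-tabulate low against high to count subjects who are in both.
CROSSTABS /TABLES=pool1 BY pool3.
```

A value of npools above 1 marks a subject who meets the conditions for more than one pool, which is exactly the group that a single pool variable cannot represent.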
eli-k
  • Thanks for your help – alexanderjansma Aug 17 '17 at 11:04
  • The outcome I now get using your separately defined variables is the same as when I compute: `IF ((A <= 10) OR (B <= 15) OR (C <= 20)) pool = 1. /* low levels IF ((A > 10 AND A <= 100) OR (B > 15 AND B <= 150) OR (C > 20 AND C <= 200)) pool = 2. /* mid levels IF ((A > 100) AND (B > 150) AND (C > 200)) pool = 3. /* high levels` which also makes it mutually exclusive, I guess? – alexanderjansma Aug 17 '17 at 11:09
  • You cannot get the same outcome, since I'm defining three separate variables (pool1, pool2, pool3) and you are only defining one variable (pool=1,2,3). Since **the conditions** are not mutually exclusive, each subject could belong to any or all of the pools. When you calculate three separate variables, you'll be able to see for each subject a 0/1 indication of belonging to pool1, and the same for the other two. – eli-k Aug 17 '17 at 11:48
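
If a single pool variable is still wanted, the overlap has to be resolved by some priority rule. The sketch below assumes (this is not stated anywhere in the thread) that low should take priority over mid, and mid over high; with DO IF / ELSE IF, only the first condition that is true is applied to a case, so nothing is overwritten afterwards.

```spss
* Assumed priority (not from the question): low first, then mid, then high.
DO IF ((A <= 10) OR (B <= 15) OR (C <= 20)).
COMPUTE pool = 1.
ELSE IF ((A > 10 AND A <= 100) OR (B > 15 AND B <= 150) OR (C > 20 AND C <= 200)).
COMPUTE pool = 2.
ELSE IF ((A > 100) OR (B > 150) OR (C > 200)).
COMPUTE pool = 3.
END IF.
EXECUTE.
```

Reordering the branches changes which pool wins, so the order should follow whatever rule makes sense for the data.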