I have a variable named A in an SPSS dataset.

A
--
102102
23453212
142378
2367890654
2345
45

I want to split this variable into two-digit chunks and create multiple variables, as follows.

A_1   A_2   A_3   A_4   A_5
---   ---   ---   ---   ---
10    21    02
23    45    32    12
14    23    78
23    67    89    06    54
23    45
45

Can anyone write an SPSS macro to perform this operation?

1 Answer

Using string manipulation (after converting the numeric field to a string, if necessary), specifically SUBSTR, you can extract pairs of digits as you wish.

/* Simulate data */.
data list list / x (f8.0).
begin data.
102102
23453212
142378
2367890654
2345
45
end data.
dataset name dsSim.

If you have a known maximum value (in your example, a value 10 digits long), then you'll need 5 variables to store the pairs of digits, which the following does:

preserve.
set mxwarns 0 /* temporarily suppress warning messages */ .
string #xstr (a10).
compute #xstr=ltrim(string(x,f18.0)).
compute A_1=number(substr(#xstr,1,2), f8.0).
compute A_2=number(substr(#xstr,3,2), f8.0).
compute A_3=number(substr(#xstr,5,2), f8.0).
compute A_4=number(substr(#xstr,7,2), f8.0).
compute A_5=number(substr(#xstr,9,2), f8.0).
exe.
restore.

However, you may prefer to code something like this more dynamically (using Python), so that the code itself reads the maximum value in the data and creates as many variables as needed.

begin program.
import spss, spssdata, math
spss.Submit("set mprint on.")

# get maximum value 
spss.Submit("""
dataset declare dsAgg.
aggregate outfile=dsAgg /MaxX=max(x).
dataset activate dsAgg.
""")

maxvalue = spssdata.Spssdata().fetchone()[0]
ndigits=math.floor(math.log10(maxvalue))+1  # log10 is usually more accurate than log(x,10) at exact powers of 10

cmd="""
dataset close dsAgg.
dataset activate dsSim.
preserve.
set mxwarns 0.
string #xstr (a10).
compute #xstr=ltrim(string(x,f18.0)).
"""

for i in range(1,int(math.ceil(ndigits/2))+1):
    j=(i-1)*2+1
    cmd+="\ncompute B_%(i)s=number(substr(#xstr,%(j)s,2), f8.0)." % locals()
cmd+="\nexe.\nrestore."

spss.Submit(cmd)

spss.Submit("set mprint off.")
end program.
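
To see what the loop above generates without running SPSS, the command-building step can be sketched as a standalone function. This is only an illustration: `build_split_syntax` and its parameters (`source`, `prefix`, `width`) are hypothetical names, not part of the SPSS Python API, and the sketch counts digits from the string form of the maximum value instead of using a logarithm.

```python
def build_split_syntax(max_value, source="x", prefix="B_", width=2):
    """Return SPSS syntax that splits `source` into fixed-width numeric chunks.

    Hypothetical helper for illustration; assumes a non-negative integer
    variable with at most 18 digits (the f18.0 format used below).
    """
    ndigits = len(str(int(max_value)))        # digit count of the largest value
    nvars = (ndigits + width - 1) // width    # ceiling division
    lines = [
        "string #xstr (a%d)." % ndigits,
        "compute #xstr=ltrim(string(%s,f18.0))." % source,
    ]
    for i in range(1, nvars + 1):
        start = (i - 1) * width + 1           # SPSS string positions are 1-based
        lines.append(
            "compute %s%d=number(substr(#xstr,%d,%d), f8.0)."
            % (prefix, i, start, width)
        )
    lines.append("exe.")
    return "\n".join(lines)


# Show the syntax generated for the largest value in the example data
print(build_split_syntax(2367890654))
```

Inside a `begin program` block, the returned text could be passed to `spss.Submit`, wrapped in the same `preserve` / `set mxwarns 0` / `restore` commands used above.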

You would need to weigh up the pros and cons of each method to assess which suits you best, given how you anticipate your data arriving and how you then go on to work with it later. I haven't attempted to wrap either of these up in a macro, but that could just as easily be done.
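
Whichever route you take, the expected result is easy to check outside SPSS, since the operation is just fixed-width slicing of the digit string. A minimal plain-Python sketch follows; `split_pairs` is a hypothetical helper name, and it assumes non-negative whole numbers.

```python
def split_pairs(value, width=2):
    """Split the digits of a non-negative integer into fixed-width chunks."""
    digits = str(int(value))
    # Step through the digit string `width` characters at a time
    return [int(digits[i:i + width]) for i in range(0, len(digits), width)]


# Values taken from the question's example data
for a in (102102, 23453212, 142378, 2367890654, 2345, 45):
    print(a, split_pairs(a))
```

Because the chunks are converted to numbers, a pair such as `02` comes back as `2`, which matches what `number(...)` produces in the SPSS syntax.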

Jignesh Sutar
    While it doesn't matter in this particular case, the substr function is deprecated. Use char.substr instead. Substr works on bytes while char.substr works on characters. – JKP Mar 02 '15 at 02:52