Detecting unit differences in data (SAS)

Question

I have two sets of financial data that tend to contain differences due to unit errors e.g. $10000 in one dataset may be $1000 in the other.

I'm trying to code a check for such differences, but the only way I can think of is to divide the two variables and see if the difference is in a table of 0.001, 0.01, 0.1, 10, 100 etc, but it would be hard to catch all of the differences.

Is there a smarter way to do this?

So you want to classify a difference between 10,000 and 1,000 as another type of change than when the difference is jsut a change in value? Like between 10,000 and 9,500? — Tom, Mar 02 '20 at 18:57
Is it possible to go back to your original source data and see if there's a way to read the units from the table somehow. Usually in the axis or variable label somewhere? — Reeza, Mar 02 '20 at 20:57

score 1 · Answer 1 · answered Mar 02 '20 at 18:08

Use proc compare. Be sure the two datasets are sorted in identical order, either by row or by specific groups. Use the by statement as needed. More info on options can be found in the documentation.

Example - compare a modified cars dataset with sashelp.cars:

data cars_modified;
    set sashelp.cars;

    if(mod(_N_, 2) = 0) then msrp = msrp - 100;
run;

proc compare base    = sashelp.cars 
             compare = cars_modified 
             out     = out_differences 
             outnoequal 
             outdif
             noprint;
    var msrp;
run;

Only the observations with differences are output in out_differences:

_TYPE_  _OBS_   MSRP
DIF     2      $-100
DIF     4      $-100
DIF     6      $-100
DIF     8      $-100
DIF     10     $-100
...

score 0 · Answer 2 · answered Mar 02 '20 at 19:21

So you appear to be asking to find cases where X/Y is a number that is exactly 1.00Exx where XX is an integer, other than 0.

data _null_;
do x=1,10,100,1000;
  do y=1,2,3,10.1,10 ;
   ratio = x/y;
   power = floor(log10(ratio));
   if power ne 0 and 1.00 = round(ratio/10**power,0.01) then 
     put 'Ratio of ' x 'over ' y 'is 10**' power '.' 
   ;    
  end;
end;
run;

Results:

Ratio of 1 over 10 is 10**-1 .
Ratio of 10 over 1 is 10**1 .
Ratio of 100 over 1 is 10**2 .
Ratio of 100 over 10 is 10**1 .
Ratio of 1000 over 1 is 10**3 .
Ratio of 1000 over 10 is 10**2 .

score 0 · Answer 3 · answered Mar 03 '20 at 09:29

For a numeric value X you can compute the nearest the rational expression, p/q.

If you calculate ratio

X = amount_for_source_A / amount_from_source_B;
status = math.rational(X,1e5,p,q);

the ratio will be a multiple of 10 if p=1 or q=1

Example:

proc ds2;
  package math / overwrite = yes;
    method rational(double x, double maxden, in_out integer p, in_out integer q) returns double;
      /*
      ** FROM: https://www.ics.uci.edu/~eppstein/numth/frap.c
      ** FROM: https://stackoverflow.com/questions/95727/how-to-convert-floats-to-human-readable-fractions
      **
      ** find rational approximation to given real number
      ** David Eppstein / UC Irvine / 8 Aug 1993
      **
      ** With corrections from Arno Formella, May 2008
      **
      ** Modified for Proc DS2, Richard DeVenezia, Jan 2020.
      **
      ** usage: rational(r,d,p,q)
      **   x is real number to approx
      **   maxden is the maximum denominator allowed
      **   p is return for numerator
      **   q is return for denominator
      **   returns 0 if no problems
      **
      ** based on the theory of continued fractions
      ** if x = a1 + 1/(a2 + 1/(a3 + 1/(a4 + ...)))
      ** then best approximation is found by truncating this series
      ** (with some adjustments in the last term).
      **
      ** Note the fraction can be recovered as the first column of the matrix
      **  ( a1 1 ) ( a2 1 ) ( a3 1 ) ...
      **  ( 1  0 ) ( 1  0 ) ( 1  0 )
      ** Instead of keeping the sequence of continued fraction terms,
      ** we just keep the last partial product of these matrices.
      */

      declare integer m[0:1,0:1];
      declare double startx e1 e2;
      declare integer ai t result p1 q1 p2 q2;

      startx = x;

      /* initialize matrix */
      m[0,0] = 1; m[1,1] = 1;
      m[0,1] = 0; m[1,0] = 0;

      /* loop finding terms until denom gets too big */
      do while (1);

        ai = x;

        if not ( m[1,0] * ai + m[1,1] < maxden ) then leave;

        t = m[0,0] * ai + m[0,1];
        m[0,1] = m[0,0];
        m[0,0] = t;

        t = m[1,0] * ai + m[1,1];
        m[1,1] = m[1,0];
        m[1,0] = t;

        if x = ai then leave;     %* AF: division by zero;

        x = 1 / (x - ai);

        if x > 2147483647 /*x'7FFFFFFF'*/ then leave;  %* AF: representation failure;
      end;

      /* now remaining x is between 0 and 1/ai */
      /* approx as either 0 or 1/m where m is max that will fit in maxden */
      /* first try zero */

      p1 = m[0,0];
      q1 = m[1,0];
      e1 = startx - 1.0 * p1 / q1;

      /* now try other possibility */

      ai = (maxden - m[1,1]) / m[1,0];

      m[0,0] = m[0,0] * ai + m[0,1];
      m[1,0] = m[1,0] * ai + m[1,1];

      p2 = m[0,0];
      q2 = m[1,0];
      e2 = startx - 1.0 * p2 / q2;

      if abs(e1) <= abs(e2) then do;
        p = p1;
        q = q1;
      end;
      else do;
        p = p2;
        q = q2;
      end;

      return 0;
    end;
  endpackage;
run;
quit;

* Example uage;

proc ds2;
  data _null_;
    declare package math math();
    declare double x;
    declare int p1 q1 p q;

    method run();
      streaminit(12345);

      x = 0;
      do _n_ = 1 to 20;
        p1 = ceil(rand('uniform',9));
        q1 = ceil(rand('uniform',9));

        x + 1. * p1 / q1;

        math.rational (x, 10000, p, q);

        put 'add' p1 '/' q1 '  ' x=best16. 'is' p '/' q;
      end;
    end;
  enddata;
run;
quit;

----- LOG -----
add 4 / 1    x=               4 is 4 / 1
add 4 / 2    x=               6 is 6 / 1
add 2 / 7    x=6.28571428571429 is 44 / 7
add 4 / 6    x=6.95238095238095 is 146 / 21
add 5 / 2    x=9.45238095238095 is 397 / 42
add 5 / 2    x= 11.952380952381 is 251 / 21
add 7 / 1    x= 18.952380952381 is 398 / 21
add 8 / 6    x=20.2857142857143 is 142 / 7
add 9 / 3    x=23.2857142857143 is 163 / 7
add 8 / 2    x=27.2857142857143 is 191 / 7
add 3 / 1    x=30.2857142857143 is 212 / 7
add 9 / 3    x=33.2857142857143 is 233 / 7
add 4 / 3    x=34.6190476190476 is 727 / 21
add 4 / 6    x=35.2857142857143 is 247 / 7
add 1 / 9    x=35.3968253968254 is 2230 / 63
add 8 / 3    x=38.0634920634921 is 2398 / 63
add 2 / 4    x=38.5634920634921 is 4859 / 126
add 5 / 1    x=43.5634920634921 is 5489 / 126
add 1 / 2    x=44.0634920634921 is 2776 / 63
add 2 / 7    x=44.3492063492064 is 2794 / 63

DS2 math package

Detecting unit differences in data (SAS)

3 Answers3