0

Good afternoon, I would like to drop these observations from my waterfall, however only the first observation is shown in the output. I would like to drop all observations in SID and PID, which is why I chose the ridiculous ratio.

Please advise. Thank you!

Also, how do I allow for more room in the drop_condition observation column in the output box. Sometimes it cuts of my text. Thanks!

data temp;
set mydata.ames_housing_data;
format drop_condition $30.;

if (SID in (0:10000000)) then drop_condition = '01: SID';
else if (PID in (0:10000000)) then drop_condition = '02: PID';
else if (Neighborhood) then drop_condition = '03: Neighborhood';
else if (Zoning in ('A', 'C', 'FV', 'I')) then drop_condition = '04: Non-Residential Zoning';
else drop_condition = '05: Sample Population';

run;

proc freq data=temp;
tables drop_condition;
title 'Sample Waterfall';
run; quit;

Output

Log

   1          OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
    55         
    56         data temp;
   57         set mydata.ames_housing_data;
   NOTE: Data file MYDATA.AMES_HOUSING_DATA.DATA is in a format that is        native to another host, or the file encoding does not match 
   the session encoding. Cross Environment Data Access will be used, which        might require additional CPU resources and might 
   reduce performance.
    58         format drop condition $30.;
    59         
    60         if (SID in (0:10000000)) then drop_condition = '01: SID';
    61         else if (PID in (0:10000000)) then drop_condition = '02: PID';
    62         else if (Neighborhood) then drop_condition = '03: Neighborhood';
    63         else if (Zoning in ('A', 'C', 'FV', 'I')) then drop_condition = '04: Non Residential Zoning';
    64         else drop_condition = '05: Sample Population';
    65         
    66         run;

    NOTE: Character values have been converted to numeric values at the places given by: (Line):(Column).
   62:10   
    NOTE: Variable drop is uninitialized.
    NOTE: Variable condition is uninitialized.
    NOTE: There were 2930 observations read from the data set MYDATA.AMES_HOUSING_DATA.
    NOTE: The data set WORK.TEMP has 2930 observations and 85 variables.
    NOTE: DATA statement used (Total process time):
   real time           0.05 seconds
   cpu time            0.06 seconds


    67         
    68         proc freq data=temp;
    69         tables drop_condition;
    70         title 'Sample Waterfall';
    71         run;

    NOTE: There were 2930 observations read from the data set WORK.TEMP.
    NOTE: PROCEDURE FREQ used (Total process time):
   real time           0.06 seconds
   cpu time            0.06 seconds

    71       !      quit;

    72         
    73         OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
    85         
Starbucks
  • 1,448
  • 3
  • 21
  • 49
  • Post your log please, your output doesn't make sense given your data step provided. Also, please provide some sample data to test with. Regarding the length issue, see my comment from your last question, use `LENGTH` instead of `FORMAT`. – Reeza Jan 02 '16 at 02:16
  • Thank you, I will use length over format. I also provided my log. Thanks! – Starbucks Jan 02 '16 at 02:21
  • 1
    You have 3 notes to fix - the last 2 indicate your format line uses drop condition as two words, rather than drop_condition. The first note indicates that your third conditio doesn't make sense - if neighbourhood what? As is, it will evaluate to true if not missing. Your if/then conditions are also staggered, which means if the first one evaluates as true then the rest are not checked. In this case it means that all your observations meet your first condition. As mentioned previously, you can test this out with a specified proc freq. – Reeza Jan 02 '16 at 02:25
  • I made the correction for drop_condition. For neighborhood, I want to remove ALL values for this observation. – Starbucks Jan 02 '16 at 03:00
  • Can you explain the Neighbourhood a bit more? An example would help. – Reeza Jan 02 '16 at 03:03
  • Neighborhood is a nominal variable. Examples would be Blmngtn for Bloomington Heights, Blueste for Bluestem, BrDale for Briardale. Is there anything specific you would like to know? – Starbucks Jan 02 '16 at 03:08
  • Ok, added some info to the answer. This doesn't solve your initial problem however. – Reeza Jan 02 '16 at 03:45

1 Answers1

1

Since you haven't use a missing in your PROC FREQ it means that all your data falls into your first condition. IF conditions with IF/ELSE IF are evaluated in order and stops at the first true condition.

You can check the distribution of your variable with a proc freq or proc means.

proc means data=MYDATA.AMES_HOUSING_DATA min max;
var SID;
run;

EDIT: I think your third condition would work as is, but to avoid the Note in the log and for better code I would suggest using:

Else if not missing(neighbourhood) then...
Reeza
  • 20,510
  • 4
  • 21
  • 38
  • Thanks for the suggestion. Instead of typing every variable from my table, is there a way I can ask it to show the min and max for every variable? I'm still reviewing the comments you made about my log errors. – Starbucks Jan 02 '16 at 02:39
  • 1
    Remove the var statement and it does that by default. – Reeza Jan 02 '16 at 02:40