1

I am writing a SAS query to QA some data views. Part of the QA is determining what percentage of the values are populated. Unfortunately SAS is counting empty character cells as populated rather than NULL or having no data. For example, an ID field has some blank cells and I run a COUNT() function, I get the same result as though I ran a COUNT(*) If I run a CASE WHEN statement to exclude "" values I get the correct results, but needing to do that for every single text field in the SAS query seems like overkill and that I'm not aware of some function or some way to preprocess the data so that the COUNT function won't count empty cells.

Some example data that provide the idea is:

data QA_Test;
Input Name $ ID_Number;
Robert 1AY
Shirley ""
Tammy XB3
Reeza
  • 20,510
  • 4
  • 21
  • 38
D_man
  • 113
  • 3
  • 9
  • Do you get the same results when using PROC FREQ? – Reeza Dec 17 '21 at 18:40
  • 1
    Does not really seem possible, so need to see more concrete example of the query you are using. You could try nesting a TRIMN() function. COUNT(TRIMN(string)). Or use MISSING() function. SUM(MISSING(string)). Most likely your string is not empty, instead it just has invisible characters, like TAB or 'A0'x non-breaking space. – Tom Dec 17 '21 at 19:44
  • If the variable has two double quote characters in it then it is by definition not empty. Are you really reading the data from a text file like in your example? If you want to read values like your example use `dsd dlm=' '` on the INFILE statement. – Tom Dec 17 '21 at 23:44

3 Answers3

1

Use the DSD option when reading values that have quotes around them from a text file.

data QA_Test;
  infile cards dsd dlm=' ' truncover;
  input Name $ ID_Number $;
cards;
Robert 1AY
Shirley ""
Tammy XB3
;

Now ID_NUMBER will not contain the quotes.

Or use a period to represent the missing values in your text file.

data QA_Test;
  input Name $ ID_Number $;
cards;
Robert 1AY
Shirley .
Tammy XB3
;

If you already have those '""' strings in your data and you don't want to count them then use a different method of counting.

sum(not (id_number in (' ','""')))
Tom
  • 47,574
  • 2
  • 16
  • 29
0

You need to provide a reproducible example. Please follow the instructions here or use the hex example I previously showed.

So somewhat fixing the non-working code you posted, I did this:

data QA_Test;
Input Name $ ID_Number $;
cards;
Robert 1AY
Shirley ""
Tammy XB3
;;;;
run;

proc sql;
select count(*) as total_count, count(Id_number) as n_id
from QA_TEST;
quit;

Results:

total_count n_id
3   3

But this creates a data set with actual quotes in the cell, I'm assuming that isn't the case in your actual data? So if I read it in as missing:

data QA_Test;
infile cards truncover;
Input Name $ ID_Number $;
cards;
Robert 1AY
Shirley 
Tammy XB3
;;;;
run;

proc sql;
select count(*) as total_count, count(Id_number) as n_id
from QA_TEST;
quit;

Results in:

total_count n_id
3   2

So I think SAS is right, your data quality tests are correct and your data has data quality issues that need to be resolved - specifically in this case, fields that has likely tab or invisible characters in the data.

You can test this with the following and post your output here or on communities.sas.com.

proc freq data=qa_test;
table id_number / out=check missing;
format Id_number $hex.;
run;
Reeza
  • 20,510
  • 4
  • 21
  • 38
  • Edit your question with the example please. And make sure to use a code block to ensure that the HTML doesn't change your data. You can get an exact replica of your data using this method here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-data-AKA-generate/ta-p/258712 – Reeza Dec 17 '21 at 21:57
0

You can use compress also to remove them, inside the count, if it's correct to leave them in but you don't want them to count:

proc sql;
  select count(compress(id,'"'))
...
;
quit;
Joe
  • 62,789
  • 6
  • 49
  • 67