0

When I use PROC SQL statement in SAS, sometimes I need to convert variable from character to numeric or vice versa. I normally use the following two queries:

INPUT(A.KEY_ID, 8.) = B.KEY_ID

OR

A.KEY_ID = PUT(B.KEY_ID, 8.)

My question is, if the length of the variable is either 7 or 8, what length should I put after the KEY_ID? I tried A.KEY_ID = PUT(B.KEY_ID, 7.) and it lost all the records with KEY_ID length=8. And when I used A.KEY_ID = PUT(B.KEY_ID, 8.), it will not be able to find the corresponding KEY_ID with length=7.

Thank you very much!

Added:

Below is the query that I am currently using. LibnameA is a local library and tableA is a local SAS table. DatabaseB is a database that I am connected with. Key_id column is in both tables and used as the key to link them. For example, key_ids are 1234567, 12345678, and ect.

When I used the following query, which I put length 7 in the end of the put statement, all the records with 8 digits key_ids will not able to find a match.

PROC SQL;
CREATE TABLE LIBNAMEA.WORKTABLE AS
SELECT
A.*,
B.VAR1,
B.VAR2
FROM LIBNAMEA.TABLEA A
LEFT JOIN DATABASEb.TABLEB B
ON A.KEY_ID = PUT(B.KEY_ID,8.)
;
QUIT;

Update Results:

If I use the following query, it returns 192758 rows

ON A.KEY_ID = PUT(B.KEY_ID,7.)

If I use the following query, it returns 192923 rows

ON A.KEY_ID = PUT(B.KEY_ID,8.)

If I use the following query, it returns 192757 rows

ON INPUT(A.KEY_ID,8.) = B.KEY_ID

If I use the following query, it returns 192757 rows

ON A.KEY_ID = COMPRESS(PUT(B.KEY_ID,8.))

If I use the following query, it returns 192757 rows

ON COMPRESS(A.KEY_ID) = COMPRESS(PUT(B.KEY_ID,8.))

If I use the following query, it returns 192757 rows

ON INPUT(CATS(A.KEY_ID),8.) = INPUT(CATS(B.KEY_ID),8.)

If I use the following query, it returns 192757 rows

ON A.KEY_ID = PUT(B.KEY_ID,8.-L)

As you can see, only when I use ON A.KEY_ID = PUT(B.KEY_ID,8.), it would return 192923 rows, which is all rows from the table A. However, it will not find the corresponding key_ids in table B, and returns null value in the end results.

Vadim Kotov
  • 8,084
  • 8
  • 48
  • 62
mumu.W
  • 53
  • 5
  • 11
  • Most likely A.KEY_ID does not have the 7 digit strings stored with leading spaces, but the PUT() function with 8. format will generate leading spaces for a number less than 10,000,000. – Tom Jan 11 '17 at 21:25
  • @Tom - That's my thinking, too, but OP says it doesn't work when he strips the blanks off in the join criteria. – vknowles Jan 11 '17 at 21:30

2 Answers2

0

If I understand your need, the only problem was comparing the string version of the keys--the numeric one with INPUT worked when I tried it.

So for string keys, this is what worked for me:

a.key_id = trim(left(put(b.key_id,8.)))

You can simplify to:

a.key_id = compress(put(b.key_id,8.))

The issue seems to be where there is a blank in the string key value. You might need to strip that out on both sides.

vknowles
  • 754
  • 4
  • 19
  • Hi, thanks for your comment. I tried the one you provided, and it returned the same results as if I ran a.key_id = put(a.key_id,7.). – mumu.W Jan 11 '17 at 20:32
  • Sorry if I didn't describe my question well. I will try to rephrase it. The key_id that I am matching on would be either 7 digits or 8 digits. In the first, I used the query a.key_id = put(a.key_id,7.), but all the records with key_id equals to 8 digits will not be able to find a match. Then I used the query a.key_id = put(a.key_id,8.), but all the ones with 7 digits cannot be matched. I am not sure why and I am looking for a solution to solve this. – mumu.W Jan 11 '17 at 20:35
  • What if you use this: 'compress(a.key_id) = compress(put(b.key_id,8.))' – vknowles Jan 11 '17 at 20:36
  • It is not due to any blank in the variable. It's the variable itself have both 7 digits and 8 digits value. For example, the key_id can be xxxxxxx and xxxxxxxx. – mumu.W Jan 11 '17 at 20:40
  • If you are joining SAS datasets, and one of them has a string variable as your join key, and the key may be 7 or 8 digits, then there will be a leading or trailing blank when you try to join them. I don't know if that's true if you're using `PROC SQL` to join non-SAS data. But in the SAS dataset case, if you remove the blank on both sides (my previous comment), it should work. If not, please explain your data more thoroughly and maybe post some sample data. Thanks! – vknowles Jan 11 '17 at 20:58
  • Thank you for your patience. I tried putting compress on both sides, and it doesn't seem to work. I will add more details above. Thank you! – mumu.W Jan 11 '17 at 21:09
  • `trim(left(put()))` is antiquated and should no longer be used unless you're running a very old version of SAS. Instead please use the appropriate `cat()` function... ie. `cats()`, `catt()`, etc... – Robert Penridge Jan 11 '17 at 21:10
  • @RobertPenridge - Thanks for the update. I'm coming off an extended absence from SAS, so I did not know about the `CATx` series of functions. Though I suspect the antiquated approach still works just as well.... – vknowles Jan 11 '17 at 21:37
0

If you are talking about integers then you are better off converting the values to numbers and comparing them. You can read the 7 character strings and the 8 character strings using the same informat.

 input(a.char_key_id,8.) = b.num_key_id

As to why you are having troubles it probably is because of either leading spaces and/or leading zeros in the character versions. SAS comparisons ignore trailing spaces so you don't need to worry about those.

Leading Zeros

This is the bigger problem. It is more likely to see character variables with leading zeros than leading spaces since most input methods will remove the leading spaces. But the presence of leading zeros can make the same integer value have multiple character representations. So you could represent 123 as '123','0123','00123', etc. This will cause troubles beyond just being able to merge.

Leading Spaces

This will more likely be an issue when you try to convert your integers to strings. The PUT() function normally right aligns the value (hence it generates leading space) whereas most input methods will end up with the value left aligned (trailing spaces). So if you convert the integer 12345 to a string using put(12345,8.) you will result in three leading spaces ' 12345' and it will not match the value you have in your character variable which would have trailing spaces '12345 '. You could add alignment command to the format. And again since SAS ignores the trailing spaces you can just use the longer format.

put(b.num_key_id,F8.-L) = a.char_key_id

Now if you don't know whether your variables are numeric or character and you want code that could work for either than you could use something like this to convert to character and back to numbers. But watch out if your integers are larger than can be represented in 12 digits because SAS will use BEST12. format to convert the numbers.

input(cats(a.key_id),8.) = input(cats(b.key_id),8.)
Tom
  • 47,574
  • 2
  • 16
  • 29
  • Thank you for sharing all those information with us. I tried the ones that you suggested, it doesn't give what I expected. I will share the detailed results with you, and maybe I am looking at the wrong direction. Thank you! – mumu.W Jan 11 '17 at 22:29
  • Show some example values that should match that do not. It is also possible that your character strings have other "invisible" characters besides spaces in them that are causes the values to look the same but be different. – Tom Jan 12 '17 at 03:49
  • I think I know why I couldn't get the correct information returned when I used ON A.KEY_ID = PUT(B.KEY_ID,8.). It's so embarrassing. I believe it's because I have a constrain in the end of the query WHERE B.EXP_DT IS NULL. I tried splitting the query into two parts. First I created another version of table B where exp_dt is null, and then I left joined A and B. It seems worked. I am so sorry for any confusion it caused because I left out the information in the beginning (I thought it was irrelevant and I want to only show the essential part of the query that I was running). – mumu.W Jan 12 '17 at 20:51
  • Your answer is the most complete one and I think it could have solved the 'issue' that I thought I had. Therefore I selected it as the best answer. Thank you again for anyone's time and generous help. Thank you!! – mumu.W Jan 12 '17 at 20:52