7

I've a dataset called 'input' with the following observations

ID Salary
10 1000
20 2000
30 3000
40 4000

I need an output dataset with following observations

ID Salary Next_row_Salary
10 1000 2000
20 2000 3000
30 3000 4000
40 4000 null

Note: The scenario is next obersavtion's salary should be the current observation's value for the column Next_Row_salary. If there is no next observation then the current observation's value for the column Next_Row_salary should be 'null'.

Kindly help me out in creating a sas code for this scenario.

athresh
  • 553
  • 6
  • 11
  • 24
  • 2
    You should go back to your old questions and accept the answers that were most helpful. This gives people an incentive to answer your new questions. – itzy Jul 20 '12 at 13:11
  • I'm new to this community. May I know what should I do to accept the answers? – athresh Jul 20 '12 at 13:16
  • 1
    Sure, just go back to your earlier questions and click the the check mark to the left of the answer that you found most helpful. This gives reputation points to the person who answered it. Welcome! – itzy Jul 20 '12 at 13:18

3 Answers3

19

There are a few ways to achieve this, here's how I would do it.

data have;
   input ID Salary;
   cards;
10 1000
20 2000
30 3000
40 4000
;
run;

data want;
   recno=_n_+1;
   set have end=last;
   if not last 
           then set have (keep=salary rename=(salary=next_row_salary)) point=recno;
      else call missing(next_row_salary);
run;
BellevueBob
  • 9,498
  • 5
  • 29
  • 56
Longfish
  • 7,582
  • 13
  • 19
  • Good solution. I tried to edit the answer in insert some indentation but could not get it to work. I'm new here also and still have lots to figure out. – BellevueBob Jul 20 '12 at 14:08
  • Wow. That is awesome I wish I could vote this up more. I've never used multiple set statements as I've never understood how they worked until I saw your very elegant answer. Thankyou! – Robert Penridge Jul 20 '12 at 14:56
  • Very interesting. Do you know about efficiency of this approach? Relative to, say, proc expand? – itzy Jul 20 '12 at 15:11
  • This is about as efficient as you can get, given the question asked. The key is to realise that each `SET` statement opens the dataset independently. The `POINT` option allows you to reach out the the "next" observation for all but the last observation in the main dataset (because of the `END=LAST` option). However, it is very important to remember that this solution works because the HAVE dataset is organized exactly that way. – BellevueBob Jul 20 '12 at 17:22
  • 1
    Interesting approach I've not seen before. Will have to keep this in mind--although I would usually reverse the sort and use lag since that would allow you to still use BY-processing (so if there were multiple by groups, you could set the last value to missing for each by group) – o.h Jul 20 '12 at 18:22
  • 1
    @itzy tested the efficiency on 10 million observations and it took 18 seconds to build the test dataset then 20 seconds to iterate over it again and perform the next record lookup. So I'd say it's very efficient and scalable. I didn't test how this compared to proc expand. – Robert Penridge Jul 21 '12 at 05:07
  • @Keith If there was some way I could applaud on here I'd do it, but until SO adds that functionality I'll just upvote you and use this technique in the future. – CaffeineConnoisseur Jun 13 '13 at 05:08
2

There's no direct way to do this in a data step. There's two approaches you can use:

Option 1: Sort in reverse, use the lag function

proc sort data=your_dataset;
 by descending id;
run;

data your_dataset;
 set your_dataset;
 next_row_salary = lag(salary);
run;

proc sort; by id; run;

Options 2: Use proc expand

proc expand data=your_dataset method=none;
 by id;
 convert salary = next_row_salary / transformout=(lead 1);
run;
itzy
  • 11,275
  • 15
  • 63
  • 96
  • data out (keep = p_id p_salary next_row_salary); set input end = last; retain p_id; retain p_salary; next_row_salary = salary; if _n_ ne 1 then output; p_id = id; p_salary = salary; run; I executed this code. This displays all the observations in the output dataset except the last one (40 4000 null) – athresh Jul 20 '12 at 13:19
  • In the above code the 6th statement is n – athresh Jul 20 '12 at 13:27
  • what if the input is 20 2000 10 1000 40 4000 30 3000 – athresh Jul 20 '12 at 13:34
  • you could solve the problem with sorting the data by adding an additional id column (sort_id = ____n____)...then sort sort_id descending, output your next_row_salary..then sort it back and remove the sort_id....but by that time the point= solution would probably be more efficient – Jay Corbett Jul 20 '12 at 16:42
2

This code is from Paul Dorfman on SAS-L and it does what you are looking for as well with one pass through the data

Data Salary;
input id salary;
Datalines;
10 1000
20 2000
30 3000
40 4000
;
Run;

data need ;
retain id salary;
set salary (rename=(id = origid salary=next_salary)) end=end ;
if _n_ > 1 then output;
salary = next_salary;
id = origid;
if not end then return;
call missing(next_salary);
output;
drop origid;
run ;
highlycaffeinated
  • 19,729
  • 9
  • 60
  • 91
RosaryGuy
  • 21
  • 1