0

I am working on a project that involves two separate CSV files. The first data set, "Trips" has seven columns, with a trip_id, bike_id, duration, from_station_id, to_station_id, capacity and usertype. User type is the only character values, the rest are numerical. The second csv file has station_id and station_name. The objective is to merge the files in some way that will input the station name from the second csv file into the "from" and "to" station sections in the first, based on station id. I know that this would be extremely easy in excel with an xlookup, but I am wondering the correct way to approach this in SAS.

I am using the SAS university edition (the free online one) if that makes any difference. Our code so far is as follows:

data DivvyTrips;
    infile '/home/u59304398/sasuser.v94/DivvyTrips.csv' dsd;
    input trip_id
          bikeid
          tripduration
          from_station_id
          to_station_id 
          capacity 
          usertype $;
    title "Trips";
run;

data DivvyStations;
    infile '/home/u59304398/sasuser.v94/Divvy_Stations.csv' dsd;
    input station_id
          station_name $;
    title "Stations";
run;

All this does is import the data. I do not think a merge with a sort will work because we need both from and to station names.

Tom
  • 47,574
  • 2
  • 16
  • 29
  • Looks like the same question as https://stackoverflow.com/questions/17326182/sas-code-that-works-like-excels-vlookup-function which @Joe has given a nice solution using a FORMAT. – Tom Dec 06 '21 at 21:31
  • @Tom Thanks for the pointer. Just a reminder, you are a gold badge holder and can do the same as what I just did - close->duplicate, which is the correct option when it's this clearly a duplicate! – Joe Dec 07 '21 at 20:57

2 Answers2

0

SAS uses formats to control how values are displayed as text. It uses informats to control how text is converted to values.

Since your station ID is numeric you can use a FORMAT to display the station names for the station id numbers.

You can create a CNTLIN dataset for PROC FORMAT to build a format from your station list dataset. To define a numeric format you just need to have the FMTNAME, START and LABEL variables in your CNTLIN dataset.

data format;
   fmtname='STATION';
   set divvystations;
   rename station_id=start station_name=label;
run;
proc format cntlin=format;
run;

Now you can use the format with your station variables. For most purposes you will not even need to modify your dataset, just tell SAS to use the format with your variable.

Let's create some example data:

data DivvyTrips;
    infile cards dsd;
    input trip_id
          bikeid
          tripduration
          from_station_id
          to_station_id 
          capacity 
          usertype :$20.
    ;
cards;
1,1,10,1,2,2,AAA
2,1,20,2,3,1,BBB
;

data DivvyStations;
    infile cards dsd ;
    input station_id
          station_name :$20.
    ;
cards;
1,Stop 1
2,Station 2
3,Airport
;

Now create the STATION format.

data format;
   fmtname='STATION';
   set divvystations;
   rename station_id=start station_name=label;
run;
proc format cntlin=format;
run;

Now let's print the trip data and display the stations using the new STATION format.

proc print data=divvytrips;
  format from_station_id to_station_id station. ;
run;

Result:

                                            from_
                                            station_     to_station_
Obs    trip_id    bikeid    tripduration    id               id         capacity    usertype

 1        1          1           10         Stop 1        Station 2         2         AAA
 2        2          1           20         Station 2     Airport           1         BBB

If you do want to create a new character variable you use the PUT() function.

data want;
  set DivvyTrips;
  from_station = put(from_station_id,station.);
  to_station = put(to_station_id,station.);
run;
Tom
  • 47,574
  • 2
  • 16
  • 29
0

In SAS when you "look up" values you join the two "arrays" or in this case tables together.

The simplest way to do this is using a proc sql step:

proc sql;
    create table DivvyTrips_withnames as
    select
     a.*
    ,b.station_name as from_station_name
    ,c.station_name as to_station_name   
    from DivvyTrips a
    left join DivvyStations b
        on a.from_station_id = b.station_id
    left join DivvyStations c
        on a.to_station_id = c.station_id
    ;
quit;

We end up having to do 2 joins onto your original table as we are doing 2 different "lookups", from_station_id and to_station_id.

Lawrence
  • 81
  • 3