-1

I'm working on an assignment & not allowed to use PROC SQL. I imported data from two csv files & renamed columns at the same time (see code below) but when I tried to PROC APPEND the two tables, there's an issue with the date format. When I PROC CONTENTS I get the following info:

Work.2019data: Variable:date Type:Num Length:8 Format/Informat:MMDDYY10 When I open this file in notepad, the dates appears like this: 12/31/2019

For the second table:

Work.2020data: Variable:date Type:Num Length:8 Format/Informat:YYMMDD10 But when I open this file in notepad, the dates appears like this: 2020-11-16

PROC IMPORT
    DATAFILE= "&export_mtl/2019data.csv"
    OUT= WORK.2019data
    (RENAME=(new_cases=nouveaux_cas
             new_deaths=nouveaux_deces
             new_tests=nouveaux_tests
             total_tests=nb_total_tests
             female_smokers=femmes_fumeuses
             male_smokers=hommes_fumeurs
            ))
    DBMS= csv
    REPLACE;guessingrows=10000;
    GETNAMES= YES;
RUN;

PROC IMPORT
    DATAFILE= "&export_mtl/2020data.csv"
    OUT= WORK.2020data
(RENAME=(new_cases=nouveaux_cas
             new_deaths=nouveaux_deces
             new_tests=nouveaux_tests
             total_tests=nb_total_tests
             female_smokers=femmes_fumeuses
             male_smokers=hommes_fumeurs
            ))
    DBMS= csv
    REPLACE;guessingrows=10000;
    GETNAMES= YES;
RUN;

What's the simplest way to cast the date in the 2020data table so that I can concatenate the two tables after? I've seen so many ways of doing this & tried them with no luck.

LuizZ
  • 945
  • 2
  • 11
  • 23
Earl
  • 3
  • 2
  • Why are you using PROC IMPORT to read a text file? Just write your own data step to read the file(s). Then you can control both what formats are attached to the variables and what names are used for the variables. The data step will probably end up being less code than the proc import steps. – Tom Nov 30 '20 at 05:22
  • My professor specifically requested that we use the PROC IMPORT to import the 3 CSV files - I had no choice. – Earl Nov 30 '20 at 20:51

2 Answers2

0

If you just want to stack both files, you can go with a data step. It will do the task, assigning to the second dataset the same formats of the first:

data data_2019_2020;
set 2019data 2020data;
run;

If you want to change the format of date, you can also do it with a datastep. For example:

data data_2019_2020;
set 2019data 2020data;
FORMAT date YYMMDD10.;
run;

Or, if you prefer, you can change just in one dataset first and then use PROC APPEND to stack.

LuizZ
  • 945
  • 2
  • 11
  • 23
  • Hi Luiz! Thanks for taking the time to answer. Both methods work based on the Output Data, but it doesn't save the stacked files into a new dataset. My fault, I forgot to add that in my initial question. – Earl Nov 30 '20 at 00:12
  • @Earl, the methods do save the stacked file in a new dataset. I named the dataset as `data_2019_2020` . But you may change it for the name you prefer. Take a look into your working directory, for files starting with `data_`, the file will be there. If you want a different library, you just need to specify it: (e.g: `mydata.data_2019_2020` to save it in a previously defined library called `mydata`) – LuizZ Nov 30 '20 at 00:20
  • 1
    I realized about 10 mins after posting... thank you so much!! – Earl Nov 30 '20 at 00:32
0

If you combine two datasets that have the same variable then the default format attached to the variable will be first non missing format that the data step compiler sees. So for your example if you run

data want;
   set '2019data'n '2020data'n ;
run;

then the MMDDYY10. format will be used. But if you reverse the order that the input datasets are referenced

data want;
   set '2020data'n '2019data'n ;
run;

then the YYMMDD10. format will be used.

But if you add a FORMAT statement to the data step then you can control what format will be attached in the new dataset. So if you used:

data want;
   set '2020data'n '2019data'n ;
   format date date9.;
run;

Now the default style that will be used to display the values of DATE in WANT will be ddMONyyyy.

Tom
  • 47,574
  • 2
  • 16
  • 29