Pages

Friday, August 2, 2013

Widgets

Detect missing by statements in merges

There is an option you can set to detect missing by statements in merges. You can specify mergenoby=nowarn|warn|error. It might be a good idea to always set this to mergenoby=error at the head of your code but you have got to keep in mind that you might be calling standard macros that do unusual merges and so you might get an error message coming out of one of them. If you don't, then all well and good. Maybe you have an autoexec that you could specify this option in or a piece of code you always "%inc"lude at the top of every program and so you could put it in there. If you can do this so you are sure it is there, then you will never hit the problem I am describing next.
This is one of these errors you make when working under pressure. You are merging two datasets together. You are being very careful to have only the variables you want in the keep lists. You make sure the "where" clause is correct. You carefully code the rest of the data step. You run it. It works. There are no errors or warning. You move on to the next task in the certain knowledge that your code is correct. BUT - you have forgotten the "by" statement.

Take a look at this log and print. It is the simplest illustration of this type of error. I am merging a treatment dataset with an AE dataset but have missed out the "by" statement. Do you see any errors in the log? Any warnings? No, there are none. The observations will be merged one on one in this case since no "by" statement was specified. You end up with subj=1 having two different treatment groups and subj=2 having none.

47 data tmt;
48 subj=1;tmt=0;output;
49 subj=2;tmt=1;output;
50 run;
NOTE: The data set WORK.TMT has 2 observations and 2 variables.
NOTE: DATA statement used: real time 0.04 seconds

51
52 data aes;
53 length ae $ 10;
54 subj=1;ae='cough';output;
55 subj=1;ae='cold';output;
56 subj=2;ae='flu';output;
57 run;

NOTE: The data set WORK.AES has 3 observations and 2 variables.
NOTE: DATA statement used: real time 0.00 seconds

58
59 data bad;
60 merge tmt aes;
61 run;

NOTE: There were 2 observations read from the data set WORK.TMT.
NOTE: There were 3 observations read from the data set WORK.AES.
NOTE: The data set WORK.BAD has 3 observations and 3 variables.
NOTE: DATA statement used: real time 0.05 seconds

62
63 options nocenter;
64 proc print data=bad;
65 run;

NOTE: There were 3 observations read from the data set WORK.BAD.
NOTE: PROCEDURE PRINT used: real time 0.05 seconds

Obs subj tmt ae

1 1 0 cough
2 1 1 cold
3 2 . flu

No comments:

Post a Comment