Sunday, August 4, 2013


Rules for writing good SAS code

All the rules below stem from one basic rule that applies to programs in general:
Programs should not be made to be easy to write; they should be made to be easy to read.
Violating this rule is the most common problem in programming and, in my opinion, the reason program architecture has not risen to the level of building architecture. By extension, it is also the reason programs are far less stable than buildings.
When architects are trained to design a building, they learn a litany of rules and regulations with one goal in mind: to let those who read the plans follow, tweak, and edit them without having to consult the architect. In buildings this is very important, because if something goes wrong due to poor planning, people die.

And that's not good.
These rules do not exist in the world of programming. Programs can be written in a myriad of styles because the languages we write in allow it. For example, you can write
proc sort data=in; by store;
or
proc sort data=in;
   by store;
run;
and they will sort the data the same way, because SAS does not care about line breaks between statements.
That only makes them equivalent to the computer. It does not make them equivalent to a human, and it is the human who will make the mistake when editing the code. People may not die because of these mistakes, but time and money will be wasted.
Which one is easier to read?
I think it's the second, because each piece of the thought sits on its own line, and the indentation and the word run show you where the complete thought ends. Others may prefer the first because the complete thought is on one line.
It is important to make the code easy to read for one other simple reason.
A program is read 20 times more than it is written.
Since a program is read much more often than it is written the reading should be easy at the expense of the writing.
It is easy to forget this in the middle of writing a large piece of code. I forget it all the time because I just want to get the writing done, and I tend to regret it later when I go back and read the code.
So what are the rules of good programming in SAS?
1. SAS allows 32 characters for dataset names and variable names. Use them.
Nothing is more frustrating than a variable called xx or aa or junk5 or some other nonsense, especially if it turns out to be valuable later. Use the 32 characters SAS gives you to make your programs read more like documents.
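A short sketch of the difference. The data set work.store_sales and its variables store and weekly_sales are hypothetical, created here only so the example runs on its own; junk5 and xx stand in for the kind of names this rule warns against.

```sas
/* Hypothetical sample data so the example runs on its own */
data work.store_sales;
   input store weekly_sales;
   datalines;
23 1500
41 820
;
run;

/* Hard to read: what is xx, and why is the data set called junk5? */
data junk5;
   set work.store_sales;
   xx = weekly_sales * 52;
run;

/* Easier to read: the names document themselves (both under 32 characters) */
data store_annual_sales_estimate;
   set work.store_sales;
   estimated_annual_sales = weekly_sales * 52;
run;
```

Both steps compute the same number; only the second tells the next reader what it is.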
2. Always end your data steps and procs with run, unless the proc is proc sql or proc datasets, in which case use quit.
Using these does three things.
a. It sends a clear message to the reader that the proc or data step has ended.
b. It sends a clear message to SAS to do the work specified in the proc or data step.
c. It makes the log easier to read, because the notes for each step appear right after that step's code instead of after the first line of the next step.
It's a very small word, but it makes a big difference.
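A minimal sketch showing both endings side by side. The data set work.store_list and its variables store and region are hypothetical, created only so the example runs.

```sas
/* Hypothetical data set, created only so the example runs */
data work.store_list;
   input store region $;
   datalines;
23 East
41 West
;
run;

/* proc sql executes each statement as it is submitted;
   quit tells SAS the procedure is finished */
proc sql;
   create table work.east_stores as
      select store
      from work.store_list
      where region = 'East';
quit;

/* Most procs: run tells SAS to execute the step now */
proc print data=work.east_stores;
run;
```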
3. When using loops or if-then-else statements, always use a do-end block.
You can write
if store=23 then display=1;
and it will work just fine.
You can also write
if store=23 then do;
   display=1;
end;
Not only does this work just as well, it is also easier to edit if you later need to add a second statement after display=1.
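The same habit carries over to if-then-else chains and iterative loops. A minimal sketch, where work.store_display, work.store_display_flags, display, and promo_week are hypothetical names chosen for illustration:

```sas
/* Hypothetical input data */
data work.store_display;
   input store;
   datalines;
23
41
;
run;

data work.store_display_flags;
   set work.store_display;

   /* Every branch gets its own do-end block, even with one statement */
   if store = 23 then do;
      display = 1;
   end;
   else do;
      display = 0;
   end;

   /* Iterative loop: one output row per (hypothetical) promotional week */
   do promo_week = 1 to 2;
      output;
   end;
run;
```

Because the loop contains an explicit output statement, SAS writes only the rows produced inside the loop, so each store appears once per week.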
4. SAS allows indentation and white space. Use them to make blocks of code line up with each other. Never use tabs for the indenting.
Not everyone agrees on the number of spaces to indent. I like three; some like two. Two things I do know: tabs are bad, because a tab character is displayed at a different width in different text editors, and zero is bad in general.
The indentation should make everything between the beginning and the end of a block sit to the right of them.
So instead of this
data changeinput;
set originalinput;
if store=23 then display=1;
run;
It should be this
data changeinput;
   set originalinput;
   if store=23 then do;
      display=1;
   end;
run;
The same applies to macro code. Instead of this
proc mixed method=mivque0 data=Mirror_Data noclprint;
model &MODELVARIABLEz. = %do g=1 %to &Totalvar.;
&&var&g. %end;
%if &TotalPrd. gt 0 %then %do;
%do h=1 %to &TotalPrd.; &&Prd&h. %end;
%end;
/ s ddfm=betwithin outpred=pred_res;
ods output solutionf=Fixed_res;
run;

Use this
proc mixed method=mivque0 data=Mirror_Data noclprint;
   model &MODELVARIABLEz. =
      %do g=1 %to &Totalvar.;
         &&var&g.
      %end;
      %if &TotalPrd. gt 0 %then %do;
         %do h=1 %to &TotalPrd.;
            &&Prd&h.
         %end;
      %end;
      / s ddfm=betwithin outpred=pred_res;
   ods output solutionf=Fixed_res;
run;
Set up each proc, data step, or loop within a data step so that it has a clear beginning and end. The run (or quit) of a proc or data step should start in the same column as the p of proc or the d of data. For a do loop or an if-then-else statement, the end should start in the same column as the d of do or the i or e of if or else.
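The macro fragment above is not runnable on its own, since %do loops only work inside a macro definition and the macro variables it uses are set elsewhere. A self-contained sketch of the same pattern, assuming the sashelp.class sample data set that ships with SAS and a hypothetical macro fit_class_model whose predictor list is stored in numbered macro variables, as with &&var&g. in the post:

```sas
%macro fit_class_model(total_var=2);
   %local g var1 var2;

   /* Hypothetical predictor list in numbered macro variables */
   %let var1 = height;
   %let var2 = age;

   proc mixed data=sashelp.class;
      model weight =
         %do g = 1 %to &total_var.;
            &&var&g.
         %end;
         / s;
      ods output solutionf=work.fixed_res;
   run;
%mend fit_class_model;

%fit_class_model(total_var=2)
```

Each %end lines up with its %do, and the run lines up with the p of proc, so the nesting is visible at a glance.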
