Pages

Thursday, March 1, 2012

How to Improve SQL Merges


If you are looking to improve performance of your SQL code consider the following options:
Include all your numeric, date or datetime variables first in the SELECT statement and leave the character ones for the end. Character variables are not padded when included last, meaning fewer I/O and lower CPU consumption.
Add the parameter _METHOD to your SQL code. It will write debugging messages to the log about the type of join used by the SQL optimizer.

sqxjm: sort-merge join. Algorithm most commonly used by the optimizer. First it will sort the data files and then perform the join. You can influence the optimizer to try to select this method by adding MAGIC=102 to your proc sql statement.
proc sql _method magic=102;
sqxjndx: index join. Using indexes present in the data sets. You can try to influence the optimizer for this method by adding the data set option IDXWHERE=YES.
dir="ltr" style="margin-right: 0px;" mce_style="margin-right: 0px;" mce_style="margin-right: 0px;">
from library.index_data

(idxwhere=yes)
sqxjhsh: hash join. The smaller table in the join is loaded in memory. Influence the optimizer by increasing the default value of the 64K for the BUFFERSIZE option
proc sql _method

buffersize=1048576;

For more such SAS Tips follow us on Facebook:

Simulating Proc FREQ counts with Preloaded Formats


Proc FREQ is commonly used to create frequency counts on cross tabulations. The following code demonstrates this by counting the frequency of Initials by Gender:
* make up some data... count the initials by gender;
data class1;
set sashelp.class;
initial=substrn(name,1,1);
run;
* tabulate frequencies;
proc freq data=class1;
table initial*sex / out=class_freq noprint;
run;
proc print;
run;
However, what if we wanted to always show a count for both genders, even when there are none present in the raw data? One possibility is to use the SPARSE option on the TABLE statement. This works well given there is at least one record somewhere in the cross for each level to be reported.
What can we do if no data exist on the set being analysed for a given level? For example, in a survey "Strongly Disagree" was a valid response but no records were ever observed. This is where preloaded formats provide a robust solution.
Several procedures such as Proc MEANS, SUMMARY, TABULATE and REPORT allow the use of preloaded formats. Each value that is needed in the output is defined in the format. The following code shows an example of creating the same frequency counts as Proc FREQ:
* Ensure all the permutations are reported using a format
and proc summary;
proc format ;
value $gender
'F'='Female'
'M'='Male';
run;
proc summary data=class1 completetypes nway ;
class initial;
class sex / preloadfmt;
output out=class_summ(drop=_type_) / autoname;
run;
proc print;
run;

For more such SAS Tips follow us on Facebook: