Pages

Friday, August 30, 2013

SUBSTR to replace

SUBSTR can also to used to replace values of a string. In the below example the variable ID, where the first two characters can replaced by XX, with the help of the SUBSTR function

LAST      NAME    ID
Barczak  George  CP436
Adams   Linda     LA543
Preiss    Gloria    AU942
To change the first two letters of the  ID to "XX", use the SUBSTR function on the left side of the assignment statement:
substr(employee_id,1,2)='DA';
Result:
LAST       NAME     ID
Barczak   George   XX436
Adams    Linda      XX543
Preiss     Gloria     XX942

Monday, August 26, 2013

No Warning Messages

When you write a code ensuring no warning messages are thrown in log but this get the error message below.

WARNING 32-169: The quoted string currently being processed has become
                more than 262 characters long.  You may have unbalanced 
                quotation marks.

If so, check out the QUOTELENMAX/NOQUOTELENMAX system option. This option is particularly appropriate when you have long text strings in, for example, the ODS HTML TEXT= statement.

Monday, August 12, 2013

Save Disk Space in SAS

Save disk space and make your data sets easier to understand by inputting only the data you need. Drop variables you no longer need.

DATA mylib.yearly(DROP=Rain1-Rain12);
SET Old(DROP=Snow1-Snow12);
Total = SUM(of Rain1-Rain12);
programming statements
RUN;

Drop DO loop indexing variables.

data Mylib.NewCost (DROP=i);
set Mylib.Cost;
array Amt(100) Amt1-Amt100;
do i=1 TO 100;
Amt(i)=MAX(0,Amt(i));
end;
run;

Store numeric categorical data in character variables to save space.
length quest1-quest40 $ 1; 

Increase EFFICIENCY of a SAS code

THINK Before You Type. One of the best practices of a good programming is planning before you type. Try to keep the code as simple as possible. Use procedures more than datasteps. For 
example, permanently save only the final data set. Use temporary data sets to store intermediate results.

Instead of this:

data Mylib.New;
set Mylib.Sept Mylib.Oct;
programming statements
run; 
data Mylib.District;
set Mylib.All Mylib.New;
programming statements
run;

Use this:

data New;
set Mylib.Sept Mylib.Oct;
programming statements
run; 
data Mylib.District;
set Mylib.All New;
programming statements
run;

Saturday, August 10, 2013

Check for the existence of a dataset

So often there are so many datasets in a library that it becomes impossible to look out for a particular dataset. To help find a dataset, we can use a simple DATA step function for checking the existence if the dataset. The function is EXIST. Here is the macro to achieve this.


 %MACRO DATA_EXIST(INPUT);
  %IF %SYSFUNC(EXIST(&INPUT)) %THEN %DO;
  %PUT DATA EXISTS;
  %END;
  %ELSE %DO;
  %PUT DATA DOES NOT EXISTS;
  %END;
 %MEND;
 
 %DATA_EXIST(SASUSER.TEST);
Further simplification can be done by:
%MACRO D_EXIST(INPUT);
 %SYSFUNC(EXIST(&INPUT))
%MEND D_EXIST;
The macro %D_EXIST could be called from within an expression.
 %IF %D_EXIST(SASHELP.CLASS) %THEN %DO;

The Essence of PROC APPEND

The APPEND procedure (and APPEND statement of the DATASETS procedure) is an efficient method for concatenating observations from a smaller data set to a larger data set. The BASE= data set option is reserved for the larger of the two data sets with the DATA= option for the smaller data set. Essentially, the APPEND procedure avoids reading any observations in the BASE= data set by positioning the record pointer at the end of the BASE= data set. Each observation from the smaller data set is then applied one at a time to the end of the BASE= data set. In the first example, the BASE= data set identifies a larger data set called MASTER and the DATA= data set identifies the smaller TXN data set.

Code:

PROC APPEND

BASE=master

DATA=txn;

run;

When two or more data sets need to be concatenated, multiple APPEND procedures are issued. In the next example, two separate PROC APPEND steps are specified to concatenate the two smaller data sets (TXN1 and TXN2) at the end of the larger BASE= (MASTER) data set.

Code:

PROC APPEND

BASE=master

DATA=txn1;

RUN;

PROC APPEND

BASE=master

DATA=txn2;

RUN;


Forcing the Concatenation of Data Sets
When one or more variables in the input data set, represented by the DATA= option, are not present in the BASE= data set, or the variables in the input data set are not of the same type as the variables in the BASE= data set, an optional FORCE option can be specified on the PROC APPEND statement to force the concatenation of the data sets. Using the FORCE option also prevents an error from being generated in these situations. The next example illustrates the FORCE option being specified in the concatenation of the TXN data set to the MASTER data set.

Code:

PROC APPEND

BASE=master

DATA=txn

FORCE;

RUN;

Selecting Observations for Concatenation
By default, all the rows of an input data set are typically concatenated to the end of the base data set. But when only selected rows of an input data set are desired, an optional WHERE clause can be specified for the associated input data set as a data set option. This is accomplished by specifying an optional WHERE clause. The WHERE clause defines the logical conditions that control which observations will be selected from an input data set, and ultimately concatenated to the end of the base data set. In the next example, a WHERE clause is specified to select only ‘PG’ and ‘PG13’ movies from the input data set for concatenation to the end of the base data set.

Code:

PROC APPEND

BASE=master

DATA=txn (WHERE=(rating in ('PG', 'PG-13')));

RUN;

Designating the Master
It is not at all a given that BASE= should point to the longer of the two data sets and DATA= to the shorter. One might do the opposite and use the shorter as the base

§ because one has been established as the master and the other is merely a container for a subset

§ to preserve structural properties (such as variable order)

§ because the two are in different libraries and the concatenation belongs in one of those and not the other

§ to maintain BY group ordering

§ to preserve a name for the concatenation without the need to invoke PROC DATASETS to swap names

If none of these concerns arise, and one is indifferent about the order, and one knows which data set is the larger of the two, it does make sense to reduce I/O by making the larger data set the base.

Conclusion
The APPEND procedure is a powerful tool for SAS users to use for data set concatenation purposes. Users can also concatenate unlike data sets using the FORCE option on the PROC APPEND statement as well as select which observations are captured from the input data set using a WHERE clause and concatenated to the end of the base data set. Users are encouraged to explore the many capabilities the APPEND procedure (or the APPEND statement of the DATASETS procedure) has to offer.

Proc Datasets Tip

Delete all labels in th.pnsmutaa

proc datasets library = th nolist;
modify pnsmutaa;
attrib _all_ label='';
quit;

Rename th.bstpns_org to th.bstpns_org1

proc datasets library = th nolist;
change stpns_org = bstpns_org1;
quit;

Delete all in a library

proc datasets library = work kill nolist;
quit;

Delete one specific dataset from a library
proc datasets library = work nolist;
delete tst1;
quit;

Copy all in library th to work

proc datasets nolist;
copy out=work in=th;
quit;

Copy th.rappformat to work.rappformat
proc datasets nolist;
copy out=work in = th;
select rappformat;
quit;

Creating SAS dataset using datalines gives error if written inside a macro

The error is something like this:

ERROR: The macro X generated cards (data lines) for the DATA step which could cause incorrect results.
The DATA step and macro will stop executing.

The solution is to insert the values in a table using proc sql.
The sample code below demonstrates the same.

options nomlogic mprint nosymbolgen;

%macro insertData;
proc sql;
create table class
(Semester char(25) ,
Professor char(24),
Location char(12),
Cost num(5));
quit;
proc sql;
insert into class
(Semester,Professor,Location, Cost)
values('Fall','Dr. Armstrong','London',3000)
values('Spring','Dr. Armstrong','Prague', 3300)
values('Spring','Dr. Hannity','Beijing',3000)
values('Fall','Dr. Hannity','London',3000)
values('Spring','Dr. Smith','Prague',3300)
values('Fall','Dr. Smith','Los Angeles',3800)
;
quit;
%mend insertData;
%insertData;

Using the ANY and ALL Operators in Proc SQL

The ANY and ALL operators are Proc SQL operators which can be used to compare the result of a query expression against either ANY or ALL of the results returned from another.
The following example demonstrates its use as part of a subquery comparing the annual sales of sports shoes from our 'Mozart' company with the top brands nationwide. The ANY operator is used initially to determine which of our company brands are outselling any one or more of the market leaders.
data mozart_shoes;
format id z3. name $char24.;
input id name $ & sales;
datalines;
001 Super-tread trainer 50000
002 X-Pro sports trainer 60000
003 Road runner deluxe 55000
004 Triathlon special 30000
run;
data top_brands;
format name $char24.;
input name $ & sales;
datalines;
Cougar 500 58000
Mike running pro 56000
Abibas Super-tread 53000
run;
proc sql;
select name, sales
from mozart_shoes
where sales gt any (select sales from top_brands);
quit;

This query returns the following results as expected where these two sports shoes have annual sales greater than at least one of the top brands.

name sales
__________________________________
X-Pro sports trainer 60000
Road runner deluxe 55000

Changing the operator to all shows us the one shoe out competing all of the top brands.

name sales
__________________________________
X-Pro sports trainer 60000

The NOT operator can also be used to good effect as part of the condition. For example, changing our code to the following:

proc sql;
select name, sales
from mozart_shoes
where sales not gt any (select sales from top_brands);
quit;

returns all of the shoes whose annual sales did not exceed any of the top brands.

name sales
__________________________________
Super-tread trainer 50000
Triathlon special 30000

LARGEST and SMALLEST Functions

The MAX and MIN functions allow you to easily identify the maximum and minimum values from a selection of variables. However in order to find the second highest value or the bottom 3 values you will need to make use of the SAS 9 functions LARGEST and SMALLEST.
The following examples are based on the data listed below:
data balances;
input custid $ bal1 bal2 bal3 bal4 bal5 bal6;
datalines;
12345 911.00 863.30 896.98 1672.26 1675.20 602.42
12346 62.27 10.28 1377.79 1683.42 1290.92 553.13
12347 1996.54 74.68 253.94 374.02 691.90 593.75
12348 1494.83 1114.74 332.76 505.95 . 963.98
12349 336.61 295.78 .a 1965.40 592.76 .b
;
run;
Maximum and Minimum
The syntax for the maximum and minimum functions is as follows.
function(list of variables);

Largest and Smallest
The largest and smallest functions have similar syntax.
function(value, list);
Where value is the nth item you want (e.g. 2 for the second largest/smallest) and
list is a list of variables that will be processed.
The following code demonstrates the use of the MAX, MIN, LARGEST and SMALLEST functions.
data balances2;
set balances;
maxbal=max(of bal1-bal6);
minbal=min(of bal1-bal6);
large1=largest(1,of bal1-bal6);
large2=largest(2,of bal1-bal6);
small1=smallest(1,of bal1-bal6);
small2=smallest(2,of bal1-bal6);
small3=smallest(3,of bal1-bal6);
drop bal:;
run;
The following results are produced:
Max, Min, Smallest and Largest Functions

maxbal minbal large1 large2

1675.20 602.42 1675.20 1672.26
1683.42 10.28 1683.42 1377.79
1996.54 74.68 1996.54 691.90
1494.83 332.76 1494.83 1114.74
1965.40 295.78 1965.40 592.76


Max, Min, Smallest and Largest Functions

small1 small2 small3

602.42 863.30 896.98
10.28 62.27 553.13
74.68 253.94 374.02
332.76 505.95 963.98
295.78 336.61 592.76
Note that where the first argument in the largest and smallest functions is set to '1', the result is the same as the max and min functions. Also note that in this example when evaluating the minimum, the missing values have been ignored.
Handling of missing values
The max and min functions completely exclude missing data from the evaluation however the LARGEST and SMALLEST functions handle missing values in the data by placing any missing values present at the end of the sorted list of variables.
So the LARGEST function will return the maximum data value where a '1' is specified in the first argument, the second largest for a '2' and so on. Once all of the non missing data is referenced the missing values will be returned.
The SMALLEST function will return the lowest value when '1' is specified the second lowest for a '2' and so on. Again once all the non missing values have been referenced the missing values will be returned. The last non missing value in this case will be the highest data value.
The following example demonstrates how the largest and smallest functions deal with missing data values.
data balances3;
set balances;
large1=largest(1,of bal1-bal6);
large6=largest(6,of bal1-bal6);
small1=smallest(1,of bal1-bal6);
small6=smallest(6,of bal1-bal6);
drop bal:;
run;
Generates the following data:
Missing Values with the LARGEST and SMALLEST Functions
large1 large6 small1 small6
1675.20 602.42 602.42 1675.20
1683.42 10.28 10.28 1683.42
1996.54 74.68 74.68 1996.54
1494.83 . 332.76 .
1965.40 . 295.78 .

Wednesday, August 7, 2013

Password protect your data

Want to protect your SAS dataset with a password to protect it from other users. You can use the pw = option in the SAS datastep. Here is an example demonstrating the same. This will protect your data with read/write authorization. 

DATA <EXAMPLE DATA> (ENCRYPT = YES PW = <PASSWORD>);
SET <EXAMPLE DATA>;
RUN;

If you lose your password, a lot of time and effort is needed to decrypt the file.

Extract the first letter of the character variable

Have you ever needed to create a variable which is the first letter of the character variable?  In the below example there are four ways to do that. X1, X2, X3 and X4 all contain letter P.

data one;
 name='Post';
 X1 = substr(name,1,1);
 X2 = first(name);
 substr(X3,1,1) = name;
 length X4 $1;
 X4 = name; 
run;

Create a custom date time format

Sometimes the numeric formats provided by SAS are not enough to deal with the datetime values. Let say we we have a date time format in the form of dd-mm-yyyy hh:mm:ss, then there is no standard format to deal with this.
Below code can help you to create your own custom date time format in SAS.

proc format;
  picture MyMSdt other='%0d-%0m-%0Y %0H:%0M:%0S' (datatype=datetime);
run;
 
data test;
  mydatetime='25nov2009 14:44:56'dt;
  format newdt MyMSdt.;
  newdt=mydatetime;
  put mydatetime= newdt=;
run;
The LOG shows: mydatetime=1574779496 newdt=25-11-2009 14:44:56

Convert Excel Date to SAS Date

When we import excel files into a SAS dataset then the conversion of date/time from excel to SAS becomes messy. 1 January 1960 is Day 0 in SAS while Day 1 in Excel. Below formulas will help to convert the date/time from Excel to SAS

  • SAS_date = Excel_date - 21916;
  • SAS_time = Excel_time * 86400;
  • SAS_date_time = (Excel_date_time - 21916) * 86400;

Excel has two different datetime "systems" and the default differs depending on the platform. The above formulas assume Excel is using the 1900 date system. If Excel is using the 1904 Date System, use 20454 in the above formulas instead of 21916.

Tuesday, August 6, 2013

Creating a dummy Variable the easy way

/*Creating dummy/indicator variables the easy way*/
Data Code;
Input X Y $ ;
datalines;
15 a
30 a
9 b
6 c
;
run;

/*Method 1: Traditional*/

data Code1;
set Code;
If X>10 then Z1=1;
Else Z1=0;
If 9=<X<=15 then Z2=1;
Else Z2=0;
If Y="a" then Z3=1;
else Z3=0;
run;

/*Method 2: The easy way*/

data Code2;
set Code;
Z1=X>10;
Z2=9<=X<=15;
Z3=Y="a";

run;

Monday, August 5, 2013

Using Proc SUMMARY to Produce Multiple Tables with Varying Statistics

When calculating descriptive statistics on a table of data, we often need to produce different statistics for the different measurements contained within. We can create any number of output data sets, each containing different statistics from a single call to the summary procedure. This article shows you how.
Indeed, either of Proc MEANS or Proc SUMMARY can produce the functionality demonstrated here, as under the covers, both procedures are one and the same. It is simply that each executes with different options by default.
Consider the following example:
proc summary data=sashelp.shoes;
types region
region*subsidiary;
class region subsidiary;
var stores sales returns;
output out=total_stores n(stores)=
sum(stores)=
/ autoname autolabel;
output out=sales_analysis
n(sales returns)=
mean(sales returns)=
std(sales returns)=
min(sales returns)=
max(sales returns)=
/ autoname autolabel;
run;

Submitting this code to SAS produces the following log, which illustrates the two output tables generated:
NOTE: There were 395 observations read from the data set
SASHELP.SHOES.
NOTE: The data set WORK.TOTAL_STORES has 63 observations and 6
variables.
NOTE: The data set WORK.SALES_ANALYSIS has 63 observations and
14 variables.
NOTE: PROCEDURE SUMMARY used (Total process time):
real time 0.04 seconds
cpu time 0.06 seconds

The procedure statements are explained as follows:
proc summary data=sashelp.shoes;

The PROC statement calls the procedure and reads the SHOES data set from the SASHELP library. This table is available to all readers irrespective of SAS version and platform.
types region
region*subsidiary;

The TYPES statement overrides the default behaviour to summarise all combinations of classification variables. In this example groups will be formed for the unique values within REGION and the cross of values from REGION and SUBSIDIARY columns.
class region subsidiary;

The CLASS statement defines the columns whose values are used to form groups.
var stores sales returns;

The VAR statement defines which variables will be analysed. Not all variables need to be used on OUTPUT statements, and each variable may be used more than once.
output out=total_stores n(stores)=
sum(stores)=
/ autoname autolabel;

The first OUTPUT statement (above) defines the TOTAL_STORES table. It requests the N and SUM statistics are calculated for the STORES variable. Options AUTONAME and AUTOLABEL request the procedure creates unique and meaningful column names and labels for the results.
output out=sales_analysis
n(sales returns)=
mean(sales returns)=
std(sales returns)=
min(sales returns)=
max(sales returns)=
/ autoname autolabel;

The second OUTPUT statement defines the SALES_ANALYSIS table. Various statistics are requested, N through to MAX, for two variables, in this case SALES and RETURNS are analysed. Similarly to the previous example, the AUTONAME and AUTOLABEL options are applied, removing the programmers need to specify each output column name.
run;

The RUN statement marks the boundary of the procedure step and causes execution at this point.

Advanced Uses of the Colon Modifier

In SAS, the colon ( : ) can be used in conjunction with all of the comparison operators (=, >, <, >=, <=, ne, in) to compare prefix's.
Consider the following examples, given the following surnames:
Bevan
Bosley
Bowen
Burden
Bush
Consider the following data step statements using the colon modifier:
if surname =: 'B' then ...
will find all 5 surnames.
if surname =: 'Bo' then ...
will find Bosley and Bowen only.
if surname >=: 'Bo' then ...
will find Bosley, Bowen, Burden and Bush only.
if surname <: 'Bo' then ...
will find Bevan only.
if surname ne: 'Be' then ...
will find Bosley, Bowen, Burden and Bush only.
if surname in: ('Be','Bu') then ...
will find Bevan, Burden and Bush only.
Using the colon modifier with equals (= : ) is commonly referred to as ‘begins with’. BEWARE! What SAS actually does is adjust the character string to be the same length before they can be compared – it will truncate the longer string to the length of the shorter string during the comparison.
So, given the list of surnames – Tom, Tomlinson, Tomson , the following conditional statement will return the results Tom and Tomson
if surname =: 'Tomson' then ...
The surprising result here is that surname=’Tom’ actually meets this condition. In performing this comparison SAS has determined that the value of ‘surname’ is shorter than the right hand side of the equation, and so the right hand side get’s truncated to the same length.
A zero length character used in either in: or =: will always evaluate to 0, or false.
The colon modifier can also be used as a variable name wildcard. For example, if you have a dataset with many variables sharing the same prefix but with a different suffix (e.g. balance_012008, balance_022008, balance_032008 ) the colon modifier can save a lot of typing .
data subset;
set bigdataset (keep=balance: );
...processing statements…
run;

This will keep all the variables that begin with ‘balance’ that are on ‘bigdataset’.
This feature can be used in many ways, such as:
sum (of balance: );
will sum all the variables beginning with ‘balance’
array bal (*) balance:;
creates an array for all ‘balance’ variables
drop balance:;
drops all variables beginning with ‘balance’.

Moving Data Between Libraries

If you would like to move a data set from one library to another, rather than copy it, you can use the MOVE option with either Proc DATASETS or Proc COPY.
First, lets build a simple data set (TABLE1) the WORK library:
data work.table1;
do i = 1 to 10;
output;
end;
run;
Now lets move TABLE1 into the SASUSER library with Proc DATASETS:
proc datasets lib=work;
copy out=sasuser move;
select table1;
quit;
One final example, lets move TABLE1 back into the WORK library, this time with Proc COPY:
proc copy in=sasuser out=work move;
select table1;
run;

Varnum option in Proc contents

Generally when you create a sas dataset by using proc contents on a sas dataset, the output data set is sorted in ascending order by name by default. But if you want to arrange the observations in a same order as the order of columns you can use varnum option in proc contents.
Proc contents data = xyz out = abc varnum;   run;
It creates another column in dataset 'abc' called varnum which numbers the variable name in order which the variable have in data set 'xyz'. The you can sort dataset abc by varnum to get the dataset 'abc' observations arranged in order of the columns in dataset 'xyz'.           

Generally when you create a sas dataset by using proc contents on a sas dataset, the output data set is sorted in ascending order by name by default. But if you want to arrange the observations in a same order as the order of columns you can use varnum option in proc contents.
Proc contents data = xyz out = abc varnum;   run;
It creates another column in dataset 'abc' called varnum which numbers the variable name in order which the variable have in data set 'xyz'. The you can sort dataset abc by varnum to get the dataset 'abc' observations arranged in order of the columns in dataset 'xyz'.                     

Syntax Checking SAS Programs

To check the syntax of a data step or procedure without actually running the code, add a CANCEL keyword to the end of the RUN statement, as shown in the following example:
data work.test;
infile 'external file';
input a b c;
run cancel;
proc print data=work.test;
run cancel;
Each step will validate the "tokens" in your syntax and report any errors to the SAS log. SAS will then stop processing that step, therefore no data are read.
Commonly, a macro variable is created that is resolved in place of the CANCEL keyword. This allows a global method of turning syntax checking on or off. This is demonstrated with the following example:
%let debug=CANCEL;
%*let debug=;
data work.test;
infile 'external file';
input a b c;
run &debug.;
proc print data=work.test;
run &debug.;

Producing Enhanced HTML Reports using the ODS TABLEEDITOR Tagset


The ODS TABLEEDITOR tagset enables us to produce sophisticated HTML reports which exhibit similar characteristics to reports created using Microsoft Excel or Visual Basic .NET. The tagset can be downloaded from the ODS Markup Resources page on the SAS Support site in the form of a ZIP file which contains a host of examples along with the tagsets definition, saved in a TPL file. When extracted and submitted within SAS, this allows us to make use of the TABLEEDITOR tagset as an ODS destination.
This tip aims to demonstrate a couple of examples of using the TABLEEDITOR tagset to significantly enhance the presentation of HTML output.
In our first example the TABLEEDITOR tagset is used to generate HTML output with both autofilters and frozen headers activated. The autofilters have a default width uniformly assigned and have been defined for the first seven columns. Frozen headers have also been activated enabling us to scroll through the data listing whilst still being able to view column headings, in a similar fashion to Excel.
%inc 'tableeditor.tpl';
title "Listing of Product Sales";
ods listing close;
ods tagsets.tableeditor file="Sales_Report_1.html"
style=styles.meadow
options(autofilter="YES"
autofilter_table="1"
autofilter_width="6em"
autofilter_endcol= "7"
frozen_headers="YES"
frozen_rowheaders="YES"
) ;
proc print data=sashelp.prdsale noobs;
var year quarter month country region prodtype
product actual predict;
format actual predict nlmnlgbp12.2;
run;
ods tagsets.tableeditor close;
ods listing;
In order to view the resulting output it will be necessary to permit Internet Explorer to run Active X controls.
In our second example the TABLEEDITOR tagset is used to generate a HTML file with two reports, each of which is displayed on a separate tab which are named in the 'options' supplied on the opening ODS statement.
ods listing close;
ods noproctitle;
ods tagsets.tableeditor file="Sales_Report_2.html"
style=styles.meadow
options(web_tabs="First Report,Second Report");
title "Summary of product sales";
proc means data=sashelp.prdsale mean min max maxdec=2;
class prodtype product;
var actual predict;
run;
title "Mean product sales by country";
proc tabulate data=sashelp.prdsale format=nlmnlgbp12.2;
class prodtype product country;
var actual predict;
table prodtype*product,country*(actual*mean=''
predict*mean='');
run;
ods tagsets.tableeditor close;
ods listing;