Saturday, August 3, 2013

The IN= dataset option with Example

Sometimes we have multiple input datasets, and it's important to know which dataset a given row comes from. Here we have two input datasets, a and b, each sorted by a common matching variable x. We want to count how many observations of the merged result come from each input dataset, and how many come from both (that is, are matched on x).

1) data _null_ ; 
2) merge a ( in = in_a ) 
3) b ( in = in_b ) 
4) end=the_end ; 
5) by x ; 
6) retain num_a num_b num_both 0 ; 
7) num_a + in_a ; 
8) num_b + in_b ; 
9) num_both + ( in_a and in_b ) ; 
10) if the_end 
11) then put / num_a= num_b= num_both= / ; 
12) run ; 
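
To try the step above, datasets a and b must already exist and be sorted by x. Here is a minimal sketch that creates them; the values are made up for illustration and are not from the original example:

```sas
/* Hypothetical sample data: the x values are chosen only for illustration. */
/* Both datasets are entered already in ascending order of x.               */
data a ;
   input x ;
   datalines ;
1
2
3
;
run ;

data b ;
   input x ;
   datalines ;
2
3
4
;
run ;
```

With these particular values, x=2 and x=3 occur in both datasets, so the merge produces four observations and the log note should read num_a=3 num_b=3 num_both=2.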

Notes: 
1) The DATA step normally creates an output dataset. If we just want to do some processing, but not create a dataset, we specify the special "no-dataset dataset" _null_. 

2) and 3) The in= dataset option creates a temporary (non-saved) variable that will be True only if a row comes from that dataset, and False otherwise. For a given input row, in_a will be True only if that row comes from dataset a, and in_b will be True only if it comes from dataset b. Note that since the rows are being merged with a common by variable, an observation could well come from both input datasets.
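
Because the in= variables are not written to any output dataset, one common pattern is to copy them into ordinary variables, or to use them to route rows. Here is a sketch, assuming the datasets a and b described in this post; the output dataset names matched and a_only and the variables from_a and from_b are invented for illustration:

```sas
data matched a_only ;
   merge a ( in = in_a )
         b ( in = in_b ) ;
   by x ;
   from_a = in_a ;   /* ordinary variable, so it is kept in the output */
   from_b = in_b ;
   if in_a and in_b then output matched ;   /* x found in both inputs  */
   else if in_a then output a_only ;        /* x found only in a       */
   /* rows found only in b are not written anywhere in this sketch */
run ;
```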

4) Again we create a temporary variable, the_end this time, that will be True only at the last observation of the input datasets. 

6) We want our accumulators to keep their values between the input observations, and to start out at zero. 
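
As an aside on line 6): a sum statement of the form variable + expression retains its variable and starts it at zero by itself, so a counter also works without an explicit retain. Here is a small sketch, assuming dataset a exists; the names n_rows and eof are introduced here only for illustration:

```sas
data _null_ ;
   set a end = eof ;
   n_rows + 1 ;               /* sum statement: implicitly retained, starts at 0 */
   if eof then put n_rows= ;  /* write the final count to the log */
run ;
```

Spelling out the retain, as the example does, makes the intent explicit to the reader.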

7) and 8) SAS has only numeric and character variables; boolean variables ("True" and "False") are actually numeric: True is 1 and False is 0. So the variables in_a and in_b will have the values 1 or 0, and these values can be used in ordinary arithmetic, such as adding to an accumulator. 
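
A quick way to see this for yourself is to store the results of two comparisons and add them up. This is only an illustrative sketch; the variable names t, f and total are arbitrary:

```sas
data _null_ ;
   t = ( 3 > 2 ) ;       /* a true comparison yields 1   */
   f = ( 3 < 2 ) ;       /* a false comparison yields 0  */
   total = t + t + f ;   /* ordinary arithmetic on the results */
   put t= f= total= ;    /* log shows: t=1 f=0 total=2   */
run ;
```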

9) Boolean variables can also be used in logical expressions, which return the boolean values of 1 or 0. Here, the expression ( in_a and in_b ) is True (=1) if and only if both in_a and in_b are True (=1). The value of the expression can then be used arithmetically in our count. 
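
The same idea extends to other combinations, such as counting rows found in only one of the inputs. Here is a sketch, assuming the datasets a and b described in this post; the counter names num_a_only and num_b_only are introduced here for illustration:

```sas
data _null_ ;
   merge a ( in = in_a )
         b ( in = in_b )
         end = the_end ;
   by x ;
   num_a_only + ( in_a and not in_b ) ;   /* 1 only when the row is from a alone */
   num_b_only + ( in_b and not in_a ) ;   /* 1 only when the row is from b alone */
   if the_end then put num_a_only= num_b_only= ;
run ;
```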

10) Here we use the temporary variable created in line 4). 

11) In this case, we just want a note in the log. This put statement will skip a line, print the values of our three accumulators, and skip another line. 
