2009-06-17

SAS - How merge (mal)functions

The normal MERGE statement in SAS simply fouls up on MANY to MANY relationships. You are far better off using SQL but even then there are pitfalls. Check the examples below.
Data setsSetA
SETA
Alpha   Beta Desc
1000    1    Ax1 Bx2
2000    2    Ax1 Bx0
4000    3    Ax2 Bx1
4000    4    Ax2 Bx1
5000    5    Ax3 Bx2
5000    6    Ax3 Bx2
5000    7    Ax3 Bx2
SetB
SETB
Alpha   Gamma
1000    11
1000    12
3000    13
4000    14
5000    15
5000    16

SAS Merge
Does not perform combinatorial on 5000.
This is rediculous
DATA SetAB01;
MERGE SetA SetB;
BY Alpha;
RUN;
Alpha Beta Desc    Gamma
1000  1    Ax1 Bx2 11
1000  1    Ax1 Bx2 12
2000  2    Ax1 Bx0 
3000               13
4000  3    Ax2 Bx1 14
4000  4    Ax2 Bx1 14
5000  5    Ax3 Bx2 15
5000  6    Ax3 Bx2 16
5000  7    Ax3 Bx2 16
SAS SQL Simple Join
Does not include unmatched.
Where is Alpha=2000,3000?
PROC SQL;
CREATE TABLE SetAB02 AS
SELECT *
FROM SetA, SetB
WHERE SetA.Alpha = SetB.Alpha;
QUIT;
Alpha Beta Desc    Gamma
1000  1    Ax1 Bx2 11
1000  1    Ax1 Bx2 12
4000  3    Ax2 Bx1 14
4000  4    Ax2 Bx1 14
5000  5    Ax3 Bx2 15
5000  6    Ax3 Bx2 15
5000  7    Ax3 Bx2 15
5000  5    Ax3 Bx2 16
5000  6    Ax3 Bx2 16
5000  7    Ax3 Bx2 16
SAS SQL Full Join
Drops the Alpha value for unmatched left.
Where is Alpha=3000?
PROC SQL;
CREATE TABLE SetAB03 AS
SELECT *
FROM SetA FULL JOIN SetB
ON SetA.Alpha = SetB.Alpha;
QUIT;
Alpha Beta Desc    Gamma
1000  1    Ax1 Bx2 11
1000  1    Ax1 Bx2 12
2000  2    Ax1 Bx0    
13
4000  3    Ax2 Bx1 14
4000  4    Ax2 Bx1 14
5000  5    Ax3 Bx2 15
5000  5    Ax3 Bx2 16
5000  6    Ax3 Bx2 15
5000  6    Ax3 Bx2 16
5000  7    Ax3 Bx2 15
5000  7    Ax3 Bx2 16
SAS SQL Full Join Coalesce
Works!
PROC SQL;
CREATE TABLE SetAB03A AS
SELECT
COALESCE(SetA.Alpha, SetB.Alpha)
AS Alpha, *
FROM SetA FULL JOIN SetB
ON SetA.Alpha = SetB.Alpha;
QUIT;
Alpha Beta Desc    Gamma
1000  1    Ax1 Bx2 11
1000  1    Ax1 Bx2 12
2000  2    Ax1 Bx0 
3000               13
4000  3    Ax2 Bx1 14
4000  4    Ax2 Bx1 14
5000  5    Ax3 Bx2 15
5000  5    Ax3 Bx2 16
5000  6    Ax3 Bx2 15
5000  6    Ax3 Bx2 16
5000  7    Ax3 Bx2 15
5000  7    Ax3 Bx2 16
SAS SQL Left Join
Probably what you want.
No left unmatched
Where is 3000?
PROC SQL;
CREATE TABLE SetAB04 AS
SELECT *
FROM SetA LEFT JOIN SetB
ON SetA.Alpha = SetB.Alpha;
QUIT;
Alpha Beta Desc    Gamma
1000  1    Ax1 Bx2 11
1000  1    Ax1 Bx2 12
2000  2    Ax1 Bx0 
4000  3    Ax2 Bx1 14
4000  4    Ax2 Bx1 14
5000  5    Ax3 Bx2 15
5000  6    Ax3 Bx2 15
5000  7    Ax3 Bx2 15
5000  5    Ax3 Bx2 16
5000  6    Ax3 Bx2 16
5000  7    Ax3 Bx2 16

A nice Venn Diagram display of SQL linking