The last time I read the paper "A General Regression Neural Network" by Donald Specht was exactly 10 years ago, when I was in graduate school. After rereading it this week, I decided to code the idea out in SAS macros and make this excellent technique available to the SAS community.
The prototype consists of two SAS macros: %grnn_learn() to train a GRNN and %grnn_pred() to score with it. The famous Boston Housing dataset is used to test both macros, with the results compared against the R implementation below. To simplify the exercise, the smoothing parameter SIGMA is assumed known and set to 0.55.
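For reference, the GRNN prediction in Specht's paper is nothing more than a Gaussian-kernel-weighted average of the training responses. Given training cases $(X_i, Y_i)$, $i = 1, \dots, n$, and the smoothing parameter $\sigma$, the predicted value at a new point $X$ is

$$\hat{Y}(X) = \frac{\sum_{i = 1}^{n} Y_i \exp\left(-D_i^2 / 2\sigma^2\right)}{\sum_{i = 1}^{n} \exp\left(-D_i^2 / 2\sigma^2\right)}, \qquad D_i^2 = (X - X_i)^{T}(X - X_i),$$

which both implementations below evaluate on standardized predictors.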
pkgs <- c('MASS', 'doParallel', 'foreach', 'grnn')
lapply(pkgs, require, character.only = TRUE)
registerDoParallel(cores = 8)

# standardize the 13 predictors and keep the response (medv) on its original scale
data(Boston)
X <- Boston[-14]
st.X <- scale(X)
Y <- Boston[14]
boston <- data.frame(st.X, Y)

# guess() scores a single case at a time, so split the data by row and score in parallel
pred_grnn <- function(x, nn){
  xlst <- split(x, 1:nrow(x))
  pred <- foreach(i = xlst, .combine = rbind) %dopar% {
    data.frame(pred = guess(nn, as.matrix(i)), i, row.names = NULL)
  }
  return(pred)
}

# train the GRNN with the response in the last column and sigma = 0.55;
# store the scores under a new name so the pred_grnn() function is not overwritten
grnn <- smooth(learn(boston, variable.column = ncol(boston)), sigma = 0.55)
pred.boston <- pred_grnn(boston[, -ncol(boston)], grnn)
head(pred.boston$pred, n = 10)
# [1] 24.61559 23.22232 32.29610 32.57700 33.29552 26.73482 21.46017 20.96827
# [9] 16.55537 20.25247
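To see what guess() computes under the hood, the same estimator can also be written out in a few lines of base R. The helper grnn_guess() below is a hypothetical name of mine, not a function in the grnn package; given the formula above, it should reproduce the first prediction.

# a minimal from-scratch sketch of the GRNN estimator; grnn_guess is a
# hypothetical helper, not part of the grnn package
grnn_guess <- function(x0, X, Y, sigma) {
  d2 <- colSums((t(X) - x0) ^ 2)    # squared distance to each training case
  w <- exp(-d2 / (2 * sigma ^ 2))   # gaussian kernel weight of each neuron
  sum(w * Y) / sum(w)               # kernel-weighted average of the response
}

X.mat <- as.matrix(boston[, -ncol(boston)])
Y.vec <- boston[, ncol(boston)]
grnn_guess(X.mat[1, ], X.mat, Y.vec, sigma = 0.55)
# expected to match the first value above, ~24.61559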
The first SAS macro, %grnn_learn() shown below, trains a GRNN. Its purpose is to store the whole specification of a GRNN in a SAS dataset after a simple one-pass training on the development data. Please note that, motivated by the idea of MongoDB, I use a key-value paired scheme, summarized below, to store the GRNN information.
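Reading through the macro logic, the output dataset encodes four types of records by the value of _neuron_:

- _neuron_ = 1, 2, ...: one record per training case per field, holding the response under the key _Y_ and each standardized predictor under its variable name;
- _neuron_ = 0: the smoothing parameter under the key _SIGMA_;
- _neuron_ = -1: the mean of each raw predictor;
- _neuron_ = -2: the standard deviation of each raw predictor.

The means and standard deviations are kept so that %grnn_pred() can standardize new data exactly the way the training data was standardized.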
libname data '';

data data.boston;
  infile 'housing.data';
  input x1 - x13 y;
run;
%macro grnn_learn(data = , x = , y = , sigma = , nn_out = );
options mprint mlogic nocenter;
********************************************************;
* THIS MACRO IS TO TRAIN A GENERAL REGRESSION NEURAL *;
* NETWORK (SPECHT, 1991) AND STORE THE SPECIFICATION *;
*------------------------------------------------------*;
* INPUT PARAMETERS: *;
* DATA : INPUT SAS DATASET *;
* X : A LIST OF PREDICTORS IN THE NUMERIC FORMAT *;
* Y : A RESPONSE VARIABLE IN THE NUMERIC FORMAT *;
* SIGMA : THE SMOOTHING PARAMETER FOR GRNN *;
* NN_OUT: OUTPUT SAS DATASET CONTAINING THE GRNN *;
* SPECIFICATION *;
*------------------------------------------------------*;
* AUTHOR: *;
* WENSUI.LIU@53.COM *;
********************************************************;
* KEEP ONLY COMPLETE CASES WITH A NON-MISSING RESPONSE;
data _tmp1;
  set &data (keep = &x &y);
  where &y ~= .;
  array _x_ &x;
  _miss_ = 0;
  do _i_ = 1 to dim(_x_);
    if _x_[_i_] = . then _miss_ = 1;
  end;
  if _miss_ = 0 then output;
run;

* CALCULATE THE MEAN OF EACH PREDICTOR;
proc summary data = _tmp1;
  output out = _avg_ (drop = _type_ _freq_)
  mean(&x) = ;
run;

* CALCULATE THE STANDARD DEVIATION OF EACH PREDICTOR;
proc summary data = _tmp1;
  output out = _std_ (drop = _type_ _freq_)
  std(&x) = ;
run;

* STANDARDIZE ALL PREDICTORS;
proc standard data = _tmp1 mean = 0 std = 1 out = _data_;
  var &x;
run;

* STORE THE GRNN SPECIFICATION IN KEY-VALUE PAIRS;
data &nn_out (keep = _neuron_ _key_ _value_);
  set _last_ end = eof;
  _neuron_ + 1;
  length _key_ $32;
  array _a_ &y &x;
  do _i_ = 1 to dim(_a_);
    if _i_ = 1 then _key_ = '_Y_';
    else _key_ = upcase(vname(_a_[_i_]));
    _value_ = _a_[_i_];
    output;
  end;
  if eof then do;
    _neuron_ = 0;
    _key_ = "_SIGMA_";
    _value_ = &sigma;
    output;
    set _avg_;
    array _b_ &x;
    do _i_ = 1 to dim(_b_);
      _neuron_ = -1;
      _key_ = upcase(vname(_b_[_i_]));
      _value_ = _b_[_i_];
      output;
    end;
    set _std_;
    array _c_ &x;
    do _i_ = 1 to dim(_c_);
      _neuron_ = -2;
      _key_ = upcase(vname(_c_[_i_]));
      _value_ = _c_[_i_];
      output;
    end;
  end;
run;

* CLEAN UP TEMPORARY DATASETS;
proc datasets library = work;
  delete _: / memtype = data;
run;
quit;
********************************************************;
* END OF THE MACRO *;
********************************************************;
%mend grnn_learn;
%grnn_learn(data = data.boston, x = x1 - x13, y = y, sigma = 0.55, nn_out = data.grnn);
proc print data = data.grnn (obs = 10) noobs;
run;
/* SAS PRINTOUT OF GRNN DATA:
_neuron_ _key_ _value_
1 _Y_ 24.0000
1 X1 -0.4194
1 X2 0.2845
1 X3 -1.2866
1 X4 -0.2723
1 X5 -0.1441
1 X6 0.4133
1 X7 -0.1199
1 X8 0.1401
1 X9 -0.9819
*/
After the GRNN is trained, the macro %grnn_pred() is used to generate predicted values for a test dataset containing all predictors. It standardizes the test predictors with the means and standard deviations stored in the specification, computes the squared distance between each test case and each neuron, converts the distances into Gaussian kernel weights with the stored SIGMA, and returns the kernel-weighted average of the stored responses. As shown in the printout further below, the first 10 predicted values are identical to those generated with R.
libname data '';
%macro grnn_pred(data = , x = , id = NA, nn_in = , out = grnn_pred);
options mprint mlogic nocenter;
********************************************************;
* THIS MACRO IS TO GENERATE PREDICTED VALUES BASED ON *;
* THE SPECIFICATION OF GRNN CREATED BY THE %GRNN_LEARN *;
* MACRO *;
*------------------------------------------------------*;
* INPUT PARAMETERS: *;
* DATA : INPUT SAS DATASET *;
* X : A LIST OF PREDICTORS IN THE NUMERIC FORMAT *;
* ID : AN ID VARIABLE (OPTIONAL) *;
* NN_IN: INPUT SAS DATASET CONTAINING THE GRNN *;
* SPECIFICATION GENERATED FROM %GRNN_LEARN *;
* OUT : OUTPUT SAS DATASET WITH GRNN PREDICTIONS *;
*------------------------------------------------------*;
* AUTHOR: *;
* WENSUI.LIU@53.COM *;
********************************************************;
* KEEP ONLY COMPLETE CASES;
data _data_;
  set &data;
  array _x_ &x;
  _miss_ = 0;
  do _i_ = 1 to dim(_x_);
    if _x_[_i_] = . then _miss_ = 1;
  end;
  if _miss_ = 0 then output;
run;

* ASSIGN AN ID TO EACH TEST CASE;
data _data_;
  set _last_ (drop = _miss_);
  %if &id = NA %then %do;
    _id_ + 1;
  %end;
  %else %do;
    _id_ = &id;
  %end;
run;

proc sort data = _last_ sortsize = max nodupkey;
  by _id_;
run;

* TRANSPOSE TEST CASES INTO KEY-VALUE PAIRS;
data _data_ (keep = _id_ _key_ _value_);
  set _last_;
  array _x_ &x;
  length _key_ $32;
  do _i_ = 1 to dim(_x_);
    _key_ = upcase(vname(_x_[_i_]));
    _value_ = _x_[_i_];
    output;
  end;
run;

proc sql noprint;
  /* RETRIEVE THE SQUARED SIGMA FROM THE GRNN SPECIFICATION */
  select _value_ ** 2 into :s2 from &nn_in where _neuron_ = 0;

  /* STANDARDIZE TEST PREDICTORS WITH THE STORED MEANS AND STD DEVS */
  create table _last_ as
  select
    a._id_,
    a._key_,
    (a._value_ - b._value_) / c._value_ as _value_
  from
    _last_ as a,
    &nn_in as b,
    &nn_in as c
  where
    compress(a._key_, ' ') = compress(b._key_, ' ') and
    compress(a._key_, ' ') = compress(c._key_, ' ') and
    b._neuron_ = -1 and
    c._neuron_ = -2;

  /* CALCULATE THE SQUARED DISTANCE AND KERNEL WEIGHT FOR EACH NEURON */
  create table _last_ as
  select
    a._id_,
    b._neuron_,
    sum((a._value_ - b._value_) ** 2) as d2,
    mean(c._value_) as y,
    exp(-(calculated d2) / (2 * &s2)) as exp
  from
    _last_ as a,
    &nn_in as b,
    &nn_in as c
  where
    compress(a._key_, ' ') = compress(b._key_, ' ') and
    b._neuron_ = c._neuron_ and
    b._neuron_ > 0 and
    c._key_ = '_Y_'
  group by
    a._id_, b._neuron_;

  /* GENERATE THE PREDICTION AS A KERNEL-WEIGHTED AVERAGE OF RESPONSES */
  create table _last_ as
  select
    a._id_,
    sum(a.y * a.exp / b.sum_exp) as _pred_
  from
    _last_ as a inner join (select _id_, sum(exp) as sum_exp from _last_ group by _id_) as b
  on
    a._id_ = b._id_
  group by
    a._id_;
quit;

proc sort data = _last_ out = &out sortsize = max;
  by _id_;
run;
********************************************************;
* END OF THE MACRO *;
********************************************************;
%mend grnn_pred;
%grnn_pred(data = data.boston, x = x1 - x13, nn_in = data.grnn);
proc print data = grnn_pred (obs = 10) noobs;
run;
/* SAS PRINTOUT:
_id_ _pred_
1 24.6156
2 23.2223
3 32.2961
4 32.5770
5 33.2955
6 26.7348
7 21.4602
8 20.9683
9 16.5554
10 20.2525
*/
After developing these two macros, I also compared the predictive performance of GRNN against OLS regression. It turns out that GRNN consistently outperforms OLS regression across a wide range of SIGMA values. With a reasonable choice of SIGMA, even a GRNN developed with 10% of the Boston Housing dataset generalizes well and yields an R^2 > 0.8 on the remaining 90% of the data.
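For readers who would like to replicate the comparison in R, below is a minimal sketch; the seed, the 10/90 split, and the sigma grid are illustrative assumptions of mine rather than the exact setup behind the numbers quoted above.

# sketch of the holdout experiment: train a GRNN on 10% of the data and
# compute R^2 on the remaining 90% over a small grid of sigma values
set.seed(2013)
idx <- sample(nrow(boston), size = round(0.1 * nrow(boston)))
train <- boston[idx, ]
test <- boston[-idx, ]

r2 <- function(y, yhat) 1 - sum((y - yhat) ^ 2) / sum((y - mean(y)) ^ 2)

for (s in c(0.3, 0.55, 1, 2)) {
  nn <- smooth(learn(train, variable.column = ncol(train)), sigma = s)
  pred <- pred_grnn(test[, -ncol(test)], nn)
  cat("sigma =", s, "R2 =", r2(test[, ncol(test)], pred$pred), "\n")
}

# OLS benchmark on the same split
ols <- lm(medv ~ ., data = train)
r2(test$medv, predict(ols, newdata = test))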
