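In linear regression, the confidence interval (CI) of the population (mean) DV is narrower than that of a predicted (re-sampled) DV. Under the assumption of generalizability, in simple regression with $N$ observations, residual standard error $s$, and $S_{xx} = \sum_{i=1}^{N}(x_i - \bar{x})^2$, the CI of $E(Y \mid x_0)$ at $x_0$ is

$$\hat{y}_0 \pm t_{1-\alpha/2,\,N-2}\; s\sqrt{\frac{1}{N} + \frac{(x_0 - \bar{x})^2}{S_{xx}}},$$

while the CI of a new observation $y_0$ at $x_0$ is

$$\hat{y}_0 \pm t_{1-\alpha/2,\,N-2}\; s\sqrt{1 + \frac{1}{N} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}.$$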
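The pivots of both are quite similar (with $s$ the residual standard error on $N-2$ degrees of freedom):

$$\frac{\hat{y}_0 - E(Y \mid x_0)}{s\sqrt{\frac{1}{N} + \frac{(x_0 - \bar{x})^2}{\sum_{i}(x_i - \bar{x})^2}}} \sim t_{N-2},$$

so the first interval covers $E(Y \mid x_0)$ with probability $1-\alpha$;

$$\frac{\hat{y}_0 - y_0}{s\sqrt{1 + \frac{1}{N} + \frac{(x_0 - \bar{x})^2}{\sum_{i}(x_i - \bar{x})^2}}} \sim t_{N-2},$$

so the second interval covers $y_0$ with probability $1-\alpha$. The extra 1 under the square root comes from the variance $\sigma^2$ of the new observation's own error term, which is independent of $\hat{y}_0$; this is why the second interval is wider.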
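To see the two pivots at work, here is a minimal R sketch that computes both intervals by hand at a single point and checks them against predict.lm(). The simulated data mimic the parameters used in the code later in this post; the point x0 = 0.5 and the names se.mean and se.new are illustrative choices, not part of the original analysis.

```r
## Hand-computed 95% intervals at x0 for simple regression,
## compared with predict.lm(). Simulated data, for illustration only.
set.seed(1)
N <- 80
x <- runif(N)
y <- 1.73 * x + 0.32 * rnorm(N)
fit <- lm(y ~ x)

x0 <- 0.5                         # hypothetical new IV value
alpha <- 0.05
s <- summary(fit)$sigma           # residual standard error, N - 2 df
Sxx <- sum((x - mean(x))^2)
y0.hat <- unname(coef(fit)[1] + coef(fit)[2] * x0)
t.crit <- qt(1 - alpha / 2, df = N - 2)

## Standard errors from the two pivots
se.mean <- s * sqrt(1 / N + (x0 - mean(x))^2 / Sxx)      # for E(Y | x0)
se.new  <- s * sqrt(1 + 1 / N + (x0 - mean(x))^2 / Sxx)  # for a new y0

ci.mean <- y0.hat + c(-1, 1) * t.crit * se.mean
pi.new  <- y0.hat + c(-1, 1) * t.crit * se.new

## The same intervals from predict.lm()
nd <- data.frame(x = x0)
ci.lm <- predict(fit, nd, interval = "confidence", level = 1 - alpha)[1, c("lwr", "upr")]
pi.lm <- predict(fit, nd, interval = "prediction", level = 1 - alpha)[1, c("lwr", "upr")]

print(rbind(ci.mean, ci.lm, pi.new, pi.lm))
```

Both rows of each pair should agree, and the prediction interval should be visibly wider at every x0.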
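The $R^2$ of linear regression is the point estimate of $\eta^2$, the population proportion of DV variance explained, for the fixed-IV(s) model. Alternatively, it is the point estimate of $\rho^2$, wherein $\rho$ denotes the correlation of $Y$ and $\boldsymbol{\beta}^{\top}\mathbf{X}$, the linear composite of the random IV(s) $\mathbf{X}$. The CI of $\rho^2$ is wider than that of $\eta^2$ with the same $R^2$, $N$, and confidence level.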
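For the fixed-IV case, the CI of $\eta^2$ can be obtained by inverting the noncentral F distribution of the overall F statistic. The R sketch below does this with pf() and uniroot(). It assumes the noncentrality convention $\lambda = N\eta^2/(1-\eta^2)$, so that $\eta^2 = \lambda/(\lambda + N)$; the helper name ci_eta2 is hypothetical. Under that convention its output should be close to MBESS's ci.R2() with Random.Predictors = FALSE, though I have not verified that the two agree to every digit.

```r
## Sketch: CI of eta^2 (fixed IVs) by inverting the noncentral F.
## Assumed convention: lambda = N * eta^2 / (1 - eta^2).
ci_eta2 <- function(R2, N, p, conf.level = 0.95) {
  df1 <- p
  df2 <- N - p - 1
  F.obs <- (R2 / df1) / ((1 - R2) / df2)
  a <- (1 - conf.level) / 2
  ## pf() decreases in ncp, so solve pf(F.obs; ncp) = target for ncp
  solve_ncp <- function(target) {
    g <- function(ncp) pf(F.obs, df1, df2, ncp = ncp) - target
    if (g(0) <= 0) return(0)          # target unreachable: bound sits at 0
    hi <- 1
    while (g(hi) > 0) hi <- hi * 2    # bracket the root
    uniroot(g, c(0, hi), tol = 1e-10)$root
  }
  lambda <- c(lower = solve_ncp(1 - a), upper = solve_ncp(a))
  lambda / (lambda + N)               # back-transform to eta^2
}

ci_eta2(R2 = 0.30, N = 80, p = 1)
```

The lower limit uses the noncentrality at which the observed F sits at the $1-\alpha/2$ quantile, the upper limit the one at which it sits at the $\alpha/2$ quantile; when even $\lambda = 0$ cannot reach the target, the lower limit is truncated at 0.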
[update] Clearly, the CI of $\rho^2$ relies on distributional assumptions about the IV(s) and the DV, since fixed IV(s) are just a special case of random IV(s). Usually, the assumption is that all IV(s) and the DV follow a multivariate normal distribution.
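A quick Monte Carlo sketch in R makes the fixed-vs-random distinction concrete. With the same population $\rho^2 = 0.5$, $R^2$ should vary more across replications when the IV is re-drawn each time than when one design is held fixed, because the sample variance of the IV itself fluctuates. The normal IV and error, $N = 30$, the number of replications, and the seed are assumptions chosen only for illustration.

```r
## Monte Carlo sketch: spread of R^2 with a fixed vs. a random IV.
## Assumed model: y = b * x + e, e ~ N(0, 1), b = 1, so rho^2 = 0.5 when x ~ N(0, 1).
set.seed(2)
N <- 30
reps <- 4000
b <- 1
x.fixed <- qnorm(ppoints(N))          # one fixed design, reused every time
x.fixed <- x.fixed / sd(x.fixed)      # sample variance exactly 1

r2_once <- function(x) {
  y <- b * x + rnorm(length(x))
  cor(x, y)^2                         # equals R^2 of lm(y ~ x)
}
r2.fixed  <- replicate(reps, r2_once(x.fixed))
r2.random <- replicate(reps, r2_once(rnorm(N)))

c(sd.fixed = sd(r2.fixed), sd.random = sd(r2.random))
```

The larger spread under random IVs is the extra sampling variability that a CI of $\rho^2$ has to absorb and a CI of $\eta^2$ does not.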
In the bivariate normal case with a single random IV, a CI of the re-sampled $R^2$ (the $r^2$ of an independent replication of the same size $N$) can also be constructed through Fisher's z-transform of Pearson's $r$. Intuitively, it should be wider than the CI of $\rho^2$.
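Approximately, $\tanh^{-1}(r) \sim N\left(\tanh^{-1}(\rho),\ \frac{1}{N-3}\right)$ for each sample. Thus, for the observed $r$ and an independent replicate $r_{\text{new}}$,

$$\tanh^{-1}(r_{\text{new}}) - \tanh^{-1}(r) \sim N\left(0,\ \frac{2}{N-3}\right),$$

and, with $q = \Phi^{-1}(1-\alpha/2)$, the CI of $\tanh^{-1}(r_{\text{new}})$ can be constructed as

$$\tanh^{-1}(r) \pm q\sqrt{\frac{2}{N-3}}.$$

With the reverse transform $r = \tanh(z)$ and $r = \sqrt{R^2}$, the CI bounds of $R^2_{\text{new}}$ are

$$\left[\max\left(0,\ \tanh\left(\tanh^{-1}(r) - q\sqrt{\tfrac{2}{N-3}}\right)\right)\right]^2$$

and

$$\left[\tanh\left(\tanh^{-1}(r) + q\sqrt{\tfrac{2}{N-3}}\right)\right]^2.$$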
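The re-sampled $R^2$ interval translates directly into R. The sketch below wraps it in a hypothetical helper ci_r2_new() and, assuming a bivariate normal population with $\rho = 0.5$ and $N = 80$, checks by simulation how often the $r^2$ of an independent replication lands inside the interval built from the first sample. If the Fisher approximation is adequate, the proportion should come out near the nominal 95%.

```r
## Sketch: Fisher-z interval for the R^2 (= r^2) of a same-size replication.
ci_r2_new <- function(R2, N, conf.level = 0.95) {
  z <- atanh(sqrt(R2))                              # uses r = +sqrt(R^2)
  half <- qnorm(0.5 + conf.level / 2) * sqrt(2 / (N - 3))
  c(lower = max(0, tanh(z - half))^2,               # truncate r at 0 before squaring
    upper = tanh(z + half)^2)
}

## Monte Carlo coverage check under an assumed bivariate normal, rho = 0.5
set.seed(3)
N <- 80; rho <- 0.5; reps <- 2000
draw_r <- function() {
  x <- rnorm(N)
  y <- rho * x + sqrt(1 - rho^2) * rnorm(N)
  cor(x, y)
}
hit <- replicate(reps, {
  ci <- ci_r2_new(draw_r()^2, N)    # interval from the "original" sample
  r2.new <- draw_r()^2              # an independent replication
  ci["lower"] <= r2.new && r2.new <= ci["upper"]
})
mean(hit)  # proportion of replications covered
```

The variance $2/(N-3)$ is what makes this interval wider than a Fisher-z CI of $\rho$ itself, which would use $1/(N-3)$.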
With $p$ IVs, Fisher's z-transform is applied to the multiple correlation $R = \sqrt{R^2}$: $z = \tanh^{-1}(R) = \frac{1}{2}\ln\frac{1+R}{1-R}$.
Although it could also be used to construct a CI of $\rho^2$, it is inferior to the noncentral F approximation of the distribution of $R$ (Lee, 1971). The latter is the algorithm adopted by the MS-DOS software R2 (Steiger & Fouladi, 1992) and by the R function ci.R2(…) in the package MBESS (Kelley, 2008).
In the literature, “CI(s) of R-square” are rarely the literal CI(s) of $R^2$ in one more replication. Most of them actually refer to the CI of $\rho^2$. Authors in the social sciences who are unfamiliar with $\rho$ avoid typing $\rho$ when it is more convenient to type r or R. Users of experimentally designed, fixed IV(s) should have reported the CI of $\eta^2$. However, if they were so familiar with Steiger's software R2 that they ignored his series of papers on CIs of effect sizes, there is a significant chance they reported a looser (wider) CI of $\rho^2$, even under the looser name “CI of $R^2$”.
----
Kelley, K. (2008). MBESS: Methods for the Behavioral, Educational, and Social Sciences (R package version 1.0.1) [Computer software]. Available from http://www.indiana.edu/~kenkel
Lee, Y. S. (1971). Some results on the sampling distribution of the multiple correlation coefficient. Journal of the Royal Statistical Society, Series B, 33, 117–130.
Steiger, J. H., & Fouladi, R. T. (1992). R2: A computer program for interval estimation, power calculation, and hypothesis testing for the squared multiple correlation. Behavior Research Methods, Instruments, & Computers, 24, 581–582.
conf.level <- .95
##
## Parameters for Part I
## CI of population (mean) DV vs. interval of a re-sampled (new) DV
beta  <- 1.73
sigma <- 0.32
N     <- 80
x     <- runif(N)
new   <- data.frame(x = seq(0, 5, 0.1))  # deliberately extends past the observed x range [0, 1]
##
oldpar <- par(mfrow = c(1, 2))           # two panels side by side; old settings saved
##
y   <- beta * x + sigma * rnorm(N)
lm1 <- lm(y ~ x)
pred.pred <- predict(lm1, new, interval = "prediction", level = conf.level)  # new DV
pred.conf <- predict(lm1, new, interval = "confidence", level = conf.level)  # mean DV
matplot(new$x, pred.pred, type = "l", col = "red",
        main = "Blue: CI of population\nRed: CI of new sample",
        xlab = "", ylab = "")
matplot(new$x, pred.conf, type = "l", col = "blue", add = TRUE)
points(x, y)
## End of Part I.
##
## Parameters for Part II.
## CIs of eta^2, rho^2, vs. re-sampled R^2
(R2 <- summary(lm1)$r.squared)   ## observed R-square
## N: sample size, as in Part I
p <- 1                           ## number of predictors, excluding the intercept
##
library(MBESS)
## Recent MBESS versions name the predictor count K in ci.R2();
## the 2008 version (1.0.1) called it p.
ci <- ci.R2(conf.level = conf.level, R2 = R2, N = N, K = p,
            Random.Predictors = FALSE)   # fixed IVs: CI of eta^2
ci.eta2 <- c(ci$Lower.Conf.Limit.R2, ci$Upper.Conf.Limit.R2)
ci <- ci.R2(conf.level = conf.level, R2 = R2, N = N, K = p,
            Random.Predictors = TRUE)    # random IVs: CI of rho^2
ci.rho2 <- c(ci$Lower.Conf.Limit.R2, ci$Upper.Conf.Limit.R2)
## Fisher-z CI of R^2 in a same-size replication (single IV only).
## Named ci.R2new so it does not shadow MBESS's ci.R2().
z.half   <- sqrt(2 / (N - 3)) * qnorm(0.5 + conf.level / 2)
ci.R2new <- c(max(0, tanh(atanh(sqrt(R2)) - z.half))^2,
              tanh(atanh(sqrt(R2)) + z.half)^2)
ci3 <- data.frame(CI        = c(ci.eta2, ci.rho2, ci.R2new),
                  Type      = factor(rep(c("eta2", "rho2", "R2"), each = 2),
                                     levels = c("eta2", "rho2", "R2")),
                  Direction = rep(c("Lower", "Upper"), 3))
tapply(ci3$CI, list(ci3$Type, ci3$Direction), mean)
## To plot CI bars with boxplot():
## let Q1, Q2, Q3 = R2 and Q0, Q4 = CI bounds
ci3.R2 <- ci3; ci3.R2$CI[] <- R2
boxplot(CI ~ Type, range = 0,            # whiskers reach the CI bounds
        data = rbind(ci3, ci3.R2, ci3.R2), xlab = "",
        ylab = paste(conf.level, "CI"),
        main = paste("R2 =", round(R2, 4), " N =", N, "\nSingle predictor"),
        xlim = c(0, 4), ylim = c(0, 1))
abline(h = R2)
##
par(oldpar)                              # restore the saved graphics settings
##
