Developing normal pdf from symmetry & independence

When I was in the 3rd grade of my middle school, I enjoyed my town bookstore as a standing library. There a series of six math-story books by Zhang Yuan-Nan impressed me a lot. I cited a case from one in my PPT when I taught the normal distribution — the normal pdf can be derived from simple symmetry & independence conditions.

Today I can even google out an illegal pdf of its new edition to verify the case (2005, pp. 89). Actually I have bought the new edition series (now 3 books) and lent them to students. Those conditions are as instinctive as–

1. For white noise errors on 2-D, the independence means pdf at  (x,y) is the product of 1-D pdf, that is,  \phi\left(x\right)\phi\left(y\right) .

2. The symmetry means pdf at  (x,y) is just a function of  x^{2}+y^{2}, nothing to do with direction. That is,  \phi\left(x\right)\phi\left(y\right)=f\left(x^{2}+y^{2}\right).

So,   f\left(x^{2}\right)f\left(y^{2}\right)=f\left(x^{2}+0\right)f\left(0+y^{2}\right)=\phi^{2}\left(0\right)f\left(x^{2}+y^{2}\right).

For middle school students, the book stated a gap here to arrive at the final result   f\left(x^{2}\right)=ke^{bx^{2}}, which is    \phi\left(x\right)=\frac{1}{\phi\left(0\right)}f\left(x^{2}+0\right)=\frac{k}{\phi\left(0\right)}e^{bx^{2}}.

I think non-math graduate students with interests can close the gap by themselves with following small hints.

Denote  \alpha=x^{2},\beta=y^{2}.
We have
 \log f\left(\alpha\right)+\log f\left(\beta\right)=\log\phi^{2}\left(0\right)+\log f\left(\alpha+\beta\right),
or
 \;\;\left[\log f\left(\alpha\right)-\log\phi^{2}\left(0\right)\right]+\left[\log f\left(\beta\right)-\log\phi^{2}\left(0\right)\right]    =\left(\log f\left(\alpha+\beta\right)-\log\phi^{2}\left(0\right)\right).
Denote  g\left(\alpha\right)=\log f\left(\alpha\right)-\log\phi^{2}\left(0\right) .
That is,   g\left(\alpha\right)+g\left(\beta\right)=g\left(\alpha+\beta\right).

Now to prove  g\left(\frac{m}{n}\right)=\frac{m}{n}g\left(1\right),\forall m,n\in\mathbb{N}. With continuousness, it gets  g\left(\alpha\right)=\alpha g\left(1\right),\forall\alpha\in\mathbb{R}.

愉快地发现SciTE和LyX在WinXP下都支持中文

愉快地发现我在WinXP上最常用的编辑工具SciTE(当前版本1.75) 和 LyX(当前版本1.5.3) 都支持unicode(也就是说,支持中文)。之前不了解,只因为缺省设置不支持中文。需要手工操作修改设置。

SciTE的设置是Options->Open Global Options File,编辑SciTEGlobal.properties,找到如下段落

# Unicode
#code.page=65001
code.page=0
#character.set=204
# Required for Unicode to work on GTK+:
#LC_CTYPE=en_US.UTF-8
#output.code.page=65001

修改为

# Unicode
code.page=65001
#code.page=0
character.set=204
# Required for Unicode to work on GTK+:
LC_CTYPE=en_US.UTF-8
output.code.page=65001

保存。然后关闭再打开SciTE,就会发现不再出现中文被切一半的现象。如果编辑的文档格式不是utf-8而是ucs-2 ,还可以在File->Encoding 里临时选。

[update] 除了utf-8, SciTE 还支持国内更常用的GBK码,设置如下:

code.page=936
output.code.page=936
character.set=134

此外,我还推荐把SciTEGlobal.properties文件中的line.margin.visible=1 和 wrap=1 两处的注释#号去掉,效果是缺省显示行号,并使超长的行折行显示。SciTE的优点太多了–开源免费;轻巧,启动快;支持Ctrl+鼠标中轮滚动无级缩放;支持Ctrl+回车 前文已出现过的拼写自动补齐选项;支持Alt键方形选段;…

LyX(版本>=1.5.1)在winXP已经可以在.lyx文件正文和公式框中录入中文。麻烦的是输出中文的pdf。[update]在LyX1.6.0下,需要在菜单Layout->Document->Language->选Chinese(simplified),并勾选 Use language’s default encoding设为English + GBK encoding组合(发现Language选项只影响F7拼写检查,而Encoding只影响pdf/dvi生成;还可以点Save as Document Defaults设为今后的缺省)。我试了若干方案,最后才试成功用南开网站上的MiKTeX+中文插件(Patches4miktex.exe),实现了输出正文的中文为pdf,美中不足是公式框中还不能实现中文公式输出pdf,甚至在Document中的中文设置选定后,公式框中的中文都不能在LyX环境中正确显示,期待后续的LyX版本处理好这个问题[update]公式框中的中文只需要再ctrl-M一次即可。例如,\frac{\mbox{分子}}{\mbox{分母}}可以输出,而\frac{分子}{分母}不行。

新版本LyX已经引入了文档版本控制,相当于word中的revision功能,有待深入试用。目前LyX仍不支持Ctrl+鼠标中轮滚动无级缩放,如果公式显得太小,需要在菜单设置中修改显示缩放比例:Tools->Preferences->Look and feel->Screen fonts->Zoom %。这可能是比较容易在后续版本中实现的功能。

相关网址:

SciTE主页 http://www.scintilla.org/SciTE.html

LyX主页 http://lyx.org/

南开MiKTeX中文插件 http://miktex.math.nankai.edu.cn/

我为Wordpress / Wordpress MU 系列平台制作的支持暗背景LaTeX小插件 http://lixiaoxu.lxxm.com/latex_math_cgi

My first wp plugin work: LaTeX_Math_cgi 1.0


Download: LaTeX_Math_cgi.zip

Install: Download  and unzip it to your wordpress’ /wp-content/plugins/ directory.

Activate it in Plugins menu. View its options in Options > LaTeX tab –

It is actually used as a mimeTeX plugin rather than just a  L^{A}T_{E}X plugin. My contribution is technically trivial. But, you must need it if you change from a light theme to a cool black one while find the default mimeTeX images are black in front and transparent in background. This plugin provides mimeTeX’s \reverse and \opaque options to tune your math forms to your wp theme without editing them one by one.

The 2nd function is the option for your cgi URL. The default is http://www.forkosh.dreamhost.com/mimetex.cgi You can DIY one following Forkosh’s instructions and then set it like /cgi-bin/mimetex.cgi It can help your visitors not to cost Forkosh’s unselfish bandwidth.

Understanding QQ plots

## Try distributions like rchisq, rt, runif, rf to view its heavy, or light, left, or right tail.


n <- 30;
ry <- rnorm(n);
qqnorm(ry);qqline(ry);
max(ry)
min(ry)
##view and guess what are x(s) and y(s)
I <- rep(1,n);
qr <- ((ry%*%t(I) > I %*% t(ry))+.5*(ry %*% t(I) == I%*%t(ry)))%*%I *(1/n);##qr are the sample quantiles
points(qr,ry,col="blue"); ##to view the fact, try the following
points(qr,qr*0,col="green",pch="|");
rx <- qnorm(qr);
points(rx,ry,col="red",pch="O");
##Red O(s) circle black o(s) exactly.

03DEC2007 R-workshop sponsored by dept of psy, ZSU(=SYSU, Guang-Zhou)

Here is the updated PPT for the talk in the afternoon–which includes the zipped example codes and set-up steps for the workshop in the evening within the 3rd page. The listed anonymous on-line test (result statistics) on p-value interpretation was cited indirectly From Gigerenzer, Krauss, & Vitouch (2004).

There is an advert on http://www.psy.sysu.edu.cn/detail_news.asp?id=258 and a formal CV of the speaker is available on http://lixiaoxu.googlepageS.com

Classic Neyman-Pearson approach demo

It notes here that N-P approach does not utilize the information in the accurate p value. Actually, at the time N-P approach was firstly devised, the accurate p value was not available usually. Now almost all statistic softwares provide accurate p values and the N-P approach becomes obsolete. Wilkinson & APA TFSI (1999) recommended to report the accurate p value rather than just significance/insignificance, unless p is smaller than any meaningful precision.

p_prior<-0.22;
##try p_prior<-0.82 to demo significant repetition once more
p_alpha<-0.05;
p_power<-0.8;
##-----------------
op <- par(mfrow=c(1,2));
h_1_s<-p_power*p_prior/(p_alpha*(1-p_prior)+p_power*p_prior);
h_0_i<-(1-p_alpha)*(1-p_prior)/((1-p_alpha)*(1-p_prior)+(1-p_power)*p_prior);
##
D<-matrix(c(1-p_alpha,p_alpha,1-p_power,p_power),nrow=2);
rownames(D)<-c("Insig","Sig");
colnames(D)<-c(paste(1-p_prior,"H_0"),paste(p_prior,"H_1"));
barplot(D
,width=c(1-p_prior,p_prior)
,legend = rownames(D)
,space=0
,col=c(gray(0.6),gray(0.9)),asp=1
);
title(main=paste("Sig: H_1 from",p_prior,"to"
,round(h_1_s,digits=2)
,"\n","Insig: H_0 from",1-p_prior,"to"
,round(h_0_i,digits=2)
)
,sub=paste(round(h_1_s,digits=2),"=(",p_power,"*",p_prior,")/("
,p_alpha,"*",1-p_prior,"+",p_power,"*",p_prior,")"
,"\n",round(h_0_i,digits=2)
,"=(",1-p_alpha,"*",1-p_prior,")/("
,1-p_alpha,"*",1-p_prior,"+",1-p_power,"*",p_prior,")"
)
);
##
v_power<-c(0.1,(5:9)/10);
v_prior<-(1:1000)/1000;
n<-length(v_power);
v_c<-sample(colors())[1:n];
plot(v_prior,v_prior,type="l",col="black"
,xlab="a_prior",ylab="post_hoc",asp=1
,xlim=c(0,1),ylim=c(0,1),main="Power=.1,.5,.6,.7,.8, and .9");
lines(v_prior,1-v_prior);
for (i in 1:n){
p_power<-v_power[i];
v_1_s<-p_power*v_prior/(p_alpha*(1-v_prior)+p_power*v_prior);
v_0_i<-(1-p_alpha)*(1-v_prior)/((1-p_alpha)*(1-v_prior)+(1-p_power)*v_prior);
lines(v_prior,v_1_s,col=v_c[i])
lines(v_prior,v_0_i,col=v_c[i])
}
points(p_prior,h_1_s,col="red");
points(p_prior,p_prior,col="red");
points(p_prior,h_0_i,col="blue");
points(p_prior,1-p_prior,col="blue");
par(op);

Wilkinson, L. & APA TFSI (1999). Statistical methods in psychology journals: Guidelines and explanations. American Psychologist, 54, 594-604.

Different corr(s) of different IV scopes with same regression coef

 Y=\alpha+\beta X+\varepsilon,\;\varepsilon\sim N\left(0,\sigma^{2}\right)

With  \alpha,\beta, \sigma known in the linear relationship, can the correlation in the scatter plot of Y against X be estimated from the linear formula?

You may recall in Hierarchical Linear Model class, the scopes of the W dramatically impact the regression coefficients of F~W in the following R demo (hlm.jpg). While this time the regression coefficient has been fixed to a known  \beta. So the scopes of X would never impact the regression coefficient. However, it proved that the correlation r could range from zero to unit (or -1) according to the variance of X in the final close form  r=\frac{\beta\mbox{Var}\left(X\right)}{\mbox{Std}\left(Y\right)\mbox{Std}\left(X\right)}=\beta\frac{\mbox{Std}\left(X\right)}{\sqrt{\beta^{2}\mbox{Var}\left(X\right)+\sigma^{2}}}.

Let me quote as the final words from Cohen (1994; p.1001; Where the role of IV is replaced by that of DV within typical contexts like ANOVA) –

… standardized effect size measures, such as d and f, developed in power analysis (Cohen, 1988) are, like correlations, also dependent on population variability of the dependent variable and are properly used only when that fact is kept in mind.

Cohen, J. (1994). The earth is round (p<.05). American Psychologist, 49, 997-1003.

Compare to the following case: different corr(s) of different IV scopes with hierarchical regression coefficients –

### -----
m<-200; ## NO of Inc.
MeanW<-runif(m,max=10000,min=2000);
MeanF<-(MeanW-mean(MeanW))/sd(MeanW) *sqrt(0.8)+sqrt(0.2)*rnorm(m);
Beta<-(MeanW-mean(MeanW))/sd(MeanW)*sqrt(0.8)+sqrt(0.2)*rnorm(m);
Beta<- (-Beta)*2/(max(Beta)-min(Beta)); ## Beta ranges from 1 to 0
J<-c(1:m);
##
## to view Inc. level data
## plot(data.frame(MeanW,MeanF,Beta));
##
n<-6; ## Each Inc. includes n samples.
s<-c(rep(1,n));
MeanW<-c(t(MeanW%*%t(s)));
MeanF<-c(t(MeanF%*%t(s)));
Beta<-c(t(Beta%*%t(s)));
J<-c(t(J%*%t(s)));
###
CW<-rnorm(m*n,sd=100);
CF<-CW*Beta/(sd(MeanW)*sd(MeanF))+rnorm(m*n,sd=0.02);
W<-MeanW+CW;
F<-MeanF+CF;
###
df<-data.frame(W,F,CW,CF,MeanW,MeanF,J);
##
## to valify it in SAS Proc Mixed or SPSS Mixed Model
## write.csv(df,choose.files());
##
plot(W,F,col=sample(colors())[J]); ##
##
library(nlme);
df.int <- groupedData( F ~ CW | J, data=df) ;
## HLM
## F_{i,j}=\beta_{0,j}+\beta_{1,j}*CW + r_{i,j}
## \beta_{0,j}=\gamma_{0,0} + \gamma_{0,1}*MeanW + u_{0,j}
## \beta_{1,j}=\gamma_{1,0} + \gamma_{1,1}*MeanW + u_{1,j}
## Mixed Model
## F_{i,j}=\gamma_{0,0} + \gamma_{0,1}*MeanW + u_{0,j} +(\gamma_{1,0} + \gamma_{1,1}*MeanW + u_{1,j} ) *CW + r_{i,j}
##
lm_1 <- lme(F ~ MeanW+CW+MeanW:CW, data=df.int, random = ~1 + CW | J) ;
summary(lm_1);
intervals(lm_1);
anova(lm_1);
## plot(lm_1); ## view the residual plot
###################

“Effect Size” — same data, different interpretations


d<-32; ## Try d<-20 !
## to reduce the death rate by d%, From (50+d/2)% to (50-d/2)%
y<-c(rep("Live",50+d/2),rep("Death",50-d/2));
y<-c(y,rep("Live",50-d/2),rep("Death",50+d/2));
y<-(y=="Live"); ## TRUE vs FALSE
x<-c(rep("Treatment",100),rep("Control",100));
x<-(x=="Treatment");
## correlation^2
cor(x,y,method="pearson")^2
cor(x,y,method="spearman")^2
cor(x,y,method="kendall")^2
## R^2 of linear regression with norminal IV
## should we use logistic regression?
## However, R^2 is not available in GLM.
## summary(lm(y~x))
## names(summary(lm(y~x)))
summary(lm(y~x))$r.squared
## anova
## anova(lm(y~x))
## names(anova(lm(y~x)))
s<-anova(lm(y~x))$"Sum Sq";s[1]/sum(s)

Just a short R-script note to embody the 3-page-paper of Rosenthal & Rubin (1982).

Table 1. (p. 167) listed a simple set-up. There was a between-subject treatment. Control group includes 34 alive cases and 66 dead cases. Treatment group includes 66 alive cases and 34 dead cases. The question is what is the percentage of the variance explained by the nominal IV indicating the group?

The authors pointed out that one may interpret the data result as death rate was reduced by 32% while the other may interpret the same as 10.24% variance was explained. Let’s demo it more dramatically to imagine just 4% explained variance would reduce death rate by 20%.

Rosenthal, R. & Rubin, D. B. (1982). A simple, general purpose display of magnitude of experimental effect. Journal of Educational Psychology, 74, 166-169.

Anscombe’s 4 Regressions — A Trivially Updated Demo

##----------
## This is a trivially updated version based on the R document "?anscombe".
require(stats); require(graphics)
anscombe

##-- now some "magic" to do the 4 regressions in a loop:##< -
ff = y ~ x
for(i in 1:4) {
ff[2:3] = lapply(paste(c("y","x"), i, sep=""), as.name)
assign(paste("lm.",i,sep=""), lmi <- lm(ff, data= anscombe))
}

## See how close they are (numerically!)
sapply(objects(pattern="lm\\.[1-4]$"), function(n) coef(get(n)))
lapply(objects(pattern="lm\\.[1-4]$"),
function(n) coef(summary(get(n))))

## Now, do what you should have done in the first place: PLOTS
op <- par(mfrow=c(4,3),mar=.1+c(4,4,1,1), oma= c(0,0,2,0))
for(i in 1:4) {
ff[2:3] <- lapply(paste(c("y","x"), i, sep=""), as.name)
plot(ff, data =anscombe, col="red", pch=21, bg = "orange", cex = 1.2,
xlim=c(3,19), ylim=c(3,13))
abline(get(paste("lm.",i,sep="")), col="blue")
plot(lm(ff, data =anscombe),which=1,col="red", pch=21, bg = "orange", cex = 1.2
,sub.caption="",caption="" )
plot(lm(ff, data =anscombe),which=2,col="red", pch=21, bg = "orange", cex = 1.2
,sub.caption="",caption="" )
}
mtext("Anscombe's 4 Regression data sets", outer = TRUE, cex=1.5)
par(op)

##

## Anscombe, F. J. (1973). Graphs in statistical analysis. American Statistician, 27, 17–21.

自由度的几何:对截距项投影残差向量的长度平方

这是《相关系数的几何:对截距投影的残差向量之间交角余弦》示意图,恰好可以用于解释为什么 \sum_{i=1}^{n}\left(X_{i}-\bar{X}\right)^{2}满足的 \chi^2分布dfn-1而不是n

其中 X_{i}\equiv\mu+\varepsilon_{i} \left[\begin{array}{c}\varepsilon_{1}\\\varepsilon_{2}\\\vdots\\\varepsilon_{n}\end{array}\right]n维空间中的标准正态随机向量。那么,容易知道有 \sum_{i=1}^{n}\left(X_{i}-\bar{X}\right)^{2}=\sum_{i=1}^{n}\left(\varepsilon{}_{i}-\bar{\varepsilon}\right)^{2}。这个表达式就是向量 \left[\begin{array}{c}\varepsilon_{1}\\\varepsilon_{2}\\\vdots\\\varepsilon_{n}\end{array}\right]-\left[\begin{array}{c}\bar{\varepsilon}\\\bar{\varepsilon}\\\vdots\\\bar{\varepsilon}\end{array}\right]长度的平方。我们已经知道, \left[\begin{array}{c}\bar{\varepsilon}\\\bar{\varepsilon}\\\vdots\\\bar{\varepsilon}\end{array}\right]就是 \left[\begin{array}{c}\varepsilon_{1}\\\varepsilon_{2}\\\vdots\\\varepsilon_{n}\end{array}\right]在截距向量(日晷指针) \left[\begin{array}{c}1\\1\\\vdots\\1\end{array}\right]上的投影。自然, \left[\begin{array}{c}\varepsilon_{1}\\\varepsilon_{2}\\\vdots\\\varepsilon_{n}\end{array}\right]-\left[\begin{array}{c}\bar{\varepsilon}\\\bar{\varepsilon}\\\vdots\\\bar{\varepsilon}\end{array}\right]就是对截距项投影残差向量,也就是在日晷盘上的投影。

日晷所处空间的n是3。如果我们对 \left[\begin{array}{c}\varepsilon_{1}\\\varepsilon_{2}\\\varepsilon_{3}\end{array}\right]抽样许多次,就会看到三维空间中各个方向对称的标准正态分布散点图。这些散点图在日晷盘上的投影就是二维空间标准正态分布散点图。日晷盘中这些点对应向量的长度平方自然是 \chi^2_{df=2}的抽样。