Skip to content

可改进翻译的统计术语歧义

a sample: 一个样本(a case?) > 一组抽样
two sample t test: 两样本t检验(two cases ...?) > 两组样本t检验
sample size: 样本大小(case scale?) > 样本组容量
central tendency: 集中趋势(non-dispersion?) > 中心趋势
dispersion: 离散(discrete?)趋势 > 分散趋势
SD=standard deviation: 标准方差(..variance?) > 标准离差
df=degrees of freedom: 自由度(size of freedom?) > 自由维度(dimensions of freedom)

转载:公益广告一枚(已更新)


转载声明:
1. 授权72pines.com的管理员在判断有必要的情况下将本文转为private.

2. 我还没有试用文中推荐的技术,转载不表示对该技术的适用性背书[update]试用体验甚好,速度优于我目前所用的中文大学vpn,自动切换效果尤佳。
SSH-D服务已终止,详见提供方的说明,下文过期部分已更改文本为灰色。下文推荐的替代品是Puff圣诞版,支持IE、更支持Firefox+Autoproxy。

3.题图T恤亦为转载广告,链接指向淘宝卖家,是否公益待考。图为男款,亦有女款。卖家声称断货,疑为淘宝和谐。有意购买宜直接询问卖家。

 






 

转自:http://docs.google.com/View?id=ddtsdhbg_1frwvsrgg

翻墙组合Firefox+Autoproxy+Puff的使用

 

一、准备工作

1、  下载最新版的Firefox浏览器,猛击此处

2、  Autoproxy可在安装完Firefox后在组件中心找到;

3、  下载最新版的Puff(当前最新版为圣诞版),猛击此处

二、开始安装

   1Firefox的安装:双击下载好的程序,出现安装页面,直接下一步,如图

到此步单击“安装”后程序开始安装,请在安装完成后运行firefox浏览器。

2、安装autoproxy

   运行firefox浏览器后,在工具栏选择“工具”—“附加组件”,如下图

 

在出现的附加组件对话框的搜索栏输入“autoproxy”并单击搜索栏右侧放大镜图标

然后单击上图的“查看全部结果”。

回到firefox浏览器查看自动打开的搜索结果页面,如下图

先选中“让我安装这个试验性附加组件”,然后单击“添加到Firefox”。

安装结束后请单击下图中的“重新启动Firefox”按钮。

Firefox重新启动后,浏览器将自动弹出两个页面:Autoproxy的设置页面及附加组件的设置页面,如下图

选中上图的“gfwList”并确定。附加组件的设置如下图

至此,Autoproxy安装配置完成。

4、  安装Puff,一直Next,期间可选择在桌面建立图标及安装完成后运行程序,如图

运行Puff后点击“connect”,如下图

 

到此步,Firefox已经能通过Puff代理,在Autoproxy的指引下,识别你输入的网址是否需要翻墙了。如果Firefox在这种情况下依然无法打开你需要的网页,那么你可能需要在AutoproxygfwList里添加新的规则,具体操作如下:

全文完

 

 


 

本文将提供一种一劳永逸的翻墙方式(ssh -D),实施之后,那道墙——对你来说——将从此透明。

本文面向的用户:使用Windows作为操作系统并且使用Firefox作为常用浏览器。

第一步:免费获取拥有SSH权限的帐号和密码。

默认的免费获取方式:将本文转载到你自己的博客上,将转载后的文章网址发送到f.ckgfw#gmail.com

注意:转载前请先确认自己是(或曾是)一名blogger(博客),否则将会浪费彼此的时间。

转载方式:拷贝文章代码至博客后台HTML编辑器中,直接发布即可,文章标题自拟,可在前后文插入自己的评论。

经过人工审核,你将收到一封附有五个拥有SSH权限的帐号和密码的电子邮件,你可以将它们赠与你信任的人。

更多获取方式将在今后陆续激活,请关注我们的最新更新:https://friendfeed.com/fuckgfw

第二步:配置MyEnTunnel软件

下载并安装MyEnTunnel,该软件全名为My Encrypted Tunnel。

一键下载:https://dl.getdropbox.com/u/873345/download/myentunnel.exe

myentunnel

按照上图将第一步收到的帐号信息填写到相应的地方后,点击save按钮,再点击hide按钮。

第一次连接过程中会出现一个认证对话框,按照提示确认即可。以后的自动连接中将不再出现此认证对话框。

最后点击hide按钮,使对话框隐藏到系统任务栏中。

提示:

为MyEntunnel创建一个快捷方式,将其复制到系统的【启动】(C:\Documents and Settings\当前用户名(需要修改成你自己的)\「开始」菜单\程序\启动)文件夹中,今后开机便可自动启动软件,并自动连接服务器。

tray

绿色代表连接成功且稳定;黄色代表正在连接或重新连接;红色代表连接失败。

第三步:配置Firefox浏览器

假设你正使用Firefox浏览器阅读本文。

一键安装:http://autoproxy.mozdev.org/latest.xpi

xpi-offical

点击立即安装,安装后,重新启动Firefox。然后你会看到如下对话框,选择gfwlist (P.R.China)后,点击确定。

gfwlist

接着你会看到Firefox主界面右上角出现有一个“福”字图案,点击“福”。

fu

点击“代理服务器——编辑代理服务器”。

edit

随即出现如下画面,你会看到如GAppProxy、Tor和Your Freedom这样一系列代理服务器名称。

before

将GAppProxy一栏的参数修改为如下图所示。

after

修改完毕后,点击确定。至此配置已全部就绪。

获取更多帮助,请关注反馈中心:https://friendfeed.com/fuckgfw-feedback

Bernie:"Eat me!"

第四步:支持fuckGFW

  1. 如果您翻墙成功,请大笑一声并用充满磁性地低音说出:Hello, world!
  2. 如果由于线路原因,始终翻墙不成,不要气馁,给我们发Email,咱们一起解决问题。
  3. 假如哪天突然无法正常连接,请先到反馈中心汇报,我们会及时做出反应。
  4. 目前您有如下几种方式及时获取我们的最新动态:FriendFeed | Twitter | Blog
  5. 保持默契,我们相信您一定可以做到。

版权信息:您可以自由复制、传播、演绎本作品且无需署名、无需注明原始出处。

Why practitioners discretize their continuous data

Yihui asked this question yesterday. My supervisor Dr. Hau also criticized routine grouping discretization. I encountered two plausible reasons in 2007 classes, one negative, the other at least conditionally positive.

The first is a variant of the old Golden Hammer law -- if the only tool is ANOVA, every continuous predictor need discretization. The second reason is empirical -- ANOVA with discretization steals df(s). Let's demo it with a diagram.
The red are the population points, and the black are samples. Which predicts the population better--the green continuous line, or the discretized blue dashes? R simulation code is given.





Tagged , ,

《结构方程模型及其应用》(侯, 温, & 成,2004)部分章节R代码

John FOX教授的sem包试写了这本教材的几个例子,结果都与LISREL8报告的Minimum Fit Function Chi-Square吻合。不过LISREL其它拟合指标用的是Normal Theory Weighted Least Squares Chi-Square ,所以看上去比Minimum Fit Function Chi-Square报告的结果要好那么一些些。LISREL历史上推的GFI/AGFI曾因为经常将差报好被批评,圈内朋友私下嘲笑这样LISREL就更好卖了,买软件的用户也高兴将差报好,只有读Paper的人上当。

目前只写了chap3_1..到Chap3_2_...,一共5(或6)个例子。尺有所短,寸有所长--sem包不会自动报告所有修正指数,不能做样本量不一样多的多组模型,对复杂的模型要写的代码太多。不过,在已经尝试的几个例子里,有一个是LISREL跑不出来但sem包能跑出结果的。目前sem包还没有达到结构方程众多商业软件的成熟水准,但R庞大的义工武器库已经使sem包至少已经在Missing Data Multiple Imputation、Bootstrapping等等应用上胜出一筹。所以例子中还附带了缺失数据一讲Multiple Impuation的R示范代码。

欢迎各位有兴趣作类似尝试的同学将结果email我, 可以陆续更新到下面的GPL版权代码集合中。

代码下载:lixiaoxu.googlepages.com (中大镜像)

[update] AMos, Mplus 与 SAS/STAT CALIS 缺省报告与使用的都是 Minimum Fit Function Chi-Square,可通过各种软件(Albright & Park, 2009)的结果对比查验。Normal Theory Weighted Least Squares Chi-Square并非总是比Minimum Fit Function Chi-Square报告更“好”的拟合结果(虽然常常如此)。Olsson, Foss, 和 Breivik (2004) 用模拟数据对比了二者,确证Minimum Fit Function Chi-Square计算得到的拟合指标在小样本之外的情形都比Normal Theory Weighted Least Squares Chi-Square的指标更适用。

R的sem包目前在迭代初值的计算上还做得不够好。我遇到的几个缺省初值下迭代不收敛案例,将参数设定合理范围的任意初值(比如TX,TY都设成0-1之间的正实数)之后都收敛了。

--

Albright, J. J., & Park., H. M. (2009). Confirmatory factor analysis using Amos, LISREL, Mplus, and SAS/STAT CALIS. The University Information Technology Services Center for Statistical and Mathematical Computing, Indiana University. Retrieved July 7, 2009, from http://www.indiana.edu/~statmath/stat/all/cfa/cfa.pdf.

Olsson, U. H., Foss, T., & Breivik, E. (2004). Two equivalent discrepancy functions for maximum likelihood estimation: Do their test statistics follow a non-central chi-square distribution under model misspecification? Sociological Methods Research, 32(4), 453-500.

[update]R中的sem包妙处之一是可在线实现结构方程应用界面。下面这个最粗陋的例子对应原书Chap3_1_4_CFA_MB.LS8的结果。
--荣耀属于sem package的作者Rweb的作者、以及服务器运算资源的提供者





Tagged ,

Paper for 1st Chinese useR! Conference: Web Powered by R, or R Powered by Web

欢迎在本部的同学明天上午到现场看李崇亮同学演示,地点见会议主页

论文下载(Googlepages, 中文大学镜像)
RWebFriend for Wordpress 在线示例(yo2.cn上的示例,  奇想录上的临时示例)

Google Presentation 在线演示

[update2009.07.11]文中MediaWiki与R集成的实验平台已不再开放编辑权限,原例转到另一个平台上(界面是西班牙语,找不到英语界面的平台)。希望能有帮手提供linux服务器自建一个平台。

Tagged ,

Misunderstanding of Eq. 4 in Singer’s (1998) SAS PROC MIXED paper

Singer (1998, p. 327, Eq. 4) gave a big covariance matrix as the following --

...if we combine the variance components for the two random effects together into a single matrix, we would find a highly structured block diagonal matrix. For example, if there were three students in each class, we would have:

If the number of students per class varied, the size of each of these submatrices would also vary, although they would still have this common structure. The variance in MATHACH for any given student is assumed to be \tau_{00}+\sigma^{2} ...

Quiz:

1. Sampling K repetitions with replacement, Y_{k} denotes MATHACH of the k-th random student within a given class, say, the first class. What is the population variance of the Y series? A: \sigma^{2}, B: \tau_{00}+\sigma^{2}.

2. Sampling 2\times K repetitions with replacement, \left(Y_{1st,k},Y_{2nd,k}\right) denotes MATHACH(s) of the k-th pair of random students within a given class, say, the first class. Sometime the 2nd sampled student is just the 1st one by chance. What is the population covariance of the Y_{1st} and Y_{2nd} series? A: 0, B: \tau_{00}.

3. Sampling K repetitions with replacement, Y_{k} denotes MATHACH of the k-th random student within one randomly sampled class for each repetition. What is the population variance of the Y series? A: \sigma^{2}, B: \tau_{00}+\sigma^{2}.

4. Sampling 2\times K repetitions with replacement, \left(Y_{1st,k},Y_{2nd,k}\right) denotes MATHACH(s) of the k-th pair of random students within one random class sampled for each pair of students. Sometime the 2nd sampled student is just the 1st one by chance. What is the population covariance of the Y_{1st} and Y_{2nd} series? A: 0, B: \tau_{00}.

Correct answers: A A B B.

Your plausible incorrect answers for Quiz 1 and 2 just tell how the matrix caused my misunderstanding when I read the paper the first time. It is difficult to give names for the randomly sampled classes, and the matrix columns and rows.

--
Singer, J. D. (1998). Using SAS PROC MIXED to fit multilevel models, hierarchical models, and individual growth models. Journal of Educational and Behavioral Statistics. 24. 323-355.

Tagged

气泡图击败Data Snoop

转自:泡网

Data Snoop, 民科的神奇直线(google 始作俑者):
民科的神奇直线

气泡图,数据击败Data Snoop:
莫说是条直线,就是一朵菊花也能连出来

R tip:

n=20;plot(rnorm(n),rnorm(n),cex=sqrt(abs(rnorm(n)))*10,pch=1,col=1:n);

Tagged ,

“Confidence interval of R-square”, but, which one?

In linear regression, confidence interval (CI) of population DV is narrower than that of predicted DV. With the assumption of generalizability, CI of \tilde{Y}_{\left[1\times1\right]} at x_{\left[1\times p\right]} is

\;\hat{Y}\pm\left(x\left(X^{\tau}X_{\left[N\times p\right]}\right)^{-1}x^{\tau}\right)^{\frac{1}{2}}\hat{\sigma}t_{\frac{\alpha}{2},N-p},

while CI of Y\left(x\right)=\tilde{Y}\left(x\right)+\varepsilon is

\;\hat{Y}\pm\left(1+x\left(X^{\tau}X_{\left[N\times p\right]}\right)^{-1}x^{\tau}\right)^{\frac{1}{2}}\hat{\sigma}t_{\frac{\alpha}{2},N-p}.

The pivot methods of both are quite similar as following.

\;\frac{\hat{Y}-\tilde{Y}}{s_{\hat{Y}}}\sim t_{df=N-p} ,

so \tilde{Y}_{critical}=\hat{Y}-s_{\hat{Y}}\times t_{critical} .

\;\frac{\hat{Y}-Y}{s_{\left(\hat{Y}-Y\right)}}\sim t_{df=N-p} ,

so Y_{critical}=\hat{Y}-s_{\left(\hat{Y}-Y\right)}\times t_{critical}=\hat{Y}-s_{\left(\hat{Y}-\tilde{Y}-\varepsilon\right)}\times t_{critical}

R^{2} of linear regression is the point estimate of

\;\eta^{2}\equiv\frac{SS\left(\tilde{Y}_{\left[N\times1\right]}\right)}{SS\left(\tilde{Y}_{\left[N\times1\right]}\right)+N\sigma^{2}}

for fixed IV(s) model. Or, it is the point estimate of \rho^{2} wherein \rho denotes the correlation of Y and X\beta, the linear composition of random IV(s) . The CI of \rho^{2} is wider than that of \eta^{2} with the same R^{2} and confidence level.

[update] It is obvious that CI of \rho^{2} relies on the distribution presumption of IV(s) and DV, as fixed IV(s) are just special cases of generally random IV(s). Usually, the presumption is that all IV(s) and DV are from multivariate normal distribution.

In the bivariate normal case with a single random IV, through Fisher's z-transform of Pearson's r, CI of the re-sampled R^{\prime2}=r^{\prime2} can also be constructed. Intuitively, it should be wider than CI of \rho^{2} .

\;\tanh^-\left(r\right)\equiv\frac{1}{2}\log\frac{1+r}{1-r}\;{appr\atop \sim}\; N\left(\tanh^-\left(\rho\right),\frac{1}{N-3}\right)

Thus,

\;\tanh^-\left(r^{\prime}\right)-\tanh^-\left(r\right){appr\atop \sim}N\left(0,\frac{2}{N-3}\right)

CI of \tanh^-\left(r^{\prime}\right) can be constructed as \tanh^-\left(r\right)\pm\sqrt{\frac{2}{N-3}}z_{\frac{\alpha}{2}} . With the reverse transform \tanh\left(.\right), the CI bounds of R^{\prime2} are

\;\left(\max\left(0,\tanh\left(\tanh^-\left(R\right)-\sqrt{\frac{2}{N-3}}z_{1-\frac{\alpha}{2}}\right)\right)\right)^{2}

and

\;\left(\tanh\left(\tanh^{-1}\left(R\right)+\sqrt{\frac{2}{N-3}}z_{1-\frac{\alpha}{2}}\right)\right)^{2}.

In multiple p IV(s) case, Fisher's z-transform is

\;\left(N-2-p\right)\left(\tanh^-\left(R\right)\right)^{2}\;{appr\atop \sim}\;\chi_{df=p,ncp=\left(N-2-p\right)\left(\tanh^-\left(\rho\right)\right)^{2}}^{2} .

Although it could also be used to construct CI of \rho^{2} , it is inferior to noncentral F approximation of R (Lee, 1971). The latter is the algorithm adopted by MSDOS software R2 (Steiger & Fouladi, 1992) and R-function ci.R2(...) within package MBESS (Kelley, 2008).

In literature, "CI(s) of R-square" are hardly the literal CI(s) of R^{2} in replication once more. Most of them actually refer to CI of \rho^{2} . Authors in social science unfamiliar to L^AT_EX hate to type \rho when they feel convenient to type r or R. Users of experimentally designed fixed IV(s) should have reported CI of \eta^{2} . However, if they were too familiar to Steiger's software R2 to ignore his series papers on CI of effect size, it would be significant chance for them to report a loose CI of \rho^{2}, even in a looser name "CI of R^{2}".

----

Lee, Y. S. (1971). Some results on the sampling distribution of the multiple correlation coefficient. Journal of the Royal Statistical Society, B, 33, 117–130.

Kelley, K. (2008). MBESS: Methods for the Behavioral, Educational, and Social Sciences. R package version 1.0.1. [Computer software]. Available from http://www.indiana.edu/~kenkel

Steiger, J. H., & Fouladi, R. T. (1992). R2: A computer program for interval estimation, power calculation, and hypothesis testing for the squared multiple correlation. Behavior research methods, instruments and computers, 4, 581–582.

R Code of Part I:





R Code of Part II:





Tagged ,

Wordpress (and WPMU) Plugin for R Web Interface

Download: RwebFriend.zip [Update] Including Chinese UTF8 Version
Plugin Name: RwebFriend
Plugin URL: http://lixiaoxu.lxxm.com/RwebFriend
Description: Set Rweb url options and transform [rcode]...[/rcode] or <rcode>...</rcode> tag-pair into TEXTAREA which supports direct submit to web interface of R. *Credit notes:codes of two relevant plugins are studied and imported. One of the plugins deals with auto html tags within TEXTAREA tag-pair, the other stops Wordpress to auto-transform quotation marks.
Version: 1.0
Author: Xiaoxu LI
Author URI: http://lixiaoxu.lxxm.com/
Setup:
Usage:

[update] The free Chinese wordpress platform yo2.cn has installed this plugin.  See my demo.

More online demos -- http://wiki.qixianglu.cn/rwebfriendttest/

[update, June 2009] 72pines.com (here!) installed this plugin. Try





[update2009JUL18]Test installed packages of Rweb:
http://www.mzandee.net/cgi-bin/Rweb/Rweb.cgi
http://bayes.math.montana.edu/cgi-bin/Rweb/Rweb.cgi
http://pbil.univ-lyon1.fr/cgi-bin/Rweb/Rweb.cgi

Tagged ,

Type III ANOVA in R

Type III ANOVA SS for factor A within interaction of factor B is defined as SS_{A:B+A+B}-SS_{A:B+B}, wherein A:B  is the pure interaction effect orthogonal to main effects of A, B, and intercept. There are some details in R to get pure interaction dummy IV(s).

Data is from SAS example PROC GLM, Example 30.3: Unbalanced ANOVA for Two-Way Design with Interaction

##
##Data from http://www.otago.ac.nz/sas/stat/chap30/sect52.htm
##
drug <- as.factor(c(t(t(rep(1,3)))%*%t(1:4))); ##Factor A
disease <- as.factor(c(t(t(1:3)) %*% t(rep(1,4))));##Factor B
y <- t(matrix(c(
42 ,44 ,36 ,13 ,19 ,22
,33 ,NA ,26 ,NA ,33 ,21
,31 ,-3 ,NA ,25 ,25 ,24
,28 ,NA ,23 ,34 ,42 ,13
,NA ,34 ,33 ,31 ,NA ,36
,3 ,26 ,28 ,32 ,4 ,16
,NA ,NA ,1 ,29 ,NA ,19
,NA ,11 ,9 ,7 ,1 ,-6
,21 ,1 ,NA ,9 ,3 ,NA
,24 ,NA ,9 ,22 ,-2 ,15
,27 ,12 ,12 ,-5 ,16 ,15
,22 ,7 ,25 ,5 ,12 ,NA
),nrow=6));
## verify data with http://www.otago.ac.nz/sas/stat/chap30/sect52.htm
(cbind(drug,disease,y));
##
## make a big table
y <- c(y);
drug <- rep(drug,6);
disease <- rep(disease,6);
##
## Design the PURE interaction dummy variables
m <- model.matrix(lm(rep(0,length(disease)) ~ disease + drug +disease:drug));
##! If lm(y~ ...) is used, the is.na(y) rows will be dropped. The residuals will be orthogonal to observed A, & B rather than designed cell A & B. It will be Type II SS rather than Type III SS.
c <- attr(m,"assign")==3;
(IV_Interaction <-residuals( lm(m[,c] ~ m[,!c])));
##
## verify data through type I & II ANOVA to http://www.otago.ac.nz/sas/stat/chap30/sect52.htm
## Type I ANOVA of A, defined by SS_A --
anova(lm(y~drug*disease));
##
## Type II ANOVA of A, defined by SS_{A+B}-SS_B --
require(car);
Anova(lm(y~drug*disease),type='II');
anova(lm(y~disease),lm(y~drug + disease))
##
##
## Type III ANOVA of A defined by SS_{A:B+A+B}-SS_{A:B+B}
t(t(c( anova(lm(y~IV_Interaction+disease),lm(y~disease * drug))$'Sum of Sq'[2]
,anova(lm(y~IV_Interaction+drug),lm(y~disease*drug))$'Sum of Sq'[2]
,anova(lm(y~disease+drug),lm(y~disease*drug))$'Sum of Sq'[2])))
##
##

Currently, Anova(...) of Prof John Fox's car package (V. 1.2-8 or 1.2-9) used "impure" interaction dummy IV(s), which made its type III result relying upon the order of factor levels. I think in its next version, the "pure" interaction dummy IV(s) will be adopted to give consistent type III SS.

[update:]

In Prof John FOX's car package, with parameter contrasts in inputted lm object, Example(Anova) gave type III SS consistent to  other softwares. In this case, the code line should be --

Anova(lm(y~drug*disease, contrasts=list(drug=contr.sum, disease=contr.sum)),type='III');

Contrasts patterns are defined within lm(...) rather than Anova(...). An lm object with default contrasts parameter is inappropriate to calculate type III SS, or the result will rely on the level names in any nominal factor --

require(car);
M2<-Moore;
M2$f1<-M2$fcategory;
M2$f2<-as.factor(- as.integer(M2$fcategory));
mod1<-lm(formula = conformity ~ f1 * partner.status,data=M2);
mod2<-lm(formula = conformity ~ f2 * partner.status,data=M2);
c(Anova(mod1,type='III')$'Sum Sq'[3],Anova(mod2,type='III')$'Sum Sq'[3])

There was hot discussion of type III ANOVA on R-help newsgroup. Thomas Lumley thought Types of SS nowadays don't have to make any real sense --

http://tolstoy.newcastle.edu.au/R/help/05/04/3009.html

This is one of many examples of an attempt to provide a mathematical answer to something that isn't a mathematical question.

As people have already pointed out, in any practical testing situation you have two models you want to compare. If you are working in an interactive statistical environment, or even in a modern batch-mode system, you can fit the two models and compare them. If you want to compare two other models, you can fit them and compare them.

However, in the Bad Old Days this was inconvenient (or so I'm told). If you had half a dozen tests, and one of the models was the same in each test, it was a substantial saving of time and effort to fit this model just once.

This led to a system where you specify a model and a set of tests: eg I'm going to fit y~a+b+c+d and I want to test (some of) y~a vs y~a+b, y~a+b vs y~a+b+c and so on. Or, I want to test (some of) y~a+b+c vs y~a+b+c+d, y~a+b+d vs y~a+b+c+d and so on. This gives the "Types" of sums of squares, which are ways of specifying sets of tests. You could pick the "Type" so that the total number of linear models you had to fit was minimized. As these are merely a computational optimization, they don't have to make any real sense. Unfortunately, as with many optimizations, they have gained a life of their own.

The "Type III" sums of squares are the same regardless of order, but this is a bad property, not a good one. The question you are asking when you test "for" a term X really does depend on what other terms are in the model, so order really does matter. However, since you can do anything just by specifying two models and comparing them, you don't actually need to worry about any of this.

-thomas

Tagged , , ,