Yihui asked this question yesterday. My supervisor, Dr. Hau, has also criticized the routine practice of discretizing continuous variables into groups. I came across two plausible reasons in my 2007 classes: one negative, the other at least conditionally positive.
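The first is a variant of the old Golden Hammer law: if the only tool is ANOVA, every continuous predictor needs discretization. The second reason is empirical: ANOVA with discretization steals df(s). With k groups the model spends k - 1 degrees of freedom where a straight line spends one, so it can follow the sample more closely. Let's demo it with a diagram.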
The red points are the population, and the black points are the sample. Which predicts the population better: the green continuous line, or the discretized blue dashes? R simulation code is given.
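
For readers without the diagram, here is a rough stand-in for that comparison. It is a minimal Python sketch and not the author's R code. It assumes a linear population with Gaussian noise; the function name `simulate`, the slope, the noise level and the four equal-size groups are all illustrative choices, not taken from the post.

```python
import random


def simulate(n=40, k=4, f=lambda x: 1.0 * x, noise_sd=1.0, seed=1):
    """Compare a fitted line with k group means against the population.

    Everything here is an assumption for illustration: f is the (assumed)
    population curve, and the sample adds Gaussian noise to it.
    """
    rng = random.Random(seed)
    xs = [10 * i / (n - 1) for i in range(n)]        # x evenly spaced on [0, 10]
    truth = [f(x) for x in xs]                        # "red" population values
    ys = [t + rng.gauss(0, noise_sd) for t in truth]  # "black" sample values

    # Continuous fit ("green line"): ordinary least squares, 1 df for x.
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    a = my - b * mx
    line = [a + b * x for x in xs]

    # Discretized fit ("blue dashes"): k equal-size groups along x,
    # each predicted by its group mean, k - 1 df for x.
    steps = [0.0] * n
    for g in range(k):
        lo, hi = g * n // k, (g + 1) * n // k
        mean = sum(ys[lo:hi]) / (hi - lo)
        for i in range(lo, hi):
            steps[i] = mean

    def ss(pred, target):
        return sum((p - t) ** 2 for p, t in zip(pred, target))

    return {
        "line_residual_ss": ss(line, ys),
        "line_error_ss": ss(line, truth),
        "steps_residual_ss": ss(steps, ys),
        "steps_error_ss": ss(steps, truth),
    }


for name, value in simulate().items():
    print(f"{name}: {value:.2f}")
```

Under this assumed linear population the line should track the population more closely than the group means. Passing a strongly non-linear `f` may change the comparison, which is the "conditionally positive" case.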

{ 2 } Comments
The discretization here is essentially a local smoothing technique that uses a constant kernel function. Generally speaking, local modeling can effectively improve the fit (a lower error sum of squares), but we have to be careful to avoid overfitting. If you discretize x into more intervals, the fit to the sample will be even better.
Residuals and errors are different. As the number of intervals grows, the squared residuals keep decreasing while, past some point, the squared errors increase. So the black points themselves, i.e. discretization with the maximum number of intervals, predict the red population worst.
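
The residual/error distinction can be checked with a small simulation. This is a minimal Python sketch under assumed conditions (a linear population, Gaussian noise, 64 points, interval counts doubling from 1 to 64); the names `group_mean_fit` and `sweep` are hypothetical helpers, not from the post.

```python
import random


def group_mean_fit(ys, k):
    """Predict each point by the mean of its group (k equal-size groups)."""
    n = len(ys)
    fitted = [0.0] * n
    for g in range(k):
        lo, hi = g * n // k, (g + 1) * n // k
        mean = sum(ys[lo:hi]) / (hi - lo)
        for i in range(lo, hi):
            fitted[i] = mean
    return fitted


def sweep(n=64, slope=1.0, noise_sd=1.0, seed=2007):
    """Return (k, residual SS, error SS) for k = 1, 2, 4, ..., n."""
    rng = random.Random(seed)
    xs = [10 * i / (n - 1) for i in range(n)]         # sorted x on [0, 10]
    truth = [slope * x for x in xs]                    # assumed population
    ys = [t + rng.gauss(0, noise_sd) for t in truth]   # observed sample

    rows = []
    k = 1
    while k <= n:
        fit = group_mean_fit(ys, k)
        residual_ss = sum((y - f) ** 2 for y, f in zip(ys, fit))  # vs sample
        error_ss = sum((t - f) ** 2 for t, f in zip(truth, fit))  # vs population
        rows.append((k, residual_ss, error_ss))
        k *= 2  # doubling keeps each partition nested in the previous one
    return rows


for k, residual_ss, error_ss in sweep():
    print(f"k={k:2d}  residual SS={residual_ss:8.2f}  error SS={error_ss:8.2f}")
```

Because each finer partition is nested in the coarser one, the residual sum of squares can only fall, and it reaches zero when every point is its own interval. The error sum of squares does not have to follow: at k = n the fit reproduces the sample, noise included.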
Discretization blurs micro-level information (mostly error) while highlighting macro-level information (usually non-linear). Once LOESS is popular enough, discretization will be abandoned. Practitioners really do need local smoothing to preview the macro-level models they care about.
{ 2 } Trackbacks
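Discretization: an effective way to destroy information...
If you want to obscure your data, then discretize it! I don't know why so many people are so fond of discretizing continuous data. For example, when age data is clearly available, they insist on splitting it into a categorical variable such as old, young, youthful and middle-aged for the analysis; when the raw count data is clearly available...
[...] Li Xiaoxu (李晓煦)'s blog: very professional, and one of the few blogs that writes mathematical formulas in LaTeX. Li is meticulous about the details of statistical theory, very much in the style of statistical researchers abroad; posts such as Why practitioners discretize their continuous data explain one of the reasons why people like to discretize continuous data. [...]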