# ChiMerge

ChiMerge uses the $\chi^2$ statistic to discretize continuous (numeric) attributes, so discretization is performed automatically rather than by hand.

Kerber presents ChiMerge as an improvement over two common approaches. The first is intervals chosen by a user from domain understanding, which can be poorly chosen. The second is other discretization algorithms such as equal-width intervals, equal-frequency intervals, C4, CART, and PVM.

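ChiMerge summarizes a numeric attribute concisely as a set of intervals. It judges the quality of a discretization by two criteria:

- intra-interval uniformity: class frequencies within an interval should be similar;
- inter-interval difference: adjacent intervals should have different class frequencies.

ChiMerge operationalizes this notion of quality with the $\chi^2$ statistic. This statistic tests whether two discrete attributes are statistically independent; here the two attributes are the class and the interval an example falls into.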
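For each pair of adjacent intervals, the test is applied to a $2 \times k$ contingency table of class counts, using the notation defined with the $\chi^2$ formula below. Suppose class and interval are independent. Then the two rows have proportional class distributions and $\chi^2$ is small, which suggests the intervals can be merged. A large $\chi^2$ suggests a real inter-interval difference.

$$
\begin{array}{c|ccc|c}
 & \text{class } 1 & \cdots & \text{class } k & \text{total} \\
\hline
\text{interval } 1 & A_{11} & \cdots & A_{1k} & R_1 \\
\text{interval } 2 & A_{21} & \cdots & A_{2k} & R_2 \\
\hline
\text{total} & C_1 & \cdots & C_k & N
\end{array}
$$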

An outline of the algorithm is shown below.

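```
-- ChiMerge outline (pseudocode)
chiMerge threshold examples = loop initial
  where
    -- sort by attribute value; start with one interval per distinct value
    initial = intervals (sortOn value examples)
    loop ivs
      | length ivs < 2            = ivs
      | minimum chis >= threshold = ivs                -- every adjacent pair differs significantly
      | otherwise                 = loop (mergeAt i ivs)  -- merge the pair with the lowest chi^2
      where
        chis = zipWith chi2 ivs (tail ivs)             -- chi^2 of each adjacent pair
        i    = position of (minimum chis) in chis
```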
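The outline above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Kerber's reference implementation:

- the names `chi2` and `chimerge` are hypothetical;
- each interval is represented by its lower bound plus its per-class counts;
- the threshold is passed in directly;
- an expected frequency of 0 is replaced by 0.1 to avoid division by zero. This is a convention commonly used in ChiMerge implementations and is assumed here.

```python
from collections import Counter


def chi2(a: list[int], b: list[int]) -> float:
    """Chi-square statistic for two adjacent intervals, given their
    per-class example counts (the two rows of a 2 x k contingency table)."""
    rows = (a, b)
    r = [sum(row) for row in rows]        # R_i: examples per interval
    c = [x + y for x, y in zip(a, b)]     # C_j: examples per class
    n = sum(r)                            # N: examples in both intervals
    total = 0.0
    for i, row in enumerate(rows):
        for j, observed in enumerate(row):
            expected = r[i] * c[j] / n    # E_ij = R_i * C_j / N
            if expected == 0:
                expected = 0.1            # assumption: avoid division by zero
            total += (observed - expected) ** 2 / expected
    return total


def chimerge(values, labels, threshold: float) -> list:
    """Return the lower bound of each final interval (hypothetical helper)."""
    classes = sorted(set(labels))
    # Initialization: one interval per distinct attribute value.
    counts: dict = {}
    for v, y in zip(values, labels):
        counts.setdefault(v, Counter())[y] += 1
    bounds = sorted(counts)
    freqs = [[counts[v][cls] for cls in classes] for v in bounds]

    while len(bounds) > 1:
        chis = [chi2(freqs[i], freqs[i + 1]) for i in range(len(freqs) - 1)]
        i = min(range(len(chis)), key=chis.__getitem__)  # lowest-chi^2 pair
        if chis[i] >= threshold:
            break  # every adjacent pair is significantly different
        # Merge interval i + 1 into interval i.
        freqs[i] = [x + y for x, y in zip(freqs[i], freqs[i + 1])]
        del freqs[i + 1]
        del bounds[i + 1]
    return bounds


print(chimerge([1, 2, 3, 4, 5, 6, 7, 8], list("aaaabbbb"), threshold=2.706))  # [1, 5]
```

In this example, intervals holding only one class keep merging because their $\chi^2$ is tiny. The merging stops at the class boundary, where $\chi^2$ rises above the threshold. The final intervals are therefore $[1, 5)$ and $[5, \infty)$.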

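The $\chi^2$ value for a pair of adjacent intervals is

$$
\chi^2 = \sum_{i=1}^{m} \sum_{j=1}^{k} \frac{(A_{ij} - E_{ij})^2}{E_{ij}}, \qquad E_{ij} = \frac{R_i \, C_j}{N},
$$

where:

- $m = 2$ is the number of intervals being compared;
- $k$ is the number of classes;
- $A_{ij}$ is the number of examples in the $i$-th interval and $j$-th class;
- $R_i$ is the number of examples in the $i$-th interval;
- $C_j$ is the number of examples in the $j$-th class (within the two intervals);
- $N$ is the total number of examples in the two intervals;
- $E_{ij}$ is the expected frequency of $A_{ij}$.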
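As a worked example with made-up counts for illustration, take $k = 2$ classes. Interval 1 holds class counts $(4, 0)$ and interval 2 holds $(1, 3)$. Applying $E_{ij} = R_i C_j / N$:

$$
\begin{aligned}
&R_1 = R_2 = 4, \quad C_1 = 5, \quad C_2 = 3, \quad N = 8, \\
&E_{11} = E_{21} = \tfrac{4 \cdot 5}{8} = 2.5, \quad E_{12} = E_{22} = \tfrac{4 \cdot 3}{8} = 1.5, \\
&\chi^2 = \tfrac{(4 - 2.5)^2}{2.5} + \tfrac{(0 - 1.5)^2}{1.5} + \tfrac{(1 - 2.5)^2}{2.5} + \tfrac{(3 - 1.5)^2}{1.5} = 0.9 + 1.5 + 0.9 + 1.5 = 4.8.
\end{aligned}
$$

With one degree of freedom, 4.8 exceeds the 0.90 quantile (2.706), so these two intervals would be kept apart at that level. It is below the 0.99 quantile (6.635), so at that stricter level they would be merged.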

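The $\chi^2$ threshold is determined by choosing a desired significance level. The table for two intervals has $(m-1)(k-1) = k-1$ degrees of freedom, so the threshold is the corresponding quantile of the $\chi^2$ distribution with $k-1$ degrees of freedom.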
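To turn a significance level into a numeric threshold, take the quantile of the $\chi^2$ distribution with $k-1$ degrees of freedom. The sketch below does this with only the Python standard library. `chi2_cdf` and `chi2_threshold` are hypothetical helper names. The `confidence` argument is $1 - \alpha$, for example 0.90 for a significance level of $\alpha = 0.10$. If SciPy is available, `scipy.stats.chi2.ppf(confidence, df)` computes the same quantile.

```python
import math


def chi2_cdf(x: float, df: int) -> float:
    """CDF of the chi-square distribution with `df` degrees of freedom,
    via the power series of the regularized lower incomplete gamma
    function P(df/2, x/2), which converges for all x >= 0."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a  # n = 0 term of sum_n z^n / (a (a+1) ... (a+n))
    total = term
    n = 0
    while term > 1e-15 * total:
        n += 1
        term *= z / (a + n)
        total += term
    # Multiply by z^a e^{-z} / Gamma(a), computed in log space for stability.
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))


def chi2_threshold(confidence: float, num_classes: int) -> float:
    """ChiMerge threshold: the `confidence` quantile of the chi-square
    distribution with num_classes - 1 degrees of freedom."""
    df = num_classes - 1
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < confidence:  # widen the bracket until it holds the quantile
        hi *= 2.0
    for _ in range(100):  # bisection on the monotone CDF
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < confidence:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


print(round(chi2_threshold(0.90, 2), 3))  # 2 classes -> 1 degree of freedom: 2.706
```

Bisection is slow compared with a dedicated inverse, but it is simple. It is fast enough for the small degrees of freedom that arise when $k$ is the number of classes.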

## References

Randy Kerber. 1992. ChiMerge: discretization of numeric attributes. In Proceedings of the tenth national conference on Artificial intelligence (AAAI'92). AAAI Press, 123–128.