D. Martínez-Galicia, A. Guerra-Hernández, X. Limón, N. Cruz-Ramírez, and F. Grimaldo. New Perspectives on Hybrid Intelligent System Design based on Fuzzy Logic, Neural Networks and Metaheuristics, volume 1050 of Studies in Computational Intelligence, chapter Extension of Windowing as a Learning Technique in Artificial Noisy Domains, pages 443–457. Springer Nature Switzerland AG, Cham, Switzerland, October 2022.
Abstract. Windowing is a guided sub-sampling method conceived to reduce the number of training examples needed to induce decision trees while preserving accuracy. However, its use has been discouraged on noisy datasets, where no benefit has been observed. This work models such domains probabilistically to study the performance of Windowing under controlled conditions, i.e., on artificial discrete datasets with a binary class affected by different levels of noise. To cope with noise in class values, an extension is proposed, Windowing Inconsistency Filtering (WIF), which removes all inconsistencies from the datasets as a preprocessing step. J48, as implemented in Weka, is adopted as the tree induction algorithm. WIF, Windowing, and J48 are then compared using repeated 10-fold cross-validation over noisy and clean test sets. Results show that the three algorithms obtain similar predictive performance, as expected, but WIF keeps reducing the number of training examples used despite the presence of noise. Future work includes considering other inductive algorithms, e.g., Naïve Bayes, to establish whether the adopted extension generalizes beyond decision trees.
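As a rough illustration of the inconsistency-filtering idea, the sketch below treats two examples as inconsistent when they share the same discrete attribute values but carry different class labels, and drops every example involved in such a conflict. The abstract does not say whether WIF discards all conflicting examples or keeps some of them, so that choice is an assumption here. The function name `filter_inconsistent` and the toy data are hypothetical and not taken from the chapter.

```python
from collections import defaultdict


def filter_inconsistent(examples):
    """Return the examples whose attribute vector maps to a single class.

    `examples` is a list of (attributes, label) pairs with discrete
    attributes. Assumption: every example taking part in a conflict is
    removed; the chapter may resolve conflicts differently.
    """
    classes_seen = defaultdict(set)
    for attributes, label in examples:
        classes_seen[tuple(attributes)].add(label)
    return [
        (attributes, label)
        for attributes, label in examples
        if len(classes_seen[tuple(attributes)]) == 1
    ]


# Toy binary-class dataset: the first two rows contradict each other.
data = [
    ((0, 1), "pos"),
    ((0, 1), "neg"),  # same attributes, other class: inconsistent
    ((1, 1), "pos"),
    ((1, 0), "neg"),
    ((1, 0), "neg"),  # a consistent duplicate is kept
]

clean = filter_inconsistent(data)
```

On this toy input the two contradictory rows are removed and the three consistent ones are kept, so a learner applied afterwards never sees identical inputs with opposite labels.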