
Essays on Consistency and Randomization in Machine Learning and Fraud Detection

Revelas, Christos
Abstract
This dissertation addresses statistical questions in the fields of machine learning and insurance fraud detection. Random forests are a popular method used in many fields to make predictions. They are averages of decision trees with two layers of randomization: data sampling and split randomization. The first two chapters separately study the effectiveness of each randomization. In particular, the first chapter establishes pointwise consistency of trees and illustrates that growing large trees in ensembles, often the default choice in practice, is not always a good idea. The second chapter looks at the effectiveness of split randomization for different data characteristics. While prior literature has focused on the amount of noise in the data, this chapter offers a novel perspective on forest performance by showing that randomization is effective in the presence of irrelevant and correlated covariates, possibly opening the way for a better understanding of why random forests work well in many applications. The third chapter, separate from the first two, studies how insurance companies can choose which claims to investigate for fraud. Originating from a collaboration with Achmea, this chapter formalizes selection mechanisms, illustrates that selecting based on prior beliefs can lead to inconsistent learning of fraud characteristics, and proposes a randomized strategy conjectured to be consistent.
Description
CentER Dissertation Series Volume: 778
Date
2025
Publisher
CentER
Citation
Revelas, C 2025, 'Essays on Consistency and Randomization in Machine Learning and Fraud Detection', Doctor of Philosophy, Tilburg University, Tilburg. https://doi.org/10.26116/tisem.16174818
License
info:eu-repo/semantics/openAccess