Explore Optimal Degree of Parallelism for Distributed XGBoost Training
Abstract
XGBoost has been an extremely popular and effective machine learning method, gaining its fame through winning multiple Kaggle competitions. One of its strengths is parallel processing, which makes the computation scalable and faster than its counterparts. On the other hand, there are system configurations and model tuning parameters that need to be adjusted in order to achieve its full potential cost-effectively. In this paper, we explore how training duration changes under different workloads, system configurations, and framework parameters. By running multiple such experiments, practical insights can be learned and applied to future applications of XGBoost.
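
As a brief illustration (not taken from the report itself), the degree of parallelism for single-node XGBoost training is commonly controlled through the nthread parameter; the dataset, parameter values, and number of boosting rounds below are placeholders chosen only to make the sketch runnable.

    # Minimal sketch: controlling XGBoost's degree of parallelism on one node.
    # The data and parameter values here are illustrative placeholders.
    import numpy as np
    import xgboost as xgb

    X = np.random.rand(10000, 20)
    y = np.random.randint(0, 2, size=10000)
    dtrain = xgb.DMatrix(X, label=y)

    params = {
        "objective": "binary:logistic",
        "tree_method": "hist",
        "nthread": 8,  # number of parallel threads used for tree construction
    }
    booster = xgb.train(params, dtrain, num_boost_round=100)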
Subject
XGBoost
Distributed Machine Learning
Big Data System
Permanent Link
http://digital.library.wisc.edu/1793/82297
Citation
TR1867