Feature Significance Analysis of the US Adult Income Dataset
Abstract
In this paper, we analyze the classic US Adult Income Dataset using logistic regression and random forests to identify potential factors that contribute to income bias for the 50K income bracket (income ≥ 50K per year). Using the two methods, we train models on the dataset and obtain models that are stable under cross-validation. We also find that the two methods, although both achieve good accuracy, give conflicting interpretations of which factors have the most influence on US adult income.
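The comparison described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes scikit-learn. It uses a synthetic stand-in for the Adult data, because the real dataset mixes categorical and numeric columns that would first need encoding. It also makes two assumed choices of importance measure: absolute standardized coefficients for logistic regression and impurity-based importances for the random forest. The function name `compare_feature_rankings` is hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def compare_feature_rankings(X, y, feature_names, n_folds=5, seed=0):
    """Hypothetical helper: cross-validate two models and rank features by each.

    Returns {model_name: {"cv_accuracy", "cv_std", "ranking"}}, where
    "ranking" lists feature names from most to least important.
    """
    models = {
        # Standardizing first makes coefficient magnitudes comparable.
        "logistic_regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=seed),
    }
    results = {}
    for name, model in models.items():
        # k-fold accuracy; a stable model shows a small std across folds.
        scores = cross_val_score(model, X, y, cv=n_folds, scoring="accuracy")
        model.fit(X, y)  # refit on all data to read off importances
        if name == "logistic_regression":
            # Assumed proxy: |coefficient| on standardized features.
            importance = np.abs(model.named_steps["logisticregression"].coef_[0])
        else:
            # Assumed proxy: mean decrease in impurity.
            importance = model.feature_importances_
        order = np.argsort(importance)[::-1]
        results[name] = {
            "cv_accuracy": float(scores.mean()),
            "cv_std": float(scores.std()),
            "ranking": [feature_names[i] for i in order],
        }
    return results


if __name__ == "__main__":
    # Synthetic placeholder data, not the Adult dataset.
    X, y = make_classification(
        n_samples=2000, n_features=8, n_informative=4, n_redundant=2,
        n_clusters_per_class=1, random_state=0,
    )
    names = [f"f{i}" for i in range(X.shape[1])]
    for model_name, r in compare_feature_rankings(X, y, names).items():
        print(model_name, round(r["cv_accuracy"], 3), r["ranking"][:3])
```

Both models can score similarly while ordering the features differently. This is the kind of disagreement the abstract reports, because the two importance measures capture different things: linear effect size versus contribution to tree splits.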
Subject
machine learning
random forest
big data
logistic regression
neural network
feature engineering
Permanent Link
http://digital.library.wisc.edu/1793/82299
Citation
TR1869