Semiparametric Bayes Multiple Imputation for Regression Models with Missing Mixed Continuous-Discrete Covariates


Missing data are a critical issue in observational and experimental research, as they cause loss of information and biased results. Recently, multiple imputation by chained equations (MICE) has become widely used across study areas for datasets with mixed continuous and discrete variables, although MICE may yield severely biased estimates. We propose a new semiparametric Bayes multiple imputation approach that can deal with both continuous and discrete variables. This enables us to overcome a shortcoming of multiple imputation by MICE, which must satisfy strong conditions (known as compatibility) to guarantee that the obtained estimators are consistent. Our extensive simulation studies show that the coverage probability of the 95% interval calculated using MICE can be less than 1%, while the MSE of the proposed method can be less than one-fiftieth of that of MICE. We also applied our method to the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset, and the results agree with those of previous studies that used panel data other than the ADNI database, whereas existing methods such as MICE produced results that disagree entirely with those studies.


Full conditional specification, Missing data, Multiple imputation, Probit stick-breaking process mixture, Semiparametric Bayes model


Research Institute for Economics and Business Administration
Kobe University
Rokkodai-cho, Nada-ku, Kobe
657-8501 Japan
Phone: +81-78-803-7036
FAX: +81-78-803-7059

Takahiro HOSHINO
Department of Economics, Keio University
RIKEN Center for Advanced Intelligence Project