Abstract:
We consider the setting in which multiple parties hold data on different variables and different units, and they seek to combine their data to fit regressions but are unwilling or not permitted to share their data values. We present a general strategy for such problems: treat them as missing data problems and estimate the regression coefficients with secure EM algorithms. We present secure EM algorithms for linear and log-linear regression, based on the multivariate normal and multinomial distributions, respectively. The parties compute and share the sufficient statistics required by the EM algorithms through secure matrix product protocols, so that no individual data values are exchanged.
Keywords:
Confidentiality, Data Integration, Disclosure, EM algorithm, Regression
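The abstract relies on secure matrix products to obtain cross-party sufficient statistics, such as the off-diagonal block X_A^T X_B of the cross-product matrix that the multivariate normal EM needs. The Python sketch below shows one orthogonal-projection style of secure product, assumed here for illustration; the protocol used in the paper may differ. It also assumes the two parties hold different variables on the same, identically ordered records. All function names (`orthogonal_complement`, `secure_cross_product`) and the complement size `g` are hypothetical. Party A sends party B a matrix Z whose columns are orthogonal to A's own columns, B returns (I - Z Z^T) B, and A recovers A^T B because A^T Z = 0.

```python
import random
from typing import List

Matrix = List[List[float]]  # row-major: a list of rows


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def orthogonal_complement(a: Matrix, g: int, rng: random.Random,
                          tol: float = 1e-10) -> Matrix:
    """Return an n x g matrix Z with orthonormal columns and A^T Z = 0.

    Modified Gram-Schmidt: orthonormalize the columns of A first, then
    orthogonalize random Gaussian vectors against everything kept so far.
    """
    n, p = len(a), len(a[0])
    if g > n - p:
        raise ValueError("need g <= n - p for a complement of this size")
    basis: Matrix = []   # orthonormal vectors spanning col(A) plus the extras
    extras: Matrix = []  # the columns of Z
    candidates = iter(transpose(a))
    while len(extras) < g:
        v = next(candidates, None)
        from_a = v is not None
        if not from_a:
            v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        w = list(v)
        for q in basis:
            d = sum(x * y for x, y in zip(w, q))
            w = [x - d * y for x, y in zip(w, q)]
        norm = sum(x * x for x in w) ** 0.5
        if norm > tol:  # skip linearly dependent columns of A
            q = [x / norm for x in w]
            basis.append(q)
            if not from_a:
                extras.append(q)
    return transpose(extras)


def secure_cross_product(a: Matrix, b: Matrix, g: int, seed: int = 0) -> Matrix:
    """Let party A learn A^T B without party B sending B itself."""
    rng = random.Random(seed)
    # Party A: build Z orthogonal to its own columns and send Z to party B.
    z = orthogonal_complement(a, g, rng)
    # Party B: send W = (I - Z Z^T) B back to party A.
    z_zt_b = matmul(z, matmul(transpose(z), b))
    w = [[x - y for x, y in zip(rb, rc)] for rb, rc in zip(b, z_zt_b)]
    # Party A: A^T W = A^T B - (A^T Z)(Z^T B) = A^T B, because A^T Z = 0.
    return matmul(transpose(a), w)
```

For example, with six shared records, party A holding `[[1, 2], [0, 1], [3, 1], [2, 2], [1, 0], [4, 3]]` and party B holding `[[1], [2], [0], [1], [3], [2]]`, `secure_cross_product(A, B, g=2)` agrees with the directly computed A^T B = [[14], [12]] up to floating-point rounding. The choice of `g` is a disclosure trade-off in this sketch: B learns a g-dimensional subspace orthogonal to A's data, while A learns the projection of B onto the remaining n - g dimensions.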
