### Variational Bayes In Private Settings

Mijung Park

Amsterdam Machine Learning Lab

SWS Colloquium

03 Mar 2017, 10:00 am - 11:00 am

Kaiserslautern building G26, room 111

simultaneous videocast to Saarbrücken building E1 5, room 029

Bayesian methods are frequently used for analysing privacy-sensitive datasets, including medical records, emails, and educational data, and there is a growing need for practical Bayesian inference algorithms that protect the privacy of individuals' data. To this end, we provide a general framework for privacy-preserving variational Bayes (VB) for a large class of probabilistic models, called the conjugate exponential (CE) family. Our primary observation is that when models are in the CE family, we can privatise the variational posterior distributions simply by perturbing the expected sufficient statistics of the complete-data likelihood. For widely used non-CE models with binomial likelihoods (e.g., logistic regression), we exploit the Polya-Gamma data augmentation scheme to bring such models into the CE family, such that inferences in the modified model resemble the original (private) variational Bayes algorithm as closely as possible. The iterative nature of variational Bayes presents a further challenge for privacy preservation, as each iteration increases the amount of noise needed. We overcome this challenge by combining: (1) a relaxed notion of differential privacy, called concentrated differential privacy, which provides a tight bound on the privacy cost of multiple VB iterations and thus significantly decreases the amount of additive noise; and (2) the privacy amplification effect of subsampling mini-batches from large-scale data in stochastic learning. We empirically demonstrate the effectiveness of our method in CE and non-CE models, including latent Dirichlet allocation (LDA), Bayesian logistic regression, and Sigmoid Belief Networks (SBNs), evaluated on real-world datasets.
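As a rough illustration of the core idea (privatising VB by adding noise to the summed expected sufficient statistics, then accounting for the privacy cost across iterations), here is a minimal Python sketch. It is not the speaker's implementation. It assumes that each example's expected sufficient statistics are clipped to an L2 norm of at most `bound`, that neighbouring datasets differ by adding or removing one example, and that privacy is tracked with zero-concentrated DP (zCDP, Bun and Steinke), a close relative of the concentrated DP mentioned above, with the budget split evenly over `num_iters` VB iterations. Subsampling amplification is not modelled, and all function names are hypothetical.

```python
import math
import random


def clip_l2(vec, bound):
    """Rescale `vec` so its L2 norm is at most `bound` (assumed per-example bound)."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm <= bound:
        return list(vec)
    return [v * bound / norm for v in vec]


def sigma_for_zcdp(sensitivity, rho):
    """Gaussian noise std giving rho-zCDP for one release: rho = sensitivity^2 / (2 sigma^2)."""
    return sensitivity / math.sqrt(2.0 * rho)


def sigma_per_iteration(sensitivity, rho_total, num_iters):
    """zCDP composes additively, so an even split gives each iteration rho_total / num_iters."""
    return sigma_for_zcdp(sensitivity, rho_total / num_iters)


def zcdp_to_eps(rho, delta):
    """rho-zCDP implies (rho + 2 * sqrt(rho * log(1/delta)), delta)-DP."""
    return rho + 2.0 * math.sqrt(rho * math.log(1.0 / delta))


def private_ess(per_example_ess, bound, sigma, rng):
    """Noisy sum of clipped per-example expected sufficient statistics.

    With add/remove-one neighbouring datasets, the L2 sensitivity of the sum is `bound`,
    so `sigma` should be computed from `bound` via sigma_for_zcdp or sigma_per_iteration.
    """
    total = [0.0] * len(per_example_ess[0])
    for stats in per_example_ess:
        for j, v in enumerate(clip_l2(stats, bound)):
            total[j] += v
    return [t + rng.gauss(0.0, sigma) for t in total]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical 2-D statistics standing in for the E-step output of a CE model.
    ess = [[rng.random(), rng.random()] for _ in range(1000)]
    for iters in (1, 10, 100):
        sigma = sigma_per_iteration(1.0, rho_total=0.5, num_iters=iters)
        noisy = private_ess(ess, bound=1.0, sigma=sigma, rng=rng)
        print(iters, round(sigma, 3), [round(x, 1) for x in noisy])
    print("eps at delta=1e-5:", round(zcdp_to_eps(0.5, 1e-5), 3))
```

Under this accounting the per-iteration noise scale grows only as the square root of the number of iterations, whereas splitting an (ε, δ) budget with basic composition makes it grow roughly linearly; this is the kind of saving a tighter composition bound is meant to provide.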