Confidential Accept-Reject
Bayescomp 2025

Louis JM Aslett, Joshua Bon, Murray Pollock, Shengang Hu, Hongsheng Dai, Gareth O. Roberts

Slides: www.louisaslett.com/250620

Outline

\gdef\data{\mathbf{X}\@ifnextchar[{\datai}{}} \gdef\datai[#1]{^{(#1)}} \gdef\samp{\theta\@ifnextchar[{\sampi}{}} \gdef\sampi[#1]{_{#1}} \gdef\sampp{\theta^\star} \gdef\samps{\boldsymbol{\theta}} \gdef\Pr{\mathbb{P}} \gdef\given{\mid}

Motivation

\quad\to\ Real-world application

Background

\quad\to\ f-differential privacy

\quad\to\ Homomorphic secret sharing

Confidential Accept-Reject

Privacy Amplification

Software Implementation

Motivation

Motivation

Data often distributed between different parties

\quad\to\ observation wise (most common)

\quad\to\ variable wise (a little rarer)

Parties not willing/able to share data

\quad\to\ trust

\quad\to\ governance issues


Goal: jointly fit models to all available data

FAIRVASC Consortium (I)

  • ANCA associated vasculitis (AAV) diseases cause long-term damage to kidneys, can lead to renal failure

  • Aetiology largely unknown, annual rate \approx 2 \times 10^{-5}

  • European Vasculitis Society (EUVAS) and RITA European Reference Network project FAIRVASC
    • bring together 7 European clinical data registries
    • harmonise clinical terms and integrate into common ontology
    • uplift data into secure data SPARQL registry
    • design federated queries to run over all registries

FAIRVASC Consortium (II)

FAIRVASC Consortium (III)

  • Once data harmonised, federated SPARQL queries tested
    • patient totals
    • simple summary counts (treatment/stage/etc)

But, data governance nightmare …

  • 7 registry groups + 7 national legal systems + pre-approve query by query

  • Require privacy options to facilitate simpler governance approvals and enable more complex modelling.

Objective

  • Data summaries not enough

  • Would like to jointly fit models to data from all registries

  • Aim to perform Bayesian inference to:

    • Fit a common model
    • Observations distributed among participants
    • Honest but curious participants
    • Differential privacy of outputs

Problem setting

Background

Homomorphic Secret Sharing, x_1+x_2+x_3

Homomorphic Secret Sharing, x_1+x_2+x_3

Homomorphic Secret Sharing, x_1+x_2+x_3

Homomorphic Secret Sharing: simplest approach

Homomorphic Secret Sharing

  • “Real” schemes based on polynomial rings, also allow multiplication.

  • Even with up to \frac{1}{3} corrupted parties, information theoretically secure.

  • Rounds of addition require no communication; every multiplication requires communication.

We will write: x \oplus y \qquad x \otimes y for operations computed homomorphically.

Differential Privacy and f-DP

Definition: (\varepsilon, \delta)-differential privacy (Dwork & Roth, 2014)

A randomised algorithm \mathcal{A} : \mathcal{X} \to \Omega satisfies (\varepsilon, \delta)-differential privacy if \Pr(\mathcal{A}(\data) \in \mathcal{S}) \le e^\varepsilon \Pr(\mathcal{A}(\data') \in \mathcal{S}) + \delta for all measurable \mathcal{S} \subseteq \Omega, where \data, \data' \in \mathcal{X} are neighbouring data sets.

Hypothesis testing view: observe \xi. Do membership inference attack (MIA) to distinguish: H_0: \xi \sim \mathcal{A}(\data) \quad\text{vs}\quad H_1: \xi \sim \mathcal{A}(\data') Rejection with probability \phi: \Xi \to [0,1], giving Type-I/II errors \alpha_\phi = \mathbb{E}_{\mathcal{A}(\data)}[\phi] \qquad \beta_\phi = 1 - \mathbb{E}_{\mathcal{A}(\data')}[\phi]

Trade-off function (Dong et al., 2022)

A trade-off function, T\big(\mathcal{A}(\data), \mathcal{A}(\data')\big) : [0,1] \to [0,1], returns the Type-II error of the most powerful MIA for a given Type-I error \alpha \in [0,1], T\big(\mathcal{A}(\data), \mathcal{A}(\data')\big)(\alpha) \coloneqq \inf_{\phi} \big\{ \beta_\phi : \alpha_\phi \le \alpha \big\} f : [0,1] \to [0,1] is a trade-off function \iff f is convex, continuous, non-increasing and f(x) \le 1 - x \ \ \forall\,x \in [0,1].

Definition: f-differential privacy (Dong et al., 2022)

Let f be a trade-off function. A randomised algorithm \mathcal{A} is f-differentially private if T\big(\mathcal{A}(\data), \mathcal{A}(\data')\big) \ge f \quad \forall\,\data, \data' \ st\ \|\data-\data'\|_1 \le 1

f-Differential Privacy

Confidential Accept-Reject

Accept-Reject in Monte Carlo (I)

\data \coloneqq \big( \data[1], \data[2], \dots, \data[m] \big)

Often core of algorithm is accept/reject decision.

r(\sampp \mid \samp_1, \dots, \samp_n, \data)

Acceptance Probability
Rejection Sampling r(\sampp \given \data)
MCMC r(\sampp \given \samp_n, \data)
Adaptive MCMC r(\sampp \given \samp_1, \dots, \samp_n, \data)

Key assumption:

\log r(\sampp \given \samps, \data) = \sum_{j=1}^m \log r(\sampp \given \samps, \data[j])

\log r(\sampp \given \samps, \data) = \bigoplus_{j=1}^m \log r(\sampp \given \samps, \data[j])

  Not enough alone — deterministic

Accept-Reject in Monte Carlo (II)

For U \sim \text{U}(0,1), \begin{align*} r(\sampp \given \samps, \data) &\ge U \\ \iff \left[\bigoplus_{j=1}^m \log r(\sampp \given \samps, \data[j])\right] \oplus \left(-\log U\right) &\ge 0 \\ \end{align*}

-\log U \sim \mathrm{Exp}(1), infinitely divisible, so

\colorbox{white}{$\displaystyle \bigoplus_{j=1}^m \left[\log r(\sampp \given \samps, \data[j]) + V^{(j)} \right]$} \ge 0

\colorbox{yellow}{$\displaystyle \bigoplus_{j=1}^m \left[\log r(\sampp \given \samps, \data[j]) + V^{(j)} \right]$} \ge 0

where V^{(j)} \sim \mathrm{Gamma}(m^{-1}, 1).

Note, participant i can recover:

\bigoplus_{j=1}^m \left[\log r(\sampp \given \samps, \data[j]) + V^{(j)} \right] - V^{(i)}

\implies Randomised algorithm Version 1

\mathcal{A}(\data) \coloneqq \sum_{j=1}^m \log r(\sampp \given \samps, \data[j]) + \sum_{\substack{j=1 \\ j \ne i}}^m V^{(j)}

(\varepsilon, \delta)-differential privacy of v1

Fixed \sampp, \samps, randomised algorithm \mathcal{A} : \mathcal{X} \to \R, \mathcal{A}(\data) \coloneqq \log r(\sampp \given \samps, \data) + V where V \sim \mathrm{Gamma}\left(m^{-1}(m-1), 1\right) V \sim \mathrm{Gamma}\left(m^{-1}(m-1), 1\right) V \sim \mathrm{Gamma}\left(m^{-1}(m-1), 1\right), m>1, and r(\sampp \given \samps, \data) \in (0,1).

\mathcal{A} is (\varepsilon,\delta)-DP,

\varepsilon = \Delta + \frac{1}{m} \log\left(1 + \frac{\Delta}{\tau}\right), \quad \colorbox{white}{$\delta = F_V(\tau)$}

\varepsilon = \Delta + \frac{1}{m} \log\left(1 + \frac{\Delta}{\tau}\right), \quad \colorbox{yellow}{$\delta = F_V(\tau)$}

  • \Delta = \sup_{\data, \data'} |\log r(\sampp \given \samps, \data) - \log r(\sampp \given \samps, \data')|

  • \tau \ge \Delta \tau \ge \Delta is a user-chosen constant

  • F_V(\cdot) is the cumulative distribution function of V.

f-differential privacy of v1

\begin{align*} &T\big(\mathcal{A}(\data), \mathcal{A}(\data')\big)(\alpha) \\[1em] &= \begin{cases} 1 - F_V\Big( F_V^{-1}\big( \min \{ \alpha+F_V(-\eta), 1 \} \big) + \eta \Big) &\text{if } \eta \le 0 \\[10pt] F_V\big(F_V^{-1}( 1 - \alpha) + \eta\big) - F_V(\eta) &\text{if } \eta > 0 \end{cases} \end{align*} where \eta \coloneqq \log r(\sampp \given \samps, \data) - \log r(\sampp \given \samps, \data')

\implies f-differentially private for f(\alpha) = 1 - F_V(\Delta) - \alpha, \alpha \in [0, 1 - F_V(\Delta)) and f(\alpha) = 0 otherwise, where \Delta = \sup_{\data, \data'} |\eta|.

f-differential privacy of v1, vary \eta

f-differential privacy of v1, vary m

Accept-Reject in Monte Carlo (III)

Version 2: To avoid issue with differing support of randomised algorithm,

\left( \bigoplus_{j=1}^m \left[\log r(\sampp \given \samps, \data[j]) + V^{(j)} \right] \right) \colorbox{yellow}{$\displaystyle \otimes \underbrace{\left( \bigoplus_{j=1}^m W^{(j)} \right)}_{\sim\ \text{Exp}(1)}$} \ge 0

where V^{(j)}, W^{(j)} \sim \mathrm{Gamma}(m^{-1}, 1).

Trade-off function for v2

\mathcal{A}(\data)= \big(\log r(\sampp \given \samps, \data) + V\big) W where V, W \sim \mathrm{Exp}(1).

Let \rho \coloneqq \log r(\sampp \given \samps, \data). Then, T\big(\mathcal{A}(\data), \mathcal{A}(\data')\big)(\alpha) = \begin{cases} F_\mathcal{A}\big( F_\mathcal{A}^{-1}(1 - \alpha \given \data) \given \data' \big) &\text{if } \rho \le \rho' \\[10pt] 1 - F_\mathcal{A}\big( F_\mathcal{A}^{-1}(\alpha \given \data) \given \data'\big) &\text{if } \rho > \rho' \end{cases} where F_\mathcal{A}(\xi \given \data) = \begin{cases} \displaystyle e^{\rho} \int_{\rho}^{0} e^{-b-\xi/b} \, db & \text{if } \xi \le 0 \\[10pt] \displaystyle 1 - 2e^{\rho} \sqrt{\xi} K_1\left(2 \sqrt{\xi}\right) & \text{if } \xi > 0 \end{cases}

f-differential privacy of v2

Confidential AR

  • Confidential AR by itself can provide f-differential privacy
    • if we have r(\sampp \given \samp, \data) \in (0,1)
  • Work ongoing:
    • private rejection sampling
    • lazy MCMC sampling
    • hierarchical Bayesian model sampling
  • Furthermore, Confidential AR is a drop-in privacy amplification tool for existing private Monte Carlo methods, without altering exactness.

Confidential AR for Privacy Amplification

Yıldırım & Ermiş (2019)

Use penalty method (Ceperley & Dewing, 1999) for exact MCMC with privacy. \log \hat{r}(\sampp \given \samp, \data) = \log r(\sampp \given \samp, \data) + \lambda - \sigma^2(\samp, \sampp)/2 where \lambda \sim \text{N}\left(0, \sigma^2(\samp, \sampp)\right).

For federated MCMC, \log \hat{r}(\sampp \given \samp, \data) = \sum_{j=1}^m \left( \log r(\sampp \given \samp, \data[j]) + \lambda_j \right) - m \sigma^2(\samp, \sampp)/2

DP of Yıldırım & Ermiş (2019)

Approximate DP holds when \sigma^2_n(\samp, \sampp) \coloneqq \tau^2 n^{2 \alpha} c^2(\samp, \sampp)

where,

  • \alpha > 0, C > 0 st \begin{align*} c(\samp, \sampp) &= \sup_{\data, \data'} \left\{ \left[ \ell(\sampp; \data) - \ell(\samp; \data) \right] - \left[ \ell(\sampp; \data') - \ell(\samp; \data') \right] \right\} \\ &\le C n^{-\alpha} \end{align*}
  • choose \tau > 0, trading off (\varepsilon, \delta) privacy, № iterations, and mixing

Empirical estimation of privacy amplification

Software Implementation

Software Implementation

Have seen pre-release preview of {fdp} R package 📦

Time permitting, live demo of two other key packages which will release in the coming months:

  • 📦 {hss}

  • 📦 {confidential.ar}

Release will be announced at:

@louisaslett@fosstodon.org

@louisaslett.bsky.social

in/louisaslett