www.louisaslett.com/250620
\gdef\data{\mathbf{X}\@ifnextchar[{\datai}{}} \gdef\datai[#1]{^{(#1)}} \gdef\samp{\theta\@ifnextchar[{\sampi}{}} \gdef\sampi[#1]{_{#1}} \gdef\sampp{\theta^\star} \gdef\samps{\boldsymbol{\theta}} \gdef\Pr{\mathbb{P}} \gdef\given{\mid}
Motivation
\quad\to\ Real-world application
Background
\quad\to\ f-differential privacy
\quad\to\ Homomorphic secret sharing
Confidential Accept-Reject
Privacy Amplification
Software Implementation
Data are often distributed across multiple parties
\quad\to\ observation-wise (most common)
\quad\to\ variable-wise (a little rarer)
Parties not willing/able to share data
\quad\to\ trust
\quad\to\ governance issues
Goal: jointly fit models to all available data
ANCA-associated vasculitis (AAV) diseases cause long-term damage to the kidneys and can lead to renal failure
Aetiology largely unknown; annual incidence \approx 2 \times 10^{-5}
But, data governance nightmare …
7 registry groups + 7 national legal systems + pre-approve query by query
Require privacy options to facilitate simpler governance approvals and enable more complex modelling.
Data summaries not enough
Would like to jointly fit models to data from all registries
Aim: perform full Bayesian inference on the combined data
“Real” schemes are based on polynomial rings and also allow multiplication.
Information-theoretically secure even with up to \frac{1}{3} of parties corrupted.
Additions require no communication; every multiplication requires a round of communication.
We will write: x \oplus y \qquad x \otimes y for operations computed homomorphically.
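To make \oplus concrete, a minimal additive secret-sharing sketch in base R (a toy scheme over the integers mod a prime, not the forthcoming {hss} API): \oplus is evaluated locally on shares with no communication, while \otimes needs an interactive protocol and is omitted here.

```r
# Toy additive secret sharing over the integers mod a prime p:
# no single share reveals anything about the secret.
p <- 2147483647                                   # illustrative field size

share <- function(x, m) {
  s <- sample.int(p, m - 1, replace = TRUE) - 1   # m - 1 uniformly random shares
  c(s, (x - sum(s)) %% p)                         # last share forces the sum to x mod p
}
reconstruct <- function(shares) sum(shares) %% p

# x (+) y: every party adds its own two shares locally -- no communication
x_sh <- share(1234, m = 3)
y_sh <- share(5678, m = 3)
reconstruct((x_sh + y_sh) %% p)                   # 6912
```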
Definition: (\varepsilon, \delta)-differential privacy (Dwork & Roth, 2014)
A randomised algorithm \mathcal{A} : \mathcal{X} \to \Omega satisfies (\varepsilon, \delta)-differential privacy if \Pr(\mathcal{A}(\data) \in \mathcal{S}) \le e^\varepsilon \Pr(\mathcal{A}(\data') \in \mathcal{S}) + \delta for all measurable \mathcal{S} \subseteq \Omega, where \data, \data' \in \mathcal{X} are neighbouring data sets.
Hypothesis testing view: observe \xi and perform a membership inference attack (MIA) to distinguish: H_0: \xi \sim \mathcal{A}(\data) \quad\text{vs}\quad H_1: \xi \sim \mathcal{A}(\data') A test rejects with probability \phi(\xi), where \phi: \Xi \to [0,1], giving Type-I/II errors \alpha_\phi = \mathbb{E}_{\mathcal{A}(\data)}[\phi] \qquad \beta_\phi = 1 - \mathbb{E}_{\mathcal{A}(\data')}[\phi]
Trade-off function (Dong et al., 2022)
A trade-off function, T\big(\mathcal{A}(\data), \mathcal{A}(\data')\big) : [0,1] \to [0,1], returns the Type-II error of the most powerful MIA for a given Type-I error \alpha \in [0,1], T\big(\mathcal{A}(\data), \mathcal{A}(\data')\big)(\alpha) \coloneqq \inf_{\phi} \big\{ \beta_\phi : \alpha_\phi \le \alpha \big\} f : [0,1] \to [0,1] is a trade-off function \iff f is convex, continuous, non-increasing and f(x) \le 1 - x \ \ \forall\,x \in [0,1].
Definition: f-differential privacy (Dong et al., 2022)
Let f be a trade-off function. A randomised algorithm \mathcal{A} is f-differentially private if T\big(\mathcal{A}(\data), \mathcal{A}(\data')\big) \ge f \quad \forall\,\data, \data' \text{ s.t. } \|\data-\data'\|_1 \le 1
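For reference, Dong et al. (2022) also show that (\varepsilon, \delta)-DP is exactly f-DP with the piecewise-linear trade-off function
f_{\varepsilon,\delta}(\alpha) = \max\left\{ 0,\ 1 - \delta - e^{\varepsilon} \alpha,\ e^{-\varepsilon}(1 - \delta - \alpha) \right\}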
\data \coloneqq \big( \data[1], \data[2], \dots, \data[m] \big)
Often the core of the algorithm is an accept/reject decision:
r(\sampp \given \samp_1, \dots, \samp_n, \data)
| Algorithm | Acceptance Probability |
|---|---|
| Rejection Sampling | r(\sampp \given \data) |
| MCMC | r(\sampp \given \samp_n, \data) |
| Adaptive MCMC | r(\sampp \given \samp_1, \dots, \samp_n, \data) |
Key assumption:
\log r(\sampp \given \samps, \data) = \sum_{j=1}^m \log r(\sampp \given \samps, \data[j])
so it can be evaluated homomorphically: \log r(\sampp \given \samps, \data) = \bigoplus_{j=1}^m \log r(\sampp \given \samps, \data[j])
❌ Not enough alone: the computed value is a deterministic function of the data
For U \sim \text{U}(0,1), \begin{align*} r(\sampp \given \samps, \data) &\ge U \\ \iff \left[\bigoplus_{j=1}^m \log r(\sampp \given \samps, \data[j])\right] \oplus \left(-\log U\right) &\ge 0 \end{align*}
-\log U \sim \mathrm{Exp}(1) is infinitely divisible: it is equal in distribution to the sum of m independent \mathrm{Gamma}(m^{-1}, 1) terms, so
\colorbox{yellow}{$\displaystyle \bigoplus_{j=1}^m \left[\log r(\sampp \given \samps, \data[j]) + V^{(j)} \right]$} \ge 0
where V^{(j)} \sim \mathrm{Gamma}(m^{-1}, 1).
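A quick sanity check in base R (illustrative r and m, not the packaged implementation): the m independent \mathrm{Gamma}(m^{-1}, 1) shares sum to an \mathrm{Exp}(1) draw, so the distributed noisy test accepts with probability exactly r.

```r
set.seed(1)
m <- 5
r <- 0.3                                    # illustrative acceptance probability
log_r_j <- rep(log(r) / m, m)               # per-party contributions summing to log r

# each party adds an independent Gamma(1/m, 1) share V^(j), so the
# aggregate log r + sum(V) is distributed as log r - log U
one_test <- function() sum(log_r_j + rgamma(m, shape = 1 / m, rate = 1)) >= 0
mean(replicate(1e5, one_test()))            # approx 0.3 = P(r >= U)
```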
Note: participant i knows its own noise share V^{(i)}, so can recover:
\bigoplus_{j=1}^m \left[\log r(\sampp \given \samps, \data[j]) + V^{(j)} \right] - V^{(i)}
\implies Randomised algorithm Version 1
\mathcal{A}(\data) \coloneqq \sum_{j=1}^m \log r(\sampp \given \samps, \data[j]) + \sum_{\substack{j=1 \\ j \ne i}}^m V^{(j)}
For fixed \sampp, \samps, the randomised algorithm \mathcal{A} : \mathcal{X} \to \mathbb{R}, \mathcal{A}(\data) \coloneqq \log r(\sampp \given \samps, \data) + V where V \sim \mathrm{Gamma}\left(m^{-1}(m-1), 1\right), m>1, and r(\sampp \given \samps, \data) \in (0,1).
\mathcal{A} is (\varepsilon,\delta)-DP,
\varepsilon = \Delta + \frac{1}{m} \log\left(1 + \frac{\Delta}{\tau}\right), \quad \colorbox{yellow}{$\delta = F_V(\tau)$}
where \Delta = \sup_{\data, \data'} |\log r(\sampp \given \samps, \data) - \log r(\sampp \given \samps, \data')|
\tau \ge \Delta is a user-chosen constant
F_V(\cdot) is the cumulative distribution function of V.
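With F_V available as pgamma, the bound is directly computable in base R (\Delta and \tau below are hypothetical, with \Delta assumed to bound the sensitivity of \log r):

```r
m     <- 5
Delta <- 0.05                              # hypothetical sensitivity bound
tau   <- 0.1                               # user-chosen constant, tau >= Delta

eps   <- Delta + log(1 + Delta / tau) / m
delta <- pgamma(tau, shape = (m - 1) / m, rate = 1)   # delta = F_V(tau)
c(epsilon = eps, delta = delta)
```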
\begin{align*} &T\big(\mathcal{A}(\data), \mathcal{A}(\data')\big)(\alpha) \\[1em] &= \begin{cases} 1 - F_V\Big( F_V^{-1}\big( \min \{ \alpha+F_V(-\eta), 1 \} \big) + \eta \Big) &\text{if } \eta \le 0 \\[10pt] F_V\big(F_V^{-1}( 1 - \alpha) + \eta\big) - F_V(\eta) &\text{if } \eta > 0 \end{cases} \end{align*} where \eta \coloneqq \log r(\sampp \given \samps, \data) - \log r(\sampp \given \samps, \data')
\implies f-differentially private for f(\alpha) = 1 - F_V(\Delta) - \alpha, \alpha \in [0, 1 - F_V(\Delta)) and f(\alpha) = 0 otherwise, where \Delta = \sup_{\data, \data'} |\eta|.
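The exact trade-off curve is also easy to evaluate numerically, since F_V and F_V^{-1} are just pgamma / qgamma with shape (m-1)/m (a base R sketch of the two cases above):

```r
m   <- 5
FV  <- function(x) pgamma(x, shape = (m - 1) / m, rate = 1)
FVi <- function(q) qgamma(q, shape = (m - 1) / m, rate = 1)

tradeoff <- function(alpha, eta) {
  if (eta <= 0) 1 - FV(FVi(min(alpha + FV(-eta), 1)) + eta)
  else          FV(FVi(1 - alpha) + eta) - FV(eta)
}
alpha <- seq(0, 1, length.out = 101)
beta  <- sapply(alpha, tradeoff, eta = -0.5)   # one illustrative eta
```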
Version 2: to avoid the issue that the support of the randomised algorithm depends on the data,
\left( \bigoplus_{j=1}^m \left[\log r(\sampp \given \samps, \data[j]) + V^{(j)} \right] \right) \colorbox{yellow}{$\displaystyle \otimes \underbrace{\left( \bigoplus_{j=1}^m W^{(j)} \right)}_{\sim\ \text{Exp}(1)}$} \ge 0
where V^{(j)}, W^{(j)} \sim \mathrm{Gamma}(m^{-1}, 1).
\mathcal{A}(\data) = \big(\log r(\sampp \given \samps, \data) + V\big) W, where V, W \sim \mathrm{Exp}(1).
Let \rho \coloneqq \log r(\sampp \given \samps, \data). Then, T\big(\mathcal{A}(\data), \mathcal{A}(\data')\big)(\alpha) = \begin{cases} F_\mathcal{A}\big( F_\mathcal{A}^{-1}(1 - \alpha \given \data) \given \data' \big) &\text{if } \rho \le \rho' \\[10pt] 1 - F_\mathcal{A}\big( F_\mathcal{A}^{-1}(\alpha \given \data) \given \data'\big) &\text{if } \rho > \rho' \end{cases} where F_\mathcal{A}(\xi \given \data) = \begin{cases} \displaystyle e^{\rho} \int_{\rho}^{0} e^{-b-\xi/b} \, db & \text{if } \xi \le 0 \\[10pt] \displaystyle 1 - 2e^{\rho} \sqrt{\xi} K_1\left(2 \sqrt{\xi}\right) & \text{if } \xi > 0 \end{cases}
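A base R sketch of F_\mathcal{A} as defined above: numerical quadrature for the \xi \le 0 branch and the modified Bessel function K_1 for \xi > 0.

```r
# CDF of A = (log r + V) W with V, W ~ Exp(1); rho = log r <= 0
F_A <- function(xi, rho) {
  if (xi <= 0) {
    exp(rho) * integrate(function(b) exp(-b - xi / b),
                         lower = rho, upper = 0)$value
  } else {
    1 - 2 * exp(rho) * sqrt(xi) * besselK(2 * sqrt(xi), nu = 1)
  }
}
1 - F_A(0, log(0.3))   # P(A >= 0) = r = 0.3, so the accept test stays exact
```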
Use penalty method (Ceperley & Dewing, 1999) for exact MCMC with privacy. \log \hat{r}(\sampp \given \samp, \data) = \log r(\sampp \given \samp, \data) + \lambda - \sigma^2(\samp, \sampp)/2 where \lambda \sim \text{N}\left(0, \sigma^2(\samp, \sampp)\right).
For federated MCMC, \log \hat{r}(\sampp \given \samp, \data) = \sum_{j=1}^m \left( \log r(\sampp \given \samp, \data[j]) + \lambda_j \right) - m \sigma^2(\samp, \sampp)/2
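Putting the pieces together, a sketch of one federated penalty-method accept decision in base R (all constants hypothetical; \sigma^2(\samp, \sampp) is assumed known to all parties):

```r
m       <- 5
sigma2  <- 0.1^2                          # hypothetical penalty variance per party
log_r_j <- rep(log(0.3) / m, m)           # hypothetical per-party log-ratio terms

# each party adds lambda_j ~ N(0, sigma^2); subtracting m sigma^2 / 2
# is the penalty that keeps the Metropolis-Hastings chain exact
log_r_hat <- sum(log_r_j + rnorm(m, 0, sqrt(sigma2))) - m * sigma2 / 2
accept    <- log(runif(1)) <= log_r_hat   # accept/reject decision
```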
Approximate DP holds when \sigma^2_n(\samp, \sampp) \coloneqq \tau^2 n^{2 \alpha} c^2(\samp, \sampp)
where …
Have seen a pre-release preview of the {fdp} R package 📦
Time permitting, a live demo of two other key packages that will be released in the coming months:
📦 {hss}
📦 {confidential.ar}
Confidential Accept-Reject