Introduction

The ability to identify pathways through which a complex exposure mixture operates is critical for the development of public health policy, as mediation pathways can often be influenced via interventions.

Exposure to complex mixtures is ubiquitous, and recent work in toxicology and epidemiology now emphasizes assessing mixtures of chemicals. Since elements of a mixture have the potential to exhibit complex interactions,it is important to consider the whole mixture when evaluating the nature of the relationship of multiple chemicals on a health outcome.

One approach to quantifying biological mechanisms is the use of causal mediation analysis. Causal mediation analysis allows for the decomposition of a total effect (TE) of an exposure on an outcome into the pathway that operates indirectly through an intermediate (mediator) variable and the pathway that is independent of the intermediate variable, or that operates directly from the exposure to the outcome.

Few methods exist to estimate mediation effects when the exposure of interest is a mixture and exposure-mediator interactions are present. Current methods require specifying an outcome and a mediator regression model. If the media-tor variable is restricted to have a linear effect on the outcome, closed form solutions are available to estimate the natural direct effect (NDE), natural indirect effect (NIE), and controlled direct effects (CDEs) of a mediator on the relationship of a mixture on an outcome. In the presence of a nonlinear effect of the mediator on the outcome, to our knowledge, no other methods using a causal inference framework currently exist to estimate the NDE, NIE, and CDEs of a potentially complex exposure mixture on an outcome through a mediator variable.

This paper present a novel causal mediation method to estimate the NDE, NIE, and CDEs for a poten-tially complex mixture of exposures on an outcome operating through an intermediate variable. The method allow for highly complex exposure-mediator and exposure-mediator-response functions, in addition to complex exposure-covariate and exposure-mediator-covariate relationships with the mediator and outcome respectively, using Bayesian Kernel Machine Regression (BKMR).

This package causalbkmr implement the BKMR-CMA method, which use BKMR for the mediator and outcome regression models since BKMR allows for all possible nonlinearities and interactions among the elements included in the kernel with the specified outcome, without a prior specification, and credible intervals a BKMR fit inherently control for multiple testing due to the Bayesian nature of the model and the prior specification.

We predict counterfactuals using the posterior predictive distributions of the mediator and the outcome and present an algorithm for estimation of mediation effects. We also conduct a simulation study to compare how our approach performs relative to current mediation methods that assume a restrictive linear relationship between the exposure, mediator, and outcome.

Bayesian kernel machine regression

We first review Kernel Machine Regression (KMR) as a framework for estimating the effect of a complex mixture when only a single time point of exposure is measured. For each subject \(i = 1, ..., n\), we assume

\[\begin{equation} Y_{i}=h\left(\mathbf{a}_{i}\right)+\mathbf{c}_{i}^{\mathrm{T}} \boldsymbol{\beta}+\epsilon_{i} \end{equation}\] where \(\mathbf{a}_{i}=\left(a_{i 1}, \ldots, a_{i M}\right)^{\mathrm{T}}\) is a vector of \(M\) exposure variables.

It can be shown that the above model can be expressed as the mixed model \[y_{i} \sim N\left(h_{i}+\mathbf{c}_{i}^{T} \boldsymbol{\beta}, \sigma^{2}\right)\]

\[\begin{equation} \mathbf{h} \equiv\left(h_{1}, \ldots, h_{n}\right)^{\mathrm{T}} \sim N(\mathbf{0}, \tau \mathbf{K}) \end{equation}\]

where \(\mathbf{K}\) is the kernel matrix, has \((i, j)-\)element \(K(\mathbf{a}_{i}, \mathbf{a}_{j})\)

There are different choices of the kernel. We focus on the Gaussian kernel, which flexibly captures a wide range of underlying functional forms for \(h(\cdot)\), although the methods are applicable to a broad choice of kernels. We assume \[\operatorname{cor}\left(h_{i}, h_{j}\right)=\exp \left\{-(1 / \rho) \sum_{m=1}^{M}\left(a_{i m}-a_{j m}\right)^{2}\right\}\] where \(\rho\) is a tuning parameter that regulates the smoothness of the dose-response function. This assumption implies that two subjects with similar exposures will have more similar risks (\(a_i\) to \(a_j\), and \(h_i\) will be close to \(h_j\) ).

Bobb et al (2015) presented the Bayesian Kernel Machine Regression (BKMR) as a framework to estimate the effect of a complex mixture on a health outcome.

To fit (2), we assume a at prior on the coefficients for the confounding variables, \(\beta \sim 1\), and
\(\sigma^{-2} \sim \operatorname{Gamma}\left(a_{\sigma}, b_{\sigma}\right)\), where we set \(a_{\sigma} = b_{\sigma} = 0.001\).

We can parameterize BKMR by \(\lambda=\tau \sigma^{-2}\), and we assume a Gamma prior distribution for the variance component of \(\lambda\). For the smoothness parameter \(\rho\), we assume \(\rho \sim \operatorname{Unif}(a, b)\) with \(a=0\) and \(b=100\). For additional details regarding BKMR and prior specifications, see Bobb et al. (Bobb et al., 2015, 2018)

Causal Mediation Analysis

In order to define causal contrasts in a mediation context, we first define our notation. Let \(Y_{am}\) denote the counterfactual outcome \(Y\) if the exposure \(A\) was set to level \(a\) and mediator \(M\) was set to level \(m\). Let \(M_a\) be the counterfactual mediator \(M\) that would have been observed if the exposure \(A\) was set to level \(a\). Accordingly,\(Y_{aMa^∗}\) represents the counterfactual outcome \(Y\) if the exposure \(A\) was set to level \(a\) and the mediator \(M\) was set to the level it would have taken if the exposure \(A\) was set to level \(a^∗\).

The mediated effects of interest, the TE, the NDE, the NIE, and the CDEs, are formally defined as:

\[ \begin{aligned} \mathrm{TE} &=\mathrm{E}\left[Y_a-Y_{a^*}\right] \\ \mathrm{NDE} &=\mathrm{E}\left[Y_{a M_{a^*}}-Y_{a^* M_{a^*}}\right] \\ \mathrm{NIE} &=\mathrm{E}\left[Y_{a M_a}-Y_{a M_{a^*}}\right] \\ \mathrm{CDE}(m) &=\mathrm{E}\left[Y_{a m}-Y_{a^* m}\right] \end{aligned} \]

The NDE captures the average difference in the counterfactual outcomes for a change in exposure level \(a^∗\) to \(a\) , while fixing the mediator to the level it would have taken if the exposure was set to \(a^∗\). The NIE measures the average difference in counterfactual outcomes when fixing the exposure to level \(a\), while the mediator varies from the level it would have taken if the exposure was set to a compared to \(a^∗\). The CDE quantifies the average difference in the counterfactual outcomes for a change in exposure level from \(a^∗\) to \(a\) , while intervening to fix the mediator to a specified level, \(m\).

Bayesian kernel machine regression-causal mediation analysis

For BKMR-Causal Mediation Analysis (BKMR-CMA), we consider a single normally distributed health outcome \(Y\), single normally distributed mediator variable \(M\), continuous exposure mixture \(\mathbf{A}\), continuous effect modifiers \(\mathbf{E}_M\) that may have a complex relationship with the exposures on the mediator, and continuous effect modifiers \(\mathbf{E}_Y\) that may have a complex relationship with the exposures and mediator on the outcome. Let \(\mathbf{Z}_M=\left(\mathbf{A}, \mathbf{E}_M\right)\) and \(\mathbf{Z}_Y=\left(\mathbf{A}, \mathbf{E}_Y\right)\) be matrices containing all of the exposure variables (e.g., metals) and the effect modifiers to include in the prospective kernel functions. Although \(\mathbf{Z}_M\) and \(\mathbf{Z}_Y\) might be the same sets of variables, our framework allows for them to differ. Additionally, one or both of the sets of variables \(\mathbf{E}_M\) or \(\mathbf{E}_Y\) could be empty (i.e., there are no effect modifiers that interact with the exposures and/or have a nonlinear relationship with the mediator or outcome). Lastly, let \(\mathbf{C}\) be a matrix of confounders assumed to have a linear effect on the outcome. To allow for potentially complex relationships between the elements in \(\mathbf{Z}_M\), we model the mediator variable using the BMKR model:

\[ M_i=h_M\left(\mathbf{Z}_{M i}\right)+\mathbf{C}_i^T \boldsymbol{\beta}+\epsilon_{M i}, \]

where \(\epsilon_{M i} \stackrel{u d}{\sim} N\left(0, \sigma_M^2\right)\).

Since accounting for exposure-mediator interactions is important to obtain unbiased effect estimates, we include the mediator variable along with \(\mathbf{Z}_Y\) in the kernel function when modeling the health outcome in : \[ Y_i=h_Y\left(\mathbf{Z}_{Y i}, M_i\right)+\mathbf{C}_i^T \boldsymbol{\theta}+\epsilon_{Y i}, \] where \(\epsilon_{Y i} \stackrel{i i d}{\sim} N\left(0, \sigma_Y^2\right)\).

By fitting the models separately, we assume \(\epsilon_{M i}\) and \(\epsilon_{Y i}\) are independent. To model the TE of the exposure mixture on the outcome, we consider BKMR model : \[ Y_i=h_{T E}\left(\mathbf{Z}_{Y i}\right)+\mathbf{C}_i^T \gamma+\epsilon_{T E i}, \] where \(\epsilon_{T E i} \stackrel{i d d}{\sim} N\left(0, \sigma_{T E}^2\right)\)

We estimate the TE, NDE, and NIE using BKMR-CMA at particular levels of the effect modifiers, \(\mathbf{e}_M\) and \(\mathbf{e}_Y\), for a change in exposure mixture from \(\boldsymbol{a}^*\) to \(\boldsymbol{a}\) via the following algorithm, where we let \(\mathbf{z}_M=\left(\boldsymbol{\alpha}, \mathbf{e}_M\right), \mathbf{z}_M^*=\left(\boldsymbol{a}^*, \mathbf{e}_M\right), \mathbf{z}_Y=\) \(\left(\boldsymbol{a}, \mathbf{e}_Y\right)\), and \(\mathbf{z}_Y^*=\left(\boldsymbol{a}^*, \mathbf{e}_Y\right)\). This corresponds to a fixed intervention in the exposures. It is of utmost importance that careful thought is given to the choice of values for \(\boldsymbol{a}^*\) and \(\boldsymbol{a}\) as well as \(\mathbf{e}_M\) and \(\mathbf{e}_Y\) used to estimate the mediation effects. These values should lie within the support of the multivariate distribution of the data to ensure that the values compared are observable in the population under study.

For additional details regarding BKMR-CMA algorithm, see Devick et al.

References

Devick, Katrina L., et al. “Bayesian kernel machine regression‐causal mediation analysis.” Statistics in Medicine 41.5 (2022): 860-876.

Bobb JF, Claus Henn B, Valeri L, Coull BA. 2018. Statistical software for analyzing the health effects of multiple concurrent exposures via Bayesian kernel machine regression. Environ Health 17:67; doi:10.1186/s12940-018-0413-y.

Bobb JF, Valeri L, Claus Henn B, Christiani DC, Wright RO, Mazumdar M, et al. 2015. Bayesian kernel machine regression for estimating the health effects of multi-pollutant mixtures. Biostatistics 16:493–508; doi:10.1093/biostatistics/kxu058.

Valeri L, Mazumdar M, Bobb J, Claus Henn B, Sharif O, Al. E. 2017. The joint effect of prenatal exposure to metal mixtures on neurodevelopmental outcomes at 24 months: evidence from rural Bangladesh. Env Heal Perspect 125; doi:DOI: 10.1289/EHP614.