Bridging across patient subgroups in phase I oncology trials that incorporate animal data

In this paper, we develop a general Bayesian hierarchical model for bridging across patient subgroups in phase I oncology trials, for which preliminary information about the dose–toxicity relationship can be drawn from animal studies. Parameters that re-scale the doses to adjust for intrinsic differences in toxicity, either between animals and humans or between human subgroups, are introduced into each dose–toxicity model. Appropriate priors are specified for these scaling parameters to capture the magnitude of uncertainty surrounding the animal-to-human translation and the bridging assumption. After mapping data onto a common, 'average' human dosing scale, human dose–toxicity parameters are assumed to be exchangeable either with the standardised, animal study-specific parameters, or between themselves across human subgroups. Random-effects distributions are distinguished by different covariance matrices that reflect the between-study heterogeneity in animals and humans. The possibility of non-exchangeability is allowed so that inferences for extreme subgroups are not overly influenced by their complementary data. We illustrate the proposed approach with hypothetical examples, and use simulation to compare the operating characteristics of trials analysed using our Bayesian model with several alternatives. Numerical results show that the proposed approach yields robust inferences, even when data from multiple sources are inconsistent and/or the bridging assumptions are incorrect.

Figure S1 shows the simulated animal data used for the numerical studies. The height of each bar represents the number of animal subjects treated, and the height of the dark grey segment the number of toxicities. Doses listed in brown are those administered to either rats or monkeys; these are translated onto an equivalent human dosing scale shown in black. Projections are made by scaling the animal doses using the prior median of δ_Rat or δ_Monkey. With the simulated animal data and the priors specified in Section 3.1, we further obtain the marginal predictive priors that are shown graphically in Figure S2. We approximate these marginal predictive priors with a series of Beta(a, b) distributions and report the prior effective sample size (ESS) for the predicted human toxicity at each dose in Table S1.

Figure S2: Summaries of the robust, marginal predictive priors for human toxicity based on the animal data. Panel A shows the median and 95% credible interval of the marginal predictive prior for human toxicity at each dose. Panel B presents the prior interval probability of overdosing, and Panel C displays prior densities for the risks of toxicity at the potential starting doses.
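The Beta(a, b) approximation and prior ESS referred to above can be obtained by simple moment matching, since for a Beta(a, b) distribution the prior ESS is a + b. The sketch below illustrates the idea on hypothetical prior draws; the Beta(1.5, 8.5) sampling distribution and all numbers are illustrative, not the paper's actual predictive priors.

```python
import numpy as np

def beta_moment_match(samples):
    """Moment-match a Beta(a, b) to Monte Carlo draws of a toxicity risk:
    solve mean = a/(a+b) and variance = ab/((a+b)^2 (a+b+1)) for (a, b)."""
    m, v = np.mean(samples), np.var(samples)
    k = m * (1 - m) / v - 1          # k = a + b, the prior effective sample size
    return m * k, (1 - m) * k

# Hypothetical prior draws for the risk of toxicity at one dose
rng = np.random.default_rng(1)
draws = rng.beta(1.5, 8.5, size=50_000)
a, b = beta_moment_match(draws)
ess = a + b                          # close to 10 for this illustrative example
```

The larger the prior ESS at a dose, the more the animal-derived predictive prior will weigh against the human trial data accrued at that dose.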

B. SIMULATION RESULTS THAT ARE NOT PRESENTED IN THE MAIN PAPER
In the main manuscript, we compared Model A (the proposed approach) with Models B and D and interpreted the results in terms of the benefit attributed to either the robust bridging strategy (Model A versus Model B) or the animal data (Model A versus Model D). Here, we present additional simulation results in Figure S3 to compare Model A with Models C and E, which, respectively, use no co-data at all and completely pool the data from trial T1 into trial T2. Numerical results of the complete comparison based on Models A–E are listed in Table S2. In scenarios 1–3, where the bridging assumption is correct and/or the animal data are highly predictive of human toxicity, Model A clearly leads to a substantially increased PCS and allocates more patients to the true MTD in both regions R1 and R2. For example, comparing Models A and C, the PCS increased from 14.7% to 42% in scenario 1 for trial T1 due to the use of animal data, and from 17% to 45% for trial T2 due to the use of animal data together with the correct bridging strategy. Model E does not leverage animal data in trial T1 but completely pools the T1 trial data into trial T2. As a result, its PCS for trial T1 is much lower than that of Model A in these scenarios, although a higher PCS is observed in scenario 2, where the pooled T1 trial data are consistent. Borrowing trial data from the other human subgroup through such a complete-pooling approach can be harmful, as evidenced by the results in scenario 6. Comparing Models A and E, the latter allocated many more patients to excessively toxic doses (5, 10 and 20 mg/kg in this scenario) and more often declared dose 1 mg/kg as the MTD. This is not surprising: pooling in the T1 trial data supposes that patients in regions R1 and R2 are exchangeable, so Model E consequently underestimates the human toxicity in trial T2.
We now switch our focus to assessing the accuracy of the posterior point estimates under the Bayesian analysis models. As illustrated, leveraging co-data that are consistent (inconsistent) with the current trial data should improve (degrade) the accuracy of posterior estimates for the probability of toxicity. Analysis models that enable borrowing of information will correspondingly present advantages (disadvantages) over those that do not permit borrowing at all. We are thus particularly concerned with the accuracy of the posterior estimates based on the proposed Bayesian model compared with its alternatives. In the simulation study, we retain the point estimates (posterior medians) at the end of the completed trials, which may be a subset of, or all of, the 1000 simulated pairs. We average these point estimates per human dose under analysis models A–E to approximate the dose–toxicity relationship specific to each region.

Figure S4 visualises how close the estimated dose–toxicity relationships are to the true human toxicity, which is marked with a black cross at each dose of interest. As we can see, Models C and E perform less satisfactorily than the rest across nearly all the simulation scenarios. The dose–toxicity curves fitted using these two models deviate severely from the true dose–toxicity relationship when the region-specific MTDs are very divergent, such as in scenario 6. Model B entails excessive sharing of information and is therefore also inappropriate for the divergent scenarios. In contrast, Models A and D give quite robust estimates of the dose–toxicity relationship in most cases; moreover, they converge well to the true MTD. Scenario 5 corresponds to an overly toxic situation, where most simulated trials are meant to be stopped early. The complementary subset, i.e., the completed trials, are those in which fewer DLTs were observed, suggesting the drug to be much less toxic than it truly is. Consequently, the fitted curves, obtained from the completed trials only, tend to underestimate the true probability of toxicity.

C. ON THE PRIOR PROBABILITIES OF (NON-)EXCHANGEABILITY
The operating characteristics of the proposed Bayesian hierarchical model for bridging studies also rely on the specification of the prior probabilities of exchangeability and non-exchangeability. Of particular interest is the impact of misspecifying these prior probabilities on the model's performance in (i) estimating the MTD correctly and (ii) allocating patients to appropriate doses. For example, how does the proposed procedure behave if a moderate to large prior probability of exchangeability is assigned to an animal species whose toxicity profile is in fact not commensurate with that of humans?
We use the two most challenging scenarios from the simulation study, i.e., scenario 4, where the animal data do not predict human toxicity well, and scenario 6, where the animal data may suffice to support the estimation of the MTD only in trial T2 and the bridging assumption does not hold. Focusing on the proposed Bayesian hierarchical model (Model A), we simulate 1,000 pairs of phase I oncology trials, T1 and T2, setting the prior probabilities of exchangeability and non-exchangeability as follows.
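The general mechanism by which such prior weights interact with the accruing data can be illustrated with a deliberately simplified two-component mixture: each (non-)exchangeability component's prior weight is multiplied by its marginal likelihood of the observed trial data and then renormalised. The point-mass toxicity risks (0.10 and 0.30), the 2/6 DLT outcome, and the 0.8/0.2 prior split below are all hypothetical, and the actual hierarchical model uses full distributional components rather than point masses.

```python
from math import comb

def posterior_model_weights(prior_w, marg_lik):
    """Bayes' rule for mixture weights: multiply each component's prior
    weight by its marginal likelihood of the data, then renormalise."""
    w = [pw * ml for pw, ml in zip(prior_w, marg_lik)]
    total = sum(w)
    return [wi / total for wi in w]

def binom_lik(p, y=2, n=6):
    """Binomial likelihood of y DLTs among n patients at toxicity risk p."""
    return comb(n, y) * p**y * (1 - p) ** (n - y)

# Hypothetical: an 'exchangeable with animal data' component predicting a
# toxicity risk of 0.10, versus a non-exchangeable component at risk 0.30,
# with prior weights 0.8 and 0.2; the trial observed 2 DLTs in 6 patients.
w_post = posterior_model_weights([0.8, 0.2], [binom_lik(0.10), binom_lik(0.30)])
# The observed 2/6 rate shifts weight towards the non-exchangeable component.
```

This is why a large prior exchangeability weight on incommensurate animal data can only be overcome gradually: the data must accumulate enough evidence against the exchangeable component before its posterior weight collapses.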
• Wgt1 to Wgt5: five candidate sets of prior probabilities of exchangeability and non-exchangeability.

Figure S5 shows that in these two scenarios, Wgt1 and Wgt2 lead to a higher proportion of correctly estimated MTDs. These two sets place a relatively low weight on the rat data, which predict dose 1 mg/kg as the most probable MTD (in fact far from the true MTD), making it easier for the dose-escalation procedure to down-weight such incommensurate animal data. By contrast, in T2 under scenario 6, Wgt5, which places a relatively high weight on the rat data, shows the advantages brought by the data commensurability. Looking at subfigure (ii) for the average number of patients allocated per dose, Wgt1 to Wgt3 all yield a fairly even allocation, while Wgt4 and Wgt5 concentrate patients on the low doses, particularly 1 mg/kg.

It would be of interest to check how the proposed methodology performs when the human equivalent doses, as translated from the animal data based on δ_Rat and δ_Monkey, are far from the actual doses used in the human trials. We thus consider two new simulation scenarios, A and B, adapted from scenarios 4 and 6 above, respectively. More specifically, the doses available for evaluation in trials T1 and T2 are far larger than the highest probable human equivalent doses after the translation across species, i.e., D = {1, 5, 10, 20, 40, 80} for both trials, but the risks of toxicity remain identical to the levels specified in Table 2 of the main paper. This creates a larger discrepancy in the toxicity profiles between animals and humans. Figure S6 summarises the meta-analytic predictive (MAP) priors for the risk of toxicity by source of information on the scale of the human doses for evaluation. It suggests that, even in this new configuration of human doses, the animal data (particularly those collected from the monkey studies) might still give relevant predictions of human toxicity.
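To make the cross-species translation concrete, a human equivalent dose can be sketched as the animal dose rescaled by the prior median of the corresponding δ parameter. Both the direction of the rescaling (division, below) and the numbers are assumptions for illustration only; the actual parameterisation and prior medians are those of the main paper.

```python
def human_equivalent_doses(animal_doses, delta_median):
    """Map animal doses onto the 'average' human dosing scale by rescaling
    with the prior median of the species-specific parameter delta
    (division assumed here; the direction depends on the parameterisation)."""
    return [d / delta_median for d in animal_doses]

# Hypothetical rat doses (mg/kg) and an illustrative prior median for delta_Rat
heds = human_equivalent_doses([10, 25, 60], delta_median=6.2)
```

When the candidate human doses sit far above such translated equivalents, as in scenarios A and B, the animal prior carries little direct information at the doses actually evaluated.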
Adhering to the same simulation specification, we investigate two sets of prior probabilities of exchangeability and non-exchangeability, featuring different levels of w_R = 0.2, 0.5 for robust inferences. Figure S7 compares the operating characteristics of the simulated phase I trials conducted and analysed using these different levels of w_R. As we can read from subfigure (i), increasing w_R from 0.2 to 0.5 to downplay the available animal data reduces the percentage of correct MTD selections (marked by the black vertical line on the plots) in each scenario, especially for trial T2, for which the monkey data could be regarded as relevant. From subfigure (ii), we see that setting w_R = 0.2 (i.e., placing overwhelming prior weight on the monkey data) favours the allocation of patients to doses 5 to 20 mg/kg, which are suggested to contain the probable target dose.

D. COMPARISON WITH THE BRIDGING CONTINUAL REASSESSMENT METHOD
As suggested by an anonymous reviewer, we investigate how the proposed Bayesian hierarchical model compares with the bridging continual reassessment method (CRM) of Liu et al. (2015) [3]. Although the latter was not intended for incorporating animal data, it is widely recognised in the field of phase I bridging trials, particularly in oncology. It is important to state at the outset that the aim of the following simulation study is not to declare a winner between the two Bayesian models, which involve distinct modelling strategies, but to illustrate the relative strengths and weaknesses of each when applied to the conduct of phase I dose-escalation trials in relevant patient subgroups.
For the present purpose, when implementing the bridging CRM, we use the medians of the robust MAP priors that synthesise both the rat and monkey data, i.e., (0.095, 0.148, 0.181, 0.302, 0.376, 0.446), plotted as points in Panel A of Figure S2, as the skeleton probabilities for the CRM model to guide the dose-escalation procedure in trial T1. Following Liu et al. (2015) [3], three sets of skeleton probabilities are generated at the end of trial T1, based on the posterior medians of the toxicity risks per human dose. This leads to three CRM models, each with a prior probability of 1/3 of being true. The dose-escalation procedure in trial T2 then results from model averaging. Unlike Liu et al. (2015) [3], who select the interim recommended doses based on a point estimate (posterior mean) of toxicity, we adopt the same criterion as specified in our main paper: recommending the maximum dose for which the probability of overdosing is controlled below 25%. The way we choose a starting dose in trial T2, and the constraints of early stopping for safety and of never skipping a dose during escalation, are also kept the same as those used when implementing the proposed Model A. Figure S8 presents the simulation results for the trial operating characteristics of the proposed Model A against the bridging CRM. In scenarios 1 and 2, where the co-data are consistent for both trials T1 and T2, the proposed Model A outperforms the bridging CRM in correctly selecting the MTD, particularly in trial T2, as well as in allocating more patients to this dose. By contrast, the bridging CRM performs better particularly in scenario 4, where the inconsistent co-data are weighted considerably by the proposed Model A. Combining the results in scenarios 5 and 6, the bridging CRM in our implementation is more prone to early stopping, while the proposed Model A gives satisfactory operating characteristics.
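To fix ideas, the core of a one-parameter power-model CRM driven by a skeleton can be sketched as follows. The N(0, 1.34^2) prior, the grid approximation, the 0.33 overdose bound, and the interim data are our own illustrative choices, not necessarily those of Liu et al. (2015) or of our actual implementation.

```python
import numpy as np

def crm_update(skeleton, y, n, sd=1.34, tox_bound=0.33):
    """Power-model CRM: p_d(theta) = skeleton_d ** exp(theta), theta ~ N(0, sd^2).
    Grid-approximates the posterior of theta; returns the posterior mean
    toxicity and the posterior probability of overdosing (p_d > tox_bound)
    at each dose."""
    theta = np.linspace(-4.0, 4.0, 801)
    p = np.asarray(skeleton, float)[None, :] ** np.exp(theta)[:, None]
    loglik = (np.asarray(y) * np.log(p)
              + (np.asarray(n) - np.asarray(y)) * np.log1p(-p)).sum(axis=1)
    w = np.exp(loglik - loglik.max() - theta**2 / (2 * sd**2))
    w /= w.sum()
    return w @ p, w @ (p > tox_bound)

# Robust MAP prior medians quoted above as the skeleton; hypothetical interim
# data of 0/3, 1/6 and 2/6 DLTs at the three lowest doses.
skeleton = [0.095, 0.148, 0.181, 0.302, 0.376, 0.446]
mean_tox, pr_over = crm_update(skeleton, y=[0, 1, 2, 0, 0, 0], n=[3, 6, 6, 0, 0, 0])
# Under the overdose-control criterion, escalation would recommend the
# highest dose whose pr_over stays below 0.25.
```

The model averaging over the three post-T1 skeletons then amounts to running one such update per skeleton and combining the dose-wise quantities with the posterior model probabilities.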
As noted, this simulation study is mainly intended to give an intuition about the behaviour of both Bayesian models when applied in phase I bridging trials. Each has a number of parameters that could be tuned for the best possible model performance. We recommend that extensive simulations be done for whichever model is chosen for future implementation.