How can I calculate sample size while designing a retrospective cohort study?

I am aiming to develop a retrospective study involving chart review of cancer patients, all having a specific type of disease. I then aim to perform various types of survival analyses on my dataset such as Overall Survival and Progression Free Survival comparing a group of patients receiving specific treatment for < 5 months vs another group receiving it for 5 months or more. However, I would like to ascertain early on how many patients to review and collect data for in order to ensure good statistical power while carrying out my survival analysis. Are there any calculators available online that can be used to determine a sample size? Happy to share additional details about the study design to help clarify this question further.

asked Aug 2, 2022 at 19:29

You'll need an estimate of the effect size you're looking for - very large effects will be easy to find and won't require many samples at all, while subtle effects will require more samples to ensure that observed differences aren't due to chance. If you want to compare treatment regimens, for example, you'll need a different sample size to reliably call a difference significant if the hazard ratio between treatments is 1.01, or 1.1, or 2.

Commented Aug 2, 2022 at 19:49

Thanks, the difference in my HRs is approximately 0.50 or greater. Would you be able to suggest any online calculators that can be used to calculate approximate sample size based on this info?

Commented Aug 2, 2022 at 21:16

Please edit the question to say more about the "various forms of survival analysis" you have in mind. There are simple formulas for things like comparisons between 2 groups, which might serve as a guide. But when you get into more complicated models and comparisons it's not always so straightforward. Please provide that information (e.g., examples of types of comparisons, numbers and types of covariates you're including in models, etc.) by editing the question, as comments are easy to overlook and can be deleted.

Commented Aug 3, 2022 at 1:56

@EdM, thanks - I just revised my question, which might help clarify my query further.

Commented Aug 3, 2022 at 12:19

1 Answer


For a simple two-group comparison the following formula is a useful guide. If the proportion of patients in one group is $p$, your specified type 1 error is $\alpha$, and the hazard ratio you would like to detect is $\text{HR}$, then (adapting from Therneau and Grambsch, equation 3.14) you need to collect data on $d$ events determined by:

$$d = \frac{(c_\alpha + z_\text{power})^2}{p(1-p)\,(\log \text{HR})^2}$$

Here $c_\alpha$ is the critical value for the test at the specified type 1 error (e.g., 1.96 for a 2-sided test at $\alpha = 0.05$) and $z_\text{power}$ is the upper standard normal quantile at the desired power (0.84 for 80% power, 1.28 for 90% power). As an example, for a 30/70 group percentage split, $\alpha = 0.05$ in a 2-sided test and the HR of 0.5 that you specify in a comment, you would need data on 104 events for 90% power. Use your estimate of event rates per group to determine how many charts you need to review to find that many total events.
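As a rough check on that arithmetic, here is a minimal Python sketch using only the standard library. It assumes the events formula is $d = (c_\alpha + z_\text{power})^2 / [p(1-p)(\log \text{HR})^2]$, which reproduces the 104 events quoted above. The function names and the per-group event probabilities (0.6 and 0.4) are hypothetical placeholders for illustration, not values from the question.

```python
import math
from statistics import NormalDist


def required_events(p, hr, alpha=0.05, power=0.90):
    """Events needed for a two-group comparison under proportional hazards.

    p  : proportion of patients in one group
    hr : hazard ratio to detect
    """
    std_normal = NormalDist()
    c_alpha = std_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = std_normal.inv_cdf(power)          # quantile for the desired power
    return (c_alpha + z_power) ** 2 / (p * (1 - p) * math.log(hr) ** 2)


def charts_needed(events, p, event_prob_a, event_prob_b):
    """Charts to review so that `events` total events are expected.

    event_prob_a / event_prob_b are assumed probabilities of observing an
    event during follow-up in each group (hypothetical inputs).
    """
    overall_event_prob = p * event_prob_a + (1 - p) * event_prob_b
    return math.ceil(events / overall_event_prob)


d = required_events(p=0.3, hr=0.5)          # roughly 104 events
n = charts_needed(d, p=0.3, event_prob_a=0.6, event_prob_b=0.4)
print(f"events: {d:.1f}, charts to review: {n}")
```

Because the required number of events, not patients, drives the power, lower event probabilities translate directly into more charts to review.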

Although Therneau and Grambsch present that formula in the context of exponential survival curves, Schoenfeld showed that it works more generally for the score test under the proportional hazards (PH) assumption. Hsieh and Lavori showed how to extend this to continuous predictors (replacing $p(1-p)$ in the denominator with $\sigma^2$ for the continuous predictor); they also showed how to take covariate adjustments and predictor correlations into account.
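The continuous-predictor extension can be sketched the same way. This is an illustrative reading of the Hsieh and Lavori approach, assuming the binary-predictor variance $p(1-p)$ is replaced by $\sigma^2$ and that correlation with other covariates is handled by inflating the event count by $1/(1-R^2)$, where $R^2$ is the squared multiple correlation of the predictor with the other covariates. The function name and arguments are hypothetical.

```python
import math
from statistics import NormalDist


def required_events_continuous(sigma, hr_per_unit, r_squared=0.0,
                               alpha=0.05, power=0.90):
    """Events needed to detect a hazard ratio per unit of a continuous predictor.

    sigma       : standard deviation of the predictor
    hr_per_unit : hazard ratio per one-unit increase in the predictor
    r_squared   : assumed squared multiple correlation between the predictor
                  and the other covariates in the model (0 means uncorrelated)
    """
    std_normal = NormalDist()
    c_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_power = std_normal.inv_cdf(power)
    base = (c_alpha + z_power) ** 2 / (sigma ** 2 * math.log(hr_per_unit) ** 2)
    return base / (1 - r_squared)  # variance inflation for correlated covariates


# A binary predictor with a 30/70 split has variance 0.3 * 0.7 = 0.21,
# so this call matches the two-group calculation.
print(required_events_continuous(sigma=math.sqrt(0.21), hr_per_unit=0.5))
```

With $R^2 = 0.5$, for example, the required number of events doubles relative to the uncorrelated case.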

I use the tools provided when you load Frank Harrell's R rms package (which also loads his Hmisc package). He notes them briefly here. His cpower() function calculates power from assumed group sizes and event rates, allowing for prospective designs based on accrual rates and minimum follow-up times, and using a modified variance formula. Harrell's spower() function and the R powerSurvEpi package provide power estimates in some more complicated circumstances.

Working with Harrell's packages would also make available to you the extensive resources they provide for reliable regression modeling of many types, including survival models.

In your example you would need to be cautious in a few respects. First, if the difference between the groups is the length of time over which the therapy was received, you would presumably have to count events only after the 5-month point at which the groups diverge. Second, choices about how long therapy was provided or received often have to do with clinically significant, outcome-associated variables that you will have to adjust for very carefully. Third, I sense some risk of survivorship bias (often called immortal time bias) here, as having received therapy for 5 months or longer implies that the individual has already survived that long.
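One common way to handle the first and third cautions is a landmark analysis: keep only patients still under observation at 5 months, and measure survival time from that point. The sketch below is a minimal illustration under that assumption, not a prescription for this study; the `Patient` fields and the `landmark_dataset` helper are hypothetical names.

```python
from dataclasses import dataclass

LANDMARK_MONTHS = 5.0  # time point at which the two groups are defined


@dataclass
class Patient:
    follow_up_months: float  # time from treatment start to event or censoring
    event: bool              # True if the event (death/progression) was observed
    treatment_months: float  # duration of therapy


def landmark_dataset(patients, landmark=LANDMARK_MONTHS):
    """Keep patients still at risk at the landmark and reset the time origin."""
    rows = []
    for p in patients:
        if p.follow_up_months <= landmark:
            continue  # event or censoring before the landmark: excluded
        rows.append({
            "long_treatment": p.treatment_months >= landmark,
            "time_from_landmark": p.follow_up_months - landmark,
            "event": p.event,
        })
    return rows


cohort = [
    Patient(follow_up_months=3.0, event=True, treatment_months=3.0),    # excluded
    Patient(follow_up_months=12.0, event=True, treatment_months=2.0),   # short-treatment group
    Patient(follow_up_months=20.0, event=False, treatment_months=8.0),  # long-treatment group
]
analysis_rows = landmark_dataset(cohort)
events_after_landmark = sum(row["event"] for row in analysis_rows)
print(len(analysis_rows), events_after_landmark)
```

Only the events counted after the landmark contribute to the $d$ in the power calculation, so the number of charts to review should be based on that post-landmark event rate.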