`vignettes/samplesizeCMH-samplesize.Rmd`

`samplesizeCMH-samplesize.Rmd`

The first method to estimate sample size for the CMH test was introduced by Gail (1973) using a least squares method. Munoz and Rosner (1984) later produced sample size estimators useful in the special case when both margins are fixed. However, this is often not the case in retrospective case-control designs. As such, Woolson, Bean, and Rojas (1986) introduced a calculation using the weighted difference between two binomial distributions, which assumes only one margin to be fixed. This was corroborated by Wittes and Wallenstein (1987) in their paper a year later. Later still, Nam (1992) improved upon that calculation by introducing a coefficient to address continuity corrected CMH tests.

The methods included in the package are based on calculations derived by Woolson, Bean, and Rojas (1986) for uncorrected statistics and Nam (1992) for continuity corrected statistics. A slightly more precise power calculation is used based on Wittes and Wallenstein (1987), but it is functionally equivalent to the calculation by Woolson, Bean, and Rojas (1986).

Starting with the example first used by Woolson, Bean, and Rojas (1986) and then
revisited by Nam (1992), we will explore
the usage of the central function of the `samplesizeCMH`

package, the `power.cmh.test()`

function.

This was a case–control study which looked at whether there was an
association between colon cancer and chlorinated drinking water among
males in Iowa, stratified by age. There were four age categories, 20–54
years, 55–69 years, 70–79 years, and 80–84 years; therefore the number
of levels of the stratifying variable \(J\) = 4. The relative proportion of each
age group to the total sample is as follows: \(t_1\) = 0.10, \(t_2\) = 0.40, \(t_3\) = 0.35, and \(t_4\) = 0.15. The National Collaborative
Bladder Cancer Study showed that exposure rate for controls for each age
stratum, denoted by \(\pi_{2j}\) (where
the subscript ‘2’ indicates second column proportions and \(j\) is the stratum), was \(\pi_{21}\) = 0.75, \(\pi_{22}\) = 0.70, \(\pi_{23}\) = 0.65, and \(\pi_{24}\) = 0.60. Level of significance
was set at 0.05 and desired power was 90% with an effect size of 3. They
matched equal number of cases and controls in each stratum,
*i.e.*, \(s_1=s_2=s_3=s_4=\)
0.5.

We will go over several variations on the experimental design to
explore the usage of the `power.cmh.test()`

function.

Suppose that the researchers first decided to use the original CMH
calculation, ignoring contnuity correction. As such, the parameter
`correct`

is set to `FALSE`

. We see that
the researcher has put in the control exposure proportions as a vector
(`p2`

), the effect size has been given (`theta`

),
and a vector of relative stratum sizes has been assigned
(`t`

). Since there are equal numbers of cases and controls
for each group, the `s`

parameter is not required. Only
`p1`

and `p2`

*or* one of the proportions
and a `theta`

are required to carry out the calculation. The
number of strata (\(J\)) is inferred
from the maximum vector length of `p1`

, `p2`

, or
`theta`

.

```
library(samplesizeCMH)
sample_size_uncorrected <- power.cmh.test(
p2 = c(0.75,0.70,0.65,0.60),
theta = 3,
power = 0.9,
t = c(0.10,0.40,0.35,0.15),
alternative = "greater",
correct = FALSE
)
print(sample_size_uncorrected, detail = FALSE)
```

```
## Power and sample size calculation for the Cochran Mantel Haenszel test
##
## N = 171
## Effective N = 174
## Significance level = 0.05
## Power = 0.9
## Alternative = greater
##
## CALL:
## power.cmh.test(p2 = c(0.75, 0.7, 0.65, 0.6), theta = 3, power = 0.9,
## alternative = "greater", t = c(0.1, 0.4, 0.35, 0.15), correct = FALSE)
```

Note that the `detail = FALSE`

parameter was passed to the `print`

method to reduce output.

After further reading, the researcher has now determined that a
continuity correction would be appropriate when performing the CMH test
(the default of the `mantaelhaen.test()`

is `correct = TRUE`

).
The `correct`

parameter in the code below has be flipped to
`TRUE`

to take
continuity correction into account.

```
sample_size_corrected <- power.cmh.test(
p2 = c(0.75,0.70,0.65,0.60),
theta = 3,
power = 0.9,
t = c(0.10,0.40,0.35,0.15),
alternative = "greater",
correct = TRUE
)
print(sample_size_corrected, n.frac = TRUE)
```

```
## Power and sample size calculation for the Cochran Mantel Haenszel test
##
## N = 191.538
## Effective N = 196
## Significance level = 0.05
## Power = 0.9
## Alternative = greater
##
## Number of subjects per each group:
## ______________________________________________
## Group | 1 2 3 4
## ==============================================
## Case | 9.58 38.31 33.52 14.37
## Control | 9.58 38.31 33.52 14.37
##
## CALL:
## power.cmh.test(p2 = c(0.75, 0.7, 0.65, 0.6), theta = 3, power = 0.9,
## alternative = "greater", t = c(0.1, 0.4, 0.35, 0.15), correct = TRUE)
```

We see that the `N`

has increased by \(\approx\) 12%, as is expected when
continuity correction is taken into account. The `n.frac`

option was used to get fractional n’s, rather than rounding the the next
whole number.

Consider now that the researcher performed some preparatory work and
has found that the number of cases available is much fewer than the
number of available controls. In fact, it is very difficult to find
cases that meet the inclusion criteria, but is relatively easy to find
controls that meet the inclusion criteria. Wanting to leverage all of
the available data, an adjustment was be made to the sample size
calculation performed earlier by including the `s`

parameter.
The design now supposes two controls matched to each case.

```
power.cmh.test(
p2 = c(0.75,0.70,0.65,0.60),
s = 1/3,
theta = 3,
power = 0.9,
t = c(0.10,0.40,0.35,0.15),
alternative = "greater",
correct = TRUE
)
```

```
## Power and sample size calculation for the Cochran Mantel Haenszel test
##
## N = 217
## Effective N = 220
## Significance level = 0.05
## Power = 0.9
## Alternative = greater
##
## Number of subjects per each group:
## ______________________________________
## Group | 1 2 3 4
## ======================================
## Case | 8 29 26 11
## Control | 15 58 51 22
##
## CALL:
## power.cmh.test(p2 = c(0.75, 0.7, 0.65, 0.6), theta = 3, power = 0.9,
## alternative = "greater", s = 1/3, t = c(0.1, 0.4, 0.35, 0.15),
## correct = TRUE)
```

Note that effective N has increased by 24 subjects. The CMH test is
at optimum efficiency when `s`

is balanced. Therefore,
altering the case/control ratio should be implemented only when there is
a compelling reason to do so, such as difficulty in finding cases or
prohibitive costs.

Let’s take a moment to look at the partial table of stratum \(j\) = 1. Given the information above, we
can reconstruct this table to better understand the question at hand. We
can also use this as an opportunity to use some of the peripheral
functions included in the `samplesizeCMH`

package.

The proportion of controls exposed to chlorinated drinking water was \(\pi_{21}\) = 0.75. It follows then that the complementary column proportion is 0.25. The table below shows column percents.

Case | Control | |
---|---|---|

Exposed | ? | 0.75 |

Not Exposed | ? | 0.25 |

Using the `effect.size()`

function, we can use the expected exposure rate of control to estimate
expected exposure rate of cases.

`effect.size(0.75,3)`

`## [1] 0.9`

We can now use that to fill in the remainder of our table.

Case | Control | |
---|---|---|

Exposed | 0.90 | 0.75 |

Not Exposed | 0.10 | 0.25 |

To determine the odds of exposure for the two groups, we can use
`prop2odds()`

.

```
# Control
prop2odds(0.75)
```

`## [1] 3`

```
# Case
prop2odds(0.9)
```

`## [1] 9`

We can also use either row or column proportions to calculate the odds ratio for a 2 \(\times\) 2 table.

`props2theta(0.90,0.75)`

`## [1] 3`

We see that this brings us back to our specified effect size,
`theta`

= 3, which was used in the sample size
calculaion.

There are several other peripheral functions in this package to
interconvert between proportions, odds, relative risk, and odds ratios.
Use `help("odds.and.proportions", "samplesizeCMH")`

to find out more.

Gail, M. 1973. “The Determination of Sample Sizes for Trials
Involving Several Independent 2x2 Tables.” *Journal of Chronic
Diseases* 26 (10): 669–73. https://pubmed.ncbi.nlm.nih.gov/4765321.

Munoz, A., and B. Rosner. 1984. “Power and Sample Size for a
Collection of 2 x 2 Tables.” *Biometrics* 40 (4):
995–1004.

Nam, J. 1992. “Sample Size Determination for Case-Control Studies
and the Comparison of Stratified and Unstratified Analyses.”
*Biometrics* 48 (2): 389–95. https://pubmed.ncbi.nlm.nih.gov/1637968.

Wittes, Janet, and Sylvan Wallenstein. 1987. “The Power of the
Mantel-Haenszel Test.” *Journal of the American Statistical
Association* 82 (400): 1104–9. https://www.jstor.org/stable/2289387.

Woolson, R. F., J. A. Bean, and P. B. Rojas. 1986. “Sample Size
for Case-Control Studies Using Cochran’s Statistic.”
*Biometrics* 42 (4): 927–32. https://pubmed.ncbi.nlm.nih.gov/3814733.