17.3.8.2 Algorithms (Two sample proportion test)
Let \(n_{1}\!\) be the size of sample 1 and \(x_{1}\!\) be the number of events (successes) in it. The sample proportion \(\tilde{p_{1}}\!\) is then \(\tilde{p_{1}}=\frac{x_{1}}{n_{1}}\).
Similarly, for sample 2 with size \(n_{2}\!\) and number of events \(x_{2}\!\), the sample proportion is \(\tilde{p_{2}}=\frac{x_{2}}{n_{2}}\).
Hypotheses
Let \(p_{1}\!\) and \(p_{2}\!\) be the true population proportions for samples 1 and 2, and let \(d_{0}\!\) be the hypothesized difference between the population proportions.
\(H_0:p_{1}-p_{2}=d_{0}\!\) for the two-tailed test
\(H_0:p_{1}-p_{2}\ge d_{0}\!\) for a one-tailed test
\(H_0:p_{1}-p_{2}\le d_{0}\!\) for a one-tailed test
Normal Approximation
P Value
The normal approximation test can be performed under the assumptions \(x_{1}\ge10\!\) and \(n_{1}-x_{1}\ge10\!\), \(x_{2}\ge10\!\) and \(n_{2}-x_{2}\ge10\!\).
To perform the test, calculate the \(z\!\) value and the \(p_{value}\!\):
\(z=\frac{\tilde{p_{1}}-\tilde{p_{2}}-d_{0}}{\sqrt{\frac{\tilde{p_{1}}(1-\tilde{p_{1}})}{n_{1}}+\frac{\tilde{p_{2}}(1-\tilde{p_{2}})}{n_{2}}}}\!\).
As a special case, when \(d_{0}\!\) is zero, Origin can use a pooled estimate of the proportion for the test if you check the "pooled" box:
\(z=\frac{\tilde{p_{1}}-\tilde{p_{2}}}{\sqrt{\tilde{p_{0}}(1-\tilde{p_{0}})\left(\frac{1}{n_{1}}+\frac{1}{n_{2}}\right)}}\!\), where \(\tilde{p_{0}}=\frac{x_{1}+x_{2}}{n_{1}+n_{2}}\)
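The two forms of the z statistic can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Origin's implementation: the function name `two_proportion_z` and its `pooled` flag are hypothetical, and the counts are invented for the example.

```python
import math

def two_proportion_z(x1, n1, x2, n2, d0=0.0, pooled=False):
    """z statistic for the difference of two proportions (illustrative sketch)."""
    p1, p2 = x1 / n1, x2 / n2
    if pooled:
        # The pooled estimate only applies when the hypothesized difference is zero.
        if d0 != 0:
            raise ValueError("pooled estimate requires d0 == 0")
        p0 = (x1 + x2) / (n1 + n2)
        se = math.sqrt(p0 * (1 - p0) * (1 / n1 + 1 / n2))
    else:
        # Unpooled standard error: both variance terms sit under one square root.
        se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2 - d0) / se

# Made-up counts: 40 events out of 100 versus 30 events out of 100.
z_unpooled = two_proportion_z(40, 100, 30, 100)
z_pooled = two_proportion_z(40, 100, 30, 100, pooled=True)
print(round(z_unpooled, 4), round(z_pooled, 4))
```

With these counts the pooled and unpooled statistics differ only slightly, because the two sample proportions are close to each other.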
The p-value for each hypothesis is given by the following, where \(Z\!\) is a standard normal random variable:
\(H_0:p_{1}-p_{2}=d_{0}\!\), \(p_{value}=2P(Z\ge|z|)\!\), for the two-tailed test
\(H_0:p_{1}-p_{2}\ge d_{0}\!\), \(p_{value}=P(Z\le z)\!\), for the lower-tailed test (alternative \(p_{1}-p_{2}<d_{0}\!\))
\(H_0:p_{1}-p_{2}\le d_{0}\!\), \(p_{value}=P(Z\ge z)\!\), for the upper-tailed test (alternative \(p_{1}-p_{2}>d_{0}\!\))
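As a rough sketch, the three p-value formulas can be evaluated with the standard normal distribution from Python's standard library. The function name `z_p_value` and the `null` codes `"eq"`, `"ge"` and `"le"` (for the three null hypotheses above) are hypothetical names chosen for this illustration.

```python
from statistics import NormalDist

def z_p_value(z, null="eq"):
    """p-value of a z statistic, keyed by the form of the null hypothesis."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if null == "eq":   # H0: p1 - p2 = d0, p = 2 P(Z >= |z|)
        return 2 * (1 - std_normal.cdf(abs(z)))
    if null == "ge":   # H0: p1 - p2 >= d0, p = P(Z <= z)
        return std_normal.cdf(z)
    if null == "le":   # H0: p1 - p2 <= d0, p = P(Z >= z)
        return 1 - std_normal.cdf(z)
    raise ValueError("null must be 'eq', 'ge' or 'le'")

p_two_sided = z_p_value(1.96, "eq")
print(round(p_two_sided, 3))
```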
Confidence Interval
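For a given confidence level \(1-\alpha\), the confidence interval for the difference between the population proportions, \(p_{1}-p_{2}\), is given by the following, where \(Z_{q}\) denotes the upper \(q\) critical value of the standard normal distribution: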
| Null Hypothesis | Confidence Interval |
|---|---|
| \[H_0:p_{1}-p_{2}=d_{0}\!\] | \[\left[(\tilde{p_{1}}-\tilde{p_{2}})- Z_{\frac{\alpha}{2}}\sqrt{\frac{\tilde{p_{1}}(1-\tilde{p_{1}})}{n_{1}}+ \frac{\tilde{p_{2}}(1-\tilde{p_{2}})}{n_{2}}}, (\tilde{p_{1}}-\tilde{p_{2}})+ Z_{\frac{\alpha}{2}}\sqrt{\frac{\tilde{p_{1}}(1-\tilde{p_{1}})}{n_{1}}+ \frac{\tilde{p_{2}}(1-\tilde{p_{2}})}{n_{2}}}\right]\] |
| \[H_0:p_{1}-p_{2}\ge d_{0}\!\] | \[\left[-1, (\tilde{p_{1}}-\tilde{p_{2}})+ Z_{\alpha}\sqrt{\frac{\tilde{p_{1}}(1-\tilde{p_{1}})}{n_{1}}+ \frac{\tilde{p_{2}}(1-\tilde{p_{2}})}{n_{2}}}\right]\] |
| \[H_0:p_{1}-p_{2}\le d_{0}\!\] | \[\left[(\tilde{p_{1}}-\tilde{p_{2}})- Z_{\alpha}\sqrt{\frac{\tilde{p_{1}}(1-\tilde{p_{1}})}{n_{1}}+ \frac{\tilde{p_{2}}(1-\tilde{p_{2}})}{n_{2}}}, 1\right]\] |
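The two-sided row of the table can be sketched as below. This is only an illustration: the function name `diff_ci_two_sided` and the `conf_level` parameter are hypothetical, and the counts are made up.

```python
import math
from statistics import NormalDist

def diff_ci_two_sided(x1, n1, x2, n2, conf_level=0.95):
    """Two-sided normal-approximation interval for p1 - p2 (illustrative sketch)."""
    p1, p2 = x1 / n1, x2 / n2
    alpha = 1 - conf_level
    # Upper alpha/2 critical value of the standard normal distribution.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z_crit * se, diff + z_crit * se

lower, upper = diff_ci_two_sided(40, 100, 30, 100)
print(round(lower, 4), round(upper, 4))
```

For these counts the interval contains zero, which agrees with a two-tailed test of \(d_{0}=0\) that does not reject at the 5% level.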
Fisher's Exact Test
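Exact P Value
Fisher's exact test can be used for all sample sizes when \(d_{0} \!\) is zero. Let \(p(x)\) denote the probability \(P(X=x)\) under the hypergeometric distribution, where \(X\) is the number of events in sample 1 given that the two samples contain \(x_{1}+x_{2}\) events in total: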
\[P(X=x)=\frac{\begin{pmatrix}x_{1}+x_{2} \\{x}\end{pmatrix}\begin{pmatrix}{n_{1}+n_{2}-x_{1}-x_{2}}\\{n_{1}-x}\end{pmatrix}}{\begin{pmatrix}{n_{1}+n_{2}}\\{n_{1}}\end{pmatrix}}\]
Let \(M\) denote the mode of this hypergeometric distribution: \(M=\left \lfloor \frac{(n_1+1)(x_1+x_2+1)}{n_1+n_2+2}\right \rfloor\)
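A small sketch of the probability and the mode, using only `math.comb` from the standard library. The function names are hypothetical and the counts are invented. Values of \(x\) outside the support of the distribution are given probability zero.

```python
from math import comb

def hypergeom_pmf(x, x1, n1, x2, n2):
    """P(X = x) for the hypergeometric distribution defined above."""
    total_n, total_x = n1 + n2, x1 + x2
    lo = max(0, n1 - (total_n - total_x))  # smallest attainable x
    hi = min(total_x, n1)                  # largest attainable x
    if x < lo or x > hi:
        return 0.0
    return comb(total_x, x) * comb(total_n - total_x, n1 - x) / comb(total_n, n1)

def hypergeom_mode(x1, n1, x2, n2):
    """Mode M = floor((n1 + 1)(x1 + x2 + 1) / (n1 + n2 + 2))."""
    return (n1 + 1) * (x1 + x2 + 1) // (n1 + n2 + 2)

# Made-up counts: 7 events out of 12 versus 2 events out of 8.
probs = [hypergeom_pmf(x, 7, 12, 2, 8) for x in range(13)]
mode = hypergeom_mode(7, 12, 2, 8)
print(mode, round(probs[mode], 4))
```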
The p-values for each hypothesis are given below:
\(H_0:p_{1}\ge p_{2}\!\), \(p_{value}=P(X\le x_{1})\!\)
\(H_0:p_{1}\le p_{2}\!\), \(p_{value}=P(X\ge x_{1})\!\)
When \(H_0:p_{1}= p_{2}\!\), the p-value depends on where \(x_{1}\) lies relative to \(M\):
Case a, \(x_{1} < M\!\): \(p_{value} = P(X\le x_{1}) + P(X\ge y)\), where y is the smallest integer \(\ge M\) such that \(p(y) \le p(x_1)\!\).
Case b, \(x_{1} = M\!\): \(p_{value} = 1.0\!\).
Case c, \(x_{1} > M\!\): \(p_{value} = P(X\ge x_{1}) + P(X\le y)\), where y is the largest integer \(\le M\) such that \(p(y) \le p(x_1)\!\).
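The rules above can be sketched in Python as follows. This is an illustration under assumptions, not Origin's code: the function name `fisher_exact_p` and the `null` codes `"eq"`, `"ge"` and `"le"` are hypothetical. As a design choice of this sketch, probabilities are compared through their integer numerators so that the test \(p(y) \le p(x_1)\) is exact, and a tail contributes zero when no qualifying \(y\) exists.

```python
from math import comb

def fisher_exact_p(x1, n1, x2, n2, null="eq"):
    """Fisher's exact p-value for H0: p1 = p2, p1 >= p2 or p1 <= p2 (sketch)."""
    total_n, total_x = n1 + n2, x1 + x2
    lo = max(0, n1 - (total_n - total_x))  # smallest attainable x
    hi = min(total_x, n1)                  # largest attainable x
    denom = comb(total_n, n1)

    def weight(x):
        # Integer numerator of P(X = x); zero outside the support.
        if x < lo or x > hi:
            return 0
        return comb(total_x, x) * comb(total_n - total_x, n1 - x)

    def prob(a, b):
        # P(a <= X <= b); an empty range contributes zero.
        return sum(weight(x) for x in range(a, b + 1)) / denom

    if null == "ge":   # H0: p1 >= p2
        return prob(lo, x1)
    if null == "le":   # H0: p1 <= p2
        return prob(x1, hi)
    if null != "eq":
        raise ValueError("null must be 'eq', 'ge' or 'le'")

    mode = (n1 + 1) * (total_x + 1) // (total_n + 2)
    if x1 == mode:     # case b
        return 1.0
    w1 = weight(x1)
    if x1 < mode:      # case a: smallest y >= M with p(y) <= p(x1)
        y = next((x for x in range(mode, hi + 1) if weight(x) <= w1), hi + 1)
        return prob(lo, x1) + prob(y, hi)
    # case c: largest y <= M with p(y) <= p(x1)
    y = next((x for x in range(mode, lo - 1, -1) if weight(x) <= w1), lo - 1)
    return prob(x1, hi) + prob(lo, y)

# Made-up counts: 4 events out of 5 versus 1 event out of 5.
p_two_sided = fisher_exact_p(4, 5, 1, 5)
print(round(p_two_sided, 4))
```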