According to Cappelleri and Darlington (1994), Cohen’s statistical power analysis is one of the most popular approaches in the behavioural sciences for calculating the required sample size. According to Cohen (1988), five factors need to be taken into consideration in order to perform a statistical power analysis:
1. significance level or criterion
2. effect size
3. desired power
4. estimated variance
5. sample size
Cohen’s (1988) statistical power analysis exploits the relationships among the five factors involved in statistical inference. For any statistical model, these relationships are such that each factor is a function of the other four. It follows that if sample size is to be determined, it can be estimated for any given statistical test by specifying values for the other four factors: (1) significance level, (2) effect size, (3) desired power and (4) estimated variance.
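This interdependence can be made concrete with a small sketch: fix any four factors and the fifth is determined. The snippet below, which is illustrative and not part of Cohen’s original presentation, solves for the per-group sample size of a two-sided, two-group comparison given the significance level, effect size and desired power, using the standard normal approximation rather than Cohen’s exact tables (so it slightly understates the n an exact t-test calculation would give).

```python
from math import ceil
from statistics import NormalDist  # standard library, Python 3.8+

def sample_size_two_groups(d, alpha=0.05, power=0.80):
    """Per-group n for a two-sided, two-sample comparison of means.

    Normal approximation: n = 2 * ((z_{1-alpha/2} + z_{1-beta}) / d) ** 2,
    where d is the standardised difference between the group means.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_beta = NormalDist().inv_cdf(power)           # quantile for desired power
    return ceil(2 * ((z_alpha + z_beta) / d) ** 2)

# Medium standardised difference d = .50, alpha = .05, power = .80
print(sample_size_two_groups(0.50))  # 63 per group (exact t-test tables give 64)
```

Raising the desired power, shrinking the effect size, or tightening alpha each pushes the required n upward, which is exactly the trade-off among the five factors described above.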
When Cohen’s statistical power analysis is used to determine the sample size, the objective of the analysis is to calculate an adequate sample size so as to optimise, as opposed to maximise, sampling effort within the constraints of time and money. Optimising sampling effort avoids situations where a lack of subjects gives rise to inconclusive inference-making. Conversely, maximising sampling effort occurs when data collection goes beyond the level required to achieve significant results, thereby wasting limited resources.
In order to determine an adequate sample size, the values of significance level, effect size, power and estimated variance have to be pre-determined. The statistical level of significance for most studies in the teaching field is often fixed at alpha = .05. Alpha is the probability of wrongly rejecting the null hypothesis, thus committing a Type I error. Assigning a less stringent alpha would increase the risk of false rejection, or ‘crying wolf’ (Eagle, 1999), casting doubt on the validity of the results. However, if the alpha is too conservative, evidence from the findings might fail to reject the null hypothesis in the presence of a substantial population effect. Therefore, alpha = .05 is considered the conventional level of significance and is normally used in the field of education (Ary et al., 1996). The next factor to be determined is the effect size. Effect size generally means the degree to which the phenomenon is present in the population, or the degree to which the null hypothesis is false (Cohen, 1988). It essentially measures the distance or discrepancy between the null hypothesis and a specified value of the alternative hypothesis. Each statistical test has its own effect size index.
All the indexes are scale-free and continuous, ranging from zero upwards; for any statistical test, the null hypothesis has an effect size of zero. For example, in using the product-moment correlation to test a sample for significance, the effect size index is r, and H0 posits that r = 0. For multiple regression, the effect size index is f², and H0 posits that f² = 0. Effect size can be measured using raw values or standardised values. Cohen has standardised effect sizes into small, medium and large values depending on the type of statistical analysis employed. The effect sizes used to test the significance of the product-moment correlation coefficient, r, are .10, .30 and .50 for small, medium and large respectively. For regression analysis, the effect size index f² is .02, .15 and .35 for small, medium and large effect sizes respectively. The smaller the effect size, the more difficult it is to detect the degree of deviation from the null hypothesis in actual units of response. Cohen (1992) proposed that a medium effect size is desirable as it approximates the average size of observed effects in various fields. Cohen (1992) also argued that a medium effect size represents an effect that would likely be “visible to the naked eye of a careful observer” (p. 156).
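As a concrete illustration (an addition to the text above, not part of Cohen’s exposition), the regression index f² relates to the squared multiple correlation R² by f² = R² / (1 − R²), so each of Cohen’s benchmark f² values corresponds to a definite proportion of variance explained:

```python
def f_squared(r2):
    """Cohen's f-squared effect size from a squared multiple correlation R^2."""
    return r2 / (1 - r2)

def r2_from_f_squared(f2):
    """Invert the relation: the R^2 implied by a given f-squared."""
    return f2 / (1 + f2)

# Cohen's small / medium / large benchmarks for f^2 and the implied R^2
for f2 in (0.02, 0.15, 0.35):
    print(f"f2 = {f2:.2f}  ->  R2 = {r2_from_f_squared(f2):.4f}")
# f2 = 0.02 -> R2 = 0.0196; f2 = 0.15 -> R2 = 0.1304; f2 = 0.35 -> R2 = 0.2593
```

In other words, a “medium” regression effect corresponds to the predictors explaining about 13% of the variance in the outcome.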
Next to be determined is the statistical power. The power of a statistical test is defined as the probability that a statistical significance test will lead to the rejection of the null hypothesis for a specified value of an alternative hypothesis (Cohen, 1988). Power analysis has the ability to reject the null hypothesis in favour of the alternative when there is sufficient evidence from a collected sample that a parameter of the population of interest differs from the hypothesised value (High, 2000). Put simply, it is the probability of correctly rejecting the null hypothesis given that the alternative hypothesis is true. In statistical parlance, power is expressed as 1 − β, where β is the probability of wrongly accepting the null hypothesis when it is actually false, that is, of failing to reject a null hypothesis that is false. This is known as committing a Type II error. The value of power can range from zero to one.
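To make 1 − β concrete, the sketch below approximates the power of a two-sided test that a product-moment correlation is zero. The Fisher z transformation used here is an assumption of the sketch, chosen because it gives a simple closed form, not a method named in the text above.

```python
from math import atanh, sqrt
from statistics import NormalDist

def correlation_power(r, n, alpha=0.05):
    """Approximate power (1 - beta) of a two-sided test of H0: rho = 0.

    Uses the Fisher z transform: atanh(r) is approximately normal
    with standard error 1 / sqrt(n - 3).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    noncentrality = atanh(r) * sqrt(n - 3)
    return NormalDist().cdf(noncentrality - z_alpha)

# A medium correlation (r = .30) observed with n = 85 yields power close to .80
print(round(correlation_power(0.30, 85), 2))
```

Running the same function with a smaller n shows power dropping below .80, which is the “effect is there but power is too low to detect it” situation discussed next.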
According to Thomas and Juanes (1996), power analysis is a critical component in designing experiments and testing results. However, computing power for any specific study can be a difficult task. High (2000) argued that when low power is used in a study, the risk of committing a Type II error is higher; that is, there is little chance of detecting a significant effect, which can give rise to an inconclusive result. Stated differently, the effect is there, but the power is too low to detect it. However, if the power is set too high, even a small difference in effect is detectable, which means the results are significant but the size of the effect is of little practical value. In addition, a larger power results in a demand for N that is likely to exceed the resources of the researcher (Cohen, 1992). To avoid these problems, Cohen (1992) suggested fixing the power at .80 (β = .20), which is also the convention proposed for general use. However, this value is not fixed; it can be adjusted depending on the type of test, sample size, effect size and sampling variation.
The fourth and last factor to determine is the standard deviation, which is often used to estimate the variation in the response of interest. This value can be obtained either from previous studies or from pilot studies. However, when standardised measures are used, the sampling variance is already implicitly incorporated, because such measures are dimensionless quantities. Standardised measures include d-values and correlation coefficients, so for these the value of the variance is not required (Thomas & Krebs, 1997). Therefore, if a study aims to look at the correlation of variables, this value is not needed for calculating the sample size of the study. Using the factors mentioned above to estimate sample size, the next section aims to illustrate the use of Cohen’s statistical power analysis to calculate an adequate sample size. Before the sample size is estimated, however, researchers need to predetermine the alpha level, effect size and power. It is also important for researchers to know the underlying objectives of the study and how the data will be analysed to achieve those objectives, because the required sample size varies according to the type of statistical test performed on the data gathered.
For instance, suppose the factors pre-determined in order to estimate an adequate sample size for a study are: an alpha level of .05, a medium effect size and a power of .80. For illustrative purposes, two statistical tests will be used to analyse the data of the study: Pearson product-moment correlation and multiple regression analysis. Using the predetermined values and the two statistical tests as guidelines, the next section will illustrate how to calculate a suitable sample size.
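For the correlation case just described, the required N can be sketched by inverting the Fisher z approximation of the power calculation; this approximation (and the function name) is an illustrative assumption, but for a medium effect it reproduces the commonly cited value of N = 85 from Cohen’s tables.

```python
from math import atanh, ceil
from statistics import NormalDist

def n_for_correlation(r, alpha=0.05, power=0.80):
    """Approximate N needed to detect a correlation r (two-sided H0: rho = 0).

    Fisher z approximation:
        N = ((z_{1-alpha/2} + z_{1-beta}) / atanh(r)) ** 2 + 3
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return ceil(((z_alpha + z_beta) / atanh(r)) ** 2 + 3)

# alpha = .05, medium effect r = .30, power = .80
print(n_for_correlation(0.30))  # 85
```

Note how strongly N depends on the effect size: halving r to the “small” benchmark of .10 multiplies the required sample roughly ninefold, which is why the effect size must be fixed before the sample size can be estimated.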