*How should a research question be written so that the appropriate statistical analysis follows from it? Here is an illustrative example.*

One of the difficulties my graduate students in statistics encounter is framing questions so that they lend themselves to appropriate statistical analysis. They are particularly confused about how to write questions for tests of difference or correlation. This article deals with the former.

How should the research questions be written, and which statistical tools correspond to them? This is a challenge for someone just beginning to understand how statistics works; with practice and persistent study, it becomes an easy task.

There are proper ways to do this, but you need a good grasp of the available statistical tools, at least the basic ones, to match them to the research questions or vice versa. To demonstrate the concept, let’s look at the most common case: a difference between two groups.

### Example Research Question to Test for Significant Difference

Let’s take an example from education. Say a teacher wants to know whether there is a difference between the academic performance of pupils who have had early exposure to Mathematics and pupils without such exposure. Academic performance is still a broad measure, so let’s make it more specific: we’ll take the summative test score in Mathematics as the variable of interest. Early exposure to Mathematics means the child played Mathematics-oriented games in their preschool years.

To test for a difference in performance (after randomly selecting pupils with roughly equal aptitude, at the same grade level, and taught by the same Mathematics teacher, among other controls), the research question can be written so that it lends itself to analysis:

- Is there a significant difference between the Mathematics test scores of pupils who have had early Mathematics exposure and those of pupils without such exposure?

Notice that the question specifies a comparison of two groups of pupils: 1) those who have had early Mathematics exposure, and 2) those without. The Mathematics summative test score is the variable to compare.
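In formal terms, a question like this is usually translated into a null hypothesis and an alternative hypothesis. A sketch for the t-test, where $\mu_E$ and $\mu_N$ are symbols introduced here for illustration, standing for the population mean Mathematics test scores of pupils with and without early Mathematics exposure:

$$
H_0:\ \mu_E = \mu_N \qquad\qquad H_1:\ \mu_E \neq \mu_N
$$

A significant result means the data lead us to reject $H_0$ in favor of $H_1$. The Mann-Whitney U test states its hypotheses in terms of the two score distributions rather than their means.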

### Statistical Tests for Difference

What, then, is the appropriate statistical test in the case described above? Two things must be considered: 1) the sampling procedure, and 2) the sample size.

If the researcher is confident that the pupils were sampled randomly and that the scores approximate a normal distribution, then a t-test (specifically, the independent-samples t-test) is appropriate to test for a difference. If the researcher is not confident that the sampling was random, or if only a few cases are available for analysis and the population most likely follows a non-normal distribution, the Mann-Whitney U test is the appropriate test for difference. The former is a parametric test, while the latter is a nonparametric test. A nonparametric test is distribution-free, meaning it does not require your population to be normally distributed. Nonparametric tests are best used in exploratory studies.
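As a minimal sketch, assuming two small lists of hypothetical scores (invented for illustration, not real data), the code below computes both test statistics by hand with the Python standard library: Student’s pooled-variance t statistic, which assumes roughly equal variances in the two groups, and the Mann-Whitney U statistic for the first group. The helpers `pooled_t` and `mann_whitney_u` are hypothetical names. The sketch stops at the statistics; for p-values you would normally use a library routine such as SciPy’s `scipy.stats.ttest_ind` or `scipy.stats.mannwhitneyu`.

```python
import math
from statistics import mean, variance

# Hypothetical summative Mathematics test scores (invented, not real data).
exposed = [78, 85, 90, 72, 88, 95, 81]      # early Mathematics exposure
not_exposed = [70, 65, 80, 74, 68, 77, 73]  # no early exposure


def pooled_t(a, b):
    """Student's independent-samples t statistic (assumes equal variances)."""
    n1, n2 = len(a), len(b)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    return (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))


def mann_whitney_u(a, b):
    """Mann-Whitney U for group a: pairs where a scores higher, ties count 0.5."""
    u = 0.0
    for x in a:
        for y in b:
            if x > y:
                u += 1
            elif x == y:
                u += 0.5
    return u


t = pooled_t(exposed, not_exposed)
u = mann_whitney_u(exposed, not_exposed)
df = len(exposed) + len(not_exposed) - 2
print(f"t = {t:.2f} with {df} degrees of freedom")
print(f"U = {u} out of a possible {len(exposed) * len(not_exposed)}")
```

Here U counts, over every pairing of one exposed pupil with one unexposed pupil, how often the exposed pupil scored higher (ties count half). Because it uses only the ordering of the scores, it does not depend on the shape of the distribution, which is what makes the test distribution-free.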

An approximately normal distribution is more likely when many cases are used in the analysis. Many statisticians believe this is achieved with 200 cases, but this ultimately depends on the variability of the measure: the greater the variability, the more cases are required to approximate a normal distribution.
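The effect of sample size can be illustrated with a small simulation. It assumes a right-skewed (exponential) population as a stand-in for test scores; the helper `skewness_of_sample_means` is hypothetical and simply draws many samples of size n, averages each one, and measures how skewed those averages are. This is a sketch of the general idea behind the central limit theorem, not a check of the 200-case figure.

```python
import random
from statistics import mean, pstdev


def skewness(values):
    """Moment coefficient of skewness: about 0 for a symmetric distribution."""
    m = mean(values)
    s = pstdev(values)
    return sum((v - m) ** 3 for v in values) / (len(values) * s ** 3)


def skewness_of_sample_means(n, replicates=2000, seed=1):
    """Draw `replicates` samples of size n from a right-skewed (exponential)
    population and return the skewness of the sample means."""
    rng = random.Random(seed)
    means = [
        sum(rng.expovariate(1.0) for _ in range(n)) / n
        for _ in range(replicates)
    ]
    return skewness(means)


for n in (5, 30, 200):
    print(f"n = {n:>3}: skewness of sample means = {skewness_of_sample_means(n):.2f}")
```

The printed skewness should shrink toward 0 as n grows, meaning the sample means look more and more like a symmetric, normal-shaped distribution even though the underlying population is skewed.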

You can quickly inspect the distribution by graphing the measurements, i.e., the Mathematics test scores of pupils who have had early Mathematics exposure and of those without. If most of the scores cluster at the center and taper off symmetrically at both ends, the distribution approximates a normal distribution (Figure 1).
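If no plotting software is at hand, a rough text histogram is enough for this quick inspection. The sketch below bins a list of hypothetical scores (invented for illustration) into 10-point intervals and prints one row of asterisks per interval; the `text_histogram` helper and its `width` parameter are assumptions of this example. With a plotting library such as Matplotlib, `matplotlib.pyplot.hist` does the same job graphically.

```python
from collections import Counter

# Hypothetical scores (not real data) for pupils with early Mathematics exposure.
scores = [62, 68, 71, 74, 75, 77, 78, 79, 81, 82, 84, 85, 88, 91, 96]


def text_histogram(values, width=10):
    """Group values into bins of the given width and draw one row of '*' per bin."""
    counts = Counter((v // width) * width for v in values)
    rows = []
    # Walk every bin from the lowest to the highest, including empty ones.
    for start in range(min(counts), max(counts) + width, width):
        rows.append(f"{start:>3}-{start + width - 1:<3} {'*' * counts.get(start, 0)}")
    return "\n".join(rows)


print(text_histogram(scores))
```

A mound-shaped picture that tapers evenly on both sides suggests approximate normality; a long tail on one side suggests skew.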

If the distribution is non-normal, or if you notice that the graph is skewed to the left or to the right, then you will have to use a nonparametric test. A skewed distribution means that most students have low scores or most of them have high scores. This may mean that your selection favored a certain group of pupils, so that each pupil did not have an equal chance of being selected. Skewness violates the normality assumption of parametric tests such as the t-test, although the t-test is robust enough to tolerate some skewness. A normality test such as the Shapiro-Wilk test may be used to check whether a distribution is normal; the F-test, in contrast, compares the variances of two groups rather than testing normality.
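Skewness can also be put into a number. The sketch below computes the moment coefficient of skewness (positive when the long tail is on the right, i.e., most scores are low; negative when the long tail is on the left, i.e., most scores are high) and applies a hypothetical rule of thumb: if either group’s skewness exceeds 1 in absolute value, fall back to the Mann-Whitney U test. The threshold of 1.0, the helper `suggest_test`, and both data lists are assumptions for illustration only; for a formal check, SciPy provides the Shapiro-Wilk test as `scipy.stats.shapiro`.

```python
from statistics import mean, pstdev


def skewness(values):
    """Moment coefficient of skewness: about 0 for symmetric data, positive
    for a long right tail, negative for a long left tail."""
    m = mean(values)
    s = pstdev(values)
    return sum((v - m) ** 3 for v in values) / (len(values) * s ** 3)


def suggest_test(group_a, group_b, threshold=1.0):
    """Hypothetical rule of thumb: use the t-test unless either group is
    markedly skewed, in which case use the Mann-Whitney U test."""
    if max(abs(skewness(group_a)), abs(skewness(group_b))) > threshold:
        return "Mann-Whitney U test"
    return "t-test"


# Invented example data: one symmetric group, one with most scores low.
symmetric = [70, 74, 77, 79, 80, 81, 83, 86, 90]
right_skewed = [55, 56, 57, 58, 58, 60, 62, 75, 98]

print(f"skewness (symmetric):    {skewness(symmetric):.2f}")
print(f"skewness (right-skewed): {skewness(right_skewed):.2f}")
print(suggest_test(symmetric, right_skewed))
```

A fixed cut-off like this is only a screening aid; it complements, rather than replaces, looking at the graph and considering how the pupils were sampled.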

### Writing the Conclusion Based on the Statistical Analysis

Now, how do you write up the results of the analysis? If the statistical analysis above shows a significant difference between pupils who have had early Mathematics exposure and those who did not, the finding should be stated this way:

*The data present sufficient evidence that there is a significant difference in the Mathematics test scores of pupils who have had early Mathematics exposure compared to those without.*

It can also be written this way:

*There is reason to believe that the Mathematics test scores of pupils who have had early Mathematics exposure differ from those of pupils without.*

Do not write *it was proven that*… No one can be 100% sure that this conclusion will always be correct. Statistical conclusions always carry some chance of error; science is not foolproof, and other explanations remain possible.
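Once a test has produced a p-value, the wording of the finding follows a simple decision rule. The sketch below assumes the conventional significance level of 0.05 and uses invented p-values; the `conclusion` function is a hypothetical helper. Note that neither branch claims anything was proven.

```python
def conclusion(p_value, alpha=0.05):
    """Turn a p-value into a cautiously worded finding (never 'it was proven')."""
    if p_value < alpha:
        return ("The data present sufficient evidence that there is a significant "
                "difference in the Mathematics test scores of pupils who have had "
                "early Mathematics exposure compared to those without.")
    return ("The data do not present sufficient evidence of a significant "
            "difference in the Mathematics test scores of pupils who have had "
            "early Mathematics exposure compared to those without.")


print(conclusion(0.012))  # hypothetical p-value below 0.05
print(conclusion(0.31))   # hypothetical p-value above 0.05
```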

© 2013 October 12 P. A. Regoniel
