How do you decide if, indeed, there is a statistically significant relationship between two variables in your study? What does the p-value output in statistical software analysis mean? This article explains the concept and provides examples with computations and a video tutorial.
What does a researcher mean if he says there is a statistically significant relationship between two variables in his study? What makes the relationship statistically significant?
These questions imply that a test for correlation between two variables was made in that particular study. The specific statistical test could either be the parametric Pearson Product-Moment Correlation or the non-parametric Spearman’s Rho test.
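Before any p-value comes the correlation coefficient itself. The following minimal Python sketch computes both coefficients using only the standard library. The helper names and the study-hours and quiz-score numbers are hypothetical, made up for illustration. Spearman's rho is computed here as Pearson's r applied to ranks, with tied values sharing their average rank. In practice SPSS or a spreadsheet does this for you, and Python's own `statistics.correlation` (Python 3.10 and later) returns the same Pearson value.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of two paired lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired lists with at least 2 items")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sum of cross-products and sums of squares of deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)  # assumes neither list is constant

def average_ranks(values):
    """Rank values from 1 upward; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r applied to the ranks of the data."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical data: hours of study and long quiz scores of six students
hours = [1, 2, 3, 4, 5, 6]
scores = [52, 55, 61, 60, 70, 75]
print(round(pearson_r(hours, scores), 3))     # 0.966
print(round(spearman_rho(hours, scores), 3))  # 0.943
```

Both coefficients are close to +1 here because the made-up scores rise almost steadily with study hours.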
Statistical Software Applications to Test a Statistically Significant Relationship
It is now easy to do these computations using popular statistical software applications such as the Statistical Package for the Social Sciences (SPSS) or Statistica, or even with the data analysis function of spreadsheets such as the proprietary Microsoft Excel and the open-source but less popular Gnumeric.
Once the statistical software has finished processing the data, you will get the correlation coefficients for your pairs of variables along with their corresponding p-values, each reported as the letter p followed by a decimal number, for either a one-tailed or a two-tailed test. The p-value is the one that matters when judging whether there is a statistically significant relationship between two variables.
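For Pearson's r, the p-value that software reports is usually obtained from the standard textbook conversion of r into a t statistic with n − 2 degrees of freedom, where n is the number of paired observations (x_i, y_i). The formulas below are a sketch of that textbook route; your package may use a different method, particularly for Spearman's rho or for very small samples.

$$
r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^{2}}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^{2}}},
\qquad
t = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}
$$

$$
p_{\text{two-tailed}} = 2\,P\left(T_{n-2} \ge |t|\right),
\qquad
p_{\text{one-tailed}} = P\left(T_{n-2} \ge t\right) \ \text{(when a positive } r \text{ was predicted)}
$$

Because p is a tail probability of the t distribution, it always lies between 0 and 1, and a larger |t| (a stronger correlation, a larger sample, or both) gives a smaller p.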
Confusing Definition of p-value
Many of the students in the statistics course I teach are confused about the meaning of the p-value. I understand their confusion, because the references I see online do not explain the meaning of the p-value in easily understandable language.
For example, Investopedia, the top result among more than 83 million search results on the meaning of p-value, is challenging to understand for a beginning researcher, an undergraduate, or even a graduate student, for that matter. If you click the link I provided on the meaning of p-value, you will understand what I mean.
If you understand the definition provided there, then there is no need for you to read the rest of this article. Even with my advanced graduate school education, I find the explanation vague unless I make a real effort to understand it.
Hence, in the next section, I explain the meaning of the p-value as simply and in as much detail as I can.
The Meaning of p-value
What does the p-value mean? This value never exceeds 1. Why?
The computer-generated p-value is the probability of getting a result at least as extreme as the one in your sample (for a two-tailed test of correlation, a correlation coefficient at least as far from zero as the one you computed) if the null hypothesis (H0) that the researcher formulated at the beginning of the study were true. The null hypothesis is stated in such a way that there is “no” relationship or “no” difference between the two variables being tested. Because the p-value is a probability, it can never exceed 1.
This means that, as a researcher, you should be clear about what you want to test in the first place.
For example, a null hypothesis that lends itself to statistical analysis could be written like this:
H0: There is no relationship between the long quiz score and the number of hours devoted by students to studying their lessons.
If the computed p-value is large, say close to 1 (p = 0.95), a correlation like the one in your sample would be quite ordinary even if the long quiz score and the number of hours spent by students studying their lessons were unrelated. Your data give you no reason to doubt the null hypothesis. Note that this does not prove that the two variables are unrelated; it only means that your sample shows no evidence of a relationship.
Conversely, if the p-value is close to 0 (say, p = 0.001), a correlation as strong as yours would rarely arise by chance alone if there were truly no relationship. The smaller the p-value, the stronger the evidence against the null hypothesis. Whether the relationship is positive (the greater the number of hours students devote to studying, the higher their long quiz scores) or negative is shown by the sign of the correlation coefficient, not by the p-value.
Why in Reality a Perfect Correlation is Not Possible
In reality, unless you are specializing in Physics, a perfect correlation (r = 1) is difficult to obtain when studying human behavior. In our example, many factors or variables influence the long quiz score. Variables like the student’s intelligence quotient, the teacher’s teaching skill, and the difficulty of the quiz, among others, affect the score.
The p-value, on the other hand, is a probability, so it can never be greater than 1. If you get a p-value of more than 1 in your computation, that’s nonsense. Your p-value, I repeat once again, should range between 0 and 1.
To illustrate, suppose the p-value you obtained is 0.5. This means that if the long quiz score and the number of hours spent studying were truly unrelated, about half of all samples of the same size would still show a correlation at least as strong as yours, purely by chance. It does not mean that there is a 50% probability that the two variables are correlated.
A p-value of 0.5, therefore, gives no evidence of a relationship, and we cannot say that there is a statistically significant relationship between the long quiz scores of students and the number of hours that they spend studying their lessons.
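To see what a p-value actually measures, here is a minimal Python sketch of a permutation test. This is a different technique from the t-based formula that SPSS applies to Pearson's r; it is swapped in here because it follows the definition of the p-value step by step. Shuffling the quiz scores breaks any real link to study hours, which imitates a world where H0 is true, and the p-value is the share of shuffles that still produce a correlation at least as strong as the real one. The function names, the number of shuffles, and the data are hypothetical.

```python
import math
import random

def pearson_r(x, y):
    """Pearson's r for two paired lists (assumes neither list is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def permutation_p_value(x, y, n_shuffles=10_000, seed=1):
    """Two-tailed permutation p-value for the correlation between x and y.

    Each shuffle of y pairs the values at random, as if H0 were true.
    The p-value is the share of shuffles whose |r| is at least the observed |r|.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    as_extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        # Small tolerance so that exact ties are not lost to rounding error
        if abs(pearson_r(x, shuffled)) >= observed - 1e-12:
            as_extreme += 1
    # Counting the observed data as one arrangement keeps p above 0
    return (as_extreme + 1) / (n_shuffles + 1)

# Hypothetical data: hours of study and long quiz scores of six students
hours = [1, 2, 3, 4, 5, 6]
scores = [52, 55, 61, 60, 70, 75]
p = permutation_p_value(hours, scores)
print(f"p is about {p:.3f}")
```

With only six students, just 4 of the 720 possible orderings of these scores give a correlation this strong in either direction, so the estimate should land near 4/720, or about 0.006. Data with no linear relationship at all, such as hours [1, 2, 3, 4, 5] against scores [1, 2, 3, 2, 1], give a p-value of 1.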
Deciding Whether the Relationship is Significant
Suppose the probability in the example given above is p = 0.05. Is it good enough to say that there is a statistically significant relationship between long quiz scores and the number of hours spent by students studying their lessons?
The answer is NO. Why?
By today’s standard convention in statistics, researchers choose a significance level, denoted by alpha (α), before running the test. Alpha is the risk they are willing to take of rejecting a null hypothesis that is actually true. It is usually set at either 0.05 (statistically significant) or 0.01 (statistically highly significant), which represent 5% and 1% probability, respectively.
Comparing the computed p-value with the pre-chosen alpha of 5% or 1% will help you decide whether the relationship between the two variables is significant. If you set alpha at 0.05 (α = 0.05) and the p-values you obtained in your computation are, say, 0.5, 0.4, or 0.06, you fail to reject the null hypothesis: the data do not show a statistically significant relationship. If the p-value you got is below 0.05 (p < 0.05), you reject the null hypothesis and accept your alternative hypothesis.
In the above example, the alternative hypothesis that is accepted when the p-value is less than 0.05 will be:
H1: There is a relationship between the long quiz score and the number of hours devoted by students to studying their lessons.
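The decision rule for significance fits in a few lines of Python. This is a minimal sketch: the function name decide is hypothetical, and it follows this article's convention that p must be strictly below alpha (some textbooks use p ≤ α instead).

```python
def decide(p_value, alpha=0.05):
    """Compare a p-value with a significance level chosen before the test."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    if p_value < alpha:
        return "reject H0: statistically significant relationship"
    return "fail to reject H0: no statistically significant relationship"

# The p-values discussed in this article, judged at both common alphas
for p in (0.5, 0.4, 0.06, 0.05, 0.03, 0.004):
    print(p, "| alpha 0.05:", decide(p), "| alpha 0.01:", decide(p, alpha=0.01))
```

Note that p = 0.05 is not significant under this rule, and p = 0.03 is significant at α = 0.05 but not at α = 0.01.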
The strength of the relationship is indicated by the size of the correlation coefficient, r, regardless of its sign (the sign only shows the direction of the relationship). Guilford (1956) suggested the following categories as a guide:
| Value of r (ignoring sign) | Interpretation |
|---|---|
| < 0.20 | slight; almost negligible relationship |
| 0.20 – 0.40 | low correlation; definite but small relationship |
| 0.40 – 0.70 | moderate correlation; substantial relationship |
| 0.70 – 0.90 | high correlation; marked relationship |
| > 0.90 | very high correlation; very dependable relationship |
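Guilford's categories can be applied mechanically with a small lookup. In this Python sketch the function name guilford_label is hypothetical. Guilford's table does not say which band an r of exactly 0.40 or 0.70 belongs to, so the sketch assumes those boundary values fall in the higher band. Following the table, 0.20 counts as low and 0.90 as high.

```python
def guilford_label(r):
    """Verbal label for the strength of a correlation, after Guilford (1956).

    Uses |r|, because the sign only gives the direction of the relationship.
    Assumption: |r| of exactly 0.40 or 0.70 goes to the higher band.
    """
    size = abs(r)
    if size > 1:
        raise ValueError("a correlation coefficient lies between -1 and 1")
    if size < 0.20:
        return "slight; almost negligible relationship"
    if size < 0.40:
        return "low correlation; definite but small relationship"
    if size < 0.70:
        return "moderate correlation; substantial relationship"
    if size <= 0.90:
        return "high correlation; marked relationship"
    return "very high correlation; very dependable relationship"

print(guilford_label(-0.62))  # moderate correlation; substantial relationship
```

Remember that this label describes strength only; whether the relationship is statistically significant still depends on the p-value.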
Computation of Correlation in SPSS
If you want to learn how to use SPSS to compute correlations, the eight-minute tutorial below by Dr. Bogdan Kostic on the correlation between Intelligence Quotient (IQ) and Grade Point Average (GPA) will guide you. He demonstrates in detail how the data are encoded in SPSS, how the labels are written, and how the statistical test is selected. In this instance, he computes the correlation between IQ and GPA using Pearson’s product-moment correlation and shows the accompanying computer output. This video will further strengthen your knowledge of how to determine whether there is a significant relationship between two variables.
More examples and demonstrations on how to find out if there is a statistically significant relationship between variables are given in the two articles below. These articles provide example computer outputs and how these are interpreted.
More Easy-to-Follow Tips
For very easy-to-follow tips on how to select the appropriate statistical tests for your study, see my eBook on statistics in the middle of the page in Simplyeducate.me’s eBook store.
Guilford, J. P., 1956. Fundamental statistics in psychology and education. New York: McGraw-Hill. p. 145.
© 2014 May 29 P. A. Regoniel
Updated 13 November 2020