Tag Archives: example of t-test

How to Use Gnumeric in Comparing Two Groups of Data

Are you in need of a statistical software but cannot afford to buy one? Gnumeric is just what you need. It is a powerful, free statistical software that will help you analyze data just like a paid one. Here is a demonstration of what it can do.

Many of the statistical softwares available today in the windows platform are for sale. But do you know that there is a free statistical software that can analyze your data as well as those which require you to purchase the product? Gnumeric is the answer.

Gnumeric: A Free Alternative to MS Excel’s Data Analysis Add-in

I discovered Gnumeric while searching for a statistical software that will work in my Ubuntu Linux distribution, Ubuntu 12.04 LTS, which I enjoyed using for almost two years. I encountered it while looking for an open source statistical software that will work like the Data Analysis add-in of MS Excel. 

I browsed a forum about alternatives for MS Excel’s data analysis add-in. In that forum, a student lamented that he cannot afford to buy MS Excel but was in a quandary because his professor uses MS Excel’s Data Analysis add-in to solve statistical problems. A professor recommended Gnumeric in response to a student’s query in a forum about alternatives to MS Excel. Not just a cheap alternative but a free one at that.I described earlier how the Data Analysis function of Microsoft Excel add-in is activated and used in comparing two groups of data, specifically, the use of t-test.

One of the reasons why computer users avoid the use of free softwares such as Gnumeric is that these are lacking in features found in purchased products. But as what happens to any Linux software application, Gnumeric has evolved and improved much through the years based on the reviews I read. It works and produces statistical output just like MS Excel’s Data Analysis add-in. That’s what I discovered when I installed the free software using Ubuntu’s Software Center.

Analyzing Heart Rate Data Using Gnumeric

I tried Gnumeric in analyzing the same set of data on heart rate that I analyzed using MS Excel in the post before this one. I copied the data from MS Excel and pasted them into the Gnumeric spreadsheet.

To analyze the data, you just have to go to the menu, click on Statistics, select the column of the two groups one at a time including the label and input them in separate fields. Then click the Label box. If you click the Label box, you are telling the computer to use the first row as Label of your groups (see Figs. 1-3 below for a graphic guide).

In the t-test analysis that I employed using Gnumeric, I labeled one group as HR 8 months ago for heart rate eight months ago and another group as HR Last 3weeks as samples for my heart rate for the last six weeks.

t-test Menu in Gnumeric Spreadsheet 1.10.17

The t-test function in Gnumeric can be accessed in the menu by clicking on the Statistics menu. Here’s a screenshot of the menus to click for a t-test analysis. 

menu for t-test
Fig. 1 The t-test menu for unpaired t-test assuming equal variance.

Notice that the Unpaired Samples, Equal Variances: T-test … was selected. In my earlier post on t-test using MS Excel, the F-test revealed that there is no significant difference in variance in both groups so t-test assuming equal variances is the appropriate analysis.

highlight variable 1
Fig. 2. Highlighting variable 1 column inputs the range of values for analysis.

highlight variable 2

Fig. 3. Highlighting variable 2 column inputs the range of values for analysis.

After you have input the data in Variable 1 and Variable 2 fields, click on the Output tab. You may just leave the Populations and Test tabs at default settings. Just select the cell in the spreadsheet where you want the output to be displayed.

Here’s the output of the data analysis using t-test in Gnumeric compared to that obtained using MS Excel (click to enlarge):

Excel and Gnumeric output
Fig. 4. Gnumeric and MS Excel output.

Notice that the output of the analysis using MS Excel and Gnumeric are essentially the same. In fact, Gnumeric provides more details although MS Excel has a visible title and formally formatted table for the F-test and t-test analysis.

Since both software applications deliver the same results, your sensible choice is to install the free software Gnumeric to help you solve statistical problems. You can avail of the latest stable release if you have installed a Linux distribution in your computer. 

Try it and see how it works. You may download the latest stable release for your version of operating system in the Gnumeric homepage.

© 2014 May 3 P. A. Regoniel

Heart Rate Analysis: Example of t-test Using MS Excel Analysis ToolPak

This article discusses a heart rate t-test analysis using MS Excel Analysis ToolPak add-in. This is based on real data obtained in a personally applied aerobics training program.

Do you know that there is a powerful statistical software residing in the common spreadsheet software that you use everyday or most of the time? If you have installed Microsoft Excel in your computer, chances are, you have not activated a very useful add-in: the Data Analysis ToolPak.

See how MS Excel’s data analysis function was used in analyzing real data on the effect of aerobics on the author’s heart rate.

Statistical Analysis Function of MS Excel

Many students, and even teachers or professors, are not aware that there is a powerful statistical software at their disposal in their everyday interaction with Microsoft Excel. In order to make use of this nifty tool that the not-so-discerning fail to discover, you will need to install it as an Add-in to your existing MS Excel installation. Make sure you have placed your original MS Office DVD in your DVD drive when you do the next steps.

You can activate the Data Analysis ToolPak by following the procedure below (this could vary between versions of MS Excel; this one’s for MS Office 2007):

  1. Open MS Excel,
  2. Click on the Office Button (that round thing at the uppermost left of the spreadsheet),
  3. Look for the Excel Options menu at the bottom right of the box and click it,
  4. Choose Add-ins at the left menu,
  5. Click on the line Analysis ToolPak,
  6. Choose Excel Add-in in the Manage field below left, then hit Go, and
  7. Check the Analysis ToolPak box then click Ok.

You should now see the Data Analysis function at the extreme right of your Data menu in your spreadsheet. You are now ready to use it.

Using the Data Analysis ToolPak to Analyze Heart Rate Data

The aim of this statistical analysis is to test whether there’s really a significant difference in my heart rate eight months ago and last week. This is because in my earlier post titled How to Slow Down Your Heart Rate Through Aerobics, I mentioned that my heart rate is getting slower through time because of aerobics training. But I used the graphical method to plot a trend line. I did not test whether there is a significant difference in my heart rate or not, from the time I started measuring my heart rate compared to the last six weeks’ data.

Now, I would like to answer the question is: “Is there a significant difference in heart rate eight months ago and last six week’s record?”

Student’s t-test will be used to analyze 18 readings taken eight months ago and the last six weeks as data for comparison. I measured my heart rate upon waking up (that ensures I am rested) during each of my three-times a week aerobics sessions.

Why 18? According to Dr. Cooper, the training effect accorded by aerobics could be achieved within six weeks, so I thought my heart rate within six weeks should not change significantly. So that’s six weeks times three equals 18 readings.

Eight months would be a sufficient time to effect a change in my heart rate since I started aerobic running eight months ago. And the trend line in the graph I previously presented shows that my heart rate slows down through time.

These are the assumptions of this t-test analysis and the reason for choosing the sample size.

The Importance of an F-test

Before applying the t-test, the first test you should do to avoid a spurious or false conclusion is to test whether the two groups of data have a different variance. Does one group of data vary more than the other? If they do, then you should not use the t-test. Nonparametric methods such as Mann-Whitney U test should be used instead.

How do you make sure that this may not be the case, that is, that one group of data varies more than the other? The common test to use is an F-test. If no significant difference is detected, then you can go ahead with the t-test.

Here’s an output of the F-test using the Analysis ToolPak of MS Excel:

F test
Fig. 1. F-test analysis using the Analysis ToolPak.

Notice that the p-value for the test is 0.36 [from P(F<=f) one-tail]. This means that one group of data does not vary more than the other.

How do you know that the difference in variance in the two groups of data using the F-test analysis is not significant? Just look at the p-value of the data analysis output and see whether it is equal to or below 0.05. If it is 0.06 or higher, then the difference in variance is not significant and t-test could now be used.

This result signals me to go on with the t-test analysis. Notice that the mean heart rate during the last six weeks (i.e., 50.28) is lower than that obtained six months ago (i.e. 53.78). Is this really significant?

Result of the t-test

I had run a consistent 30-points per week last August and September 2013 but now I accumulate at least a 50-point week for the last six weeks. This means that I almost doubled my capacity to run. And I should have a significantly lower heart rate than before. In fact, I felt that I can run more than my usual 4 miles and I did run more than 6 miles once a week for the last six weeks.

Below is the output of the t-test analysis using the Analysis ToolPak of MS Excel:

t test
Fig. 2. t-test analysis using Analysis ToolPak.

The data shows that there is a significant difference between my heart rate eight months ago and the last three weeks. Why? That’s because the p-value is lower than 0.05 [i.e., P(T<=t) two-tail = 0.0073]. There’s a remote possibility that there is no difference in heart rate 8 months ago and the last six weeks.

I ignored the other p-value because it is one-tail. I just tested whether there is a significant difference or not. But because the p-value in one-tail is also significant, I can confidently say that indeed I have obtained sufficient evidence that aerobics training had slowed down my heart rate, from 54 to 50. Four beats in eight months? That’s amazing. I wonder what will be the lowest heart rate I could achieve with constant training.

This analysis is only true for my case as I used my set of data; but it is possible that the same results could be obtained for a greater number of people.

© 2014 April 28 P. A. Regoniel

Example of a Research Question and Its Corresponding Statistical Analysis

How should a research question be written in such a way that the corresponding statistical analysis is figured out? Here is an illustrative example.

One of the difficulties encountered by my graduate students in statistics is how to frame questions in such a way that those questions will lend themselves to appropriate statistical analysis. They are particularly confused on how to write questions for test of difference or correlation. This article deals with the former.

How should the research questions be written and what are the corresponding statistical tools to use? This question is a challenge to someone just trying to understand how statistics work; with practice and persistent study, it becomes an easy task.

There are proper ways on how to do this; but you need to have a good grasp of the statistical tools available, at least the basic ones, to match the research questions or vice-versa. To demonstrate the concept, let’s look at the common ones, that is, those involving difference between two groups.

Example Research Question to Test for Significant Difference

Let’s take an example related to education as the focus of the research question. Say, a teacher wants to know if there is a difference between the academic performance of pupils who have had early exposure in Mathematics and pupils without such exposure. Academic performance is still a broad measure, so let’s make it more specific. We’ll take summative test score in Mathematics as the variable in focus. Early exposure in Mathematics means the child played games that are Mathematics-oriented in their pre-school years.

To test for difference in performance, that is, after random selection of students with about equal aptitudes, the same grade level, the same Math teacher, among others; the research question that will lend itself to analysis can be written thus:

  1. Is there a significant difference between the Mathematics test score of pupils who have had early Mathematics exposure and those pupils without?

Notice that the question specifies a comparison of two groups of pupils: 1) those who have had early Mathematics exposure, and, 2) those without. The Mathematics summative test score is the variable to compare.

Statistical Tests for Difference

What then should be the appropriate statistical test in the case described above? Two things must be considered: 1) sampling procedure, and 2) number of samples.

If the researcher is confident that he has sampled randomly and that the sample approaches a normal distribution, then a t-test is appropriate to test for difference. If the researcher is not confident that the sampling is random, or, that there are only few samples available for analysis and most likely the population approximates a non-normal distribution, Mann-Whitney U test is the appropriate test for difference. The first test is a parametric test while the latter is a non-parametric test. The nonparametric test is distribution-free, meaning, it doesn’t matter if your population exhibits a normal distribution or not. Nonparametric tests are best used in exploratory studies.

A random distribution is achieved if a lot of samples are used in the analysis. Many statisticians believe this is achieved with 200 cases, but this ultimately depends on the variability of the measure. The greater the variability, the greater the number required to produce a normal distribution.

normal distribution
Fig. 1. Shape of a normal distribution of scores.

A quick inspection of the distribution is made using a graph of the measurements, i.e., the Mathematics test score of pupils who have had early Mathematics exposure and those without. If the scores are well-distributed with most of the measures at the center tapering at both ends in a symmetrical manner, then it approximates a normal distribution (Figure 1).

If the distribution is non-normal or if you notice that the graph is skewed to the left or to the right (leans either to the left or to the right), then you will have to use a non-parametric test. A skewed distribution means that most students have low scores or most of them have high scores. This means that you favor selection of a certain group of pupils. Each pupil did not have an equal chance of being selected. This violates the normality requirement of parametric tests such as the t-test although it is robust enough to accommodate skewness to a certain degree. F-test may be used to determine the normality of a distribution.

Writing the Conclusion Based on the Statistical Analysis

Now, how do you write the results of the analysis? If it was found out in the above statistical analysis that there is a significant difference between pupils who have had Mathematics exposure early in life compared to those who did not, the statement of the findings should be written this way:

The data presents sufficient evidence that there is a significant difference in the Mathematics test score of pupils who have had early Mathematics exposure compared to those without. 

It can be written in another way, thus:

There is reason to believe that the Mathematics test score of pupils who have had early Mathematics exposure is different from those without.

Do not say, it was proven that… Nobody is 100% sure that this conclusion will always be correct. There will always be errors involved. Science is not foolproof. There are always other possibilities.

© 2013 October 12 P. A. Regoniel

Example of a Research Activity Using t-test

Are you a statistics teacher looking for a simple example of a t-test activity that you can use in your class? Or are you a student who wants to have an idea how t-test works? I describe below an example of a situation where the t-test can be applied right after learning the procedures and understanding how it works. Read more to find out.

Teaching students through practical hands-on exercises enable them to appreciate how the different analytical tools used in research can help them address issues and problems that they encounter in their respective disciplines. I applied this approach in one of my classes in the graduate school. My students consisted of more than 44 graduates of different courses namely education, biology, nursing, environmental science, public administration, mathematics, business and tourism.

After giving them an LCD projector presentation about t-test, a statistical tool to test differences between two groups of data, I gave them a simple situation which can be applied right there in the classroom. This is to find out the difference between one’s heartbeat before and after exercise.

The t-test Research Activity

Since some of my students are graduates of nursing, they are the ones who took charge of recording the heartbeats per minute of all 44 students in every 5-member group before they exercise. After recording the heartbeats of each of their classmates, the whole class marched briskly in the classroom for about 5 minutes. It is expected that their heartbeats should be higher after the brisk walk in place.

I just can’t keep myself from getting amused seeing them enjoy the activity. I can see smiles in their faces while those who can’t keep their peace laughed it all the way. I even took a picture and a video to record this momentous occasion.

It was 7 o’clock in the evening as classes in the graduate school are held from 5:30 to 8:30 in the evening. This activity is quite beneficial to employees of the various government and non-government institutions where these students are working. Sleepiness and tiredness of the whole work day is dispelled for the moment as they stretch their leg as well as face muscles.

Right after the exercise, each student recorded their heartbeats and gave them to their group leaders. The group leaders then recorded the numbers on the board for everyone to see. Everyone in the class computed for the t-test value and compared their results with those of their classmates.

t test

Their findings showed, of course, a significant difference between heartbeats before and after exercise. But something intriguing happened. Some of the students have actually lower heartbeats after they exercised. These somehow puzzled us because before the exercise, everyone rested for about 10 minutes or even more.

Discussion of the t-test Results

This finding shows that there are unexpected things that could happen in the course of doing research. And explanation to this phenomenon requires further investigation. Why did the heartbeat decrease after exercise? Is this something worth investigating. Will we get the same results if a greater number of people are involved in the study?

My hypothesis in this case is that at the end of the day, everyone is quite stressed after work; thus, their rapid heartbeat. While doing the exercise, somehow their muscles relaxed and caused blood flow to be much more efficient, causing their heartbeats to drop.

This may be something that has already been discovered. A review of literature should be done to find out. This could be a groundbreaking study related to stress and exercise. And computing for the t-test value may be applied to find out differences in means before and after exercise.

Pedagogical Approach that Works!

The whole activity transpired within the three-hour duration of classes each week. It consisted of a short lecture followed by application of knowledge gained right there. The activity is quite memorable and found quite effective in getting across the principles of research and statistics and how it is applied in real life.

At the end of the day, the students were able to understand and actuate their learning through practical, hands-on experience. Most of the class were able to compute for the t-test value without a fuss.

© 2012 October 28 P. A. Regoniel