Data Analysis Statistics

The Importance of data visualization in business

The internet has grown immensely in the last decade, and its growth continues to accelerate. As a result, the amount of data on the web has also grown to an enormous size. This data can benefit many businesses in understanding market trends, customer behavior, and the growth or decline of a product. According to a recent report on global data management, 95% of organizations in the US use their data sets and big data to understand their market and develop business strategies. There are a number of ways big data can drive business intelligence in this data-focused world.


Ease the Understanding of Information

A picture is worth a thousand words. Or, in this case, a picture is worth many thousands of data entries. A visualization as simple as a pie chart can summarize data that would otherwise fill a massive grid or table. Data visualization helps people understand and absorb information quickly by letting them see the bigger picture instead of thousands of pieces of a puzzle. Looking at this bigger picture, people can easily spot the relationships between business conditions and bring them into focus. In short, data analysis and data visualization help you connect the dots in your business and your data.

For example, a simple pie chart can sum up data about the population of the entire world, classified by region. That condenses data from 195 countries and 7 continents into one small, simple chart.
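A chart like the one described above takes only a few lines to produce. The sketch below uses matplotlib; the region shares are rough illustrative figures, not exact census data.

```python
# Sketch of a world-population pie chart; shares are approximate,
# illustrative values only.
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt

population_share = {
    "Asia": 59.5,
    "Africa": 17.2,
    "Europe": 9.6,
    "Latin America": 8.4,
    "North America": 4.7,
    "Oceania": 0.6,
}

fig, ax = plt.subplots()
wedges, texts, autotexts = ax.pie(
    list(population_share.values()),
    labels=list(population_share),
    autopct="%1.1f%%",  # print each slice's share on the chart
)
ax.set_title("World population by region (illustrative)")
fig.savefig("population_pie.png")
```

Six numbers and six labels replace a table of nearly 200 country rows.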

Easily Convey your Message

These days, with huge amounts of information flowing to people, it has become more difficult for businesses to grab the attention of their audience, and even harder to hold it for longer than a minute. With such short attention spans, it is important to convey your message quickly and effectively. Data visualization helps you share your data and insights without losing the interest of your audience. The dashboard of a fitness band application is a perfect example: it packs different aspects of data into clear graphics and gives the user an idea of his fitness progress at a glance.

Reduce the Need for IT Geeks

Only a short time ago, when data visualization was not as popular as it is today, understanding big data was very difficult. Most organizations that wanted to reap its benefits had to hire IT specialists or data scientists, who would harvest data from the web and work to uncover its patterns. The problem was that these specialists did not always know what business questions to ask of the data or from what perspective to look at its trends. Nowadays, data visualization software makes it easier for insight managers and non-technical people to understand complex data in real time. With self-service reporting that doesn't require data scientists to configure, business users can develop insights from the available information and use them to the benefit of their organizations.

Recognize the Outliers

Data visualization helps you recognize the outliers in your data. Seeing a drop in sales and being able to jump on and address that quickly can meaningfully impact your bottom line. Conversely, seeing a jump in sales and being able to maximize opportunities for your business as they happen can have a long term positive result. By avoiding negative impacts and expanding positive ones, paying attention to outliers in your data with business analytics can maximize your business returns and enhance data-driven decisions.


One of the main reasons behind the popularity of big data and data visualization is that they pull back the veil on business data and reveal important market trends. They give businesses insight into what customers like about a product and also reveal its negative aspects. In short, data visualization helps businesses develop better strategies and improve their performance and decision making. Intelligence tools processing real-time data yield actionable insights and facilitate data exploration. However large the amounts of data in your organization, data analytics combined with interactive visualizations in your analytics platform helps key decision makers strategize and make informed decisions that drive your products forward!

Take Action

The last but most important step in understanding your data is taking action on that understanding. Data visualization helps at every step in business: understanding the data, presenting it to an audience, and building strategies. In this last step, it helps you review, implement, and periodically evaluate those strategies; with bespoke solutions or prebuilt BI tools, it has never been easier to perform in-depth data discovery for your business. And if any issues are found, data visualization helps you identify them and act quickly to get better results.

This article is contributed by JSCharting.

Data Analysis Research

Information System: Its Definition and Role in Decision Making

What is an information system? How can it influence an organization’s effectiveness? This article defines information system and how it works.

The rapid pace of urban development in the information age is made possible by computer-based information systems. Middle level and upper-level managers benefit a lot from the outputs of a well-designed and efficient information system. In a highly competitive world, information systems define the winners and the losers in many areas: economic, political, social, among others.

But what is an information system? How does it work? How can managers make use of it?

Definition of Information System

An information system is an organized scheme of people and of data collection and retrieval tools that together produce information. Data is meaningless unless analyzed or processed to meet the needs of its users. Thus, data processors, which may be human or machine, process the data and produce information. Information may take the form of graphs, tables, figures, or any output that translates data into understandable form. In short, information is processed data.

Modern organizations use computer-based information systems because of their high efficiency in delivering information. Manual information systems, while still in use, are slower and rely mainly on the ability of people to process data.

In the age of information, information systems have become synonymous with computer-based information systems. That is because computers are used to process data into understandable chunks of information that the user needs. Slow data processing systems that rely on manual retrieval of data from physical folders or files in a metal cabinet are gradually being phased out of modern workplaces.

The information system in relation to the business world.

How Does a Computer Information System Work?

A computer information system requires the input of data, a processing capability, and the ability to produce an output that can be stored for future use. The acronym IPOS summarizes the components of an information system. This acronym stands for Input, Process, Output, and Storage.

In a computer information system, an input is made through the use of a keyboard, a mouse, or a microphone. Process refers to data analysis using software applications that take advantage of the computer’s processor. Computers perform complex calculations to organize data into useful outputs that can be displayed on a screen or printed on paper. It makes sense of data whose raw form is meaningless.

The output may be used immediately or retrieved from a storage whenever necessary. Flash drives, hard disks, and cloud storage facilities are commonly used to store both data and information.
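The IPOS cycle can be illustrated with a toy script. This is only a sketch: the sales records, function names, and the dictionary standing in for storage are all hypothetical.

```python
# A toy illustration of the IPOS cycle (Input, Process, Output, Storage)
# for a hypothetical system turning raw sales records into a summary.
storage = {}  # stands in for a disk or cloud store

def input_data():
    # Input: raw data as it might arrive from a keyboard or a file
    return [("pizza", 3), ("pasta", 1), ("pizza", 2)]

def process(records):
    # Process: aggregate the raw records into meaningful information
    totals = {}
    for item, qty in records:
        totals[item] = totals.get(item, 0) + qty
    return totals

def output(information):
    # Output: present the information in a readable form
    for item, total in sorted(information.items()):
        print(f"{item}: {total} sold")

def store(key, information):
    # Storage: keep the information for future retrieval
    storage[key] = information

info = process(input_data())
output(info)
store("daily_summary", info)
```

Each function corresponds to one letter of IPOS: the raw tuples are meaningless until processed, and the stored summary can be retrieved later without recomputing it.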

Requisites of Good Information

The information produced by an information system is only as good as the data used to generate it. It follows the GIGO principle: Garbage In, Garbage Out. Wrong data produces wrong information.

According to Zikmund (1999), useful information should be 1) relevant, 2) timely, 3) of high quality, and 4) complete.

Relevance is the degree to which the information produced is related or useful to the current issue that needs resolution. Information is timely if it is available whenever needed. Information is of high quality if it is based on accurate data analyzed correctly. And information is complete if it answers all of the user's queries or requirements.

Good information, therefore, is helpful in decision making if it is produced through systematic means. The rigorous manner applied in conducting research plays an essential role in delivering information that makes clear a decision maker’s options.

See how information is generated in the post titled: Market Analysis: The Pizza Study.


Zikmund, W. (1999). Essentials of marketing research. Dryden Press. 422 pp.

Cite this article as: Regoniel, Patrick (June 8, 2016). Information System: Its Definition and Role in Decision Making [Blog Post]. In SimplyEducate.Me. Retrieved from
Data Analysis Quantitative Research Research Statistics

Market Analysis: The Pizza Study

What is market analysis? How is it done? This article describes how market analysis works using data on a pizza study.

After defining marketing research in my previous post and giving an example conceptual framework for a pizza study, I decided to get into the details of market analysis using a standard multivariate statistical analysis tool. I saw the need for this article upon reading several others on market analysis: there is a need to demonstrate what market analysis actually is.

Before everything else, the concept “market analysis” should be defined first.

What is market analysis and how is it used?

Market Analysis Defined

Marketing strategies work best when founded on a systematic evaluation of consumer preferences. What do consumers want? How do they respond to a product or service? Marketing research provides answers to these questions.

Hence, market analysis can be defined as the process of evaluating consumer preferences using a systematic approach such as marketing research, among others. Market analysis is a detailed examination of the elements or structure of the market.

Why is a market analysis done? An analysis is done to draw out important findings for interpretation and discussion and, finally, a decision on what steps to take.

The Pizza Study

Once again, the conceptual framework given in the pizza study is given below to serve as a reference in the following discussion.

Conceptual framework of the pizza study.

To find out what customers want, let us use sample feedback data from 200 pizza shop customers. To understand how the analysis works, you need to read the article on variables, as these are the important units of analysis. If you already understand what variables are, then proceed to the rest of the discussion.

Coding the Variables for Market Analysis

Let us have the following measures for the variables in this study namely pizza taste, service speed, and waiter courtesy:

Pizza Taste

1 – Very bad
2 – Bad
3 – Moderate
4 – Good
5 – Very good

Service Speed
1 – Satisfied
0 – Not satisfied

Waiter Courtesy
1 – Courteous
0 – Not courteous

Level of Satisfaction
Let us assume that the following Likert scale applies to the customer’s level of satisfaction:

1- Not at all satisfied
2 – Slightly satisfied
3 – Moderately satisfied
4 – Very satisfied
5 – Extremely satisfied

If, for example, the customer is extremely satisfied, he rates his satisfaction and the pizza taste "5," while service speed and waiter courtesy are each coded "1." If he finds the waiter discourteous, courtesy is coded "0."

Multiple Regression Analysis

Below is a data set representing the responses of 200 pizza customers that serves as input to the multiple regression analysis (you may try the data set yourself if you know how to run a multiple regression):


A table summarizing the results of the pizza survey.

Customer # | Satisfaction | Taste | Speed | Courtesy

Result of the Regression Analysis

The following table presents the results of the multiple regression analysis using a simple spreadsheet software application with regression capability – Gnumeric. The first part shows the general relationship between the dependent and independent variables. The second part shows the details of the relationship between satisfaction score and pizza taste, service speed, and waiter courtesy.

Part 1. Regression Statistics

Multiple R: 0.66
Standard Error: 0.47
Adjusted R²: 0.43

Part 2. Details

Coefficients | Standard Error | t-Statistic | p-Value

Notice that the overall relationship is summarized by the R values. Among these, the most important for interpretation is the adjusted R² value, which represents the proportion of variation in the dependent variable explained by the independent variables. The value obtained here is 0.43, meaning 43% of the variation in satisfaction score is accounted for by the three variables.

Closer scrutiny of the details in Part 2 reveals that service speed significantly relates to satisfaction score, as its p-value indicates (for better understanding, please read the post on how to determine the significance of statistical relationships).
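The original analysis was run in Gnumeric, but the same computation can be sketched in a few lines of code. The sample below is hypothetical, standing in for the 200-customer data set; it shows how the coefficients and the R² values in Part 1 are obtained by ordinary least squares.

```python
# Multiple regression sketch on made-up pizza data (not the study's data).
import numpy as np

# Columns: satisfaction (1-5), taste (1-5), speed (0/1), courtesy (0/1)
data = np.array([
    [5, 4, 1, 1],
    [4, 3, 1, 0],
    [2, 3, 0, 1],
    [5, 5, 1, 1],
    [1, 2, 0, 0],
    [4, 4, 1, 1],
    [2, 2, 0, 1],
    [3, 3, 1, 0],
])
y = data[:, 0].astype(float)
X = np.column_stack([np.ones(len(data)), data[:, 1:]])  # add intercept column

# Ordinary least squares: solve for the regression coefficients
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# R^2 and adjusted R^2, the statistics reported in Part 1
residuals = y - X @ coef
ss_res = float(residuals @ residuals)
ss_tot = float(((y - y.mean()) ** 2).sum())
r2 = 1 - ss_res / ss_tot
n, k = len(y), X.shape[1] - 1  # observations, predictors
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)

print("coefficients:", coef.round(3))
print("R^2:", round(r2, 3), "adjusted R^2:", round(adj_r2, 3))
```

With the real 200-row data set in place of the toy array, the printed adjusted R² would match the 0.43 reported above.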

Interpretation of the Results

Based on the results of the statistical analysis, we can say with confidence that, among the variables studied, service speed relates significantly to customer satisfaction. If you look closely at the entries in the data set, every satisfaction score of 4 or 5 corresponds to a service speed of 1, meaning the customer was satisfied with the service speed. Take note, however, that this interpretation holds true only for the particular location where the study took place.

Given this result, the marketing manager should focus on improving service speed to satisfy customers. This simple piece of information can help the pizza business grow and gain a competitive edge. Market analysis guides decision making and avoids the unnecessary cost of a hit-and-miss approach.

Cite this article as: Regoniel, Patrick (May 21, 2016). Market Analysis: The Pizza Study [Blog Post]. In SimplyEducate.Me. Retrieved from
Data Analysis Statistics

How to Analyze Frequency Data

How do you analyze frequency data? How will you know that you have obtained frequency data in your research? What statistical test is appropriate for such data usually obtained from surveys?

This article explains answers to these questions. Read on to find out.

Earlier, I discussed the appropriate statistical tools to use based on the type of data a research project gathers. Analyzing the data itself is quite a challenge to students, especially those doing statistical analysis for the first time.

Now, I would like to focus on a single statistical test, i.e., Chi-square. This discussion is not about the computation per se but on the appropriateness of the test for certain questions pursued in a research investigation. Typically, Chi-square is used in analyzing survey data.

When is a Chi-square test employed? What type of data is appropriate for its use? The straightforward answer is that Chi-square is used when dealing with frequency data.

By the way, what is frequency data? I explain that here with an example.

Frequency Data Example

Frequency data is the data usually obtained from categorical or nominal variables (see the different types of variables and how they are measured). The analysis below is best used when you have two nominal variables in your study. The two variables, with their respective categories, can be arranged column-wise and row-wise. Let me illustrate this arrangement by looking into the way two nominal variables are arranged.

A Hypothetical Survey

An electronics merchant might want to know which cellphone brands are popular among male and female students in a university so that he can stock the brands in his store in the right proportions. He also wants to know whether gender has anything to do with cellphone preference. So he commissioned a business researcher to conduct a survey on cellphone preference.

The research question for this study is:

“Is there an association between gender and cellphone preference?”

The two variables in this study, therefore, are 1) cellphone brand and 2) gender. For sure, we know that gender has two categories, namely male and female. As for the cellphone brands, these entirely depend on the businessman who commissioned the study; say the three dominant brands used by students in his area are Nokia, Samsung, and Apple's iPhone.

Organizing the Data Obtained in the Survey

To organize the data obtained in the aforementioned survey, a table may thus be created to see how gender and cellphone preference are related. A hypothetical frequency table based on a study of cellphone preference in a university is given below:

Table 1. Brand of cellphone preferred (Nokia, Samsung, iPhone), tallied by gender.
Given the distribution of cellphone preference among students in Table 1, the businessman might be inclined to say that females prefer Nokia over the other brands. But what he is looking into is just data organized in a table. No statistical test has been applied yet.

As both of the variables are nominal or can be classified into categories, the appropriate test to find out if indeed there is an association between gender and cellphone preference is Chi-square.

The formula for Chi-square is:

χ² = Σ (O − E)² / E

where O is the observed frequency in each cell of the table and E is the corresponding expected frequency.
How should the data be input to the Chi-square formula? What is observed data, and what is expected? Details on how to do it are given in another article I wrote on another site using a similar example. The link is below:

How to compute for the chi-square value and interpret the results

You may then apply what you have learned in that article to find out whether indeed there is an association between gender and cellphone preference in the example survey given above.
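To make the mechanics concrete, here is a sketch of the Chi-square computation on a hypothetical gender-by-brand table; the counts below are invented for illustration, not survey results.

```python
# Chi-square computation on hypothetical counts
# (rows: male, female; columns: Nokia, Samsung, iPhone).
observed = [
    [40, 30, 30],   # male
    [60, 20, 20],   # female
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count for each cell: (row total x column total) / grand total
chi_square = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / grand_total
        chi_square += (obs - expected) ** 2 / expected

df = (len(observed) - 1) * (len(observed[0]) - 1)  # degrees of freedom
print(f"chi-square = {chi_square:.2f} with {df} degrees of freedom")
```

The resulting statistic is then compared against the Chi-square critical value for the table's degrees of freedom to decide whether the association is significant.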

©2015 April 4 P. A. Regoniel

Quantitative Research Statistics

Statistical Sampling: How to Determine Sample Size

How do you determine the sample size required for your specific study? This is an important question considering that the answer determines how much effort you should devote to your research as well as how much money you have to allocate for it. This article explains how sample size should be estimated to obtain the optimal sample size.

As you would not want to sacrifice accuracy for convenience, having the correct sample size makes your research more credible and worthwhile. If your sample is too small, your results may not be reliable; if it is too large, you will spend too much.

Sampling applies especially to quantitative studies, which try to define or describe a population by studying a part of it. But how many subjects are enough?

Here are important considerations when estimating the correct sample size.

4 Measures Required to Estimate Sample Size

Statisticians agree that you have to be familiar with at least four things before you draw a sample from your population. These are enumerated and described below.

1. Size of the Population

As a researcher, you should be familiar with your target population's size. It is therefore necessary that you define your population so that you can estimate, or find ways to estimate, its total size and arrive at the optimal sample size possible.

Let's say you want to find out tourists' average willingness to pay to access or see a natural park, in view of estimating the park's aesthetic value. Your population would then be the number of tourists who visit the park in one year, if you are working with an annual turnout of visitors. You can get this number from the tourism office, especially if park access is for a fee.

Since you cannot interview all of the tourists, a sample may be drawn at a certain point in time which you will determine yourself, bearing in mind the peak and the off seasons to avoid bias. Familiarity with your population, therefore, is a must.

2. Margin of Error or Confidence Interval

Margin of error refers to the range of values that is acceptable to you around your estimate of the population's mean. What percentage of error will you allow to attain the level of confidence you need? Whatever value you get in estimating, say, the mean of your population is not an absolute number; you should allow for small deviations that are statistically acceptable and still serve your purpose.

An analogy: the margin of error is like a hunter trying to hit a deer with his arrow. He aims for the heart but in the process hits an area within 3 inches of it, whether below, above, to the left, or to the right. That is okay, because what he really wants is to bring the deer home for his meal; hitting the lungs or another internal part next to the heart can still immobilize the deer, so hitting the area around the heart serves the purpose.

3. Confidence Level

Confidence level is often confused with margin of error. It is your level of certainty that your estimated mean (the statistic) will fall within the confidence interval that you have set for the estimate.

Again, back to the analogy of hitting the deer with an arrow. The question is, "How confident is the hunter of hitting the area surrounding the heart?" If he is a very good shot, he might say that out of 100 arrows, he is certain 95 would hit within 3 inches of the heart. That is his confidence level, or percentage of certainty.

In statistics, the convention is to have a confidence level of either 95% or 99%. The former is a commonly used standard.

Assuming that your population has a normal distribution, the confidence level corresponds to a value of the z-distribution. A z-distribution is a standard normal distribution, meaning, the population approximates a bell-shaped curve.

4. Standard Deviation

The standard deviation is how spread out the numbers are from the mean. To make this concept clear, let’s go back to the hunter example.

Let's say the hunter shot at a target with a bullseye 500 times. As he is a very good shot, most of the arrows would land near or at the center, but surely not always at the center. The arrows that missed the bullseye are like deviations from the mean: the way the arrows spread out from the center indicates how values deviate from the average.

So how far do the hunter's arrows deviate from the center? We don't know unless we measure the distance of each arrow from it. But we don't have time to measure all 500 arrows he released, so we might as well take a sample, say 20 arrows. Those 20 arrows might show that the deviation from the bullseye is within 4 inches, and this value can be used to predict the deviation of the 500 arrows as a whole.

Getting the standard deviation from 20 samples is analogous to a pilot study of the population: a portion of the population is studied to estimate the population standard deviation. If this is not possible, it is common practice to use a standard deviation of 0.5 in estimating sample size.

The population standard deviation is computed by getting the square root of the variance. The variance is the average of the squared differences from the mean. This is denoted by the formula below:

σ = √( Σ(xᵢ − μ)² / N )

Fig. 1. Population standard deviation, where μ is the population mean, N is the population size, and the xᵢ are the individual values.
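The definition translates directly into code. The arrow distances below are invented for illustration.

```python
# Population standard deviation: the square root of the average
# squared difference from the mean.
import math

def population_sd(values):
    mean = sum(values) / len(values)
    variance = sum((x - mean) ** 2 for x in values) / len(values)
    return math.sqrt(variance)

# Hypothetical distances (in inches) of sampled arrows from the bullseye
arrows = [0, 1, 2, 2, 3, 4]
print(round(population_sd(arrows), 3))  # prints 1.291
```

Python's standard library offers the same computation as `statistics.pstdev`.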

Using Confidence Level, Standard Deviation and Margin of Error to Estimate the Sample Size

If you are now ready with at least three measures to estimate sample size, i.e., margin of error, confidence level and standard deviation, then you are now ready to estimate the sample size you need. For example, let’s have the following data:

Confidence level: 2.576 (the z value that brackets 99% of a standard normal distribution; note that 2.326 is the one-tailed 99th percentile, not the two-tailed value used for confidence intervals)
Standard deviation: 0.5 (the conventional assumption when the population standard deviation is unknown)
Margin of error: 5% or 0.05

The following equation is used to compute the sample size:

n = (z² × σ²) / e²

Fig. 2. Formula to estimate sample size, where z is the z value for the chosen confidence level, σ is the standard deviation, and e is the margin of error.

Substituting the given values into the equation:

Sample size = ((2.576)² x 0.5(0.5)) / (0.05)²
= (6.6358 x 0.25) / 0.0025
= 1.6589 / 0.0025
= 663.58 ≈ 664 (always round up to the next integer)

Therefore, if your research requires interviewing people, the estimated number of interviewees is 664.
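The sample-size formula is easy to wrap in a small helper. The function name is mine, and the example run uses the 95% confidence level (z = 1.96) with the same 0.5 standard deviation and 5% margin of error.

```python
# Sample size n = z^2 * sd^2 / e^2, rounded up because you cannot
# interview a fraction of a person.
import math

def sample_size(z, sd, margin_of_error):
    return math.ceil((z ** 2) * (sd ** 2) / (margin_of_error ** 2))

# 95% confidence (z = 1.96), sd = 0.5, 5% margin of error
print(sample_size(1.96, 0.5, 0.05))  # prints 385
```

Tightening the margin of error or raising the confidence level drives the required sample size up quickly, since both z and e enter the formula squared.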


Niles, R. (n.d.). Standard deviation. Retrieved on 18 February 2015 from

Smith, S. (2013). Determining Sample Size: How to Ensure You Get the Correct Sample Size. Retrieved on 19 February 2015 from

©2015 February 22 P. A. Regoniel

Data Analysis Statistics

Statistical Analysis: How to Choose a Statistical Test

This article provides a guide for selection of the appropriate statistical test for different types of data. Examples are given to demonstrate how the guide works. This is an ideal read for a beginning researcher.

One of the difficulties encountered by many of my students in the advanced statistics course is how to choose the appropriate statistical test for their specific problem statement. In fact, I had this difficulty too when I started analyzing data for graduate students more than 15 years ago.

The computation part is easy, as there are a lot of statistical software applications available, either stand-alone or part of common spreadsheet applications such as Microsoft Excel. If you really want to save money and are a Linux user, Gnumeric is an open-source statistical application that performs as well as MS Excel. I discovered this free application when I decided to use Ubuntu Linux as my primary operating system; the main reason for the switch was my exasperation at spending so much time, as well as money on antivirus subscriptions, trying to remove persistent Windows viruses.

Back to the issue of identifying the appropriate statistical test: I would say that experience counts a lot. But experience is not the only basis for judging which statistical test is best for a particular research question, i.e., one that requires statistical analysis. A guide matching statistical tests to certain types of variables can steer you in the right direction.

Guide to Statistical Test Selection

Table 1 below shows which statistical test should be applied when you analyze variables measured on a certain type of scale. You should be familiar with the different types of data in order to use this guide; if not, you need to read the 4 Statistical Scales of Measurement first.

Table 1. Guide to statistical test selection.

Type of Data | Test Hypothesis for | Statistical Test
1. Ratio/Interval | Correlation | Kendall's Tau / Pearson's r
 | Equality of variances | Fmax test
 | Equality of means (2 groups) | t-test
 | Equality of means (3 or more groups) | Analysis of Variance
2. Ordinal | Correlation | Spearman's rho
 | Differences among groups | Kruskal-Wallis ANOVA*
3. Nominal (frequency data) | Association between 2 or more categories | Chi-square

*Used if samples are independent; if correlated, use Friedman Two-Way ANOVA

Some Examples to Illustrate Choice of Statistical Test

Refer to Table 1 as you go through the following examples on statistical analysis of different types of data.

Null Hypothesis: There is no association between gender and softdrink preference.
Type of Data: Gender and softdrink brand are both nominal variables.
Statistical Test: Chi-Square

Null Hypothesis: There is no correlation between Mathematics score and number of hours spent in studying the Mathematics subject.
Type of Data: Math score and number of hours are both ratio variables.
Statistical Test: Kendall’s Tau or Pearson’s r

Null Hypothesis: There is no difference between the Mathematics scores of Sections A and B.
Type of Data: Math scores of both Sections A and B are ratio variables.
Statistical Test: t-test
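The second example above can be carried out in a few lines. The scores and study hours below are made-up data for illustration only.

```python
# Pearson's r for the correlation between Mathematics scores and
# hours spent studying (hypothetical data).
import numpy as np

hours = np.array([1, 2, 2, 3, 4, 5, 6, 8])
scores = np.array([55, 60, 58, 65, 70, 72, 80, 85])

r = np.corrcoef(hours, scores)[0, 1]  # off-diagonal entry is Pearson's r
print(f"Pearson's r = {r:.3f}")
```

A value of r near +1 indicates a strong positive linear relationship; whether it is statistically significant still depends on the sample size and the chosen significance level.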

Once you have chosen a specific statistical test to analyze your data with your hypothesis as a guide, make sure that you encode your data properly and accurately (see The Importance of Data Accuracy and Integrity for Data Analysis). Remember that encoding a single wrong entry in the spreadsheet can make a significant difference in the computer output. Garbage in, garbage out.


Robson, C. (1973). Experiment, design and statistics in Psychology, 3rd ed. New York: Penguin Books. 174 pp.

©2015 February 18 P. A. Regoniel

Curriculum and Instruction Education Empirical Research Quantitative Research Research Statistics

A Research on In-service Training Activities, Teaching Efficacy, Job Satisfaction and Attitude

This article briefly discusses the methodology Dr. Mary Alvior used in her dissertation on the benefits of in-service training activities to teachers. She expounds on the results of the study, providing descriptive statistics on teachers' satisfaction with in-service training and how it affected teaching efficacy, job satisfaction, and attitude in public schools in the City of Puerto Princesa in the Philippines.


This study utilized the research and development (R&D) method, which has two phases. During the first phase, the researcher conducted a survey and a focus group interview in order to triangulate the data gathered from the questionnaires. Then, the researcher administered achievement tests in English, Mathematics, and Science. The results of the research component were used as bases for the design and development of a model, which was then fully structured and improved in the second phase.

The participants were randomly taken from 19 public high schools in the Division of Puerto Princesa City, Palawan. A total of fifty-three (53) teachers participated in the study and 2,084 fourth year high school students took the achievement tests.

The researcher used three sets of instruments which underwent face and content validity. These are

  1. Survey Questionnaires for Teacher Participants,
  2. Guide Questions for Focus Group Interview, and
  3. Teacher-Made Achievement Tests for English, Mathematics, and Science.

The topics in the achievement tests were in consonance with the Philippine Secondary Schools Learning Competencies (PSSLC), while the test items' levels of difficulty were in accordance with Department of Education (DepEd) Order 79, series of 2003, dated October 10, 2003.

Results of Descriptive Statistics

Teachers’ insights on in-service training activities

Seminars were perceived to be the most familiar professional development activity among teachers, but the teachers never considered them very important in their professional practice. They also viewed seminars as applicable in the classroom but as having no impact on student performance.

Aside from seminars, the teachers also counted conferences, demo lessons, workshops, and personal research among the professional development activities most familiar to them.

Nonetheless, teachers had different insights as to which professional development activities were applicable in the classroom: Science teachers considered team teaching, demo lessons, and personal research applicable, while the English and Mathematics teachers considered demo lessons and workshops, respectively.

With regard to the professional development activities that were viewed very important in their professional practice and had great impact on student performance, all subject area teachers answered personal research. However, the Mathematics teachers added lesson study for these two categories while the teachers in Science included team teaching as a professional activity that had great impact on student performance.

Moreover, teachers had high regard for the INSET programs they attended and perceived them as effective because they were able to learn and develop themselves professionally. They were also highly satisfied with the training they had attended, as indicated by the mean (M=3.03, SD=.34). In particular, they were highly satisfied with the content, design, and delivery of in-service training (INSET) programs, and with the development of their communication skills, instruction, planning, and organization.

Teachers’ teaching efficacy, job satisfaction and attitude

Teachers had a high level of teaching efficacy (M=3.14, SD=.27), particularly in student engagement, instructional strategies, and classroom management, but not in Information and Communications Technology (ICT). It seems that they were not given opportunities to hone their skills in ICT, or that they were not able to use these skills in the classroom. Likewise, they had an average level of job satisfaction (M=2.91, SD=.27) and a positive attitude towards the teaching profession (M=2.88, SD=.44).

In conclusion, there are professional development activities that are viewed as very important in teaching, and there are also those which have a great impact on students’ academic performance. In addition, the study supported the inclusion of ICT both in teaching and in professional development.

To know more about the model derived from this study, please read 2 Plus 1 Emerging Model of Professional Development for Teachers.

© 2014 December 29 M. G. Alvior

Data Analysis Research

Technical Writing Tips: Interpreting Graphs with Two Variables

How do you present the results of your study? One of the most convenient ways to do so is by using graphs. How are graphs interpreted? Here are some very simple, basic tips to help you get started in writing the results and discussion section of your thesis or research paper. This article focuses specifically on graphs as visual representations of relationships between two variables.

My undergraduate students would occasionally approach me to consult on difficulties they encountered while preparing their theses. One of the things they usually ask is how they should go about the graphs in the results and discussion section of their paper.

How should a thesis writer interpret graphs and tables? Here are some tips on how to do it, in very simple terms.

Interpreting Graphs

Graphs are powerful illustrations of relationships between the variables of your study. A graph can show whether the variables are directly related, as illustrated by Figure 1: if one variable increases in value, the other variable increases, too.

graph of direct relationship
Fig. 1. Graph showing a direct relationship between two variables.

For example, if you pump air into a tire, the tire expands, and the air pressure inside it rises to hold the rubber up. Because air is being added, pressure and volume increase together: an increase in pressure comes with a corresponding increase in volume. The variables in this relationship are pressure and volume. Pressure may be measured in pounds per square inch (psi) and volume in liters (L) or cubic centimeters (cc).

How about if you have another graph like the one below (Figure 2)? Well, it is as simple as the first one. If one variable increases in value, the other variable decreases in proportionate amounts. This graph shows an inverse relationship between the two variables.

graph inverse relationship
Fig. 2. A graph showing an inverse relationship between two variables.

For example, as a driver increases the speed of the vehicle he drives, the time it takes to reach the destination decreases. Of course, this assumes that there are no obstacles along the way. The variables involved in this relationship are speed and time. Speed may be measured in kilometers per hour (km/hr) and time in hours.
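The two relationships in Figures 1 and 2 can be checked numerically with the correlation coefficient. Here is a minimal Python sketch; the pump readings and the 300-kilometer trip distance are made-up numbers for illustration only.

```python
import math

def pearson_r(xs, ys):
    # Pearson product-moment correlation coefficient
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical trip of 300 km: as speed rises, travel time falls
speed = [40, 60, 80, 100, 120]          # km/hr
time = [300 / s for s in speed]         # hours

# Pumping air into a tire: pressure and volume rise together
pressure = [30, 32, 34, 36, 38]         # psi, hypothetical readings
volume = [9.0, 9.2, 9.4, 9.5, 9.7]      # liters, hypothetical readings

print(pearson_r(speed, time))       # negative: inverse relationship
print(pearson_r(pressure, volume))  # positive: direct relationship
```

A coefficient near +1 corresponds to the direct relationship of Figure 1; a coefficient near -1 corresponds to the inverse relationship of Figure 2.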

The two examples given are very simplified representations of the relationship between two variables. In many studies, such neat relationships seldom occur; the graphs show something else: not really straight lines, but curves.

For example, how will you interpret the two graphs below? Some students have trouble interpreting these.

two graphs
Fig. 3. Two graphs showing different relationships between two variables.

Graph a simply shows that the relationship between the two variables rises, dips, and then progressively increases. In general, the relationship is direct.

For example, Graph a may show the profit of a company through time. The vertical axis represents profit while the horizontal axis represents time. The graph portrays that the profit initially increased, declined at a certain point in time, then recovered and kept increasing through time.

Something may have happened that caused the initial increase to reverse. The profit of the company may have declined because of a recession. But when the recession was over, profits recovered and continued to grow through time.

How about Graph b? Graph b means that the variable in question reaches a saturation point. This graph may represent the number of tourists visiting a popular island resort through time. Within the span of the study, say ten years, the number of tourists peaked at about five years after the beach resort started operating, then started to decline. The reason may be a polluted coastal environment that caused tourists to shy away from the place.

There are many variations in the relationship between two variables. It may look like an S curve going up or down, a plain horizontal line, or a U shape, among others. These are actually just variations of the direct and inverse relationships between the two variables. Just note that aberrations along the way are caused by something else: another variable, or a set of variables or factors, that affects one or both variables, which you need to identify and explain. That is where your training, imagination, experience, and critical thinking come in.

©2014 November 20 Patrick Regoniel

Data Analysis Statistics

What is a Statistically Significant Relationship Between Two Variables?

How do you decide if indeed the relationship between two variables in your study is significant or not? What does the p-value output in statistical software analysis mean? This article explains the concept and provides examples.

What does a researcher mean if he says there is a statistically significant relationship between two variables in his study? What makes the relationship statistically significant?

These questions imply that a test for correlation between two variables was made in that particular study. The specific statistical test could either be the parametric Pearson Product-Moment Correlation or the non-parametric Spearman’s Rho test.
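As an illustration of the non-parametric option, Spearman’s Rho can even be computed by hand because it works on ranks rather than raw values. Below is a minimal Python sketch; the study-hours and quiz-score figures are made up, and the rank routine assumes no tied values.

```python
def ranks(values):
    # Rank each value from 1 (smallest) upward; assumes no ties
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

def spearman_rho(xs, ys):
    # Spearman's Rho from rank differences: 1 - 6*sum(d^2) / (n(n^2 - 1))
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

hours = [1, 2, 3, 4, 5]             # hypothetical hours of study
scores = [10, 40, 45, 47, 48]       # hypothetical long quiz scores
print(spearman_rho(hours, scores))  # 1.0: a perfectly monotonic relationship
```

Because Rho looks only at ranks, it reports a perfect monotonic relationship even though the scores do not rise linearly with the hours; the Pearson coefficient for the same data would be below 1.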

It is now easy to do the computations using popular statistical software like SPSS or Statistica, or even the data analysis function of spreadsheets like the proprietary Microsoft Excel and the open source but less popular Gnumeric. I provide links below on how to use the two spreadsheets.

Once the statistical software has finished processing the data, you will get a range of correlation coefficient values along with their corresponding p-values, denoted by the letter p and a decimal number, for one-tailed and two-tailed tests. The p-value is the one that really matters when judging whether there is a statistically significant relationship between two variables.

The Meaning of p-value

What does the p-value mean? This value never exceeds 1. Why?

The computer-generated p-value represents the probability of obtaining a result at least as extreme as the one observed, assuming that the null hypothesis (H0) the researcher formulated at the beginning of the study is true. The null hypothesis is stated in such a way that there is “no” relationship between the two variables being tested. This means, therefore, that as a researcher, you should be clear about what you want to test in the first place.

For example, your null hypothesis that will lend itself to statistical analysis should be written like this:

H0: There is no relationship between the long quiz score and the number of hours devoted by students in studying their lessons.

If the computed p-value is close to 1, the data are entirely consistent with the null hypothesis: the results provide no evidence that the long quiz score and the number of hours spent by students in studying their lessons are related.

Conversely, if the p-value is close to 0, a result like the one observed would be very unlikely if the null hypothesis were true, so the data provide strong evidence that the long quiz score and the hours of study are correlated. Note that the p-value measures the evidence against the null hypothesis, not the strength of the relationship; strength is measured by the correlation coefficient.

In reality, however, p-values at these extremes rarely occur. Many factors or variables influence the long quiz score: the intelligence quotient of the student, the teacher’s teaching skill, and the difficulty of the quiz, among others, all affect the score.

Because it is a probability, the p-value can never be greater than 1. If you get a p-value of more than 1 in your computation, something has gone wrong. Your p-value, I repeat, should always fall between 0 and 1.

To illustrate, if the p-value you obtained during the computation is equal to 0.5, this means that there is a 50% chance of getting a result at least as extreme as yours even if the two variables are not related at all. In our example, a result that likely to arise by chance alone is far too weak to count as evidence that the long quiz score is correlated with the number of hours spent by students in studying their lessons.

Deciding Whether the Relationship is Significant

If the probability in the example given above is p = 0.05, is it good enough to say that indeed there is a statistically significant relationship between long quiz score and the number of hours spent by students in studying their lessons? The answer is NO. Why?

By today’s standard convention in statistics, researchers adopt a significance level, denoted by alpha (α), as a pre-chosen probability threshold for significance. This is usually set at either 0.05 (statistically significant) or 0.01 (statistically highly significant). These numbers represent a 5% and a 1% probability, respectively.

Comparing the computed p-value with the pre-chosen alpha will help you decide whether the relationship between the two variables is significant or not. So if, say, the p-values you obtained in your computation are 0.5, 0.4, or 0.06, you fail to reject the null hypothesis; that is, if you set alpha at 0.05 (α = 0.05). If the value you got is below 0.05, that is p < 0.05, then you reject the null hypothesis and accept your alternative hypothesis.
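This decision rule can be written as a tiny function. A sketch in Python, assuming the conventional α = 0.05:

```python
def decide(p_value, alpha=0.05):
    # A relationship is declared significant only when p falls below alpha
    if p_value < alpha:
        return "significant: reject H0"
    return "not significant: do not reject H0"

# The p-values discussed above, plus two that cross the threshold
for p in (0.5, 0.4, 0.06, 0.04, 0.009):
    print(f"p = {p}: {decide(p)}")
```

Note that p = 0.06 misses the cutoff while p = 0.04 clears it; the rule is a strict comparison against the pre-chosen alpha.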

In the above example, the alternative hypothesis that should be accepted when the p-value is less than 0.05 will be:

H1: There is a relationship between the long quiz score and the number of hours devoted by students in studying their lessons.

The strength of the relationship is indicated by the correlation coefficient, or r value. Guilford (1956) suggested the following categories as a guide:

< 0.20: slight; almost negligible relationship
0.20 – 0.40: low correlation; definite but small relationship
0.40 – 0.70: moderate correlation; substantial relationship
0.70 – 0.90: high correlation; marked relationship
> 0.90: very high correlation; very dependable relationship
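Guilford’s labels are easy to apply in code. A small Python sketch; since the table does not say which side a boundary value belongs to, this version assumes a boundary falls into the stronger category:

```python
def guilford_category(r):
    # Guilford's (1956) verbal label for a correlation coefficient r;
    # the sign is ignored because the label describes strength only
    r = abs(r)
    if r < 0.20:
        return "slight; almost negligible relationship"
    if r < 0.40:
        return "low correlation; definite but small relationship"
    if r < 0.70:
        return "moderate correlation; substantial relationship"
    if r < 0.90:
        return "high correlation; marked relationship"
    return "very high correlation; very dependable relationship"

print(guilford_category(0.55))   # moderate correlation; substantial relationship
print(guilford_category(-0.95))  # very high correlation; very dependable relationship
```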

You may read the following articles to see example computer outputs and how these are interpreted.

How to Use Gnumeric in Comparing Two Groups of Data

Heart Rate Analysis: Example of t-test using MS Excel Analysis ToolPak


Guilford, J. P. (1956). Fundamental statistics in psychology and education. New York: McGraw-Hill. p. 145.

© 2014 May 29 P. A. Regoniel

Data Analysis Statistics

How to Use Gnumeric in Comparing Two Groups of Data

Are you in need of statistical software but cannot afford to buy it? Gnumeric is just what you need. It is a powerful, free program that will help you analyze data just like a paid one. Here is a demonstration of what it can do.

Many of the statistical software packages available today on the Windows platform are for sale. But did you know that there is a free program that can analyze your data as well as those that require you to purchase the product? Gnumeric is the answer.

Gnumeric: A Free Alternative to MS Excel’s Data Analysis Add-in

I discovered Gnumeric while searching for statistical software that would work on my Linux distribution, Ubuntu 12.04 LTS, which I had enjoyed using for almost two years. I was looking for an open source program that would work like the Data Analysis add-in of MS Excel.

I browsed a forum about alternatives to MS Excel’s Data Analysis add-in. In that forum, a student lamented that he could not afford MS Excel but was in a quandary because his professor used the Data Analysis add-in to solve statistical problems. A professor recommended Gnumeric in response: not just a cheap alternative but a free one at that. In an earlier post, I described how the Data Analysis add-in of Microsoft Excel is activated and used in comparing two groups of data, specifically with the t-test.

One of the reasons why computer users avoid free software such as Gnumeric is that it can lack features found in paid products. But as with many Linux applications, Gnumeric has evolved and improved much through the years, based on the reviews I read. It works and produces statistical output just like MS Excel’s Data Analysis add-in. That is what I discovered when I installed the free software using Ubuntu’s Software Center.

Analyzing Heart Rate Data Using Gnumeric

I tried Gnumeric on the same set of heart rate data that I analyzed using MS Excel in the previous post. I copied the data from MS Excel and pasted them into the Gnumeric spreadsheet.

To analyze the data, go to the Statistics menu, then select the column of each of the two groups, one at a time, including the label, and input them in separate fields. Then click the Label box; clicking it tells the program to use the first row as the label of each group (see Figs. 1-3 below for a graphic guide).

In the t-test analysis that I employed using Gnumeric, I labeled one group HR 8 months ago for my heart rate eight months ago and the other HR Last 3weeks for my heart rate during the last three weeks.

t-test Menu in Gnumeric Spreadsheet 1.10.17

The t-test function in Gnumeric can be accessed by clicking on the Statistics menu. Here is a screenshot of the menus to click for a t-test analysis.

menu for t-test
Fig. 1 The t-test menu for unpaired t-test assuming equal variance.

Notice that Unpaired Samples, Equal Variances: T-test … was selected. In my earlier post on the t-test using MS Excel, the F-test revealed no significant difference in variance between the two groups, so the t-test assuming equal variances is the appropriate analysis.

highlight variable 1
Fig. 2. Highlighting variable 1 column inputs the range of values for analysis.
highlight variable 2
Fig. 3. Highlighting variable 2 column inputs the range of values for analysis.

After you have input the data in the Variable 1 and Variable 2 fields, click on the Output tab. You may leave the Populations and Test tabs at their default settings. Just select the cell in the spreadsheet where you want the output to be displayed.

Here’s the output of the data analysis using t-test in Gnumeric compared to that obtained using MS Excel (click to enlarge):

Excel and Gnumeric output
Fig. 4. Gnumeric and MS Excel output.

Notice that the outputs of the analysis using MS Excel and Gnumeric are essentially the same. In fact, Gnumeric provides more details, although MS Excel has a visible title and a formally formatted table for the F-test and t-test analyses.
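The computation both programs perform here is the unpaired t-test with a pooled variance. A minimal Python sketch of that formula, using made-up heart-rate readings (the actual figures are in the spreadsheet screenshots above):

```python
import math
import statistics

def pooled_t(sample1, sample2):
    # Unpaired t statistic assuming equal variances:
    # pool the two sample variances, then divide the difference
    # of the means by the pooled standard error
    n1, n2 = len(sample1), len(sample2)
    v1, v2 = statistics.variance(sample1), statistics.variance(sample2)
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    return (statistics.mean(sample1) - statistics.mean(sample2)) / se

# Hypothetical heart-rate readings in beats per minute
hr_8_months_ago = [78, 82, 80, 85, 79, 83]
hr_last_3_weeks = [72, 75, 74, 77, 73, 76]
print(pooled_t(hr_8_months_ago, hr_last_3_weeks))  # positive: earlier readings higher
```

Comparing the resulting t statistic against the critical value, or its p-value against alpha, is exactly what the Gnumeric and Excel outputs report.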

Since both software applications deliver the same results, the sensible choice is to install the free software Gnumeric to help you solve statistical problems.

Try it and see how it works. You may download the latest stable release for your operating system from the Gnumeric homepage.

© 2014 May 3 P. A. Regoniel