Tag Archives: data analysis

Data Accuracy, Reliability and Triangulation in Qualitative Research

As a researcher, you might want to make sure that whatever information you gather in the field can be depended upon. How will you be able to ensure that your data is accurate and reliable? This article explains the importance of verifying information through a technique called triangulation.

Data Accuracy and Reliability

Do you know what the GIGO rule is? GIGO is an acronym for Garbage In, Garbage Out. The rule became popular in the early days of computing, when whatever you entered into the computer was processed without question.

Data accuracy and reliability are very important concerns in doing good research because inaccurate and unreliable data lead to spurious or wrong conclusions. If, for some reason, you inadvertently input wrong data into the computer, output will still be produced. But of course, the results are erroneous because the data entered is faulty. It is also possible that you input the data correctly but then the data does not reflect what you really want to measure.

Thus, it is always good practice to review whatever data you have before entering it into your computer through a software application like a spreadsheet or a statistical package. Each data item should be verified for accuracy and entered meticulously. Once entered, the data must again be reviewed for accuracy. An extra zero in a number entered in a cell will affect the resulting graph or correlation analysis, and data entered under the wrong category can destroy data reliability.
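
The review of entered data can be partly automated. The following is a minimal sketch, assuming the data sit in a list of records and that the researcher has decided on plausible limits for each field; the field names, the limits, and the sample entries are all hypothetical.

```python
def find_out_of_range(records, limits):
    """Return (row_index, field, value) for every value outside its limits."""
    problems = []
    for index, record in enumerate(records):
        for field, (low, high) in limits.items():
            value = record.get(field)
            # A missing value is also reported so it can be checked by hand.
            if value is None or not (low <= value <= high):
                problems.append((index, field, value))
    return problems


# Hypothetical survey entries; the second age has an extra zero.
entries = [
    {"age": 34, "household_size": 5},
    {"age": 340, "household_size": 4},
    {"age": 51, "household_size": 6},
]
# Hypothetical plausible ranges chosen by the researcher.
plausible = {"age": (15, 100), "household_size": (1, 20)}

suspect = find_out_of_range(entries, plausible)
print(suspect)
```

A check like this only points to entries worth a second look; whether a flagged value is really wrong still has to be confirmed against the original questionnaire.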

This data verification strategy works for quantitative data, which are obtained mainly through the application of standardized measurement scales: nominal or categorical, ordinal, interval, and ratio. The latter two are the most precise scales and support the widest range of statistical analyses. Although measurements will vary between observers, as some researchers work meticulously while others work casually, errors of measurement can be controlled to a certain degree.

In the case of qualitative research, which in nature is highly subjective, there are also ways by which data can be verified or validated. This is through the so-called triangulation method.

What is the Triangulation Method?

Triangulation is one of the popular research tools that researchers commonly use in an attempt to verify the accuracy of data obtained from the field. As the word connotes, it refers to the application of three approaches or methods to verify data.

Why three? This works much like a global positioning system or GPS, where a receiver needs signals from several satellites (at least three for a basic position fix) to work out your location. Simply put, one source of information is not enough to answer your questions; at least three should be put to practical use.

At best, the responses to the questions you pose in qualitative research represent people’s viewpoints, and these viewpoints should be verified through other means. If you have only one source of information and that information is false, then everything built on it is erroneous, and your conclusions are faulty. Having several information sources gives researchers confidence that the data they are getting approximates the truth.

Methods of Triangulation in Qualitative Research

The most common methods used as a demonstration of triangulation are the household interview or HHI, key informant interview (KII), and focus group discussion (FGD). These approaches rely on the information provided by a population of respondents with a predetermined set of characteristics, knowledgeable individuals, and a multi-sectoral group, respectively.

HHI utilizes structured questionnaires administered by trained interviewers to randomly selected individuals, usually the household head as the household representative. It is a rapid approach to getting information from a subset of the population in an attempt to describe the characteristics of the general population. The data obtained are largely approximations and highly dependent on the honesty of the respondents.

Second, the KII approach obtains information from key informants. A key informant is someone who is expected to be very familiar with the issues and concerns besetting the community. Almost always, the key informants are elders or others who have lived in the community the longest and are familiar with community dynamics or changes in the community through time.

Third, FGD elicits responses from representatives of the different sectors of society. These representatives are usually called the stakeholders, meaning, they have a stake or are influenced by whatever issue or concern is being investigated. Fishers, for example, are affected by the establishment of protected areas in their traditional fishing grounds.
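
Once the three methods have been applied, their answers to the same question can be compared side by side. The following is a minimal sketch, assuming the responses from each method have already been coded into comparable categories; the question, the coded answers, and the rule that at least two methods must agree are assumptions made for illustration, not part of any standard procedure.

```python
from collections import Counter


def triangulate(answers_by_method):
    """Return the most common answer and how many methods support it."""
    counts = Counter(answers_by_method.values())
    answer, support = counts.most_common(1)[0]
    return answer, support


# Hypothetical coded responses to one question, one entry per method.
main_livelihood = {"HHI": "fishing", "KII": "fishing", "FGD": "farming"}

answer, support = triangulate(main_livelihood)
corroborated = support >= 2  # assumed rule: at least two methods must agree
print(answer, support, corroborated)
```

Where the methods disagree, as the FGD does here, the disagreement itself is worth following up in the field rather than simply outvoting.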

Conclusion

Data accuracy is threatened by the inherent subjectivity of data obtained through qualitative methods. Therefore, a combination of qualitative methods such as household interview, key informant interview, and focus group discussion can reduce errors and provide greater confidence to researchers employing qualitative approaches. This is referred to as triangulation.

Reference:

Janssen, C. n.d. Garbage In, Garbage Out (GIGO). Retrieved on July 28, 2013 from http://www.techopedia.com/definition/3801/garbage-in-garbage-out-gigo

© 2013 July 28 P. A. Regoniel

Simplified Explanation of Probability in Statistics

Do you have trouble understanding the concept of probability? Do you ask yourself why you have to read that section on probability in your statistics book that seems to have no bearing on your research? Don’t despair. Read the following article and have a clear understanding of this concept that you will find very useful in your research venture.

One of the topics in a Statistics course that students have difficulty understanding is the concept of probability. But is “probability” really a difficult thing to understand? In reality, it is not that difficult as long as you understand how it works when comparing differences or correlations between variables.

It simply works this way:

The classic example used to illustrate probability is a coin. Everybody knows that a coin has two sides: the head, which normally shows the face of someone along with the amount the coin represents, and the tail, which typically shows the government bank that issued the currency.

Now, if you flick the coin, it will land and settle with one side up; unless you get a weird result that the coin unexpectedly landed on its edge or in-between the head and tail sides! (see Fig. 1). This, however, could be a possibility as there is a middle ground that will make this possible though very, very remote (what if the government decides to have a coin thick enough to make this possible if ever you flick a coin?). I just included this because it so happened I flicked a coin before and it landed next to an object that made it stand on its edge instead of falling on either the head or the tail side. That just means that unexpected things could happen given the right circumstances that will make it possible.

Fig. 1. Head, in-between, tail (L-R)

I just have to illustrate this with a picture because some students do not understand what is a head and what is a tail in a coin. So, no excuses for not understanding what we are talking about here.

For our purpose, we’ll just leave out the in-between possibility and concentrate on getting either a head or a tail when a coin is flipped and allowed to settle on level ground or on top of your palm. Since there are only two equally likely outcomes here, we can say that there is a 50-50, 0.5 or 1/2 probability that the coin will land as head, and the same probability that it will land as tail. In statistics, this is written thus:

p = 0.5

where p is the probability symbol and the value 0.5 is the chance that the coin will land on a given side, head or tail. Alternatively, this can be stated that there is an equal chance of getting a head or a tail each time you toss a coin and let it land on level ground.

Therefore, if you toss a coin 10 times, the probability of getting a head on each toss is 50%, 0.5 or 1/2, and the same holds for a tail. That means in 10 tosses you would expect about 5 heads and 5 tails on average, although any single run of 10 tosses may deviate from this. If you toss it 100 times, you would expect about 50 heads and 50 tails.

If you have a six-sided die, then the probability of each side in each throw is 1/6. If you have a four-sided die, then the probability of each side is 1/4.
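
These probabilities can be checked with a few lines of code. The following is a minimal sketch, assuming a fair coin and fair dice; the function names are made up for this illustration. It also computes the chance of getting exactly 5 heads in 10 tosses, which shows that the "expected" result is far from guaranteed.

```python
from fractions import Fraction
from math import comb


def face_probability(faces):
    """Probability of any one face of a fair object with `faces` sides."""
    return Fraction(1, faces)


def prob_exact_heads(tosses, heads):
    """Binomial probability of exactly `heads` heads in `tosses` fair tosses."""
    return Fraction(comb(tosses, heads), 2 ** tosses)


coin = face_probability(2)              # 1/2
die = face_probability(6)               # 1/6
expected_heads = 10 * coin              # 5 heads on average in 10 tosses
exactly_five = prob_exact_heads(10, 5)  # 63/256, about 0.246
print(coin, die, expected_heads, exactly_five, float(exactly_five))
```

So even with a fair coin, only about one run in four of 10 tosses gives exactly 5 heads and 5 tails.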

Application

This background knowledge can help you understand the importance of the p-value in statistical tests.

For example, if you are interested in knowing if a significant difference exists between two groups (say a comparison of the test scores of a group of students who were given remedial classes as opposed to another group that did not undergo remedial classes), and statistical software was used to analyze the data (presumably a t-test was applied), you just have to look at the p-value to find out if indeed there is a significant difference in achievement between the two groups. If the p-value is 0.05 or lower and the remedial group has the higher average score, then you can say that there is sufficient evidence that students who underwent remedial classes performed better (in terms of their test scores) than those who did not undergo remedial classes.

For clarity, here are the null and alternative hypotheses that you can formulate for this study:

Null Hypothesis: There is no significant difference between the test scores of students who took remedial classes and students who did not take remedial classes.

Alternative Hypothesis: There is a significant difference between the test scores of students who took remedial classes and students who did not take remedial classes.

The p-value is the probability of obtaining a difference in test scores at least as large as the one observed if the null hypothesis were true, that is, if remedial classes made no difference. A p-value of 0.05 means there is only a 5% chance of seeing such a difference when students with and without remedial classes actually perform similarly. This probability is quite low, such that you may reject your null hypothesis that there is no difference in test scores of students with or without remedial classes. If you reject the null hypothesis, then you accept your alternative hypothesis, which is: There is a significant difference between the test scores of students who took remedial classes and students who did not take remedial classes.
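
To see where a p-value comes from, it helps to compute one directly. The following is a minimal sketch that uses an exact permutation test rather than the t-test mentioned above, because a permutation test needs nothing beyond the standard library and makes the logic visible: it asks how often a difference at least as large as the observed one appears when the group labels are shuffled. The test scores are invented for illustration only.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation p-value for a difference in means."""
    observed = abs(mean(group_a) - mean(group_b))
    pooled = group_a + group_b
    indices = range(len(pooled))
    extreme = 0
    total = 0
    # Try every way of relabelling the pooled scores into two groups.
    for chosen in combinations(indices, len(group_a)):
        chosen_set = set(chosen)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in indices if i not in chosen_set]
        total += 1
        # Small tolerance guards against floating-point rounding.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical test scores, invented for illustration only.
remedial = [85, 88, 90, 92, 95]
no_remedial = [70, 72, 75, 78, 80]

p_value = permutation_p_value(remedial, no_remedial)
reject_null = p_value <= 0.05
print(round(p_value, 4), reject_null)
```

With these made-up scores, only 2 of the 252 possible relabellings give a difference as large as the observed one, so the p-value is well below 0.05 and the null hypothesis is rejected. A t-test in a statistical package would answer the same question under its own assumptions.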

Of what use is this finding then? The results show that giving remedial classes can indeed benefit students. As the results of the study indicated, it can significantly increase students’ test scores.

You may then present the results of your study and confidently recommend that remedial classes be given to students to help improve their test scores in whatever subject that may be.

That’s how statistics work in research.

©2013 May 15 Patrick Regoniel

Four Statistical Scales of Measurement

To measure appropriately the research variables identified and reflected in the conceptual framework, a budding researcher must be very familiar with the four statistical scales of measurement. What are the four statistical scales of measurement and what variables do these measure? The following article enumerates and describes the four statistical scales of measurement and provides examples with exercises.

In the course of gathering your data, you should be very well familiar with the different statistical scales of measurement. This knowledge will help you adequately and appropriately measure the variables that you have identified in your conceptual framework. Further, once you make the variables quantifiable, application of the appropriate statistical test is possible.

I previously discussed the role that variables play in the conduct of research, i.e., they primarily serve as the focal points of the whole research process because the phenomenon is abstract in nature. It takes some skill to isolate such research variables, but with constant practice and familiarity, the identification of these variables becomes easy.

How can you say that the factors studied are variables?

One of the primary attributes of variables is that these lend themselves to statistical scales of measurement. Research variables must be measurable. Statisticians devised four statistical scales of measurement. These are nominal or categorical, ordinal, interval and ratio statistical scales.

The Four Major Statistical Scales of Measurement

1. Nominal or categorical

The nominal or categorical statistical scale of measurement is used to measure those variables that can be broken down into groups. Each group has attributes distinctly different from the other. The most commonly used nominal or categorical variables measured using this research scale of measurement are gender, civil status, nationality, or religion. These variables and their corresponding categories are as follows:

  • gender – male or female
  • civil status – single or married
  • nationality – Filipino, Chinese, Singaporean, Malaysian, Indonesian, Vietnamese
  • religion – Muslim, Christian, Buddhist, Shinto

Notice that the categories of each nominal variable do not indicate that one is superior or greater than the other. These are mainly classifications that separate one group from the other.

The nominal scale of measurement is referred to by statisticians as the crudest statistical scale of measurement. While this may be the crudest, this is a powerful statistical scale of measurement when correlating two nominal variables like gender and reproductive health bill position.

The statistical question in this instance is “Is there an association between gender and position on the reproductive health bill?” The chi-square test is the appropriate statistical test for this question.
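
The chi-square computation for two nominal variables can be sketched in a few lines. The following is a minimal illustration for a 2x2 table, assuming two categories of gender and two positions (in favor or against); the counts are hypothetical, no continuity correction is applied, and the p-value formula used is the closed form that holds only for 1 degree of freedom.

```python
from math import erfc, sqrt


def chi_square_2x2(table):
    """Chi-square statistic and p-value for a 2x2 contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two variables were unrelated.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    # With 1 degree of freedom the upper-tail probability has a closed form.
    p_value = erfc(sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts: rows are male/female, columns are in favor/against.
counts = [[30, 20],
          [20, 30]]
statistic, p_value = chi_square_2x2(counts)
print(round(statistic, 2), round(p_value, 4))
```

For these made-up counts the statistic is 4.0 and the p-value is about 0.0455, so the association would be judged significant at the 0.05 level.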

2. Ordinal

The ordinal statistical scale of measurement applies to variables that signify, as the root word suggests, “order” of the different groups. It is possible to rank order the different groups because each group shows attributes that are convincingly superior or greater than the other or vice-versa.

To illustrate this statistical scale simply and clearly, examples of variables that are measured using this scale of measurement are the following:

  • order of child in the family – eldest, second eldest … youngest
  • socioeconomic status of families – upper, middle, lower
  • educational attainment – elementary, high school, college, graduate
  • size – small, medium, large

Notice that while the different groups follow an order of magnitude, there is no fixed distance between them; the distances can vary between groups. Say, the eldest child may be older than the next eldest child by two years, but the second eldest child may be three years older than the next child, and so on. No specific income difference separates the socioeconomic classes. The number of years spent in elementary school is not the same as the number of years in high school or graduate school. The size difference between small, medium and large can vary widely.

3. Interval

The interval scale of measurement measures variables more precisely than the rank order mode of the ordinal scale of measurement. There is now an equal spacing between the different groups that compose the variable. Examples of variables that can be measured using this statistical scale of measurement are the following:

  • household income in PhP5,000 brackets – 1st group: earns up to PhP5,000, 2nd group: PhP10,000, 3rd group: PhP15,000
  • temperature in 5 degree intervals – 5, 10, 15, 20
  • number of student absences in one week – week 1 absence, week 2 absence, week 3 absence
  • water volume in 5 milliliter increments – 5 ml, 10 ml, 15 ml, 20 ml

4. Ratio

The ratio scale of measurement works similarly to the interval scale. In fact, in using statistical tests, these two statistical scales of measurement are not treated differently from each other. The only difference between the ratio and the interval scale is that the former (i.e., the ratio scale) has an absolute zero point, so statements such as “twice as heavy” are meaningful.

Examples of ratio variables are the following:

  • weight in kilograms or pounds
  • height in meters or feet
  • distance of school from home
  • amount of money spent during vacation
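
The scale of measurement determines which summary of the data makes sense. The following is a minimal sketch, assuming the usual convention that nominal data are summarized by the mode, ordinal data by the middle rank, and interval or ratio data by the mean; the function name and the sample responses are hypothetical.

```python
from statistics import mean, mode


def summarize(values, scale, order=None):
    """Return a summary that is meaningful for the given scale of measurement."""
    if scale == "nominal":
        return mode(values)  # categories can only be counted
    if scale == "ordinal":
        # Rank the categories with the supplied order, then take the middle one
        # (the upper middle when the number of values is even).
        ranks = sorted(order.index(v) for v in values)
        return order[ranks[len(ranks) // 2]]
    if scale in ("interval", "ratio"):
        return mean(values)  # equal spacing makes averaging meaningful
    raise ValueError(f"unknown scale: {scale}")


# Hypothetical responses for illustration.
religion = ["Christian", "Muslim", "Christian", "Buddhist"]
size = ["small", "large", "medium", "medium", "small"]
weight_kg = [52.0, 60.5, 71.0, 58.5]

print(summarize(religion, "nominal"))
print(summarize(size, "ordinal", order=["small", "medium", "large"]))
print(summarize(weight_kg, "ratio"))
```

Averaging the religion or size categories would be meaningless, which is why the scale has to be identified before choosing a statistic.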

Exercises

To test your skill at this point, identify which statistical scale of measurement applies for the following variables. Compare your answer with your classmates to confirm.

  1. beauty of contestants
  2. light intensity
  3. water turbidity
  4. environmental awareness
  5. emotional intelligence
  6. number of accidents
  7. vehicle speed
  8. allowance of students
  9. brand of cellphone
  10. softdrink preference

Enjoy!

© 2012 December 16 P. A. Regoniel

Cite this article as: Regoniel, Patrick A. (December 16, 2012). Four Statistical Scales of Measurement. In SimplyEducate.Me. Retrieved from http://simplyeducate.me/2012/12/16/4-statistical-scales-of-measurement/

The Importance of Data Accuracy and Integrity for Data Analysis

Data analysis is only as good as the quality of data obtained during the data collection process. How can you ensure data accuracy and integrity? Here are three pointers.

Data analysis is a very important part of the research process. Before performing data analysis, researchers must make sure that the numbers in their data are as accurate as possible. Clicking the menus and buttons of statistical software applications like SPSS, Microstat, Statistica, and Statview, among others, is easy, but if the data used in such automated data analysis is faulty, the results are nothing more than plain rubbish. Garbage in, garbage out (GIGO).

For many students who just want to comply with their thesis requirement, rigorous and critical data analysis is almost always given much less attention than the other parts of the thesis. At other times, data accuracy is deliberately compromised because the findings appear inconsistent with expected results.

Data should be as accurate, truthful or reliable as possible, for if there are doubts about their collection, data analysis is compromised. Interpretation of the results will be faulty, which will lead to wrong conclusions.

How can you make sure that your data is ready or suitable for data analysis? Here are three pointers to remember to ensure data integrity and accuracy. The following points focus on data collection during interviews.

3 Points to Remember to Ensure Data Integrity and Accuracy

1. Review data entries

Be meticulous about overlooked items in data collection. When dealing with numbers, ensure that the results are within sensible limits. Omitting a zero here or adding a number there can compromise the accuracy of your data.

Watch out for outliers, or those data that seem to be out-of-bounds or at the extremes of the scale of measurement. Verify whether the outlier is truly an original record of data collected during the interview. Outliers may be just typographical errors.
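
One common way to spot candidate outliers is the interquartile range (IQR) rule. The following is a minimal sketch, assuming the conventional cutoff of 1.5 times the IQR beyond the quartiles; the ages are hypothetical, and a flagged value is only a candidate that still has to be verified against the original interview record.

```python
from statistics import quantiles


def flag_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


# Hypothetical respondent ages; 350 looks like a typing error for 35.
ages = [23, 25, 27, 29, 31, 33, 35, 350]
flagged = flag_outliers(ages)
print(flagged)
```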

2. Verify the manner of data collection

Cross-examine the data collector. If you asked somebody to gather data for you, ask him or her some questions to find out whether the data was collected systematically and truthfully. Paid enumerators tend to administer questionnaires in a hurry. In the process, many things will be missed and the enumerators may simply fill in the missing items themselves. To filter out this possibility, the information gathered should be cross-checked.

The following questions may be asked to ensure data quality:

  • How much time did you spend in interviewing the respondent of the study?
  • Was the respondent alone or with a group of people when you did the interview?

To reduce cheating in doing the interview, it will help if you tell your enumerators to have the interviewees sign the interview schedule right after they were interviewed. Ask the enumerators to write the duration of the interview, taking note of the start and end time of the interview.
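
The recorded start and end times make it easy to screen for rushed interviews. The following is a minimal sketch, assuming times are written as HH:MM on the same day; the respondent IDs, the times, and the 20-minute minimum are hypothetical and should be replaced by whatever a careful interview with your own questionnaire actually takes.

```python
from datetime import datetime


def interview_minutes(start, end, fmt="%H:%M"):
    """Duration of an interview in minutes from its start and end times."""
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds() / 60


def too_short(log, minimum_minutes):
    """Return respondent IDs whose interview was shorter than the minimum."""
    return [
        respondent
        for respondent, start, end in log
        if interview_minutes(start, end) < minimum_minutes
    ]


# Hypothetical log: (respondent ID, start time, end time) on the same day.
log = [
    ("R01", "09:00", "09:42"),
    ("R02", "09:50", "09:57"),
    ("R03", "10:05", "10:40"),
]
rushed = too_short(log, minimum_minutes=20)  # the 20-minute cutoff is an assumption
print(rushed)
```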

3. Avoid biased results

Watch out for the so-called ‘wildfire effect’ in data gathering. This happens when you are dealing with sensitive issues like fisherfolk’s compliance with ordinances, rules and regulations or laws of the land. Rumors about the issues raised by the interviewer during the interview will discourage other people from answering the questionnaire. Respondents may become apprehensive if answers to questions intrude into their privacy or threaten them in some way.

Thus, questionnaire administration must be done simultaneously within, say, a day in a given group of interviewees in a particular place. If some of the respondents were interviewed the next day, chances are they have already gossiped among themselves and become wary of someone asking them about sensitive issues that may incriminate them.

Wildfire effect is analogous to a small spark of a match that can ignite dry grass leaves and cause an uncontrollable forest fire. This is the power of the tongue. Hence, the term wildfire effect.

There are many other sources of bias that impact negatively on data quality. These are described in greater detail in another post titled How to Reduce Researcher Bias in Social Research.

Data analysis may then be employed once data accuracy and integrity are ensured.

© 2012 December 6 P. A. Regoniel

Example of a Research Using Multiple Regression Analysis

Data analysis using multiple regression analysis is a fairly common tool used in statistics. Many people find this too complicated to understand. In reality, however, this is not that difficult to do especially with the use of computers.

How is multiple regression analysis done? This article explains this very useful statistical test when dealing with multiple variables then provides an example to demonstrate how it works.

Multiple regression analysis is a powerful statistical test used in finding the relationship between a given dependent variable and a set of independent variables. The use of multiple regression analysis requires dedicated statistical software like the popular Statistical Package for the Social Sciences (SPSS), Statistica, or Microstat, among other sophisticated statistical packages. It will be nearly impossible to do the calculations manually.

However, a common spreadsheet application like Microsoft Excel can help you compute and model the relationship between the dependent variable and a set of predictor or independent variables. But you cannot do this without first activating the set of statistical tools that ship with MS Excel, the Analysis ToolPak add-in.

Example of a Research Using Multiple Regression Analysis

I will illustrate the use of multiple regression by citing the actual research activity that my graduate students undertook two years ago. The study pertains to the identification of the factors predicting a current problem among high school students, that is, the long hours they spend online for a variety of reasons. The purpose is to address the concern of many parents on their difficulty of weaning their children away from the lures of online gaming, social networking, and other interesting virtual activities.

Upon reviewing the literature, the graduate students discovered that very few studies had been conducted on the subject matter. Studies on problems associated with internet use are still in their infancy.

The brief study using multiple regression is a broad study or analysis of the reasons or underlying factors that significantly relate to the number of hours devoted by high school students in using the Internet. The regression analysis is broad in the sense that it only focuses on the total number of hours devoted by high school students to activities online. The time they spent online was correlated with their personal profile. The students’ profile consisted of more than two independent variables; hence the term “multiple”. The independent variables are age, gender, relationship with the mother, and relationship with the father.

The statement of the problem in this study is:

“Is there a significant relationship between the total number of hours spent online and the students’ age, gender, relationship with their mother, and relationship with their father?”

The relationship with their parents was gauged using a scale of 1 to 10; 1 being a poor relationship, and 10 being the best experience with parents. The figure below shows the paradigm of the study.

Research paradigm of the multiple regression study showing the relationship between the independent and the dependent variables.

Notice that in multiple regression studies such as this, there is only one dependent variable involved. That is the total number of hours spent by high school students online. Although many studies have identified factors that influence the use of the internet, it is standard practice to include the profile of the respondents among the set of predictor or independent variables.

Hence, the common variables age and gender are included in the multiple regression analysis. Also, among the set of variables that may influence internet use, only the relationship between children and their parents was tested. The intention is to find out if parents spend quality time to establish strong emotional bonds between them and their children.
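
For readers curious about what the statistical software does behind the scenes, the following is a minimal sketch of a multiple regression fitted by ordinary least squares using only the standard library. It is not the analysis from the study: the student profiles, the coding of gender as 0 or 1, and the "true" coefficients used to generate the hours online are all hypothetical, chosen so that the fit can be checked against known values.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix so row operations update both sides together.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution, starting from the last unknown.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def fit_ols(rows, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    x = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(x, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical profiles: (age, gender coded 0/1, mother 1-10, father 1-10).
profiles = [
    (13, 0, 9, 8), (14, 1, 7, 6), (15, 0, 5, 7), (16, 1, 3, 4),
    (13, 1, 8, 9), (17, 0, 2, 5), (15, 1, 6, 3), (16, 0, 4, 6),
]
# Hypothetical "true" coefficients used only to generate illustrative data:
# intercept, age, gender, relationship with mother, relationship with father.
true_coefficients = [6.0, 0.2, 0.5, -0.4, -0.1]
hours_online = [
    true_coefficients[0] + sum(c * v for c, v in zip(true_coefficients[1:], p))
    for p in profiles
]

estimates = fit_ols(profiles, hours_online)
print([round(e, 3) for e in estimates])
```

Because the made-up hours follow the assumed coefficients exactly, the fit should recover them up to rounding. Real survey data would scatter around the fitted line, and a statistical package would also report the significance of each coefficient, which this sketch does not attempt.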

Findings of the Study

What are the findings of this exploratory study? The multiple regression analysis revealed an interesting finding.

The number of hours spent online relates significantly to the number of hours spent by a parent, specifically the mother, with her child. These two factors are inversely or negatively correlated. The relationship means that the more hours the mother spends with her child to establish a closer emotional bond, the fewer hours her child spends using the internet.

While this may be a significant finding, the mother-child bond accounts for only a small percentage of the variance in total hours spent by the child online. This observation means that there are other factors that need to be addressed to resolve the problem of long waking hours and abandonment of serious study of lessons by children. But establishing a close bond between mother and child is a good start.
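
The "percentage of the variance accounted for" is usually reported as R-squared. The following is a minimal sketch of how it is computed from actual and predicted values; the numbers are hypothetical and are not the results of the study.

```python
def r_squared(actual, predicted):
    """Share of the variance in `actual` explained by `predicted`."""
    mean_actual = sum(actual) / len(actual)
    ss_total = sum((y - mean_actual) ** 2 for y in actual)
    ss_residual = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    return 1 - ss_residual / ss_total


# Hypothetical hours online and the values predicted by a weak model.
hours = [2.0, 4.0, 6.0, 8.0]
predicted_hours = [4.5, 5.0, 5.0, 5.5]

explained = r_squared(hours, predicted_hours)
print(round(explained, 3))
```

Here the model explains about 27.5% of the variance, so most of the variation in hours online would have to come from factors not in the model.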

Conclusion

The above example of multiple regression analysis demonstrates that the statistical tool is useful in predicting the behavior of dependent variables. In the above case, this is the number of hours spent by students online.

The identification of significant predictors can help determine the correct intervention to resolve the problem. The use of multiple regression approaches prevents unnecessary costs for remedies that do not address an issue or a problem.

Thus, in general, research employing multiple regression analysis streamlines solutions and brings into focus those influential factors that must be given attention.

©2012 November 11 Patrick Regoniel

Cite this article as: Regoniel, Patrick A. (November 11, 2012). Example of a Research Using Multiple Regression Analysis. In SimplyEducate.Me. Retrieved from http://simplyeducate.me/2012/11/11/example-of-a-research-using-multiple-regression-analysis/