Excel's Easiest and Most Robust Normality Test - The Chi-Square Goodness-of-Fit Test
Ranked #984 in Business & Work, #47,701 overall
Excel's Easiest and Most Robust Normality Test - The Chi-Square Goodness-Of-Fit Test
Introduction to the Chi-Square Goodness-Of-Fit Test
-
The Chi-Square Goodness-Of-Fit test is less well known than some other normality test such as the Kolmogorov-Smirnov test, the Anderson-Darling test, or the Shapiro-Wilk test. The Chi-Square Goodness-Of-Fit test is, however, a lot less complicated, every bit as robust, and a whole lot easier to implement in Excel (by far) than any of the more well known normality tests. Let's run through an example:
Excel Histogram Numerical Output
The 1st Step of the Chi-Square Goodness-Of-Fit Test In Excel
Apply Excel's "Descriptive Statistics" Function to the Sample Data
We need to know the mean, standard deviation, and sample size of the data that we are about to test for normality. Use the Descriptive Statistics Excel tool to obtain this information. In Excel 2003, this tool can be found at Tools / Data Analysis / Descriptive Statistics. The resulting output for this test is as follows:
How the Chi-Square Goodness-Of-Fit Test Works
The Chi-Square Goodness-Of-Fit test is a hypothesis test. The Null and Alternative Hypotheses being tested are:
H0 = The data follows the normal distribution.
H1 = The data does not follow the normal distribution.
A quick summary of the test is as follows:
3 to 4 - 1 sample had a value in this range
4 to 5 - 1 sample had a value in this range
5 to 6 - 2 samples had a value in this range
6 to 7 - 4 samples had a value in this range
7 to 8 - 6 samples had a value in this range
8 to 9 - 7 samples had a value in this range
9 to 10 - 7 samples had a value in this range
10 to 11 - 4 samples had a value in this range
11 to 12 - 4 samples had a value in this range
12 to 13 - 3 samples had a value in this range
13 to 14 - 1 sample had a value in this range
The figures above represent the observed number of samples in each bin range. We now need to calculate how many sample we would expect to occur in each bin if the sample was normally distributed with the same mean and standard deviation as the sample taken (mean = 8.634 and standard deviation = 2.5454).
The expected number of sample in each bin is calculated by the following formula:
(Area of the normal curve bounded by the bin's upper and lower boundaries) x (Total number of samples taken)
For example, if there were only 2 bins that meet at the mean, then the corresponding normal curve would have 2 regions with a boundary at the mean of the normal curve. Each of the two regions of the normal curve would contain 50% of the area under the entire normal curve. We would therefore expect 50% of the total number of samples taken to fall in each bin. If, for example, 42 samples were taken, we would expect 21 samples to occur in each bin if the samples were normally distributed.
Given the bin ranges we have established for the Excel Histogram and the number of observed samples in each bin, we now need to calculate the number of samples we would expect to find in each bin. We assume that the samples are normally distributed with the same mean and standard deviation as measured from the actual sample. Given these assumptions, we use the method described above to calculate how many samples would be expected to occur in each bin.
Once we know the observed and expected number of samples in each bin, we calculate the Chi-Square Statistic.
A Chi-Square Statistic is created from the data using this formula:
Chi-Square Statistic = %u03A3 [ [ ( Expected num. - Observed num.)^2 ] / (Expected num.) ]
A p Value is calculated in Excel from this Excel formula:
p Value = CHIDIST ( Chi-Square Statistic, Degrees of Freedom )
We take all of the samples and divide them up into groups. These groups are called bins. We will use the same bins as was used when creating the Histogram in Excel. The bins are as follows:
The size of the p Value determines whether or not we go with the assumption that the samples are normally distributed.
The Decision Rule
Breaking the Normal Curve into Regions
The Resulting Excel Histogram
When we created the Excel Histogram from the data, we had to specify how many "bins" the samples would be divided into. Excel counted the number of observed samples in each bin and then plotted the results in the above histogram.
Since Excel has already counted how many observed samples are in each bin, we wil also use the bins as our sections for the Chi-Square Goodness-Of-Fit test. We know how many actual samples have been observed in each bin. We now need to calculate how many samples would have been expected to occur in each bin.
Calculating the Expected Number of Samples in Each Bin
Here is a simple example that will hopefully clarify the above paragraph. If we were evaluating a data set for normality, we would be trying to determine whether the data fits the normal curve. We have to determine what the bins ranges that we will divide the data into. The simplest bin arrangement would be to place all the data into only two bins on either side of the sample's mean. If the data were normally distributed, we would expect half of the samples to occur in each bin.
In other words, if the bins were placed along the x-axis relative to the sample's mean so each bin would be directly under 50% of a normal curve with the same mean, then we would expect 50% of the samples to occur in each bin. If there were 60 total samples taken, we would expect 30 samples to occur in each bin.
The expected number of samples for a single bin = Exp.
Exp. = (Area under the normal curve over the top of the bin) x (Total number of samples)
Calculating the CDF
We can obtain the normal curve area over each bin by using the Cumulative Distribution Function (CDF). The CDF at any point on the x-axis is the total area under the curve to the left of that point. We can obtain the percentage of area in normal curve for each bin by subtracting the CDF at the x-Value of bin's lower boundary from the CDF at the x-Value of the bin's upper boundary.
The normal distribution that we are trying to fit data has as its two and only parameters the sample's mean and standard deviation.
The CDF of this normal distribution at any point on the x-Axis can be determined by the following Excel formula:
CDF = NORMDIST ( x Value, Sample Mean, Sample Standard Deviation, TRUE )
Once again, this formula calculate the CDF at that x Value, which is the area under the normal curve to the left of the x Value. That normal curve has as its parameters the sample's mean and standard deviation.
Calcuating Expected Number of Samples in Each Bin
Excel Calculations for Expected Number of Samples in Each Bin
We can now calculate the Expected number of samples in each bin by the following formula:
Exp. number of samples in each bin =
( Percentage of Curve Area in that Bin ) x Total number of samples
This calculation for each bin is completed in the 1st column below. There are 42 total samples taken for this exercise.
Calculation of the Chi-Square Statistic
Excel Calculations of the Chi-Square Statistic
Degrees of Freedom
Excel Calculation of Degrees of Freedom
The Chi-Square-Goodness-Of-Fit test requires the number of Degrees of Freedom be calculated for the specific test being run. The formula for this is as follows:
Degrees of Freedom = df = (number of filled bins) - 1 - (number of parameters calculated from the sample)
The number of filled bins = 12
We calculated the mean and standard deviation from the sample. This is 2 parameters.
df = 12 - 1 - 2 = 9
We can now calculate the p Value from Chi-Square Statistics and the Degrees of Freedom as shown directly above.
The p Value's Graphical Interpretation
An Excel Graph Showing the p Value
The p Value's graphical interpretation is shown below. The p Value represents the percentage of area (in red) to the right of X = 4.653 under a Chi-Square distribution with 9 Degrees of Freedom. If the p Value (.8634) is greater than the Level of Significance (0.05), we do not reject the Null Hypothesis.
In this case, we state that we do not reject the Null Hypothesis and do not have sufficient evidence that the data is not normally distributed.
By Far, The Clearest Graduate-Level Statistics Lessons - All In Excel! ---> Click On Image To See Reviews !
The Excel Statistical Master eBook - Over 400 Pages and Videos of Easy-To-Understand, Step-By-Step Business Statistics in Excel - Download It Now !
1) It's LOADED With Completed Problems ALL in Excel.
2) All Solved Problems Are REAL-WORLD, BUSINESS Problems.
3) Nothing But SIMPLE Explanations.
4) More Than Half of the Lessons Are Supplemented With VIDEOS To Help You Understand Quicker.
5) Lessons Are All in BITE-SIZE Chunks.
6) The Statistics Are MBA-LEVEL - Over 400 Pages and Videos.
7) All Lessons Are In CLEAR, EASY-TO-FOLLOW FRAMEWORKS.
8) NO USELESS THEORY Is Taught.
9) NO PRIOR STATISTICAL KNOWLEDGE Is Needed.
10) Every Lesson Is Entirely In Excel. NO LOOKING UP ON CHARTS.
11) You Already Know Excel. NO NEW SOFTWARE NECESSARY.
Wanna Be a Master of Business Statistics FAST ? Check Out My Other Lenses !
My Other Lens That Will Make You a Master of Business Statistics - FAST !
- The 22 Most Common Mistakes Made During Statistical Analysis
- If you make any of these mistakes, you'll either get a totally wrong answer or you will spend way more time than necessary getting to the right answer.
- The 7 Biggest Reasons That Your Data Is Not Normally Distributed
- Here is a list of the main (but correctable) reasons that your data does not appear to be normally distributed. If your data is normally distributed, you can perofrm most statistical tests on that data. This lens tells you why your data doesn't appear to be normally distributed and what you can do to correct that.
- How To Use Dummy Variable Regression in Excel To Do Conjoint Analysis
- Conjoint Analysis is a great marketing technique that will tell which products attributes your customers like the best and also quantifies the customers' degree of liking of each attribute. This lense shows exactly how to do it in Excel. This lens has a video to make it easier to understand.
- Taguchi Testing - Is It a Good Tool For Landing Page Optimization?
- This lens explains in simple terms exactly what Taguchi testing is and why it is or is not a good tool for landing page optimzation.
- How To Create a User-Interactive Graph Of the Normal Distribution in Excel
- You will see in step-by-step, easy-to-follow lessons, which includes a video, exactly to create a user-interactive graph of the normal distribution in Excel. The user will be able to vary the mean and standard deviation and watch those changes reflected in the Excel graph that you will be able to make after viewing this lens.
- How To Use the t Test in Excel To Tell If Your New Marketing Is Working
- The t Test is the easiest tool in the world to tell if your new marketing or advertising campaign really worked. Here's how to do it in Excel.
- How To Find Out What Makes Your Customers Buy WIth the Chi-Square Independence Test In Excel
- The Chi-Square Indepedence Test is a versatile test for determining if one thing is related to the other.l Here's how to do it in Excel, This lens has a video in it to make it easy to understand.
- How To Quickly Read the Output of Regression in Excel
- The output of regression in Excel in complicated. Here is how to quickly look at this output and identify the 4 main, important parts and what they mean. This lens has a video in it for easier understanding.
- Download the 400+ Page eManual About How To Do Business Statistics in Excel
- The 400+ page eManual, The Excel Statistical Master, simplfies business statistics by presenting everything in frameworks that are easy-to-follow and easy-to-duplicate. Loaded with worked-out problems, this eManual focuses on solving real-world business statistics problems. Everything is conveniently solved with Microsoft Excel.
Links To Learn Business Statistics At a More Basic Level - These Lessons Are Easy-To-Follow !
Business Statistics Made SIMPLE - For Those Who Don't Speak Math !
- Confidence Intervals For the Business Manager - With Lots of Worked-Out Problems !
- Confidence Intervals made easy-to-understand for the business manager. Lots of solved problems. Each problems solved with a step-by-step framework that you will be able to duplicate and use quickly. No wasting time with academic theory that is not useful for solving problems.
- The Normal Distribution For the Business Manager - With Lots of Worked-Out Problems !
- How to use the Normal Distribution to solve real-world business problems. Lots of worked-out problems solved with and easy-to-follow- step-by-step framework. The emphisis is entirely on solving problems with no time wasting on complicated acedemic theory that is not useful for problem solving. Loaded with solved problems !
- Prediction Using Regression For the Business Manager - With Lots of Worked-Out Problems !
- Regression analysis has a lot of important uses in business. This lesson show you in detail how to perform regression without confusing you. Loaded with solved problems, this lesson teaches regression in an easy-to-follow, step-by-step method that you will pick up right away.
- How To Solve ALL Hypothesis Tests in Only 4 Steps in Excel
- This lesson presents a framework that shows you show to solve ALL of the many kinds of hypothesis tests in only 4 steps in Excel. Extremely useful !
- The t Distribution For the Business Manager
- The t Distribution is often used for analyzing small samples. It is very close to the normal distribution but very useful when sample size is less than 30.
- How To Solve ALL Hypothesis Tests in Only 4 Steps in Excel
- This lesson presents a framework that shows you show to solve ALL of the many kinds of hypothesis tests in only 4 steps in Excel. Extremely useful !
- Combinations and Permutations for the Business Manager - With Lots of Worked-Out Problems !
- Combinations and Permutations are extremely useful statistical tools. Anyone who does statistical analysis should know them.
- Correlation and Covariance For the Business Manager - With Lots of Worked-Out Problems !
- Correlation and Covariance describe linear relationships between different variables. It allows you to quickly determine to what degree the movement of one object is related to the movement over another object.
- The Binomial Distribution For the Business Manager - With Lots of Worked-Out Problems !
- The Binomial Distribution is one of the most valuable and commonly used statistical tools. The Binomial Distribution is used whenever a process has only two possible outcomes. This is a very common occurrence.
- ANOVA for the Business Manager - With Lots of Worked-Out Problems !
- ANOVA, Analysis of Variance, is a test to determine whether the means of several groups are equal. As such, ANOVA is often used to determine if three or more different methods or treatments have the same effect on a population. For example, ANOVA testing might be used to determine if three different training methods produce the same sales results with a group of salespeople.
- Other Useful Distributions For the Business Manager - With Lots of Worked-Out Problems !
- Learn how many other distributions, such as the Weibull and the Poisson Distributions can be used in business.
- Download the 400+ Page eManual About How To Do Business Statistics in Excel
- The 400+ page eManual, The Excel Statistical Master, simplfies business statistics by presenting everything in frameworks that are easy-to-follow and easy-to-duplicate. Loaded with worked-out problems, this eManual focuses on solving real-world business statistics problems. Everything is conveniently solved with Microsoft Excel.
by solvermark
solvermark has helped numerous marketers and business students understand how to use business statistics to solve real-world problems. His web site, h... more »
Explore related pages
- Your Etsy Business on Autopilot Your Etsy Business on Autopilot
- How To Use Dummy Variable Regression In Excel To Do Conjoint Analysis How To Use Dummy Variable Regression In Excel To Do Conjoint Analysis
- The 7 Biggest Reasons That Your Data Is Not Normally Distributed The 7 Biggest Reasons That Your Data Is Not Normally Distributed
- Your Etsy Shop Numbers Your Etsy Shop Numbers
- How To Quickly Read the Output of Regression in Excel How To Quickly Read the Output of Regression in Excel
- Demographics Made Slightly Less Difficult Demographics Made Slightly Less Difficult




