Standard deviation is one of those statistical terms in the corporate world that boosts the credibility of people who manage to drop it into a conversation or presentation, and leaves a vague sense of confusion in those who do not know what it is but are too embarrassed to ask. In fact, most managers do not understand the concept of standard deviation, and if you are one of them, it is time to stop living a lie. In today's article, I will show you how this underrated statistical measure will help you better understand the data you are working with.
What does standard deviation measure?
Imagine that you own two stores. To avoid losses, it is important to keep stock levels under tight control. To find out which of the two managers handles inventory better, you decide to analyze the stock for the last six weeks. The average weekly stock value for both stores is approximately the same, about 32 conventional units (cu). At first glance, the average stock value suggests that both managers work equally well.
But if you look closely at the second store, you can see that although the average is the same, the stock variability is very high (from 10 to 58 cu). So the average alone does not always describe the data properly. This is where the standard deviation comes to the rescue.
The standard deviation shows how the values in our sample are spread around the average. In other words, it tells us how much the stock value varies from week to week.
In our example, we used the Excel function STDEV to calculate the standard deviation along with the average.
In the case of the first manager, the standard deviation was 2. This tells us that each value in the sample deviates from the average by about 2 on average. Is that good? Let's look at it from a different angle: a standard deviation of 0 tells us that every value in the sample is equal to the average (in our case, 32.2). A standard deviation of 2 is not far from 0, which indicates that most of the values are close to the average. The closer the standard deviation is to 0, the more reliable the average. A standard deviation close to 0 also indicates low variability in the data. In other words, a stock value with a standard deviation of 2 points to the remarkable consistency of the first manager.
In the case of the second store, the standard deviation was 18.9. That is, the stock value deviates from the average by about 18.9 from week to week. A crazy spread! The farther the standard deviation is from 0, the less representative the average. In our case, the figure of 18.9 indicates that the average (32.8 cu per week) simply cannot be trusted. It also tells us that the weekly stock is highly variable.
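The two-store comparison can be reproduced with a small Python sketch. The weekly figures below are hypothetical (the article does not list the actual values); they are chosen only so that both stores share the same average while the spread differs greatly.

```python
from statistics import mean, stdev

# Hypothetical weekly stock values in cu (not the author's data).
# Both lists sum to 192, so both averages are 32.
store_1 = [30, 32, 34, 31, 33, 32]
store_2 = [10, 58, 20, 45, 15, 44]

for name, values in (("Store 1", store_1), ("Store 2", store_2)):
    # stdev() is the sample standard deviation (divides by n - 1).
    print(f"{name}: mean = {mean(values):.1f}, "
          f"standard deviation = {stdev(values):.1f}")
```

The averages alone cannot tell the two stores apart; the standard deviation can.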
This is the concept of standard deviation in a nutshell. Although it tells you nothing about other important statistical measures (mode, median ...), the standard deviation plays a decisive role in most statistical calculations. Understanding how it works will shed light on many processes in your business.
How to calculate the standard deviation?
So, now we know what the standard deviation figure tells us. Let's see how it is calculated.
Consider a data set from 10 to 70 in steps of 10. As you can see, I have already calculated its standard deviation using the STDEV function in cell H2 (orange).
The steps that Excel takes to arrive at 21.6 are described below.
Please note that all calculations are visualized for a better understanding. In fact, Excel calculates instantly, leaving all the steps behind the scenes.
First, Excel finds the average of the sample. In our case the average is 40, which in the next step is subtracted from each sample value. Each difference is squared, and the squares are summed. The sum is 2800, which must be divided by the number of sample elements minus 1. Since we have 7 elements, we divide 2800 by 6. Finally, we take the square root of the result; this figure is the standard deviation.
For those who find the visualization hard to follow, here is the mathematical formulation of this value:

$$ s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}} $$

where X_i are the sample values, X̅ is their average, and n is the number of sample elements.
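The same steps can be traced in a few lines of Python. This is only a sketch of the arithmetic described above, not a reproduction of Excel's internals; the variable names are illustrative.

```python
import math

values = [10, 20, 30, 40, 50, 60, 70]

# Step 1: the average of the sample.
average = sum(values) / len(values)           # 40.0

# Step 2: subtract the average from each value, square, and sum.
sum_sq = sum((x - average) ** 2 for x in values)   # 2800.0

# Step 3: divide by the number of elements minus 1.
sample_variance = sum_sq / (len(values) - 1)  # 2800 / 6

# Step 4: take the square root.
std_dev = math.sqrt(sample_variance)

print(round(std_dev, 1))  # 21.6
```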
Excel standard deviation calculation functions
Excel has several varieties of standard deviation functions. Just type =STDEV and you will see for yourself.
It is worth noting that the STDEV.S and STDEV.P functions (the first and second functions in the list) duplicate the STDEV and STDEVP functions (the fifth and sixth functions in the list), respectively; the latter two were kept for compatibility with earlier versions of Excel.
In general, the .S and .P endings indicate whether the standard deviation is calculated for a sample or for the population. I explained the difference between the two in the previous article on calculating variance.
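To see how much the sample and population variants differ, here is a short sketch using Python's standard `statistics` module, on the assumption that its `stdev` (divides by n - 1) and `pstdev` (divides by n) play the same roles as the two Excel variants. The data is the 10 to 70 series used earlier.

```python
from statistics import pstdev, stdev

values = [10, 20, 30, 40, 50, 60, 70]

sample_sd = stdev(values)       # sample version: sqrt(2800 / 6)
population_sd = pstdev(values)  # population version: sqrt(2800 / 7)

print(round(sample_sd, 1), round(population_sd, 1))  # 21.6 20.0
```

The sample version is always slightly larger, and the gap shrinks as the number of observations grows.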
A feature of the STDEVA and STDEVPA functions (the third and fourth functions in the list) is that logical and text values in the array are included in the calculation. TRUE is counted as 1, while text and FALSE are counted as 0. I find it hard to imagine a situation in which I would need these two functions, so I think they can be ignored.
Definition of standard deviation
Let's first define what the standard deviation is and what its formula looks like. This value is the square root of the arithmetic mean of the squared differences between each value of the series and their arithmetic mean. The indicator has another name, root-mean-square deviation. Both names are completely equivalent.
But, naturally, in Excel, the user does not have to calculate this, since the program does everything for him. Let's find out how to calculate standard deviation in Excel.
Method 1: Function Wizard
- Select the cell on the sheet where the result will be displayed. Click the "Insert Function" button located to the left of the formula bar.
In the list that opens, look for the entry STDEV.S or STDEV.P. The list also contains the function STDEV, but it is kept from previous versions of Excel for compatibility purposes. After selecting the entry, click the OK button.
Method 2: Formulas Tab
You can also calculate the standard deviation value through the tab "Formulas".
- Select the cell to display the result and go to the tab "Formulas".
Method 3: manually enter the formula
There is also a way that does not require opening the arguments window at all. To use it, enter the formula manually.
- Select the cell to display the result and type in it, or in the formula bar, an expression following this pattern:
=STDEV.P(number1 (cell_address1), number2 (cell_address2), ...)
or
=STDEV.S(number1 (cell_address1), number2 (cell_address2), ...)
In total, up to 255 arguments can be written if necessary.
As you can see, calculating the standard deviation in Excel is very simple. The user only needs to enter the numbers or references to the cells that contain them. All calculations are performed by the program itself. It is much harder to understand what the calculated indicator means and how the results can be applied in practice. But that belongs more to the field of statistics than to learning how to use the software.
Maximum and minimum value
Let's start with the maximum and minimum. The maximum is the largest value in the analyzed data set; the minimum is the smallest. These are the extreme values of the data set, marking the boundaries of its variation. Examples include minimum and maximum prices for something, choosing the best or worst solution to a problem, and so on.
There are special functions for calculating these indicators, MAX and MIN respectively. They can be accessed directly from the ribbon, in the AutoSum drop-down list.
If you use Insert Function, look in the "Statistical" category.
In general, calling the maximum or minimum function takes no more steps than calculating the arithmetic mean.
Mean linear deviation
The mean linear deviation is the average of the absolute values of the deviations from the arithmetic mean in the analyzed data set. The mathematical formula has the form:

$$ a = \frac{\sum |X - \bar{X}|}{n} $$

where
a - mean linear deviation,
X - analyzed indicator,
X̅ - the average value of the indicator,
n - the number of values in the analyzed data set.
In Excel, this function is called AVEDEV.
After selecting the AVEDEV function, specify the data range for the calculation. Click OK.
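As a rough illustration of the mean linear deviation, the definition can be written out directly in Python. The data set is the 10 to 70 series from earlier in the text; the variable names are illustrative.

```python
values = [10, 20, 30, 40, 50, 60, 70]

average = sum(values) / len(values)  # 40.0

# Average of the absolute deviations from the arithmetic mean:
# (30 + 20 + 10 + 0 + 10 + 20 + 30) / 7
mean_linear_deviation = sum(abs(x - average) for x in values) / len(values)

print(round(mean_linear_deviation, 2))  # 17.14
```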
Perhaps not everyone knows what the variance of a random variable is, so I'll explain: it is a measure of the spread of data around the mathematical expectation. However, usually only a sample is available, so the following variance formula is used:

$$ s^2 = \frac{\sum (X - \bar{X})^2}{n} $$

where
s² - sample variance calculated from observational data,
X - individual values,
X̅ - arithmetic mean of the sample,
n - the number of values in the analyzed data set.
The corresponding Excel function is VAR.P. When analyzing relatively small samples (up to about 30 observations), the unbiased sample variance should be used, which is calculated using the following formula:

$$ s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} $$

The difference, as can be seen, is only in the denominator. In Excel, the unbiased sample variance is calculated with the function VAR.S.
We select the desired option (population or sample), specify the range, and click the "OK" button. The resulting value may turn out to be very large because the deviations are squared first. Variance is a very important indicator in statistics, but it is usually used not on its own but as an input to further calculations.
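The two variance variants can be compared in Python, assuming that `pvariance` (denominator n) and `variance` (denominator n - 1) from the standard `statistics` module correspond to the population and unbiased sample formulas above. The data is the 10 to 70 series from earlier in the text.

```python
from statistics import pvariance, variance

values = [10, 20, 30, 40, 50, 60, 70]

population_var = pvariance(values)  # 2800 / 7
sample_var = variance(values)       # 2800 / 6

print(f"{population_var:.2f} {sample_var:.2f}")  # 400.00 466.67
```

Note how large both numbers are compared with the data itself: the deviations were squared, so the variance is measured in squared units.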
The standard deviation is the square root of the variance. This indicator is also called the root-mean-square deviation and is calculated by the formula:

$$ s = \sqrt{\frac{\sum (X - \bar{X})^2}{n}} $$

You can simply take the square root of the variance, but Excel has ready-made functions for the standard deviation: STDEV.P and STDEV.S (for the population and the sample, respectively).
I repeat: root-mean-square deviation and standard deviation are synonyms.
Next, as usual, specify the desired range and click "OK". The standard deviation has the same units of measure as the analyzed indicator, so it is directly comparable with the source data. More on this below.
The coefficient of variation
All the indicators discussed above are tied to the scale of the source data and do not give an intuitive sense of the variation in the analyzed population. To obtain a relative measure of data scatter, use the coefficient of variation, calculated by dividing the standard deviation by the average. The formula for the coefficient of variation is simple:

$$ V = \frac{s}{\bar{X}} $$

where s is the standard deviation and X̅ is the average.
There is no ready-made function for calculating the coefficient of variation in Excel, which is not a big problem. The calculation can be made by simply dividing the standard deviation by the average value. To do this, write in the formula bar:
=STDEV.P()/AVERAGE()
The data range is specified in the parentheses. If necessary, use the sample standard deviation (STDEV.S).
The coefficient of variation is usually expressed as a percentage, so the cell with the formula can be formatted as a percentage. The button you need is located on the ribbon on the "Home" tab:
You can also change the format by selecting "Format Cells" from the context menu after selecting the desired cell and right-clicking.
The coefficient of variation, unlike other indicators of the scatter of values, is used as an independent and very informative indicator of data variation. In statistics, it is generally accepted that if the coefficient of variation is less than 33%, then the data set is homogeneous, if more than 33%, then it is heterogeneous. This information may be useful for a preliminary description of the data and to determine the possibilities for further analysis. In addition, the coefficient of variation, measured in percent, allows you to compare the degree of dispersion of various data, regardless of their scale and units. Useful property.
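A minimal Python sketch of the coefficient of variation, including the 33% rule of thumb mentioned above. It assumes the population standard deviation (`pstdev`) in the numerator and reuses the 10 to 70 series from earlier in the text; the `verdict` variable is illustrative.

```python
from statistics import mean, pstdev

values = [10, 20, 30, 40, 50, 60, 70]

# Standard deviation divided by the average: 20 / 40
cv = pstdev(values) / mean(values)

# Rule of thumb from the text: below 33% the data set is homogeneous.
verdict = "homogeneous" if cv < 0.33 else "heterogeneous"

print(f"CV = {cv:.0%} ({verdict})")  # CV = 50% (heterogeneous)
```

Because the result is a ratio, the same check works for data in any unit or on any scale.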
The last measure of data scatter for today is the oscillation coefficient. This is the ratio of the range of variation (the difference between the maximum and minimum values) to the average. There is no ready-made Excel function, so you have to combine three functions: MAX, MIN, and AVERAGE.
The oscillation coefficient shows the extent of variation relative to the average, which can also be used to compare different data sets.
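The oscillation coefficient is just as easy to sketch in Python by combining the maximum, the minimum, and the average. The data is again the 10 to 70 series from earlier in the text, and the variable names are illustrative.

```python
values = [10, 20, 30, 40, 50, 60, 70]

average = sum(values) / len(values)        # 40.0
variation_range = max(values) - min(values)  # 70 - 10 = 60

# Range of variation relative to the average.
oscillation = variation_range / average

print(f"{oscillation:.0%}")  # 150%
```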
In general, many statistics are very easy to calculate in Excel. If something is unclear, you can always use the search box in the Insert Function window. And Google is always there to help.