Box Plots
Box plots are just one of the ways that data, both discrete and continuous, can be represented. For the time being we are going to focus on the concept of box plots: how they are constructed, how to read them and how to compare them.
What is a Box Plot?
I hear you ask. Well, box plots, otherwise known as box-and-whisker plots, are used to show the distribution of a set of data and to compare the distributions of two or more sets of data. They are often drawn on graph paper. They show the median, the upper and lower quartiles, and the smallest and largest values of a set of data.
Median - The median of the data is the middle number when all the values have been put in numerical order. When n is the number of values in the set of data, the median is the [(n+1)/2]th term of the ordered values. For an example see Example 1.
Upper Quartile - The Upper Quartile divides the upper half of the set of data (in numerical order) into two. The Upper Quartile is the [3(n+1)/4]th term of the ordered values.
Lower Quartile - The Lower Quartile divides the lower half of the set of data (in numerical order) into two. The Lower Quartile is the [(n+1)/4]th term of the ordered values.
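The three position formulas above can be sketched in a few lines of Python. This is only an illustration: the function name quartile_positions is made up for this sketch, and positions are counted from 1, as in the text.

```python
def quartile_positions(n):
    """Return the 1-based positions of the lower quartile, median and
    upper quartile for n values in numerical order."""
    lower_quartile = (n + 1) / 4
    median = (n + 1) / 2
    upper_quartile = 3 * (n + 1) / 4
    return lower_quartile, median, upper_quartile


print(quartile_positions(7))  # (2.0, 4.0, 6.0)
print(quartile_positions(9))  # (2.5, 5.0, 7.5)
```

A position such as 2.5 falls halfway between the 2nd and 3rd terms, so the value is found by averaging those two terms.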
Example 1
John gathers a set of data about the scores that his 21 classmates received in their last Maths Test. This is the data he gathered:
23 12 36 28 25 25 18 24 33 29 31
34 27 19 32 18 22 23 27 33 31
Draw a box plot to represent this data.
Step 1: Firstly, calculate everything you need:
Median: The median is the [(n+1)/2]th term.
So, when n is 21, we are looking at the [(21+1)/2]th term
[22/2]th term => 11th term
We put the values in numerical order:
12 18 18 19 22 23 23 24 25 25 27 27 28 29 31 31 32 33 33 34 36
Then we find the 11th term, which is 27.
So the median is 27
Lower Quartile: The lower quartile is the [(n+1)/4]th term.
So when n is 21, we are looking for the [(21+1)/4]th term.
[(21+1)/4]th term => 5.5th term
The 5.5th term lies halfway between the 5th and 6th terms of the ordered values, so we add terms 5 and 6 together and divide by two.
Lower Quartile = (22+23)/2 = 22.5
Upper Quartile: The upper quartile is the [3(n+1)/4]th term.
So when n is 21, we are looking for the [3(21+1)/4]th term.
[3(21+1)/4]th term => 16.5th term
The 16.5th term lies halfway between the 16th and 17th terms of the ordered values, so we add terms 16 and 17 together and divide by two.
Upper Quartile = (31+32)/2 = 31.5
The lowest value is 12 and the highest value is 36.
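The calculations in Step 1 can be checked with a short Python sketch. The helper names term_at and five_number_summary are made up for this illustration. A fractional position is handled by interpolating between the two neighbouring terms, which is an assumption that agrees with the "add and divide by two" step used for the 5.5th and 16.5th terms. The sketch also assumes there are at least three values.

```python
def term_at(ordered, position):
    """Return the value at a 1-based position in an ordered list.
    A fractional position is interpolated between its two neighbours
    (assumption: this matches averaging for positions ending in .5)."""
    lower_index = int(position)            # whole part of the position
    fraction = position - lower_index      # 0.5 for a "5.5th" term
    lower_value = ordered[lower_index - 1]
    if fraction == 0:
        return lower_value
    upper_value = ordered[lower_index]
    return lower_value + fraction * (upper_value - lower_value)


def five_number_summary(values):
    """Lowest value, quartiles, median and highest value, using the
    (n+1)/4, (n+1)/2 and 3(n+1)/4 position rules."""
    ordered = sorted(values)
    n = len(ordered)
    return {
        "lowest": ordered[0],
        "lower_quartile": term_at(ordered, (n + 1) / 4),
        "median": term_at(ordered, (n + 1) / 2),
        "upper_quartile": term_at(ordered, 3 * (n + 1) / 4),
        "highest": ordered[-1],
    }


# John's 21 test scores from Example 1
scores = [23, 12, 36, 28, 25, 25, 18, 24, 33, 29, 31,
          34, 27, 19, 32, 18, 22, 23, 27, 33, 31]
summary = five_number_summary(scores)
print(summary)
```

For John's data this gives 12, 22.5, 27, 31.5 and 36, the same five figures as worked out by hand.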
Step 2: Now we use these figures to draw a box plot.
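At each of these values (12, 22.5, 27, 31.5 and 36), draw a vertical line of a substantial height, but remember that each line must be the same height.
Then, with two horizontal lines, join the tops and the bottoms of the lower quartile and upper quartile lines to form the box; the median line sits inside it. Finally, draw a horizontal line from the middle of the lowest value line to the middle of the lower quartile line, and another from the middle of the upper quartile line to the middle of the highest value line. These two lines are the whiskers.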
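As a rough illustration of Step 2, the sketch below draws a one-line text version of the box plot. It is not a substitute for drawing on graph paper: the function name ascii_box_plot and the 50-column width are assumptions made for this sketch, and each value is rounded to the nearest column, so positions are only approximate.

```python
def ascii_box_plot(lowest, q1, median, q3, highest, width=50):
    """Return a one-line text box plot scaled to `width` columns."""
    span = highest - lowest

    def column(value):
        # Map a data value to a column between 0 and width - 1.
        return round((value - lowest) / span * (width - 1))

    row = [" "] * width
    for c in range(column(lowest), column(q1)):
        row[c] = "-"                      # left whisker
    for c in range(column(q1), column(q3) + 1):
        row[c] = "="                      # the box
    for c in range(column(q3) + 1, column(highest) + 1):
        row[c] = "-"                      # right whisker
    for value in (lowest, q1, median, q3, highest):
        row[column(value)] = "|"          # the five vertical lines
    return "".join(row)


plot = ascii_box_plot(12, 22.5, 27, 31.5, 36)
print(plot)
```

The five "|" characters stand for the five vertical lines, "=" fills the box between the quartiles and "-" marks the whiskers.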
How to read Box Plots
Now that you know how to construct box plots it should be considerably easier to read them.
In an exam you will be expected to be able to read a box plot and understand that 25% of the values lie at or below the lower quartile (the first vertical line of the box), 50% lie at or below the median and 75% lie at or below the upper quartile. The end of the first "whisker" marks the lowest value whilst the end of the last whisker marks the highest value. They may also ask you to read the value of a certain feature from the plot.
Example 2
1) Using the box plot shown above, state the value of the lower quartile.
Q1 = 40
2) What is the interquartile range of the values shown in the box plot above?
Q3 = 60 Q1 = 40
IQR = Q3 - Q1 = 60 - 40 = 20
3) Fred says that exactly 50% of the values are between 40 and 80. Is Fred correct? You must give a reason.
40 is the lower quartile = 25%
80 is the largest value = 100%
100% - 25% = 75%
No, Fred is not correct, because 75% of the values are between 40 and 80, not 50%.
In an exam, 1 mark would be available for question three, yet only one in three students will get it. You have to give a reason for your answer, and "yes" or "no" must also appear in your answer for it to be marked.
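The reasoning in Example 2 can be sketched in Python. The names CUMULATIVE_PERCENT, interquartile_range and percent_between are made up for this illustration; the values 40, 60 and 80 are the ones read from the box plot in the example.

```python
# Percentage of the values at or below each marker on a box plot.
CUMULATIVE_PERCENT = {
    "lowest": 0,
    "lower_quartile": 25,
    "median": 50,
    "upper_quartile": 75,
    "highest": 100,
}


def interquartile_range(q1, q3):
    """IQR = Q3 - Q1."""
    return q3 - q1


def percent_between(start, end):
    """Percentage of the values lying between two markers."""
    return CUMULATIVE_PERCENT[end] - CUMULATIVE_PERCENT[start]


iqr = interquartile_range(40, 60)
share = percent_between("lower_quartile", "highest")
print(iqr)    # 20
print(share)  # 75, so Fred's claim of 50% is not correct
```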
Comparing Box Plots
In an exam they may give you more than one box plot and then ask you to write about the differences between them. When you are comparing distributions you can use the range, the interquartile range, the smallest and largest values and the median; however, it is better to compare measures of spread rather than the medians or end points. The interquartile range is definitely a point to mention.
Example 3
Make two comparisons between the two box plots on the left.
The first box plot (A) has an interquartile range of 70 - 40 = 30, whereas the second box plot (B) has a smaller interquartile range of 80 - 60 = 20.
So A IQR > B IQR
The median on the second box plot (B) is greater than the first box plot's median.
A median < B median
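The two comparisons can be written out as a short Python sketch. The quartile values are the ones quoted in the example. The two medians (55 and 70) are NOT given in the text; they are placeholder values assumed here only so that the code runs, chosen to be consistent with "A median < B median". The function name describe is also made up for this sketch.

```python
def describe(label, a, b):
    """Return a one-line comparison of the same measure for plots A and B."""
    if a > b:
        return f"A {label} > B {label}"
    if a < b:
        return f"A {label} < B {label}"
    return f"A {label} = B {label}"


# Quartiles are from Example 3; the medians are assumed placeholders.
plot_a = {"q1": 40, "q3": 70, "median": 55}
plot_b = {"q1": 60, "q3": 80, "median": 70}

iqr_a = plot_a["q3"] - plot_a["q1"]   # 30
iqr_b = plot_b["q3"] - plot_b["q1"]   # 20

iqr_statement = describe("IQR", iqr_a, iqr_b)
median_statement = describe("median", plot_a["median"], plot_b["median"])
print(iqr_statement)     # A IQR > B IQR
print(median_statement)  # A median < B median
```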