Click here to print a blank answer sheet.

Descriptive Statistics and Graphing

Measures of Central Tendency

Measures of central tendency are used to find typical numbers in a data set. There are different ways to find a "typical" number and there may be advantages and disadvantages of each, depending on the data. The mean, median, and mode are described below.

Mean

The mean is the average. The mean of a group of numbers is obtained by adding the numbers and then dividing the sum by the number of data points. For example, the mean of the six numbers below is the sum (1620) divided by the number of data points (6): 1620/6 = 270.

Data: 400, 200, 220, 210, 340, 250
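The arithmetic can be checked with a few lines of Python. This is only a sketch; the variable names are chosen here for illustration and are not part of the exercise.

```python
# Data from the example above
data = [400, 200, 220, 210, 340, 250]

total = sum(data)      # add the numbers
count = len(data)      # number of data points
mean = total / count   # divide the sum by the number of data points

print(total, count, mean)  # 1620 6 270.0
```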

Median

The median is the middle observation. It is obtained by arranging the data in increasing (or decreasing) order. The number that is in the middle is the median. 

Example

Data: 27, 40, 3, 51, 34

Data arranged in ascending order: 3, 27, 34, 40, 51

The median for this data set is 34.

If there is an even number of data points, there will be two middle numbers. In this case, the value that is half-way between the two middle numbers is the median. This is equivalent to taking the mean of the two middle numbers.

Example

Data: 400, 200, 220, 210, 340, 250

Data arranged in ascending order: 200, 210, 220, 250, 340, 400

The numbers 220 and 250 are both the middle numbers. The median is therefore the mean of 220 and 250. It is 235.
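Both cases (an odd and an even number of data points) can be sketched in one small Python function. The function name `median` is an illustrative choice; Python's standard `statistics.median` does the same job.

```python
def median(values):
    """Return the middle value, or the mean of the two middle values."""
    ordered = sorted(values)          # arrange the data in ascending order
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:                    # odd count: one middle number
        return ordered[middle]
    # even count: mean of the two middle numbers
    return (ordered[middle - 1] + ordered[middle]) / 2

odd_median = median([27, 40, 3, 51, 34])
even_median = median([400, 200, 220, 210, 340, 250])
print(odd_median, even_median)  # 34 235.0
```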

Mode

The mode is the most common value.

Example

Data: 6, 7, 6, 1, 3, 3, 4, 9, 0, 6, 8, 1

The mode for the data set above is 6 because it occurs three times.
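Counting how often each value occurs can be sketched in Python with `collections.Counter`. The list `modes` is an illustrative name; it holds every value that shares the highest count, so a data set with more than one mode is reported in full.

```python
from collections import Counter

data = [6, 7, 6, 1, 3, 3, 4, 9, 0, 6, 8, 1]

counts = Counter(data)            # how many times each value occurs
highest = max(counts.values())    # the largest count
modes = [value for value, count in counts.items() if count == highest]

print(modes, highest)  # [6] 3
```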

Which to use? Some Disadvantages of each:

Mode – The mode is good if you want to know the most common occurrence but it may not be representative of the data, particularly with small data sets. For example, in the following data the mode is 2, but 2 is not a central measurement: 2, 2, 12, 15, 17, 19, 20. Another problem is that there may be more than one mode.

Median – The median is useful if the data distribution is skewed (does not follow a bell-shaped curve). With a small number of data points, the median may not be a central number. In the following data set, the median is 0, which is not a representative number: 52, 50, 48, 0, 0, 0, 0.

Mean – The mean gives a good representation of the data if the data are normally distributed (a bell-shaped curve). The mean is influenced by outliers, particularly if the number of data points is small. If there are outliers, the median may be a better choice because it is more resistant to this sort of bias. For example, suppose that a population biologist obtains the following population estimates for waterfowl in several lakes:

400
200
220
210
340
250
44,000

The last number in this data set does not seem to belong with the rest; it is an outlier. The typical lake seems to have approximately 200 to 400 birds but the mean for this data set is 6,517. The median (250) may be a better measurement of central tendency.
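The effect of the outlier can be checked with Python's standard `statistics` module. This is a minimal sketch; the list name `lakes` is chosen here for illustration.

```python
import statistics

# Waterfowl estimates from the example above, including the outlier
lakes = [400, 200, 220, 210, 340, 250, 44000]

mean_all = statistics.mean(lakes)      # pulled upward by the outlier
median_all = statistics.median(lakes)  # middle value of the sorted data

print(f"mean = {mean_all:,.0f}, median = {median_all}")  # mean = 6,517, median = 250
```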

Measures of Dispersion

Dispersion is the spread of the data. Two measures of dispersion, the range and the standard deviation, are discussed below.

Range

The range is the difference between the largest and smallest data points.

Example

Each of the following data sets has a mean of 50. Notice that data set 1 is more dispersed; its range is 100. The range of data set 2 is 4.

Data set 1: 100, 75, 50, 25, 0

Data set 2: 52, 51, 50, 49, 48
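A short Python sketch confirms that the two data sets share a mean but differ in range. The helper name `data_range` is an illustrative choice.

```python
def data_range(values):
    """Difference between the largest and smallest data points."""
    return max(values) - min(values)

data_set_1 = [100, 75, 50, 25, 0]
data_set_2 = [52, 51, 50, 49, 48]

for name, values in [("Data set 1", data_set_1), ("Data set 2", data_set_2)]:
    mean = sum(values) / len(values)
    print(name, "mean:", mean, "range:", data_range(values))
```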

Variance and Standard Deviation

Variance and standard deviation describe how far a typical data point lies from the mean.

Calculation

The calculations below are for the following data set:

25, 26, 26, 31, 35, 36, 38

1. Calculate the mean. The mean for the data set above is 31.

Data

25  
26  
26  
31  
35  
36  
38  
Sum = 217  
Mean = 217/7 = 31  

2. Calculate the deviation from the mean.

Data   Deviation
25     31 - 25 = 6
26     31 - 26 = 5
26     31 - 26 = 5
31     31 - 31 = 0
35     31 - 35 = -4
36     31 - 36 = -5
38     31 - 38 = -7
Sum = 217
Mean = 217/7 = 31

3. Square the deviations.

Data   Deviation        Deviation squared
25     31 - 25 = 6      36
26     31 - 26 = 5      25
26     31 - 26 = 5      25
31     31 - 31 = 0      0
35     31 - 35 = -4     16
36     31 - 36 = -5     25
38     31 - 38 = -7     49
Sum = 217
Mean = 217/7 = 31

4. Sum the squared deviations.

Data   Deviation        Deviation squared
25     31 - 25 = 6      36
26     31 - 26 = 5      25
26     31 - 26 = 5      25
31     31 - 31 = 0      0
35     31 - 35 = -4     16
36     31 - 36 = -5     25
38     31 - 38 = -7     49
Sum of data = 217
Mean = 217/7 = 31
Sum of squared deviations = 176

5. Divide the sum of squared deviations by the number of data points minus 1 (also called n-1). This is the variance.

Data   Deviation        Deviation squared
25     31 - 25 = 6      36
26     31 - 26 = 5      25
26     31 - 26 = 5      25
31     31 - 31 = 0      0
35     31 - 35 = -4     16
36     31 - 36 = -5     25
38     31 - 38 = -7     49
Sum of data = 217
Mean = 217/7 = 31
Sum of squared deviations = 176
Variance = Sum of squared deviations/(n-1) = 176/6 = 29.33

6. The standard deviation is the square root of the variance.

Data   Deviation        Deviation squared
25     31 - 25 = 6      36
26     31 - 26 = 5      25
26     31 - 26 = 5      25
31     31 - 31 = 0      0
35     31 - 35 = -4     16
36     31 - 36 = -5     25
38     31 - 38 = -7     49
Sum of data = 217
Mean = 217/7 = 31
Sum of squared deviations = 176
Variance = Sum of squared deviations/(n-1) = 176/6 = 29.33
Standard deviation = sqrt(29.33) = 5.42
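The six steps above can be sketched in Python as follows. The variable names are illustrative. The deviations are written as mean minus data point to match the tables; the sign does not matter once the deviations are squared.

```python
import math

data = [25, 26, 26, 31, 35, 36, 38]
n = len(data)

mean = sum(data) / n                      # step 1: 217/7 = 31
deviations = [mean - x for x in data]     # step 2: deviation from the mean
squared = [d ** 2 for d in deviations]    # step 3: square the deviations
sum_squared = sum(squared)                # step 4: sum the squared deviations
variance = sum_squared / (n - 1)          # step 5: divide by n-1
std_dev = math.sqrt(variance)             # step 6: square root of the variance

print(round(variance, 2), round(std_dev, 2))  # 29.33 5.42
```

As a cross-check, `statistics.variance(data)` and `statistics.stdev(data)` from Python's standard library also divide by n-1 and should agree with these values.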

Normal Distribution

A normal distribution is a bell-shaped curve. For example, suppose that 10,000 runners finish a marathon race (26.2 miles). A few of the finishers will be among the fastest runners and a few will be the slowest. Most of the finishers will be closer to the mean. The plot below represents the type of data that are expected.

Approximately 68% of the data points fall within one standard deviation of the mean (mean + or – 1 S.D.).

In the example above, this is 31 + or - 5.42

31+5.42 = 36.42

31-5.42 = 25.58

Therefore, approximately 68% of the data will fall between 25.58 and 36.42.

Approximately 95% of the data points fall within two standard deviations of the mean.

31 + (2*5.42) = 41.84

31 – (2*5.42) = 20.16.

Therefore approximately 95% of the data points fall between 20.16 and 41.84.

Approximately 99.7% of the data points fall within three standard deviations of the mean.
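The intervals above can be reproduced with a short Python sketch. It uses the mean (31) and standard deviation (5.42) from the earlier example and assumes the data are normally distributed; the dictionary name `intervals` is an illustrative choice.

```python
mean = 31
sd = 5.42

# Lower and upper limits for 1, 2 and 3 standard deviations from the mean
intervals = {}
for k in (1, 2, 3):
    low = round(mean - k * sd, 2)
    high = round(mean + k * sd, 2)
    intervals[k] = (low, high)
    print(f"within {k} S.D.: {low} to {high}")
```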

Presenting Data

Tables and graphs are convenient for presenting data. They present the data in an organized format, enabling the reader to find information quickly.

Tables

Tables should be numbered sequentially beginning with “Table 1.” Include a descriptive title. The title should enable the reader to understand the table without reading the rest of the document.

When creating tables, be sure to state the units of all measurements. Some examples are minutes, milliliters, parts per million, etc.

Graphs

Graphs should be created with the independent variable on the X-axis and the dependent variable on the Y-axis. For example, if time is a variable, it goes on the X-axis because it is independent.

Graphs, drawings, and other illustrations are called “Figures.” They should be numbered and titled just like tables.

The scale used on each axis should cover the values in the data set. For example, if you are graphing 52 to 87, start at 45 or 50 and go to 90 or 95.

Bar Graphs

Bar graphs are best when the data are in groups or categories.

Example 1 – plot the number of each kind of mammal that occurs in a woodlot.

Grey squirrels – 8

Red squirrels – 4

Chipmunks – 17

White-footed mice – 26

White-tailed deer – 2

Line graphs

Line graphs are used when the data are continuous.

Example 2 – pH of a pond in Clinton County, NY on 5/11/05

1:00 AM – 5.2

3:00 AM – 5.1

5:00 AM – 5.1

7:00 AM – 6.0

9:00 AM – 6.6

11:00 AM – 6.9

1:00 PM – 7.0

3:00 PM – 7.0

5:00 PM – 6.6

7:00 PM – 5.9

9:00 PM – 5.3

11:00 PM – 5.2

Scatter Plots

Scatter plots are useful to see if there is a relationship between two variables.

Example 3 – Suppose that a researcher wants to learn if there is a relationship between the size of the forest and the number of bird species that live in the forest. She collects the following data from different woodlots around Clinton County, NY in January 2006.

Size of Woodlot (ha)   # Bird Species
3.3                    11
8                      7
3.6                    14
1.4                    12
1.1                    13
11.0                   12
7.4                    17
6.6                    14
14.7                   13
8                      12

Significant Digits

It is often desirable to round numbers. For most purposes in this laboratory course, numbers should be rounded to 3 significant digits. Some examples below illustrate this concept.

The number 35,832,487 can be rounded to 35,800,000. We use the three digits that are furthest to the left; the rest become zeros. The number 35,852,487 becomes 35,900,000. If the digit to the right of the 3rd digit is 5 or greater, the 3rd digit is rounded up. If it is less than 5, the 3rd digit is left unchanged.

The number 2.4815 becomes 2.48. The number 2.4855 becomes 2.49.
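The rounding rule can be sketched in Python with the standard `decimal` module. The function name `round_sig` is hypothetical, and numbers are passed as text so that they are read exactly as written. `ROUND_HALF_UP` is used because it matches the "5 or greater rounds up" rule; Python's built-in `round` treats exact ties differently.

```python
from decimal import Decimal, ROUND_HALF_UP

def round_sig(text, digits=3):
    """Round a number (given as text) to the requested significant digits."""
    number = Decimal(text)
    if number == 0:
        return number
    # adjusted() gives the power of ten of the leftmost digit
    exponent = number.adjusted()
    # the place value of the last digit that is kept
    quantum = Decimal(1).scaleb(exponent - digits + 1)
    return number.quantize(quantum, rounding=ROUND_HALF_UP)

for text in ["35832487", "35852487", "2.4815", "2.4855"]:
    print(text, "->", format(round_sig(text), "f"))
```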

Exercise

Answer the questions below on the answer sheet.

1) Create and print a graph of each of the data sets given above (mammals, pH, # of bird species). Attach these graphs to the answer sheet. Either of the two programs below (Create A Graph or Excel) can be used. If you do not have Excel installed on your computer, you must use Create A Graph.

Instructions for using Create A Graph

Create A graph

Creating graphs using Excel

Be sure that you have done the following.

a) The graph should have a title. The title should let the reader know what the graph is about. Several sentences may be necessary.

b) Choose a minimum value that is slightly less than the smallest data point on each axis. Choose a maximum value of each axis that is slightly larger than the largest data point. This will stretch the graph out into the entire space. For example, suppose that data on the Y-axis ranged from 12,255 to 12,359. If the minimum value on the Y-axis were 0 and the maximum were 13,000, then all of the data points would be near the top of the graph. However, if the minimum value of the Y-axis were 12,200 and the maximum value were 12,400, then the entire graph space would be used.

c) Label each axis. Units of measurement (mm, ha, etc.) should be indicated.

2) Explain why a bar graph is used for the mammal data.

3) Explain why a line graph is used for the pH data.

4) Explain why a scatter plot is used for the bird species data.

5) Calculate the mean, median and mode for the pH data above. Write your answers on the answer sheet using 3 significant digits.

6) Which measure(s) of central tendency do you think is (are) best to use for the pH data above? Which measure(s) do you think is (are) the worst? Explain your answers to these questions.

7) Calculate the mean and standard deviation for the mammal data (example 1 above). Show all of your work. Your answer should contain 3 significant digits.

8) Calculate the mean number of bird species in example 3. Use the Excel spreadsheet provided to calculate the standard deviation. Write your answers on the answer sheet using 3 significant digits.
