Bio Statistics

This document outlines the contents and topics covered in a course on medical statistics. The course covers introductory concepts in statistics including descriptive and inferential statistics, samples vs populations, and qualitative vs quantitative variables. It then covers various statistical methods for describing data through frequency tables, graphs, and numerical measures. Additional topics include probability, estimation and confidence intervals, hypothesis testing for one and two samples, analysis of variance, correlation, regression, and statistical software. The overall goal is to teach students statistical techniques and methodology for collecting, analyzing, and interpreting medical and biological data to assist in decision making.

Uploaded by

imdad hussain
Copyright
© All Rights Reserved

MEDICAL STATISTICS

AUTHOR: LECTURER ABDURAHMAN IBRAHIM ABDI


Contents/Topics by Chapter

1. What Is Statistics: Introduction
• List ways that statistics is used.
• Know the differences between descriptive and inferential statistics.
• Understand the differences between a sample and a population.
• Explain the difference between qualitative and quantitative variables.

2. Describing Data: Frequency Tables, Frequency Distributions, and Graphic Presentation
• Organize qualitative data into a frequency table.
• Present a frequency table as a bar chart or a pie chart.
• Organize quantitative data into a frequency distribution.
• Present a frequency distribution for quantitative data using histograms, frequency polygons, and cumulative frequency polygons.

3. Describing Data: Numerical Measures
• Explain the concept of central tendency.
• Identify and compute the arithmetic mean.
• Determine the median.
• Identify the mode.
• Explain and apply measures of dispersion.
• Compute and explain the variance and the standard deviation.
• Compute the mean and standard deviation of grouped data.

4. Probability
• Basic Probability for One Event
• Outcomes with Two Events
• Probability Using Listings
• Multiplication Law for Independent Events
• Conditional Probability

5. Estimation and Confidence Intervals
• Define point estimate.
• Define level of confidence.
• Compute a confidence interval for the population mean when the population standard deviation is known.
• Compute a confidence interval for the population mean when the population standard deviation is unknown.
• Compute a confidence interval for a population proportion.
• Calculate the required sample size to estimate a population proportion or population mean.

6. One-Sample Tests of Hypothesis
• Introduction
• What Is a Hypothesis?
• What Is Hypothesis Testing?
• Five-Step Procedure for Testing a Hypothesis
• One-Tailed and Two-Tailed Tests of Significance
• Testing for a Population Mean: Known Population Standard Deviation
• p-Value in Hypothesis Testing
• Testing for a Population Mean: Population Standard Deviation Unknown
• Tests Concerning Proportions
• Type II Error

7. Two-Sample Tests of Hypothesis
• Introduction
• Two-Sample Tests of Hypothesis: Independent Samples
• Two-Sample Tests about Proportions
• Comparing Population Means with Unknown Population Standard Deviations
• Two-Sample Tests of Hypothesis: Dependent Samples
• Comparing Dependent and Independent Samples

8. Analysis of Variance
• Explain the concept of central tendency.
• Identify and compute the arithmetic mean.
• Determine the median.
• Identify the mode.
• Explain and apply measures of dispersion.
• Compute and explain the variance and the standard deviation.
• Compute the mean and standard deviation of grouped data.

9. Correlation and Linear Regression
• Introduction
• What Is Correlation Analysis?
• The Correlation Coefficient
• Testing the Significance of the Correlation Coefficient
• Regression Analysis
• Testing the Significance of the Slope
• Evaluating a Regression Equation’s Ability to Predict
• Interval Estimates of Prediction
• Transforming Data

10. Multiple Regression Analysis
• Introduction
• Multiple Regression Analysis
• Evaluating a Multiple Regression Equation
• Inferences in Multiple Linear Regression
• Evaluating the Assumptions of Multiple Regression
• Qualitative Independent Variables
• Regression Models with Interaction
• Stepwise Regression

11. SPSS and Tool Pack Software


Chapter One: Introduction to Biostatistics

Learning Objectives
LO1: List ways that statistics is used.
LO2: Know the differences between descriptive and inferential statistics.
LO3: Understand the differences between a sample and a population.
LO4: Explain the difference between qualitative and quantitative variables.
1.1. What Is Meant by Statistics?

• It is very difficult to define a fast-growing subject like statistics. In 1935 W.F. Willcox listed over a hundred definitions of statistics, and even then the list was far from exhaustive. Some definitions are old and narrow, while others are modern and more comprehensive.

• Statistics as we know it today originated from two quite dissimilar fields, viz. politics and probability theory.

• Definition: According to Professor R.A. Fisher, “The science of statistics is essentially a branch of applied mathematics and may be regarded as mathematics applied to observational data.”

• Here statistics is considered as a branch of applied mathematics.

• Definition: Croxton and Cowden gave a more comprehensive definition. According to them, statistics may be defined as “the science which deals with the collection, presentation, analysis and interpretation of numerical data.”

• This definition clearly points out four stages in a statistical investigation but excludes the organization of data and inferential statistics. Moreover, statistics is considered here as a branch of science only.

• A modern and more comprehensive definition of statistics may be given as: “Statistics is a branch of scientific knowledge that refers to the body of techniques and methodology developed for the collection, classification, organization, presentation and analysis of statistical data and for the use of such data in decision-making in the face of uncertainty in any field of enquiry.”

• Definition: Statistics is the science of collecting, organizing, presenting, analyzing, and interpreting numerical data to assist in making more effective decisions.
• Definition of Biostatistics: Biostatistics is the branch of applied statistics directed toward applications in the health sciences and biology.
• Biostatistics in various areas:
a. Health statistics
b. Medical statistics
c. Vital statistics
• In public health or community health, it is called health statistics.
• In medicine, it is called medical statistics. Here we study defects, injuries, diseases, the efficacy of drugs, treatments, etc.
• In population-related studies, it is called vital statistics. Example: the study of vital events such as births, marriages, and deaths.
Applications and uses of biostatistics as a science
1. In physiology
a. To find what is normal/healthy in a population.
b. To find the limits of normality.
c. To find the difference between the means and proportions of normal values at two places or in different periods.
d. To find the correlation between two variables X and Y, such as height and weight.
2. In pharmacology
a. To find the action of a drug.
b. To compare the action of two drugs.
c. To find the relative potency of a new drug with respect to a standard drug.
3. In medicine
a. To compare the efficacy of a particular drug, operation, or line of treatment.
b. To find an association between two attributes, e.g. oral cancer and smoking.
c. To identify signs and symptoms of disease.

Who Uses Statistics?

• The uses of statistics are practically unlimited. It is much harder to name a field in which statistics is not used than one in which it is. Today statistical tools are used in every sphere of life. They are used extensively by marketing, accounting, quality control, consumers, hospital administrators, industry, biology, business, educators, politicians, physicians, etc.
Why Study Statistics?

If you look through your university catalog, you will find that statistics is required for many college programs. Why is this so? What are the differences in the statistics courses taught in the Biostatistics College, the Engineering College, the Psychology or Sociology Departments in the Liberal Arts College, and the College of Business? The biggest difference is the examples used. The course content is basically the same.

• In biostatistics we are interested in such things as the health sciences and biology.
• In the College of Business we are interested in such things as profits, hours worked, and wages.
 Psychologists are interested in test scores.
 Engineers are interested in how many units are
manufactured on a particular machine.
 However, all four are interested in what is a typical value
and how much variation there is in the data. There may
also be a difference in the level of mathematics required.
An engineering statistics course usually requires calculus.
Statistics courses in colleges of business and education
usually teach the course at a more applied level. You
should be able to handle the mathematics in this text if
you have completed high school algebra.
1.2. Types of Statistics
The study of statistics is usually divided into two
categories: descriptive statistics and inferential statistics.
Descriptive Statistics
The definition of statistics given earlier referred to “organizing, presenting, analyzing . . . data.” This facet of statistics is usually referred to as descriptive statistics.
1. DESCRIPTIVE STATISTICS: Methods of organizing, summarizing, and presenting data in an informative way.

• For instance, the United States government reports the population of the United States was 179,323,000 in 1960; 203,302,000 in 1970; 226,542,000 in 1980; 248,709,000 in 1990; 281,422,000 in 2000; and 308,400,000 in 2010. This information is descriptive statistics. It is also descriptive statistics if we calculate the percentage growth from one decade to the next. However, it would not be descriptive statistics if we used these figures to estimate the population of the United States in the year 2020 or the percentage growth from 2010 to 2020. Why? The reason is that these statistics are not being used to summarize past populations but to estimate future populations. The following are some other examples of descriptive statistics.
 There are a total of 46,837 miles of interstate highways
in the United States. The interstate system represents
only 1 percent of the nation’s total roads but carries
more than 20 percent of the traffic. The longest is I-90,
which stretches from Boston to Seattle, a distance of
3,099 miles. The shortest is I-878 in New York City,
which is 0.70 of a mile in length. Alaska does not have
any interstate highways, Texas has the most interstate
miles at 3,232, and New York has the most interstate
routes with 28.
 The average person spent $103.00 on traditional
Valentine’s Day merchandise in 2010. This is an
increase of $0.50 from 2009. As in previous years, men
will spend nearly twice the amount women spend on
the holiday. The average man spent $135.35 to impress
the people in his life while women only spent $72.28.
Family pets will also feel the love, the average person
spending $3.27 on their furry friends, up from $2.17
last year.
2. Inferential Statistics: The second type of statistics is
inferential statistics—also called statistical inference.
Our main concern regarding inferential statistics is finding
something about a population from a sample taken from
that population. For example, a recent survey showed
only 46 percent of high school seniors can solve problems
involving fractions, decimals, and percentages; and only
77 percent of high school seniors correctly totaled the cost
of a salad, burger, fries, and a cola on a restaurant menu.
Since these are inferences about a population (all high
school seniors) based on sample data, we refer to them as
inferential statistics. You might think of inferential
statistics as a “best guess” of a population value based on
sample information.
 INFERENTIAL STATISTICS The methods used to
estimate a property of a population on the basis of a
sample.
1.3. Population and Sample
• Definition of population: the totality or collection of all objects or individuals on which observations are taken, on the basis of some characteristic of the objects, in any field.
• A population is a collection of all possible individuals, objects, or measurements of interest.
• Examples: the population may be
1) All workers of a hospital
2) All employees of a firm
3) All students in ZAMZAM University
• Definition: Each individual of a population is called an experimental unit. Observations are collected on experimental units.
Example: workers, employees, and students are the experimental units of the above populations.
• A population may be finite or infinite.
• Definition of finite population: A population is called finite if it contains a finite number of experimental units.
• Examples: all three examples mentioned above are examples of finite populations.
• Definition of infinite population: A population is called infinite if it contains an infinite number of experimental units.
• Example: in a coin-tossing experiment, the number of tosses required to get a head.
• Definition of sample: A sample is a part of the population that is taken and considered for study. A sample should represent the population characteristics under study.
Examples of samples
1) Some workers of a hospital.
2) Some employees of a firm.
3) Some students in SIMAD University.
• Usually a sample is a small but representative part of the population that contains a finite number of observations.
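The relationship between a finite population and a sample can be sketched in a few lines of Python. This is only an illustration: the population of 500 numbered hospital workers, the sample size of 50, and the helper name `draw_sample` are all hypothetical, and simple random sampling is just one common way of trying to obtain a representative sample.

```python
import random

# Hypothetical finite population: ID numbers of 500 hospital workers.
population = list(range(1, 501))

def draw_sample(units, size, seed=0):
    """Return a simple random sample of `size` distinct experimental units."""
    rng = random.Random(seed)       # fixed seed so the draw can be repeated
    return rng.sample(units, size)  # sampling without replacement

sample = draw_sample(population, 50)
print(len(population), "workers in the population,", len(sample), "in the sample")
```

Every unit in the sample is also a unit of the population, and no worker is selected twice.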
1.4. Types of Variables
• The variable is a very important concept in statistics.
• Definition: A variable is a changeable characteristic of the experimental units under consideration. It is customary to represent variables by the last capital letters of the English alphabet; that is, a variable is generally denoted by X, Y, Z, etc.
• Examples
1. No. of patients
2. Height
3. Sex
4. Educational level
• Each of the above examples denotes a variable. According to whether a variable takes numerical or non-numerical values, it can be classified into two categories.
Types of Variables:
i. Qualitative variables
ii. Quantitative variables
a. Qualitative Variables:
The values of a qualitative variable are words or attributes indicating to which category an element belongs.
Examples: blood type, nationality, students' grades, educational level, eye color, etc.
b. Quantitative Variables:
• A quantitative variable is a characteristic that can be measured. The values of a quantitative variable are numbers indicating how much or how many of something.
Examples:
(i) Family size (ii) No. of patients
(iii) Weight (iv) Height
• Types of Quantitative Variables:
a. Discrete Variables: There are jumps or gaps between the values.
Examples: Family size (x = 1, 2, 3, …), Number of patients (x = 0, 1, 2, 3, …)
• Discrete variable: A variable which can take only isolated values, whether finitely many or countably infinitely many, is called a discrete variable. Example: number of children per family.

b. Continuous Variables: There are no gaps between the values. A continuous variable can have any value within a certain interval of values.
Examples: Height (140 < x < 190), Blood sugar level (10 < x < 15)

• Continuous variable: A variable which can take infinitely many values in a certain range is called a continuous variable. Examples: the pressure in a tire, the weight of a pork chop, or the height of students in a class.
Summary of Types of Variables
• Qualitative variables (e.g. blood type, eye color)
• Quantitative variables
  - Discrete (e.g. number of children per family)
  - Continuous (e.g. height, weight)
Chapter Two: Describing Data: Frequency Tables,
Frequency Distributions, and Graphic Presentation
Learning Objectives
When you have completed this chapter, you will be able to:

• LO1 Organize qualitative data into a frequency table.
• LO2 Present a frequency table as a bar chart or a pie chart.
• LO3 Organize quantitative data into a frequency distribution.
• LO4 Present a frequency distribution for quantitative data using histograms, frequency polygons, and cumulative frequency polygons.
2.1. Organize qualitative data into a frequency table.

Recall from Chapter 1 that techniques used to describe a set of data are called descriptive statistics. To put it another way, descriptive statistics organize data to show the general pattern of the data and where values tend to concentrate and to expose extreme or unusual data values. The first procedure we discuss is a frequency table.

In Chapter 1, we distinguished between qualitative and quantitative variables. To review, a qualitative variable is nonnumeric, that is, it can only be classified into distinct categories. There is no particular order to these categories. Examples of qualitative data include political affiliation (Republican, Democrat, Independent) and state of birth (Alabama, . . . , Wyoming).

• Constructing a frequency distribution for qualitative data:
i. Choose the categories into which the data are to be grouped.
ii. Tally the items or observations into the appropriate categories.
iii. Count the number of items or observations falling in each category.
iv. Display the results in a FREQUENCY TABLE.

Example 1:
• The grades of students in biostatistics are given below:
  C, C, D, A, C, C, C, C, C, C, C, A, A, D, C, C, B, B, C, C, D,
  D, A, B, B, B, C, C, D, D, C, B, B, D, D, C, C, D, A, C, C,
  C, C, B, C, C, A, D, D, B, B, B, C, B, B, C, B, B, A, D
• Construct a frequency distribution table for the above categorical data and comment.
Solution: Table (1): Frequency distribution of the grades of students in biostatistics

Grade   Frequency (# of grades)
B       15
C       26
D       12
A        7
Total   60

• Comment: This frequency distribution summarizes how the grades of the students in biostatistics are distributed. From it we can see that grade C is the most common grade, occurring 26 times out of 60.
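The tally-and-count steps of Example 1 can be reproduced with a short Python sketch. It assumes the 60 grades are typed in exactly as listed above; `collections.Counter` is used here as a convenient stand-in for tally marks, and the variable names are illustrative.

```python
from collections import Counter

# The 60 grades from Example 1, in the order given.
grades = (
    "C C D A C C C C C C C A A D C C B B C C D "
    "D A B B B C C D D C B B D D C C D A C C "
    "C C B C C A D D B B B C B B C B B A D"
).split()

frequency = Counter(grades)            # tally and count in one step

for grade in ["B", "C", "D", "A"]:     # same row order as the table
    print(grade, frequency[grade])
print("Total", sum(frequency.values()))

# The category with the largest frequency.
most_common_grade, highest_count = frequency.most_common(1)[0]
```

The counts agree with the table: B 15, C 26, D 12, A 7, for a total of 60.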
• Relative and Percent Frequency Distribution
• To get more information about the characteristics of the data, we need the relative and percent frequency distributions obtained from the data.
• A relative frequency captures the relationship between a class total and the total number of observations: relative frequency = class frequency / total number of observations. The percent frequency is the relative frequency multiplied by 100.

Example 2:
• Construct a relative and percent frequency distribution table for the frequency distribution given in Example 1.

Grade   Frequency (# of grades)
B       15
C       26
D       12
A        7
Total   60

Solution

Table (2): Relative and percent frequency distribution

Grade   Frequency (# of grades)   Relative frequency   Percent
B       15                        0.25                 25
C       26                        0.43                 43
D       12                        0.20                 20
A        7                        0.12                 12
Total   60                        1.00                 100

Comment: Grade C accounts for 43% of the grades, which makes it the most frequent grade. On the other hand, the lowest percentage is for grade A, at only 12%. This table gives us better information about the characteristics of the data.
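As a check on Table (2), the relative and percent frequencies can be computed directly from the class frequencies. The sketch below is illustrative only; it rounds the relative frequency to two decimals and the percent to a whole number, which is the rounding the table appears to use.

```python
# Class frequencies from Example 1.
frequency = {"B": 15, "C": 26, "D": 12, "A": 7}
total = sum(frequency.values())

table = {}
for grade, count in frequency.items():
    relative = count / total                 # class frequency / number of observations
    table[grade] = (count, round(relative, 2), round(100 * relative))

for grade, (count, relative, percent) in table.items():
    print(f"{grade}  {count:>2}  {relative:.2f}  {percent}")
```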
2.2. Present a frequency table as a bar chart or a pie chart.

Bar Charts
Type one: Simple bar diagram
• The simple bar diagram is the most popular diagrammatic representation of qualitative data. A simple bar diagram can exhibit only one qualitative variable.
• First, a frequency distribution table of the qualitative data is constructed.
• Then a bar of fixed width is drawn above each category.
• The bars look like rectangles of equal width over the classes.
• The height of the bar over each class is equal to the class frequency.
Example 3:
a. Construct a bar diagram with the frequency distribution.
b. Construct a bar diagram with the percentage frequency distribution.

Grade   Frequency (# of grades)
B       15
C       26
D       12
A        7
Total   60

 Solution
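a. The bar diagram of the above data.

[Figure: simple bar diagram. Vertical axis: Frequency, 0 to 30. Horizontal axis: the four categories, labelled small, medium, large, extra large in the original figure.]

b. Percent bar diagram.

[Figure: percent bar diagram. Vertical axis: Percent, 0 to 50. Bars labelled 25, 43, 20, and 12 over the categories small, medium, large, extra large.]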
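The rule that the height of each bar equals the class frequency can be illustrated with a plain-text bar diagram. This is a rough sketch rather than a plotting recipe: it draws horizontal bars out of `#` characters, uses the grade frequencies from Example 3, and the function name `text_bar_chart` is made up for the illustration.

```python
# Class frequencies from Example 3.
frequency = {"B": 15, "C": 26, "D": 12, "A": 7}

def text_bar_chart(freq, symbol="#"):
    """Return one line per category; the bar length equals the class frequency."""
    label_width = max(len(str(label)) for label in freq)
    return [
        f"{str(label).ljust(label_width)} | {symbol * count} {count}"
        for label, count in freq.items()
    ]

for line in text_bar_chart(frequency):
    print(line)
```

Passing the percent frequencies (25, 43, 20, 12) instead of the counts gives the percent bar diagram of part b.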

Type two: Multiple bar diagram

• Multiple bar diagram: In a multiple bar diagram, two or more sets of interrelated data are represented.
• The technique of drawing such a diagram is the same as that of a simple bar diagram.
• The only difference is that more than one phenomenon is represented.
• Different shades, colors, or dots are used to distinguish between the bars.

Example 4

• The following data give the number of shirts of different sizes sold by the departmental stores over the last five years. Draw a multiple bar diagram.

Size of shirt   2005   2006   2007   2008   2009
Small             40     50     60     80    100
Medium           100    175    225    300    370
Large             60     90    140    180    220
Extra-Large       20     40     40     60     90

Solution

• The multiple bar diagram for the above data can be constructed by plotting the years on the horizontal axis and the number of shirts sold on the vertical axis, with one bar per size within each year. The multiple bar diagram is shown in the figure below.

[Figure: Multiple bar diagram of different sizes of shirts for the years 2005-2009. Horizontal axis: years 2005 to 2009. Vertical axis: 0 to 400. Legend: Small, Medium, Large, Extra-Large.]
Pie Charts

CONSTRUCTION OF PIE CHART

1. Convert the absolute frequencies into relative frequencies for each category of the variable.

2. Multiply each relative frequency by 360. The resulting values are the angles expressed in degrees.

3. Check that the column obtained in step (2) adds to 360.

4. Draw a circle of appropriate radius.

5. Mark off the angles obtained in step (2) in the circle with the help of a protractor. The resulting figure is the desired pie diagram of your data.
Example 5

• Construct a pie chart from the following frequency table.

Size of shirts   Frequency (# of shirts)
Small            15
Medium           26
Large            12
Extra large       7
Total            60

Solution

The angles are computed from the rounded relative frequencies, for example 0.43 × 360 = 154.8.

Size of shirts   Frequency (# of shirts)   Relative frequency   Percent   Angle in degrees
Small            15                        0.25                 25         90
Medium           26                        0.43                 43        154.8
Large            12                        0.20                 20         72
Extra large       7                        0.12                 12         43.2
Total            60                        1.00                 100       360

[Figure: pie diagram with slices small 25%, medium 43%, large 20%, extra large 12%.]
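Steps 1 to 3 of the pie chart construction can be checked numerically. The sketch below follows the worked table by rounding each relative frequency to two decimals before multiplying by 360; the variable names are illustrative. It also computes the angles from the unrounded relative frequencies, which differ slightly for two of the slices.

```python
import math

# Class frequencies from Example 5.
frequency = {"small": 15, "medium": 26, "large": 12, "extra large": 7}
total = sum(frequency.values())

angles = {}
for size, count in frequency.items():
    relative = round(count / total, 2)       # step 1, rounded as in the table
    angles[size] = round(relative * 360, 1)  # step 2: angle in degrees

# Step 3: the angles should add up to 360 degrees.
angles_sum_to_360 = math.isclose(sum(angles.values()), 360.0)

# Without rounding the relative frequencies first.
exact_angles = {size: count * 360 / total for size, count in frequency.items()}

print(angles)        # {'small': 90.0, 'medium': 154.8, 'large': 72.0, 'extra large': 43.2}
print(exact_angles)  # {'small': 90.0, 'medium': 156.0, 'large': 72.0, 'extra large': 42.0}
```

Either set of angles adds to 360; the table above uses the rounded version.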

2.3. Constructing Frequency Distributions: Quantitative Data
• There are two types of quantitative data, namely:

1. Discrete data and

2. Continuous data.

Frequency distribution of discrete data: We get discrete data from a discrete variable, and a discrete frequency distribution can be obtained from it. If the discrete variable takes only a limited number of values, then each value of the variable can be considered as a class.
Procedure for constructing a frequency distribution of discrete data

1. The first column contains the classes. Here the classes are the possible values of the discrete variable.

2. The second column contains tally marks. The number of observations in each class is denoted by the marks.

3. The number of observations that fall in each class is the class frequency. It is shown in the third column.

Example 6:

• The management of a factory wants to know the family structure of the workers of the factory in order to educate their children. For this they collected data on the number of children from 45 workers of the factory. They are as follows:

• Construct a frequency table with the above data.

Solution

• The table shows the frequency distribution of the number of children for 45 workers of a factory.

Number of children (X)   Frequency
0                         1
1                         5
2                         4
3                         6
4                        16
5                         9
6                         4
Total                    45

Comment: The table gives us information about the distribution of the number of children of the 45 families of workers of the factory. It is seen that 16 families have 4 children and only one family has no children. The picture becomes clearer if we construct a relative frequency and percent frequency distribution of the data.

Relative frequency and percent frequency distribution

Number of children (X)   Frequency   Relative frequency   %
0                         1          0.02                  2
1                         5          0.11                 11
2                         4          0.09                  9
3                         6          0.13                 13
4                        16          0.36                 36
5                         9          0.20                 20
6                         4          0.09                  9
Total                    45          1.00                100

• Comment: It appears that 36% of families have four children, 20% of families have five children, and 13% of families have three children. Now the management of the factory will be able to recommend a school building of manageable size for the children of the workers.

Construction of a Frequency Distribution for Continuous Data

• Important steps for constructing a frequency distribution table for continuous data:

1. Range: The number of classes depends on the range of the data. The range is the difference between the largest value and the smallest value of the data set. Range = largest value - smallest value

2. Number of classes: The number of classes should be neither too large nor too small. As a general rule, the number of classes should range from 5 to 25. Another rule of thumb is that the number of classes should be around √n, where n is the number of observations. A further rule determines the number of classes from a formula.

3. Width of the class: $\text{width of the class} = \dfrac{\text{range}}{\text{number of classes}}$

4. Tally marks: Observations are counted and marked by tally marks.

5. Number of columns: Usually there will be three columns in a frequency table: class interval, tally marks, and frequency.
• There are two methods of classifying the data according to class intervals, namely

a) Exclusive method, and

b) Inclusive method

• Exclusive method: The class intervals are fixed so that the upper limit of one class is the lower limit of the next class, for example 10-20, 20-30, 30-40. A value equal to an upper limit is counted in the next class.
• Inclusive method: Under the inclusive method of classification, the upper limit of one class is included in that class itself, for example 10-19, 20-29, 30-39.

Frequency Distribution

• Class midpoint: A point that divides a class into two equal parts. This is the average of the upper and lower class limits.
• Class frequency: The number of observations in each class.
• Class interval: The class interval is obtained by subtracting the lower limit of a class from the lower limit of the next class.
Example:
The following table gives the hemoglobin level (g/dl) of a sample of 50 men.
We wish to summarize these data using the following class intervals.
• From frequencies: The number of people whose hemoglobin levels are between 17.0 and 17.9 = 10.
• From cumulative frequencies: The number of people whose hemoglobin levels are less than or equal to 15.9 = 23. The number of people whose hemoglobin levels are less than or equal to 17.9 = 49.
• From percentage frequencies: The percentage of people whose hemoglobin levels are between 17.0 and 17.9 = 20%.
• From cumulative percentage frequencies: The percentage of people whose hemoglobin levels are less than or equal to 14.9 = 16%. The percentage of people whose hemoglobin levels are less than or equal to 16.9 = 78%.
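The four kinds of figures quoted above (frequency, cumulative frequency, percentage, cumulative percentage) are related by simple running sums. The sketch below illustrates this with class frequencies that are back-calculated from the quoted figures, not taken from the original table: 16% of 50 gives 8 men at 14.9 or below, 23 - 8 = 15, 39 - 23 = 16, 49 - 39 = 10, and 50 - 49 = 1. The first and last rows pool whatever classes the original table had below 15.0 and above 17.9.

```python
from itertools import accumulate

# Classes and frequencies reconstructed from the quoted cumulative figures.
classes = ["14.9 or less", "15.0-15.9", "16.0-16.9", "17.0-17.9", "18.0 or more"]
frequency = [8, 15, 16, 10, 1]
n = sum(frequency)                                  # 50 men in the sample

cumulative = list(accumulate(frequency))            # running total of frequencies
percent = [100 * f / n for f in frequency]          # percentage frequency
cumulative_percent = [100 * c / n for c in cumulative]

for row in zip(classes, frequency, cumulative, percent, cumulative_percent):
    print(*row, sep="\t")
```

The output reproduces the quoted values: 10 men (20%) in the class 17.0-17.9, cumulative frequencies of 23 and 49 at 15.9 and 17.9, and cumulative percentages of 16% and 78% at 14.9 and 16.9.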
Example 7
The following data relate to the audit times of 20 clients:

10, 15, 20, 28, 13, 18, 24, 29, 12, 16,
23, 34, 14, 17, 22, 17, 21, 16, 18, 19
Solution

Range = largest value - smallest value = 34 - 10 = 24

Class boundaries and mid-point of a class
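One way to carry the construction steps through for the audit-time data is sketched below. Several choices here are assumptions rather than part of the original solution: the number of classes is taken as √n rounded up, the class width is rounded up to a whole number, the inclusive method is used, the first class starts at the smallest value, and class boundaries are placed half a unit beyond the class limits because the data are whole numbers.

```python
import math

audit_times = [10, 15, 20, 28, 13, 18, 24, 29, 12, 16,
               23, 34, 14, 17, 22, 17, 21, 16, 18, 19]

n = len(audit_times)
data_range = max(audit_times) - min(audit_times)  # 34 - 10 = 24
num_classes = math.ceil(math.sqrt(n))             # sqrt(20) is about 4.47, rounded up to 5
width = math.ceil(data_range / num_classes)       # 24 / 5 = 4.8, rounded up to 5

table = []
for i in range(num_classes):
    lower = min(audit_times) + i * width
    upper = lower + width - 1                     # inclusive upper class limit
    midpoint = (lower + upper) / 2                # average of the class limits
    boundaries = (lower - 0.5, upper + 0.5)       # assumed convention for whole-number data
    freq = sum(lower <= x <= upper for x in audit_times)
    table.append((lower, upper, boundaries, midpoint, freq))

for lower, upper, boundaries, midpoint, freq in table:
    print(f"{lower}-{upper}  boundaries {boundaries}  midpoint {midpoint}  frequency {freq}")
```

With these choices the classes are 10-14, 15-19, 20-24, 25-29 and 30-34, with frequencies 4, 8, 5, 2 and 1, which add up to the 20 clients.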


Graphic Presentation of a Frequency Distribution

The three commonly used graphic forms are:

• Histograms
• Frequency polygons
• Cumulative frequency distributions

Histogram

• A histogram for a frequency distribution based on quantitative data is very similar to the bar chart showing the distribution of qualitative data. The classes are marked on the horizontal axis and the class frequencies on the vertical axis. The class frequencies are represented by the heights of the bars.

Assignment One

1. The administration of a factory wants to know the health status of the workers of the factory. They conducted a survey of 55 workers, and the health conditions of the 55 workers were found to be as follows:

G, P, A, P, P, A, A, A, A, A, A, A, A, A, A,
A, P, G, G, P, A, A, P, G, P, A, A, P, G, A,
A, G, P, A, A, A, A, G, P, P, A, A, G, P, P,
P, A, G, A, A, A, A, P, P, G

Here G, A, and P denote good, average, and poor health respectively.
a. Construct a frequency distribution table and comment.
b. Construct a relative and percent frequency distribution.
c. Construct a simple bar diagram with the frequency table.
d. Construct a simple bar diagram with the percent frequencies.
2. Twenty-five army inductees were given a blood test to determine their blood type. The data set is

a) Construct a frequency distribution table and comment.
b) Construct a relative and percent frequency distribution.
c) Construct a simple bar diagram with the frequency table.
d) Construct a simple bar diagram with the percent frequencies.
3. This data involves keypunching errors made by a data-entry operator. To be entered were 156 lines of data, each line containing data on the number of crib deaths for a particular month in King County, Washington, for the years 1965–1977. Other data on
a) Construct a frequency distribution table and comment.
b) Construct a relative and percent frequency distribution.
c) Construct a simple bar diagram with the frequency table.
d) Construct a simple bar diagram with the percent frequencies.
4. In the following table the weights of 40 male students at State University are recorded to the nearest pound.
a. Construct a frequency distribution using both the exclusive and inclusive methods.
b. Construct relative, percent, and cumulative frequency distributions.
5. In the following, the heights of 45 female students at Midwestern University are recorded to the nearest inch. Construct a histogram.
Chapter Three: Describing Data: Numerical Measures

Learning Objectives
When you have completed this chapter, you will be able to:
LO1 Explain the concept of central tendency.
LO2 Identify and compute the arithmetic mean.
LO3 Determine the median.
LO4 Identify the mode.
LO5 Explain and apply measures of dispersion.
LO6 Compute and explain the variance and the standard deviation.
LO7 Compute the mean and standard deviation of grouped data.
3.1. Introduction to the concept of central tendency
 Chapter 2 began our study of descriptive statistics. To summarize
raw data into a meaningful form, we organized qualitative data into
a frequency table and portrayed the results in a bar chart. In a
similar fashion, we organized quantitative data into a frequency
distribution and portrayed the results in a histogram. We also
looked at other graphical techniques such as pie charts to portray
qualitative data and frequency polygons to portray quantitative
data.
 This chapter is concerned with two numerical ways of describing
quantitative variables, namely, measures of location and
measures of dispersion. Measures of location are often referred
to as averages. The purpose of a measure of location is to pinpoint
the centre of a distribution of data. An average is a measure of
location that shows the central value of the data. Averages appear
daily on TV, on various websites, in the newspaper, and in other
journals. Here are some examples:
 The average U.S. home changes ownership every 11.8 years.
 An American receives an average of 568 pieces of mail per
year.
 The average American home has more TV sets than people.
There are 2.73 TV sets and 2.55 people in the typical home.
 We begin by discussing measures of location. There is not
just one measure of location; in fact, there are many. We
will consider three: the arithmetic mean, the median,
and the mode. The arithmetic mean is the most widely
used and widely reported measure of location. We study the
mean as both a population parameter and a sample statistic.
3.2. Arithmetic Mean of the Population

 For raw data—that is, data that have not been grouped in a
frequency distribution— the population mean is the sum of
all the values in the population divided by the number of
values in the population. To find the population mean, we
use the following formula:
Population mean = (Sum of all the values in the population) / (Number of values in the population)
 Instead of writing out in words the full directions for
computing the population mean (or any other measure), it is
more convenient to use the shorthand symbols of
mathematics. The mean of the population using
mathematical symbols is:
$$\mu = \frac{\sum X}{N}$$
 where:
μ represents the population mean. It is the Greek lowercase letter “mu.”
N is the number of values in the population.
X represents any particular value.
Σ is the Greek capital letter “sigma” and indicates the operation of adding.
ΣX is the sum of the X values in the population.
 Any measurable characteristic of a population is called a
parameter. The mean of a population is an example of
a parameter.
The Sample Mean
 For raw data—that is, ungrouped data—the mean is the
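sum of all the sampled values divided by the total number
of sampled values. To find the mean for a sample:
Sample mean = (Sum of all the values in the sample) / (Number of values in the sample)
The mean of a sample and the mean of a population are
computed in the same way, but the shorthand notation
used is different. The formula for the mean of a sample is:
$$\bar{x} = \frac{\sum x}{n}$$
where x̄ is the sample mean, Σx is the sum of the sampled values, and n is the
number of values in the sample.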
Example:
Suppose that we have a population of N = 5 values:
The arithmetic mean is a widely used measure of location. It
has several important properties:
1. Every set of interval- or ratio-level data has a mean.
Recall from Chapter 1 that ratio-level data include such
data as ages, incomes, and weights, with the distance
between numbers being constant.
2. All the values are included in computing the mean.
3. The mean is unique. That is, there is only one mean in a
set of data. Later in the chapter, we will discover an
average that might appear twice, or more than twice, in a
set of data.
4. The sum of the deviations of each value from the
mean is zero. Expressed symbolically:
$$\sum (x - \bar{x}) = 0$$
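The fourth property can be checked numerically. The sketch below is only an illustration: it reuses the five hourly wages from the sample-variance example later in this chapter, and the variable names are assumptions made for the example.

```python
from statistics import mean

# Hourly wages ($) from the sample-variance example in this chapter.
wages = [12, 20, 16, 18, 19]

# Arithmetic mean: sum of the values divided by the number of values.
x_bar = mean(wages)

# Deviation of each value from the mean.
deviations = [x - x_bar for x in wages]

# Property 4: the deviations from the mean sum to zero.
total_deviation = sum(deviations)

print("mean:", x_bar)
print("deviations:", deviations)
print("sum of deviations:", total_deviation)
```

The positive and negative deviations cancel, which is why the mean acts as a balance point for the data.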
3.3. Median
 MEDIAN: The midpoint of the values after they have
been ordered from the smallest to the largest, or the
largest to the smallest.
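A minimal sketch of the definition, using Python's standard `statistics.median`. The two data sets are hypothetical. For an even number of values, the function follows the usual convention of averaging the two middle values, which is an assumption here since the text above does not state the even case.

```python
from statistics import median

# Hypothetical data sets, chosen only to illustrate the definition.
odd_count = [12, 20, 16, 18, 19]        # n = 5
even_count = [12, 20, 16, 18, 19, 25]   # n = 6

# Step 1: order the values from smallest to largest.
print(sorted(odd_count))
print(sorted(even_count))

# Step 2: take the midpoint of the ordered values.
median_odd = median(odd_count)    # the single middle value
median_even = median(even_count)  # average of the two middle values

print("median (n = 5):", median_odd)
print("median (n = 6):", median_even)
```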
3.4. Mode

Advantages and disadvantages of the mode:


3.5. Measures of Dispersion
 We will consider several measures of dispersion. The range is
based on the largest and the smallest values in the data set,
that is, only two values are considered. The mean deviation,
the variance, and the standard deviation use all the values in
a data set and are all based on deviations from the
arithmetic mean.
 Range: The simplest measure of dispersion is the range. It
is the difference between the largest and the smallest values
in a data set. In the form of an equation:
Range = Largest value − Smallest value
 Mean Deviation: A defect of the range is that it is
based on only two values, the highest and the lowest; it
does not take into consideration all of the values. The
mean deviation does. It measures the mean amount
by which the values in a population, or sample, vary
from their mean. In terms of a definition:
 MEAN DEVIATION The arithmetic mean of the
absolute values of the deviations from the arithmetic
mean. In symbols, for a sample of n values:
$$MD = \frac{\sum |x - \bar{x}|}{n}$$
 Why do we ignore the signs of the deviations from
the mean? If we didn’t, the positive and negative
deviations from the mean would exactly offset each
other, and the mean deviation would always be zero.
Such a measure (zero) would be a useless statistic.
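The effect of ignoring the signs can be seen in a short sketch. It reuses the five hourly wages from the sample-variance example in this chapter. The variable names are illustrative assumptions.

```python
# Hourly wages ($) from the sample-variance example in this chapter.
wages = [12, 20, 16, 18, 19]
n = len(wages)
x_bar = sum(wages) / n

# Keeping the signs: positive and negative deviations cancel out.
signed_mean = sum(x - x_bar for x in wages) / n

# Ignoring the signs: the mean of the absolute deviations.
absolute_deviations = [abs(x - x_bar) for x in wages]
mean_deviation = sum(absolute_deviations) / n

# Range, for comparison: largest value minus smallest value.
data_range = max(wages) - min(wages)

print("mean of signed deviations:", signed_mean)
print("mean deviation:", mean_deviation)
print("range:", data_range)
```

On average, a wage in this sample lies $2.40 from the mean of $17, while the range of $8 reflects only the two extreme values.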
Example
Solution

3.6. Variance and Standard Deviation


 The variance and standard deviation are also based
on the deviations from the mean. However, instead of
using the absolute value of the deviations, the variance
and the standard deviation square the deviations.
 VARIANCE The arithmetic mean of the squared
deviations from the mean. The variance is non-
negative and is zero only if all observations are the
same.
 STANDARD DEVIATION The square root of the
variance.
 Population Variance The formulas for the population
variance and the sample variance are slightly different.
The population variance is considered first. (Recall that a
population is the totality of all observations being
studied.) The population variance is found by:
$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$
 Where:
σ² is the population variance (σ is the lowercase Greek letter sigma). It is
read as “sigma squared.”
X is the value of an observation in the population.
μ is the arithmetic mean of the population.
N is the number of observations in the population.
Note the process of computing the variance.
1. Begin by finding the mean.
2. Find the difference between each observation and the
mean, and square that difference.
3. Sum all the squared differences.
4. Divide the sum of the squared differences by the number
of items in the population.
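The four steps above can be sketched as follows. The population values are hypothetical, chosen so the arithmetic is easy to follow. The result is compared with `statistics.pvariance`, the standard-library function for a population variance.

```python
from math import sqrt
from statistics import pvariance

# Hypothetical population of N = 8 values (an assumption for illustration).
population = [2, 4, 4, 4, 5, 5, 7, 9]
N = len(population)

# Step 1: find the mean.
mu = sum(population) / N

# Step 2: difference between each observation and the mean, squared.
squared_differences = [(x - mu) ** 2 for x in population]

# Step 3: sum all the squared differences.
sum_of_squares = sum(squared_differences)

# Step 4: divide by the number of items in the population.
sigma_squared = sum_of_squares / N

# The population standard deviation is the square root of the variance.
sigma = sqrt(sigma_squared)

print("mean:", mu)
print("variance:", sigma_squared, "check:", pvariance(population))
print("standard deviation:", sigma)
```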
Example

Solution
Sample Variance: The formula for the population mean is μ = ΣX/N. We
just changed the symbols for the sample mean, that is, x̄ = Σx/n. The
conversion from the population variance to the sample
variance is not as direct. It requires a change in the
denominator. Instead of substituting n (number in the
sample) for N (number in the population), the denominator
is n − 1. Thus the formula for the sample variance is:
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$
 Where:
s² is the sample variance.
x is the value of each observation in the sample.
x̄ is the mean of the sample.
n is the number of observations in the sample.
Example
The hourly wages for a sample of part-time employees at
Home Depot are: $12, $20, $16, $18, and $19. What is the
sample variance?
Solution
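One way to work this example is sketched below. It applies the sample-variance definition directly (squared deviations from the sample mean, divided by n − 1) and cross-checks the result with `statistics.variance`. The variable names are illustrative.

```python
from math import sqrt
from statistics import variance

# Hourly wages ($) for the sample of part-time employees.
wages = [12, 20, 16, 18, 19]
n = len(wages)

# Sample mean.
x_bar = sum(wages) / n

# Sum of the squared deviations from the sample mean.
sum_of_squares = sum((x - x_bar) ** 2 for x in wages)

# Sample variance: divide by n - 1, not n.
s_squared = sum_of_squares / (n - 1)

# Sample standard deviation: square root of the sample variance.
s = sqrt(s_squared)

print("sample mean:", x_bar)
print("sum of squared deviations:", sum_of_squares)
print("sample variance:", s_squared, "check:", variance(wages))
print("sample standard deviation:", round(s, 2))
```

The squared deviations are 25, 9, 1, 1 and 4, which sum to 40, so the sample variance is 40 / 4 = 10 (in dollars squared).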
 Sample Standard Deviation The sample standard
deviation is used as an estimator of the population
standard deviation. As noted previously, the population
standard deviation is the square root of the population
variance. Likewise, the sample standard deviation is the
square root of the sample variance. The sample standard
deviation is most easily determined by:
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
3.7. The Mean and Standard Deviation
of Grouped Data
 Arithmetic mean of grouped data: To approximate
the arithmetic mean of data organized into a frequency
distribution, we begin by assuming the observations in
each class are represented by the midpoint of the class.
The mean of a sample of data organized in a frequency
distribution is computed by:
$$\bar{x} = \frac{\sum fM}{n}$$
 Where:
x̄ = the designation for the sample mean
M = the midpoint of each class
f = the frequency in each class
fM = the frequency in each class times the midpoint of the class
ΣfM = the sum of these products
n = the total number of frequencies
 Standard Deviation: To calculate the standard
deviation of data grouped into a frequency distribution,
we use this formula:
$$s = \sqrt{\frac{\sum f(M - \bar{x})^2}{n - 1}}$$
 Where:
s = the symbol for the sample standard deviation
M = the midpoint of the class
f = the class frequency
n = the number of observations in the sample
x̄ = the designation for the sample mean
Example
Determine the sample mean, sample variance and the sample
standard deviation of the following frequency distribution.

Solution
Class       Frequency (f)   Midpoint (M)     fM
0 to 5            2              2.5          5
5 to 10           7              7.5         52.5
10 to 15         12             12.5        150
15 to 20          6             17.5        105
20 to 25          3             22.5         67.5
Total            30                         380

$$\bar{x} = \frac{\sum fM}{n} = \frac{380}{30} = 12.67$$

Class        f      M       fM     M − x̄    (M − x̄)²    f(M − x̄)²
0 to 5       2     2.5      5     −10.17    103.4289    206.8578
5 to 10      7     7.5     52.5    −5.17     26.7289    187.1023
10 to 15    12    12.5    150      −0.17      0.0289      0.3468
15 to 20     6    17.5    105       4.83     23.3289    139.9734
20 to 25     3    22.5     67.5     9.83     96.6289    289.8867
Total       30            380                           824.1670

$$s^2 = \frac{\sum f(M - \bar{x})^2}{\sum f - 1} = \frac{824.167}{29} = 28.42$$
$$s = \sqrt{\frac{\sum f(M - \bar{x})^2}{\sum f - 1}} = \sqrt{28.42} = 5.33$$
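The grouped-data formulas can be checked with a short sketch. It recomputes the sample mean, variance and standard deviation from the class limits and frequencies of this example, weighting each squared deviation by the class frequency f. The variable names are illustrative assumptions.

```python
from math import sqrt

# (lower limit, upper limit, frequency) for each class in the example.
classes = [(0, 5, 2), (5, 10, 7), (10, 15, 12), (15, 20, 6), (20, 25, 3)]

frequencies = [f for _, _, f in classes]
midpoints = [(low + high) / 2 for low, high, _ in classes]
n = sum(frequencies)

# Mean of grouped data: sum of f * M divided by n.
sum_fM = sum(f * m for f, m in zip(frequencies, midpoints))
x_bar = sum_fM / n

# Weighted sum of squared deviations: each class contributes f * (M - mean)^2.
sum_f_sq = sum(f * (m - x_bar) ** 2 for f, m in zip(frequencies, midpoints))

# Sample variance and standard deviation use n - 1 in the denominator.
s_squared = sum_f_sq / (n - 1)
s = sqrt(s_squared)

print("n =", n, " sum fM =", sum_fM)
print("mean =", round(x_bar, 2))
print("sum f(M - mean)^2 =", round(sum_f_sq, 2))
print("sample variance =", round(s_squared, 2))
print("sample standard deviation =", round(s, 2))
```

Because the class midpoints stand in for the individual observations, these values are approximations of the mean and standard deviation of the underlying raw data.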
Chapter Four: Probability - Two Events
04.1 Recap: Basic Probability for One Event
In this section we revise the use of probabilities for single events, remembering that:

Probability of an event = (number of successful outcomes) / (number of possible outcomes)

Example 1
A tube of sweets contains 10 red sweets, 7 blue sweets, 8 green sweets and 5 orange
sweets. If a sweet is taken at random from the tube, what is the probability that it is:
(a) red,
(b) orange,
(c) green or red,
(d) not blue ?

Solution

There are 30 sweets in the tube.

(a) There are 10 red sweets in the tube, so
$$p(\text{red}) = \frac{10}{30} = \frac{1}{3}$$

(b) There are 5 orange sweets in the tube, so
$$p(\text{orange}) = \frac{5}{30} = \frac{1}{6}$$

MEP Y8 Practice Book A

(c) There are 8 green sweets and 10 red sweets in the tube, so
$$p(\text{green or red}) = \frac{8 + 10}{30} = \frac{18}{30} = \frac{3}{5}$$

(d) There are 23 sweets that are not blue in the tube, so
$$p(\text{not blue}) = \frac{23}{30}$$
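The same calculation can be sketched in Python. `fractions.Fraction` keeps the answers as exact fractions in lowest terms. The helper name `probability` is an assumption made for this example.

```python
from fractions import Fraction

# Contents of the tube of sweets in Example 1.
sweets = {"red": 10, "blue": 7, "green": 8, "orange": 5}
total = sum(sweets.values())  # number of possible outcomes


def probability(*colours):
    """Successful outcomes divided by possible outcomes."""
    successful = sum(sweets[colour] for colour in colours)
    return Fraction(successful, total)


p_red = probability("red")
p_orange = probability("orange")
p_green_or_red = probability("green", "red")
p_not_blue = 1 - probability("blue")

print(p_red, p_orange, p_green_or_red, p_not_blue)
```

Part (d) is computed as 1 − p(blue), which gives the same result as counting the 23 sweets that are not blue.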

Example 2
Nine balls, each marked with a number from 1 to 9, are placed in a bag and one
ball is taken out at random. What is the probability that the number on the ball is:
(a) odd,
(b) a multiple of 3,
(c) a 5,
(d) not a 7 ?

Solution
There are 9 possible outcomes in each case.
(a) There are 5 possible odd numbers, so
$$p(\text{odd}) = \frac{5}{9}$$

(b) There are 3 numbers that are multiples of 3, so
$$p(\text{multiple of 3}) = \frac{3}{9} = \frac{1}{3}$$

(c) There is only 1 ball numbered 5, so
$$p(5) = \frac{1}{9}$$

(d) There are 8 numbers that are not 7, so
$$p(\text{not 7}) = \frac{8}{9}$$


Exercises
1. There are 16 girls and 8 boys in the tennis club. One of these is chosen at
random to enter a competition. What is the probability that a girl is chosen?

2. A bag contains 8 blue balls, 7 green balls and 5 red balls. A ball is taken at
random from the bag. What is the probability that the ball is:
(a) red,
(b) blue,
(c) green,
(d) yellow?

3. A card is taken at random from a standard 52-card pack of playing cards.
What is the probability that it is:
(a) a seven,
(b) a heart,
(c) a red card,
(d) a red six ?

4. If you roll a fair dice, what is the probability that the number you get is:
(a) 5,
(b) an odd number,
(c) a number greater than 1,
(d) a multiple of 4 ?

5. Ishmail writes a computer program that produces at random one of the digits
0, 1, 2, 3, 4, 5, 6, 7, 8, 9.
What is the probability that the program produces:
(a) an even number,
(b) a multiple of 4,
(c) a number less than 7,
(d) a multiple of 5 ?

6. The police line up 10 people in an identity parade; only one of the people is
the criminal. A witness does not recognise the criminal and so chooses a
person at random. What is the probability that:
(a) the criminal is chosen,
(b) the criminal is not chosen ?


7. There are 18 boys and 17 girls in a class. One of these pupils is selected at
random to represent the class. What is the probability that the pupil selected
is a girl?

8. In Hannah's purse there are three £1 coins, five 10p coins and eight 2p coins.
If she takes a coin at random from her purse, what is the probability that it is:
(a) a £1 coin,
(b) a 2p coin,
(c) not a £1 coin,
(d) a £1 coin or a 10p coin ?

9. Some of the children in a class write down the first letter of their surname on
a card; these cards are shown below:

W M G S J
S E S A
W H H E
M T S I

(a) One of these cards is taken at random. What is the probability that the
letter on it is:
(i) W,
(ii) S or T,
(iii) J or M,
(iv) not H,
(v) a vowel ?
(b) Which letter is the most likely to be chosen?

10. Rachel buys a new CD, on which is her favourite track, 8 other tracks she
likes and 2 tracks that she does not like. She sets her CD player to play at
random. What is the probability that the first track it plays is:
(a) Rachel's favourite,
(b) a track that she likes,
(c) a track that she does not like ?


04.2 Outcomes with Two Events


When two events take place at the same time, it is important to list all the possible
outcomes in some way. There are three possible approaches: systematic listing,
using a table or using a tree diagram.

Example 1
Caitlin and Dave each buy a chocolate bar from a vending machine that sells Aero,
Bounty, Crunchie and Dime bars.
List the possible pairs of bars which Caitlin and Dave can choose.

Solution
Caitlin    Dave
A          A
A          B
A          C
A          D
B          A
B          B
B          C
B          D
C          A
C          B
C          C
C          D
D          A
D          B
D          C
D          D

Key: A = Aero, B = Bounty, C = Crunchie, D = Dime


Example 2
A fair dice is rolled and an unbiased coin is tossed. Draw a table to show the
possible outcomes.

Solution
                     DICE (possibilities for dice)
                     1     2     3     4     5     6
COIN            H    H1    H2    H3    H4    H5    H6
(possibilities  T    T1    T2    T3    T4    T5    T6
for coin)

The table shows that there are 12 possible outcomes.

Example 3
Draw a table to show all the possible total scores when two fair dice are thrown at
the same time.

Solution

                  DICE B
              1    2    3    4    5    6
          1   2    3    4    5    6    7
          2   3    4    5    6    7    8
DICE A    3   4    5    6    7    8    9
          4   5    6    7    8    9   10
          5   6    7    8    9   10   11
          6   7    8    9   10   11   12

The table shows that there are 36 possible outcomes, and gives the total score
for each outcome.
From the table it can be seen that there are 6 outcomes that give a score of 7.
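The table can also be generated by systematic listing in code. This is a sketch using `itertools.product` to pair every face of one dice with every face of the other. The variable names are assumptions for the example.

```python
from collections import Counter
from itertools import product

faces = range(1, 7)  # the six faces of a fair dice

# Every (dice A, dice B) pair: the 36 cells of the table.
outcomes = list(product(faces, repeat=2))

# How many outcomes give each total score.
totals = Counter(a + b for a, b in outcomes)

# Print the table of totals, one row per score on dice A.
for a in faces:
    print(" ".join(f"{a + b:2d}" for b in faces))

print("number of outcomes:", len(outcomes))
print("outcomes scoring 7:", totals[7])
```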

Example 4
Use a tree diagram to show the possible outcomes when two unbiased coins are
tossed.
Solution
Coin A    Coin B    OUTCOMES
H         H         HH
H         T         HT
T         H         TH
T         T         TT

The diagram shows that there are 4 possible outcomes.


Example 5
In a drawer there are some white socks and some black socks. Tim takes out one
sock and then a second. Draw a tree diagram to show the possible outcomes.

Solution
First Sock    Second Sock    OUTCOMES
B             B              BB
B             W              BW
W             B              WB
W             W              WW

There are four possible outcomes, of which two will produce two socks of the
same colour.

Exercises
1. Copy and complete the table to show all possible outcomes when 2 fair
coins are tossed.

               COIN B
               H    T
COIN A    H
          T

2. Two spinners are numbered 1 to 4 as shown in the diagram:
[Diagram: two spinners, each numbered 1, 2, 3, 4]
(a) Copy and complete the table below, to show all possible
outcomes when they are spun, writing the total score for
each outcome.

                  SPINNER B
                  1    2    3    4
             1
SPINNER A    2
             3
             4

(b) What is the total number of possible outcomes?
(c) How many outcomes give a score of 5 ?


3. Two fair dice are renumbered using −2, −1, 0, 1, 2, 3 instead of the
usual numbers. The two dice are thrown in the normal way.
(a) Draw a table to show the total score for each of the possible
outcomes.
(b) How many ways are there of scoring 0 ?

4. The two spinners shown in the diagram opposite are spun at the same time:
[Diagram: two spinners, one numbered 1, 2, 3 and the other numbered 1, 2, 3, 4, 5]
(a) Draw a table to show all possible
outcomes, and the total score for each outcome.
(b) How many different outcomes are there?
(c) How many outcomes give a score of 6 ?

5. In a bag there are red and blue counters. Two counters are taken out of the
bag at random.
(a) Copy and complete the tree diagram below, to show all outcomes:

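[Tree diagram to copy and complete, with columns "1st Counter", "2nd Counter"
and "OUTCOMES"; only the first branch R and the outcome RR are filled in]

(b) How many outcomes include a red counter?
(c) How many outcomes include a blue counter?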

6. (a) Draw a tree diagram to show all possible outcomes when two
unbiased coins are tossed.
(b) Extend your tree diagram to show the possible outcomes when three
unbiased coins are tossed.
(c) How many outcomes are there when three unbiased coins are tossed?
(d) How many outcomes are there when four unbiased coins are tossed?

7. In a jar there are three different types of sweets, eclairs, mints and toffees;
two sweets are taken at random.
(a) Draw a tree diagram to show the possible outcomes.
(b) How many of the outcomes include a toffee?
(c) How many of the outcomes include a mint and a toffee?

8. A red dice, a blue dice and a green dice are put into a bag; all the dice are
fair. One is then taken out and rolled. The colour of the dice and the score
shown are recorded.
(a) How many possible outcomes are there?
(b) How many outcomes include a 5 ?

9. In a game, two fair dice are rolled and the scores are multiplied together.
(a) Draw a table to show the possible outcomes and their scores.
(b) How many ways are there of scoring 12 ?
(c) How many ways are there of scoring 18 ?

10. A bag contains a mixture of red, green and white balls. Three balls are
taken at random from the bag.
(a) Write down all possible outcomes.
(b) How many outcomes include a red ball?
(c) How many outcomes include a red or a white ball?
(d) How many outcomes include a red and a green ball?

04.3 Probability Using Listings


When the outcomes for two events are equally likely, the probabilities of
particular outcomes can be found.

Example 1
Look at the list of chocolate bars which can be chosen by Caitlin and Dave in
Example 1 of section 04.2. What is the probability that they both choose the
same type of chocolate bar?

Solution
There are 16 different outcomes and all are equally likely.
In 4 of these outcomes both Caitlin and Dave choose the same type of bar.
So
$$p(\text{same type}) = \frac{4}{16} = \frac{1}{4}$$


Example 2
When two unbiased coins are tossed, determine the probability of obtaining:
(a) two heads,
(b) two tails,
(c) a head and a tail.

Solution
The table shows the possible outcomes:

       H     T
H      HH    HT
T      TH    TT

In this situation there are 4 outcomes that are equally likely.

(a) Here 1 of the 4 outcomes gives 2 heads, so
$$p(\text{2 heads}) = \frac{1}{4}$$

(b) Here 1 of the 4 outcomes gives 2 tails, so
$$p(\text{2 tails}) = \frac{1}{4}$$

(c) Here 2 of the 4 outcomes give a head and a tail, so
$$p(\text{head and a tail}) = \frac{2}{4} = \frac{1}{2}$$

Example 3
Two fair dice are rolled at the same time. What is the probability that the total
score is:
(a) 6,
(b) greater than 9,
(c) less than 7 ?

Solution
The table shows the possible outcomes.
There are 36 equally likely outcomes.

       1    2    3    4    5    6
1      2    3    4    5    6    7
2      3    4    5    6    7    8
3      4    5    6    7    8    9
4      5    6    7    8    9   10
5      6    7    8    9   10   11
6      7    8    9   10   11   12

(a) There are 5 outcomes that give a score of 6, so
$$p(6) = \frac{5}{36}$$


(b) There are 6 outcomes that give a score greater than 9, so
$$p(\text{greater than 9}) = \frac{6}{36} = \frac{1}{6}$$

(c) There are 15 outcomes that give scores of less than 7, so
$$p(\text{less than 7}) = \frac{15}{36} = \frac{5}{12}$$
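Because all 36 outcomes are equally likely, each probability is a count of favourable outcomes divided by 36. The sketch below reproduces the three answers by enumeration. The helper `probability` and its `event` argument are illustrative names, not part of the original text.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (first dice, second dice) outcomes.
outcomes = list(product(range(1, 7), repeat=2))


def probability(event):
    """Fraction of outcomes for which event(outcome) is true."""
    favourable = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(favourable, len(outcomes))


p_total_6 = probability(lambda o: sum(o) == 6)
p_greater_than_9 = probability(lambda o: sum(o) > 9)
p_less_than_7 = probability(lambda o: sum(o) < 7)

print(p_total_6, p_greater_than_9, p_less_than_7)
```

The same helper can be reused for the exercises that follow by changing the condition passed in.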

Exercises
1. Use information from the table in Example 3 to answer this question:
When two fair dice are thrown, what is the probability that the total score is:
(a) 9, (b) an odd number,
(c) greater than 10, (d) less than 8 ?

2. The diagram shows two spinners which are both spun.
[Diagram: two spinners, one numbered 1, 2, 3, 4 and the other numbered 1, 3, 5, 7]
What is the probability that the total score on the two spinners is:
(a) 7, (b) 6,
(c) greater than 10, (d) less than 5 ?

3. An unbiased coin is tossed and a fair dice is thrown. Use a table of
outcomes to determine the probability of each of the following:
(a) obtaining a head and a 3,
(b) obtaining a tail and an even number,
(c) obtaining a head and a prime number.

4. The two spinners shown in the diagram are both spun.
[Diagram: two spinners, one with sectors RED, BLUE, GREEN and the other with
sectors RED, BLUE, YELLOW]
(a) Draw up a table to show the possible outcomes.
(b) What is the probability that both
spinners show the same colour?
(c) What is the probability of obtaining a yellow and a red?
(d) What is the probability of obtaining a red and a blue?


5. The diagram shows two spinners that are spun at the same time:
[Diagram: two spinners, one numbered 1, 2, 3, 4 and the other numbered 0, 1, –1, 2]
Use a table to determine the probability of obtaining a total score of:
(a) 6 (b) 0 (c) 1 (d) 3

6. For the spinners in question 5, determine the probability of obtaining a total
score that is:
(a) an even number,
(b) greater than 1,
(c) less than 1,
(d) less than 6.

7. Two unbiased coins are tossed at the same time. What is the probability of
obtaining:
(a) at least one head,
(b) no heads ?

8. Three unbiased coins are tossed at the same time. Use a tree diagram to show
the outcomes and determine the probability of obtaining:
(a) 3 heads,
(b) at least 1 head,
(c) at least 2 heads.

9. Two fair dice are rolled and the scores on each dice are multiplied together to
give a total score. What is the probability of getting a total score:
(a) of 12,
(b) of 20,
(c) greater than 25,
(d) less than 30,
(e) that is an even number ?

10. If 4 unbiased coins are tossed at the same time, what is the probability of
obtaining the same number of heads as tails?


04.4 Multiplication Law for Independent Events


Probabilities can be assigned to tree diagrams, and then multiplication can be used
to determine the probabilities for combined events.
1st branch    2nd branch    OUTCOMES    PROBABILITIES
A  [p(A)]     A  [p(A)]     A A         p(A) × p(A)
A  [p(A)]     B  [p(B)]     A B         p(A) × p(B)
B  [p(B)]     A  [p(A)]     B A         p(B) × p(A)
B  [p(B)]     B  [p(B)]     B B         p(B) × p(B)

Note: Here we have an experiment with two possible outcomes, A and B, and the
experiment is repeated once. It is assumed that the probability of either A
or B remains the same when the experiment is repeated; in this case, we say
that A and B are independent events.

Example 1
Two fair dice are rolled. Use a tree diagram to determine the probability of
obtaining:
(a) 2 sixes, (b) 1 six, (c) no sixes.

Solution
The tree diagram is shown below:
1st branch      2nd branch      OUTCOMES        PROBABILITIES
6 (1/6)         6 (1/6)         6, 6            1/6 × 1/6 = 1/36
6 (1/6)         NOT 6 (5/6)     6, NOT 6        1/6 × 5/6 = 5/36
NOT 6 (5/6)     6 (1/6)         NOT 6, 6        5/6 × 1/6 = 5/36
NOT 6 (5/6)     NOT 6 (5/6)     NOT 6, NOT 6    5/6 × 5/6 = 25/36
                                                total = 36/36 = 1

(a) $$p(\text{2 sixes}) = \frac{1}{36}$$
(b) $$p(\text{1 six}) = \frac{5}{36} + \frac{5}{36} = \frac{10}{36} = \frac{5}{18}$$
(c) $$p(\text{no sixes}) = \frac{25}{36}$$

Note that these probabilities add up to 1. This will always be so when the
probabilities are added from the outcomes of the tree diagram. This is a very
useful means of checking your working.

Example 2
A bag contains 4 red balls and 3 green balls. A ball is taken out at random, and
then put back; a second ball is then taken from the bag. What is the probability
that:
(a) both balls are the same colour,
(b) at least one of the balls is green,
(c) the balls are of different colours?

Solution
Use a tree diagram:
1st Ball     2nd Ball     OUTCOMES    PROBABILITIES
R (4/7)      R (4/7)      R R         4/7 × 4/7 = 16/49
R (4/7)      G (3/7)      R G         4/7 × 3/7 = 12/49
G (3/7)      R (4/7)      G R         3/7 × 4/7 = 12/49
G (3/7)      G (3/7)      G G         3/7 × 3/7 = 9/49
                                      total = 49/49 = 1

(a) p(both the same) = p(R R or G G)
$$= p(RR) + p(GG) = \frac{16}{49} + \frac{9}{49} = \frac{25}{49}$$

(b) p(at least one green ball) = p(G G or G R or R G)
$$= p(GG) + p(GR) + p(RG) = \frac{9}{49} + \frac{12}{49} + \frac{12}{49} = \frac{33}{49}$$
or
$$= 1 - p(RR) = 1 - \frac{16}{49} = \frac{33}{49}$$

(c) p(both different colours) = p(R G or G R)
$$= p(RG) + p(GR) = \frac{12}{49} + \frac{12}{49} = \frac{24}{49}$$

Note: In probability questions of this type, 'or' means adding the probabilities.
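The two rules used in Example 2 (multiply along the branches, add across the outcomes) can be sketched as follows. The dictionary `branches` and the other names are assumptions made for illustration. Because the ball is replaced, the second draw uses the same probabilities as the first.

```python
from fractions import Fraction
from itertools import product

# Probability of each colour on any single draw (4 red, 3 green, with replacement).
p = {"R": Fraction(4, 7), "G": Fraction(3, 7)}

# Multiply along each branch of the tree: p(first) × p(second).
branches = {first + second: p[first] * p[second]
            for first, second in product(p, repeat=2)}

# 'or' means adding the probabilities of the relevant outcomes.
p_same = branches["RR"] + branches["GG"]
p_at_least_one_green = branches["GG"] + branches["GR"] + branches["RG"]
p_different = branches["RG"] + branches["GR"]

for outcome, prob in branches.items():
    print(outcome, prob)
print("total:", sum(branches.values()))
print(p_same, p_at_least_one_green, p_different)
```

The total over all four branches is 1, which is the check recommended for tree diagrams, and `p_at_least_one_green` equals 1 − p(RR).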

Example 3
On her way to work, Sylvia drives through three sets of traffic lights. The
probability of each set of lights being green is 0.3. What is the probability that
they are all green?

Solution
p (all green) = p (1st green and 2nd green and 3rd green )

= p (1st green ) × p (2nd green ) × p (3rd green )

= 0.3 × 0.3 × 0.3 [or 0.3³]


= 0.027

Note: In probability questions of this type, 'and' means multiplying the
probabilities.

Remember A tree diagram is drawn when it will help you to analyse a problem;
so if it will help, draw one. On the other hand, if you are able to solve a problem
without one (see Example 3 above), then do so.

Example 4
[Diagram: a model railway track with points A, B, C and D and junctions P, Q and R]
The diagram shows a model railway track. At each of the junctions P, Q and R,
the probability of a train going straight ahead is 2/3 and the probability of it
branching to the right is 1/3.


A train starts at point A.


(a) What is the probability that it reaches point C?
(b) What is the probability that it reaches point D ?

Solution
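(a) $$p(\text{right and straight and straight}) = \frac{1}{3} \times \frac{2}{3} \times \frac{2}{3} = \frac{4}{27}$$

(b) p((right and right) or (right and straight and right))
$$= \frac{1}{3} \times \frac{1}{3} + \frac{1}{3} \times \frac{2}{3} \times \frac{1}{3} = \frac{1}{9} + \frac{2}{27} = \frac{5}{27}$$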

Exercises
1. A bag contains 3 red balls and 2 blue balls. A ball is taken at random from
the bag and then put back. A second ball is then taken out of the bag.
What is the probability that:
(a) both balls are red,
(b) both balls are the same colour,
(c) at least one of the balls is red ?

2. Repeat question 1 for a bag with 7 red balls and 3 blue balls.

3. Two fair dice are rolled at the same time. Use a tree diagram to determine
the probability of obtaining:
(a) two even numbers,
(b) at least one even number,
(c) no even numbers.

4. Two fair dice are rolled at the same time. Use a tree diagram to determine
the probability of obtaining:
(a) two multiples of 3,
(b) exactly one multiple of 3,
(c) less than two multiples of 3.


5. A coin has been weighted, so that the probability of getting a head is 2/5 and
the probability of getting a tail is 3/5; the coin is thrown twice. Determine
the probability of obtaining:
(a) 2 heads, (b) no heads, (c) at least one head.

6. The spinner shown in the diagram is spun twice.
[Diagram: a spinner with four sectors labelled Red, Red, Red, Blue]
Use a tree diagram to determine the probability of obtaining:
(a) 2 reds, (b) at least one red,
(c) no reds.

7. The spinner in the diagram is spun twice. Determine
the probability of obtaining:
[Diagram: a spinner with five sectors labelled A, B, A, A, B]
(a) at least one A, (b) at least one B,
(c) two As, (d) two Bs.

8. The spinner in question 6 is spun 3 times. Use a tree diagram to determine
the probability of obtaining:
(a) 3 reds, (b) 2 reds, (c) at least 1 red.

9. A bag contains 1 red ball, 2 green balls and 4 yellow balls. A ball is taken
from the bag at random. The ball is then put back, and a second ball is
taken at random from the bag.
What is the probability that:
(a) both balls are the same colour,
(b) no yellow balls are taken out,
(c) at least one yellow ball is taken out?

10. Each of 10 balls is marked with a different number from 1 to 10. One ball
is taken at random and then replaced. A second ball is then taken at
random. Determine the probability that:
(a) both balls taken are marked with the number 5,
(b) both balls taken have even numbers,
(c) both balls taken have numbers which are multiples of 3,
(d) at least one of the balls taken has a number greater than 2.

11. On his way to work, Paul has to pass through 2 sets of traffic lights. The
probability that the first set of lights is green is 0.5, and the probability that
the second set of lights is green is 0.4.
What is the probability that both sets of lights are green?


12. On her way to the theatre, Sheila passes through 3 sets of traffic lights. The
probability that each set of lights is green is 1/3.
(a) What is the probability that none of the lights is green?
(b) What is the probability that two sets of lights are green and the other
set is not green?

13. [Diagram: a section of railway track with start A, junctions B, C, D and E,
and points P, Q and R]
The diagram shows a section of a railway track. At each of the junctions
B, C, D and E, the probability of going straight on is 3/4.
The train starts at A.
(a) What is the probability that it reaches P?
(b) What is the probability that it reaches Q?

14. [Diagram: paths from R with junctions W, X, Y and Z, leading to A (food),
B (nothing) and C (water)]
A rat leaves position R and starts walking towards B. If it reaches B it gets
nothing, if it reaches A it gets food and if it reaches C it gets water.
At each of the junctions W, X, Y and Z, the probability of going straight on
is 0.6 and the probability of branching off is 0.4.
(a) What is the probability that the rat gets food?
(b) What is the probability that the rat gets water?
(c) What is the probability that it gets nothing?

15. When two fair dice are thrown, what is the probability that the score on the
second dice is higher than the score on the first dice?


04.5 Conditional Probability


In some situations where events are repeated, the probabilities
will change after the first event. For example, consider a bag
containing 8 red balls and 3 blue balls.
The probability that a ball taken at random is red is 8/11.
If a second ball is taken out without the first ball being
replaced, then:

EITHER the first ball was red, so the probability that the second ball is red is
7/10, since there are 3 blue balls but only 7 red balls left,
OR the first ball was blue, so the probability that the second is red is
8/10 = 4/5, since there are 8 red balls but only 2 blue balls left.
Tree diagrams are very useful for this type of problem.

Example 1
A bag contains 7 yellow balls and 5 red balls. One ball is taken from the bag at
random, and is not replaced. A second ball is then taken from the bag.
Determine the probability that:
(a) both balls are red, (b) both balls are the same colour,
(c) the balls are different colours, (d) at least one ball is yellow.

Solution
The tree diagram below shows the probabilities and outcomes:
1st Ball      2nd Ball      OUTCOMES    PROBABILITIES
Y (7/12)      Y (6/11)      Y Y         7/12 × 6/11 = 42/132
Y (7/12)      R (5/11)      Y R         7/12 × 5/11 = 35/132
R (5/12)      Y (7/11)      R Y         5/12 × 7/11 = 35/132
R (5/12)      R (4/11)      R R         5/12 × 4/11 = 20/132
                                        total = 132/132 = 1


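(a) $$p(\text{both red}) = \frac{20}{132} = \frac{5}{33}$$

(b) $$p(\text{both the same colour}) = p(YY) + p(RR) = \frac{42}{132} + \frac{20}{132} = \frac{62}{132} = \frac{31}{66}$$

(c) p(different colours)
$$= 1 - p(\text{same colour}) = 1 - \frac{31}{66} = \frac{35}{66}$$
or
$$= p(YR) + p(RY) = \frac{35}{132} + \frac{35}{132} = \frac{70}{132} = \frac{35}{66}$$

(d) p(at least one yellow)
$$= \frac{42}{132} + \frac{35}{132} + \frac{35}{132} = \frac{112}{132} = \frac{28}{33}$$
or
$$= 1 - p(RR) = 1 - \frac{20}{132} = \frac{112}{132} = \frac{28}{33}$$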
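A sketch of the same tree in code, assuming a small dictionary of counts for the bag. The key difference from sampling with replacement is that the count of the colour just drawn is reduced by one before the second draw, so the second-stage probabilities depend on the first outcome.

```python
from fractions import Fraction

# Contents of the bag in Example 1: 7 yellow balls and 5 red balls.
counts = {"Y": 7, "R": 5}
total = sum(counts.values())

branches = {}
for first, first_count in counts.items():
    p_first = Fraction(first_count, total)

    # The first ball is not replaced, so one ball of that colour is gone.
    remaining = dict(counts)
    remaining[first] -= 1

    for second, second_count in remaining.items():
        p_second = Fraction(second_count, total - 1)
        branches[first + second] = p_first * p_second

p_both_red = branches["RR"]
p_same_colour = branches["YY"] + branches["RR"]
p_different = 1 - p_same_colour
p_at_least_one_yellow = 1 - branches["RR"]

print(branches)
print(p_both_red, p_same_colour, p_different, p_at_least_one_yellow)
```

`Fraction` reduces each result automatically, so 20/132 is shown as 5/33 and 42/132 as 7/22.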

Example 2
There are 4 boys and 5 girls who are hoping to be selected for a school quiz team.
Two of them are selected at random to be in the team.
Determine the probability that:
(a) 2 boys are chosen,
(b) at least 1 girl is chosen,
(c) 1 girl and 1 boy are chosen.


Solution
The tree diagram below shows the outcomes and the probabilities:
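1st branch    2nd branch    OUTCOMES    PROBABILITIES
G (5/9)       G (4/8)       G G         5/9 × 4/8 = 20/72 = 5/18
G (5/9)       B (4/8)       G B         5/9 × 4/8 = 20/72 = 5/18
B (4/9)       G (5/8)       B G         4/9 × 5/8 = 20/72 = 5/18
B (4/9)       B (3/8)       B B         4/9 × 3/8 = 12/72 = 3/18
                                        total = 18/18 = 1

(a) $$p(\text{2 boys}) = \frac{3}{18} = \frac{1}{6}$$

(b) $$p(\text{at least 1 girl}) = \frac{5}{18} + \frac{5}{18} + \frac{5}{18} = \frac{15}{18} = \frac{5}{6}$$

(c) $$p(\text{1 boy and 1 girl}) = \frac{5}{18} + \frac{5}{18} = \frac{10}{18} = \frac{5}{9}$$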

Note: The questions in Examples 1 and 2 could have been answered without the
use of tree diagrams, but a tree diagram helps greatly with the analysis of
the problem; the same is true for the next example.

Example 3
The probability that Ravi does his homework is 1/10 if he goes out with his friends
and 3/5 if he does not go out with his friends. The probability that Ravi goes out
with his friends is 3/4. What is the probability that Ravi does his homework?
4

Solution
Solution 1

p((goes out and does homework) or (does not go out and does homework))
= p(goes out) × p(does homework if he goes out)
  + p(does not go out) × p(does homework if he does not go out)
$$= \frac{3}{4} \times \frac{1}{10} + \frac{1}{4} \times \frac{3}{5} = \frac{3}{40} + \frac{3}{20} = \frac{9}{40}$$

Solution 2
1st branch       2nd branch          OUTCOMES    PROBABILITIES
out (3/4)        does (1/10)         O D         3/4 × 1/10 = 3/40
out (3/4)        does not (9/10)     O D'        3/4 × 9/10 = 27/40
not out (1/4)    does (3/5)          O' D        1/4 × 3/5 = 3/20
not out (1/4)    does not (2/5)      O' D'       1/4 × 2/5 = 2/20 = 1/10

Note: O' means does not go out, and D' means does not do homework.
$$p(\text{does homework}) = \frac{3}{40} + \frac{3}{20} = \frac{9}{40}$$
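Both solutions add the probabilities of the two branches that end in "does homework". A minimal sketch of that calculation is given below. The variable names are assumptions chosen for readability.

```python
from fractions import Fraction

# Probabilities given in Example 3.
p_out = Fraction(3, 4)                     # Ravi goes out with his friends
p_homework_if_out = Fraction(1, 10)        # does homework, given he goes out
p_homework_if_not_out = Fraction(3, 5)     # does homework, given he stays in

# The complementary first-stage branch.
p_not_out = 1 - p_out

# Multiply along each branch that ends in homework, then add the branches.
branch_out = p_out * p_homework_if_out
branch_not_out = p_not_out * p_homework_if_not_out
p_homework = branch_out + branch_not_out

print(branch_out, branch_not_out, p_homework)
```

The second-stage probabilities differ between the two branches, which is what makes this a conditional probability problem rather than a case of independent events.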

Exercises
1. A bag contains 3 pink balls and 2 blue balls. One ball is taken out at random
and not replaced. A second ball is then taken out.
Determine the probability that:
(a) both balls are pink,
(b) both balls are the same colour,
(c) at least one ball is blue.


2. In Tim's drawer there are 6 black socks and 5 white socks. He takes out two
socks at random. What is the probability that he has taken two socks of the
same colour?

3. In a tennis club there are 5 boys and 3 girls in a training squad. Two are
chosen at random to represent the club.
Determine the probability that they are:
(a) both boys,
(b) both girls,
(c) a boy and a girl.

4. Tara has five 10p coins and four 20p coins in her purse. She takes out two
coins at random. What is the probability that she takes out at least 30p?

5. There are 8 footballs in a store cupboard; one is yellow and the others are
white. A pupil takes 2 footballs out of the cupboard at random. What is the
probability that one of them is the yellow ball?

6. The probability of Jeremy passing a maths exam is 2/3 if he revises and 1/3 if
he does not revise. The probability that he revises is 1/4. What is the
probability of Jeremy passing the maths exam?

7. The probability of Jenny getting to work on time is 0.8 if she gets up before
7 a.m. and 0.4 if she does not get up before 7 a.m. The probability that
Jenny gets up before 7 a.m. is 0.7. What is the probability that Jenny is late
for work?

8. Ian is an inept mountaineer who tends to fall from rock faces. The
probability that he falls is 0.2 if the weather is dry but rises to 0.5 if it is
wet. The probability of wet weather is 0.3. Determine the probability that
Ian falls.

9. A bag contains 7 blue counters, 5 green counters, 2 black counters and
1 white counter. 3 counters are taken at random from the bag, without
replacement. What is the probability that they are all the same colour?

10. Peter and Jane play a game in which they each in turn take a counter at
random from a bag containing 7 red counters and 3 yellow counters. The
winner is the first to get a red counter. Jane goes first. By drawing a tree
diagram, determine the probability that Peter wins the game.

Zamzam University science & Technology

Chapter five: Estimation and Confidence Intervals

Learning Objectives
When you have completed this chapter, you will be able to:
1. Define a parameter and a statistic, with examples.
2. Define a point estimate. Define level of confidence.
3. Construct a confidence interval for the population mean
when the population standard deviation is known.
4. Construct a confidence interval for a population mean when
the population standard deviation is unknown.
5. Construct a confidence interval for a population proportion.
6. Determine the sample size for attribute and variable
sampling

Lecturer: Abdirahman Macalim Ibrahim Abdi



Introduction
 This chapter considers several important aspects of
sampling. We begin by studying point estimates. A point
estimate is a single value (point) derived from a sample
and used to estimate a population value. For example,
suppose we select a sample of 50 junior executives and
ask how many hours they worked last week. We compute the
mean of this sample of 50 and use the sample mean as a
point estimate of the unknown population mean. However,
a point estimate is only a single value. A more
informative approach is to present a range of values in
which we expect the population parameter to occur. Such
a range of values is called a confidence interval.
1. Parameter and statistics
 Population: the totality or collection of all objects or
individuals on which observations are taken on the
basis of some characteristic of the objects in any field.
 Parameter: any numerical value describing a
characteristic of a population is called a parameter.
 It is customary to represent parameters by
Greek letters. By tradition the arithmetic mean
of a population is denoted by the Greek letter µ (mu);
similarly, the population variance is denoted by σ²,
the correlation coefficient by ρ, the regression
coefficient by β, the proportion by π, etc.
 Statistic: any numerical value describing a
characteristic of a sample is called a statistic.

 A statistic is usually represented by a lowercase letter
of the English alphabet. The sample arithmetic mean is
denoted by x̄, the sample variance by s², the correlation
coefficient by r, the regression coefficient by b, the
sample proportion by p, etc.
2. Point and Interval Estimates
 A point estimate is the statistic, computed from
sample information, which is used to estimate the
population parameter.
 Example
Recent medical studies indicate that exercise is an
important part of a person’s overall health. The
director of human resources at OCF, a large glass
manufacturer, wants an estimate of the number of
hours per week students spend exercising. A sample
of 70 students reveals the mean number of hours of
exercise last week is 3.3. The point estimate of 3.3
hours estimates the unknown population mean.
a. A confidence interval estimate: is a range of
values constructed from sample data so that the
population parameter is likely to occur within that
range at a specified probability. The specified
probability is called the level of confidence.


Factors Affecting Confidence Interval Estimates

The factors that determine the width of a confidence interval are:

1. The sample size, n.
2. The variability in the population, σ, usually estimated by s.
3. The desired level of confidence.

Confidence intervals


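3. Confidence interval estimate when Population Standard Deviation Known

$\bar{x} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$

b. Where:
x̄ is the sample mean
z_{α/2} is the critical value of the standard normal distribution (the reliability factor)
α is the significance level; the level of confidence is 1 − α
σ is the population standard deviation
n is the sample size
µ is the population mean
c. Use the Z-distribution if the population standard deviation is known or the sample
size is greater than 30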


Example 1
Find the reliability factor, $z_{\alpha/2}$, to estimate
the mean, µ, of a normally distributed
population with known population variance
for the following.
a. 93% confidence level.
b. 96% confidence level.
c. 80% confidence level
Solution
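
The reliability factors asked for in Example 1 can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `reliability_factor` is an illustrative assumption, and the values are rounded to two decimals as they would be read from a z table.

```python
from statistics import NormalDist

def reliability_factor(confidence):
    """Return z_{alpha/2}, the value leaving alpha/2 in each tail of the standard normal."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

# The three confidence levels from parts a, b and c.
for level in (0.93, 0.96, 0.80):
    print(f"{level:.0%} confidence: z = {reliability_factor(level):.2f}")
```

A z table gives the same values after rounding, because it tabulates the same cumulative normal probabilities.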


Example 2
 A college admissions officer for a Health
Sciences HS program has determined that
historically applicants have undergraduate
grade point averages that are normally
distributed with standard deviation 0.45. From
a random sample of 49 applications from the
current year, the sample mean grade point
average is 2.90.
a. Find a 95% confidence interval for the
population mean

b. Give comment

Solution

Interpretation
Based on a sample of 49 applicants, a 95% confidence
interval for the unknown population mean undergraduate
grade point average of Health Sciences applicants
extends from approximately 2.77 to
approximately 3.03.
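
The interval in Example 2 can be reproduced with a short calculation. This is a minimal sketch assuming the usual formula x̄ ± z·σ/√n for a known population standard deviation; the function name `z_interval` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence=0.95):
    """Confidence interval for the mean when the population sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / sqrt(n)          # margin of error
    return xbar - margin, xbar + margin

# Example 2: n = 49 applications, sample mean 2.90, sigma = 0.45.
low, high = z_interval(xbar=2.90, sigma=0.45, n=49)
print(f"95% CI: ({low:.3f}, {high:.3f})")
```

The margin of error here is about 0.126 grade points, which is what produces the rounded endpoints quoted in the interpretation.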


4. Construct a confidence interval for a population mean when the population
standard deviation is unknown.
 If the population standard deviation σ is unknown, we
can substitute the sample standard deviation, s.
 So we use the t distribution instead of the normal
distribution
 The formula is

$\bar{x} \pm t_{n-1,\,\alpha/2}\,\frac{s}{\sqrt{n}}$

 where

x̄ : is the sample mean
t_{n−1, α/2} : is the critical value of the t distribution with
n − 1 d.f. and an area of α/2 in each tail
α : is the significance level; the level of confidence is 1 − α
s : is the sample standard deviation
n : is the sample size
µ : is the population mean
Use t-distribution
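
As a minimal sketch of the t-based interval, the function below takes the critical value as an argument, since it is normally read from a Student's t table. The sample figures are hypothetical and are not taken from the text; 2.064 is the tabled value for 24 degrees of freedom with 0.025 in each tail.

```python
from math import sqrt

def t_interval(xbar, s, n, t_crit):
    """Confidence interval xbar ± t * s / sqrt(n); t_crit comes from a t table with n - 1 d.f."""
    margin = t_crit * s / sqrt(n)
    return xbar - margin, xbar + margin

# Hypothetical sample (not from the text): n = 25, mean 50, standard deviation 5.
# 95% confidence with 24 d.f. gives t = 2.064 in a standard t table.
low, high = t_interval(xbar=50.0, s=5.0, n=25, t_crit=2.064)
print(f"95% CI: ({low:.3f}, {high:.3f})")
```

Passing the critical value in keeps the sketch free of third-party libraries; a statistics package could supply the same value directly.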


Student’s t Table


Example 3

5. Confidence Intervals for the Population Proportion


Example 4

Smoking and college education: The tobacco industry
closely monitors all surveys that involve smoking. One
survey showed that among 100 randomly selected subjects
who completed four years of health science in college, 25%
are smokers (based on data from the American Medical
Association). Construct the 95% confidence interval for the true
percentage of smokers among all students who completed
four years of health science.

Solution

Interpretation: We are 95% confident that the
true percentage of smokers among all students who
completed four years of health science is between approximately 17%
and 33%.
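
The interval quoted in the interpretation can be checked with the large-sample formula p ± z·√(p(1 − p)/n). The sketch below assumes that formula applies here; the function name `proportion_interval` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(p_hat, n, confidence=0.95):
    """Large-sample confidence interval for a population proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Example 4: 25% smokers in a sample of 100.
low, high = proportion_interval(p_hat=0.25, n=100)
print(f"95% CI: ({low:.3f}, {high:.3f})")
```

The endpoints come out near 0.165 and 0.335, in line with the rounded percentages in the interpretation.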

6. Sample Size Determination for a Variable

To find the sample size for a variable:

$n = \left(\frac{z\,\sigma}{E}\right)^2$

 Where:

z: is the value corresponding to the selected level of confidence
E: is the margin of error (the maximum allowable error)
σ: is the population standard deviation

Example 5

An economist wants to estimate the mean income
for the first year of work for college graduates who
majored in midwifery. How many such incomes
must be found if we want to be 95% confident that
the sample mean is within $500 of the true
population mean? Assume that a previous study has
revealed that for such incomes, σ = $6250. How
large a sample is required?

Solution

$n = \left(\frac{z\,\sigma}{E}\right)^2 = \left(\frac{(1.96)(6250)}{500}\right)^2 = (24.5)^2 = 600.25$, which is rounded up to $n = 601$.
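
The rounding step in Example 5 matters: the sample size is always rounded up, never to the nearest integer. A minimal sketch follows, with z = 1.96 passed in as read from the table; the function name `sample_size_mean` is illustrative.

```python
from math import ceil

def sample_size_mean(z, sigma, error):
    """Smallest n for which the margin of error z * sigma / sqrt(n) is at most `error`."""
    return ceil((z * sigma / error) ** 2)

# Example 5: 95% confidence (z = 1.96), sigma = $6250, within $500 of the mean.
n_required = sample_size_mean(z=1.96, sigma=6250, error=500)
print(n_required)
```

Rounding up guarantees that the achieved margin of error does not exceed the target.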


Sample Size for Proportions

The formula for determining the sample size in the
case of a proportion is

$n = p(1-p)\left(\frac{z}{E}\right)^2$

where:
p is the estimate from a pilot study or some other source;
otherwise, 0.50 is used
z is the z-value for the desired confidence level
E is the maximum allowable error
Example 6
The American Kennel Club wanted to estimate the
proportion of children that have a dog as a pet. If the club
wanted the estimate to be within 3% of the population
proportion, how many children would they need to
contact? Assume a 95% level of confidence and that the
club estimated that 30% of the children have a dog as a
pet.


Solution

$n = (0.30)(0.70)\left(\frac{1.96}{0.03}\right)^2 = 896.37$, which is rounded up to $n = 897$.


Chapter six: One Sample Tests of Hypothesis

Learning Objectives
When you have completed this chapter, you will be able to:
 Define a hypothesis and hypothesis testing.
 Describe the five-step hypothesis-testing procedure.
 Distinguish between a one-tailed and a two-tailed test of hypothesis.
 Define Type I and Type II errors.
 Conduct a test of hypothesis about a population mean.
 Conduct a test of hypothesis about a population proportion.


1. What is a Hypothesis and Hypothesis Testing?
 A hypothesis is a statement about the
value of a population parameter
developed for the purpose of testing.
 A hypothesis is a claim (assumption)
about a population parameter.
 Examples of hypotheses made about a
population parameter are:

 Hypothesis testing is a procedure,
based on sample evidence and
probability theory, used to determine
whether the hypothesis is a reasonable
statement and should not be rejected, or
is unreasonable and should be rejected.
2. Five-Step Procedure for Testing a Hypothesis

There is a five-step procedure that systematizes hypothesis
testing; when we get to step 5, we are ready to reject or
not reject the hypothesis. However, hypothesis testing as
used by statisticians does not provide proof that something
is true, in the manner in which a mathematician “proves” a
statement. It does provide a kind of “proof beyond a
reasonable doubt,” in the manner of the court system.
Hence, there are specific rules of evidence, or procedures,
that are followed. The steps are shown in the following
diagram. We will discuss in detail each of the steps.


Step 1: State the Null Hypothesis (H0) and the Alternate Hypothesis (H1)
The Null Hypothesis, H0
NULL HYPOTHESIS: A statement about the value of a
population parameter developed for the purpose of
testing numerical evidence.
The Null Hypothesis, H0:

The Alternative Hypothesis, H1


Step 2: Select a Level of Significance


LEVEL OF SIGNIFICANCE: The probability of
rejecting the null hypothesis when it is true.

Level of Significance:

Level of Significance and the Rejection Region


Outcomes and Probabilities


3. Errors in Making Decisions

Type I & II Error Relationship


Factors Affecting Type II Error

Step 3: Select the Test Statistic

 There are many test statistics. In this chapter, we use
both z and t as the test statistic.
 In later chapters, we will use such test statistics as F
and χ², called chi-square.

 TEST STATISTIC: A value, determined from sample
information, used to determine whether to reject the
null hypothesis.

Step 4: Formulate the Decision Rule

 CRITICAL VALUE: The dividing point between the
region where the null hypothesis is rejected and the
region where it is not rejected.

Step 5: Make a Decision

 The fifth and final step in hypothesis testing is
computing the test statistic, comparing it to the
critical value, and making a decision to reject or not
to reject the null hypothesis.
 The alternative hypothesis may lead to either a one-tail test or a
two-tail test.

4. One-tailed and two-tailed tests of hypothesis
 One-Tail Tests

Upper-Tail Tests


Lower-Tail Tests


Two-Tail Tests

5. Hypothesis Tests for the Mean


Test of Hypothesis for the Mean (σ Known)


 Assume the population is normal
 Convert sample result (mean) to a z value

Example1
Claim: the mean body temperature of healthy adults is
equal to 98.6℉. Sample data: n = 106, x̄ = 98.20℉. Assume
that σ = 0.62℉ and the significance level is α = 0.05.

Solution

Step 1: the Null Hypothesis (H0) and the Alternate Hypothesis (H1)

H0: µ = 98.6℉

The mean body temperature of healthy adults is equal to 98.6℉.

H1: µ ≠ 98.6℉

The mean body temperature of healthy adults is not equal to 98.6℉.

Step 2: the Level of Significance

𝜶 = 𝟎. 𝟎𝟓


Step 3: the Test Statistic

$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{98.2 - 98.6}{0.62/\sqrt{106}} = -6.642$

Step 4: Formulate the Decision Rule

Step 5: Make a Decision

Reject the null hypothesis H0, because the test statistic
z = −6.642 is less than the critical value z = −1.96. We can
conclude that the mean body temperature of healthy adults
is not equal to 98.6℉.
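
The five steps of Example 1 can be traced in a few lines of code. This is a sketch of a two-tailed z test with known σ; the function name `one_sample_z` and the variable names are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z(xbar, mu0, sigma, n):
    """Test statistic z = (xbar - mu0) / (sigma / sqrt(n))."""
    return (xbar - mu0) / (sigma / sqrt(n))

alpha = 0.05                                     # Step 2: level of significance
z = one_sample_z(98.20, 98.6, 0.62, 106)         # Step 3: test statistic
z_crit = NormalDist().inv_cdf(1 - alpha / 2)     # Step 4: two-tailed critical value
reject_h0 = abs(z) > z_crit                      # Step 5: decision
print(f"z = {z:.3f}, critical value = ±{z_crit:.2f}, reject H0: {reject_h0}")
```

Comparing |z| with the upper critical value is equivalent to checking both tails of the rejection region.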

Example 2
The waiting time for patients of a hospital follows a normal
distribution with a mean of 3 minutes and a population
standard deviation of 1 minute. The ENT department
sampled 50 patients and found that the mean waiting time
was 2.75 minutes. At the .05 significance level, can we
conclude that the mean waiting time is less than 3
minutes?
Solution

$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{2.75 - 3}{1/\sqrt{50}} = \frac{-0.25}{1/7.071} = -1.76775$

This is a lower-tail test, so H0 is rejected if Z < −1.645. Because
−1.768 is less than −1.645, we reject H0 and conclude that the mean
waiting time is less than 3 minutes.
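
The same lower-tail test can also be read through a p-value, which is the probability of a z value at least this far below zero when H0 is true. The sketch assumes H0: µ ≥ 3 against H1: µ < 3; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Example 2: mu0 = 3 minutes, sigma = 1 minute, n = 50, sample mean 2.75 minutes.
mu0, sigma, n, xbar = 3.0, 1.0, 50, 2.75
alpha = 0.05

z_wait = (xbar - mu0) / (sigma / sqrt(n))
p_value = NormalDist().cdf(z_wait)      # area in the lower tail below z
reject_lower = p_value < alpha          # reject H0 when the p-value is below alpha
print(f"z = {z_wait:.4f}, p-value = {p_value:.4f}, reject H0: {reject_lower}")
```

Rejecting when the p-value is below α gives the same decision as comparing z with the lower-tail critical value.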

t -Test for the mean (σ unknown)


Testing for the Population Mean: Population Standard Deviation Unknown
 When the population standard deviation (σ) is
unknown, the sample standard deviation (s) is
used in its place
 The t distribution is used for the test statistic, which
is computed using the formula:

$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$, with n − 1 degrees of freedom


Example 3
The average cost of a hospital room in New
York City is assumed to be µ = $168 per night. A
random sample of 25 hospitals gives x̄ = $172.5
and s = $15.4. Test the hypothesis at the α = 0.05
level (assume the population distribution is
normal).
Solution
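
A sketch of the calculation for Example 3 follows. It assumes a two-tailed test (H0: µ = 168 against H1: µ ≠ 168), since the example does not state a direction, and it takes the critical value 2.064 from a t table for 24 degrees of freedom with 0.025 in each tail. The function name `one_sample_t` is illustrative.

```python
from math import sqrt

def one_sample_t(xbar, mu0, s, n):
    """Test statistic t = (xbar - mu0) / (s / sqrt(n)), with n - 1 degrees of freedom."""
    return (xbar - mu0) / (s / sqrt(n))

t_stat = one_sample_t(xbar=172.5, mu0=168, s=15.4, n=25)
t_crit = 2.064                        # assumed: t table, 24 d.f., two-tailed alpha = 0.05
reject_t = abs(t_stat) > t_crit
print(f"t = {t_stat:.3f}, critical value = ±{t_crit}, reject H0: {reject_t}")
```

Under these assumptions the statistic falls inside the non-rejection region, so the sample would not contradict the assumed mean of $168.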


6. Tests of Population Proportion


Example 4
MetLife claims that 8% of Dhaka city residents
are graduates in Health Science. To test this claim, a
random sample of 500 residents was surveyed, and
25 were found to be graduates in Health Science. Test the
hypothesis at the α = 0.05 significance level.

Solution
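
A sketch of the proportion test for Example 4 is given below. It assumes the usual large-sample statistic z = (p̂ − π0)/√(π0(1 − π0)/n) and a two-tailed alternative (H0: π = 0.08 against H1: π ≠ 0.08); both the formula choice and the function name `one_proportion_z` are assumptions, because the worked solution is not reproduced in the text.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z(successes, n, pi0):
    """z statistic for a sample proportion against a hypothesised proportion pi0."""
    p_hat = successes / n
    return (p_hat - pi0) / sqrt(pi0 * (1 - pi0) / n)

alpha = 0.05
z_prop = one_proportion_z(successes=25, n=500, pi0=0.08)
z_crit_prop = NormalDist().inv_cdf(1 - alpha / 2)
reject_prop = abs(z_prop) > z_crit_prop
print(f"z = {z_prop:.3f}, critical value = ±{z_crit_prop:.2f}, reject H0: {reject_prop}")
```

Note that the standard error uses the hypothesised proportion 0.08, not the sample proportion 0.05, because the test is carried out assuming H0 is true.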


Chapter Seven: Two-sample Tests of Hypothesis

Learning Objectives
When you have completed this chapter, you will be able to:
 Test a hypothesis about the means of two independent
populations with known population
standard deviations.
 Conduct a test of a hypothesis about the
difference between two population
proportions.
 Compare population means with
unknown population standard deviations.
 Conduct a test of a hypothesis about the


Two Sample Tests


1. Testing a hypothesis about two independent population
means with known population standard deviations


Hypothesis Tests


Decision Rules

Example1

Women’s height is a suspected factor for difficult
deliveries, that is, shorter women are more likely to have
Caesarean sections. A medical researcher found in a
sample of 45 women who had a normal delivery that their
mean height was 61.4 inches. A second sample of 39
women who had a Caesarean section had a mean height of
60.6 inches. Assume that the population of heights of
normal deliveries has a population standard deviation of
1.2 inches. Also assume that the heights of the population
of women who had Caesarean section births have a
standard deviation of 1.1 inches. Are those who had a
Caesarean section shorter? Use the .05 significance level.
Solution
Step 1: the Null Hypothesis (H0) and the Alternate
Hypothesis (H1)

𝐻0 : 𝜇1 ≤ 𝜇2
𝐻1: 𝜇1 > 𝜇2
Step 2: the Level of Significance

𝜶 = 𝟎. 𝟎𝟓

Step 3: the Test Statistic

$Z = \frac{61.4 - 60.6}{\sqrt{\frac{(1.2)^2}{45} + \frac{(1.1)^2}{39}}} = 3.187$

Step 4: Formulate the Decision Rule

If the Z statistic is greater than the critical value of Z (1.65), reject H0.

Step 5: Make a Decision

Reject the null hypothesis. It is reasonable to conclude that those who
had a Caesarean section are shorter.
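
The two-sample z statistic in Example 1 can be reproduced as follows. The sketch assumes independent samples with known population standard deviations, as stated in the example; the function name `two_sample_z` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z(xbar1, sigma1, n1, xbar2, sigma2, n2):
    """z statistic for the difference of two means with known population sigmas."""
    standard_error = sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)
    return (xbar1 - xbar2) / standard_error

alpha = 0.05
z_heights = two_sample_z(61.4, 1.2, 45, 60.6, 1.1, 39)
z_upper = NormalDist().inv_cdf(1 - alpha)    # one-tailed (upper) critical value
reject_heights = z_heights > z_upper
print(f"z = {z_heights:.3f}, critical value = {z_upper:.3f}, reject H0: {reject_heights}")
```

The exact one-tailed critical value is about 1.645, which the text rounds to 1.65; the decision is the same either way.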


2. Two Sample Tests of Proportions

Hypothesis tests for the difference between two
population proportions, $P_x - P_y$

Assumptions: sample sizes are large, n P(1 – P) > 9

 Where:

$\hat{p}_x = \frac{x}{n_x} = \frac{\text{number of successes in the first sample}}{\text{size of the first sample}}$

$\hat{p}_y = \frac{y}{n_y} = \frac{\text{number of successes in the second sample}}{\text{size of the second sample}}$


Example 2

Exercise and Heart Disease: In a study of women
and heart disease, the following sample results were
obtained: among 10239 women with a low level of
physical activity (less than 200 kcal/wk), there were
101 cases of heart disease. Among 9877 women with
physical activity measured between 200 and 600 kcal/wk,
there were 56 cases of heart disease. Test the claim
that the proportion of heart disease cases among women
with a low level of physical activity is equal to the
proportion among women with physical activity between
200 and 600 kcal/wk, at the α = 0.05 significance level.
Solution
Step 1: the Null Hypothesis (H0) and the Alternate
Hypothesis (H1)

$H_0: P_x = P_y$

The proportion of heart disease cases among women with a low
level of physical activity is equal to the proportion among
women with physical activity.

$H_1: P_x \neq P_y$

The proportion of heart disease cases among women with a low
level of physical activity is not equal to the proportion among
women with physical activity.

Step 2: the Level of Significance

𝜶 = 𝟎. 𝟎𝟓

Step 3: the Test Statistic

$n_x = 10239$, $x = 101$

$n_y = 9877$, $y = 56$

$\hat{p}_x = \frac{x}{n_x} = \frac{101}{10239} = 0.00986$

$\hat{p}_y = \frac{y}{n_y} = \frac{56}{9877} = 0.00567$

$\hat{p}_0 = \frac{n_x\hat{p}_x + n_y\hat{p}_y}{n_x + n_y} = \frac{(10239)(0.00986) + (9877)(0.00567)}{10239 + 9877} = \frac{157}{20116} = 0.0078$

$Z = \frac{\hat{p}_x - \hat{p}_y}{\sqrt{\frac{\hat{p}_0(1 - \hat{p}_0)}{n_x} + \frac{\hat{p}_0(1 - \hat{p}_0)}{n_y}}} = \frac{0.00986 - 0.00567}{\sqrt{\frac{0.0078(1 - 0.0078)}{10239} + \frac{0.0078(1 - 0.0078)}{9877}}} = 3.377$

Step 4: Formulate the Decision Rule


Step 5: Make a Decision

Reject the null hypothesis H0, because the test statistic
z = 3.377 is greater than the critical value z = 1.96.
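
The pooled-proportion calculation in Example 2 can be sketched in code. The function name `two_proportion_z` is illustrative; because the code carries full precision instead of the rounded intermediate values, the statistic comes out slightly different in the third decimal from the hand calculation.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z(x, n_x, y, n_y):
    """z statistic for the difference of two proportions, using the pooled estimate."""
    p_x, p_y = x / n_x, y / n_y
    p_pooled = (x + y) / (n_x + n_y)
    standard_error = sqrt(p_pooled * (1 - p_pooled) * (1 / n_x + 1 / n_y))
    return (p_x - p_y) / standard_error

alpha = 0.05
z_heart = two_proportion_z(x=101, n_x=10239, y=56, n_y=9877)
z_two_sided = NormalDist().inv_cdf(1 - alpha / 2)
reject_heart = abs(z_heart) > z_two_sided
print(f"z = {z_heart:.3f}, critical value = ±{z_two_sided:.2f}, reject H0: {reject_heart}")
```

The pooled estimate is used because, under H0, both samples estimate the same population proportion.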

3. Comparing Population Means with Unknown
Population Standard Deviations


a. Equal Population Standard Deviations: assumptions
1. We assume the sampled populations have equal but
unknown standard deviations. Because of this
assumption, we combine or “pool” the sample
standard deviations.
2. We use the t distribution as the test statistic.
The following formula is used to pool the sample
standard deviations. Notice that two factors are
involved: the number of observations in each sample
and the sample standard deviations themselves.

$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$

 Where:
$s_1^2$ is the variance (standard deviation squared)
of the first sample
$s_2^2$ is the variance (standard deviation squared)
of the second sample

The value of t is computed from:

$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$

 Where:
$\bar{x}_1$ is the mean of the first sample.
$\bar{x}_2$ is the mean of the second sample.
$n_1$ is the number of observations in the first sample.
$n_2$ is the number of observations in the second sample.
$s_p^2$ is the pooled estimate of the population variance.
 The number of degrees of freedom in the test is the
total number of items sampled minus the total
number of samples. Because there are two samples,
there are n1 + n2 - 2 degrees of freedom.

 To summarize, there are three requirements or
assumptions for the test.

1. The sampled populations follow the normal distribution.

2. The sampled populations are independent.

3. The standard deviations of the two populations are equal.

Example3

A Health science professor is interested in comparing
the characteristics of students who do and do not vote
in national elections. For a random sample of 14
students who claimed to have voted in the last
presidential election, she found a mean grade point
average of 2.71 and a standard deviation of 0.64. For
an independent random sample of 23 students who did
not vote, the mean grade point average was 2.79 and
the standard deviation was 0.56. Test whether the mean
grade point average of students who claimed to have
voted in the last presidential election is greater than
that of students who did not vote, at α = 0.05.

Solution
Step 1: the Null Hypothesis (H0) and the Alternate
Hypothesis (H1)

$H_0: \mu_1 \le \mu_2$
The mean grade point average of students who claimed to
have voted in the last presidential election is less than
or equal to that of students who did not vote.
$H_1: \mu_1 > \mu_2$
The mean grade point average of students who claimed to
have voted in the last presidential election is greater
than that of students who did not vote.


Step 2: the Level of Significance

𝜶 = 𝟎. 𝟎𝟓

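Step 3: the Test Statistic

$n_1 = 14$, $\bar{x}_1 = 2.71$, $s_1 = 0.64$
$n_2 = 23$, $\bar{x}_2 = 2.79$, $s_2 = 0.56$

$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{(14 - 1)(0.64)^2 + (23 - 1)(0.56)^2}{14 + 23 - 2} = 0.349257$

$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{2.71 - 2.79}{\sqrt{0.349\left(\frac{1}{14} + \frac{1}{23}\right)}} = -0.399$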

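Step 4: Formulate the Decision Rule

If $t > t_{df,\,\alpha}$, reject the null hypothesis.

Degrees of freedom: $n_1 + n_2 - 2 = 14 + 23 - 2 = 35$

$t_{df,\,\alpha} = t_{35,\,0.05} = 1.690$ (one-tailed; 2.030 is the two-tailed value, with 0.025 in each tail)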

Step 5: Make a Decision

Do not reject the null hypothesis H0, because the test
statistic (t = −0.399) is less than the critical value of t.
The data do not support the claim that the mean grade point
average of students who claimed to have voted in the last
presidential election is greater than that of students who
did not vote.
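
The pooled-variance calculation in Example 3 can be sketched as follows. The function name `pooled_t` is illustrative. The code carries full precision, so the statistic differs slightly in the fourth decimal from the hand calculation, which rounds the pooled variance to 0.349.

```python
from math import sqrt

def pooled_t(xbar1, s1, n1, xbar2, s2, n2):
    """Pooled variance, t statistic and degrees of freedom for two independent samples."""
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    t = (xbar1 - xbar2) / sqrt(sp2 * (1 / n1 + 1 / n2))
    return sp2, t, df

sp2, t_gpa, df_gpa = pooled_t(2.71, 0.64, 14, 2.79, 0.56, 23)
print(f"pooled variance = {sp2:.6f}, t = {t_gpa:.4f}, d.f. = {df_gpa}")
```

Because the statistic is negative and the test is upper-tailed, H0 cannot be rejected whichever positive critical value is used.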

b. Comparing Population Means with Unequal
Population Standard Deviations
 If it is not reasonable to assume the population
standard deviations are equal, then we compute
the t-statistic as

$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$

 The sample standard deviations s1 and s2 are
used in place of the respective population
standard deviations.
 In addition, the degrees of freedom are adjusted
downward by a rather complex approximation
formula. The effect is to reduce the number of
degrees of freedom in the test, which will
require a larger value of the test statistic to
reject the null hypothesis.


Example 5

A survey found that the average hospital room rate in
Mogadishu is $88.42 and the average room rate in
Hargiesa is $80.61. Assume that the data were obtained
from two samples of 50 hospitals each and that the
standard deviations were $5.62 and $4.83 respectively.
At a significance level of 5%, can we conclude that there
is a significant difference in the rates?
Solution
Step 1: the Null Hypothesis (H0) and the Alternate
Hypothesis (H1)

$H_0: \mu_1 = \mu_2$
The average hospital room rate in Mogadishu is equal to the
average room rate in Hargiesa.
$H_1: \mu_1 \neq \mu_2$
The average hospital room rate in Mogadishu is not
equal to the average room rate in Hargiesa.
Step 2: the Level of Significance

𝜶 = 𝟎. 𝟎𝟓

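Step 3: the Test Statistic

$n_1 = 50$, $\bar{x}_1 = 88.42$, $s_1 = 5.62$
$n_2 = 50$, $\bar{x}_2 = 80.61$, $s_2 = 4.83$

$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{88.42 - 80.61}{\sqrt{\frac{5.62^2}{50} + \frac{4.83^2}{50}}} = 7.452419$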

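Step 4: Formulate the Decision Rule

$df = \frac{\left[\frac{5.62^2}{50} + \frac{4.83^2}{50}\right]^2}{\frac{(5.62^2/50)^2}{50 - 1} + \frac{(4.83^2/50)^2}{50 - 1}} = 95.8 \approx 96$

$t_{df,\,\alpha/2} = t_{96,\,0.025} = 1.985$

If the absolute value of the calculated t is greater than $t_{df,\,\alpha/2}$, reject $H_0: \mu_1 = \mu_2$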


Step 5: Make a Decision

Reject the null hypothesis H0, because the test
statistic t = 7.4 is greater than the critical value t = 1.985.
We can conclude that the average hospital room
rate in Mogadishu is not equal to the average room rate in
Hargiesa.
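
The unequal-variance statistic and its adjusted degrees of freedom from Example 5 can be sketched together. The function name `unequal_variance_t` is illustrative, and the critical value 1.985 is taken from the worked example.

```python
from math import sqrt

def unequal_variance_t(xbar1, s1, n1, xbar2, s2, n2):
    """t statistic and approximate degrees of freedom when the variances are not pooled."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2          # squared standard errors of each mean
    t = (xbar1 - xbar2) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

t_rates, df_rates = unequal_variance_t(88.42, 5.62, 50, 80.61, 4.83, 50)
t_crit_rates = 1.985                              # from the worked example, t with 96 d.f.
reject_rates = abs(t_rates) > t_crit_rates
print(f"t = {t_rates:.3f}, d.f. = {df_rates:.1f}, reject H0: {reject_rates}")
```

The adjusted degrees of freedom are never larger than n1 + n2 − 2, which is what makes this test more conservative than the pooled version.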


4. Two-Sample Tests of Hypothesis: Dependent Samples


Example
Advertisements by Sylph Fitness Center claim that
completing its course will result in losing weight. A random
sample of eight recent participants showed the following
weights before and after completing the course. At the .01
significance level, can we conclude the students lost
weight?
a. State the null hypothesis and the alternate
hypothesis.
b. What is the critical value of t?
c. What is the computed value of t?
d. Interpret the result.
e. What assumption needs to be made about the
distribution of the differences?

Solution


Chapter Eight: Analysis of Variance

Learning Objectives
When you have completed this chapter, you will be able to:

 List the characteristics of the F-distribution.
 Conduct a test of hypothesis to determine
whether the variances of two populations are
equal.
 Discuss the general idea of analysis of
variance.
 Organize data into a one-way ANOVA table.

8.1. Introduction to the F Distribution.

 The probability distribution used in this chapter is the F
distribution.

 The test statistic for several situations follows this
probability distribution.

 It is used to test whether two samples are from
populations having equal variances, and it is also applied
when we want to compare several population means
simultaneously. The simultaneous comparison of several
population means is called analysis of variance
(ANOVA). In both of these situations, the populations
must follow a normal distribution, and the data must be at
least interval-scale.

Characteristics of F-Distribution

1. There is a “family” of F Distributions. Each
member of the family is determined by two
parameters: the numerator degrees of freedom
and the denominator degrees of freedom.
2. F cannot be negative, and it is a continuous
distribution.
3. The F distribution is positively skewed.
4. Its values range from 0 to ∞.
5. As F → ∞ the curve approaches the X-axis.


8.2. Hypothesis Tests for Two Variances

TEST STATISTIC FOR COMPARING TWO VARIANCES:

$F = \frac{s_1^2}{s_2^2}$

The test statistic has an F distribution with ($n_1$ – 1) numerator degrees of
freedom and ($n_2$ – 1) denominator degrees of freedom.

The terms $s_1^2$ and $s_2^2$ are the respective sample variances. If
the null hypothesis is true, the test statistic follows the F
distribution with $n_1$ – 1 and $n_2$ – 1 degrees of freedom.

In order to reduce the size of the table of critical values, the
larger sample variance is placed in the numerator; hence, the
tabled F ratio is always larger than 1.00.


Thus, the right-tail critical value is the only one required. The
critical value of F for a two tailed test is found by dividing the
significance level in half and then referring to the appropriate
degrees of freedom in Appendix B.4. An example will illustrate.
Decision Rules: Two Variances

Example: F Test


Solution


Example 2

Arbitron Media Research Inc. conducted a study of the
iPod listening habits of men and women. One facet of
the study involved the mean listening time. It was
discovered that the mean listening time for men was
35 minutes per day. The standard deviation of the
sample of the 10 men studied was 10 minutes per day.
The mean listening time for the 12 women studied was
also 35 minutes, but the standard deviation of the
sample was 12 minutes. At the .10 significance level,
can we conclude that there is a difference in the
variation in the listening times for men and women?

Solution
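
A sketch of the variance-ratio calculation for Example 2 follows. It places the larger sample variance in the numerator, as described above, and returns the matching degrees of freedom. The function name `variance_ratio` is illustrative, and the critical value still has to be read from an F table at α/2 = 0.05 for the returned degrees of freedom.

```python
def variance_ratio(s_a, n_a, s_b, n_b):
    """F ratio with the larger sample variance in the numerator, plus (numerator, denominator) d.f."""
    var_a, var_b = s_a ** 2, s_b ** 2
    if var_a >= var_b:
        return var_a / var_b, n_a - 1, n_b - 1
    return var_b / var_a, n_b - 1, n_a - 1

# Men: s = 10 minutes, n = 10. Women: s = 12 minutes, n = 12.
f_stat, df_num, df_den = variance_ratio(s_a=10, n_a=10, s_b=12, n_b=12)
print(f"F = {f_stat:.2f} with {df_num} and {df_den} degrees of freedom")
```

H0 (equal variances) is rejected only if the computed F exceeds the tabled value for these degrees of freedom.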


8.3. ANOVA: Analysis of Variance

 ANOVA: is a procedure used to test the null
hypothesis that the means of three or more
populations are equal.

 The F distribution is also used for testing whether
three or more sample means came from the same
or equal populations.

 ANOVA Assumptions:

– The sampled populations follow the normal distribution.

– The populations have equal standard deviations.

– The samples are randomly selected and are independent.

Comparing Means of Three or More Populations

 The Null Hypothesis is that the population means
are the same. The Alternative Hypothesis is that at
least one of the means is different.

 The Test Statistic is the F distribution.

 The Decision rule is to reject the null hypothesis if
F (computed) is greater than F (table) with
numerator and denominator degrees of freedom.

 Hypothesis Setup and Decision Rule:

H0: µ1 = µ2 = … = µk

H1: The means are not all equal

Reject H0 if $F > F_{\alpha,\,k-1,\,n-k}$

Organize data into appropriate ANOVA tables for analysis

 Example :


The ANOVA terminology

ANOVA TABLE

 The Decision rule is to reject the null hypothesis if
F (computed) is greater than F (table) with
numerator and denominator degrees of freedom.

 Degrees of freedom for the numerator = k – 1.

 Degrees of freedom for the denominator = n – k.

Example

Joyce Kuhlman manages a regional financial center. She wishes to compare the productivity, as measured by the number of customers served, among three employees. Four days are randomly selected and the number of customers served by each employee is recorded. The results are:

Is there a difference in the mean number of customers served at α = 0.05?
Solution
Step 1: State the null hypothesis and the alternate hypothesis.

𝐻0: µ1 = µ2 = µ3

𝐻1: The means are not all equal


8.4. Confidence Interval for the Difference Between Two Means

 How do we decide whether there is a difference in the treatment means?

 If the confidence interval includes zero, there is not a difference between the treatment means.

 For example, if the left endpoint of the confidence interval has a negative sign and the right endpoint has a positive sign, the interval includes zero and the two means do not differ. So if we develop a


confidence interval from formula (12–5) and find the difference in the sample means was 5.00, that is,
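The rest of this slide is missing. As a hedged illustration of the decision rule, the sketch below uses the standard interval that follows an ANOVA, (x̄1 − x̄2) ± t·√(MSE(1/n1 + 1/n2)), with t taken from a table at n − k degrees of freedom. Whether this matches the formula (12–5) cited above is an assumption, and the function names and numbers are hypothetical.

```python
import math

def diff_ci(mean1, mean2, mse, n1, n2, t_crit):
    """Confidence interval for the difference between two treatment means."""
    margin = t_crit * math.sqrt(mse * (1 / n1 + 1 / n2))
    diff = mean1 - mean2
    return diff - margin, diff + margin

def means_differ(ci):
    """The means differ only if the interval does not include zero."""
    lower, upper = ci
    return not (lower <= 0 <= upper)

# Hypothetical inputs: MSE = 1 with 6 error degrees of freedom,
# t_crit = 2.447 (two-tailed, .05 level, 6 df, read from a t table)
ci = diff_ci(9, 5, mse=1, n1=3, n2=3, t_crit=2.447)
print(ci, means_differ(ci))
```

Both endpoints here have the same sign, so the interval excludes zero and the two treatment means would be judged different.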


Chapter nine: Correlation analysis

Learning Objectives

When you have completed this chapter, you will be able to:

1. Identify the types of correlation.

2. Describe the methods of studying correlation:

 Scatter diagram
 Karl Pearson's coefficient of correlation
 Method of least squares

3. Understand and interpret the terms dependent and independent variable.

4. Calculate and interpret the coefficient of correlation, the coefficient of determination, and the standard error of estimate.

5. Conduct a test of hypothesis to determine whether the coefficient of correlation in the population is zero.

Lecturer: Abdirahman Macalim Ibrahim Abdi


Zamzam University science & Technology

1. Correlation
 Correlation: The degree of relationship between the variables under consideration is measured through correlation analysis.
 The measure of correlation is called the correlation coefficient.
 The degree of relationship is expressed by a coefficient that ranges from −1 to +1 (−1 ≤ r ≤ +1). The direction of change is indicated by its sign.
 Correlation analysis gives us an idea of the degree and direction of the relationship between the two variables under study.
 Correlation analysis is the study of the relationship between variables. It is also defined as a group of techniques to measure the association between two variables.
 Correlation is a statistical tool that helps to measure and analyze the degree of relationship between two variables.
 Correlation analysis deals with the association between two or more variables.


2. Types of Correlation

Types of Correlation: Type I

 Positive Correlation: The correlation is said to be positive if the values of the two variables change in the same direction.

Ex: Height & weight.

 Negative Correlation: The correlation is said to be negative when the values of the variables change in opposite directions.


Ex: TV time & grades.

Direction of the Correlation

 Positive relationship – Variables change in the same direction. As X is increasing, Y is increasing

 As X is decreasing, Y is decreasing

 E.g., As height increases, so does weight.

 Negative relationship – Variables change in opposite directions. As X is increasing, Y is decreasing

 As X is decreasing, Y is increasing

 E.g., As TV time increases, grades decrease


Types of Correlation: Type II

 Simple correlation: In simple correlation, only two variables are studied.

 Multiple correlation: In multiple correlation, three or more variables are studied.

 Ex. Qd = f (P, PC, PS, t, y)


 Partial correlation: The analysis recognizes more than two variables but considers only two of them, keeping the others constant.

 Total correlation: It is based on all the relevant variables, which is normally not feasible.

Types of Correlation: Type III

 Linear correlation: Correlation is said to be linear when the amount of change in one variable tends to bear a constant ratio to the amount of change in the other. The graph of the variables having a linear relationship will form a straight line.


 X = 1, 2, 3, 4, 5, 6, 7, 8

 Y = 5, 7, 9, 11, 13, 15, 17, 19

 Y = 3 + 2X

 Nonlinear correlation: The correlation would be nonlinear if the amount of change in one variable does not bear a constant ratio to the amount of change in the other variable.

3. Methods of Studying Correlation

1. Scatter Diagram Method
2. Graphic Method
3. Karl Pearson's Coefficient of Correlation
4. Method of Least Squares

a. Scatter Diagram Method
 A scatter diagram is a graph of observed plotted points, where each point represents the values of X and Y as a coordinate. It portrays the relationship between these two variables graphically.


A perfect positive correlation


Correlation Coefficient – Interpretation


Advantages and Disadvantages of Scatter Diagram

Advantages of Scatter Diagram

1. Simple and non-mathematical method

2. Not influenced by the size of extreme items

3. First step in investigating the relationship between two variables

Disadvantage of Scatter Diagram

1. Cannot establish the exact degree of correlation

b. Karl Pearson's Coefficient of Correlation

 Pearson's 'r' is the most common correlation coefficient.

 Karl Pearson's coefficient of correlation is denoted by 'r'. It measures the degree of linear relationship between two variables, say x and y.

 Its value lies between −1 and +1 (−1 ≤ r ≤ +1).


 Degree of correlation is expressed by the value of the coefficient

 Direction of change is indicated by the sign

 (−ve) or (+ve)

Karl Pearson's Coefficient of Correlation – Formula
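The formula image is missing from this slide. A common form of Pearson's coefficient is r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² · Σ(y − ȳ)²), and the sketch below implements that form. It is offered as an illustration rather than as the exact expression shown in the lecture, and the function name `pearson_r` is an assumption.

```python
import math

def pearson_r(x, y):
    """Pearson's correlation coefficient from deviations about the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# The linear series given earlier in the chapter, where Y = 3 + 2X
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [5, 7, 9, 11, 13, 15, 17, 19]
r = pearson_r(x, y)
print(r)
```

Because every point lies exactly on an upward-sloping straight line, r comes out as +1 (up to floating-point rounding), which is the perfect positive correlation case.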

Interpretation of Correlation Coefficient (r)

 The value of the correlation coefficient 'r' ranges from -1 to +1.

 If r = +1, then the correlation between the two variables is said to be perfect and positive.

 If r = -1, then the correlation between the two variables is said to be perfect and negative.

 If r = 0, then there exists no correlation between the variables.

Assumptions of Pearson’s Correlation Coefficient

1. There is a linear relationship between the two variables, i.e. when the two variables are plotted on a scatter diagram, a straight line will be formed by the points.


2. A cause-and-effect relation exists between the different forces operating on the items of the two variable series.

Advantages and Limitations of Pearson's Coefficient

1. It summarizes in one value both the degree and the direction of correlation.

Limitations of Pearson's Coefficient

1. It always assumes a linear relationship.

2. Interpreting the value of r is difficult.

3. The value of the correlation coefficient is affected by extreme values.

4. It is a time-consuming method.

Example

Haverty's Furniture is a family business that has been selling to retail customers in the Chicago area for many years. The company advertises extensively on radio, TV, and the Internet, emphasizing low prices and easy credit terms. The owner would like to review the relationship between sales and the amount spent on advertising. Below is information on sales and advertising expense for the last four months.


a. The owner wants to forecast sales on the basis of advertising expense. Which variable is the dependent variable? Which variable is the independent variable?
b. Draw a scatter diagram.
c. Determine the correlation coefficient.
d. Interpret the strength of the correlation coefficient.
Solution


Coefficient of Determination
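The content of this slide did not survive extraction. In the usual definition, the coefficient of determination is the share of the total variation in Y that is explained by the variation in X, and for a simple linear fit it equals r². The sketch below computes it as 1 − SSE / SS total from a least-squares line. The function name and the data are hypothetical.

```python
def least_squares_r2(x, y):
    """Fit y = a + b*x by least squares and return (a, b, r_squared)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx                 # slope
    a = mean_y - b * mean_x       # intercept
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_error = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return a, b, 1 - ss_error / ss_total

# Hypothetical data, not taken from the lecture
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b, r2 = least_squares_r2(x, y)
print(a, b, r2)
```

For these made-up values the fitted line is roughly Y = 2.2 + 0.6X and about 60% of the variation in Y is accounted for by X.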

5. Testing the Significance of the Correlation Coefficient

H0: ρ = 0 (the correlation in the population is 0)
H1: ρ ≠ 0 (the correlation in the population is not 0)
Reject H0 if:
t > tα/2, n-2 or t < -tα/2, n-2


Example

A sample of 25 mayoral campaigns in medium-sized cities with populations between 50,000 and 250,000 showed that the correlation between the percent of the vote received and the amount spent on the campaign by the candidate was .43. At the .05 significance level, is there a positive association between the variables?
Solution
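The worked solution is missing from the slide. A sketch of the calculation follows, assuming the standard test statistic t = r·√(n − 2) / √(1 − r²) with n − 2 degrees of freedom. Because the question asks about a positive association, a one-tailed test is assumed here (H0: ρ ≤ 0, H1: ρ > 0). The function name `correlation_t` is illustrative.

```python
import math

def correlation_t(r, n):
    """t statistic for testing whether the population correlation is zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# r = .43 and n = 25 come from the problem statement
t = correlation_t(0.43, 25)
df = 25 - 2
print(round(t, 3), df)
```

The computed t is about 2.28 with 23 degrees of freedom. The one-tailed critical value at the .05 level from a t table is 1.714, so H0 would be rejected and a positive association between percent of vote and campaign spending would be concluded.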
