Com 114 - ND 1 Statistics For Computing - 2024
COM 114
MEANING OF STATISTICS
Definitions of Statistics
Statistics can be defined as a management tool for making decisions.
It is a scientific approach to presenting numerical information in such a way that one gains the
maximum understanding of the reality represented by such information. Statistics is also defined as
the presentation of facts in numerical form. A more comprehensive definition describes statistics
as a scientific method used for collecting, summarizing, classifying, analyzing and presenting
information in such a way that we can have a thorough understanding of the reality the information represents.
Put differently, it is the branch of science that deals with the collection, organization, presentation, analysis,
interpretation and documentation of information for easy decision making.
From all these definitions, you will realize that statistics is concerned with numerical data. Examples of
such numerical data are the heights and weights of pupils in a primary school when evaluating the
nutritional well being of the pupils and the accident fatalities on a particular road for a period of time.
You should also know that, alongside numerical data, there are non-numerical data such as the
taste of brands of biscuits, the greenness of some vegetables and the texture of some joints of a wholesale
cut of meat. Non-numerical data cannot be subjected to statistical analysis unless they are transformed into
numerical data. To transform the greenness of vegetables into numerical data, a five-point scale for measuring
the colour can be developed, with 1 indicating very dull and 5 indicating very green.
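The transformation described above can be sketched in a few lines of Python. This is only an illustration: the text fixes the two end points of the scale (1 = very dull, 5 = very green), while the three middle labels and the sample ratings are assumptions made for the example.

```python
# Hypothetical five-point greenness scale. Only the end points
# (1 = very dull, 5 = very green) come from the text; the three
# middle labels are assumed for illustration.
GREENNESS_SCALE = {
    "very dull": 1,
    "dull": 2,
    "moderate": 3,
    "green": 4,
    "very green": 5,
}


def encode(observations):
    """Convert descriptive ratings to their numerical scores."""
    return [GREENNESS_SCALE[label] for label in observations]


ratings = ["very green", "dull", "very dull"]  # made-up observations
scores = encode(ratings)
print(scores)
```

Once the ratings are numbers, they can be ordered, counted and summarized like any other numerical data.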
You will realize that statistics is useful in all spheres of human life. A woman with a given amount of
money, going to the market to purchase foodstuff for the family, makes decisions on the types of food items
to purchase, and on the quantity and quality of the items, to maximize the satisfaction she will derive from the
purchase. For all these decisions, the woman makes use of statistics.
Government uses statistics as a tool for collecting data on economic aggregates such as national income,
savings, consumption and gross national product. Government also uses statistics to measure the effects of
external factors on its policies and to assess the trends in the economy so that it can plan future policies.
Government uses statistics during a census. The various forms sent by the government to individuals and firms
on annual income, tax returns, prices, costs, output and wage rates generate a lot of statistical data for the
use of the government.
SMS201 BUSINESS STATISTICS I
Business uses statistics to monitor the various changes in the national economy for the various budget
decisions. Business makes use of statistics in production, marketing, administration and in personnel
management.
Statistics is also used extensively to control and analyze stock levels such as minimum, maximum and reorder
levels. It is used by business in market research to determine the acceptability of a product that will be
demanded at various prices by a given population in a geographical area. Management also uses statistics
to make forecasts about the sales and labour cost of a firm. Management uses statistics to establish a
mathematical relationship between two or more variables for the purpose of predicting one variable in terms
of the others. Statistics is also used extensively in the conduct and analysis of biological, physical, medical
and social research.
Let us quickly define some of the basic concepts you will continue to come across in this course.
• Entity: This may be a person, place or thing on which we make observations. In studying the
nutritional well-being of pupils in a primary school, the entity is a pupil in the school.
• Variable: This is a characteristic that assumes different values for different entities. The weights of
pupils in the primary school constitute a variable.
• Random Variable: If we can specify, for a given variable, a mathematical expression called a
function, which gives the relative frequency of occurrence of the values that the variable can assume, the
function is called a probability function and the variable a random variable.
• Quantitative Variable: This is a variable whose values are given as numerical quantities.
• Qualitative Variable: This is a variable that is not measurable in numerical form or that cannot be
counted. Examples are the colours of fruits and the taste of some brands of biscuit.
• Discrete Variable: This is a variable that can assume only whole-number values. Examples are the
number of Local Government Council Areas in the States of Nigeria and the number of female students in the
various programmes of the National Open University.
A discrete variable has "interruptions" between the values it can assume. For instance, between 1 and 2
there are infinitely many values, such as 1.1, 1.11, 1.111, 1.1111 and so on, none of which a discrete
variable can take. These gaps are called interruptions.
• Continuous Variable: This is a variable that can assume both decimal and non-decimal values. There is always a
continuum of values that the continuous variable can assume. The interruptions that characterize the
discrete variable are absent in the continuous variable. Weight can take both whole and decimal
values, such as 20 kilograms and 220.1752 kilograms.
• Population: This is the complete set of entities of interest in a study. In the study of how workers in Nigeria spend
their leisure hours, all the workers in Nigeria constitute the population of the study.
• Sample: This is the part of the population that is selected for a study. In studying the income distribution
of students in the National Open University, the incomes of 1000 students selected for the study, from the
population of all the students in the Open University will constitute the sample of the study.
• Random Sample: This is a sample drawn from a population in such a way that the results of its analysis
may be used to generalize about the population from which it was drawn.
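The definition of a random sample does not say how such a sample is drawn. One common approach, assumed here, is simple random sampling without replacement, in which every entity in the population has the same chance of selection. The sketch below uses Python's standard `random` module; the population of 50,000 student identifiers and the seed are hypothetical, while the sample size of 1000 follows the Open University example.

```python
import random


def draw_simple_random_sample(population, sample_size, seed=None):
    """Select sample_size distinct entities, each equally likely to be chosen."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    return rng.sample(population, sample_size)


# Hypothetical population: identifiers for 50,000 students.
student_ids = list(range(1, 50001))

sample = draw_simple_random_sample(student_ids, 1000, seed=114)
print(len(sample), "students selected")
```

The incomes of the selected students would then form the sample on which the analysis is carried out.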
Primary Data
You are already aware that statistics uses numerical data. Numerical data can be divided into:
a) Primary Data and
b) Secondary Data
Primary data are data collected by or on behalf of the person or people who are going to make use of the
data. They are collected specifically for a purpose and used for the purpose for which they were
collected.
Examples of Primary data are:
(i) Heights and weights of students collected to determine their nutritional well being.
(ii) The population of primary school pupils in the states of the country, collected to allow the Federal Government
to plan for primary education.
(iii) The academic performance of students in secondary schools under various types of leaders, collected to determine
the leadership style suitable for secondary schools.
The various methods of collecting primary data include surveys, direct interview, direct observation,
questionnaire and experiments, peer group or focus group discussion, census. These will be further
discussed in this unit.
Primary data could be very expensive to collect when the elements of the study sample are widely scattered
and when the items of equipment for data collection, as in many experiments, are capital intensive.
However errors can be minimized when collecting primary data since the researcher can always take
adequate precaution in collecting primary data.
Secondary Data
These are data used by a person or people other than the person or people by whom or for whom the
data were collected. They are data collected for some other purpose, frequently for administrative
reasons, and used for a purpose other than the one for which they were collected.
Secondary data are usually collected from published sources such as textbooks, journals, newspapers,
magazines and gazettes. Examples of secondary data are:
(i) Accident fatalities on a particular road over a period of time collected from the Police or Road safety
corps.
(ii) Dietary requirements of various age groups collected from a nutrition textbook.
(iii) Age distribution in Nigeria collected from the publications of the National Population Census.
From the discussion so far, you will know that secondary data are second-hand data. This shows the need to
know as much as possible about the data. In trying to learn about secondary data, we
need to consider the following points:
(vi) How to interpret the data, especially when figures collected for one purpose are used for another.
With secondary data, we usually strike a compromise between what we want and what we are able to find.
Secondary data have the following advantages:
(i) They are very inexpensive to collect since they are readily and abundantly available in
published sources.
(ii) There is a great variety of secondary data on a wide range of subjects.
(iii) Many of these data have been collected for many years hence we can use them to establish trends for
forecasting.
Despite all these advantages, we must use secondary data with great care, since such data
may not give the exact kind of information required and may not be in the most suitable form.
SOURCES OF DATA
1 Micro-Statistical Information
Firms and private organizations, when monitoring the activities of their businesses, produce a lot of
information that is specific to the organization or firm and is used in its decision-making processes.
The information produced by these firms and organizations is called micro-statistical information.
In micro-economics, the concern is for the household and the firm. The information produced or
generated at these levels is micro-statistical information. For a firm, such information includes that
generated from production, marketing and personnel.
2 Macro-Statistical Information
This is produced by the public sector of the economy and relates to the country as a whole.
Such information includes population, education, rate of inflation, level of unemployment and so on.
Macro-economics is concerned with such aggregates as national income, gross national product, savings,
consumption, gross domestic product and so on. These are macro-statistical information.
In the performance and monitoring of various activities in the firms, a lot of data is produced. The quality and the
quantity of the information generated in the firm depend on the size of the firm and the resources available to the firm.
The firm is interested in what is happening to the national economy and what is happening in the industry it
belongs to. You should know that a combination of firms makes up the industry. As regards the national economy,
the firm wishes to know the interest rates and the level of unemployment. As regards the industry, the firm will always
wish to know wage rates, prices and level of output so that it can compare its performance with that of its
competitors in the industry.
Other information could be generated from the Personnel and Accounts departments to aid the decision-making
process of the firm.
4 Government Statistics
Governments produce statistics to be able to measure the effects of their policies, to monitor the effects
of external factors on their policies and to assess trends so that they can plan future policies.
Macro-statistical information is generated by the governments, as you have learned in this unit. The various
governments rely on firms and individuals to generate these macro-statistical data.
1. Surveys
In this unit, you have learned that secondary data are already available; hence they only have to be gathered when
there is a need to use them. The collection of primary data involves a survey or inquiry of one type or
another.
Some surveys can be limited in the sense that they can be carried out with a few minutes of observation.
Others can be detailed. When surveys are detailed, information from the surveys is more acceptable and
valued than when they are limited.
(ii) Market research surveys - carried out for one particular client and not published in any form.
(iii) Research surveys - carried out by academics and published in journals.
(iv) Ad hoc surveys - commissioned by firms on a wide variety of subjects.
Survey methods consist of the following stages:
(i) The survey design - This depends on the objectives of the survey, the available methods, and the amount
of money and time that can be allocated for collecting the information.
(ii) The pilot survey - This is a preliminary survey carried out on a very small scale to make sure that
the design and methodology of the survey are likely to produce the information required.
(iii) Collection of Information - This involves the use of observation, interview and questionnaire.
(iv) Coding - We may need to pre-code the questions to facilitate classification and tabulation.
(v) Tabulation - There is the need to tabulate the data, to give a summary of the data.
(vi) Secondary statistics - We will need to calculate secondary statistics such as means and percentages.
(vii) Reports - Reports must be written on the results and the results must be illustrated with graphs and
diagrams.
2. Observations
This is one of the methods of collecting primary data. It can be used to find out the use to which a particular
facility is put. It can also be used to study the behaviour of people in a workplace.
• Participant observation
• Systematic observation
• Mechanical observation
In participant observation, the observer is involved in the activities he is trying to observe. Examples of
these are the vice-chancellor who eats at the cafeteria to observe the performance of
the cooks and the acceptability of the food to the students, and the lecturer who sits for his own paper with the
students to observe the difficulty encountered by the students in the paper. This method can have a serious
influence on the entities the observer is observing. The method may also consume a lot of time.
In systematic observation, the observer does not take part in the activity. The method is used when events
can be investigated without the participants knowing that somebody is observing them. Though the method
is objective, it does not question the motives of people observed.
In mechanical observation, mechanical devices do the observation. For instance, the number of vehicles
passing a particular point on the road can be recorded mechanically. Sophisticated mechanical means such
as television, film and tape recorders are used to provide more complex information. Mechanical
observations can be more effective than observations made by individual observers, who can be
subject to bias.
• Objectivity- to remain objective, the observer cannot ask the question that will help him to understand the
events he is observing.
• Selectivity- an observer can unintentionally become selective in perception, recording and reporting.
• Interpretation- the observer may impute meanings to the behaviour of people that the people do not intend.
• Chance- a chance event may be mistaken for a recurrent one.
• Participation- the participation of the observer can influence the behaviour of people being observed.
3. Interviewing
This is a conversation with a purpose. There can be formal and informal interviews. Informally everybody
uses interviews to obtain information.
The formal interview is also initiated by the interviewer, who approaches the person he is interested in
interviewing. The interviewer therefore arranges the venue and the time, and prepares the questions to be
asked. The interviewer also secures the means of recording the responses.
The interviewer should be very objective. He must not express his own opinions and must not
influence the answers of the respondents. The language of the questions must be at the level of the
respondents. If the questions are written in a language different from that of the respondents, the questions
must be translated into the language of the respondents, but the answers must be written by the interviewer in the
language of the questions. This was the case during the last census in Nigeria. The questionnaire used was
written in English, a language that many Nigerians do not understand. To obtain the
responses of people in this group, the questions were translated into the language they understand.
Interviewing method has a number of advantages:
(i) It allows the interviewer to have personal contact with the respondents therefore allowing more
questions to be asked which improves the quality of the information.
(ii) It allows the interviewer to persuade unwilling respondents.
(iv) It allows an experienced interviewer to know when to make calls and recalls.
Interviewing method also has a number of disadvantages:
(i) A biased interviewer may influence the responses of the respondents to suit his own opinions.
(ii) A biased respondent, in a matter affecting his ego, may give false responses.
(iii) It is very expensive and time consuming, especially when respondents are widely scattered.
(v) It may be difficult to interview some top people in government and business.
4. Questionnaire
This is a list of questions drawn in such a way that the questions are related to the objectives of the study
being conducted, and the responses to the question will be analyzed to provide solutions to the problems
we attempt to solve in the study.
The structured questionnaire consists of a list of questions drawn up on the study being conducted. Each
question is accompanied by alternative answers from which the respondent picks the appropriate answer or
answers. An example of a structured question is this:
• Below N7,500
• N7,500 - N10,000
• N10,000 - N12,500
• Above N12,500
The structured questionnaire has a number of advantages:
(i) It is very easy to complete and analyze.
(ii) Most of the questions are answered.
(iii) The responses are always related to the objectives of the study.
A major disadvantage of the structured questionnaire is that it does not allow respondents to express their own
views, which may enhance the quality of the information collected.
An unstructured questionnaire is a list of questions drawn up on the study on which information is required. The
questions are not accompanied by alternative answers as in the structured questionnaire. The respondents
are free to provide their own responses.
An example of a question in an unstructured questionnaire is "What is your monthly salary?" Unstructured
questionnaires are not difficult to construct since no question is accompanied by alternative answers. They have
the following advantages.
(iv) Ask simple and interesting questions before the difficult and uninteresting ones.
(v) State the questions clearly.
(vi) A question must mean the same thing to all the respondents.
(vii) The language must be at the level of the respondents.
(viii) Do not ask questions that will hurt your respondent.
(ix) The questionnaire should be pre-tested on a mock audience before it is administered to the real
sample, to detect any difficulty in completing and analyzing the questionnaire, so that the questionnaire can be
reviewed before it finally gets to the real audience it is meant for.
Exercise
List the various methods of data collection, their descriptions, their advantages and disadvantages.
DATA PRESENTATION
1. Histogram
One of the ways of representing a frequency distribution is by means of a histogram. In constructing a histogram, we
plot the frequencies of the class intervals against the class boundaries (not the class limits). The vertical axis is used for
the frequencies and the horizontal axis for the class boundaries.
Example
Suppose the table below shows the age distribution of 50 spectators at a secondary school sports competition. You
are required to represent the data with a histogram.
Age (Years)    Frequency
10-14          2
15-19          3
20-24          5
25-29          7
30-34          11
35-39          8
40-44          6
45-49          5
50-54          3
In solving the exercise, you require a graph sheet; you also need to prepare the class boundaries for the class intervals.
You will then plot the frequencies against the respective class boundaries. You need to recall how the class
boundaries are computed in unit 3; we will not discuss it here.
Age (Years)
Class Interval    Class Boundaries    Frequency
10-14             9.5-14.5            2
15-19             14.5-19.5           3
20-24             19.5-24.5           5
25-29             24.5-29.5           7
30-34             29.5-34.5           11
35-39             34.5-39.5           8
40-44             39.5-44.5           6
45-49             44.5-49.5           5
50-54             49.5-54.5           3
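[Figure: histogram of the age distribution, with frequency on the vertical axis and class boundaries on the
horizontal axis; the construction used to estimate the mode is marked on the tallest bar.]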
The histogram can be used to estimate the mode of the distribution. To do this, locate the highest cell (bar) in
the histogram. Join the top of this bar at its upper class boundary to the top of the preceding bar at that bar's
upper class boundary. Then join the top of the highest bar at its lower class boundary to the top of the succeeding
bar at that bar's lower class boundary. Locate the intersection of the two lines and draw a vertical line from the
intersection to the horizontal axis. The point at which the vertical line meets the horizontal axis is
the mode. You need to see the construction on the histogram. The mode read from the figure is approximately 32.5.
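The graphical construction can be checked numerically. The point where the two diagonals cross corresponds to the interpolation formula mode = L + d1 / (d1 + d2) × w, where L is the lower class boundary of the highest cell, w its width, d1 the excess of its frequency over the preceding cell and d2 the excess over the succeeding cell. The following is a minimal sketch that assumes equal class widths and a single highest cell; the function name is chosen for illustration.

```python
def estimate_mode(lower_boundaries, frequencies, width):
    """Estimate the mode of grouped data by interpolating in the modal class.

    Assumes equal class widths and a single highest cell.
    """
    i = frequencies.index(max(frequencies))  # position of the highest cell
    f_modal = frequencies[i]
    f_before = frequencies[i - 1] if i > 0 else 0
    f_after = frequencies[i + 1] if i < len(frequencies) - 1 else 0
    d1 = f_modal - f_before  # excess over the preceding cell
    d2 = f_modal - f_after   # excess over the succeeding cell
    return lower_boundaries[i] + d1 / (d1 + d2) * width


# Age distribution of the 50 spectators (lower class boundaries, frequencies).
lower_boundaries = [9.5, 14.5, 19.5, 24.5, 29.5, 34.5, 39.5, 44.5, 49.5]
frequencies = [2, 3, 5, 7, 11, 8, 6, 5, 3]

mode = estimate_mode(lower_boundaries, frequencies, width=5)
print(round(mode, 2))
```

For this table the calculation is 29.5 + 4 / (4 + 3) × 5, which is about 32.4 and agrees with the value of roughly 32.5 read from the graph.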
2. Frequency Polygon.
Another way of representing frequency distribution graphically is by the means of a frequency polygon.
In constructing a frequency polygon, we plot the frequency against the class marks. You learned in unit 3 that class
mark of a class interval is the mean of the lower and the upper class boundaries or limits of the class interval.
Example
To construct the frequency polygon, you need to compute the class mark for each class interval. You also need to make the
polygon touch the horizontal axis at both ends. To do this, you have to compute the class mark for an imaginary class
interval at the beginning and another imaginary class interval at the end of the distribution.
If you look at table 4.1, which is of interest here, there is no class interval 5-9 at the beginning and there is no class
interval 55-59 at the end of the distribution. We need to bring these intervals in and assign a frequency of 0 to each of
them.
Let us now compute the class mark for the class intervals.
The next activity is to plot the frequency against the class marks. The frequency is on the vertical axis and the
class mark on the horizontal axis. The frequency polygon must be plotted on a graph sheet.
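The class marks can be worked out as follows. This sketch assumes the age distribution of the 50 spectators used in the histogram example, extended with the imaginary intervals 5-9 and 55-59 at frequency 0; the variable names are illustrative.

```python
# Class limits from 5-9 to 55-59, including the two imaginary end intervals.
class_limits = [(lower, lower + 4) for lower in range(5, 60, 5)]
frequencies = [0, 2, 3, 5, 7, 11, 8, 6, 5, 3, 0]

# The class mark is the mean of the lower and upper class limits.
class_marks = [(lower + upper) / 2 for lower, upper in class_limits]

# Points to plot: class mark on the horizontal axis, frequency on the vertical.
points = list(zip(class_marks, frequencies))
for (lower, upper), (mark, freq) in zip(class_limits, points):
    print(f"{lower}-{upper}: class mark {mark}, frequency {freq}")
```

Joining the plotted points with straight lines gives the polygon, which meets the horizontal axis at the class marks 7 and 57.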
3. Ogive
This is another way of representing a frequency distribution graphically. The other name for the ogive is the cumulative
frequency distribution curve. This curve is very important in the determination of the median, quartiles, percentiles and
semi-interquartile range, which will be discussed in subsequent units.
In plotting an ogive for a distribution, you will do the following:
(a) Compute the upper class boundaries of all the classes, including that of an imaginary class at the beginning of
the distribution.
(b) Prepare a cumulative frequency distribution for the data.
(c) Plot the cumulative frequency against the upper class boundary.
Example
Class Interval    Frequency    Less than    Cumulative Frequency
5-9               0            9.5          0
10-14             2            14.5         2
15-19             3            19.5         5
20-24             5            24.5         10
25-29             7            29.5         17
30-34             11           34.5         28
35-39             8            39.5         36
40-44             6            44.5         42
45-49             5            49.5         47
50-54             3            54.5         50
You will realize that the class interval 5-9 was introduced. A frequency of 0 was also assigned to the class interval
since the original table did not show the class. This is done so that the ogive can take its origin from the horizontal
axis. From the table you will see that all the values that are less than 24.5 are contained in class intervals 5-9, 10-14,
15-19 and 20-24. The sum of their frequencies, which is the cumulative frequency of the interval 20-24, is 0+2+3+5 = 10.
You should know that an ogive can only be plotted on a graph sheet.
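The cumulative frequencies in the table can be reproduced with a running total. The sketch below uses `itertools.accumulate` from the Python standard library; the variable names are chosen for illustration.

```python
from itertools import accumulate

# Upper class boundaries, starting with the imaginary class 5-9.
upper_boundaries = [9.5, 14.5, 19.5, 24.5, 29.5, 34.5, 39.5, 44.5, 49.5, 54.5]
frequencies = [0, 2, 3, 5, 7, 11, 8, 6, 5, 3]

# Running total: the number of values less than each upper class boundary.
cumulative = list(accumulate(frequencies))

# Points of the ogive: upper class boundary against cumulative frequency.
for boundary, total in zip(upper_boundaries, cumulative):
    print(f"less than {boundary}: {total}")
```

The last cumulative frequency equals the total number of observations, which is a quick check on the arithmetic.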
Exercises
4. Bar Chart
Another way of representing data graphically is by means of a bar chart. A bar chart shows vertical bars of equal
width representing the values of a variable over some intervals of time. Because the bars have equal width, the
height of each bar (and hence its area) is proportional to the magnitude of the quantity it represents. The bars
must be drawn on a graph sheet.
There are simple, component and multiple bar charts. These will be discussed in this unit.
In a simple bar chart, we use the bars to represent the value of a variable in a period of time.
Example
Suppose the monthly sales of a firm for three consecutive months are given as follows:
Month       Sales
January     5.2
February    7.4
March       10.6
[Figure: simple bar chart of the monthly sales for January, February and March; horizontal axis: months.]
Another bar chart, which shows the total value for a time period and the values of the components that make up the
total, is the component bar chart. In this case, the bar for the total value for a period is divided into the values for the
components that make up the total.
Example
Suppose a hotel has three departments, A, B and C, from which sales are made, and the annual records of the net profit of
the departments for three consecutive years are as presented below.
Year    A      B      C
1999    3.2    3.4    2.8
2000    2.8    3.0    2.6
2001    4.0    3.2    3.6
You are to represent the values of the net profits with the aid of a component bar chart.
You will need to plot the values of the components A, B and C for the three years. The first year will show a bar of
length 9.4cm divided into 3.2cm for A, 3.4cm for B and 2.8cm for C. You will then repeat the exercise for the values
of A, B and C in years 2000 and 2001. There must be a legend to show the shading of each component.
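The lengths needed for each bar can be computed before drawing. The sketch below works out, for every year, the total bar length and the height at which each component ends when stacked in the order A, B, C. The stacking order and the variable names are assumptions made for the example.

```python
net_profit = {
    1999: {"A": 3.2, "B": 3.4, "C": 2.8},
    2000: {"A": 2.8, "B": 3.0, "C": 2.6},
    2001: {"A": 4.0, "B": 3.2, "C": 3.6},
}

totals = {}
segment_tops = {}
for year, components in net_profit.items():
    running = 0.0
    tops = {}
    for department, value in components.items():
        running += value
        # Height at which this component's segment ends on the bar.
        tops[department] = round(running, 1)
    segment_tops[year] = tops
    totals[year] = round(running, 1)  # full length of the bar for the year

for year in net_profit:
    print(year, "total:", totals[year], "segments end at:", segment_tops[year])
```

For 1999 the segments end at 3.2, 6.6 and 9.4, matching the 9.4cm bar described above.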
[Figure: component bar chart of the net profit of departments A, B and C for 1999, 2000 and 2001; horizontal axis: years.]
For the multiple bar chart, each component of every year is represented by a bar whose length corresponds to the
value of the component.
Example
Using the values of net profit in example 4.5, construct a multiple bar chart for the hotel for the three-year period.
In this exercise, you will draw a single bar for each component of every year. The bars for a year will then look like a
histogram.
[Figure: multiple bar chart of the net profit of the departments by year; horizontal axis: year.]
5. Pie Chart
This is another means of representing data graphically. The values of the items represented in a pie chart are
proportional to the areas of the sectors that represent them.
In the case of a pie chart, the sectoral angles are computed for the items based on their values and on the total value of
the items.
After obtaining the sectoral angles for the items, we use a pair of compasses, a pencil, a
protractor and a ruler to draw the sectors.
Example
Items        N
Feeding      9,625
Rent         4,125
Education    5,500
Savings      6,875
Others       1,375
TOTAL        27,500
You are required to represent the data on a pie chart. We will therefore compute the sectoral angles.
Items        Sectoral Angle
Feeding      126°
Rent         54°
Education    72°
Savings      90°
Others       18°
TOTAL        360°
The total value of the items is N27,500. This is to say that the monthly income of the worker is N27,500. The sectoral
angle of an item is its value divided by the total value, multiplied by 360°; for Feeding this is 9,625/27,500 × 360° = 126°.
360° is used in the calculation because the sum of angles at a point is 360°. If the sectoral angles are computed correctly,
the sum of the angles must be equal to 360°.
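The sectoral angles in the table can be verified with a short calculation: each angle is the item's value multiplied by 360 and divided by the total. This is a sketch with illustrative variable names, using the worker's monthly figures from the example.

```python
# Monthly expenditure of the worker, in naira.
expenditure = {
    "Feeding": 9625,
    "Rent": 4125,
    "Education": 5500,
    "Savings": 6875,
    "Others": 1375,
}

total = sum(expenditure.values())

# Sectoral angle = value x 360 / total value.
angles = {item: value * 360 / total for item, value in expenditure.items()}

for item, angle in angles.items():
    print(f"{item}: {angle} degrees")
print("Sum of angles:", sum(angles.values()))
```

The final line is the check mentioned in the text: the angles should add up to 360°.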
It is possible that the values of the items are given as percentages of the total value of the items. To find the sectoral
angles, we only need to multiply each percentage, expressed as a fraction of 100, by 360°.
Example
Item             Percentage of Sales (%)
Food Cost        40
Labour Cost      35
Overhead Cost    15
Net Profit       10
Sales            100
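When the values are already percentages, each sectoral angle is the percentage multiplied by 360 and divided by 100. The sketch below applies this to the four components of sales listed above; Sales itself is the 100% total and therefore is not given a sector. Variable names are illustrative.

```python
# Components of sales, as percentages of total sales.
percentages = {
    "Food Cost": 40,
    "Labour Cost": 35,
    "Overhead Cost": 15,
    "Net Profit": 10,
}

# Sectoral angle = percentage x 360 / 100.
sector_angles = {item: pct * 360 / 100 for item, pct in percentages.items()}

for item, angle in sector_angles.items():
    print(f"{item}: {angle} degrees")
```

As before, the angles can be checked by confirming that they add up to 360°.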
SUMMARISING DATA
1. Ordered Array
When data are collected, they are collected in such a way that there is no particular arrangement of the values. This
unordered array of values does not facilitate the process of analyzing the data. The data must therefore be arranged
either in ascending or descending order of magnitude so as to facilitate the analysis.
Example: Suppose the ages of twenty pupils in a primary school are as follows (ages to the nearest year)
A look through the values shows no order of arrangement. An ordered form of the values shows the following:
5, 5, 6, 6, 7, 7, 8, 8, 8, 8, 9, 9, 9, 10, 10, 11, 11, 11, 12, 12.
The data arranged in this form has a number of advantages over the raw data.
(i) We can quickly know the lowest and highest values in the data.
(ii) We can easily divide the data into sections.
(iii) We can see whether any value appears more than once in the array.
(iv) We can observe the distance between succeeding values in the data.
Exercise
Go through the values listed below and arrange them in ascending order of magnitude
5, 3, 6, 7, 4, 1, 2, 4, 0, 3
3, 5, 2, 6, 2, 5, 0, 7, 2, 5
2, 3, 5, 4, 6, 7, 5, 1, 1, 3
2. Frequency Distribution
2.1 Frequency Distribution for Ungrouped Data
An ordered array of data does not sufficiently summarize the data. The data can be further summarized by
preparing the frequency distribution for the data.
A frequency distribution is a table showing the values of the data and the number of occurrences of each of the
values.
Example
For the data in the exercise above, present the frequency distribution for the values. The values will be represented by Xi
and the frequencies by Fi.
Frequency Distribution
Xi    Fi
0     2
1     3
2     5
3     5
4     3
5     6
6     3
7     3
This table shows the values of the variable and their respective frequencies.
It is possible to prepare the frequency distribution in the example above without grouping the data because the distinct
values are not many. There are only eight distinct values (0 to 7). In situations where the values run into thousands or
millions, it may be difficult to analyze the values if they are not grouped.
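The frequency distribution above can be reproduced by counting the occurrences of each value in the exercise data. The following sketch uses `collections.Counter` from the Python standard library; the variable names are chosen for illustration.

```python
from collections import Counter

# The thirty values from the exercise on ordered arrays.
values = [
    5, 3, 6, 7, 4, 1, 2, 4, 0, 3,
    3, 5, 2, 6, 2, 5, 0, 7, 2, 5,
    2, 3, 5, 4, 6, 7, 5, 1, 1, 3,
]

# Count how many times each value occurs, then list the values in ascending order.
frequency = dict(sorted(Counter(values).items()))

print("Xi  Fi")
for x, f in frequency.items():
    print(f"{x}   {f}")
```

The frequencies add up to 30, the number of values in the data, which confirms that no value was missed.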
For many sophisticated analyses, we need to group the data before analysis commences. We therefore group the
data into class intervals.
Class intervals are defined as contiguous, non-overlapping intervals selected in such a way that they are mutually
exclusive and collectively exhaustive. They are mutually exclusive in the sense that a value is placed in one and only
one class interval.
The class intervals could be 5-9, 10-14, 15-19, 20-24, ... They could also be 11-20, 21-30, 31-40, ...
They may take either form. In this unit there will be more examples of class intervals. The classes should not be too
few and should not be too many. Too few class intervals can result in a loss of much detail, while too many class
intervals may not condense the data.
Example
The table below shows the scores of 50 students in mathematics in a Senior Secondary Examination.
19 50 57 25 61 42 26 33 46 45
63 31 80 36 78 56 38 69 83 40
52 17 35 65 13 63 72 29 56 57
22 45 53 44 76 47 86 55 66 48
41 64 38 43 23 58 55 32 52 46
For this, we need to prepare the tallies. From the tallies we obtain the frequency of each of the class intervals.
In the table, a bundle of five tally strokes is written as IIII/.
Class Interval    Tally              Frequency
11-20             III                3
21-30             IIII/              5
31-40             IIII/ III          8
41-50             IIII/ IIII/ I      11
51-60             IIII/ IIII/        10
61-70             IIII/ II           7
71-80             IIII               4
81-90             II                 2
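The tallying can be checked by counting, for each class interval, the scores that fall between its lower and upper class limits. This is a minimal sketch using the 50 scores from the table; the variable names are illustrative.

```python
scores = [
    19, 50, 57, 25, 61, 42, 26, 33, 46, 45,
    63, 31, 80, 36, 78, 56, 38, 69, 83, 40,
    52, 17, 35, 65, 13, 63, 72, 29, 56, 57,
    22, 45, 53, 44, 76, 47, 86, 55, 66, 48,
    41, 64, 38, 43, 23, 58, 55, 32, 52, 46,
]

# Class limits 11-20, 21-30, ..., 81-90.
class_limits = [(lower, lower + 9) for lower in range(11, 91, 10)]

# Each score is counted in exactly one interval (mutually exclusive classes).
frequencies = [
    sum(lower <= score <= upper for score in scores)
    for lower, upper in class_limits
]

for (lower, upper), freq in zip(class_limits, frequencies):
    print(f"{lower}-{upper}: {freq}")
```

Because the classes are collectively exhaustive for these data, the frequencies add up to 50.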
What you have above is the frequency distribution of a grouped data. To further summarize grouped data, some basic
concepts are important. These concepts will be defined and computed now.
(i) Class Limit: For any class interval, there are two class limits, the lower and upper class limits. For the
example 3.2, the lower class limits
are 11, 21, 31, 41, 51, 61, 71, 81. The upper class limits are 20, 30, 40, 50, 60, 70, 80 and 90.
(ii) Class Boundaries: For any class interval, there are two class boundaries. The lower class boundary of a class interval is the mean of the lower class limit of the interval and the upper class limit of the preceding interval. Let us compute the lower class boundary of the interval 11-20. The lower class limit of the class interval is 11, and the upper class limit of the preceding class interval is supposed to be 10. The lower class boundary of the class interval is therefore:
$$\frac{10 + 11}{2} = 10.5$$
For example 3.2, the lower class boundaries of the class intervals are 10.5, 20.5, 30.5, 40.5, 50.5, 60.5, 70.5 and 80.5.
The upper class boundary of a class interval is the mean of the upper class limit of the class interval and the lower class limit of the succeeding class interval. For example 3.2, the upper class limit of the interval 11-20 is 20, while the lower class limit of the succeeding class interval is 21. The upper class boundary of the interval is therefore equal to:
$$\frac{20 + 21}{2} = 20.5$$
For the example, the upper class boundaries of the class intervals are respectively 20.5, 30.5, 40.5, 50.5, 60.5, 70.5, 80.5 and 90.5. From these values, you will see that the upper class boundary of a class interval is the lower class boundary of the succeeding class interval. The classes can also be given in terms of class boundaries rather than class limits. When this is done, adjacent class intervals share an end point. For example 3.2, if the class boundaries are used, we will have the following as our frequency distribution.
Class boundaries   Frequency
10.5-20.5          3
20.5-30.5          5
30.5-40.5          8
40.5-50.5          11
50.5-60.5          10
60.5-70.5          7
70.5-80.5          4
80.5-90.5          2
(iii) Class Width: This is the difference between the upper and lower class boundaries (not class limits). For our example, the class width for all the class intervals is 10. For the first interval, the class width is 20.5 - 10.5 = 10. It is not 20 - 11 = 9.
(iv) Class Mark: This is the mean of the upper and the lower class boundaries. It can also be computed as the mean of the lower and the upper class limits. For example 3.2, the class mark for the first interval is:
$$\frac{10.5 + 20.5}{2} = 15.5 \quad \text{or} \quad \frac{11 + 20}{2} = 15.5$$
For the classes in the example, the class marks of the respective class intervals are 15.5, 25.5, 35.5, 45.5, 55.5, 65.5, 75.5 and 85.5. The class limits, class boundaries, class marks and class widths obtained so far are summarized below.
Class limits   Class boundaries   Class mark   Class width
11-20          10.5-20.5          15.5         10
21-30          20.5-30.5          25.5         10
31-40          30.5-40.5          35.5         10
41-50          40.5-50.5          45.5         10
51-60          50.5-60.5          55.5         10
61-70          60.5-70.5          65.5         10
71-80          70.5-80.5          75.5         10
81-90          80.5-90.5          85.5         10
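The four concepts can be computed together from the class limits alone. The sketch below is an illustration rather than the notes' own procedure. It assumes, as the text does for the interval before 11-20, that the gap between adjacent classes is the same at both ends of the distribution; the function name `summarize_classes` is hypothetical.

```python
# Class limits of example 3.2 as (lower limit, upper limit) pairs.
class_limits = [(11, 20), (21, 30), (31, 40), (41, 50),
                (51, 60), (61, 70), (71, 80), (81, 90)]

# Gap between the upper limit of one class and the lower limit of the next.
# Assumption: the same gap applies before the first and after the last class.
gap = class_limits[1][0] - class_limits[0][1]


def summarize_classes(limits, gap):
    """Return (lower, upper, lower boundary, upper boundary, mark, width) per class."""
    rows = []
    for lower, upper in limits:
        lower_boundary = lower - gap / 2   # mean of this lower limit and the previous upper limit
        upper_boundary = upper + gap / 2   # mean of this upper limit and the next lower limit
        mark = (lower + upper) / 2         # same as the mean of the two boundaries
        width = upper_boundary - lower_boundary
        rows.append((lower, upper, lower_boundary, upper_boundary, mark, width))
    return rows


summary = summarize_classes(class_limits, gap)
for lower, upper, lb, ub, mark, width in summary:
    print(f"{lower}-{upper}  boundaries {lb}-{ub}  mark {mark}  width {width}")
```

Computing the width from the boundaries rather than the limits is what gives 10 instead of 9 for every class.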
The relative frequency of a value is its frequency divided by the total frequency of all the values contained in the set of values.
Example. Suppose the frequency distribution of the scores of twenty students in a test is as presented below.
Score    Frequency
2        1
3        2
4        2
5        4
6        5
7        3
8        2
9        1
TOTAL    20
The total frequency is 20. The relative frequency of the first score is given as 1/20 = 0.05.
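The same division can be applied to every score in the table. This is a small Python sketch for illustration; the dictionary name `frequency` is an arbitrary choice.

```python
# Frequency distribution of the twenty test scores: score -> frequency.
frequency = {2: 1, 3: 2, 4: 2, 5: 4, 6: 5, 7: 3, 8: 2, 9: 1}

total = sum(frequency.values())  # total frequency of all scores

# Relative frequency = frequency of the score / total frequency.
relative_frequency = {score: f / total for score, f in frequency.items()}

for score, rf in relative_frequency.items():
    print(f"{score}: {frequency[score]}/{total} = {rf:.2f}")
print("Sum of relative frequencies:", round(sum(relative_frequency.values()), 10))
```

Since each frequency is divided by the same total, the relative frequencies always add up to 1.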
For grouped data, the relative frequency of a class interval is the frequency of the interval divided by the total frequency of all class intervals. For example, the relative frequency distribution of the class intervals of example 3.2 is as presented below.
Class interval   Frequency   Relative frequency
11-20            3           0.06
21-30            5           0.10
31-40            8           0.16
41-50            11          0.22
51-60            10          0.20
61-70            7           0.14
71-80            4           0.08
81-90            2           0.04
Going through the relative frequency tables presented in the unit, you will realize that the sum of the relative frequencies for a set of values is 1.
In this unit, you have learned how to construct relative frequencies for ungrouped and grouped data. A further summary of data can be in the form of cumulative relative frequency. To construct the cumulative relative frequency, we first need to construct the cumulative frequency for the data. The cumulative relative frequency of a value is the cumulative frequency of the value divided by the total frequency of the values contained in the array of data.
For example 3.3, the cumulative frequency and the cumulative relative frequency for the set of data are as presented below.
Score   Frequency   Cumulative Frequency   Cumulative Relative Frequency
2       1           1                      1/20 = 0.05
3       2           3 = 1+2                3/20 = 0.15
4       2           5 = 3+2                5/20 = 0.25
5       4           9 = 5+4                9/20 = 0.45
6       5           14 = 9+5               14/20 = 0.70
7       3           17 = 14+3              17/20 = 0.85
8       2           19 = 17+2              19/20 = 0.95
9       1           20 = 19+1              20/20 = 1.00
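The running totals in the cumulative frequency column can be generated with Python's standard `itertools.accumulate`. The sketch below is illustrative only and uses the frequencies of the twenty test scores given earlier (scores 2 to 9).

```python
from itertools import accumulate

scores = [2, 3, 4, 5, 6, 7, 8, 9]
frequencies = [1, 2, 2, 4, 5, 3, 2, 1]

# Cumulative frequency: running total of the frequencies.
cumulative = list(accumulate(frequencies))

# The last cumulative frequency equals the total frequency.
total = cumulative[-1]

# Cumulative relative frequency = cumulative frequency / total frequency.
cumulative_relative = [c / total for c in cumulative]

for score, f, c, cr in zip(scores, frequencies, cumulative, cumulative_relative):
    print(f"{score}  {f}  {c}  {c}/{total} = {cr:.2f}")
```

The last cumulative frequency is always the total frequency, so the last cumulative relative frequency is always 1.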
Example
The table below shows the frequency distribution of the ages of employees in a firm. Prepare the cumulative
frequency and the cumulative relative frequency for the ages of the employees.
48-50 3
51-53 2
54-56 2
57-59 1
The cumulative frequency and the cumulative relative frequency distributions are as follows.
Assignment
Using the ages of 100 employees given below, prepare:
(i) The frequency distribution for the class intervals 11-20, 21-30, 31-40, ...
(ii) Cumulative Frequency Distribution.
(iii) Relative Frequency Distribution.
(iv) Cumulative Relative Frequency Distribution.
Ages of 100 Employees
58 37 21 28 27 24 27 39 38 30
60 20 30 50 44 33 23 26 31 32
18 23 41 32 27 42 40 29 28 34
56 19 28 29 23 47 29 24 31 34
30 19 22 53 49 26 16 38 42 36
41 32 33 20 31 36 32 32 31 32
21 24 55 21 24 34 37 29 33 32
32 49 38 48 33 43 26 38 30 28
24 46 15 43 43 23 23 28 34 28
41 23 25 19 51 23 36 31 35 31
The word average is often used to mean the arithmetic mean. We compute the arithmetic mean for both ungrouped and grouped data. We also compute the arithmetic mean, which we henceforth in this unit call the mean, for both the sample and the population from which the sample is drawn. The mean computed for the sample is called a statistic and is denoted by $\bar{x}$. The mean computed for the population is called a parameter and is denoted by $\mu$. You should note in this course that any measure computed for the sample is called a statistic and any measure computed for the population is called a parameter.
The mean of ungrouped data is the summation of the values in the set of data divided by the
number of values in the set of data.
For a sample the number of values is denoted by n, that is the sample size, and for the
population the population size is given as N.
The mean of ungrouped data (for a sample) is
$$\bar{x} = \frac{\sum x_i}{n}$$
where
$\sum$ = summation
$x_i$ = values of a variable
$n$ = sample size
Example 5.1
Compute the mean for the following:
7, 5, 8, 10, 11, 6, 3, 4, 10, 9, 13, 2.
$$\bar{x} = \frac{7+5+8+10+11+6+3+4+10+9+13+2}{12} = \frac{88}{12} = 7.33$$
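The computation for the twelve values summed above can be reproduced in a few lines of Python. This is a minimal sketch; `sample_mean` is an illustrative helper name.

```python
# The twelve values summed in the worked example.
values = [7, 5, 8, 10, 11, 6, 3, 4, 10, 9, 13, 2]


def sample_mean(xs):
    """Arithmetic mean of ungrouped data: sum of the values divided by n."""
    if not xs:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(xs) / len(xs)


mean = sample_mean(values)
print(f"sum = {sum(values)}, n = {len(values)}, mean = {mean:.2f}")
```

The standard library function `statistics.mean` performs the same calculation and could be used instead of the helper.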
You will realize that there is no frequency distribution for this example. Suppose there is a frequency distribution for the values of a variable, then how do we calculate the mean?
This is simple. If we are computing the mean for a sample, that is $\bar{x}$, the mean is
$$\bar{x} = \frac{\sum f_i x_i}{\sum f_i}$$
where $f_i$ is the frequency of the value $x_i$.
Example 5.2
The table below gives the frequency distribution of the marks scored by 20 students in a test conducted in Statistics for Management.
Score (x_i)   Frequency (f_i)   f_i x_i
11            2                 22
12            3                 36
13            5                 65
14            7                 98
15            2                 30
16            1                 16
Total         20                267
$$\bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{267}{20} = 13.35$$
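When the data come with frequencies, each score is multiplied by its frequency before summing. The Python sketch below illustrates this for the 20 scores of the example; for these data the products add up to 267, which gives the mean of 13.35. The variable names are illustrative.

```python
# Scores and their frequencies from the example.
scores = [11, 12, 13, 14, 15, 16]
frequencies = [2, 3, 5, 7, 2, 1]

# Sum of f_i * x_i and sum of f_i.
sum_fx = sum(f * x for x, f in zip(scores, frequencies))
sum_f = sum(frequencies)

mean = sum_fx / sum_f
print(f"sum of f*x = {sum_fx}, sum of f = {sum_f}, mean = {mean:.2f}")
```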
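There is another method of computing the mean apart from using $\bar{x} = \frac{\sum f_i x_i}{\sum f_i}$. If we use an assumed mean, A, then the mean is
$$\bar{x} = A + \frac{\sum f_i d_i}{\sum f_i}$$
where A = assumed mean and $d_i = x_i - A$ is the deviation of each value from the assumed mean.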
Example: let us compute the mean of example 5.2 again using an assumed mean of 13.
We will then have the table below for the solution of the problem.
x_i     f_i   d_i = x_i - 13   f_i d_i
11      2     -2               -4
12      3     -1               -3
13      5     0                0
14      7     1                7
15      2     2                4
16      1     3                3
Total   20                     7
Assumed mean (A) = 13
$$\bar{x} = 13 + \frac{7}{20} = 13 + 0.35 = 13.35$$
You will see that the method of assumed mean gives the same value of the mean we had before.
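The agreement between the two methods does not depend on which assumed mean is chosen. The sketch below is illustrative, with `assumed_mean_method` as a hypothetical helper name. It takes each deviation as the score minus the assumed mean and applies the formula to the scores 11 to 16 with frequencies 2, 3, 5, 7, 2, 1.

```python
scores = [11, 12, 13, 14, 15, 16]
frequencies = [2, 3, 5, 7, 2, 1]


def assumed_mean_method(xs, fs, assumed):
    """Mean = A + sum(f * d) / sum(f), where d = x - A."""
    sum_fd = sum(f * (x - assumed) for x, f in zip(xs, fs))
    return assumed + sum_fd / sum(fs)


# Sum of f * d for the assumed mean of 13 used in the example.
sum_fd_13 = sum(f * (x - 13) for x, f in zip(scores, frequencies))
mean_13 = assumed_mean_method(scores, frequencies, 13)

# A different (arbitrary) assumed mean gives the same answer.
mean_15 = assumed_mean_method(scores, frequencies, 15)

print(f"sum of f*d (A = 13) = {sum_fd_13}")
print(f"mean with A = 13: {mean_13:.2f}")
print(f"mean with A = 15: {mean_15:.2f}")
```

Choosing an assumed mean near the centre of the data simply keeps the deviations small, which makes hand computation easier.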
You have learned how to compute the arithmetic mean of ungrouped data.
When the values are many in a set of data, there is the need to group them into class
intervals. You learned about this in unit 3. We need to take some time to compute
the mean of grouped data.
Example
The earning per share (in kobo) of some firms is presented below with the frequency
distribution.
To solve this question, we need to compute the class mark for each class interval. The class mark becomes the $x_i$ we will use in the computation. Once this is done, the whole distribution is reduced to the form of ungrouped data with a frequency distribution. You should recall that the class mark is the mean of the upper and lower class boundaries (or limits) of a class interval.
$$\bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{4100}{50} = 82$$
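The class-mark procedure can be sketched in Python. As an illustration, the block below uses the grouped mathematics scores from the earlier example (classes 11-20 to 81-90 with frequencies 3, 5, 8, 11, 10, 7, 4, 2) rather than the earnings per share data, so its result differs from the 82 obtained above; the function name `grouped_mean` is hypothetical.

```python
# Grouped mathematics scores: (lower limit, upper limit) and frequency per class.
class_limits = [(11, 20), (21, 30), (31, 40), (41, 50),
                (51, 60), (61, 70), (71, 80), (81, 90)]
frequencies = [3, 5, 8, 11, 10, 7, 4, 2]


def grouped_mean(limits, fs):
    """Mean of grouped data, using each class mark as the representative x_i."""
    class_marks = [(lower + upper) / 2 for lower, upper in limits]
    sum_fx = sum(f * x for x, f in zip(class_marks, fs))
    return sum_fx / sum(fs)


class_marks = [(lower + upper) / 2 for lower, upper in class_limits]
sum_fx = sum(f * x for x, f in zip(class_marks, frequencies))
mean = grouped_mean(class_limits, frequencies)

print(f"sum of f*x = {sum_fx}, sum of f = {sum(frequencies)}, mean = {mean:.1f}")
```

The result is an approximation to the mean of the raw scores, because every value in a class is treated as if it were equal to the class mark.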
Example: Using the assumed mean method, compute the mean of the distribution in the preceding example.
We will still make use of the frequency distribution of the class marks.
We assume a mean of 77.
$$\bar{x} = A + \frac{\sum f_i d_i}{\sum f_i} = 77 + \frac{250}{50} = 77 + 5 = 82$$
(i) Since all the values in a set of data are used to compute the mean, the mean can be influenced by extreme values. For example, the mean of 3, 4, 5, 6, 7 and 19 is
$$\frac{3 + 4 + 5 + 6 + 7 + 19}{6} = \frac{44}{6} = 7.33$$
which is larger than five of the six values.
(ii) We are unable to compute the mean for data in which there are open-ended classes either at the beginning of the distribution or at the end of the distribution. It will be difficult to know the class mark of the open-ended class.
Assignment
For the frequency distribution below:
(i) Compute the mean using the two methods in this unit.
(ii) Compare your results and comment.
Class Frequency
8.0-8.9 5
9.0-9.9 7
10.0-10.9 10
11.0-11.9 13
12.0-12.9 15
13.0-13.9 5
14.0-14.9 3
15.0-15.9 2
GEOMETRIC MEAN
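For values $x_1, x_2, x_3, \ldots, x_n$, the geometric mean is the nth root of the product of the values. The geometric mean is denoted as GM, therefore
$$GM = \sqrt[n]{x_1 \times x_2 \times x_3 \times \cdots \times x_n}$$
where GM is the geometric mean, $x_1, x_2, x_3, \ldots, x_n$ are the values of the variable of interest, and n represents the sample size.
Example
For the values 3, 5, 6 and 7:
$$GM = \sqrt[4]{3 \times 5 \times 6 \times 7} = \sqrt[4]{630} = 5.01$$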
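The fourth root of 630 can be checked numerically. The following Python sketch is illustrative; it takes the nth root by raising the product to the power 1/n and assumes all values are positive. The helper name `geometric_mean` is chosen for clarity (Python 3.8 and later also provide `statistics.geometric_mean`).

```python
import math

# Values from the worked example: 3 * 5 * 6 * 7 = 630.
values = [3, 5, 6, 7]


def geometric_mean(xs):
    """nth root of the product of n positive values."""
    if not xs or any(x <= 0 for x in xs):
        raise ValueError("geometric mean requires a non-empty list of positive values")
    return math.prod(xs) ** (1 / len(xs))


product = math.prod(values)
gm = geometric_mean(values)
print(f"product = {product}, GM = {gm:.2f}")
```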
HARMONIC MEAN
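Example 6.5
For the values 5, 6 and 7, the harmonic mean (the number of values divided by the sum of their reciprocals) is
$$\text{Harmonic mean} = \frac{3}{\frac{1}{5} + \frac{1}{6} + \frac{1}{7}} = \frac{3}{0.2 + 0.167 + 0.143} = \frac{3}{0.510} \approx 5.88$$
The harmonic mean is used to average ratios, speeds and similar rates. It is used mostly in engineering.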
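A short Python sketch of the harmonic mean follows. It assumes the three values behind the worked example are 5, 6 and 7 (the reciprocals 0.2, 0.167 and 0.143), and it adds a hypothetical speed illustration that is not from the notes: a journey covered at 40 km/h one way and 60 km/h on the way back over the same distance.

```python
def harmonic_mean(xs):
    """Number of values divided by the sum of their reciprocals."""
    if not xs or any(x <= 0 for x in xs):
        raise ValueError("harmonic mean requires a non-empty list of positive values")
    return len(xs) / sum(1 / x for x in xs)


# Values assumed for the worked example.
hm = harmonic_mean([5, 6, 7])
print(f"Harmonic mean of 5, 6, 7 = {hm:.4f}")

# Hypothetical speeds over two equal distances (not from the notes).
average_speed = harmonic_mean([40, 60])
print(f"Average speed = {average_speed:.1f} km/h")
```

Without rounding the reciprocals first, the harmonic mean of 5, 6 and 7 is 630/107, about 5.89; the worked figure of 5.88 comes from the rounded reciprocals. For the two speeds, the harmonic mean gives 48 km/h rather than the arithmetic mean of 50 km/h, because more time is spent travelling at the lower speed.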
Exercise 6.2