Stratified sampling splits a population into groups (strata) based on a shared characteristic and then randomly selects members from each group. This method is useful when we want to ensure that every subgroup is represented in the sample. In this article, we will explore how to implement stratified sampling in the R programming language.
Implementation of Stratified Sampling Using Number of Rows
We divide the population into groups and select a fixed number of members from each group to form the final sample.
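To make the idea concrete before bringing in any packages, here is a minimal base R sketch of fixed-size stratified sampling. The toy data frame toy, the helper name sample_per_group and the choice of 2 rows per group are illustrative assumptions, not part of the walkthrough that follows.

```r
# Toy population (hypothetical): three groups of unequal size.
toy <- data.frame(
  group = rep(c("A", "B", "C"), times = c(5, 8, 3)),
  value = seq_len(16)
)

# Hypothetical helper: draw n rows at random from every group.
sample_per_group <- function(data, group_col, n) {
  # split() produces one data frame per group
  pieces <- split(data, data[[group_col]])
  # sample row positions within each piece, without replacement
  picked <- lapply(pieces, function(piece) {
    piece[sample(nrow(piece), size = n), , drop = FALSE]
  })
  # stack the per-group samples back into one data frame
  do.call(rbind, picked)
}

set.seed(1)  # assumed seed, only so the draw is repeatable
toy_sample <- sample_per_group(toy, "group", n = 2)
table(toy_sample$group)
```

Every group contributes the same number of rows regardless of its size, which is the defining property of this method. The helper assumes each group has at least n rows.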
1. Installing and Loading Required Packages
We install and load the dplyr package to manipulate data and perform group-wise sampling.
- install.packages: Installs the required package.
- library: Loads the installed package for use.
install.packages("dplyr")
library(dplyr)
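Calling install.packages on every run downloads the package again each time. A common pattern, sketched here as a suggestion rather than a required step, is to install only when the package is missing.

```r
# Install dplyr only if it is not already available on this machine.
if (!requireNamespace("dplyr", quietly = TRUE)) {
  install.packages("dplyr")
}

# Load the package for the current session.
library(dplyr)
```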
2. Creating the Data Frame
We create a data frame with 600 rows: 150 each for Teachers, Students, Workforce and Guests, with a randomly generated GPA for every row.
- data.frame: Creates a tabular structure to hold data.
- rep: Repeats the group labels to form a full population.
- rnorm: Generates random GPA scores using a normal distribution.
df <- data.frame(group = rep(c("Teachers", "Students", "Workforce", "Guests"), each = 150),
                 gpa = rnorm(600, mean = 90, sd = 3))
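Because rnorm draws fresh random numbers on every run, the GPA values change each time the script is executed. If repeatable results matter, one option is to fix the random seed first. The seed value 42 below is an arbitrary assumption.

```r
# Fix the random number generator so the same GPA values are drawn each run.
set.seed(42)  # arbitrary seed, chosen only for illustration

df <- data.frame(
  group = rep(c("Teachers", "Students", "Workforce", "Guests"), each = 150),
  gpa   = rnorm(600, mean = 90, sd = 3)
)

# Quick sanity checks on the population.
nrow(df)         # total number of rows
table(df$group)  # rows per group
```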
3. Obtaining Stratified Sample
We group the data by category and randomly select 15 rows from each group.
- group_by: Groups data by a categorical column.
- sample_n: Selects a fixed number of rows from each group.
- pipe operator (%>%): Passes the result from one function to the next.
strat_sample <- df %>%
  group_by(group) %>%
  sample_n(size = 15)
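In dplyr 1.0.0 and later, sample_n is superseded by slice_sample, although the older function still works. A sketch of the same step with the newer function follows, assuming a recent dplyr is installed. The data frame is rebuilt here so the block runs on its own, and the final ungroup call is an optional addition that drops the grouping from the result.

```r
library(dplyr)

df <- data.frame(
  group = rep(c("Teachers", "Students", "Workforce", "Guests"), each = 150),
  gpa   = rnorm(600, mean = 90, sd = 3)
)

# Draw 15 random rows from each group with slice_sample().
strat_sample <- df %>%
  group_by(group) %>%
  slice_sample(n = 15) %>%
  ungroup()  # optional: return an ungrouped tibble

nrow(strat_sample)
```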
4. Finding Frequency of Groups in the Sample
We check how many records are selected from each group in the final sample.
- table: Counts the frequency of each group in the sample.
table(strat_sample$group)
Output:
   Guests  Students  Teachers Workforce 
       15        15        15        15 

Implementation of Stratified Sampling Using Fraction of Rows
We divide the population into groups and select a specific fraction of members from each group to form the final sample.
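With fraction-based sampling, the number of rows drawn from a group scales with the size of that group, so the sample keeps roughly the same group proportions as the population. The base R sketch below works through that arithmetic for made-up, unequal group sizes; the sizes and the 0.20 fraction are assumptions for illustration. Rounding to whole rows is done with round here, and the exact rounding rule of a given sampling function may differ.

```r
# Hypothetical group sizes in a population.
group_sizes <- c(Teachers = 50, Students = 300, Workforce = 100, Guests = 150)

frac <- 0.20  # fraction to draw from every group

# Rows drawn per group: fraction times group size, rounded to whole rows.
rows_per_group <- round(frac * group_sizes)
rows_per_group

# Group shares in the population and in the sample.
population_share <- group_sizes / sum(group_sizes)
sample_share <- rows_per_group / sum(rows_per_group)
rbind(population_share, sample_share)
```

When all groups are the same size, as in the data frame used in this article, the fixed-number and fraction methods produce samples of the same shape; the difference only shows up with unequal groups.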
1. Installing and Loading Required Packages
We install and load the dplyr package to enable data manipulation and sampling functions.
- install.packages: Installs the required package.
- library: Loads the installed package for use.
install.packages("dplyr")
library(dplyr)
2. Creating the Data Frame
We create the same data frame with 600 rows and four groups, each having 150 entries and a GPA score.
- data.frame: Creates the data structure.
- rep: Repeats group values.
- rnorm: Generates GPA values randomly.
df <- data.frame(group = rep(c("Teachers", "Students", "Workforce", "Guests"), each = 150),
                 gpa = rnorm(600, mean = 90, sd = 3))
3. Obtaining Stratified Sample
We use the group-wise sampling function to select 20 percent of the rows from each group, which is 30 of the 150 rows per group.
- group_by: Groups data according to the category.
- sample_frac: Selects a fraction of the rows from each group.
- pipe operator (%>%): Passes the result from one step to another.
strat_sample <- df %>%
  group_by(group) %>%
  sample_frac(size = 0.20)
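As with sample_n, sample_frac is superseded in dplyr 1.0.0 and later, where the fraction is passed to slice_sample through its prop argument. The sketch below assumes a recent dplyr and rebuilds the data frame so that it runs on its own.

```r
library(dplyr)

df <- data.frame(
  group = rep(c("Teachers", "Students", "Workforce", "Guests"), each = 150),
  gpa   = rnorm(600, mean = 90, sd = 3)
)

# Draw 20 percent of the rows from each group with slice_sample().
strat_sample <- df %>%
  group_by(group) %>%
  slice_sample(prop = 0.20) %>%
  ungroup()  # optional: return an ungrouped tibble

nrow(strat_sample)
```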
4. Finding Frequency of Groups in the Sample
We check how many records were selected from each group after applying fraction-based sampling.
- table: Shows the count of entries from each group.
table(strat_sample$group)
Output:
   Guests  Students  Teachers Workforce 
       30        30        30        30 

We implemented stratified sampling in R using two methods: selecting a fixed number of rows from each group and selecting a fixed fraction of rows from each group.