Stratified Sampling in R

Last Updated : 25 Jul, 2025

Stratified sampling splits a population into non-overlapping groups (strata) based on a shared characteristic and then randomly selects members from each group. This method is useful when every subgroup must be represented in the sample. In this article, we will explore how to implement stratified sampling in the R programming language.

Implementation of Stratified Sampling Using Number of Rows

We divide the population into groups and select a fixed number of members from each group to form the final sample.

1. Installing and Loading Required Packages

We install and load the dplyr package to manipulate data and perform group-wise sampling.

  • install.packages: Installs the required package.
  • library: Loads the installed package for use.
R
install.packages("dplyr")  # needed only once per R installation
library(dplyr)             # load dplyr for the current session
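Running install.packages on every execution re-downloads the package. A minimal sketch of a conditional install, assuming the default CRAN mirror is reachable when the package is missing, could look like this:

```r
# Install dplyr only when it is not already available, then load it.
if (!requireNamespace("dplyr", quietly = TRUE)) {
  install.packages("dplyr")
}
library(dplyr)
```

requireNamespace returns FALSE instead of raising an error when the package is absent, which makes it suitable for this kind of check.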

2. Creating the Data Frame

We create a data frame of 600 rows: 150 each for Teachers, Students, Workforce and Guests, with a randomly generated GPA score (mean 90, standard deviation 3) for every row. Because no random seed is set, the scores differ from run to run.

  • data.frame: Creates a tabular structure to hold data.
  • rep: Repeats the group labels to form a full population.
  • rnorm: Generates random GPA scores using a normal distribution.
R
df <- data.frame(group = rep(c("Teachers", "Students", "Workforce", "Guests"), each = 150),
                 gpa = rnorm(600, mean = 90, sd = 3))
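To get the same scores on every run, the seed can be fixed before calling rnorm. The sketch below rebuilds the same data frame with an arbitrary seed value (1 is an assumption, not part of the original example) and checks the population layout before any sampling:

```r
# Fix the random seed so the simulated scores are identical on every run.
# The value 1 is arbitrary; any integer works.
set.seed(1)

df <- data.frame(
  group = rep(c("Teachers", "Students", "Workforce", "Guests"), each = 150),
  gpa   = rnorm(600, mean = 90, sd = 3)
)

# Confirm the population layout: 600 rows in total, 150 per group.
nrow(df)
table(df$group)
```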

3. Obtaining Stratified Sample

We group the data by category and randomly select 15 rows from each group.

  • group_by: Groups data by a categorical column.
  • sample_n: Selects a fixed number of rows from each group.
  • pipe operator (%>%): Passes the result from one function to the next.
R
strat_sample <- df %>%
  group_by(group) %>%
  sample_n(size = 15)
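Since dplyr 1.0.0, sample_n is marked as superseded in favour of slice_sample. It still works, but the same step can be sketched with the newer function as follows. The seed and the name strat_sample_n are illustrative assumptions:

```r
library(dplyr)

set.seed(1)  # arbitrary seed, used only for repeatable output
df <- data.frame(
  group = rep(c("Teachers", "Students", "Workforce", "Guests"), each = 150),
  gpa   = rnorm(600, mean = 90, sd = 3)
)

# Draw 15 rows at random from each group, then drop the grouping
# so later operations act on the whole sample.
strat_sample_n <- df %>%
  group_by(group) %>%
  slice_sample(n = 15) %>%
  ungroup()

table(strat_sample_n$group)
```

Calling ungroup at the end is optional. It only prevents later verbs from silently operating per group.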

4. Finding Frequency of Groups in the Sample

We check how many records are selected from each group in the final sample.

  • table: Counts the frequency of each group in the sample.
R
table(strat_sample$group)

Output:

   Guests  Students  Teachers Workforce
       15        15        15        15
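Beyond the raw counts, a per-group summary can show whether each stratum's sample looks reasonable. The sketch below assumes the same data frame and sampling step as above. The seed and the names sample_summary and mean_gpa are hypothetical:

```r
library(dplyr)

set.seed(1)  # arbitrary seed, used only for repeatable output
df <- data.frame(
  group = rep(c("Teachers", "Students", "Workforce", "Guests"), each = 150),
  gpa   = rnorm(600, mean = 90, sd = 3)
)

strat_sample <- df %>%
  group_by(group) %>%
  sample_n(size = 15)

# One row per group: the number of sampled rows and their mean GPA.
sample_summary <- strat_sample %>%
  summarise(n = n(), mean_gpa = mean(gpa))

sample_summary
```

Each mean_gpa value should land near 90, the mean used to simulate the scores, although the exact figures depend on the seed.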

Implementation of Stratified Sampling Using Fraction of Rows

We divide the population into groups and select a specific fraction of members from each group to form the final sample.

1. Installing and Loading Required Packages

We install and load the dplyr package to enable data manipulation and sampling functions.

  • install.packages: Installs the required package.
  • library: Loads the installed package for use.
R
install.packages("dplyr")  # skip if dplyr is already installed
library(dplyr)             # load dplyr for the current session

2. Creating the Data Frame

We create a data frame with the same structure as before: 600 rows in four groups of 150 rows each, with a GPA score for every row.

  • data.frame: Creates the data structure.
  • rep: Repeats group values.
  • rnorm: Generates GPA values randomly.
R
df <- data.frame(group = rep(c("Teachers", "Students", "Workforce", "Guests"), each = 150),
                 gpa = rnorm(600, mean = 90, sd = 3))

3. Obtaining Stratified Sample

We use the group-wise sampling function to select 20 percent of the rows from each group, which is 30 of the 150 rows per group.

  • group_by: Groups data according to the category.
  • sample_frac: Selects a fraction of the rows from each group.
  • pipe operator (%>%): Passes the result from one step to another.
R
strat_sample <- df %>%
  group_by(group) %>%
  sample_frac(size = 0.20)
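As with sample_n, sample_frac is marked as superseded since dplyr 1.0.0. A sketch of the same step with slice_sample and its prop argument follows. The seed and the name strat_sample_prop are illustrative assumptions:

```r
library(dplyr)

set.seed(1)  # arbitrary seed, used only for repeatable output
df <- data.frame(
  group = rep(c("Teachers", "Students", "Workforce", "Guests"), each = 150),
  gpa   = rnorm(600, mean = 90, sd = 3)
)

# Draw 20 percent of the rows from each group (30 of 150).
strat_sample_prop <- df %>%
  group_by(group) %>%
  slice_sample(prop = 0.20) %>%
  ungroup()

table(strat_sample_prop$group)
```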

4. Finding Frequency of Groups in the Sample

We check how many records were selected from each group after applying fraction-based sampling.

  • table: Shows the count of entries from each group.
R
table(strat_sample$group)

Output:

   Guests  Students  Teachers Workforce
       30        30        30        30
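To confirm that the requested fraction was actually achieved, the sample counts can be compared with the population counts. This is a minimal sketch. The seed and the names pop_counts, sample_counts and fraction_check are hypothetical:

```r
library(dplyr)

set.seed(1)  # arbitrary seed, used only for repeatable output
df <- data.frame(
  group = rep(c("Teachers", "Students", "Workforce", "Guests"), each = 150),
  gpa   = rnorm(600, mean = 90, sd = 3)
)

strat_sample <- df %>%
  group_by(group) %>%
  sample_frac(size = 0.20)

# Rows per group in the population and in the sample.
pop_counts    <- count(df, group, name = "population")
sample_counts <- count(ungroup(strat_sample), group, name = "sampled")

# Realised sampling fraction per group.
fraction_check <- left_join(pop_counts, sample_counts, by = "group") %>%
  mutate(fraction = sampled / population)

fraction_check
```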

We implemented stratified sampling in the R programming language using two methods: selecting a fixed number of rows from each group and selecting a fixed fraction of rows from each group.
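In the examples above every group has 150 rows, so both methods give equally sized strata in the sample. The difference only shows up when group sizes are unequal. The sketch below uses a hypothetical population (the group sizes 50, 400, 100 and 50, the seed, and the names df_unequal, fixed_sample and frac_sample are all assumptions made for illustration):

```r
library(dplyr)

set.seed(1)  # arbitrary seed, used only for repeatable output

# Hypothetical population with unequal group sizes.
df_unequal <- data.frame(
  group = rep(c("Teachers", "Students", "Workforce", "Guests"),
              times = c(50, 400, 100, 50)),
  gpa   = rnorm(600, mean = 90, sd = 3)
)

# Fixed number of rows: every group contributes 15 rows.
fixed_sample <- df_unequal %>%
  group_by(group) %>%
  sample_n(size = 15)

# Fixed fraction: every group contributes 20 percent of its rows.
frac_sample <- df_unequal %>%
  group_by(group) %>%
  sample_frac(size = 0.20)

table(fixed_sample$group)
table(frac_sample$group)
```

The fixed-number sample gives small groups the same weight as large ones, while the fraction-based sample keeps each group's share of the sample proportional to its share of the population.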
