Statistical measures such as the mean, median and mode summarize the central tendency of a dataset. They describe a typical value in the data and give a quick overview of its distribution. In R, the mean and median are available as built-in functions, while the mode can be computed with a short custom function or a package.
Mean in R
The mean is the arithmetic average: the sum of all values divided by the count of values.
Syntax
mean(x, na.rm = FALSE)
Parameters:
- x: Numeric vector
- na.rm: If TRUE, ignores NA values
Example:
x <- c(2, 4, 6, 8, 10)
mean(x)
# Handling NA
x <- c(2, 4, NA, 8)
mean(x, na.rm = TRUE)
Output
[1] 6
[1] 4.666667
Explanation:
- In the first example, mean() returns the average of all values in x: (2+4+6+8+10)/5 = 6
- In the second example, na.rm = TRUE removes the NA, so the average is (2+4+8)/3 = 4.666667
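The effect of na.rm is easiest to see by running the same call with and without it. The following is a minimal sketch that reuses the vector from the example above; the variable names (mean_default, mean_clean, mean_manual) are illustrative only.

```r
# Same vector as in the example above, with one missing value
x <- c(2, 4, NA, 8)

# With the default na.rm = FALSE, a single NA makes the result NA
mean_default <- mean(x)

# With na.rm = TRUE, the NA is dropped before averaging
mean_clean <- mean(x, na.rm = TRUE)

# The same result computed by hand: sum of non-missing values / their count
mean_manual <- sum(x, na.rm = TRUE) / sum(!is.na(x))

print(mean_default)
print(mean_clean)
print(mean_manual)
```

The first call prints NA, which is why na.rm = TRUE is needed whenever the data may contain missing values.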
Median in R
The median is the middle value of a sorted dataset; it splits the data into two halves. If the number of elements is odd, the median is the center element; if it is even, the median is the average of the two central elements.
Syntax
median(x, na.rm = FALSE)
Example:
x <- c(1, 3, 5, 7, 9)
median(x)
# With NA values
x <- c(1, NA, 5, 7)
median(x, na.rm = TRUE)
Output
[1] 5
[1] 5
Explanation:
- In the first example, the sorted list has 5 numbers, and the middle one is 5
- In the second example, after removing the NA, the sorted values are (1, 5, 7), and the middle value is 5
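Both examples above have an odd number of values, so the even-length case is worth checking separately. The sketch below uses an illustrative vector (x_even) and compares median() with a manual calculation that averages the two central elements of the sorted data.

```r
# Even number of elements: no single middle value exists
x_even <- c(1, 3, 5, 7)

# median() averages the two central elements, 3 and 5
med_builtin <- median(x_even)

# Manual check: sort, then average the two middle positions
sorted <- sort(x_even)
n <- length(sorted)
med_manual <- (sorted[n / 2] + sorted[n / 2 + 1]) / 2

print(med_builtin)
print(med_manual)
```

Both calls print 4, the average of 3 and 5.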
Mode in R
The mode is the value that appears most frequently in a dataset. R has no built-in function for the statistical mode (the built-in mode() function returns the storage mode of an object, such as "numeric"), but you can define one easily.
Method 1: Custom Function to Find Mode
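get_mode <- function(v) {
  # Distinct values, in order of first appearance
  uniqv <- unique(v)
  # Count occurrences of each distinct value and return the most frequent;
  # on a tie, which.max() picks the first one
  uniqv[which.max(tabulate(match(v, uniqv)))]
}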
x <- c(1, 2, 2, 3, 3, 3, 4)
get_mode(x)
Output
[1] 3
Explanation: The number 3 appears most frequently (3 times), so it is the mode.
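Because get_mode() relies on which.max(), it returns only one value even when several values share the highest frequency. If all tied values are wanted, one possible variant is sketched below. The name get_all_modes is hypothetical, not a base R function, and the same counting approach is assumed.

```r
# Hypothetical helper: returns every value tied for the highest frequency
get_all_modes <- function(v) {
  uniqv <- unique(v)                    # distinct values
  counts <- tabulate(match(v, uniqv))   # frequency of each distinct value
  uniqv[counts == max(counts)]          # keep all values with the top count
}

# 2 and 3 each appear twice, so both are returned
x_tied <- c(1, 2, 2, 3, 3, 4)
modes_tied <- get_all_modes(x_tied)
print(modes_tied)

# The same function also works on character vectors
fruits <- c("apple", "pear", "apple", "fig")
modes_fruit <- get_all_modes(fruits)
print(modes_fruit)
```

For x_tied this returns 2 and 3, while for fruits it returns "apple".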
Method 2: Using the modeest Package
We can also use the modeest package. It provides estimators of the mode for univariate data and the modes of common probability distributions.
# Install and load package
install.packages("modeest")
library(modeest)
x <- c(1, 2, 2, 3, 3, 3, 4)
mfv(x)
Output
[1] 3
Explanation: The mfv() function from the modeest package returns the most frequent value, which is again 3.