Statistical measures such as average, variance and standard deviation are fundamental tools in data analysis. They help summarize numerical data, understand central tendency and measure how spread out the data is. In R, these measures can be calculated easily using built-in functions.
- Mean (Average): Measures the center of the data
- Variance: Measures how far values spread from the mean, expressed in squared units
- Standard Deviation: The square root of the variance; measures spread in the same units as the data
Average in R
The mean is a measure of central tendency. It is calculated by dividing the sum of all observations by the total number of observations. R provides the built-in function mean() to calculate the average of a numeric vector.
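Written as a formula, with x_1, ..., x_n denoting the n observations (notation introduced here for illustration), the definition above reads as follows. The second line applies it to the example vector used in this section:

```latex
\begin{aligned}
\bar{x} &= \frac{1}{n} \sum_{i=1}^{n} x_i \\
\bar{x} &= \frac{2 + 4 + 4 + 4 + 5 + 5 + 7 + 9}{8} = \frac{40}{8} = 5
\end{aligned}
```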
Syntax
mean(x, na.rm = FALSE)
Parameters:
- x: a numeric vector
- na.rm: a logical value; if TRUE, missing values (NA) are removed before the mean is computed
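A minimal sketch of how na.rm changes the result. The vector name scores and its values are illustrative only:

```r
# A vector containing one missing value (illustrative data)
scores <- c(2, NA, 4)

# By default, any NA in the input makes the result NA
mean(scores)                # NA

# With na.rm = TRUE the NA is dropped first, so this is mean(c(2, 4))
mean(scores, na.rm = TRUE)  # 3
```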
Example: Calculate the mean of a numeric vector.
# Create a numeric vector
data <- c(2, 4, 4, 4, 5, 5, 7, 9)
# Calculate mean
mean(data)
Output
[1] 5
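To check that mean() matches the definition, the same value can be computed by hand as the sum of the observations divided by their count. This is a small sketch reusing the example vector; manual_mean is an illustrative name:

```r
data <- c(2, 4, 4, 4, 5, 5, 7, 9)

# Mean computed by hand: sum of observations (40) divided by their count (8)
manual_mean <- sum(data) / length(data)
manual_mean                          # 5

# Same value as the built-in function
all.equal(manual_mean, mean(data))   # TRUE
```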
Variance in R
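Variance measures how far the values in a set spread out from their mean. It is based on the squared differences from the mean: the population variance is the average of these squared differences, while the sample variance divides their sum by n - 1 instead of n. R's var() function calculates the sample variance.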
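As formulas, using the same notation as for the mean (x_1, ..., x_n are the observations and \bar{x} is their mean), and treating the n values as the entire population in the second line:

```latex
\begin{aligned}
s^2 &= \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2 && \text{sample variance, as returned by var()} \\
\sigma^2 &= \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{n - 1}{n} \, s^2 && \text{population variance}
\end{aligned}
```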
Syntax
var(x, na.rm = FALSE)
Parameters:
- x: numeric vector
- na.rm: If TRUE, ignores missing values (NA)
Example:
data <- c(2, 4, 4, 4, 5, 5, 7, 9)
var(data)
Output
[1] 4.571429
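The value returned by var() can be reproduced step by step from the definition. This sketch reuses the example vector; sq_dev and manual_var are illustrative names:

```r
data <- c(2, 4, 4, 4, 5, 5, 7, 9)

# Squared deviations from the mean (the mean is 5): 9 1 1 1 0 0 4 16
sq_dev <- (data - mean(data))^2

# Sample variance: sum of squared deviations (32) divided by n - 1 (7)
manual_var <- sum(sq_dev) / (length(data) - 1)
manual_var                          # 4.571429

all.equal(manual_var, var(data))    # TRUE
```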
Note: R calculates sample variance (divides by n-1). For population variance, multiply by (n-1)/n.
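The conversion described in the note can be sketched as follows, alongside a direct computation (the average squared deviation) for comparison. The variable names are illustrative:

```r
data <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(data)

# Population variance by rescaling the sample variance by (n - 1) / n
pop_var_scaled <- var(data) * (n - 1) / n

# Population variance computed directly: mean of the squared deviations (divide by n)
pop_var_direct <- mean((data - mean(data))^2)

pop_var_scaled                               # 4
all.equal(pop_var_scaled, pop_var_direct)    # TRUE
```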
Standard Deviation in R
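Standard deviation is the square root of the variance. Like the variance, it measures how much the data varies from the mean, but it is expressed in the same units as the data, which makes it easier to interpret. R calculates it with the sd() function, which, like var(), uses the sample (n - 1) formula.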
Syntax
sd(x, na.rm = FALSE)
Parameters:
- x: numeric vector
- na.rm: If TRUE, ignores missing values (NA)
Example:
data <- c(2, 4, 4, 4, 5, 5, 7, 9)
sd(data)
Output
[1] 2.13809
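Because sd() is the square root of the sample variance, the two functions can be checked against each other. The sketch also takes the square root of the population variance from the earlier note to get a population standard deviation; pop_sd is an illustrative name, and base R has no built-in function with that name:

```r
data <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(data)

# The sample standard deviation is the square root of the sample variance
all.equal(sd(data), sqrt(var(data)))    # TRUE

# Population standard deviation: square root of the population variance (sqrt(4))
pop_sd <- sqrt(var(data) * (n - 1) / n)
pop_sd                                  # 2
```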
Calculating All Three Measures for a Dataset
Let’s calculate the mean, variance and standard deviation for the following dataset:
data <- c(12, 15, 18, 22, 30, 35)
mean_value <- mean(data)
variance_value <- var(data)
sd_value <- sd(data)
print(paste("Mean:", mean_value))
print(paste("Variance:", variance_value))
print(paste("Standard Deviation:", sd_value))
Output
[1] "Mean: 22"
[1] "Variance: 79.6"
[1] "Standard Deviation: 8.92188320927819"
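When the three measures are needed together, they can be wrapped in a small function that returns a named vector. describe_spread() below is a hypothetical helper written for this example, not a built-in R function. It simply passes na.rm through to mean(), var() and sd():

```r
# Hypothetical helper (not part of base R): returns mean, sample variance and sample SD
describe_spread <- function(x, na.rm = FALSE) {
  c(mean     = mean(x, na.rm = na.rm),
    variance = var(x, na.rm = na.rm),
    sd       = sd(x, na.rm = na.rm))
}

data <- c(12, 15, 18, 22, 30, 35)
stats <- describe_spread(data)
stats

# A missing value is ignored when na.rm = TRUE, giving the same results
describe_spread(c(data, NA), na.rm = TRUE)
```

Returning a named vector keeps the results together, so individual values can be read with stats["variance"] or rounded in one call with round(stats, 2).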
Visualizing Mean, Variance and Standard Deviation
We can visualize these measures with a density plot using the ggplot2 package.
library(ggplot2)
# Generate 100 random data points with mean=50 and sd=10
set.seed(123)
d <- rnorm(100, 50, 10)
# Calculate mean, variance, and standard deviation
m <- mean(d); v <- var(d); s <- sd(d)
# Create the plot
ggplot(data.frame(d), aes(d)) +
  geom_density(fill = "lightblue", alpha = 0.5) +
  # Add vertical lines for the mean and the mean ± SD boundaries
  geom_vline(xintercept = c(m, m + s, m - s),
             color = c("red", "green", "green"),
             linetype = c("dashed", "dotted", "dotted"),
             linewidth = c(1.2, 1, 1)) +   # linewidth needs ggplot2 >= 3.4.0; older versions use size
  labs(title = "Visualization of Mean, Variance, and Standard Deviation",
       x = "Data Values", y = "Density") +
  theme_minimal() +
  # Annotate mean, mean ± SD, and variance
  annotate("text", x = m, y = 0.03, label = paste("Mean =", round(m, 2)), color = "red", vjust = -1) +
  annotate("text", x = m + s, y = 0.02, label = paste("Mean + SD =", round(m + s, 2)), color = "green", vjust = -1) +
  annotate("text", x = m - s, y = 0.02, label = paste("Mean - SD =", round(m - s, 2)), color = "green", vjust = -1) +
  annotate("text", x = m + 20, y = 0.04, label = paste("Variance =", round(v, 2)), color = "blue", vjust = -1)
Output: the density plot shows the following elements.
- Mean: a red dashed line at the center of the distribution.
- Standard deviation: green dotted lines one standard deviation on either side of the mean, indicating the spread of the data.
- Variance: a blue text label added with annotate(), placed to the right of the mean (at m + 20).
This visualization shows how the data is distributed around the mean and uses the mean ± SD lines to indicate how spread out it is. The variance cannot be drawn as a distance on the same axis because it is in squared units, so it appears only as a text label; the distance from the mean to each green line is its square root, the standard deviation.
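The green lines can also be read numerically by counting how many observations fall between them. This sketch regenerates the same simulated data as the plot example; within_one_sd is an illustrative name. For roughly normal data about 68% of values lie within one standard deviation of the mean, so a value in that neighborhood is expected here, though a sample of 100 points will not match it exactly:

```r
# Same simulated data as in the plot example
set.seed(123)
d <- rnorm(100, 50, 10)
m <- mean(d)
s <- sd(d)

# Proportion of observations between the two green lines (m - s and m + s)
within_one_sd <- mean(d >= m - s & d <= m + s)
within_one_sd
```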