diff --git a/DIRECTORY.md b/DIRECTORY.md index 0a0560e..79398d5 100644 --- a/DIRECTORY.md +++ b/DIRECTORY.md @@ -2,6 +2,10 @@ ## Association Algorithms * [Apriori](https://github.com/TheAlgorithms/R/blob/HEAD/association_algorithms/apriori.r) +## Biomedical + * [Mann Whitney U Test](https://github.com/TheAlgorithms/R/blob/HEAD/biomedical/mann_whitney_u_test.r) + * [Wilcoxon Signed Rank Test](https://github.com/TheAlgorithms/R/blob/HEAD/biomedical/wilcoxon_signed_rank_test.r) + ## Classification Algorithms * [Decision Tree](https://github.com/TheAlgorithms/R/blob/HEAD/classification_algorithms/decision_tree.r) * [Gradient Boosting Algorithms](https://github.com/TheAlgorithms/R/blob/HEAD/classification_algorithms/gradient_boosting_algorithms.r) @@ -61,6 +65,7 @@ ## Machine Learning * [Gradient Boosting](https://github.com/TheAlgorithms/R/blob/HEAD/machine_learning/gradient_boosting.r) + ## Mathematics * [Amicable Numbers](https://github.com/TheAlgorithms/R/blob/HEAD/mathematics/amicable_numbers.r) * [Armstrong Number](https://github.com/TheAlgorithms/R/blob/HEAD/mathematics/armstrong_number.r) diff --git a/biomedical/README.md b/biomedical/README.md new file mode 100644 index 0000000..41ce751 --- /dev/null +++ b/biomedical/README.md @@ -0,0 +1,226 @@ +# R for Biomedical Data Analysis + +This directory contains R implementations of essential statistical tests commonly used in biomedical research and clinical studies. These non-parametric tests are fundamental tools for biomedical students and researchers working with medical data. + +## 📋 Contents + +- [`wilcoxon_signed_rank_test.r`](wilcoxon_signed_rank_test.r) - Wilcoxon Signed-Rank Test for paired samples +- [`mann_whitney_u_test.r`](mann_whitney_u_test.r) - Mann-Whitney U Test for independent samples +- [`README.md`](README.md) - This documentation file + +## 🏥 Why These Tests Are Essential for Biomedical Students + +### 1. **Real-World Medical Data Challenges** +Medical data rarely follows normal distributions due to: +- **Skewed distributions**: Lab values, reaction times, survival data +- **Outliers**: Extreme measurements that are medically significant +- **Small sample sizes**: Pilot studies, rare diseases, expensive procedures +- **Ordinal scales**: Pain scores, quality of life indices, severity ratings + +### 2. **Robust and Assumption-Free** +Non-parametric tests like Wilcoxon and Mann-Whitney: +- **No normality assumption**: Work with any distribution shape +- **Outlier resistant**: Not affected by extreme values +- **Rank-based**: Focus on relative ordering rather than exact values +- **Clinically meaningful**: Often more relevant for medical decisions + +### 3. **Common Applications in Medicine** +- **Clinical trials**: Comparing treatment effectiveness +- **Before/after studies**: Evaluating intervention outcomes +- **Biomarker research**: Comparing levels between groups +- **Quality of life studies**: Analyzing patient-reported outcomes +- **Diagnostic accuracy**: Comparing test performance + +## 📊 Statistical Tests Overview + +### Wilcoxon Signed-Rank Test +**Purpose**: Compare paired samples or test single sample against a median + +**When to use**: +- Before/after treatment comparisons +- Matched pairs (e.g., twins, matched controls) +- Repeated measurements on same subjects +- Testing if median differs from a specific value + +**Medical Examples**: +- Blood pressure before vs after medication +- Pain scores before vs after treatment +- Weight loss in diet studies +- Biomarker changes over time + +**Assumptions**: +- Paired observations or single sample +- Data can be ranked +- Differences are symmetrically distributed around the median + +### Mann-Whitney U Test (Wilcoxon Rank-Sum Test) +**Purpose**: Compare two independent groups + +**When to use**: +- Comparing treatment vs control groups +- Comparing different populations +- When groups are independent (not paired) +- Alternative to two-sample t-test + +**Medical Examples**: +- Drug efficacy: treatment vs placebo +- Gender differences in biomarkers +- Disease severity between stages +- Age-related immune responses +- Comparing diagnostic methods + +**Assumptions**: +- Two independent samples +- Data can be ranked +- Similar distribution shapes (for location comparison) + +## 🚀 Quick Start Guide + +### Installation +No additional packages required! Just R base installation. + +```r +# Clone or download the files to your R working directory +# Source the functions +source("wilcoxon_signed_rank_test.r") +source("mann_whitney_u_test.r") +``` + +### Basic Usage + +#### Wilcoxon Signed-Rank Test +```r +# Example: Blood pressure before and after treatment +before <- c(145, 150, 138, 155, 142, 148, 152, 140) +after <- c(138, 142, 135, 148, 136, 140, 145, 133) + +# Perform the test +result <- wilcoxon_signed_rank_test(before, after, alternative = "greater") +print(result) +``` + +#### Mann-Whitney U Test +```r +# Example: Comparing treatment vs control groups +treatment <- c(78, 85, 92, 73, 88, 91, 76, 83) +control <- c(65, 58, 71, 62, 69, 54, 67, 60) + +# Perform the test +result <- mann_whitney_u_test(treatment, control, alternative = "greater") +print(result) +``` + +## 🔬 Detailed Examples with Biomedical Data + +### Running the Examples +Each R file contains comprehensive examples with dummy biomedical data: + +```r +# Run Wilcoxon examples +source("wilcoxon_signed_rank_test.r") +run_biomedical_examples() + +# Run Mann-Whitney examples +source("mann_whitney_u_test.r") +run_biomedical_examples() +``` + +## 📈 Understanding the Results + +### Key Output Elements + +#### For Wilcoxon Signed-Rank Test: +- **W+**: Sum of ranks for positive differences +- **W-**: Sum of ranks for negative differences +- **Test statistic W**: Usually min(W+, W-) +- **P-value**: Probability of observing the result by chance +- **Effect size**: Magnitude of the difference + +#### For Mann-Whitney U Test: +- **U1, U2**: U statistics for each group +- **W1, W2**: Sum of ranks for each group +- **Test statistic U**: Depends on alternative hypothesis +- **P-value**: Probability of observing the result by chance +- **Effect size estimate**: Magnitude of group difference + +### Interpreting P-values in Medical Context +- **p < 0.001**: Highly significant - very strong evidence +- **p < 0.01**: Very significant - strong evidence +- **p < 0.05**: Significant - moderate evidence +- **p ≥ 0.05**: Not significant - insufficient evidence + +**Important**: Always consider clinical significance alongside statistical significance! + +## 🎯 Choosing the Right Test + +| Scenario | Test | Example | +|----------|------|---------| +| Same subjects, before/after | Wilcoxon Signed-Rank | Pre/post treatment blood pressure | +| Paired subjects | Wilcoxon Signed-Rank | Twins, matched controls | +| Single sample vs reference | Wilcoxon Signed-Rank | Patient cholesterol vs normal (200) | +| Two independent groups | Mann-Whitney U | Treatment vs control groups | +| Gender/age comparisons | Mann-Whitney U | Male vs female biomarker levels | +| Disease stage comparison | Mann-Whitney U | Stage I vs Stage II severity | + +## 🧮 Statistical Theory (Simplified) + +### Wilcoxon Signed-Rank Test +1. **Calculate differences** between paired observations +2. **Rank the absolute differences** (ignore zeros) +3. **Sum ranks** for positive and negative differences separately +4. **Test statistic** is typically the smaller sum +5. **Compare** to expected distribution under null hypothesis + +### Mann-Whitney U Test +1. **Combine all observations** from both groups +2. **Rank all values** from smallest to largest +3. **Sum ranks** for each group separately +4. **Calculate U statistics** using rank sums +5. **Compare** to expected distribution under null hypothesis + +## 🔧 Advanced Features + +### Alternative Hypotheses +- **"two.sided"**: Groups/conditions are different (default) +- **"greater"**: First group/condition is greater than second +- **"less"**: First group/condition is less than second + +### Handling Ties +Both implementations use average ranks for tied values, which is the standard approach. + +### Effect Size +- **Wilcoxon**: Based on Z-score and sample size +- **Mann-Whitney**: Based on U statistic relative to maximum possible + +## 📚 Further Reading + +### Recommended Resources for Biomedical Students +1. **"Biostatistics: A Foundation for Analysis in the Health Sciences"** - Wayne Daniel +2. **"Statistical Methods in Medical Research"** - Armitage, Berry & Matthews +3. **"Nonparametric Statistical Methods"** - Hollander, Wolfe & Chicken +4. **"Medical Statistics from Scratch"** - David Bowers + +### Online Resources +- [Laerd Statistics](https://statistics.laerd.com/) - Excellent step-by-step guides +- [StatsDirect](https://www.statsdirect.com/) - Comprehensive statistical reference +- [BMJ Statistics Notes](https://www.bmj.com/about-bmj/resources-readers/publications/statistics-square-one) - Medical statistics primer + +## ⚠️ Important Considerations + +### When NOT to Use These Tests +- **Large samples with normal data**: t-tests might be more powerful +- **Survival data**: Use specialized survival analysis methods +- **Repeated measures**: Consider mixed-effects models +- **Multiple comparisons**: Adjust p-values appropriately + +### Common Pitfalls +1. **Multiple testing**: Correct for multiple comparisons +2. **Effect size**: Don't ignore practical significance +3. **Sample size**: Very small samples need exact methods +4. **Assumptions**: Ensure data can be meaningfully ranked + +### Data Quality Checks +- Check for outliers and data entry errors +- Verify assumptions are met +- Consider the clinical context +- Validate results with domain experts diff --git a/biomedical/mann_whitney_u_test.r b/biomedical/mann_whitney_u_test.r new file mode 100644 index 0000000..b42ccc9 --- /dev/null +++ b/biomedical/mann_whitney_u_test.r @@ -0,0 +1,368 @@ +# Mann-Whitney U Test (Wilcoxon Rank-Sum Test) for Biomedical Data Analysis +# Author: Contributor +# Description: Implementation of Mann-Whitney U test for comparing two independent groups +# Common applications: Comparing treatment vs control groups, comparing different populations + +mann_whitney_u_test <- function(x, y, alternative = "two.sided", exact = FALSE) { + #' Mann-Whitney U Test (Wilcoxon Rank-Sum Test) + #' + #' Performs the Mann-Whitney U test to compare two independent samples + #' + #' @param x numeric vector of data values for group 1 + #' @param y numeric vector of data values for group 2 + #' @param alternative character, alternative hypothesis ("two.sided", "less", "greater") + #' @param exact logical, whether to compute exact p-values for small samples + #' + #' @return list containing: + #' - statistic: test statistic (U) + #' - p_value: p-value of the test + #' - U1: U statistic for group 1 + #' - U2: U statistic for group 2 + #' - W1: sum of ranks for group 1 + #' - W2: sum of ranks for group 2 + #' - n1: sample size of group 1 + #' - n2: sample size of group 2 + #' - method: description of test performed + #' - alternative: alternative hypothesis + #' - data_summary: summary of input data + + # Input validation + if (!is.numeric(x) || !is.numeric(y)) { + stop("Both x and y must be numeric vectors") + } + + if (length(x) == 0 || length(y) == 0) { + stop("Both groups must contain at least one observation") + } + + if (!alternative %in% c("two.sided", "less", "greater")) { + stop("alternative must be 'two.sided', 'less', or 'greater'") + } + + # Remove missing values + x <- x[!is.na(x)] + y <- y[!is.na(y)] + + n1 <- length(x) + n2 <- length(y) + + if (n1 == 0 || n2 == 0) { + stop("Groups cannot be empty after removing missing values") + } + + # Combine data and calculate ranks + combined <- c(x, y) + N <- n1 + n2 + ranks <- rank(combined, ties.method = "average") + + # Sum of ranks for each group + W1 <- sum(ranks[1:n1]) # Sum of ranks for group 1 + W2 <- sum(ranks[(n1 + 1):N]) # Sum of ranks for group 2 + + # Calculate U statistics + U1 <- W1 - n1 * (n1 + 1) / 2 + U2 <- W2 - n2 * (n2 + 1) / 2 + + # Verify: U1 + U2 should equal n1 * n2 + if (abs(U1 + U2 - n1 * n2) > 1e-10) { + warning("U statistics calculation may have numerical errors") + } + + # Test statistic (traditionally the smaller U, but depends on alternative) + if (alternative == "greater") { + # Testing if group 1 > group 2, so we want U1 + U_stat <- U1 + } else if (alternative == "less") { + # Testing if group 1 < group 2, so we want U1 (smaller values) + U_stat <- U1 + } else { + # Two-sided: use the smaller U for traditional reporting + U_stat <- min(U1, U2) + } + + # Calculate p-value + if (n1 >= 8 && n2 >= 8 && !exact) { + # Normal approximation for large samples + mu_U <- n1 * n2 / 2 + + # Check for ties and adjust variance if necessary + ties <- table(combined) + tie_correction <- sum(ties^3 - ties) / (N * (N - 1)) + + var_U <- n1 * n2 * (N + 1) / 12 - n1 * n2 * tie_correction / (12 * (N - 1)) + sigma_U <- sqrt(var_U) + + # Apply continuity correction + if (alternative == "two.sided") { + z <- (abs(U1 - mu_U) - 0.5) / sigma_U + p_value <- 2 * pnorm(z, lower.tail = FALSE) + } else if (alternative == "less") { + z <- (U1 + 0.5 - mu_U) / sigma_U + p_value <- pnorm(z) + } else { # greater + z <- (U1 - 0.5 - mu_U) / sigma_U + p_value <- pnorm(z, lower.tail = FALSE) + } + + method <- "Mann-Whitney U test with normal approximation" + + } else { + # For small samples or when exact is requested + if (exact && n1 <= 20 && n2 <= 20) { + # Note: Exact calculation would require generating all possible rank combinations + # This is computationally intensive and typically done using lookup tables + p_value <- NA + method <- "Mann-Whitney U test (exact method - requires lookup tables)" + warning("Exact p-value calculation requires specialized tables. Using NA.") + } else { + p_value <- NA + method <- "Mann-Whitney U test" + warning("Sample sizes are small. Consider using exact tables for p-value calculation.") + } + } + + # Prepare data summary + data_summary <- list( + n1 = n1, + n2 = n2, + median_x = median(x), + mean_x = mean(x), + sd_x = sd(x), + median_y = median(y), + mean_y = mean(y), + sd_y = sd(y), + combined_median = median(combined), + combined_mean = mean(combined) + ) + + # Return results + result <- list( + statistic = U_stat, + p_value = p_value, + U1 = U1, + U2 = U2, + W1 = W1, + W2 = W2, + n1 = n1, + n2 = n2, + method = method, + alternative = alternative, + data_summary = data_summary + ) + + class(result) <- "mann_whitney_test" + return(result) +} + +# Print method for mann_whitney_test objects +print.mann_whitney_test <- function(x, ...) { + cat("\n", x$method, "\n") + cat("Data summary:\n") + cat(" Group 1 (x): n =", x$n1, ", median =", round(x$data_summary$median_x, 3), + ", mean =", round(x$data_summary$mean_x, 3), ", SD =", round(x$data_summary$sd_x, 3), "\n") + cat(" Group 2 (y): n =", x$n2, ", median =", round(x$data_summary$median_y, 3), + ", mean =", round(x$data_summary$mean_y, 3), ", SD =", round(x$data_summary$sd_y, 3), "\n") + + cat("\nTest results:\n") + cat(" W1 (sum of ranks, group 1) =", x$W1, "\n") + cat(" W2 (sum of ranks, group 2) =", x$W2, "\n") + cat(" U1 (U statistic, group 1) =", x$U1, "\n") + cat(" U2 (U statistic, group 2) =", x$U2, "\n") + cat(" Test statistic U =", x$statistic, "\n") + cat(" Alternative hypothesis:", x$alternative, "\n") + + if (!is.na(x$p_value)) { + cat(" P-value =", round(x$p_value, 6), "\n") + + # Interpretation + if (x$p_value < 0.001) { + significance <- "highly significant (p < 0.001)" + } else if (x$p_value < 0.01) { + significance <- "very significant (p < 0.01)" + } else if (x$p_value < 0.05) { + significance <- "significant (p < 0.05)" + } else { + significance <- "not significant (p >= 0.05)" + } + cat(" Result:", significance, "\n") + } else { + cat(" P-value: Not calculated (exact method required)\n") + } + + # Effect size estimate + cat(" Effect size estimate (r) =", round(abs(x$U1 - x$n1 * x$n2 / 2) / sqrt(x$n1 * x$n2 * (x$n1 + x$n2 + 1) / 12), 3), "\n") +} + +# ============================================================================== +# EXAMPLES WITH BIOMEDICAL DUMMY DATA +# ============================================================================== + +run_biomedical_examples <- function() { + cat("=================================================================\n") + cat("MANN-WHITNEY U TEST - BIOMEDICAL EXAMPLES\n") + cat("=================================================================\n\n") + + # Example 1: Treatment vs Control group - Drug efficacy + cat("EXAMPLE 1: Drug Efficacy Study (Treatment vs Control)\n") + cat("-----------------------------------------------------------------\n") + + # Dummy data: Improvement scores (0-100) + set.seed(123) + treatment_group <- c(78, 85, 92, 73, 88, 91, 76, 83, 89, 95, + 80, 87, 94, 71, 86, 93, 79, 84, 90, 77) + + control_group <- c(65, 58, 71, 62, 69, 54, 67, 60, 73, 56, + 63, 70, 59, 66, 52, 68, 61, 74, 57, 64) + + cat("Treatment group scores (n=20):", treatment_group, "\n") + cat("Control group scores (n=20):", control_group, "\n\n") + + result1 <- mann_whitney_u_test(treatment_group, control_group, + alternative = "greater") + print(result1) + + cat("\nClinical Interpretation:\n") + cat("This test compares the effectiveness of a new drug vs placebo.\n") + cat("H0: No difference between treatment and control groups\n") + cat("H1: Treatment group shows greater improvement than control\n\n") + + # Example 2: Male vs Female biomarker levels + cat("\n=================================================================\n") + cat("EXAMPLE 2: Biomarker Analysis by Gender (Male vs Female)\n") + cat("-----------------------------------------------------------------\n") + + # Dummy data: Protein biomarker levels (ng/mL) + set.seed(456) + male_levels <- c(12.3, 15.7, 11.8, 14.2, 16.1, 13.5, 12.9, 15.3, 14.8, 13.1, + 11.6, 16.4, 12.7, 14.9, 13.8, 15.2, 12.1, 14.6, 13.3, 15.9) + + female_levels <- c(9.8, 10.5, 8.9, 11.2, 9.3, 10.8, 8.7, 10.1, 9.6, 11.5, + 8.4, 10.9, 9.1, 10.3, 8.8, 11.1, 9.4, 10.6, 8.6, 10.2) + + cat("Male biomarker levels (ng/mL):", round(male_levels, 1), "\n") + cat("Female biomarker levels (ng/mL):", round(female_levels, 1), "\n\n") + + result2 <- mann_whitney_u_test(male_levels, female_levels, + alternative = "two.sided") + print(result2) + + cat("\nClinical Interpretation:\n") + cat("This test examines gender differences in biomarker expression.\n") + cat("H0: No difference in biomarker levels between males and females\n") + cat("H1: There is a difference in biomarker levels between genders\n\n") + + # Example 3: Disease severity - Stage I vs Stage II + cat("\n=================================================================\n") + cat("EXAMPLE 3: Disease Severity (Stage I vs Stage II)\n") + cat("-----------------------------------------------------------------\n") + + # Dummy data: Disease severity scores + set.seed(789) + stage_1 <- c(18, 22, 15, 20, 25, 17, 19, 23, 16, 21, 24, 14, 18, 22, 19) + stage_2 <- c(35, 42, 38, 29, 45, 33, 40, 37, 31, 44, 36, 39, 32, 43, 34, 41, 28, 46, 30) + + cat("Stage I severity scores (n=15):", stage_1, "\n") + cat("Stage II severity scores (n=19):", stage_2, "\n\n") + + result3 <- mann_whitney_u_test(stage_1, stage_2, alternative = "less") + print(result3) + + cat("\nClinical Interpretation:\n") + cat("This test compares disease severity between different stages.\n") + cat("H0: No difference in severity between Stage I and Stage II\n") + cat("H1: Stage I has lower severity scores than Stage II\n\n") + + # Example 4: Age groups - Young vs Elderly immune response + cat("\n=================================================================\n") + cat("EXAMPLE 4: Immune Response by Age Group (Young vs Elderly)\n") + cat("-----------------------------------------------------------------\n") + + # Dummy data: Antibody titers (IU/mL) + set.seed(321) + young_adults <- c(450, 520, 380, 490, 560, 420, 480, 510, 440, 500, + 470, 530, 410, 485, 525, 395, 475, 515, 435, 495) + + elderly <- c(280, 320, 250, 310, 290, 270, 300, 260, 330, 285, + 275, 315, 255, 305, 295, 265, 325, 245, 290, 280, 310, 270) + + cat("Young adults antibody titers (IU/mL):", young_adults, "\n") + cat("Elderly antibody titers (IU/mL):", elderly, "\n\n") + + result4 <- mann_whitney_u_test(young_adults, elderly, alternative = "greater") + print(result4) + + cat("\nClinical Interpretation:\n") + cat("This test compares immune response between age groups.\n") + cat("H0: No difference in antibody titers between young and elderly\n") + cat("H1: Young adults have higher antibody titers than elderly\n\n") + + # Example 5: Pre-diabetic vs Diabetic glucose levels + cat("\n=================================================================\n") + cat("EXAMPLE 5: Glucose Levels (Pre-diabetic vs Diabetic)\n") + cat("-----------------------------------------------------------------\n") + + # Dummy data: Fasting glucose levels (mg/dL) + set.seed(654) + prediabetic <- c(108, 115, 102, 118, 125, 110, 112, 120, 105, 117, + 122, 106, 114, 119, 103, 116, 123, 107, 113, 121) + + diabetic <- c(145, 165, 138, 172, 156, 149, 183, 142, 168, 151, + 178, 135, 161, 154, 175, 140, 164, 158, 181, 147, 170, 153) + + cat("Pre-diabetic glucose levels (mg/dL):", prediabetic, "\n") + cat("Diabetic glucose levels (mg/dL):", diabetic, "\n\n") + + result5 <- mann_whitney_u_test(prediabetic, diabetic, alternative = "less") + print(result5) + + cat("\nClinical Interpretation:\n") + cat("This test compares glucose levels between pre-diabetic and diabetic patients.\n") + cat("H0: No difference in glucose levels between groups\n") + cat("H1: Pre-diabetic patients have lower glucose levels than diabetic patients\n\n") + + cat("=================================================================\n") + cat("END OF EXAMPLES\n") + cat("=================================================================\n") +} + +# Helper function to compare with built-in R function (if available) +compare_with_r_builtin <- function() { + cat("\n=================================================================\n") + cat("COMPARISON WITH R's BUILT-IN wilcox.test() FUNCTION\n") + cat("=================================================================\n\n") + + # Generate sample data + set.seed(123) + group1 <- rnorm(15, mean = 50, sd = 10) + group2 <- rnorm(18, mean = 45, sd = 12) + + cat("Comparison using sample data:\n") + cat("Group 1:", round(group1, 2), "\n") + cat("Group 2:", round(group2, 2), "\n\n") + + # Our implementation + cat("Our implementation:\n") + our_result <- mann_whitney_u_test(group1, group2, alternative = "two.sided") + print(our_result) + + # R's built-in function + cat("\nR's built-in wilcox.test():\n") + r_result <- wilcox.test(group1, group2, alternative = "two.sided", exact = FALSE) + print(r_result) + + cat("\nNote: Small differences may occur due to different tie-handling methods\n") + cat("and continuity corrections, but results should be very similar.\n") +} + +# Examples are available but not run automatically to avoid side effects +# To run examples, execute: run_biomedical_examples() +# To compare with R built-in, execute: compare_with_r_builtin() +if (interactive()) { + cat("Loading Mann-Whitney U Test implementation...\n") + cat("Run 'run_biomedical_examples()' to see biomedical examples.\n") + cat("Run 'compare_with_r_builtin()' to compare with R's wilcox.test().\n") +} + +# Uncomment the following lines to run examples automatically: +# run_biomedical_examples() +# compare_with_r_builtin() \ No newline at end of file diff --git a/biomedical/wilcoxon_signed_rank_test.r b/biomedical/wilcoxon_signed_rank_test.r new file mode 100644 index 0000000..7c35954 --- /dev/null +++ b/biomedical/wilcoxon_signed_rank_test.r @@ -0,0 +1,317 @@ +# Wilcoxon Signed-Rank Test for Biomedical Data Analysis +# Author: Contributor +# Description: Implementation of Wilcoxon signed-rank test for paired samples +# Common applications: Before/after treatment comparisons, paired clinical measurements + +wilcoxon_signed_rank_test <- function(x, y = NULL, paired = TRUE, alternative = "two.sided", mu = 0) { + #' Wilcoxon Signed-Rank Test + #' + #' Performs the Wilcoxon signed-rank test on paired samples or single sample against a median + #' + #' @param x numeric vector of data values (first group or single sample) + #' @param y numeric vector of data values (second group for paired test), default NULL + #' @param paired logical, whether the test is for paired samples, default TRUE + #' @param alternative character, alternative hypothesis ("two.sided", "less", "greater") + #' @param mu numeric, hypothesized median for single sample test, default 0 + #' + #' @return list containing: + #' - statistic: test statistic (W) + #' - p_value: p-value of the test + #' - W_plus: sum of positive ranks + #' - W_minus: sum of negative ranks + #' - n: effective sample size (after removing zeros) + #' - method: description of test performed + #' - alternative: alternative hypothesis + #' - data_summary: summary of input data + + # Input validation + if (!is.numeric(x)) { + stop("x must be a numeric vector") + } + + if (!is.null(y) && !is.numeric(y)) { + stop("y must be a numeric vector or NULL") + } + + if (!alternative %in% c("two.sided", "less", "greater")) { + stop("alternative must be 'two.sided', 'less', or 'greater'") + } + + # Remove NA values early + if (is.null(y)) { + # One-sample test against median mu + x <- x[!is.na(x)] + if (length(x) == 0) { + stop("No non-missing observations in x") + } + differences <- x - mu + test_type <- "One-sample Wilcoxon signed-rank test" + } else { + # Paired samples test - remove NA pairs + if (length(x) != length(y)) { + stop("x and y must have the same length for paired test") + } + idx <- complete.cases(x, y) + if (sum(idx) == 0) { + stop("No complete cases found") + } + x <- x[idx] + y <- y[idx] + differences <- x - y + test_type <- "Paired samples Wilcoxon signed-rank test" + } + + # Remove zeros (ties at the hypothesized median) + non_zero_diff <- differences[differences != 0] + n <- length(non_zero_diff) + + if (n == 0) { + stop("No non-zero differences found. All values are tied at the hypothesized median.") + } + + # Calculate ranks of absolute differences + abs_diff <- abs(non_zero_diff) + ranks <- rank(abs_diff, ties.method = "average") + + # Sum of ranks for positive and negative differences + W_plus <- sum(ranks[non_zero_diff > 0]) + W_minus <- sum(ranks[non_zero_diff < 0]) + + # Test statistic selection based on alternative hypothesis + # For one-sided tests, always use W_plus as the test statistic + if (alternative == "greater" || alternative == "less") { + W <- W_plus + } else { + # For two-sided, use the smaller of the two (traditional approach) + W <- min(W_plus, W_minus) + } + + # Calculate p-value using normal approximation for large samples (n >= 10) + if (n >= 10) { + # Mean of W_plus under null hypothesis + mu_W <- n * (n + 1) / 4 + + # Variance with tie correction + tie_counts <- table(abs_diff) + tie_correction <- sum(tie_counts^3 - tie_counts) + var_W <- (n * (n + 1) * (2 * n + 1) - tie_correction) / 24 + sigma_W <- sqrt(var_W) + + # Apply continuity correction and compute p-values + if (alternative == "two.sided") { + z <- (abs(W - mu_W) - 0.5) / sigma_W + p_value <- 2 * pnorm(z, lower.tail = FALSE) + } else if (alternative == "less") { + # For 'less': test if W_plus is unusually small + z <- (W_plus + 0.5 - mu_W) / sigma_W + p_value <- pnorm(z, lower.tail = TRUE) + } else { # greater + # For 'greater': test if W_plus is unusually large + z <- (W_plus - 0.5 - mu_W) / sigma_W + p_value <- pnorm(z, lower.tail = FALSE) + } + } else { + # For small samples, exact p-values would require lookup tables + p_value <- NA + warning("Sample size is small (n < 10). P-value calculation requires exact tables.") + } + + # Prepare data summary + if (is.null(y)) { + data_summary <- list( + sample_size = length(x), + median_x = median(x), + mean_x = mean(x), + hypothesized_median = mu + ) + } else { + data_summary <- list( + sample_size = length(x), + median_x = median(x), + mean_x = mean(x), + median_y = median(y), + mean_y = mean(y), + median_difference = median(differences), + mean_difference = mean(differences) + ) + } + + # Return results + result <- list( + statistic = W, + p_value = p_value, + W_plus = W_plus, + W_minus = W_minus, + n = n, + method = test_type, + alternative = alternative, + data_summary = data_summary + ) + + class(result) <- "wilcoxon_test" + return(result) +} + +# Print method for wilcoxon_test objects +print.wilcoxon_test <- function(x, ...) { + cat("\n", x$method, "\n") + cat("Data summary:\n") + if (!is.null(x$data_summary$median_y)) { + cat(" Group 1 (x): median =", round(x$data_summary$median_x, 3), + ", mean =", round(x$data_summary$mean_x, 3), "\n") + cat(" Group 2 (y): median =", round(x$data_summary$median_y, 3), + ", mean =", round(x$data_summary$mean_y, 3), "\n") + cat(" Differences: median =", round(x$data_summary$median_difference, 3), + ", mean =", round(x$data_summary$mean_difference, 3), "\n") + } else { + cat(" Sample: median =", round(x$data_summary$median_x, 3), + ", mean =", round(x$data_summary$mean_x, 3), "\n") + cat(" Hypothesized median =", x$data_summary$hypothesized_median, "\n") + } + + cat("\nTest results:\n") + cat(" W+ (sum of positive ranks) =", x$W_plus, "\n") + cat(" W- (sum of negative ranks) =", x$W_minus, "\n") + cat(" Test statistic W =", x$statistic, "\n") + cat(" Effective sample size =", x$n, "\n") + cat(" Alternative hypothesis:", x$alternative, "\n") + + if (!is.na(x$p_value)) { + cat(" P-value =", round(x$p_value, 6), "\n") + + # Interpretation + if (x$p_value < 0.001) { + significance <- "highly significant (p < 0.001)" + } else if (x$p_value < 0.01) { + significance <- "very significant (p < 0.01)" + } else if (x$p_value < 0.05) { + significance <- "significant (p < 0.05)" + } else { + significance <- "not significant (p >= 0.05)" + } + cat(" Result:", significance, "\n") + } else { + cat(" P-value: Not calculated (small sample size)\n") + } +} + +# ============================================================================== +# EXAMPLE WITH BIOMEDICAL DUMMY DATA +# ============================================================================== + +run_biomedical_examples <- function() { + cat("=================================================================\n") + cat("WILCOXON SIGNED-RANK TEST - BIOMEDICAL EXAMPLES\n") + cat("=================================================================\n\n") + + # Example 1: Blood pressure before and after treatment + cat("EXAMPLE 1: Blood Pressure Analysis (Before vs After Treatment)\n") + cat("-----------------------------------------------------------------\n") + + # Dummy data: Systolic blood pressure (mmHg) + set.seed(123) # For reproducible results + before_treatment <- c(145, 150, 138, 155, 142, 148, 152, 140, 146, 149, + 143, 151, 147, 153, 141, 144, 156, 139, 145, 150) + + after_treatment <- c(138, 142, 135, 148, 136, 140, 145, 133, 139, 143, + 137, 144, 141, 146, 134, 138, 149, 132, 140, 144) + + cat("Before treatment (n=20):", before_treatment, "\n") + cat("After treatment (n=20):", after_treatment, "\n\n") + + # Perform the test + result1 <- wilcoxon_signed_rank_test(before_treatment, after_treatment, + alternative = "greater") + print(result1) + + cat("\nClinical Interpretation:\n") + cat("This test examines whether the treatment significantly reduces blood pressure.\n") + cat("H0: The treatment has no effect (median difference = 0)\n") + cat("H1: The treatment reduces blood pressure (before > after)\n\n") + + # Example 2: Pain scores before and after medication + cat("\n=================================================================\n") + cat("EXAMPLE 2: Pain Score Analysis (Before vs After Medication)\n") + cat("-----------------------------------------------------------------\n") + + # Dummy data: Pain scores on a scale of 0-10 + set.seed(456) + pain_before <- c(8, 7, 9, 6, 8, 7, 9, 8, 7, 6, 8, 9, 7, 8, 6, 9, 7, 8, 6, 7) + pain_after <- c(4, 3, 5, 2, 4, 3, 5, 4, 3, 2, 4, 5, 3, 4, 2, 5, 3, 4, 2, 3) + + cat("Pain before medication (0-10 scale):", pain_before, "\n") + cat("Pain after medication (0-10 scale):", pain_after, "\n\n") + + result2 <- wilcoxon_signed_rank_test(pain_before, pain_after, + alternative = "greater") + print(result2) + + cat("\nClinical Interpretation:\n") + cat("This test evaluates the effectiveness of pain medication.\n") + cat("H0: The medication has no effect on pain levels\n") + cat("H1: The medication reduces pain levels (before > after)\n\n") + + # Example 3: Weight loss study + cat("\n=================================================================\n") + cat("EXAMPLE 3: Weight Loss Study (Before vs After Diet Program)\n") + cat("-----------------------------------------------------------------\n") + + # Dummy data: Weight in kg + set.seed(789) + weight_before <- c(85.2, 92.1, 78.5, 88.9, 95.3, 81.7, 86.4, 90.8, 83.2, 87.6, + 91.3, 84.1, 89.7, 82.5, 94.2, 80.9, 88.1, 86.7, 93.4, 85.8) + + weight_after <- c(82.1, 89.3, 76.2, 85.7, 91.8, 79.4, 83.9, 87.5, 80.8, 84.3, + 88.1, 81.6, 86.4, 80.1, 90.7, 78.5, 85.2, 83.9, 89.8, 82.9) + + cat("Weight before program (kg):", round(weight_before, 1), "\n") + cat("Weight after program (kg):", round(weight_after, 1), "\n\n") + + result3 <- wilcoxon_signed_rank_test(weight_before, weight_after, + alternative = "greater") + print(result3) + + cat("\nClinical Interpretation:\n") + cat("This test assesses the effectiveness of a weight loss program.\n") + cat("H0: The program has no effect on weight\n") + cat("H1: The program reduces weight (before > after)\n\n") + + # Example 4: One-sample test - cholesterol levels vs normal + cat("\n=================================================================\n") + cat("EXAMPLE 4: Cholesterol Levels vs Normal Range (One-sample test)\n") + cat("-----------------------------------------------------------------\n") + + # Dummy data: Total cholesterol levels (mg/dL) + # Normal level is considered to be 200 mg/dL + set.seed(321) + cholesterol_levels <- c(215, 230, 195, 220, 240, 185, 225, 210, 235, 205, + 245, 190, 228, 218, 232, 198, 242, 208, 226, 214) + + normal_cholesterol <- 200 # mg/dL + + cat("Cholesterol levels (mg/dL):", cholesterol_levels, "\n") + cat("Normal level (reference):", normal_cholesterol, "mg/dL\n\n") + + result4 <- wilcoxon_signed_rank_test(cholesterol_levels, mu = normal_cholesterol, + alternative = "greater") + print(result4) + + cat("\nClinical Interpretation:\n") + cat("This test determines if the population has elevated cholesterol levels.\n") + cat("H0: Median cholesterol level = 200 mg/dL (normal)\n") + cat("H1: Median cholesterol level > 200 mg/dL (elevated)\n\n") + + cat("=================================================================\n") + cat("END OF EXAMPLES\n") + cat("=================================================================\n") +} + +# Examples are available but not run automatically to avoid side effects +# To run examples, execute: run_biomedical_examples() +if (interactive()) { + cat("Loading Wilcoxon Signed-Rank Test implementation...\n") + cat("Run 'run_biomedical_examples()' to see biomedical examples.\n") +} + +# Uncomment the following line to run examples automatically: +# run_biomedical_examples() \ No newline at end of file