-
-
Notifications
You must be signed in to change notification settings - Fork 339
Add biomedical statistical tests: Wilcoxon and Mann-Whitney U tests #221
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. Weโll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
Brijesh03032001
wants to merge
1
commit into
TheAlgorithms:master
Choose a base branch
from
Brijesh03032001:add-biomedical-statistical-tests
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,226 @@ | ||
# R for Biomedical Data Analysis | ||
|
||
This directory contains R implementations of essential statistical tests commonly used in biomedical research and clinical studies. These non-parametric tests are fundamental tools for biomedical students and researchers working with medical data. | ||
|
||
## ๐ Contents | ||
|
||
- [`wilcoxon_signed_rank_test.r`](wilcoxon_signed_rank_test.r) - Wilcoxon Signed-Rank Test for paired samples | ||
- [`mann_whitney_u_test.r`](mann_whitney_u_test.r) - Mann-Whitney U Test for independent samples | ||
- [`README.md`](README.md) - This documentation file | ||
|
||
## ๐ฅ Why These Tests Are Essential for Biomedical Students | ||
|
||
### 1. **Real-World Medical Data Challenges** | ||
Medical data rarely follows normal distributions due to: | ||
- **Skewed distributions**: Lab values, reaction times, survival data | ||
- **Outliers**: Extreme measurements that are medically significant | ||
- **Small sample sizes**: Pilot studies, rare diseases, expensive procedures | ||
- **Ordinal scales**: Pain scores, quality of life indices, severity ratings | ||
|
||
### 2. **Robust and Assumption-Free** | ||
Non-parametric tests like Wilcoxon and Mann-Whitney: | ||
- **No normality assumption**: Work with any distribution shape | ||
- **Outlier resistant**: Not affected by extreme values | ||
- **Rank-based**: Focus on relative ordering rather than exact values | ||
- **Clinically meaningful**: Often more relevant for medical decisions | ||
|
||
### 3. **Common Applications in Medicine** | ||
- **Clinical trials**: Comparing treatment effectiveness | ||
- **Before/after studies**: Evaluating intervention outcomes | ||
- **Biomarker research**: Comparing levels between groups | ||
- **Quality of life studies**: Analyzing patient-reported outcomes | ||
- **Diagnostic accuracy**: Comparing test performance | ||
|
||
## ๐ Statistical Tests Overview | ||
|
||
### Wilcoxon Signed-Rank Test | ||
**Purpose**: Compare paired samples or test single sample against a median | ||
|
||
**When to use**: | ||
- Before/after treatment comparisons | ||
- Matched pairs (e.g., twins, matched controls) | ||
- Repeated measurements on same subjects | ||
- Testing if median differs from a specific value | ||
|
||
**Medical Examples**: | ||
- Blood pressure before vs after medication | ||
- Pain scores before vs after treatment | ||
- Weight loss in diet studies | ||
- Biomarker changes over time | ||
|
||
**Assumptions**: | ||
- Paired observations or single sample | ||
- Data can be ranked | ||
- Differences are symmetrically distributed around the median | ||
|
||
### Mann-Whitney U Test (Wilcoxon Rank-Sum Test) | ||
**Purpose**: Compare two independent groups | ||
|
||
**When to use**: | ||
- Comparing treatment vs control groups | ||
- Comparing different populations | ||
- When groups are independent (not paired) | ||
- Alternative to two-sample t-test | ||
|
||
**Medical Examples**: | ||
- Drug efficacy: treatment vs placebo | ||
- Gender differences in biomarkers | ||
- Disease severity between stages | ||
- Age-related immune responses | ||
- Comparing diagnostic methods | ||
|
||
**Assumptions**: | ||
- Two independent samples | ||
- Data can be ranked | ||
- Similar distribution shapes (for location comparison) | ||
|
||
## ๐ Quick Start Guide | ||
|
||
### Installation | ||
No additional packages required! Just R base installation. | ||
|
||
```r | ||
# Clone or download the files to your R working directory | ||
# Source the functions | ||
source("wilcoxon_signed_rank_test.r") | ||
source("mann_whitney_u_test.r") | ||
``` | ||
|
||
### Basic Usage | ||
|
||
#### Wilcoxon Signed-Rank Test | ||
```r | ||
# Example: Blood pressure before and after treatment | ||
before <- c(145, 150, 138, 155, 142, 148, 152, 140) | ||
after <- c(138, 142, 135, 148, 136, 140, 145, 133) | ||
|
||
# Perform the test | ||
result <- wilcoxon_signed_rank_test(before, after, alternative = "greater") | ||
print(result) | ||
``` | ||
|
||
#### Mann-Whitney U Test | ||
```r | ||
# Example: Comparing treatment vs control groups | ||
treatment <- c(78, 85, 92, 73, 88, 91, 76, 83) | ||
control <- c(65, 58, 71, 62, 69, 54, 67, 60) | ||
|
||
# Perform the test | ||
result <- mann_whitney_u_test(treatment, control, alternative = "greater") | ||
print(result) | ||
``` | ||
|
||
## ๐ฌ Detailed Examples with Biomedical Data | ||
|
||
### Running the Examples | ||
Each R file contains comprehensive examples with dummy biomedical data: | ||
|
||
```r | ||
# Run Wilcoxon examples | ||
source("wilcoxon_signed_rank_test.r") | ||
run_biomedical_examples() | ||
|
||
# Run Mann-Whitney examples | ||
source("mann_whitney_u_test.r") | ||
run_biomedical_examples() | ||
``` | ||
|
||
## ๐ Understanding the Results | ||
|
||
### Key Output Elements | ||
|
||
#### For Wilcoxon Signed-Rank Test: | ||
- **W+**: Sum of ranks for positive differences | ||
- **W-**: Sum of ranks for negative differences | ||
- **Test statistic W**: Usually min(W+, W-) | ||
- **P-value**: Probability of observing the result by chance | ||
- **Effect size**: Magnitude of the difference | ||
|
||
#### For Mann-Whitney U Test: | ||
- **U1, U2**: U statistics for each group | ||
- **W1, W2**: Sum of ranks for each group | ||
- **Test statistic U**: Depends on alternative hypothesis | ||
- **P-value**: Probability of observing the result by chance | ||
- **Effect size estimate**: Magnitude of group difference | ||
|
||
### Interpreting P-values in Medical Context | ||
- **p < 0.001**: Highly significant - very strong evidence | ||
- **p < 0.01**: Very significant - strong evidence | ||
- **p < 0.05**: Significant - moderate evidence | ||
- **p โฅ 0.05**: Not significant - insufficient evidence | ||
|
||
**Important**: Always consider clinical significance alongside statistical significance! | ||
|
||
## ๐ฏ Choosing the Right Test | ||
|
||
| Scenario | Test | Example | | ||
|----------|------|---------| | ||
| Same subjects, before/after | Wilcoxon Signed-Rank | Pre/post treatment blood pressure | | ||
| Paired subjects | Wilcoxon Signed-Rank | Twins, matched controls | | ||
| Single sample vs reference | Wilcoxon Signed-Rank | Patient cholesterol vs normal (200) | | ||
| Two independent groups | Mann-Whitney U | Treatment vs control groups | | ||
| Gender/age comparisons | Mann-Whitney U | Male vs female biomarker levels | | ||
| Disease stage comparison | Mann-Whitney U | Stage I vs Stage II severity | | ||
|
||
## ๐งฎ Statistical Theory (Simplified) | ||
|
||
### Wilcoxon Signed-Rank Test | ||
1. **Calculate differences** between paired observations | ||
2. **Rank the absolute differences** (ignore zeros) | ||
3. **Sum ranks** for positive and negative differences separately | ||
4. **Test statistic** is typically the smaller sum | ||
5. **Compare** to expected distribution under null hypothesis | ||
|
||
### Mann-Whitney U Test | ||
1. **Combine all observations** from both groups | ||
2. **Rank all values** from smallest to largest | ||
3. **Sum ranks** for each group separately | ||
4. **Calculate U statistics** using rank sums | ||
5. **Compare** to expected distribution under null hypothesis | ||
|
||
## ๐ง Advanced Features | ||
|
||
### Alternative Hypotheses | ||
- **"two.sided"**: Groups/conditions are different (default) | ||
- **"greater"**: First group/condition is greater than second | ||
- **"less"**: First group/condition is less than second | ||
|
||
### Handling Ties | ||
Both implementations use average ranks for tied values, which is the standard approach. | ||
|
||
### Effect Size | ||
- **Wilcoxon**: Based on Z-score and sample size | ||
- **Mann-Whitney**: Based on U statistic relative to maximum possible | ||
|
||
## ๐ Further Reading | ||
|
||
### Recommended Resources for Biomedical Students | ||
1. **"Biostatistics: A Foundation for Analysis in the Health Sciences"** - Wayne Daniel | ||
2. **"Statistical Methods in Medical Research"** - Armitage, Berry & Matthews | ||
3. **"Nonparametric Statistical Methods"** - Hollander, Wolfe & Chicken | ||
4. **"Medical Statistics from Scratch"** - David Bowers | ||
|
||
### Online Resources | ||
- [Laerd Statistics](https://statistics.laerd.com/) - Excellent step-by-step guides | ||
- [StatsDirect](https://www.statsdirect.com/) - Comprehensive statistical reference | ||
- [BMJ Statistics Notes](https://www.bmj.com/about-bmj/resources-readers/publications/statistics-square-one) - Medical statistics primer | ||
|
||
## โ ๏ธ Important Considerations | ||
|
||
### When NOT to Use These Tests | ||
- **Large samples with normal data**: t-tests might be more powerful | ||
- **Survival data**: Use specialized survival analysis methods | ||
- **Repeated measures**: Consider mixed-effects models | ||
- **Multiple comparisons**: Adjust p-values appropriately | ||
|
||
### Common Pitfalls | ||
1. **Multiple testing**: Correct for multiple comparisons | ||
2. **Effect size**: Don't ignore practical significance | ||
3. **Sample size**: Very small samples need exact methods | ||
4. **Assumptions**: Ensure data can be meaningfully ranked | ||
|
||
### Data Quality Checks | ||
- Check for outliers and data entry errors | ||
- Verify assumptions are met | ||
- Consider the clinical context | ||
- Validate results with domain experts |
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per repository guidelines, please also update DIRECTORY.md to list these new algorithms in the appropriate section and ensure they are categorized consistently (e.g., statistics/mathematics).
Copilot uses AI. Check for mistakes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please update DIRECTORY.md instead of adding this file