Applications

Showing new listings for Tuesday, 30 December 2025

Total of 16 entries

New submissions (showing 3 of 3 entries)

[1] arXiv:2512.22515 [pdf, other]
Title: Robust Liu-Type Estimation for Multicollinearity in Fuzzy Logistic Regression
Ayad Habib Shemail, Ahmed Razzaq Al-Lami, Amal Hadi Rashid
Subjects: Applications (stat.AP); Methodology (stat.ME)

This article addresses the fuzzy logistic regression model under conditions of multicollinearity, which causes instability and inflated variance in parameter estimation. In this model, both the response variable and parameters are represented as fuzzy triangular numbers. To overcome the multicollinearity problem, various Liu-type estimators were employed: Fuzzy Maximum Likelihood Estimators (FMLE), Fuzzy Logistic Ridge Estimators (FLRE), Fuzzy Logistic Liu Estimators (FLLE), Fuzzy Logistic Liu-type Estimators (FLLTE), and Fuzzy Logistic Liu-type Parameter Estimators (FLLTPE). Through simulations with various sample sizes and application to real fuzzy data on kidney failure, model performance was evaluated using mean square error (MSE) and goodness of fit criteria. Results demonstrated superior performance of FLLTPE and FLLTE compared to other estimators.

[2] arXiv:2512.22892 [pdf, other]
Title: Counterfactual Harm: A Counter-argument
Amit N. Sawant, Mats J. Stensrud
Subjects: Applications (stat.AP)

As AI systems are increasingly used to guide decisions, it is essential that they follow ethical principles. A core principle in medicine is non-maleficence, often equated with "do no harm". A formal definition of harm based on counterfactual reasoning has been proposed and popularized. This notion of harm has been promoted in simple settings with binary treatments and outcomes. Here, we highlight a problem with this definition in settings involving multiple treatment options. Illustrated by an example with three tuberculosis treatments (say, A, B, and C), we demonstrate that the counterfactual definition of harm can produce intransitive results: B is less harmful than A, C is less harmful than B, yet C is more harmful than A when compared pairwise. This intransitivity poses a challenge as it may lead to practical (clinical) decisions that are difficult to justify or defend. In contrast, an interventionist definition of harm based on expected utility forgoes counterfactual comparisons and ensures transitive treatment rankings.

[3] arXiv:2512.23019 [pdf, other]
Title: Reliability Analysis of a 1-out-of-n Cold Standby Redundant System under the Generalized Lindley Distribution
Afshin Yaghoubi, Esmaile Khorram, Omid Naghshineh Arjmand
Subjects: Applications (stat.AP)

Cold standby 1-out-of-n redundant systems are well-established models in system reliability engineering. To date, reliability analyses of such systems have predominantly assumed exponential, Erlang, or Weibull failure distributions for their components. The Lindley distribution and its generalizations represent a significant class of statistical distributions in reliability engineering. Certain generalized Lindley distributions, due to the appealing characteristics of their hazard functions, can serve as suitable alternatives to other well-known lifetime distributions like the Weibull. This study investigates the reliability of a 1-out-of-n cold standby redundant system with perfect and imperfect switching, assuming that the active component failure times follow the Generalized Lindley distribution. We derive a closed-form expression for the system reliability. To achieve this, the distribution of the sum of n independent and identically distributed random variables following the Generalized Lindley distribution is first determined using the moment-generating function approach.
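The quantity at the centre of the abstract above, the survival probability of a sum of n component lifetimes, can be checked numerically. The following is a minimal Monte Carlo sketch, assuming perfect switching and, for simplicity, the ordinary one-parameter Lindley distribution rather than the paper's Generalized Lindley distribution. The function names and parameter values are hypothetical and are not taken from the paper.

```python
import math
import random

def sample_lindley(theta, rng):
    """Draw one lifetime from the one-parameter Lindley(theta) distribution.

    Lindley(theta) is a two-component mixture: Exp(theta) with weight
    theta / (1 + theta) and Gamma(shape=2, rate=theta) with weight 1 / (1 + theta).
    """
    if rng.random() < theta / (1.0 + theta):
        return rng.expovariate(theta)
    # Gamma(2, rate theta) is the sum of two independent Exp(theta) draws.
    return rng.expovariate(theta) + rng.expovariate(theta)

def standby_reliability(t, n, theta, n_sim=20000, seed=0):
    """Monte Carlo estimate of P(T_1 + ... + T_n > t) under perfect switching."""
    rng = random.Random(seed)
    survived = 0
    for _ in range(n_sim):
        # Cold standby: the next unit starts only when the active one fails,
        # so the system lifetime is the sum of the component lifetimes.
        system_life = sum(sample_lindley(theta, rng) for _ in range(n))
        if system_life > t:
            survived += 1
    return survived / n_sim

def lindley_survival(t, theta):
    """Closed-form survival function of a single Lindley(theta) component."""
    return (1.0 + theta + theta * t) / (1.0 + theta) * math.exp(-theta * t)
```

With n = 1 the estimate can be compared against `lindley_survival`; for larger n, a closed form such as the one derived in the paper would take the place of that check.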

Cross submissions (showing 2 of 2 entries)

[4] arXiv:2512.23110 (cross-list from econ.TH) [pdf, html, other]
Title: Assessing the Effects of Macroeconomic Variables on Child Mortality in D-8 Countries Using Panel Data Analysis
M. Waseem Akram, Binita Shahi, M. Javed Akram
Comments: 13 pages, 3 Figures, 4 tables
Subjects: Theoretical Economics (econ.TH); Applications (stat.AP); Other Statistics (stat.OT)

This research analyses the axiomatic link among health expenditures, inflation rate, and gross national income (GNI) per capita concerning the child mortality (CMU5) rate in D-8 nations, employing panel data analysis from 1995 to 2014. Utilising conventional panel unit root tests and linear regression models, we establish that education expenditures, in conjunction with health expenditures, inflation rate, and GNI per capita, display stationarity at level. Additionally, we examine fixed effects and random effects estimators for the pertinent variables, utilising metrics such as the Hausman Test (HT) and comparisons with CCMR correlations. Our data demonstrate that the CMU5 rate in D-8 nations has steadily decreased, according to a somewhat negative linear regression model, therefore slightly undermining the fourth Millennium Development Goal (MDG4) of the World Health Organisation (WHO).

[5] arXiv:2512.23571 (cross-list from stat.ME) [pdf, html, other]
Title: Considering parallel tempering and comparing post-treatment procedures in Bayesian Profile Regression Models for a survival outcome and correlated exposures
Fendler Julie, Guihenneuc Chantal, Ancelet Sophie
Subjects: Methodology (stat.ME); Applications (stat.AP)

Bayesian profile regression mixture models (BPRM) make it possible to assess a health risk in a multi-exposed population. These mixture models cluster individuals according to their exposure profile and their health risk. However, their results, based on Markov chain Monte Carlo (MCMC) algorithms, have turned out to be unstable in several applications. We hypothesize two reasons for this instability: the MCMC algorithm can become trapped in local modes of the posterior distribution, and the choice of post-treatment procedure applied to the MCMC output leads to different clustering structures. In this work, we propose improvements to the MCMC algorithms proposed in previous works in order to avoid the local modes of the posterior distribution while reducing the computation time. We also carry out a simulation study to compare the performance of the MCMC algorithms and of different post-processing procedures in order to provide guidelines on their use. An application in radiation epidemiology is considered.

Replacement submissions (showing 11 of 11 entries)

[6] arXiv:2410.16617 (replaced) [pdf, html, other]
Title: Markov switching zero-inflated space-time multinomial models for comparing multiple infectious diseases
Dirk Douwes-Schultz, Alexandra M. Schmidt, Laís Picinini Freitas, Marilia Sá Carvalho
Comments: The accepted version in Biostatistics
Journal-ref: Markov switching zero-inflated space-time multinomial models for comparing multiple infectious diseases, Biostatistics, Volume 26, Issue 1, 2025, kxaf034
Subjects: Applications (stat.AP)

Univariate zero-inflated models are increasingly being used to account for excess zeros in spatio-temporal infectious disease counts. However, the multivariate case is challenging due to the need to account for correlations across space, time and disease in both the count and zero-inflated components of the model. We are interested in comparing the transmission dynamics of several co-circulating infectious diseases across space and time, where some of the diseases can be absent for long periods. We first assume there is a baseline disease that is well-established and always present in the region. The other diseases switch between periods of presence and absence in each area through a series of coupled Markov chains, which account for long periods of disease absence, disease interactions and disease spread from neighboring areas. Since we are mainly interested in comparing the diseases, we assume the cases of the present diseases in an area jointly follow an autoregressive multinomial model. We use the multinomial model to investigate whether there are associations between certain factors, such as temperature, and differences in the transmission intensity of the diseases. Inference is performed using efficient Bayesian Markov chain Monte Carlo methods based on jointly sampling all unknown presence indicators. We apply the model to spatio-temporal counts of dengue, Zika, and chikungunya cases in Rio de Janeiro, during the first triple epidemic there.

[7] arXiv:2506.08731 (replaced) [pdf, html, other]
Title: Unveiling the Impact of Social and Environmental Determinants of Health on Lung Function Decline in Cystic Fibrosis through Data Integration using the US Registry
Eleni-Rosalina Andrinopoulou, Emrah Gecili, Rhonda D Szczesniak
Subjects: Applications (stat.AP)

Integrating diverse data sources offers a comprehensive view of patient health and holds potential for improving clinical decision-making. In Cystic Fibrosis (CF), which is a genetic disorder primarily affecting the lungs, biomarkers that track lung function decline such as FEV1 serve as important predictors for assessing disease progression. Prior research has shown that incorporating social and environmental determinants of health improves prognostic accuracy. To investigate the lung function decline among individuals with CF, we integrate data from the U.S. Cystic Fibrosis Foundation Patient Registry with social and environmental health information. Our analysis focuses on the relationship between lung function and the deprivation index, a composite measure of socioeconomic status. We used advanced multivariate mixed-effects models, which allow for the joint modelling of multiple longitudinal outcomes with flexible functional forms. This methodology provides an understanding of interrelationships among outcomes, addressing the complexities of dynamic health data. We examine whether this relationship varies with patients' exposure duration to high-deprivation areas, analyzing data across time and within individual US states. Results show a strong relation between lung function and the area under the deprivation index curve across all states. These results underscore the importance of integrating social and environmental determinants of health into clinical models of disease progression. By accounting for broader contextual factors, healthcare providers can gain deeper insights into disease trajectories and design more targeted intervention strategies.

[8] arXiv:2510.13927 (replaced) [pdf, html, other]
Title: Long-Term Spatio-Temporal Forecasting of Monthly Rainfall in West Bengal Using Ensemble Learning Approaches
Jishu Adhikary, Raju Maiti
Comments: 25 pages, 22 figures
Subjects: Applications (stat.AP); Machine Learning (cs.LG)

Rainfall forecasting plays a critical role in climate adaptation, agriculture, and water resource management. This study develops long-term forecasts of monthly rainfall across 19 districts of West Bengal using a century-scale dataset spanning 1900-2019. Daily rainfall records are aggregated into monthly series, resulting in 120 years of observations for each district. The forecasting task involves predicting the next 108 months (9 years, 2011-2019) while accounting for temporal dependencies and spatial interactions among districts. To address the nonlinear and complex structure of rainfall dynamics, we propose a hierarchical modeling framework that combines regression-based forecasting of yearly features with multi-layer perceptrons (MLPs) for monthly prediction. Yearly features, such as annual totals, quarterly proportions, variability measures, skewness, and extremes, are first forecasted using regression models that incorporate both own lags and neighboring-district lags. These forecasts are then integrated as auxiliary inputs into an MLP model, which captures nonlinear temporal patterns and spatial dependencies in the monthly series. The results demonstrate that the hierarchical regression-MLP architecture provides robust long-term spatio-temporal forecasts, offering valuable insights for agriculture, irrigation planning, and water conservation strategies.

[9] arXiv:2512.17119 (replaced) [pdf, html, other]
Title: Uncovering latent territorial structure in ICFES Saber 11 performance with Bayesian multilevel spatial models
Laura Pardo, Juan Sosa, Juan Pablo Torres-Clavijo, Andrés Felipe Arévalo-Arévalo
Comments: 42 pages, 14 tables, 12 figures
Subjects: Applications (stat.AP); Methodology (stat.ME)

This article develops a Bayesian hierarchical framework to analyze academic performance in the 2022 second semester Saber 11 examination in Colombia. Our approach combines multilevel regression with municipal and departmental spatial random effects, and it incorporates Ridge and Lasso regularization priors to compare the contribution of sociodemographic covariates. Inference is implemented in a fully open source workflow using Markov chain Monte Carlo methods, and model behavior is assessed through synthetic data that mirror key features of the observed data. Simulation results indicate that Ridge provides the most balanced performance in parameter recovery, predictive accuracy, and sampling efficiency, while Lasso shows weaker fit and posterior stability, with gains in predictive accuracy under stronger multicollinearity. In the application, posterior rankings show a strong centralization of performance, with higher scores in central departments and lower scores in peripheral territories, and the strongest correlates of scores are student level living conditions, maternal education, access to educational resources, gender, and ethnic background, while spatial random effects capture residual regional disparities. A hybrid Bayesian segmentation based on K means propagates posterior uncertainty into clustering at departmental, municipal, and spatial scales, revealing multiscale territorial patterns consistent with structural inequalities and informing territorial targeting in education policy.

[10] arXiv:2105.03508 (replaced) [pdf, html, other]
Title: Cross-Population Amplitude Coupling in High-Dimensional Oscillatory Neural Time Series
Heejong Bong, Valérie Ventura, Eric A. Yttri, Matthew A. Smith, Robert E. Kass
Subjects: Methodology (stat.ME); Neurons and Cognition (q-bio.NC); Applications (stat.AP)

Neural oscillations have long been considered important markers of interaction across brain regions, yet identifying coordinated oscillatory activity from high-dimensional multiple-electrode recordings remains challenging. We sought to quantify time-varying covariation of oscillatory amplitudes across two brain regions, during a memory task, based on local field potentials recorded from 96 electrodes in each region. We extended Canonical Correlation Analysis (CCA) to multiple time series through the cross-correlation of latent time series. This, however, introduces a large number of possible lead-lag cross-correlations across the two regions. To manage that high dimensionality we developed rigorous statistical procedures aimed at finding a small number of dominant lead-lag effects. The method correctly identified ground truth structure in realistic simulation-based settings. When we used it to analyze local field potentials recorded from prefrontal cortex and visual area V4 we obtained highly plausible results. The new statistical methodology could also be applied to other slowly-varying high-dimensional time series.

[11] arXiv:2503.20100 (replaced) [pdf, html, other]
Title: EASI Drugs in the Streets of Colombia: Modeling Heterogeneous and Endogenous Drug Preferences
Santiago Montoya-Blandón, Andrés Ramírez-Hassan
Subjects: Econometrics (econ.EM); Applications (stat.AP)

The response of illicit drug consumers to policy changes like legalization is mediated by demand behavior. Since individual drug use is driven by many unobservable factors, accounting for unobserved heterogeneity becomes crucial for designing targeted policies. This paper introduces a finite Gaussian mixture of EASI demand systems to estimate joint demand for marijuana, cocaine, and basuco (a low-purity cocaine paste) in Colombia, accounting for corner solutions and endogenous prices. Our method classifies users into two groups with distinct preferences over consumption: "soft" and "hard" users. Nationally representative survey estimates find drugs are unit-elastic, with marijuana and cocaine complementary. International marijuana legalization episodes along with Colombia's low marijuana production cost suggest legalization is likely to drive prices down significantly. Legalization counterfactuals under the most likely scenario of a 50% marijuana price decrease reveal $363/year welfare gains for consumers, $120M in government revenue, and $127M dealer losses.

[12] arXiv:2505.06564 (replaced) [pdf, html, other]
Title: The Malaysian Election Corpus (MECo): Federal and State-Level Election Results from 1955 to 2025
Thevesh Thevananthan
Comments: 25 pages, 6 figures, 3 tables
Subjects: Computers and Society (cs.CY); Applications (stat.AP)

Empirical research and public knowledge on Malaysia's elections have long been constrained by a lack of high-quality open data, particularly in the absence of a Freedom of Information framework. This paper introduces the Malaysian Election Corpus (MECo), an open-access panel database covering all federal and state general elections since 1955, as well as by-elections since 2008. MECo includes candidate- and constituency-level data for 9,704 electoral contests across seven decades, standardised with unique identifiers for candidates, parties, and coalitions. The database also provides summary statistics for each contest (electorate size, voter turnout, majority size, rejected ballots, unreturned ballots), key demographic data for candidates (age, gender, ethnicity), and lineage data for political parties. MECo is the most well-curated open database on Malaysian elections to date, and will unlock new opportunities for research, data journalism, and civic engagement.

[13] arXiv:2505.13106 (replaced) [pdf, other]
Title: How to optimise tournament draws: The case of the FIFA World Cup
László Csató
Comments: 32 pages, 8 figures, 6 tables
Subjects: Optimization and Control (math.OC); Physics and Society (physics.soc-ph); Applications (stat.AP)

The organisers of major sports competitions use different policies with respect to constraints in the group draw. Our paper aims to rationalise these choices by analysing the trade-off between attractiveness (the number of games played by teams from the same geographic zone) and fairness (the departure of the draw mechanism from a uniform distribution). A parametric optimisation model is formulated and applied to the 2018 and 2022 FIFA World Cup draws. A flaw of the draw procedure is identified: the pre-assignment of the host to a group unnecessarily increases the distortions. All Pareto efficient sets of draw constraints are determined via simulations. The proposed framework can be used to find the optimal draw rules and justify the non-uniformity of the draw procedure for the stakeholders.

[14] arXiv:2508.07403 (replaced) [pdf, html, other]
Title: Is Repeated Bayesian Interim Analysis Consequence-Free?
Suyu Liu, Beibei Guo, Laura Thompson, Lei Nie, Ying Yuan
Subjects: Methodology (stat.ME); Applications (stat.AP)

Interim analyses are vital in clinical trials for early decision-making. While frequentist implications are well-established, the consequences of repeated Bayesian interim monitoring for efficacy, specifically regarding multiplicity, remain contentious. This article provides theoretical justification and numerical evidence evaluating the impact of such designs on bias, mean squared error (MSE), credible interval coverage, false discovery rate (FDR), and average Type I error (ATIE). Our findings show that when the inferential prior matches the data-generating prior, sequential efficacy stopping does not bias the posterior mean or degrade credible interval coverage. However, even under this "matched" condition, the FDR, ATIE, and MSE are significantly altered. In the more practically relevant scenario where the inferential and data-generating priors differ, all aforementioned operating characteristics, including estimation bias and coverage, are substantially impacted. These results reconcile long-standing conflicting arguments regarding Bayesian multiplicity. We demonstrate that while some Bayesian properties are invariant to sequential looks, others are not. Our work underscores the necessity of thoughtful prior specification and comprehensive evaluation of frequentist-Bayesian operating characteristics to ensure reliable inference in adaptive trial designs.
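The matched-prior claim in the abstract above can be illustrated with a small simulation. This is a sketch under assumptions that are not taken from the paper: a conjugate normal-normal model with unit-variance observations, a N(0, prior_sd^2) prior used both to generate the true effect and to analyse the data, five equally spaced looks, and efficacy stopping when the posterior probability of a positive effect exceeds 0.975. All names and settings are hypothetical.

```python
import random
from statistics import NormalDist

def simulate_trial(rng, prior_sd=1.0, looks=5, batch=20, threshold=0.975):
    """Run one trial with Bayesian efficacy stopping; return (theta, posterior mean)."""
    theta = rng.gauss(0.0, prior_sd)           # true effect drawn from the prior
    total, n = 0.0, 0
    post_mean = 0.0
    for _ in range(looks):
        for _ in range(batch):
            total += rng.gauss(theta, 1.0)     # unit-variance observations
            n += 1
        precision = n + 1.0 / prior_sd ** 2    # conjugate normal-normal update
        post_mean = total / precision          # prior mean is zero
        post_sd = precision ** -0.5
        prob_positive = 1.0 - NormalDist(post_mean, post_sd).cdf(0.0)
        if prob_positive > threshold:          # stop early for efficacy
            break
    return theta, post_mean

def average_bias(n_trials=5000, seed=1):
    """Average of (posterior mean - true effect) over trials drawn from the prior."""
    rng = random.Random(seed)
    total_diff = 0.0
    for _ in range(n_trials):
        theta, post_mean = simulate_trial(rng)
        total_diff += post_mean - theta
    return total_diff / n_trials
```

Under these assumptions `average_bias()` should sit close to zero despite the early stopping. Generating `theta` from a different distribution than the one used in the update would be one way to explore the mismatched-prior scenario the abstract describes.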

[15] arXiv:2511.04130 (replaced) [pdf, html, other]
Title: Multiple Testing of Partial Conjunction Hypotheses for Assessing Replicability Across Dependent Studies
Monitirtha Dey, Trambak Banerjee, Prajamitra Bhuyan, Arunabha Majumdar
Comments: Minor updates to the paper structure and title
Subjects: Methodology (stat.ME); Applications (stat.AP)

Replicability is central to scientific progress, and the partial conjunction (PC) hypothesis testing framework provides an objective tool to quantify it across disciplines. Existing PC methods assume independent studies. Yet many modern applications, such as genome-wide association studies (GWAS) with sample overlap, violate this assumption, leading to dependence among study-specific summary statistics. Failure to account for this dependence can drastically inflate type I errors when combining inferences. We propose e-Filter, a powerful procedure grounded on the theory of e-values. It involves a filtering step that retains a set of the most promising PC hypotheses, and a selection step where PC hypotheses from the filtering step are marked as discoveries whenever their e-values exceed a selection threshold. We establish the validity of e-Filter for FWER and FDR control under unknown study dependence. A comprehensive simulation study demonstrates its excellent power gains over competing methods. We apply e-Filter to a GWAS replicability study to identify consistent genetic signals for low-density lipoprotein cholesterol (LDL-C). Here, the participating studies exhibit varying levels of sample overlap, rendering existing methods unsuitable for combining inferences. A subsequent pathway enrichment analysis shows that e-Filter replicated signals achieve stronger statistical enrichment on biologically relevant LDL-C pathways than competing approaches.
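The selection step described above, marking hypotheses as discoveries when their e-values exceed a threshold, resembles the generic e-BH procedure, which controls the FDR under arbitrary dependence among e-values. The sketch below implements plain e-BH only. It is not the paper's e-Filter: it omits the filtering step and does not show how partial conjunction e-values are constructed. The function name is hypothetical.

```python
def e_bh(e_values, alpha=0.05):
    """Return the sorted indices rejected by the e-BH procedure at level alpha.

    With m hypotheses and e-values sorted in decreasing order e_[1] >= ... >= e_[m],
    e-BH rejects the k* hypotheses with the largest e-values, where
    k* = max{k : k * e_[k] >= m / alpha} (and k* = 0 if no such k exists).
    """
    m = len(e_values)
    order = sorted(range(m), key=lambda i: e_values[i], reverse=True)
    k_star = 0
    for k, idx in enumerate(order, start=1):
        if k * e_values[idx] >= m / alpha:
            k_star = k
    return sorted(order[:k_star])
```

For instance, with four e-values `[100.0, 1.0, 45.0, 0.5]` and `alpha=0.05`, the threshold m / alpha is 80, the two largest e-values satisfy the condition at k = 1 and k = 2, and the procedure rejects the hypotheses at indices 0 and 2.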

[16] arXiv:2512.20460 (replaced) [pdf, html, other]
Title: The Aligned Economic Index & The State Switching Model
Ilias Aarab
Journal-ref: Financieel Forum Bank en Financiewezen 2020 3 pp 252-261
Subjects: Statistical Finance (q-fin.ST); Machine Learning (cs.LG); Econometrics (econ.EM); Portfolio Management (q-fin.PM); Applications (stat.AP)

A growing empirical literature suggests that equity-premium predictability is state dependent, with much of the forecasting power concentrated around recessionary periods (Henkel et al., 2011; Dangl and Halling, 2012; Devpura et al., 2018). I study U.S. stock return predictability across economic regimes and document strong evidence of time-varying expected returns across both expansionary and contractionary states. I contribute in two ways. First, I introduce a state-switching predictive regression in which the market state is defined in real time using the slope of the yield curve. Relative to the standard one-state predictive regression, the state-switching specification increases both in-sample and out-of-sample performance for the set of popular predictors considered by Welch and Goyal (2008), improving the out-of-sample performance of most predictors in economically meaningful ways. Second, I propose a new aggregate predictor, the Aligned Economic Index, constructed via partial least squares (PLS). Under the state-switching model, the Aligned Economic Index exhibits statistically and economically significant predictive power in sample and out of sample, and it outperforms widely used benchmark predictors and alternative predictor-combination methods.
