Hi!
I just read through Regression Models with Ordered Categorical Outcomes — PyMC example gallery and am thinking about applying such a model to my data (currently using logistic regression because the response can be interpreted in a binary way).
Is my understanding correct that, by default, a logistic distribution is assumed for the latent continuous response variable Z? In my data there are four categories, and categories 1 and 4 have more than twice as much data as categories 2 and 3. I’m wondering how best to deal with this situation, and whether a logit or probit link is appropriate here.
Thanks!
The “amount of data” shouldn’t matter?
The shape of the latent distributions has pretty subtle/minor effects, it’s usually just chosen for computational convenience.
I like this resource: Ordinal Regression
Thanks both! Hopefully these will clear up any misconceptions I seem to have.
I think this section from Ordinal Regression explains it:
This construction holds for any probability distribution over X. Differences in the shape of the distribution can be compensated by reconfiguring the interior cut points to achieve the desired ordinal probabilities. Consequently we have the luxury of selecting a probability distribution based on computational convenience, in particular the expense of the cumulative distribution function and the ultimate cost of computing the interval probabilities.
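The compensation described in the quote is easy to demonstrate numerically. In this sketch (the category proportions are made up to mirror the question's "categories 1 and 4 are largest" pattern), the same target ordinal probabilities are recovered under either a logistic or a normal latent distribution simply by choosing different cutpoints via each distribution's quantile function:

```python
import numpy as np
from scipy.stats import logistic, norm

target = np.array([0.35, 0.15, 0.15, 0.35])   # desired P(y = k), k = 1..4
cum = np.cumsum(target)[:-1]                  # interior cumulative probabilities

cuts_logistic = logistic.ppf(cum)             # cutpoints under a logistic latent
cuts_normal = norm.ppf(cum)                   # cutpoints under a normal latent

# Both sets of cutpoints reproduce the same ordinal probabilities
# as differences of the corresponding cdf at the cutpoints:
p_logistic = np.diff(np.concatenate(([0.0], logistic.cdf(cuts_logistic), [1.0])))
p_normal = np.diff(np.concatenate(([0.0], norm.cdf(cuts_normal), [1.0])))
```

The cutpoints themselves differ between the two latent distributions, but the implied category probabilities are identical, which is exactly why the choice of distribution is mostly a matter of computational convenience.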
Thanks again!
I guess the shape matters a bit in relation to the prior on the cutpoints, but it would be weird to pick an exotic latent distribution to match a specific cutpoint prior rather than the other way around.
The continuous response variable is typically marginalized out rather than realized explicitly. The ordinal logit distribution then takes on the character of differences of logistic cdfs based on the cutpoints. The reason this is done for Hamiltonian Monte Carlo is that the result remains continuously differentiable. If you include the latent variable, then you get discontinuities in derivatives that are not good for HMC sampling (the feedback essentially gets “cut”).
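Concretely, the marginalized category probabilities are differences of the logistic cdf evaluated at cutpoints shifted by the linear predictor. A minimal sketch (the cutpoints and linear predictor `eta` here are arbitrary illustrative values):

```python
import numpy as np
from scipy.special import expit   # logistic cdf (inverse logit)

def ordered_logit_probs(eta, cutpoints):
    """P(y = k) for an ordered-logit model with the latent variable
    marginalized out: each probability is a difference of logistic cdfs."""
    # Pad with -inf and +inf so every category is a cdf difference.
    c = np.concatenate(([-np.inf], cutpoints, [np.inf]))
    cdf = expit(c - eta)          # logistic cdf at each (shifted) cutpoint
    return np.diff(cdf)           # one probability per category

probs = ordered_logit_probs(eta=0.5, cutpoints=np.array([-1.0, 0.0, 1.0]))
```

Because `expit` is smooth in both `eta` and the cutpoints, the resulting log-likelihood is continuously differentiable, which is what HMC needs.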
You can replace the logistic cdf with another cdf to get a different latent-distribution assumption. The reason logit is popular is that its cdf has a simple closed form: the inverse logit function 1 / (1 + \exp(-x)). If you move to ordinal probit, this is replaced with \Phi(x), the standard normal cdf, which is more expensive to compute because it has no closed form.
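To make the contrast concrete, here is a sketch of the two link functions. The inverse logit is a one-line elementary expression, while the standard normal cdf is typically written in terms of the error function `erf`, which itself has no elementary closed form and must be evaluated numerically:

```python
import numpy as np
from scipy.special import erf

def inv_logit(x):
    # Closed-form logistic cdf: one elementary expression.
    return 1.0 / (1.0 + np.exp(-x))

def phi(x):
    # Standard normal cdf via erf; erf has no elementary closed form
    # and is computed by numerical approximation under the hood.
    return 0.5 * (1.0 + erf(x / np.sqrt(2.0)))

x = np.linspace(-3.0, 3.0, 13)
logit_vals = inv_logit(x)   # ordinal logit link
probit_vals = phi(x)        # ordinal probit link
```

Both are smooth sigmoids through (0, 0.5), so swapping one for the other changes the model only through the tails and the cost of evaluation.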