|
983 | 983 | "cell_type": "markdown", |
984 | 984 | "metadata": {}, |
985 | 985 | "source": [ |
986 | | - "If this is probability is too high for comfortable decision-making, we can perform more trials on site B (as site B has less samples to begin with, each additional data point for site B contributes more inferential \"power\" than each additional data point for site A). \n", |
| 986 | + "If this probability is too high for comfortable decision-making, we can perform more trials on site B (as site B has less samples to begin with, each additional data point for site B contributes more inferential \"power\" than each additional data point for site A). \n", |
987 | 987 | "\n", |
988 | 988 | "Try playing with the parameters `true_p_A`, `true_p_B`, `N_A`, and `N_B`, to see what the posterior of $\\text{delta}$ looks like. Notice in all this, the difference in sample sizes between site A and site B was never mentioned: it naturally fits into Bayesian analysis.\n", |
989 | 989 | "\n", |
|
996 | 996 | "source": [ |
997 | 997 | "## An algorithm for human deceit\n", |
998 | 998 | "\n", |
999 | | - "Social data is has an additional layer of interest as people are not always honest with responses, which adds a further complication into inference. For example, simply asking individuals \"Have you ever cheated on a test?\" will surely contain some rate of dishonesty. What you can say for certain is that the true rate is less than your observed rate (assuming individuals lie *only* about *not cheating*; I cannot imagine one who would admit \"Yes\" to cheating when in fact they hadn't cheated). \n", |
| 999 | + "Social data has an additional layer of interest as people are not always honest with responses, which adds a further complication into inference. For example, simply asking individuals \"Have you ever cheated on a test?\" will surely contain some rate of dishonesty. What you can say for certain is that the true rate is less than your observed rate (assuming individuals lie *only* about *not cheating*; I cannot imagine one who would admit \"Yes\" to cheating when in fact they hadn't cheated). \n", |
1000 | 1000 | "\n", |
1001 | 1001 | "To present an elegant solution to circumventing this dishonesty problem, and to demonstrate Bayesian modeling, we first need to introduce the binomial distribution.\n", |
1002 | 1002 | "\n", |
|
1984 | 1984 | "\n", |
1985 | 1985 | "The skeptical reader will say \"You deliberately chose the logistic function for $p(t)$ and the specific priors. Perhaps other functions or priors will give different results. How do I know I have chosen a good model?\" This is absolutely true. To consider an extreme situation, what if I had chosen the function $p(t) = 1,\\; \\forall t$, which guarantees a defect always occurring: I would have again predicted disaster on January 28th. Yet this is clearly a poorly chosen model. On the other hand, if I did choose the logistic function for $p(t)$, but specified all my priors to be very tight around 0, likely we would have very different posterior distributions. How do we know our model is an expression of the data? This encourages us to measure the model's **goodness of fit**.\n", |
1986 | 1986 | "\n", |
1987 | | - "We can think: *how can we test whether our model is a bad fit?* An idea is to compare observed data (which if we recall is a *fixed* stochastic variable) with artificial dataset which we can simulate. The rational is that if the simulated dataset does not appear similar, statistically, to the observed dataset, then likely our model is not accurately represented the observed data. \n", |
| 1987 | + "We can think: *how can we test whether our model is a bad fit?* An idea is to compare observed data (which if we recall is a *fixed* stochastic variable) with artificial dataset which we can simulate. The rationale is that if the simulated dataset does not appear similar, statistically, to the observed dataset, then likely our model is not accurately represented the observed data. \n", |
1988 | 1988 | "\n", |
1989 | 1989 | "Previously in this Chapter, we simulated artificial dataset for the SMS example. To do this, we sampled values from the priors. We saw how varied the resulting datasets looked like, and rarely did they mimic our observed dataset. In the current example, we should sample from the *posterior* distributions to create *very plausible datasets*. Luckily, our Bayesian framework makes this very easy. We only need to create a new `Stochastic` variable, that is exactly the same as our variable that stored the observations, but minus the observations themselves. If you recall, our `Stochastic` variable that stored our observed data was:\n", |
1990 | 1990 | "\n", |
|
0 commit comments