|
470 | 470 | "\n", |
471 | 471 | "1. We started by thinking \"what is the best random variable to describe this count data?\" A Poisson random variable is a good candidate because it can represent count data. So we model the number of sms's received as sampled from a Poisson distribution.\n", |
472 | 472 | "\n", |
473 | | - "2. Next, we think, \"Ok, assuming sms's are Poisson-distributed, what do I need for the Poisson distribution?\" Well, the Poisson distribution has a parameters $\\lambda$. \n", |
| 473 | + "2. Next, we think, \"Ok, assuming sms's are Poisson-distributed, what do I need for the Poisson distribution?\" Well, the Poisson distribution has a parameter $\\lambda$. \n", |
474 | 474 | "\n", |
475 | 475 | "3. Do we know $\\lambda$? No. In fact, we have a suspicion that there are *two* $\\lambda$ values, one for the earlier behaviour and one for the latter behaviour. We don't know when the behaviour switches though, but call the switchpoint $\\tau$.\n", |
476 | 476 | "\n", |
|
670 | 670 | "\n", |
671 | 671 | "As this is a hacker book, we'll continue with the web-dev example. For the moment, we will focus on the analysis of site A only. Assume that there is some true $0 \\lt p_A \\lt 1$ probability that users who, upon shown site A, eventually purchase from the site. This is the true effectiveness of site A. Currently, this quantity is unknown to us. \n", |
672 | 672 | "\n", |
673 | | - "Suppose site A was shown to $N$ people, and $n$ people purchased from the site. One might conclude hastly that $p_A = \\frac{n}{N}$. Unfortunately, the *observed frequency* $\\frac{n}{N}$ does not necessarily equal $p_A$ -- there is a difference between the *observed frequency* and the *true frequency* of an event. The true frequency can be interpreted as the probability of an event occurring. For example, the true frequency of rolling a 1 on a 6-sided die is $\\frac{1}{6}$. Knowing the true frequency of events like:\n", |
| 673 | + "Suppose site A was shown to $N$ people, and $n$ people purchased from the site. One might conclude hastily that $p_A = \\frac{n}{N}$. Unfortunately, the *observed frequency* $\\frac{n}{N}$ does not necessarily equal $p_A$ -- there is a difference between the *observed frequency* and the *true frequency* of an event. The true frequency can be interpreted as the probability of an event occurring. For example, the true frequency of rolling a 1 on a 6-sided die is $\\frac{1}{6}$. Knowing the true frequency of events like:\n", |
674 | 674 | "\n", |
675 | 675 | "- fraction of users who make purchases, \n", |
676 | 676 | "- frequency of social attributes, \n", |
|
1074 | 1074 | "\n", |
1075 | 1075 | "Try playing with the parameters `true_p_A`, `true_p_B`, `N_A`, and `N_B`, to see what the posterior of $\\text{delta}$ looks like. Notice in all this, the difference in sample sizes between site A and site B was never mentioned: it naturally fits into Bayesian analysis.\n", |
1076 | 1076 | "\n", |
1077 | | - "I hope the readers feel this style of A/B testing is more natural than hypothesis testing, which the latter has probably confused more than helped practitioners. Later in this book, we will see two extensions of this model: the first to help dynamically adjust for bad sites, and the second will improve the speed of this computation by reducing the analysis to a single equation. " |
| 1077 | + "I hope the readers feel this style of A/B testing is more natural than hypothesis testing, which has probably confused more than helped practitioners. Later in this book, we will see two extensions of this model: the first to help dynamically adjust for bad sites, and the second will improve the speed of this computation by reducing the analysis to a single equation. " |
1078 | 1078 | ] |
1079 | 1079 | }, |
1080 | 1080 | { |
|
1182 | 1182 | "input": [ |
1183 | 1183 | "import pymc as pm\n", |
1184 | 1184 | "\n", |
1185 | | - "\n", |
1186 | 1185 | "N = 100\n", |
1187 | 1186 | "p = pm.Uniform(\"freq_cheating\", 0, 1)" |
1188 | 1187 | ], |
|
2578 | 2577 | "### References\n", |
2579 | 2578 | "\n", |
2580 | 2579 | "- [1] Dalal, Fowlkes and Hoadley (1989),JASA, 84, 945-957.\n", |
2581 | | - "- [2] German Rodriguez. Datasets. In WWS509. Retrieved 30/01/2013, from http://data.princeton.edu/wws509/datasets/#smoking.\n", |
| 2580 | + "- [2] German Rodriguez. Datasets. In WWS509. Retrieved 30/01/2013, from <http://data.princeton.edu/wws509/datasets/#smoking>.\n", |
2582 | 2581 | "- [3] McLeish, Don, and Cyntha Struthers. STATISTICS 450/850 Estimation and Hypothesis Testing. Winter 2012. Waterloo, Ontario: 2012. Print.\n", |
2583 | 2582 | "- [4] Fonnesbeck, Christopher. \"Building Models.\" PyMC-Devs. N.p., n.d. Web. 26 Feb 2013. <http://pymc-devs.github.com/pymc/modelbuilding.html>.\n", |
2584 | 2583 | "- [5] Cronin, Beau. \"Why Probabilistic Programming Matters.\" 24 Mar 2013. Google, Online Posting to Google . Web. 24 Mar. 2013. <https://plus.google.com/u/0/107971134877020469960/posts/KpeRdJKR6Z1>.\n", |
|