470 | 470 |       "\n",  | 
471 | 471 |       "1.  We started by thinking \"what is the best random variable to describe this count data?\" A Poisson random variable is a good candidate because it can represent count data. So we model the number of sms's received as sampled from a Poisson distribution.\n",  | 
472 | 472 |       "\n",  | 
473 |  | -      "2.  Next, we think, \"Ok, assuming sms's are Poisson-distributed, what do I need for the Poisson distribution?\" Well, the Poisson distribution has a parameters $\\lambda$. \n",  | 
 | 473 | +      "2.  Next, we think, \"Ok, assuming sms's are Poisson-distributed, what do I need for the Poisson distribution?\" Well, the Poisson distribution has a parameter $\\lambda$. \n",  | 
474 | 474 |       "\n",  | 
475 | 475 |       "3.  Do we know $\\lambda$? No. In fact, we have a suspicion that there are *two* $\\lambda$ values, one for the earlier behaviour and one for the later behaviour. We don't know when the behaviour switches, but we call the switchpoint $\\tau$.\n",  | 
476 | 476 |       "\n",  | 
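Taken together, steps 1–3 translate almost directly into a PyMC model. The sketch below is one way to write them down, assuming PyMC 2; the data-loading line, file path, and variable names are illustrative rather than taken from this cell:

```python
import numpy as np
import pymc as pm

# count_data holds the daily sms counts; the path is assumed for illustration
count_data = np.loadtxt("data/txtdata.csv")
n_count_data = len(count_data)

alpha = 1.0 / count_data.mean()  # hyperparameter for the Exponential priors

lambda_1 = pm.Exponential("lambda_1", alpha)   # rate before the switchpoint
lambda_2 = pm.Exponential("lambda_2", alpha)   # rate after the switchpoint
tau = pm.DiscreteUniform("tau", lower=0, upper=n_count_data)  # unknown switch day

@pm.deterministic
def lambda_(tau=tau, lambda_1=lambda_1, lambda_2=lambda_2):
    out = np.zeros(n_count_data)
    out[:tau] = lambda_1   # days before tau use the first rate
    out[tau:] = lambda_2   # days on/after tau use the second rate
    return out

# the observed counts are modelled as Poisson draws with the day-dependent rate
observation = pm.Poisson("obs", lambda_, value=count_data, observed=True)
model = pm.Model([observation, lambda_1, lambda_2, tau])
```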
 | 
670 | 670 |       "\n",  | 
671 | 671 |       "As this is a hacker book, we'll continue with the web-dev example. For the moment, we will focus on the analysis of site A only. Assume that there is some true probability, $0 \\lt p_A \\lt 1$, that users who are shown site A eventually purchase from it. This is the true effectiveness of site A. Currently, this quantity is unknown to us. \n",  | 
672 | 672 |       "\n",  | 
673 |  | -      "Suppose site A was shown to $N$ people, and $n$ people purchased from the site. One might conclude hastly that $p_A = \\frac{n}{N}$. Unfortunately, the *observed frequency* $\\frac{n}{N}$ does not necessarily equal $p_A$ -- there is a difference between the *observed frequency* and the *true frequency* of an event. The true frequency can be interpreted as the probability of an event occurring. For example, the true frequency of rolling a 1 on a 6-sided die is $\\frac{1}{6}$. Knowing the true frequency of events like:\n",  | 
 | 673 | +      "Suppose site A was shown to $N$ people, and $n$ people purchased from the site. One might conclude hastily that $p_A = \\frac{n}{N}$. Unfortunately, the *observed frequency* $\\frac{n}{N}$ does not necessarily equal $p_A$ -- there is a difference between the *observed frequency* and the *true frequency* of an event. The true frequency can be interpreted as the probability of an event occurring. For example, the true frequency of rolling a 1 on a 6-sided die is $\\frac{1}{6}$. Knowing the true frequency of events like:\n",  | 
674 | 674 |       "\n",  | 
675 | 675 |       "- fraction of users who make purchases, \n",  | 
676 | 676 |       "- frequency of social attributes, \n",  | 
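To see concretely that the observed frequency $n/N$ need not equal the true frequency $p_A$, here is a small simulation sketch; `true_p_A` and `N` are made-up illustrative values, not figures from the text:

```python
from scipy import stats

# assume, for the simulation only, a true purchase probability and sample size
true_p_A = 0.05
N = 1500

# 1 = user purchased, 0 = user did not
occurrences = stats.bernoulli.rvs(true_p_A, size=N)

print("observed frequency n/N: %.4f" % occurrences.mean())
print("true frequency p_A:     %.4f" % true_p_A)
# for moderate N the two will usually differ, which is exactly the gap
# between observed and true frequency discussed above
```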
 | 
1074 | 1074 |       "\n",  | 
1075 | 1075 |       "Try playing with the parameters `true_p_A`, `true_p_B`, `N_A`, and `N_B`, to see what the posterior of $\\text{delta}$ looks like. Notice in all this, the difference in sample sizes between site A and site B was never mentioned: it naturally fits into Bayesian analysis.\n",  | 
1076 | 1076 |       "\n",  | 
1077 |  | -      "I hope the readers feel this style of A/B testing is more natural than hypothesis testing, which the latter has probably confused more than helped practitioners. Later in this book, we will see two extensions of this model: the first to help dynamically adjust for bad sites, and the second will improve the speed of this computation by reducing the analysis to a single equation.   "  | 
 | 1077 | +      "I hope the readers feel this style of A/B testing is more natural than hypothesis testing, which has probably confused more than helped practitioners. Later in this book, we will see two extensions of this model: the first to help dynamically adjust for bad sites, and the second will improve the speed of this computation by reducing the analysis to a single equation.   "  | 
1078 | 1078 |      ]  | 
1079 | 1079 |     },  | 
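For reference, a minimal sketch of the two-site model that the exercise above refers to, assuming PyMC 2; the values assigned to `true_p_A`, `true_p_B`, `N_A`, and `N_B` are illustrative and meant to be varied:

```python
import pymc as pm
from scipy import stats

# illustrative "true" values and sample sizes -- the point of the exercise
# is to vary these and watch the posterior of delta change
true_p_A, true_p_B = 0.05, 0.04
N_A, N_B = 1500, 750

# simulate the observed data under the true (normally unknown) frequencies
observations_A = stats.bernoulli.rvs(true_p_A, size=N_A)
observations_B = stats.bernoulli.rvs(true_p_B, size=N_B)

# uniform priors on the unknown purchase probabilities
p_A = pm.Uniform("p_A", 0, 1)
p_B = pm.Uniform("p_B", 0, 1)

# delta is the quantity of interest: how much better (or worse) is site A?
@pm.deterministic
def delta(p_A=p_A, p_B=p_B):
    return p_A - p_B

obs_A = pm.Bernoulli("obs_A", p_A, value=observations_A, observed=True)
obs_B = pm.Bernoulli("obs_B", p_B, value=observations_B, observed=True)

mcmc = pm.MCMC([p_A, p_B, delta, obs_A, obs_B])
mcmc.sample(20000, 1000)

delta_samples = mcmc.trace("delta")[:]
print("P(site A is better than site B) = %.3f" % (delta_samples > 0).mean())
```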
1080 | 1080 |     {  | 
 | 
1182 | 1182 |      "input": [  | 
1183 | 1183 |       "import pymc as pm\n",  | 
1184 | 1184 |       "\n",  | 
1185 |  | -      "\n",  | 
1186 | 1185 |       "N = 100\n",  | 
1187 | 1186 |       "p = pm.Uniform(\"freq_cheating\", 0, 1)"  | 
1188 | 1187 |      ],  | 
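As a quick sanity check (not part of the original cell), one can draw a few values from this prior to confirm it covers the whole unit interval; `p.random()` and `p.value` are standard PyMC 2 stochastic-variable attributes:

```python
# draw a few samples from the Uniform(0, 1) prior on freq_cheating
samples = [p.random() for _ in range(5)]
print(samples)
print(p.value)   # the stochastic's current value
```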
 | 
2578 | 2577 |       "### References\n",  | 
2579 | 2578 |       "\n",  | 
2580 | 2579 |       "-  [1] Dalal, Fowlkes and Hoadley (1989), JASA, 84, 945-957.\n",  | 
2581 |  | -      "-  [2] German Rodriguez. Datasets. In WWS509. Retrieved 30/01/2013, from http://data.princeton.edu/wws509/datasets/#smoking.\n",  | 
 | 2580 | +      "-  [2] German Rodriguez. Datasets. In WWS509. Retrieved 30/01/2013, from <http://data.princeton.edu/wws509/datasets/#smoking>.\n",  | 
2582 | 2581 |       "-  [3] McLeish, Don, and Cyntha Struthers. STATISTICS 450/850 Estimation and Hypothesis Testing. Winter 2012. Waterloo, Ontario: 2012. Print.\n",  | 
2583 | 2582 |       "-  [4] Fonnesbeck, Christopher. \"Building Models.\" PyMC-Devs. N.p., n.d. Web. 26 Feb 2013. <http://pymc-devs.github.com/pymc/modelbuilding.html>.\n",  | 
2584 | 2583 |       "-  [5] Cronin, Beau. \"Why Probabilistic Programming Matters.\" 24 Mar 2013. Google, Online Posting to Google. Web. 24 Mar. 2013. <https://plus.google.com/u/0/107971134877020469960/posts/KpeRdJKR6Z1>.\n",  | 
 | 