Merge branch 'master' of github.com:CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers

CamDavidsonPilon · CamDavidsonPilon · commit 556bccc254f0 · 2013-07-13T22:43:57.000-04:00
diff --git a/Chapter2_MorePyMC/MorePyMC.ipynb b/Chapter2_MorePyMC/MorePyMC.ipynb
@@ -983,7 +983,7 @@
      "cell_type": "markdown",
      "metadata": {},
      "source": [
-      "If this is probability is too high for comfortable decision-making, we can perform more trials on site B (as site B has less samples to begin with, each additional data point for site B contributes more inferential \"power\" than each additional data point for site A). \n",
+      "If this probability is too high for comfortable decision-making, we can perform more trials on site B (as site B has less samples to begin with, each additional data point for site B contributes more inferential \"power\" than each additional data point for site A). \n",
       "\n",
       "Try playing with the parameters `true_p_A`, `true_p_B`, `N_A`, and `N_B`, to see what the posterior of $\\text{delta}$ looks like. Notice in all this, the difference in sample sizes between site A and site B was never mentioned: it naturally fits into Bayesian analysis.\n",
       "\n",
@@ -996,7 +996,7 @@
      "source": [
       "## An algorithm for human deceit\n",
       "\n",
-      "Social data is has an additional layer of interest as people are not always honest with responses, which adds a further complication into inference. For example, simply asking individuals \"Have you ever cheated on a test?\" will surely contain some rate of dishonesty. What you can say for certain is that the true rate is less than your observed rate (assuming individuals lie *only* about *not cheating*; I cannot imagine one who would admit \"Yes\" to cheating when in fact they hadn't cheated). \n",
+      "Social data has an additional layer of interest as people are not always honest with responses, which adds a further complication into inference. For example, simply asking individuals \"Have you ever cheated on a test?\" will surely contain some rate of dishonesty. What you can say for certain is that the true rate is less than your observed rate (assuming individuals lie *only* about *not cheating*; I cannot imagine one who would admit \"Yes\" to cheating when in fact they hadn't cheated). \n",
       "\n",
       "To present an elegant solution to circumventing this dishonesty problem, and to demonstrate Bayesian modeling, we first need to introduce the binomial distribution.\n",
       "\n",
@@ -1984,7 +1984,7 @@
       "\n",
       "The skeptical reader will say \"You deliberately chose the logistic function for $p(t)$ and the specific priors. Perhaps other functions or priors will give different results. How do I know I have chosen a good model?\" This is absolutely true. To consider an extreme situation, what if I had chosen the function $p(t) = 1,\\; \\forall t$, which guarantees a defect always occurring: I would have again predicted disaster on January 28th. Yet this is clearly a poorly chosen model. On the other hand, if I did choose the logistic function for $p(t)$, but specified all my priors to be very tight around 0, likely we would have very different posterior distributions. How do we know our model is an expression of the data? This encourages us to measure the model's **goodness of fit**.\n",
       "\n",
-      "We can think: *how can we test whether our model is a bad fit?* An idea is to compare observed data (which if we recall is a *fixed* stochastic variable) with artificial dataset which we can simulate. The rational is that if the simulated dataset does not appear similar, statistically, to the observed dataset, then likely our model is not accurately represented the observed data. \n",
+      "We can think: *how can we test whether our model is a bad fit?* An idea is to compare observed data (which if we recall is a *fixed* stochastic variable) with artificial dataset which we can simulate. The rationale is that if the simulated dataset does not appear similar, statistically, to the observed dataset, then likely our model is not accurately represented the observed data. \n",
       "\n",
       "Previously in this Chapter, we simulated artificial dataset for the SMS example. To do this, we sampled values from the priors. We saw how varied the resulting datasets looked like, and rarely did they mimic our observed dataset. In the current example,  we should sample from the *posterior* distributions to create *very plausible datasets*. Luckily, our Bayesian framework makes this very easy. We only need to create a new `Stochastic` variable, that is exactly the same as our variable that stored the observations, but minus the observations themselves. If you recall, our `Stochastic` variable that stored our observed data was:\n",
       "\n",
diff --git a/Chapter3_MCMC/IntroMCMC.ipynb b/Chapter3_MCMC/IntroMCMC.ipynb
@@ -922,7 +922,7 @@
       "\n",
       "### Using `MAP` to improve convergence\n",
       "\n",
-      "If you ran the above example yourself, you may have noticed that our results were not consistent: perhaps your cluster cluster division was more scattered, or perhaps less scattered. The problem is that our traces are a function of the *starting values* of the MCMC algorithm. \n",
+      "If you ran the above example yourself, you may have noticed that our results were not consistent: perhaps your cluster division was more scattered, or perhaps less scattered. The problem is that our traces are a function of the *starting values* of the MCMC algorithm. \n",
       "\n",
       "It can be mathematically shown that letting the MCMC run long enough, by performing many steps, the algorithm *should forget its initial position*. In fact, this is what it means to say the MCMC converged (in practice though we can never achieve total convergence). Hence if we observe different posterior analysis, it is likely because our MCMC has not fully converged yet, and we should not use samples from it yet (we should use a larger burn-in period ).\n",
       "\n",
@@ -969,7 +969,7 @@
       "$$x_t \\sim \\text{Normal}(0,1), \\;\\; x_0 = 0$$\n",
       "$$y_t \\sim \\text{Normal}(y_{t-1}, 1 ), \\;\\; y_0 = 0$$\n",
       "\n",
-      "which has an example paths like:"
+      "which have example paths like:"
      ]
     },
     {
@@ -1078,7 +1078,7 @@
       "plt.bar(x, autocorr( y_t )[1:max_x], edgecolor=colors[0],\n",
       "        label=\"no thinning\", color = colors[0], width =1 )\n",
       "plt.bar(x, autocorr( y_t[::2] )[1:max_x], edgecolor=colors[1],\n",
-      "        label=\"keeping every 2d sample\", color = colors[1], width=1 )\n",
+      "        label=\"keeping every 2nd sample\", color = colors[1], width=1 )\n",
       "plt.bar(x, autocorr( y_t[::3] )[1:max_x], width =1, edgecolor = colors[2],\n",
       "        label=\"keeping every 3rd sample\", color = colors[2] )\n",
       "\n",
diff --git a/Chapter4_TheGreatestTheoremNeverTold/LawOfLargeNumbers.ipynb b/Chapter4_TheGreatestTheoremNeverTold/LawOfLargeNumbers.ipynb
@@ -481,7 +481,7 @@
       "In light of these, I think it is better to use a `Uniform` prior.\n",
       "\n",
       "\n",
-      "With our prior in place, we can find the posterior of the true upvote ratio. The Python script `comments_for_top_reddit_pic.py` will scrap the comments from the current top picture on Reddit. Below is the picture, and some comments:"
+      "With our prior in place, we can find the posterior of the true upvote ratio. The Python script `comments_for_top_reddit_pic.py` will scrape the comments from the current top picture on Reddit. Below is the picture, and some comments:"
      ]
     },
     {
