Commit 9524ec9

Merge pull request CamDavidsonPilon#151 from williamscott/master
Minor editing changes to chapters 2 and 3
2 parents bfdd2b9 + 49cdf30 commit 9524ec9


3 files changed: +8 −9 lines changed


Chapter2_MorePyMC/MorePyMC.ipynb

Lines changed: 1 addition & 2 deletions
@@ -1182,7 +1182,6 @@
 "input": [
 "import pymc as pm\n",
 "\n",
-"\n",
 "N = 100\n",
 "p = pm.Uniform(\"freq_cheating\", 0, 1)"
 ],
@@ -2578,7 +2577,7 @@
 "### References\n",
 "\n",
 "- [1] Dalal, Fowlkes and Hoadley (1989),JASA, 84, 945-957.\n",
-"- [2] German Rodriguez. Datasets. In WWS509. Retrieved 30/01/2013, from http://data.princeton.edu/wws509/datasets/#smoking.\n",
+"- [2] German Rodriguez. Datasets. In WWS509. Retrieved 30/01/2013, from <http://data.princeton.edu/wws509/datasets/#smoking>.\n",
 "- [3] McLeish, Don, and Cyntha Struthers. STATISTICS 450/850 Estimation and Hypothesis Testing. Winter 2012. Waterloo, Ontario: 2012. Print.\n",
 "- [4] Fonnesbeck, Christopher. \"Building Models.\" PyMC-Devs. N.p., n.d. Web. 26 Feb 2013. <http://pymc-devs.github.com/pymc/modelbuilding.html>.\n",
 "- [5] Cronin, Beau. \"Why Probabilistic Programming Matters.\" 24 Mar 2013. Google, Online Posting to Google . Web. 24 Mar. 2013. <https://plus.google.com/u/0/107971134877020469960/posts/KpeRdJKR6Z1>.\n",
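The Chapter 2 hunk above keeps a cell that places a Uniform(0, 1) prior on the cheating frequency with N = 100. A minimal, PyMC-free sketch of what that cell sets up (a numpy stand-in; the seed and the Bernoulli simulation step are illustrative, not from the notebook):

```python
import numpy as np

rng = np.random.default_rng(0)

N = 100                      # number of students, as in the notebook cell
p = rng.uniform(0.0, 1.0)    # one draw of "freq_cheating" from its Uniform(0, 1) prior
answers = rng.random(N) < p  # simulate N Bernoulli(p) yes/no responses under that draw

print(p, answers.sum())
```

Any value of `p` in [0, 1] is equally likely a priori, which is exactly what the flat `pm.Uniform` prior encodes.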

Chapter3_MCMC/IntroMCMC.ipynb

Lines changed: 3 additions & 3 deletions
@@ -417,7 +417,7 @@
 "\n",
 " taus = 1.0/mc.Uniform( \"stds\", 0, 100, size= 2)**2 \n",
 "\n",
-"Notice that we specified `size=2`: we are modeling both $\\tau$s as a single PyMC variable. Note that is does not induce a necessary relationship between the two $\\tau$s, it is simply for succinctness.\n",
+"Notice that we specified `size=2`: we are modeling both $\\tau$s as a single PyMC variable. Note that this does not induce a necessary relationship between the two $\\tau$s, it is simply for succinctness.\n",
 "\n",
 "We also need to specify priors on the centers of the clusters. The centers are really the $\\mu$ parameters in this Normal distributions. Their priors can be modeled by a Normal distribution. Looking at the data, I have an idea where the two centers might be &mdash; I would guess somewhere around 120 and 190 respectively, though I am not very confident in these eyeballed estimates. Hence I will set $\\mu_0 = 120, \\mu_1 = 190$ and $\\sigma_{0,1} = 10$ (recall we enter the $\\tau$ parameter, so enter $1/\\sigma^2 = 0.01$ in the PyMC variable.)"
 ]
@@ -917,7 +917,7 @@
 "\n",
 " L = 1 if prob > 0.5 else 0\n",
 "\n",
-"we can optimize our guesses using *loss function*, of which the entire fifth chapter is devoted to. \n",
+"we can optimize our guesses using a *loss function*, which the entire fifth chapter is devoted to. \n",
 "\n",
 "\n",
 "### Using `MAP` to improve convergence\n",
@@ -1177,7 +1177,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"The largest plot on the right-hand side is the histograms of the samples, plus a few extra features. The thickest vertical line represents the posterior mean, which is a good summary of posterior distribution. The interval between the two dashed vertical lines in each the posterior distributions represent the *95% credible interval*, not to be confused with a *95% confidence interval*. I won't get into the latter, but the former can be interpreted as \"there is a 95% chance the parameter of interested lies in this interval\". (Changing default parameters in the call to `mcplot` provides alternatives to 95%.) When communicating your results to others, it is incredibly important to state this interval. One of our purposes for studying Bayesian methods is to have a clear understanding of our uncertainty in unknowns. Combined with the posterior mean, the 95% credible interval provides a reliable interval to communicate the likely location of the unknown (provided by the mean) *and* the uncertainty (represented by the width of the interval)."
+"The largest plot on the right-hand side is the histograms of the samples, plus a few extra features. The thickest vertical line represents the posterior mean, which is a good summary of posterior distribution. The interval between the two dashed vertical lines in each the posterior distributions represent the *95% credible interval*, not to be confused with a *95% confidence interval*. I won't get into the latter, but the former can be interpreted as \"there is a 95% chance the parameter of interest lies in this interval\". (Changing default parameters in the call to `mcplot` provides alternatives to 95%.) When communicating your results to others, it is incredibly important to state this interval. One of our purposes for studying Bayesian methods is to have a clear understanding of our uncertainty in unknowns. Combined with the posterior mean, the 95% credible interval provides a reliable interval to communicate the likely location of the unknown (provided by the mean) *and* the uncertainty (represented by the width of the interval)."
 ]
 },
 {
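The last Chapter 3 hunk describes reading the posterior mean and the 95% credible interval off `mcplot`'s output. Both summaries can be computed directly from posterior samples with percentiles; a minimal sketch (the samples below are synthetic stand-ins centered at 120, not draws from the book's model):

```python
import numpy as np

rng = np.random.default_rng(42)
samples = rng.normal(loc=120.0, scale=10.0, size=50_000)  # stand-in posterior samples

posterior_mean = samples.mean()
# central 95% credible interval: the 2.5th and 97.5th sample percentiles
lower, upper = np.percentile(samples, [2.5, 97.5])

print(f"mean={posterior_mean:.1f}, 95% CI=({lower:.1f}, {upper:.1f})")
```

The interval's width is the uncertainty summary the paragraph refers to: reporting the mean alone would hide it.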

Chapter4_TheGreatestTheoremNeverTold/LawOfLargeNumbers.ipynb

Lines changed: 4 additions & 4 deletions
@@ -489,7 +489,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"One way to determine a prior on the upvote ratio is that look at the historical distribution of upvote ratios. This can be accomplished by scrapping Reddit's comments and determining a distribution. There are a few problems with this technique though:\n",
+"One way to determine a prior on the upvote ratio is that look at the historical distribution of upvote ratios. This can be accomplished by scraping Reddit's comments and determining a distribution. There are a few problems with this technique though:\n",
 "\n",
 "1. Skewed data: The vast majority of comments have very few votes, hence there will be many comments with ratios near the extremes (see the \"triangular plot\" in the above Kaggle dataset), effectively skewing our distribution to the extremes. One could try to only use comments with votes greater than some threshold. Again, problems are encountered. There is a tradeoff between number of comments available to use and a higher threshold with associated ratio precision. \n",
 "2. Biased data: Reddit is composed of different subpages, called subreddits. Two examples are *r/aww*, which posts pics of cute animals, and *r/politics*. It is very likely that the user behaviour towards comments of these two subreddits are very different: visitors are likely friend and affectionate in the former, and would therefore upvote comments more, compared to the latter, where comments are likely to be controversial and disagreed upon. Therefore not all comments are the same. \n",
@@ -995,7 +995,7 @@
 "& b = 1 + N - S \\\\\\\\\n",
 "\\end{align}\n",
 "\n",
-"where $N$ is the number of users who rated, and $S$ is the sum of all the ratings, under the equivilance scheme mentioned above. "
+"where $N$ is the number of users who rated, and $S$ is the sum of all the ratings, under the equivalence scheme mentioned above. "
 ]
 },
 {
@@ -1133,7 +1133,7 @@
 "### References\n",
 "\n",
 "1. Wainer, Howard. *The Most Dangerous Equation*. American Scientist, Volume 95.\n",
-"2. Clarck, Torin K., Aaron W. Johnson, and Alexander J. Stimpson. \"Going for Three: Predicting the Likelihood of Field Goal Success with Logistic Regression.\" (2013): n. page. Web. 20 Feb. 2013.\n",
+"2. Clarck, Torin K., Aaron W. Johnson, and Alexander J. Stimpson. \"Going for Three: Predicting the Likelihood of Field Goal Success with Logistic Regression.\" (2013): n. page. [Web](http://www.sloansportsconference.com/wp-content/uploads/2013/Going%20for%20Three%20Predicting%20the%20Likelihood%20of%20Field%20Goal%20Success%20with%20Logistic%20Regression.pdf). 20 Feb. 2013.\n",
 "3. http://en.wikipedia.org/wiki/Beta_function#Incomplete_beta_function"
 ]
 },
@@ -1236,4 +1236,4 @@
 "metadata": {}
 }
 ]
-}
+}
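The middle Chapter 4 hunk keeps the Beta-posterior parameters a = 1 + S and b = 1 + N − S, where N is the number of raters and S the sum of the ratings. A minimal numeric check of that posterior (the values of N and S below are illustrative, not taken from the book):

```python
import numpy as np

N, S = 10, 7              # illustrative: 10 raters whose ratings sum to 7
a, b = 1 + S, 1 + N - S   # Beta posterior parameters from the text

analytic_mean = a / (a + b)  # Beta(a, b) mean, i.e. (1 + S) / (2 + N)

rng = np.random.default_rng(7)
draws = rng.beta(a, b, size=100_000)  # Monte Carlo draws from the same posterior

print(analytic_mean, draws.mean())
```

The sample mean of the draws should agree with the closed-form mean (1 + S) / (2 + N), which is the usual sanity check on the conjugate update.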
