Commit 2daef54

Merge pull request CamDavidsonPilon#257 from FabianInostroza/master
Fix pdf exporting problems
2 parents b73df5a + 2f4fdbd commit 2daef54

File tree: 3 files changed, +18 -18 lines

Chapter2_MorePyMC/Chapter2.ipynb

Lines changed: 5 additions & 5 deletions
@@ -284,10 +284,10 @@
  "\n",
  "$$\n",
  "\\lambda = \n",
- "\\cases{\n",
+ "\\begin{cases}\n",
  "\\lambda_1 & \\text{if } t \\lt \\tau \\cr\n",
  "\\lambda_2 & \\text{if } t \\ge \\tau\n",
- "}\n",
+ "\\end{cases}\n",
  "$$\n",
  "\n",
  "And in PyMC code:"
@@ -703,7 +703,7 @@
  "source": [
  "Had we had stronger beliefs, we could have expressed them in the prior above.\n",
  "\n",
- "For this example, consider $p_A = 0.05$, and $N = 1500$ users shown site A, and we will simulate whether the user made a purchase or not. To simulate this from $N$ trials, we will use a *Bernoulli* distribution: if $ X\\ \\sim \\text{Ber}(p)$, then $X$ is 1 with probability $p$ and 0 with probability $1 - p$. Of course, in practice we do not know $p_A$, but we will use it here to simulate the data."
+ "For this example, consider $p_A = 0.05$, and $N = 1500$ users shown site A, and we will simulate whether the user made a purchase or not. To simulate this from $N$ trials, we will use a *Bernoulli* distribution: if $X\\ \\sim \\text{Ber}(p)$, then $X$ is 1 with probability $p$ and 0 with probability $1 - p$. Of course, in practice we do not know $p_A$, but we will use it here to simulate the data."
  ]
  },
  {
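The cell touched above describes simulating purchases as Bernoulli draws. A minimal sketch of that simulation (the notebook's own code lies outside this hunk; names here are illustrative):

    import numpy as np
    from scipy.stats import bernoulli

    p_A = 0.05   # "true" purchase probability -- unknown in practice
    N = 1500     # users shown site A

    # Each draw is 1 (purchase) with probability p_A, else 0.
    occurrences = bernoulli.rvs(p_A, size=N)
    print("observed frequency:", occurrences.mean())  # near, but not exactly, 0.05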
@@ -1402,7 +1402,7 @@
  "Given a value for $p$ (which from our god-like position we know), we can find the probability the student will answer yes: \n",
  "\n",
  "\\begin{align}\n",
- "P(\\text{\"Yes\"}) = & P( \\text{Heads on first coin} )P( \\text{cheater} ) + P( \\text{Tails on first coin} )P( \\text{Heads on second coin} ) \\\\\\\\\n",
+ "P(\\text{\"Yes\"}) &= P( \\text{Heads on first coin} )P( \\text{cheater} ) + P( \\text{Tails on first coin} )P( \\text{Heads on second coin} ) \\\\\\\\\n",
  "& = \\frac{1}{2}p + \\frac{1}{2}\\frac{1}{2}\\\\\\\\\n",
  "& = \\frac{p}{2} + \\frac{1}{4}\n",
  "\\end{align}\n",
@@ -2613,4 +2613,4 @@
  "metadata": {}
  }
 ]
-}
+}

Chapter5_LossFunctions/Chapter5.ipynb

Lines changed: 11 additions & 11 deletions
@@ -1,7 +1,7 @@
 {
  "metadata": {
  "name": "",
- "signature": "sha256:d818711b2f97a3cf92c91a0b3b4d08d98d49425ad4a069bb8f75864b32488370"
+ "signature": "sha256:7abaef36c81768505e7cd92f65687bf4cf6bef623e8c18b82087e9cb1964deed"
  },
  "nbformat": 3,
  "nbformat_minor": 0,
@@ -38,31 +38,31 @@
  "\n",
  "We introduce what statisticians and decision theorists call *loss functions*. A loss function is a function of the true parameter, and an estimate of that parameter\n",
  "\n",
- "$$ L( \\theta, \\hat{\\theta} ) = f( \\theta, \\hat{\\theta} )$$\n",
+ "$$L( \\theta, \\hat{\\theta} ) = f( \\theta, \\hat{\\theta} )$$\n",
  "\n",
  "The important point of loss functions is that it measures how *bad* our current estimate is: the larger the loss, the worse the estimate is according to the loss function. A simple, and very common, example of a loss function is the *squared-error loss*:\n",
  "\n",
- "$$ L( \\theta, \\hat{\\theta} ) = ( \\theta - \\hat{\\theta} )^2$$\n",
+ "$$L( \\theta, \\hat{\\theta} ) = ( \\theta - \\hat{\\theta} )^2$$\n",
  "\n",
  "The squared-error loss function is used in estimators like linear regression, UMVUEs and many areas of machine learning. We can also consider an asymmetric squared-error loss function, something like:\n",
  "\n",
- "$$ L( \\theta, \\hat{\\theta} ) = \\begin{cases} ( \\theta - \\hat{\\theta} )^2 & \\hat{\\theta} \\lt \\theta \\\\\\\\ c( \\theta - \\hat{\\theta} )^2 & \\hat{\\theta} \\ge \\theta, \\;\\; 0\\lt c \\lt 1 \\end{cases}$$\n",
+ "$$L( \\theta, \\hat{\\theta} ) = \\begin{cases} ( \\theta - \\hat{\\theta} )^2 & \\hat{\\theta} \\lt \\theta \\\\\\\\ c( \\theta - \\hat{\\theta} )^2 & \\hat{\\theta} \\ge \\theta, \\;\\; 0\\lt c \\lt 1 \\end{cases}$$\n",
  "\n",
  "\n",
  "which represents that estimating a value larger than the true estimate is preferable to estimating a value below. A situation where this might be useful is in estimating web traffic for the next month, where an over-estimated outlook is preferred so as to avoid an underallocation of server resources. \n",
  "\n",
  "A negative property about the squared-error loss is that it puts a disproportionate emphasis on large outliers. This is because the loss increases quadratically, and not linearly, as the estimate moves away. That is, the penalty of being three units away is much less than being five units away, but the penalty is not much greater than being one unit away, though in both cases the magnitude of difference is the same:\n",
  "\n",
- "$$ \\frac{1^2}{3^2} \\lt \\frac{3^2}{5^2}, \\;\\; \\text{although} \\;\\; 3-1 = 5-3 $$\n",
+ "$$\\frac{1^2}{3^2} \\lt \\frac{3^2}{5^2}, \\;\\; \\text{although} \\;\\; 3-1 = 5-3$$\n",
  "\n",
  "This loss function imposes that large errors are *very* bad. A more *robust* loss function that increases linearly with the difference is the *absolute-loss*\n",
  "\n",
- "$$ L( \\theta, \\hat{\\theta} ) = | \\theta - \\hat{\\theta} | $$\n",
+ "$$L( \\theta, \\hat{\\theta} ) = | \\theta - \\hat{\\theta} |$$\n",
  "\n",
  "Other popular loss functions include:\n",
  "\n",
- "- $ L( \\theta, \\hat{\\theta} ) = \\mathbb{1}_{ \\hat{\\theta} \\neq \\theta } $ is the zero-one loss often used in machine learning classification algorithms.\n",
- "- $ L( \\theta, \\hat{\\theta} ) = -\\hat{\\theta}\\log( \\theta ) - (1-\\hat{ \\theta})\\log( 1 - \\theta ), \\; \\; \\hat{\\theta} \\in {0,1}, \\; \\theta \\in [0,1]$, called the *log-loss*, also used in machine learning. \n",
+ "- $L( \\theta, \\hat{\\theta} ) = \\mathbb{1}_{ \\hat{\\theta} \\neq \\theta }$ is the zero-one loss often used in machine learning classification algorithms.\n",
+ "- $L( \\theta, \\hat{\\theta} ) = -\\hat{\\theta}\\log( \\theta ) - (1-\\hat{ \\theta})\\log( 1 - \\theta ), \\; \\; \\hat{\\theta} \\in {0,1}, \\; \\theta \\in [0,1]$, called the *log-loss*, also used in machine learning. \n",
  "\n",
  "Historically, loss functions have been motivated from 1) mathematical convenience, and 2) they are robust to application, i.e., they are objective measures of loss. The first reason has really held back the full breadth of loss functions. With computers being agnostic to mathematical convenience, we are free to design our own loss functions, which we take full advantage of later in this Chapter.\n",
  "\n",
@@ -71,10 +71,10 @@
  "By shifting our focus from trying to be incredibly precise about parameter estimation to focusing on the outcomes of our parameter estimation, we can customize our estimates to be optimized for our application. This requires us to design new loss functions that reflect our goals and outcomes. Some examples of more interesting loss functions:\n",
  "\n",
  "\n",
- "- $ L( \\theta, \\hat{\\theta} ) = \\frac{ | \\theta - \\hat{\\theta} | }{ \\theta(1-\\theta) }, \\; \\; \\hat{\\theta}, \\theta \\in [0,1] $ emphasizes an estimate closer to 0 or 1 since if the true value $\\theta$ is near 0 or 1, the loss will be *very* large unless $\\hat{\\theta}$ is similarly close to 0 or 1. \n",
+ "- $L( \\theta, \\hat{\\theta} ) = \\frac{ | \\theta - \\hat{\\theta} | }{ \\theta(1-\\theta) }, \\; \\; \\hat{\\theta}, \\theta \\in [0,1]$ emphasizes an estimate closer to 0 or 1 since if the true value $\\theta$ is near 0 or 1, the loss will be *very* large unless $\\hat{\\theta}$ is similarly close to 0 or 1. \n",
  "This loss function might be used by a political pundit who's job requires him or her to give confident \"Yes/No\" answers. This loss reflects that if the true parameter is close to 1 (for example, if a political outcome is very likely to occur), he or she would want to strongly agree as to not look like a skeptic. \n",
  "\n",
- "- $L( \\theta, \\hat{\\theta} ) = 1 - \\exp \\left( -(\\theta - \\hat{\\theta} )^2 \\right) $ is bounded between 0 and 1 and reflects that the user is indifferent to sufficiently-far-away estimates. It is similar to the zero-one loss above, but not quite as penalizing to estimates that are close to the true parameter. \n",
+ "- $L( \\theta, \\hat{\\theta} ) = 1 - \\exp \\left( -(\\theta - \\hat{\\theta} )^2 \\right)$ is bounded between 0 and 1 and reflects that the user is indifferent to sufficiently-far-away estimates. It is similar to the zero-one loss above, but not quite as penalizing to estimates that are close to the true parameter. \n",
  "- Complicated non-linear loss functions can programmed: \n",
  "\n",
  "    def loss(true_value, estimate):\n",
@@ -1550,7 +1550,7 @@
  "1. Construct a prior distribution for the halo positions $p(x)$, i.e. formulate our expectations about the halo positions before looking at the data.\n",
  "2. Construct a probabilistic model for the data (observed ellipticities of the galaxies) given the positions of the dark matter halos: $p(e | x)$.\n",
  "3. Use Bayes\u2019 rule to get the posterior distribution of the halo positions, i.e. use to the data to guess where the dark matter halos might be.\n",
- "4. Minimize the expected loss with respect to the posterior distribution over the predictions for the halo positions: $ \\hat{x} = \\arg \\min_{\\text{prediction} } E_{p(x|e)}[ L( \\text{prediction}, x) ]$ , i.e. tune our predictions to be as good as possible for the given error metric.\n",
+ "4. Minimize the expected loss with respect to the posterior distribution over the predictions for the halo positions: $\\hat{x} = \\arg \\min_{\\text{prediction} } E_{p(x|e)}[ L( \\text{prediction}, x) ]$ , i.e. tune our predictions to be as good as possible for the given error metric.\n",
  "\n"
  ]
  },
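Step 4 in the list above is the computational heart of the chapter: approximate the expected loss by averaging over posterior samples, then minimize over the prediction. A minimal sketch, assuming `posterior_samples` (from steps 1-3) and a `loss(prediction, x)` function are already in hand:

    import numpy as np
    from scipy.optimize import fmin

    def expected_loss(prediction, posterior_samples, loss):
        # Monte Carlo estimate of E_{p(x|e)}[ L(prediction, x) ].
        return np.mean([loss(prediction, x) for x in posterior_samples])

    # Tune the prediction for the metric, starting from the posterior mean:
    # best = fmin(expected_loss, x0=posterior_samples.mean(axis=0),
    #             args=(posterior_samples, loss), disp=False)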

Chapter6_Priorities/Chapter6.ipynb

Lines changed: 2 additions & 2 deletions
@@ -122,7 +122,7 @@
  "source": [
  "We must remember that choosing a prior, whether subjective or objective, is still part of the modeling process. To quote Gelman [5]:\n",
  "\n",
- ">...after the model has been \ufb01t, one should look at the posterior distribution\n",
+ ">...after the model has been fit, one should look at the posterior distribution\n",
  "and see if it makes sense. If the posterior distribution does not make sense, this implies\n",
  "that additional prior knowledge is available that has not been included in the model,\n",
  "and that contradicts the assumptions of the prior distribution that has been used. It is\n",
@@ -7059,7 +7059,7 @@
  "\n",
  "Earlier, we talked about objective priors rarely being *objective*. Partly what we mean by this is that we want a prior that doesn't bias our posterior estimates. The flat prior seems like a reasonable choice as it assigns equal probability to all values. \n",
  "\n",
- "But the flat prior is not transformation invariant. What does this mean? Suppose we have a random variable $ \\bf X $ from Bernoulli($\\theta$). We define the prior on $p(\\theta) = 1$. "
+ "But the flat prior is not transformation invariant. What does this mean? Suppose we have a random variable $\\bf X$ from Bernoulli($\\theta$). We define the prior on $p(\\theta) = 1$. "
  ]
  },
  {
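To see concretely what "not transformation invariant" means in the cell above: draw $\theta$ from the flat prior and look at the implied prior on the log-odds $\psi = \log(\theta / (1 - \theta))$; it is the decidedly non-flat logistic density. A quick numerical illustration (a sketch, not the notebook's code):

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(0)
    theta = rng.uniform(0.0, 1.0, size=100_000)   # flat prior p(theta) = 1
    psi = np.log(theta / (1.0 - theta))           # change of variables to log-odds

    # The histogram is peaked at 0, not flat: a prior that is
    # "uninformative" about theta is informative about psi.
    plt.hist(psi, bins=100, density=True)
    plt.xlabel(r"$\psi = \log(\theta / (1 - \theta))$")
    plt.show()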
