Chapter1_Introduction/Chapter1_Introduction.ipynb
11 additions & 11 deletions
@@ -107,7 +107,7 @@
 "\n",
 "Notice in the paragraph above, I assigned the belief (probability) measure to an *individual*, not to Nature. This is very interesting, as this definition leaves room for conflicting beliefs between individuals. Again, this is appropriate for what naturally occurs: different individuals have different beliefs about events occurring, because they possess different *information* about the world.\n",
 "\n",
-"Think about how we can extend this definition of probability to events that are not *really* random. That is, think about how we can extend this to anything that is fixed, but we are unsure about: \n",
+"Think about how we can extend this definition of probability to events that are not *really* random. That is, we can extend this to anything that is fixed, but we are unsure about: \n",
 "\n",
 "- Your code either has a bug in it or not, but we do not know for certain which is true, though we do have a belief about the presence or absence of a bug. \n",
 "\n",
@@ -119,17 +119,17 @@
 "\n",
 "To align ourselves with traditional probability notation, we denote our belief about event $A$ as $P(A)$.\n",
 "\n",
-"John Maynard Keynes, a great economist and thinker, said \"When the facts change, I change my mind. What do you do, sir?\" This quote reflects the way a Bayesian updates his or her beliefs after seeing evidence. Even -especially- if the evidence is counter to what was initially believed, it cannot be ignored. We denote our updated belief as $P(A |X )$, interpreted as the probability of $A$ given the evidence $X$. We call it the *posterior probability* so as to contrast the pre-evidence *prior probability*. Consider the posterior probabilities (read: posterior belief) of the above examples, after observing evidence $X$.:\n",
+"John Maynard Keynes, a great economist and thinker, said \"When the facts change, I change my mind. What do you do, sir?\" This quote reflects the way a Bayesian updates his or her beliefs after seeing evidence. Even --especially-- if the evidence is counter to what was initially believed, the evidence cannot be ignored. We denote our updated belief as $P(A | X)$, interpreted as the probability of $A$ given the evidence $X$. We call the updated belief the *posterior probability* so as to contrast it with the *prior probability*. For example, consider the posterior probabilities (read: posterior belief) of the above examples, after observing some evidence $X$:\n",
 "\n",
 "1\\. $P(A): \\;\\;$ This big, complex code likely has a bug in it. $P(A | X): \\;\\;$ The code passed all $X$ tests; there still might be a bug, but its presence is less likely now.\n",
 "\n",
-"2\\. $P(A):\\;\\;$ The patient could have any number of diseases. $P(A | X):\\;\\;$ Performing a urine test generated evidence $X$, ruling out some of the possible diseases from consideration.\n",
+"2\\. $P(A):\\;\\;$ The patient could have any number of diseases. $P(A | X):\\;\\;$ Performing a blood test generated evidence $X$, ruling out some of the possible diseases from consideration.\n",
 "\n",
 "3\\. $P(A):\\;\\;$ That girl in your class probably doesn't have a crush on you. $P(A | X): \\;\\;$ She sent you an SMS message about some statistics homework. Maybe she does like me... \n",
 "\n",
 "It's clear that in each example we did not completely discard the prior belief after seeing new evidence, but we *re-weighted the prior* to incorporate the new evidence (i.e. we put more weight, or confidence, on some beliefs versus others). \n",
 "\n",
-"By introducing prior uncertainity about events, we are already admitting that any guess we make is potentially very wrong. After observing data, evidence, or other information, and we update our beliefs, our guess becomes *less wrong*. This is the opposite side of the prediction coin, where typically we try to be *more right*.\n"
+"By introducing prior uncertainty about events, we are already admitting that any guess we make is potentially very wrong. After observing data, evidence, or other information, we update our beliefs, and our guess becomes *less wrong*. This is the alternative side of the prediction coin, where typically we try to be *more right*.\n"
 ]
 },
 {
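To make the prior-to-posterior re-weighting in this cell concrete, here is a minimal sketch of Bayes' rule applied to example 1: the belief that the code has a bug, revised after it passes all the debugging tests. The prior and the two likelihood values are illustrative assumptions, not numbers taken from the notebook.

```python
# A minimal sketch of Bayes' rule for example 1: "the code has a bug".
# All numbers below are illustrative assumptions, not values from the notebook.

prior_bug = 0.20            # P(A): prior belief that the code contains a bug
p_pass_given_bug = 0.50     # P(X | A): all tests pass even though a bug exists
p_pass_given_no_bug = 0.99  # P(X | not A): all tests pass and there is no bug

# P(X): total probability of observing the evidence (all tests passing)
p_pass = p_pass_given_bug * prior_bug + p_pass_given_no_bug * (1 - prior_bug)

# P(A | X): posterior belief in a bug, given the passing tests
posterior_bug = p_pass_given_bug * prior_bug / p_pass

print("P(A)   = %.3f" % prior_bug)      # 0.200
print("P(A|X) = %.3f" % posterior_bug)  # ~0.112: a bug is less likely, but not ruled out
```

The passing tests lower the belief in a bug without driving it to zero, which is exactly the re-weighting described above.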
@@ -148,23 +148,23 @@
 "\n",
 "\n",
 "\n",
-"This is very different from the answer the frequentist function returned. Notice that the Bayesian function accepted an additional argument: *\"Often my code has bugs\"*. This parameter, the *prior*, is that intuition in your head that says \"wait- something looks different with this situation\", or conversely \"yes, this is what I expected\". In our example, the programmer often sees debugging tests fail, but this time we didn't, which signals an alert in our head. By including the prior parameter, we are telling the Bayesian function to include our personal intuition. Technically this parameter in the Bayesian function is optional, but we will see excluding it has its own consequences. \n",
+"This is very different from the answer the frequentist function returned. Notice that the Bayesian function accepted an additional argument: *\"Often my code has bugs\"*. This parameter is the *prior*. By including the prior parameter, we are telling the Bayesian function to include our personal belief about the situation. Technically this parameter in the Bayesian function is optional, but we will see excluding it has its own consequences. \n",
 "\n",
 "\n",
-"As we acquire more and more instances of evidence, our prior belief is *washed out* by the new evidence. This is to be expected. For example, if your prior belief is something ridiculous, like \"I expect the sun to explode today\", and each day you are proved wrong, you would hope that any inference would correct you, or at least align your beliefs. \n",
+"As we acquire more and more instances of evidence, our prior belief is *washed out* by the new evidence. This is to be expected. For example, if your prior belief is something ridiculous, like \"I expect the sun to explode today\", and each day you are proved wrong, you would hope that any inference would correct you, or at least align your beliefs better. \n",
 "\n",
 "\n",
-"Denote $N$ as the number of instances of evidence we possess. As we gather an *infinite* amount of evidence, say as $N \\rightarrow \\infty$, our Bayesian results align with frequentist results. Hence for large $N$, statistical inference is more or less objective. On the other hand, for small $N$, inference is much more *unstable*: frequentist estimates have more variance and larger confidence intervals. This is where Bayesian analysis excels. By introducing a prior, and returning a distribution (instead of an scalar estimate), we *preserve the uncertainity* to reflect the instability of stasticial inference of a small $N$ dataset. \n",
+"Denote $N$ as the number of instances of evidence we possess. As we gather an *infinite* amount of evidence, say as $N \\rightarrow \\infty$, our Bayesian results align with frequentist results. Hence for large $N$, statistical inference is more or less objective. On the other hand, for small $N$, inference is much more *unstable*: frequentist estimates have more variance and larger confidence intervals. This is where Bayesian analysis excels. By introducing a prior, and returning a distribution (instead of a scalar estimate), we *preserve the uncertainty* to reflect the instability of statistical inference on a small $N$ dataset. \n",
 "\n",
-"One may think that for large $N$, one can be indifferent between the two techniques, and might lean towards the computational-simpler, frequentist methods. An analysist should consider the following quote by Andrew Gelman (2005)[1], before making such a decision:\n",
+"One may think that for large $N$, one can be indifferent between the two techniques, and might lean towards the computationally simpler frequentist methods. An analyst in this position should consider the following quote by Andrew Gelman (2005)[1], before making such a decision:\n",
 "\n",
 "> Sample sizes are never large. If $N$ is too small to get a sufficiently-precise estimate, you need to get more data (or make more assumptions). But once $N$ is \"large enough,\" you can start subdividing the data to learn more (for example, in a public opinion poll, once you have a good estimate for the entire country, you can estimate among men and women, northerners and southerners, different age groups, etc etc). $N$ is never enough because if it were \"enough\" you'd already be on to the next problem for which you need more data.\n",
 "\n",
 "\n",
 "#### A note on *Big Data*\n",
-"Paradoxically, big data's prediction problems are actually solved by relatively simple models [2]. Thus we can argue that big data's prediction difficulty does not lie in the algorithm used, but instead on the computational difficulties of storage and execution on big data. (One should also consider Gelman's qoute from above and ask \"Do I really have a big data prediction problem?\" )\n",
+"Paradoxically, big data's predictive analytic problems are actually solved by relatively simple models [2]. Thus we can argue that big data's prediction difficulty does not lie in the algorithm used, but instead in the computational difficulties of storage and execution on big data. (One should also consider Gelman's quote from above and ask \"Do I really have a big data prediction problem?\")\n",
 "\n",
-"The much more difficult prediction problems involve *medium data* and, especially troublesome, *really small data*. Using a similar argument as Gelman's above, if big data problems are *big enough* to be readily solved, then we should be more interested in the *not-big enough* datasets. "
+"The much more difficult analytic problems involve *medium data* and, especially troublesome, *really small data*. Using a similar argument as Gelman's above, if big data problems are *big enough* to be readily solved, then we should be more interested in the *not-quite-big enough* datasets. "