Statistical Inference For Everyone
Brian Blais, Bryant University
Pub Date: 2017
Conditions of Use
This book is not a comprehensive introduction to elementary statistics, or even statistical inference, as the author Brian Blais deliberately chose not to cover all topics of statistical inference. For example, the term matched pairs never appears; neither do Type I or Type II error. The Student's t distribution gets much less attention than in almost every other book; the author offers a rarely used standard-deviation change (page 153) as a way to keep things Gaussian. The author justifies the reduced topic set by calling typical "traditional" approaches flawed in the first pages of text, the Proposal. Instead, Blais tries to develop statistical inference from logic, in a way that might be called Bayesian inference. Other books have taken this approach, more than just Donald Berry's book mentioned on page 32. [For more references, see the ICOTS6 paper by James Albert at https://iase-web.org/documents/papers/icots6/3f1_albe.pdf ] None of those books are open-resource, though; an accurate, comprehensive textbook would have potential. This PDF does not contain that desired textbook, however. As mentioned below under accuracy, clarity, and structure, there are too many missing elements, including the lack of an index. As I read, this PDF felt more like an augmented set of lecture notes than a textbook which stands without instructor support. It's not good enough. (For more on this decision, see the other comments at the end.)
The only non-troubling number of errors in a textbook is zero, but this book has many more than that. In the version I read from the Minnesota-hosted website, my error list includes not defining quartiles from the left (page 129), using ICR instead of IQR (page 133), misstating the 68-95-99 rule as 65-95-99 (page 134), flipping numbers in the combination of the binomial formula (page 232), repeating Figure C-2 as Figure C-1 (page 230), and titling section 2.6 "Monte Hall" instead of "Monty Hall". Infuriatingly, several of these mistakes are correct elsewhere in the book - Monty Hall in section 5.4, the binomial formula in the main text, and 68-95-99 on page 142. I'm also annoyed that some datasets have poor source citations, such as not indicating Fisher's iris data on page 165 and calling something "student measurements during a physics lab" on page 173.
Because there are so many gaps, including full support for computer presentation, it would be easy to update completed sections as needed, such as when Python becomes less popular.
Quality of the prose is fine, but many jargon terms are not well defined. Students learning a subject need clear definitions, but they don't appear. In my notes, I see exclusive (page 36), conditioning (page 40), complement (used on page 40 but never appears in the text), posterior (page 54), correlation (page 55), uniform distribution (page 122), and Greek letters for which the reference to a help table appears on page 140, but Greek letters have appeared earlier. Additionally, several important terms receive insufficient or unusual definitions, including labeling summary description of data as inference (page 34), mutually exclusive (page 36) versus independence (page 43), and plus/minus (page 146, as this definition of +/- applies in lab bench science but not social sciences). I appreciate that the author is trying to avoid calculus with "area under the curve" on page 127, but there's not enough written for a non-calculus student to understand how these probabilities are calculated. To really understand posterior computation, a magical computer and a few graphs aren't good enough.
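The "area under the curve" idea mentioned above can be made concrete without calculus by adding up thin slices under the curve. The following is a minimal sketch in Python, the language the book uses; it is not taken from the book or its software library. The function names `normal_pdf` and `area_under_curve` are hypothetical, and the trapezoid rule is just one simple way to approximate the area.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Height of the Gaussian (normal) curve at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def area_under_curve(a, b, n=10_000):
    """Approximate the area under the standard normal curve between a and b
    by summing n thin trapezoids (hypothetical helper, not from the book)."""
    width = (b - a) / n
    total = 0.5 * (normal_pdf(a) + normal_pdf(b))
    for i in range(1, n):
        total += normal_pdf(a + i * width)
    return total * width

# Probability of landing within 1, 2, and 3 standard deviations of the mean.
areas = {k: area_under_curve(-k, k) for k in (1, 2, 3)}
for k, area in areas.items():
    print(f"within {k} sd: {area:.4f}")
```

The three printed areas come out near 0.68, 0.95, and 0.997, which is where the 68-95-99 rule discussed under accuracy comes from.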
Internal consistency to Bayesian inference is quite strong; many of the examples repeat the steps of Bayes' Recipe. This is not a concern.
The book needs to be read in linear order, like most statistics books, but that's not necessarily a negative thing. Dr. Blais is trying to take the reader through a structured development of Bayesian inference, which has a single path. There are a few digressions, such as fallacies about probability reasoning, but the book generally maintains a single path from chapters 1 to at least 7. Most sections are less than 10 pages and don't involve lots of self-references. Although I rated reorganization possibility as low, due to the near-impossibility of realigning the argument, I consider it harsh to penalize the book for this.
There isn't enough structure for a textbook; this feels more like a set of augmented lecture notes than a book for guided study. I mentioned poor definitions under "Clarity", so let me add other topics here. The most frustrating structural problem for me is the presentation of the fundamental idea of Bayesian inference, posterior proportional to prior * likelihood. The word prior first appears on page 48, but receives no clear definition until a side-note on page 97. The word posterior first appears on page 53. Despite this, the fundamental equation is never written with all three words in the correct places until page 154. That's way, way too late. The three key terms should have been defined around page 50 and drilled throughout all the sections. The computer exercises also have terrible structure. The first section with computer exercises, section 2.9 on page 72, begins with code. The reader has no idea about the language, package, or purpose of these weird words in boxes. The explanation about Python appears as Appendix A, after all the exercises. It would not have taken much to explain Python and the purpose of the computer exercises in Chapter 1 or 2, but it didn't happen. A classroom instructor could explain this in class, but the Open Resource Project doesn't provide an instructor with every book. Like the other things mentioned, the structure around computing is insufficient.
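The relation named above, posterior proportional to prior * likelihood, can be shown in a few lines. This is a minimal sketch, not code from the book or its Python library: it assumes a made-up example of 7 heads in 10 coin flips, a flat prior, and a grid of 101 candidate values for the coin's probability of heads. All variable names are hypothetical.

```python
from math import comb

# Candidate values for the coin's probability of heads (a coarse grid).
thetas = [i / 100 for i in range(101)]

# Prior: every candidate value is equally plausible before seeing data (assumption).
prior = [1.0 for _ in thetas]

# Made-up data for illustration: 7 heads observed in 10 flips.
heads, flips = 7, 10

# Likelihood: binomial probability of the data for each candidate value.
likelihood = [
    comb(flips, heads) * t**heads * (1 - t) ** (flips - heads) for t in thetas
]

# Posterior is proportional to prior * likelihood; divide by the total to normalize.
unnormalized = [p * l for p, l in zip(prior, likelihood)]
total = sum(unnormalized)
posterior = [u / total for u in unnormalized]

# The most plausible value after seeing the data.
best = thetas[posterior.index(max(posterior))]
print(f"most plausible probability of heads: {best}")
```

With a flat prior the posterior peaks at the observed proportion of heads. Replacing `prior` with a different list of weights shows directly how the prior shifts the result, which is the kind of drilling on the three terms the paragraph above asks for.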
I had no problems navigating through the chapters. Images look fine as well.
Grammar and spelling are good. I only spotted one typographical error, "posterier" on page 131, and very few awkward sentences.
This is a US-centered book, since it refers to the "standard deck" of playing cards on page 36 as the US deck; other places like Germany have different suits. The book also uses "heads" and "tails" for coins, while other countries such as Mexico use different terms. I wouldn't call this a major problem, however; the pictures and diagrams make the coins and cards pretty clear. There aren't many examples involving people, so there's little scope for ethnicities and backgrounds.
On Brian Blais's webpage for the book, referenced only in Appendix A for some reason, he claims that this book is targeted to the typical Statistics 101 college student. It is NOT. Typical college students need much more support than what this book offers - better structure, better scaffolding, more worked examples, support for computing. What percentage of all college students would pick up Python given the contents presented here? My prior estimate would be 5%. Maybe students at Bryant University, where Pre-Calculus is the lowest math course offered, have a higher Python rate, but the bottom 20% of my students at Oklahoma State struggle with order of operations and using the combinations formula. They would need massive support, and Oklahoma State enrolls above-average college students. This book does not have massive support - or much at all. This makes me sad, because I've argued that we should teach hypothesis testing through credible intervals because I think students will understand the logic better than the frequentist philosophical approach. In 2014, I wrote a guest blog post [http://www.culturalcognition.net/blog/2014/9/5/teaching-how-to-teach-bayess-theorem-covariance-recognition.html] on teaching Bayes' Rule. I would value a thorough book that might work for truly typical students, but for the students in my "everyone", this won't work.
Table of Contents
- 1 Introduction to Probability
- 2 Applications of Probability
- 3 Random Sequences and Visualization
- 4 Introduction to Model Comparison
- 5 Applications of Model Comparison
- 6 Introduction to Parameter Estimation
- 7 Priors, Likelihoods, and Posteriors
- 8 Common Statistical Significance Tests
- 9 Applications of Parameter Estimation and Inference
- 10 Multi-parameter Models
- 11 Introduction to MCMC
- 12 Concluding Thoughts
Appendix A: Computational Analysis
Appendix B: Notation and Standards
Appendix C: Common Distributions and Their Properties
Appendix D: Tables
About the Book
This is a new approach to an introductory statistical inference textbook, motivated by probability theory as logic. It is targeted to the typical Statistics 101 college student, and covers the topics typically covered in the first semester of such a course. It is freely available under the Creative Commons License, and includes a software library in Python for making some of the calculations and visualizations easier.
About the Contributors
Brian Blais is a professor of Science and Technology at Bryant University and a research professor at the Institute for Brain and Neural Systems, Brown University.