
    OpenIntro Statistics

    Reviewed by Paul Murtaugh, Associate Professor, Oregon State University on 7/15/14

    Comprehensiveness rating: 3

    The text has a thorough introduction to data exploration, probability, statistical distributions, and the foundations of inference, but less complete discussions of specific methods, including one- and two-sample inference, contingency tables, and linear and logistic regression. The text is supposedly intended for "introductory statistics courses at the high school through university levels", but it's not clear where it would fit in at my institution. It includes too much theory for our undergraduate service courses but not enough practical detail for our graduate-level service courses.

    Content Accuracy rating: 4

    The text is mostly accurate, especially the sections on probability and statistical distributions, but there are some puzzling gaffes. For example, it is claimed that the Poisson distribution is suitable only for rare events (p. 148); the unequal-variances form of the standard error of the difference between means is used in conjunction with the t-distribution, with no mention of the need for the Satterthwaite adjustment of the degrees of freedom (p. 231); and the degrees of freedom in the chi-square goodness-of-fit test are not adjusted for the number of estimated parameters (p. 282).
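
    To make the degrees-of-freedom points concrete, here is a minimal Python sketch (standard library only). It implements the Welch-Satterthwaite approximation for the unequal-variances t statistic and the usual textbook rule of k - 1 - m degrees of freedom for a goodness-of-fit test with k categories and m estimated parameters. The function names are illustrative. They are not taken from the textbook or from the openintro R package.

```python
def welch_df(s1: float, n1: int, s2: float, n2: int) -> float:
    """Welch-Satterthwaite approximate degrees of freedom for the
    unequal-variances two-sample t statistic, given sample standard
    deviations s1, s2 and sample sizes n1, n2."""
    v1 = s1 ** 2 / n1  # estimated variance of the first sample mean
    v2 = s2 ** 2 / n2  # estimated variance of the second sample mean
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))


def gof_df(k: int, m: int) -> int:
    """Degrees of freedom for a chi-square goodness-of-fit test with k
    categories when m parameters were estimated from the data."""
    return k - 1 - m


# Equal spreads and sizes give n1 + n2 - 2 = 18 (up to floating-point rounding).
print(welch_df(1.0, 10, 1.0, 10))
# Unequal spreads give a value between min(n1, n2) - 1 and n1 + n2 - 2.
print(welch_df(1.0, 5, 10.0, 50))
# Fitting a Poisson mean to counts binned into 6 categories leaves 4 df, not 5.
print(gof_df(6, 1))
```

    Strictly speaking, the k - 1 - m rule assumes the parameters are estimated efficiently from the binned counts. If they are estimated from the raw, ungrouped data, the reference distribution falls between chi-square with k - 1 - m and chi-square with k - 1 degrees of freedom.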

    Relevance/Longevity rating: 3

    Some of the content seems dated. For example, there is a strong emphasis on assessing the normality assumption, even though most of the covered methods work well for non-normal data with reasonable sample sizes. Normal approximations are presented as the tool of choice for working with binomial data, even though exact methods are efficiently implemented in modern computer packages. Fisher's exact test is not even mentioned. The section on model selection, covering just backward elimination and forward selection, seems especially old-fashioned.
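
    Exact methods for binomial data take only a few lines of code. The following minimal Python sketch uses only the standard library, and its function names are illustrative. It computes an exact binomial tail probability, the corresponding normal approximation with a continuity correction, and a two-sided Fisher's exact test p-value for a 2x2 table. For the two-sided test it uses the common convention of summing the probabilities of all tables that are no more likely than the observed one.

```python
from math import comb, erfc, sqrt


def binom_upper_tail(k: int, n: int, p: float) -> float:
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


def normal_approx_upper_tail(k: int, n: int, p: float) -> float:
    """Normal approximation to P(X >= k), with a continuity correction."""
    mu, sd = n * p, sqrt(n * p * (1 - p))
    z = (k - 0.5 - mu) / sd
    return 0.5 * erfc(z / sqrt(2))  # upper-tail probability of N(0, 1)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    With both margins fixed, the top-left count follows a hypergeometric
    distribution; the p-value sums the probabilities of all tables that are
    no more likely than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x: int) -> float:
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Small relative tolerance so ties are not lost to floating-point error.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))


# Ten successes in ten trials with p = 0.5: the exact tail is 1/1024, and the
# normal approximation is more than twice as large.
print(binom_upper_tail(10, 10, 0.5), normal_approx_upper_tail(10, 10, 0.5))
# Fisher's tea-tasting table [[3, 1], [1, 3]]: two-sided p = 34/70.
print(fisher_exact_two_sided(3, 1, 1, 3))
```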

    Clarity rating: 3

    The prose is sometimes tortured and imprecise. For example: "Researchers perform an observational study when they collect data in a way that does not directly interfere with how the data arise" (p. 13). "Standard error" is defined as the "standard deviation associated with an estimate" (p. 163), but it is often unclear whether population or sample-based quantities are meant. Use of the t-distribution is motivated as a way to "resolve the problem of a poorly estimated standard error", when it is really a way to characterize properly the distribution of a test statistic that has a sample-based standard error in the denominator.
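
    A small simulation shows why the t-distribution is needed. In this minimal Python sketch (standard library only), the statistic (xbar - mu) / (s / sqrt(n)) is computed from normal samples using the sample-based standard error s / sqrt(n). The sample size, seed, and number of replicates are arbitrary choices. If the statistic were standard normal, about 5% of values would exceed 1.96 in absolute value. With n = 5, the proportion is instead close to the t-distribution value for 4 degrees of freedom, roughly 12%.

```python
import random
from math import sqrt
from statistics import mean, stdev


def t_statistic(sample: list[float], mu: float) -> float:
    """One-sample t statistic using the sample-based standard error s / sqrt(n)."""
    se = stdev(sample) / sqrt(len(sample))  # estimated, not true, standard error
    return (mean(sample) - mu) / se


def tail_fraction(n: int = 5, reps: int = 10000, cutoff: float = 1.96, seed: int = 1) -> float:
    """Fraction of simulated t statistics with |T| > cutoff for N(0, 1) samples of size n."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if abs(t_statistic(sample, 0.0)) > cutoff:
            hits += 1
    return hits / reps


print(tail_fraction())  # well above the nominal 0.05 for a standard normal
```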

    Consistency rating: 3

    As in many, if not most, statistics texts, it is a challenge to understand the authors' distinction between "standard deviation" and "standard error". The title of Chapter 5, "Inference for numerical data", took me by surprise, after the extensive use of numerical data in the discussion of inference in Chapter 4. Some topics seem to be introduced repeatedly, e.g., the Central Limit Theorem (pp. 167, 185, and 222) and the comparison of two proportions (pp. 191 and 268). The authors are sloppy in their use of hat notation when discussing regression models, expressing the fitted value as a function of the parameters rather than of the estimated parameters (pp. 325 and 357).
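
    The hat-notation convention can be written out for simple linear regression. The symbols here are generic and not copied from the textbook. The model is stated in terms of unknown parameters, while the fitted value uses their estimates:

```latex
\begin{aligned}
\text{model:}\quad & y_i = \beta_0 + \beta_1 x_i + \varepsilon_i,
  \qquad E(y_i) = \beta_0 + \beta_1 x_i, \\
\text{fitted value:}\quad & \hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i .
\end{aligned}
```

    Writing \hat{y} = \beta_0 + \beta_1 x mixes an estimate on the left-hand side with unknown parameters on the right-hand side.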

    Modularity rating: 4

    The text includes sections that could easily be extracted as modules. For example, I can imagine using pieces of Chapters 2 (Probability) and 3 (Distributions of random variables) to motivate methods that I discuss in service courses.

    Organization/Structure/Flow rating: 3

    Chapters 1 through 4, covering data, probability, distributions, and principles of inference, flow nicely, but the remaining chapters seem like a somewhat haphazard treatment of some commonly used methods. One-way analysis of variance is introduced as a special topic, with no mention that it is a generalization of the equal-variances t-test to more than two groups. The final chapter (8) gives superficial treatments of two huge topics, multiple linear regression and logistic regression, with insufficient detail to guide serious users of these methods. It is as if the authors ran out of gas after the first seven chapters and decided to use the final chapter as a catchall for some important, uncovered topics.
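
    The link between one-way ANOVA and the equal-variances t-test is easy to check numerically. This minimal Python sketch (standard library only; the data are made up purely for illustration) computes the pooled two-sample t statistic and the one-way ANOVA F statistic for the same two groups. With two groups, F equals t squared. ANOVA extends the same between-group versus within-group comparison to more than two groups.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(x: list[float], y: list[float]) -> float:
    """Equal-variances (pooled) two-sample t statistic."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / sqrt(sp2 * (1 / nx + 1 / ny))


def anova_f(*groups: list[float]) -> float:
    """One-way ANOVA F: between-group mean square over within-group mean square."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = mean([v for g in groups for v in g])
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((len(g) - 1) * variance(g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Illustrative, made-up data.
a = [4.1, 5.3, 6.0, 5.5, 4.8]
b = [6.2, 7.1, 5.9, 6.8, 7.4, 6.5]
t = pooled_t(a, b)
f = anova_f(a, b)
print(t ** 2, f)  # equal up to floating-point rounding
# The same function handles three or more groups.
print(anova_f(a, b, [5.0, 5.6, 6.1, 5.2]))
```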

    Interface rating: 5

    The interface is nicely designed. The availability of data sets and functions at a website (www.openintro.org) and as an R package (cran.r-project.org/web/packages/openintro) is a huge plus that greatly increases the usefulness of the text.

    Grammatical Errors rating: 3

    There are distracting grammatical errors. "Data" is sometimes singular, sometimes plural in the authors' prose. Other examples: "Each of the conclusions are based on some data" (p. 9); "You might already be familiar with many aspects of probability, however, formalization of the concepts is new for most" (p. 68); and "Sometimes two variables is one too many" (p. 21).

    Cultural Relevance rating: 3

    I have no idea how to characterize the cultural relevance of a statistics textbook.

    Comments

    In my opinion, the text is not a strong candidate for an introductory textbook for typical statistics courses, but it contains many sections (particularly on probability and statistical distributions) that could profitably be used as supplemental material in such courses.
