# OpenIntro Statistics - Fourth Edition

David M. Diez, Harvard School of Public Health

Christopher D. Barr, Harvard School of Public Health

Mine Cetinkaya-Rundel, Duke University

Copyright Year: 2015

Last Update: 2019

Publisher: OpenIntro

Language: English

## Conditions of Use

Attribution-ShareAlike

CC BY-SA

## Reviews

Unless I missed something, the following topics do not seem to be covered: stem-and-leaf plots, outlier analysis, methods for finding percentiles, quartiles, Coefficient of Variation, inclusion of calculator or other software, combinatorics, simulation methods, bootstrap intervals, or CI's for variance, critical value method for testing, and nonparametric methods.

No inaccuracies found.

Statistics is not a subject that becomes out of date, but in the last couple decades, more emphasis has been given to usage of computer technology and relevant data. Lots of good graphics and referenced data sets, but not much discussion or inclusion of prevailing software such as R, SPSS, Minitab, or free online packages.

Some topics in descriptive statistics are presented without much explanation, such as dotplots and boxplots. Also, the discussion on hypothesis testing could be more detailed and specific.

No issues noted.

Seems fine.

The rationale for assigning topics in Section 1 and 2 is not clear. Also, grouping confidence intervals and hypothesis testing in Ch.5 is odd, when Ch.7 covers hypothesis testing of numerical data. And why dump Ch.6 in between with hypothesis testing of categorical data between them?

Interface appears to be seamless.

No problems noted.

No cultural insensitivity noted.

Overall, I would consider this a decent text for a one-quarter or one-semester introductory statistics textbook. The presentation is professional with plenty of good homework sets and relevant data sets and examples. However, it would not suffice for our two-quarter statistics sequence that includes nonparametrics. The lack of discussion/examples/inclusion of statistical software or calculator usage is disappointing, as is the inclusion of statistical inference using critical values. The fourth edition is a definite improvement over previous editions, but still not the best choice for our curriculum.

There is more than enough material for any introductory statistics course. There are a lot of topics covered. The topics are not covered in great depth; however, as an introductory text, it is appropriate. My biggest complaint is that one-sided tests are basically ignored. There is only a small section explaining why they do not use one sided tests and a brief explanation on how to perform a one sided test.

It is accurate. There is also a list of known errors that shows that errors are fixed in a timely manner. There is some bias in terms of what the authors prioritize. I am not necessarily in disagreement with the authors, but there is a clear voice.

For the most part, examples are limited to biological/medical studies or experiments, so they will last. There are a few instances referencing specific technology (such as iPods) that makes the text feel a bit dated.

The narrative of the text is grounded in examples which I appreciate. The authors introduce a definition or concept by first introducing an example and then reference back to that example to show how that object arises in practice.

The terms and notation are consistent throughout the text.

Each chapter is broken up into sections and each section has sub-sections using standard LaTeX numbering. Some chapters and sections are optional, so later sections do not rely on them. In particular, I like that the probability chapter (which comes early in the text) is not necessary for the chapters on inference.

The topics are in a reasonable order. An interesting note is that they introduce inference with proportions before inference with means.

No display issues with the devices that I have.

Typos that are identified and reported appear to be fixed within a few days which is great.

Examples stay away from cultural topics. However, there are a few instances where he/she is used to refer to a "theoretical person" rather than they/them.

I found the book to be very comprehensive for an undergraduate introduction to statistics - I would likely skip several of the more advanced sections (a few of these I mention below in my comments on its relevance) for this level, but I was glad to see them included. I also found it very refreshing to see a wide variability of fields and topics represented in the practice problems. One topic I was surprised to see trimmed and placed online as extra content were the calculations for variance estimates in ANOVA, but these are of course available as supplements for the book. Two topics I found absent were the calculation of effect sizes, such as Cohen's d, and the coverage of interval and ratio scales of measurement (the authors provide a breakdown of numerical variables as only discrete and continuous). I did have a bit of trouble looking up topics in the index - the page numbers seemed to be off for some topics (e.g., effect size).

I did not see any issues with accuracy, though I think the p-value definition could be simplified.

I found the content in the 4th edition to be extremely up-to-date, both in terms of its examples and in terms of keeping up with the "movements" in many disciplines to be more transparent and considered in hypothesis testing choices (e.g., all hypothesis tests are two-tailed, though the reasoning for this is explained, especially in Section 5.3.7 on one-tailed tests; they include Bayes' theorem, many less common distributions for the introductory level like Bernoulli and Poisson, and estimating statistical power/desired sample size). The sections on these advanced topics would make this a candidate for more advanced-level courses than the introductory undergraduate one I teach, and I think will help with longevity. The examples will likely become dated, but that is always the case with statistics textbooks; for now, they all seem very current (in one example, we solve for the % of cat videos out of all the videos on YouTube).

I found the book's prose to be very straightforward and clear overall. The p-value definition could be simplified by eliminating mention of a hypothesis being tested.

I did not find any issues with consistency in the text, though it would be nice to have an additional decimal place reported for the t-values in the t-table, so as to make the presentation of corresponding values between the z and t-tables easier to introduce to students (e.g., tail p of .05 corresponds to t of 1.65 - with rounding - in large samples; but the same tail p falls precisely halfway between z of 1.64 and z of 1.65).
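The rounding point above can be checked numerically. The following is a minimal sketch using only Python's standard library; the helper name `upper_tail_z` is hypothetical and not taken from the book or its supplements.

```python
from statistics import NormalDist


def upper_tail_z(tail_p: float) -> float:
    """Return the z value that leaves `tail_p` probability in the upper tail
    of a standard normal distribution."""
    return NormalDist().inv_cdf(1.0 - tail_p)


# Upper-tail probability of .05, as in the reviewer's example.
z_05 = upper_tail_z(0.05)
print(round(z_05, 4))  # prints 1.6449
```

The value sits between 1.64 and 1.65, which is why a table rounded to two decimals cannot show it exactly.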

The sections seem easily labeled and would make it easy to skip particular sections, etc. The authors also offer an "alternative" series of sections that could be covered in class to fast-track to regression (the book deals with grouped analyses first) in their introduction to the book.

I found the overall structure to be standard of an introductory statistics course, with the exception of introducing inference with proportions first (as opposed to introducing this with means first instead). However, even with this change, I found the presentation to overall be clear and logical.

The B&W textbook did not seem to pose any problems for me in terms of distortion, understanding images/charts, etc., in print. However, I did find the inclusion of practice problems at the end of each section vs. all together at the end of the whole chapter (which is the new arrangement in the 4th edition) to be a challenge. Specifically, this made it difficult for me to identify easily where sections ended and, in some places, to follow the train of thought across sections. This could make it easier for students or instructors alike to identify practice on particular concepts, but it may make it more difficult for students to grasp the larger picture from the text alone.

I did not find any grammatical errors.

I was impressed by the scope of fields represented in the example problems - everything from estimating the length of possums' heads, to smoke inhalation in one's line of work, to child development, and so on.

I reviewed a paperback B&W copy of the 4th edition of this book (published 2019), which came with a list describing the major changes/reorganization that was done between this and the 3rd edition.

This book covers the standard topics for an introductory statistics course: basic terminology, a one-chapter introduction to probability, a one-chapter introduction to distributions, inference for numerical and categorical data, and a one-chapter introduction to linear regression. The overall length of the book is 436 pages, which is about half the length of some introductory statistics books. Therefore, while the topics are largely the same, the depth is lighter in this text than it is in some alternative introductory texts.

Everything appeared to be accurate. There were some author opinions on such things as how to go about analyzing the data and how to determine when a test was appropriate, but those things seem appropriate to me and are welcome in providing guidance to people trying to understand when to choose a particular statistical test or how to interpret the results of one.

The material in the book is currently relevant and, given the topic, some of it will never be irrelevant. The examples were up-to-date, for example, discussing the fact that Google conducts experiments in which different users are given search results in different ways to compare the effectiveness of the presentations. Another welcome topic that is not typical of introductory texts is logistic regression, which I have seen many references to in the currently hot topic of Data Science.

The text is well-written and with interesting examples, many of which used real data.

The book was fairly consistent in its use of terminology. The only issue I had with the layout was that many sections end with a box highlighting a term. I often found the definitions within these boxes to be clearer than when the term was introduced earlier, which made me go looking for these boxes before I reached them naturally.

The book is divided into many subsections. I was able to read the entire book in about a month by knocking out a couple of subsections per day.

The order of the topics seemed appropriate and not unlike many alternatives, but there was the issue of the term-highlighting boxes mentioned above.

I read the physical book, which is easy to navigate through the many references. In the PDF of the book, these references are links that take you to the appropriate section.

I found virtually no issues in the grammar or sentence structure of the text.

There aren't really any cultural references in the book.

Overall, I liked the book. The pros are that it's small enough that a person can work their way through it much faster than would be possible with many of the alternatives. Within each chapter are many examples and what the authors call "Guided Practice"; all of these have answers in the book. The odd-numbered exercises also have answers in the book. I think that these features make the book well-suited to self-study. The cons are that the depth is often very light, for example, it would be difficult to learn how to perform simple or multiple regression from this book. Also, I had some issues finding terms in the index.

Covers all of the topics usually found in introductory statistics as well as some extra topics (notably: log transforming data, randomization tests, power calculation, multiple regression, logistic regression, and map data). Similar to most intro stat books, it does not cover the Bayesian view at all. It does a more thorough job than most books of covering ideas about data, study design, summarizing data and displaying data. Online supplements cover interactions and bootstrap confidence intervals. The book is written as though one will use tables to calculate, but there is an online supplement for TI-83 and TI-84 calculators. There are labs and instructions for using SAS and R as well. The index and table of contents are clear and useful.

I have used this book now to teach for 4 semesters and have found no errors. It covers all the standard topics fully.

Many examples use real data sets that are on the larger side for intro stats (hundreds or thousands of observations). The book has relevant and easily understood scientific questions. It recognizes the prevalence of technology in statistics and covers reading output from software. Updates and supplements for new topics have been appearing regularly since I first saw the book (in 2013). In addition all of the source code to build the book is available so it can be easily modified.

The writing in this book is very clear and straightforward. It defines terms, explains without jargon, and doesn’t skip over details. It has scientific examples for the topics so they are always in context. I often assign reading and homework before I discuss topics in lecture. Students are able to follow the text on their own. There are also matching videos for students who need a little more help to figure something out.

All of the notation and terms are standard for statistics and consistent throughout the book.

There are sections that can be added and removed at the instructor’s discretion. It would be feasible to use any part of the book without using previous sections as long as students had appropriate prerequisite knowledge. In addition, some topics are marked as “special topics”. These are not necessary knowledge for future sections, so it is easy to see which sections you might leave out if there isn’t time or desire to complete the whole book.

The topics all proceed in an orderly fashion. This book differs a bit in its treatment of inference. Ideas about “unusual” results are seeded throughout the early chapters. Then, the basics of both hypothesis tests and confidence intervals are covered in one chapter. The subsequent chapters have all of the specifics about carrying out hypothesis tests and calculating intervals for different types of data. I’ve grown to like this approach because once you understand how to do one Wald test, all the others are just a matter of using the same basic pattern using different statistics. It definitely makes the students more comfortable with learning a new test because it’s “just the same thing” with different statistics.
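The "same basic pattern" the reviewer describes can be sketched in a few lines. This is an illustrative example only, not the book's own presentation: the function name `wald_test`, the sample numbers, and the choice to compute the proportion's standard error from the sample estimate are all assumptions made for the sketch.

```python
from math import sqrt
from statistics import NormalDist


def wald_test(estimate, null_value, standard_error):
    """Return (z, two-sided p-value) using a normal approximation:
    z = (estimate - null value) / standard error."""
    z = (estimate - null_value) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical one-sample mean: n = 100, sample mean 52, sample SD 10, null mean 50.
se_mean = 10 / sqrt(100)
z_mean, p_mean = wald_test(52, 50, se_mean)

# Hypothetical one-sample proportion: 60 successes out of 100, null proportion 0.5.
p_hat = 60 / 100
se_prop = sqrt(p_hat * (1 - p_hat) / 100)
z_prop, p_prop = wald_test(p_hat, 0.5, se_prop)

print(z_mean, p_mean)
print(z_prop, p_prop)
```

Only the estimate and its standard error change between the two cases; the test statistic and p-value calculation are shared, which is the point the reviewer is making.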

Comes in pdf, tablet friendly pdf, and printed (15 dollars from amazon as of March, 2019). The pdf and tablet pdf have links to videos and slides. The text is easy to read without a lot of distracting clutter. There are two drawbacks to the interface. The pdf is untagged which can make it difficult for students who are visually impaired and using screen readers. The second is that “examples” and “exercises” are numbered in a similar manner and students frequently confuse them early in the class.

None. In addition, the book is written with paragraphs that make the text readable. (Unlike many modern books that seem to have random sentences scattered in between bullet points and boxes.)

The book includes examples from a variety of fields (psychology, biology, medicine, and economics to name a few). None of the examples seemed alarming or offensive.

There are many additional resources available for this book including lecture slides, a free online homework system, labs, sample exams, sample syllabuses, and objectives.

The text covers all the core topics of statistics: data, probability, and statistical theories and tools. According to the authors, the text is meant to help students in “forming a foundation of statistical thinking and methods.” Unfortunately, some basic topics needed to reach that goal are missing, for example, the distinction between descriptive statistics and inferential statistics, and the measures of central tendency and dispersion. These concepts should be clarified in the first chapter.

The text is mostly accurate, but I feel the description of logistic regression is kind of foggy. The learner can’t grasp what logistic regression is without a clear definition and explanation. It should be pointed out that logistic regression uses a logistic function to model a binary dependent variable.
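As a rough illustration of the definition the reviewer asks for, the sketch below shows a logistic function turning a linear predictor into a probability for a binary outcome. The coefficient values and the function names are hypothetical and are not taken from the book.

```python
from math import exp


def logistic(x: float) -> float:
    """Map any real number to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + exp(-x))


def predicted_probability(intercept: float, slope: float, x: float) -> float:
    """Probability that the binary outcome equals 1, given predictor value x."""
    return logistic(intercept + slope * x)


# Hypothetical coefficients, chosen only for illustration.
intercept, slope = -2.0, 0.5
probabilities = [predicted_probability(intercept, slope, x) for x in (0, 4, 8)]
print(probabilities)
```

With a positive slope, the predicted probability rises as the predictor increases, and it equals 0.5 where the linear predictor is zero (here, at x = 4).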

The text needs real-world data analysis examples from finance, business and economics, which are more relevant to real life. As an example, I suggest the text provide data analysis using the binomial option pricing model and the Black-Scholes option pricing model. Dealing with a real-life case should appeal to learners and give them a better and deeper understanding of the binomial distribution and the normal approximation to the binomial distribution.

The distinction and common ground between “standard deviation” and “standard error” need to be clarified.
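One way to make that distinction concrete is a short sketch like the following, where the data values are made up for illustration. The standard deviation describes the spread of individual observations, while the standard error of the mean describes the uncertainty in the sample mean and shrinks as the sample size grows.

```python
from math import sqrt
from statistics import stdev

# Hypothetical sample of eight observations.
sample = [2, 4, 4, 4, 5, 5, 7, 9]

# Sample standard deviation: spread of the individual observations.
sd = stdev(sample)

# Standard error of the mean: SD divided by the square root of the sample size.
se = sd / sqrt(len(sample))

print(sd, se)
```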

The contents are consistent.

The modularity is creative and compares well. Chapter 4 (foundations of inference), chapter 5 (inference of numerical data) and chapter 6 (inference of categorical data) provide clear and fresh logic for understanding statistics.

The organization/structure provides a smooth way for the contents to gradually progress in depth and breadth.

The interface is great! The nicely designed website (https://www.openintro.org) contains abundant resources which are very valuable for both students and teachers, including the labs, videos, forums and extras. This is the most innovative and comprehensive statistics learning website I have ever seen.

The grammar is good.

The text is culturally inclusive with examples from diverse industries.

There is a Chinese proverb: “one flaw cannot obscure the splendor of the jade.” In my opinion, the text is like jade, and can be used as a standalone text with abundant supplements on its website (https://www.openintro.org). It is especially well suited for social science undergraduate students.

The text includes basic topics for an introductory course in descriptive and inferential statistics. The approach is mathematical with some applications. More extensive coverage of contingency tables and bivariate measures of association would be helpful. Probability is an important topic that is included as a "special topic" in the course.

The text and graphs are accurate.

My interest in this text is for a graduate course in applied statistics in the field of public service. This is a particular use of the text, and my students would benefit from and be interested in more social-political-economic examples. Some examples in the text are traditional ones that are overused, e.g., throwing dice and drawing cards to teach probability. The examples for tree diagrams are very good, e.g., smallpox in Boston, breast cancer.

The writing is clear, and numerous graphs and examples make concepts accessible to students. The text, however, is not engaging and can be dry.

The text is consistent.

The text is organized into sections, and the numbering system within each chapter facilitates assigning sections of a chapter. This is a statistics text, and much of the content would be kept in this order.

The content is well-organized. The flow of a chapter is especially good when the authors continue to use a certain example in developing related concepts. There are exercises at the end of each chapter (and exercise solutions at the end of the text).

Display of graphs and figures is good, as is the use of color. The graphs are readable in black and white also. The text is in PDF format; there are no problems of navigation.

There are no grammatical errors.

The examples are general and do not deal with racial or cultural matters.

This text will be useful as a supplement in the graduate course in applied statistics for public service.

The text covers the foundations of data, distributions, probability, regression principles and inferential principles with a very broad net. It is certainly a fitting means of introducing all of these concepts to fledgling research students. At the same time, the material is covered in such a manner as to provide future research practitioners with a means of understanding the possibilities when considering research that may prove to be of value in their respective fields. In other words, breadth, yes; and depth, not so much. It can be considered comprehensive if you consider this an introductory text. It's very fitting for my use with teachers whose primary focus is on data analysis rather than post-graduate research.

The text is accurate due to its rather straightforward approach to presenting material. In fact, I particularly like that the authors occasionally point out ways in which data or statistics can be presented so as to distort the truth. Additionally, concepts related to flawed practices in data collection and analysis were presented to point out how inaccuracies could arise in research.

While it would seem that the data in a statistics textbook would remain relevant forever, there are a few factors that may impact such a textbook's relevance and longevity. Since this particular textbook relies heavily on the use of scenarios or case study type examples to introduce/teach concepts, the need to update this information on occasion is real. These updates would serve to ensure the connection between the learner and the material that is conducive to learning. Additionally, as research and analytical methods evolve, so will the need to cover more non-traditional types of content, e.g., mixed methodologies, nonparametric data sets, new technological research tools, etc.

I feel that the greatest strength of this text is its clarity. The simple mention of the subject "statistics" can strike fear in the minds of many students. Perhaps we don't help the situation much with the way we begin launching statistical terminology while demonstrating a few "concepts" on a white board. Well, this text provides a kinder and gentler introduction to data analysis and statistics. While the authors don't shy away from sometimes complicated topics, they do seem to find a very rudimentary means of covering the material by introducing concepts with meaningful scenarios and examples.

On occasion, all of us in academia have experienced a text where the progression from one chapter to another was not very seamless. This is especially true when there are multiple authors. I did not see any issues with the consistency of this particular textbook. In fact, I could not differentiate a change in style or clarity in any sections of this text. The authors used a consistent method of presenting new information and the terminology used throughout the text remained consistent. This is sometimes a problem in statistics as there are a variety of ways to express the similar statistical concepts. This can be particularly confusing to "beginners."

While to some degree the text is easily and readily divisible into smaller reading sections, I would not recommend that anyone alter the sequence of the content until after Chapters 1, 3, and 4 are completed. Materials in the later sections of the text are scaffolded upon content covered in these initial chapters. The authors point out that Chapter 2, which deals with probabilities, is optional and not a prerequisite for grasping the content covered in the later chapters. Of course, the content in Chapters 5-8 would surely be useful as supplementary materials/refreshers for students who have mastered the basics in previous statistical coursework.

After much searching, I particularly like the scope and sequence of this textbook. As mentioned above, the authors gently introduce students to very basic statistical concepts. These concepts are reinforced by authentic examples that allow students to connect to the material and see how it is applied in the real world. This introductory material then serves as the foundation for later chapters, where students are introduced to inferential statistical practices. The authors use a method inclusive of examples (noted with a Blue Dot), guided practice (noted by a large empty bullet), and exercises (found at the end of each chapter). I find this method serves to give the students confidence in knowing that they understand concepts before moving on to new material. I also particularly like that once the basic chapters are covered, the instructor can then pick and choose those topics that will best serve the course or needs of students. In some instances, various groups of students may be directed to certain chapters, while others home in on the material relevant to their topic.

I viewed the text as a PDF and was pleasantly surprised at the clarity and the fluid navigation, which is not the norm with many PDFs. The document was very legible. The graphs and diagrams were also clear and provided information in a way that aided in understanding concepts. This was not necessarily the case with some of the tables in the text. I was sometimes confused by tables with missing data or, as was the case on page 11, when the table was sideways on the page.

I did not see any grammatical issues that distract from the content presented.

I did not view any material that I felt would be offensive. The material was culturally relevant to the demographic most likely to use the text in the United States. This is important since examples used authentic situations to connect to the readers. While the examples did connect with the diversity within our country or, e.g., the U.K., they may not be the best examples that could be used to connect with those from non-western countries.

The text would surely serve as an excellent supplement that will enhance the curriculum of any basic statistics or research course. While the text could be used in both undergraduate and graduate courses, it is best suited for the social sciences.

There is one section that is under-developed (general concepts about continuous probability distributions), but aside from this, I think the book provides a good coverage of topics appropriate for an introductory statistics course.

I did not see any inaccuracies in the book.

I do not see introductory statistics content ever becoming obsolete.

I think that the book is fairly easy to read. The authors bold important terms, and frequently put boxes around important formulas or definitions. If anything, I would prefer the book to have slightly more mathematical notation.

I did not see any problems in regards to the book's notation or terminology. It appears smooth and seamless.

The book is broken into small sections for each topic. Any significant rearranging of those sections would be incredibly detrimental to the reader, but that is true of any statistics textbook, especially at the introductory level: Earlier concepts provide the basis for later concepts.

For the most part I liked the flow of the book, though there were a few instances where I would have liked to see some different organization. For example, the Central Limit Theorem is introduced and used early in the inference section, and then later examined in more detail. I would tend to group this in with sampling distributions. Also, given how the authors seem to be focusing on practicalities, I was somewhat surprised about some of the organization of the inference sections. The authors use the Z distribution to work through much of the 1-sample inference. The t distribution is introduced much later. I realize this is how some prefer it, but I think introducing the t distribution sooner is more practical. The organization in chapter 5 also seems a bit convoluted to me. The chapter is about "inference for numerical data". The authors already discussed 1-sample inference in chapter 4, so the first two sections in chapter 5 are Paired Data and Difference of Means, then they introduce the t-distribution and go back to 1-sample inference for the mean, and then to inference for two means using the t-distribution. It strikes me as jumping around a bit. Overall the organization is good, so I'm still rating it high, but individual instructors may disagree with some of the order of presentation.

In general I was satisfied. My only complaint is that, unlike a number of "standard" introductory statistics textbooks I have seen, the exercises are organized in a page-wide format instead of, say, in two columns. I assume this is for the benefit of those using mobile devices to view the book, but scrolling through on a computer, the sections and the exercises tend to blend together. Some more separation between sections, and between text vs. exercises, would be appreciated.

I think it's fine.

The examples and exercises seem to be USA-centric (though I did spot one or two UK-based examples), but I do not think that it was being insensitive to any group.

In addition to the above item-specific comments:

1. I think that the first chapter has some good content about experiments vs. observational studies, and about sampling. Better than most of the introductory books that I have used thus far (granted, my books were more geared towards engineers).
2. Some of the sections have only a few exercises, and more exercises are provided at the end of chapters. This is similar to many other textbooks, but since there are generally fewer section exercises, they are easy to miss when scrolling through, and provide less selection for instructors. I think it would be better to group all of the chapter's exercises until each section can have a greater number of exercises.
3. I do not think that the exercises focus in on any discipline, nor do they exclude any discipline. This could be either a positive or a negative to individual instructors. I think in general it is a good choice, because it makes the book more accessible to a broad audience.
4. That being said, I frequently teach a course geared toward engineering students and other math-heavy majors, so I'm not sure that this book would be fully suitable for my particular course in its present form (with expanded exercise selection, and expanded chapter 2, I would adopt it almost immediately).

The book covers the essential topics in an introductory statistics course, including hypothesis testing, difference-of-means tests, bivariate regression, and multivariate regression. The authors make effective use of graphs both to illustrate the subject matter and to teach students how to construct and interpret graphs in their own work. Examples from a variety of disciplines are used to illustrate the material. The discussion of data analysis is appropriately pitched for use in introductory quantitative analysis courses in a variety of disciplines in the social sciences. However, to meet the needs of this audience, the book should include more discussion of the measurement of key concepts, construction of hypotheses, and research design (experiments and quasi-experiments). These are essential components of quantitative analysis courses in the social sciences.

The book covers familiar topics in statistics and quantitative analysis and the presentation of the material is accurate and effective.

One of the real strengths of the book is the many examples and datasets that it includes. Some of these will continue to be useful over time, but others may have a shorter shelf life. In particular, examples and datasets about county characteristics, elections, census data, etc., can become outdated fairly quickly.

Given that this is an introductory textbook, it is clearly written and accessible to students with a variety of disciplinary backgrounds. The purpose of the course is to teach students technical material and the book is well-designed for achieving that goal.

Like most statistics books, each topic builds on ones that have come before and readers will have no trouble following the terminology as they progress through the book.

One of the real strengths of the book is that it is nicely separated into coherent chapters and instructors will have no trouble picking and choosing among them. For example, the authors have intentionally included a chapter on probability that some instructors may want to include, but others may choose to exclude without loss of continuity.

The book does build from a good foundation in univariate statistics and graphical presentation to hypothesis testing and linear regression. There are separate chapters on bi-variate and multiple regression and they work well together. The chapter on hypothesis testing is very clear and effectively used in subsequent chapters.

The formatting and interface are clear and effective. There are lots of graphs in the book and they are very readable. There are also pictures in the book and they appear clear and in the proper place in the chapters.

There are no issues with the grammar in the book.

The authors present material from lots of different contexts and use multiple examples. They have done an excellent job choosing ones that are likely to be of interest to and understandable by students with diverse backgrounds.

The supplementary material for this book is excellent, particularly if instructors are familiar with R and Latex. The code and datasets are available to reproduce materials from the book. And, the authors have provided Latex code for slides so that instructors can customize the slides to meet their own needs.

For a Statistics I course at most community colleges and some four year universities, this text thoroughly covers all necessary topics. For example, types of data, data collection, probability, normal model, confidence intervals and inference for single proportions. A thoughtful index is provided at the end of the text as well as a strong library of homework / practice questions at the end of each chapter.

The content is accurate in terms of calculations and conclusions and draws on information from many sources, including the U.S. Census Bureau, to introduce topics and for homework sets. No errors have been found as of yet. The content stays unbiased by constantly reminding the reader to consider data, context and what one’s conclusions might mean rather than being partial to an outcome or to conclusions based on one’s personal beliefs. Some examples of this include the discussion of anecdotal evidence, bias in data collection, flaws in thinking using probability and practical significance vs statistical significance.

The text is up to date and the content and data used can be modified or updated over time to help with the longevity of the text. For example, a scatterplot involving the poverty rate and federal spending per capita could be updated every year. Another example that would be easy to update and is unlikely to become irrelevant is email and amount of spam, used for numerous topics. The probability section uses a data set on smallpox to discuss inoculation, another relevant topic whose data set could be easily updated. This selection of topics and their respective data sets are layered throughout the book. The book uses relevant topics throughout that could be quickly updated.

The writing style and context do not treat students like PhD academics (too high of a reading level), nor do they treat them like children (too low of a reading level). The text meets students at a nice medium where they are challenged with thoughtful, real situations to consider and how and why statistical methods might be useful. For example, a goodness of fit test begins by having readers consider a situation of whether or not the ethnic representation of a jury is consistent with the ethnic representation of the area. The introduction of jargon is easily streamlined in after this example introduction.

Notation is consistent and easy to follow throughout the text. The text’s selection for notation with common elements such as p-hat, subscripts, complements, standard error and standard deviation is very clear and consistent. Tables and graphs are sensibly annotated and well organized. Distributions and definitions, once introduced, are consistently referenced throughout the text wherever they apply or hold in the situations used.

Each chapter consists of 5-10 sections. These sections generally are all under ten pages in total. This easily allows for small sets of reading on a class to class basis or larger sets of reading over a weekend. Each section within a chapter builds on the previous sections, making it easy to align content. For example, the inference for categorical data chapter is broken into five main sections: single proportion, two proportions, goodness of fit, test for independence and small sample hypothesis test for proportions. This keeps all inference for proportions close and concise, helping the reader stay uninterrupted in the topic.

The topics are presented in a logical order with each major topic given a thorough treatment. The text begins with data collection, followed by probability and distributions of a random variable and then finishing (for a Statistics I course) with inference. Perhaps an even stronger structure would see all the types of content mentioned above applied to each type of data collection. That is, do probability and inference topics for a SRS, then do probability and inference for a stratified sample and each time taking your probability and inference ideas further so that they are constantly being built upon, from day one!

Navigation as a PDF document is simple since all chapters and subsections within the table of contents are hyperlinked to the respective section. Graphs and tables are clean and clearly referenced, although they are not hyperlinked in the sections. The only visual issue occurs in some graphs, such as on pages 40-41, which have maps of the U.S. using color to show “intensity”. However, with the print version, which can only show varying scales of white through black, it can be hard to compare “intensity”.

No grammatical errors have been found as of yet.

The text would not be found to be culturally insensitive in any way, as a large part of the investigations and questions are introspective of cultures and opinions. For example, income variations in two cities, ethnic distribution across the country, or synthesis of data from Africa.

The book has a great logical order, with concise thoughts and sections. While sections are concise, they are not limited in rigor or depth (as exemplified by a great section on the "power" of a hypothesis test and numerous case studies to introduce topics). The reading of the book will challenge students but at the same time not leave them behind. Overall I like it a lot. The best statistics OER I have seen yet.

More depth in graphs: histograms especially. Percentiles? Also, non-parametric alternatives would be nice, especially Monte Carlo/bootstrapping methods.

The most accurate open-source textbook in statistics I have found. Though I might define p-values and interpret confidence intervals slightly differently. I did not see much explanation on what it means to fail to reject Ho. I would consider this "omission" as almost inaccurate.

Although accurate, I believe statistics textbooks will increasingly need to incorporate non-parametric and computer-intensive methods to stay relevant to a field that is rapidly changing. Also, as fewer people do manual computations, interpretation of computer software output becomes increasingly important.

Quite clear. The text, though dense, is easy to read. More color, diagrams, photos? Marginal notes for key concepts & formulae?

No problems here.

This textbook is nicely parsed. Especially like homework problems clearly divided by concept.

Great job overall. However, the introduction to hypothesis testing is a bit awkward (this is not unusual). Create a clear way to explain this multi-faceted topic and the world will beat a path to your door.

No problems, but again, the text is a bit dense. Reads more like a 300-level text than 100/200-level. More color, diagrams, etc.?

I did not encounter any issues.

Overall it was not offensive to me, but I am a college-educated white guy. Examples of how statistics can address gender bias were appreciated. More examples of how statistics can bring cultural/social/economic issues to light (without being heavy handed) would be very motivating to students.

Overall, this is the best open-source statistics text I have reviewed. Most contain glaring conceptual and pedagogical errors, and are painful to read (don't get me started on percentiles or confidence intervals). Also, a reminder for reviewers to save their work as they complete this review would be helpful.

The coverage of this text conforms to a solid standard (very classical) semester long introductory statistics course that begins with descriptive statistics, basic probability, and moves through the topics in frequentist inference including basic hypothesis tests of means, categories, linear and multiple regression. The regression treatment of categorical predictors is limited to dummy coding (though not identified as such) with two levels in keeping with the introductory nature of the text. There is a bit of coverage on logistic regression appropriate for categorical (specifically, dichotomous) outcome variables that usually is not part of a basic introduction. Within each appears an adequate discussion of underlying assumptions and a representative array of applications. Some of the more advanced topics are treated as 'special topics' within the sections (e.g., power and standard error derivations). Some more modern concepts, such as various effect size measures, are not covered well or at all (for example, eta squared in ANOVA). However, classical measures of effect such as confidence intervals and R squared appear when appropriate though they are not explicitly identified as measures of effect.

Technical accuracy is a strength for this text especially with respect to underlying theory and impacts of assumptions.

The basics of classical inferential statistics change little over time and this text covers that ground exceptionally well. More modern approaches to statistical methods, however, will need to include concepts important to the current replicability crisis in research: measures of effect, extensive applications of power analyses, and Bayesian alternatives. The task of reworking statistical training in response to this crisis will be daunting for any text author, not just this one.

One of the strengths of this text is the use of motivated examples underlying each major technique. These examples and techniques are very carefully described with quality graphical and visual aids to support learning. Too many texts that cover basic theory are organized as theorem/proof/example, which impedes understanding for the beginner. This defect is not present here: this text embraces an 'embodied' view of learning which prioritizes example applications first and then explanation of technique.

The consistency of this text is quite good. Notation, language, and approach are maintained throughout the chapters.

It is difficult for a topic that is inherently cumulative to excel at modularity in the manner that is usually understood. Each topic builds on the one before it in any statistical methods course. This text does indicate that some topics can be omitted by identifying them as 'special topics'.

The structure and organization of this text corresponds to a very classic treatment of the topic. It begins with the basics of descriptive statistics, probability, hypothesis test concepts, tests of numerical variables, categorical, and ends with regression. I have seen other texts begin with correlation and regression prior to tests of means, etc., and wonder which approach is best.

This is the third edition and benefits from feedback from prior versions. I found no negative issues with regard to interface elements. It is a pdf download rather than strictly online so the format is more classical textbook as would be experienced in a print version.

Typos and errors were minimal (I could find none).

It is clear that the largest audience is assumed to be from the United States as most examples draw from regions in the U.S. (e.g., U.S. presidential elections, data from California, data from U.S. colleges, etc.) though some examples come from other parts of the world (Greece economics, Australian wildlife). The language seems to be free of bias.

This text is an excellent choice for an introductory statistics course that has a broad group of students from multiple disciplines. The basic theory is well covered and motivated by diverse examples from different fields. This diversity in discipline comes at the cost of specificity of techniques that appear in some fields such as the importance of measures of effect in psychology.

This book covers topics in a traditional curriculum of an introductory statistics course: probabilities, distributions, sampling distribution, hypothesis tests for means and proportions, linear regression, multiple regression and logistic regression. While the traditional curriculum does not cover multiple regression and logistic regression in an introductory statistics course, this book offers the information in these two areas. The book started with several examples and a case study to introduce types of variables, sampling designs and experimental designs (chapter 1). It would be nice if the authors could start with the big picture of how people perform statistical analysis for a data set. Chapter 2 covers the knowledge of probabilities including the definition of probability, Law of Large Numbers, probability rules, conditional probability and independence and linear combinations of random variables. However, the linear combination of random variables is too math focused and may not be good for students at the introductory level. Chapter 3 covers random variables and distributions including normal, geometric and binomial distributions. Chapters 4-6 cover the inferences for means and proportions and the Chi-square test. Chapters 7 and 8 cover linear, multiple and logistic regression. The book used plenty of examples and included a lot of tips to understand basic concepts such as probabilities, p-values and significance levels. The book provides an effective index. The drawback of this book is that it does not cover how to use any computer software or even a graphing calculator to perform the calculations for inferences. All of the calculations covered in this book were performed by hand using the formulas. Given the current trend in data analysis, students will be confronted with the need to use computer software or a graphing calculator to perform the analyses. Calculations by hand are not realistic.

The content of the book is accurate and unbiased. However, when introducing the basic concepts of null and alternative hypotheses and the p-value, the book used different definitions than other textbooks. For example, when introducing the p-value, the authors used the definition "the probability of observing data at least as favorable to the alternative hypothesis as our current data set, if the null hypothesis is true." The wording "at least as favorable to the alternative hypothesis as our current data" is misleading. Students can easily get confused and think the p-value is in favor of the alternative hypothesis.

The content that this book focuses on is relatively stable and so changes would be few and far between. The content is up-to-date. In particular, this book covers Bayesian probabilities and false negative and false positive calculations. This textbook did not contain many real world application data sets, which can be a drawback for its relevance to today's data science trend.

The text is written in lucid, accessible prose, and provides plenty of examples for students to understand the concepts and calculations. The text also provides enough context for students to understand the terminologies and definitions. In particular, the textbook provides plenty of tips for each concept, which is very helpful for students to understand the materials.

The text is quite consistent in terms of terminology and framework. The organization for each chapter is also consistent.

The text is easily and readily divisible into subsections. Each chapter contains short sections and each section contains small subsections. The text is easily reorganized and re-sequenced. The later chapters (chapters 4-8) are self-contained and can be re-ordered. The later chapters (chapters 4-8) are built upon the knowledge from the former chapters (chapters 1-3).

The overall organization of the text is logical. The later chapters on inferences and regression (chapters 4-8) are built upon the former chapters (chapters 1-3). But there are instances where similar topics are not arranged very well: 1) when introducing the sampling distribution in chapter 4, the authors should introduce both the sampling distribution of the mean and the sampling distribution of the proportion in the same chapter. The authors spend many pages on the sampling distribution of the mean in chapter 4, but only a few sentences on the sampling distribution of the proportion in chapter 6; 2) the authors introduced independence after talking about the conditional probability. Introducing independence using the definition of conditional probability P(A|B)=P(A) is more accurate and easier for students to understand. The order of introducing independence and conditional probability should be switched. The approach of introducing the inferences of proportions and the Chi-square test in the same chapter is novel. The students can easily see the connections between the two types of tests.
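To make the reviewer's point about defining independence through conditional probability concrete, here is a minimal sketch that checks P(A|B) = P(A) by enumerating a small sample space. The two-dice setup and the function names `prob` and `cond_prob` are illustrative assumptions, not taken from the book.

```python
from fractions import Fraction
from itertools import product

# Sample space: the 36 equally likely outcomes of rolling two fair dice
# (a hypothetical example chosen for illustration).
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event, given as a predicate on outcomes."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def cond_prob(event_a, event_b):
    """P(A | B) = P(A and B) / P(B), assuming P(B) > 0."""
    return prob(lambda o: event_a(o) and event_b(o)) / prob(event_b)

first_is_six = lambda o: o[0] == 6
second_is_even = lambda o: o[1] % 2 == 0
sum_is_twelve = lambda o: o[0] + o[1] == 12

# Independent events: conditioning on B leaves P(A) unchanged.
independent = cond_prob(first_is_six, second_is_even) == prob(first_is_six)

# Dependent events: knowing the first die is a six changes P(sum is 12).
dependent = cond_prob(sum_is_twelve, first_is_six) != prob(sum_is_twelve)
```

Exact fractions are used instead of floats so that the equality test is not affected by rounding.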

The text is free of significant interface issues. The graphs and tables in the text are well designed and accurate. These graphs and tables help the readers to understand the materials well, especially since most of the graphs are in color.

The text contains no grammatical errors.

There is no evidence that the text is culturally insensitive or offensive. Some examples are related to the United States. Most of the examples are general and not culturally related. The text offered quite a lot of examples in the medical research field and that is probably related to the background of the authors.

Overall, this is a well written book for introductory level statistics. The text provides enough examples, exercises and tips for the readers to understand the materials. It also offered enough graphs and tables to facilitate the reading. The drawbacks of the textbook are: 1) it doesn't show how to use any computer software or graphing calculator to perform the calculations and analyses; 2) it didn't offer any real world data analysis examples.

This text provides decent coverage of probability, inference, descriptive statistics, bivariate statistics, as well as introductory coverage of the bivariate and multiple linear regression model and logistic regression. Although there are some materials on experimental and observational data, this is, first and foremost, a book on mathematical and applied statistics. Professors looking for in-depth coverage of research methods and data collection techniques will have to look elsewhere. The coverage of probability and statistics is, for the most part, sound. Most essential materials for an introductory probability and statistics course are covered. The authors do a terrific job in chapter 1 introducing key ideas about data collection, sampling, and rudimentary data analysis. Chapters 4-6 on statistical inference are especially strong, and the discussion of outliers and leverage in the regression chapters should prove useful to students who work with small n data sets. Teachers might quibble with a particular omission here or there (e.g., it would be nice to have kernel densities in chapter 1 to complement the histogram graphics and some more probability distributions for continuous random variables such as the F distribution), but any missing material could be readily supplemented. In other cases I found the omissions curious. For instance, the text shows students how to calculate the variance and standard deviation of an observed variable's distribution, but does not give the actual formula. As well, the authors define probability but this is not connected as directly as it could be to the 3 fundamental axioms that comprise the mathematical definition of probability. The authors limit their discussion on categorical data analysis to the chi square statistic, which centers on inference rather than on the substantive magnitude of the bivariate relationship. I wish they included measures of association for categorical data analysis that are used in sociology and political science, such as gamma, tau b and tau c, and Somers d. Finally, I think the book needs to add material on the desirable properties of statistical estimators (i.e., unbiasedness, efficiency, consistency). Appendix A contains solutions to the end of chapter exercises. The index is decent, but there is no glossary of terms or summary of formulas, which is disappointing.

From what I can tell, the book is accurate in terms of what it covers. There are some things that should probably be included in subsequent revisions.

Statistical methods, statistical inference and data analysis techniques do not change much over time; therefore, I suspect the book will be relevant for years to come. The key will be ensuring that the latest research trends/improvements/refinements are added to the book and that omitted materials are added into subsequent editions.

The book is clear and well written. All of the chapters contain a number of useful tips on best practices and common misunderstandings in statistical analysis. There are also a number of exercises embedded in the text immediately after key ideas and concepts are presented. I suspect these will prove quite helpful to students. The authors also make GREAT use of statistical graphics in all the chapters. Overall, the book is heavy on using ordinary language and common sense illustrations to get across the main ideas. They draw examples from sources (e.g., The Daily Show, The Colbert Report) and daily living (e.g., Mario Kart video games) that college students will surely appreciate. There are no proofs that might appeal to the more mathematically inclined. There are lots of great exercises at the end of each chapter that professors can use to reinforce the concepts and calculations appearing in the chapter. I also appreciated that the authors use examples from the hard sciences, life sciences, and social sciences. This will increase the appeal of the text.

The book is very consistent from what I can see.

This book can work in a number of ways. A teacher can sample the germane chapters and incorporate them without difficulty in any research methods class. Things flow together so well that the book can be used as is.

The organization is fine. The book presents all the topics in an appropriate sequence.

The interface is fine. I didn't experience any problems. The color graphics come through clearly and the embedded links work as they should.

I didn't see any errors, it looks fine.

The book is not culturally offensive.

Teachers looking for a text that they can use to introduce students to probability and basic statistics should find this text helpful. It might be asking too much to use it as a standalone text, but it could work very well as a supplement to a more detailed treatment or in conjunction with some really good slides on the various topics. I think it would work well for liberal arts/social science students, but not for economics/math/science students who would need more mathematical rigor.

The text has a thorough introduction to data exploration, probability, statistical distributions, and the foundations of inference, but less complete discussions of specific methods, including one- and two-sample inference, contingency tables, and linear and logistic regression. Supposedly intended for "introductory statistics courses at the high school through university levels", it's not clear where this text would fit in at my institution. It includes too much theory for our undergraduate service courses, but not enough practical details for our graduate-level service courses.

The text is mostly accurate, especially the sections on probability and statistical distributions, but there are some puzzling gaffes. For example, it is claimed that the Poisson distribution is suitable only for rare events (p. 148); the unequal-variances form of the standard error of the difference between means is used in conjunction with the t-distribution, with no mention of the need for the Satterthwaite adjustment of the degrees of freedom (p. 231); and the degrees of freedom in the chi-square goodness-of-fit test are not adjusted for the number of estimated parameters (p. 282).
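For readers unfamiliar with the Satterthwaite adjustment mentioned in this review, the following is a minimal sketch of the standard Welch-Satterthwaite approximation to the degrees of freedom for the unequal-variances two-sample t statistic. The function name and the summary statistics are hypothetical and are not drawn from the book.

```python
def welch_satterthwaite_df(s1, n1, s2, n2):
    """Approximate degrees of freedom for the unequal-variances t statistic.

    s1, s2 are sample standard deviations; n1, n2 are sample sizes (each > 1).
    """
    v1 = s1 ** 2 / n1  # estimated variance of the first sample mean
    v2 = s2 ** 2 / n2  # estimated variance of the second sample mean
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

# Hypothetical summary statistics, for illustration only.
df_equal = welch_satterthwaite_df(s1=2.0, n1=10, s2=2.0, n2=10)
df_unequal = welch_satterthwaite_df(s1=1.0, n1=10, s2=5.0, n2=10)
```

With equal spreads and equal sample sizes the result matches the pooled value n1 + n2 - 2. When the spreads differ sharply it falls toward the smaller of n1 - 1 and n2 - 1, which is why ignoring the adjustment can overstate the available degrees of freedom.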

Some of the content seems dated. For example, there is a strong emphasis on assessing the normality assumption, even though most of the covered methods work well for non-normal data with reasonable sample sizes. Normal approximations are presented as the tool of choice for working with binomial data, even though exact methods are efficiently implemented in modern computer packages. Fisher's exact test is not even mentioned. The section on model selection, covering just backward elimination and forward selection, seems especially old-fashioned.
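The contrast this reviewer draws between normal approximations and exact binomial methods can be illustrated with a short sketch using only the Python standard library. The scenario (9 successes in 10 trials with success probability 0.5) and the function names are assumptions made for illustration, and the approximation shown uses no continuity correction.

```python
from math import comb, erf, sqrt

def binom_upper_tail(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))

def normal_upper_tail(k, n, p):
    """Normal approximation to P(X >= k), without a continuity correction."""
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    z = (k - mu) / sigma
    # Upper tail of the standard normal, written with the error function.
    return 0.5 * (1 - erf(z / sqrt(2)))

# Hypothetical small-sample case: 9 or more successes in 10 fair trials.
exact = binom_upper_tail(9, 10, 0.5)    # equals 11/1024
approx = normal_upper_tail(9, 10, 0.5)
```

In this small-sample case the approximate tail probability comes out at roughly half the exact one, which is the kind of gap that motivates preferring exact computation when software makes it cheap.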

The prose is sometimes tortured and imprecise. For example: "Researchers perform an observational study when they collect data in a way that does not directly interfere with how the data arise" (p. 13). "Standard error" is defined as the "standard deviation associated with an estimate" (p. 163), but it is often unclear whether population or sample-based quantities are being referred to. Use of the t-distribution is motivated as a way to "resolve the problem of a poorly estimated standard error", when really it is a way to properly characterize the distribution of a test statistic having a sample-based standard error in the denominator.

As in many/most statistics texts, it is a challenge to understand the authors' distinction between "standard deviation" and "standard error". The title of Chapter 5, "Inference for numerical data", took me by surprise, after the extensive use of numerical data in the discussion of inference in Chapter 4. Some topics seem to be introduced repeatedly, e.g., the Central Limit Theorem (pp. 167, 185, and 222) and the comparison of two proportions (pp. 191 and 268). The authors are sloppy in their use of hat notation when discussing regression models, expressing the fitted value as a function of the parameters, instead of the estimated parameters (pp. 325 and 357).

The text includes sections that could easily be extracted as modules. For example, I can imagine using pieces of Chapters 2 (Probability) and 3 (Distributions of random variables) to motivate methods that I discuss in service courses.

Chapters 1 through 4, covering data, probability, distributions, and principles of inference flow nicely, but the remaining chapters seem like a somewhat haphazard treatment of some commonly used methods. One-way analysis of variance is introduced as a special topic, with no mention that it is a generalization of the equal-variances t-test to more than two groups. The final chapter (8) gives superficial treatments of two huge topics, multiple linear regression and logistic regression, with insufficient detail to guide serious users of these methods. It is as if the authors ran out of gas after the first seven chapters and decided to use the final chapter as a catchall for some important, uncovered topics.

The interface is nicely designed. The availability of data sets and functions at a website (www.openintro.org) and as an R package (cran.r-project.org/web/packages/openintro) is a huge plus that greatly increases the usefulness of the text.

There are distracting grammatical errors. "Data" is sometimes singular, sometimes plural in the authors' prose. Other examples: "Each of the conclusions are based on some data" (p. 9); "You might already be familiar with many aspects of probability, however, formalization of the concepts is new for most" (p. 68); and "Sometimes two variables is one too many" (p. 21).

<p> I have no idea how to characterize the cultural relevance of a statistics textbook.</p>

<p> In my opinion, the text is not a strong candidate for an introductory textbook for typical statistics courses, but it contains many sections (particularly on probability and statistical distributions) that could profitably be used as supplemental material in such courses.</p>

## Table of Contents

- 1. Introduction to data.
- 2. Summarizing data.
- 3. Probability.
- 4. Distributions of random variables.
- 5. Foundations for inference.
- 6. Inference for categorical data.
- 7. Inference for numerical data.
- 8. Introduction to linear regression.
- 9. Multiple and logistic regression.

## Ancillary Material

## About the Book

OpenIntro Statistics covers a first course in statistics, providing a rigorous introduction to applied statistics that is clear, concise, and accessible. This book was written with the undergraduate level in mind, but it’s also popular in high schools and graduate courses.

We hope readers will take away three ideas from this book in addition to forming a foundation of statistical thinking and methods.

- Statistics is an applied field with a wide range of practical applications.
- You don’t have to be a math guru to learn from real, interesting data.
- Data are messy, and statistical tools are imperfect. But, when you understand the strengths and weaknesses of these tools, you can use them to learn about the world.

## About the Contributors

### Authors

**David M. Diez** is a Quantitative Analyst at Google where he works with massive data sets and performs statistical analyses in areas such as user behavior and forecasting.

**Christopher D. Barr** is an Assistant Research Professor with the Texas Institute for Measurement, Evaluation, and Statistics at the University of Houston.

**Mine Cetinkaya-Rundel** is the Director of Undergraduate Studies and Assistant Professor of the Practice in the Department of Statistical Science at Duke University.