OpenIntro Statistics
David Diez, Harvard School of Public Health
Christopher Barr, Harvard School of Public Health
Mine Cetinkaya-Rundel, Duke University
Pub Date: 2015
Publisher: OpenIntro
Language: English
Conditions of Use
Attribution-ShareAlike
CC BY-SA
Reviews
The text includes basic topics for an introductory course in descriptive and inferential statistics. The approach is mathematical with some applications. More extensive coverage of contingency tables and bivariate measures of association would be helpful. Probability is an important topic that is included as a "special topic" in the course.
The text and graphs are accurate.
My interest in this text is for a graduate course in applied statistics in the field of public service. This is a particular use of the text, and my students would benefit from and be interested in more social-political-economic examples. Some examples in the text are traditional ones that are overused, e.g., throwing dice and drawing cards to teach probability. The examples for tree diagrams are very good, e.g., smallpox in Boston, breast cancer.
The writing is clear, and numerous graphs and examples make concepts accessible to students. The text, however, is not engaging and can be dry.
The text is consistent.
The text is organized into sections, and the numbering system within each chapter facilitates assigning sections of a chapter. This is a statistics text, and much of the content would be kept in this order.
The content is well-organized. The flow of a chapter is especially good when the authors continue to use a certain example in developing related concepts. There are exercises at the end of each chapter (and exercise solutions at the end of the text).
Display of graphs and figures is good, as is the use of color. The graphs are readable in black and white also. The text is in PDF format; there are no problems of navigation.
There are no grammatical errors.
The examples are general and do not deal with racial or cultural matters.
This text will be useful as a supplement in the graduate course in applied statistics for public service.
The text covers the foundations of data, distributions, probability, regression principles and inferential principles with a very broad net. It is certainly a fitting means of introducing all of these concepts to fledgling research students. At the same time, the material is covered in such a manner as to provide future research practitioners with a means of understanding the possibilities when considering research that may prove to be of value in their respective fields. In other words, breadth, yes; and depth, not so much. It can be considered comprehensive if you consider this an introductory text. It's very fitting for my use with teachers whose primary focus is on data analysis rather than post-graduate research.
The text is accurate due to its rather straightforward approach to presenting material. In fact, I particularly like that the authors occasionally point out ways in which data or statistics can be presented so as to distort the truth. Additionally, concepts related to flawed practices in data collection and analysis were presented to point out how inaccuracies could arise in research.
While it would seem that the data in a statistics textbook would remain relevant forever, there are a few factors that may impact such a textbook's relevance and longevity. Since this particular textbook relies heavily on the use of scenarios or case study type examples to introduce/teach concepts, the need to update this information on occasion is real. These updates would serve to ensure the connection between the learner and the material that is conducive to learning. Additionally, as research and analytical methods evolve, so will the need to cover more non-traditional types of content, e.g., mixed methodologies, nonparametric data sets, new technological research tools, etc.
I feel that the greatest strength of this text is its clarity. The simple mention of the subject "statistics" can strike fear in the minds of many students. Perhaps we don't help the situation much with the way we begin launching statistical terminology while demonstrating a few "concepts" on a white board. Well, this text provides a kinder and gentler introduction to data analysis and statistics. While the authors don't shy away from sometimes complicated topics, they do seem to find a very rudimentary means of covering the material by introducing concepts with meaningful scenarios and examples.
On occasion, all of us in academia have experienced a text where the progression from one chapter to another was not very seamless. This is especially true when there are multiple authors. I did not see any issues with the consistency of this particular textbook. In fact, I could not differentiate a change in style or clarity in any sections of this text. The authors used a consistent method of presenting new information and the terminology used throughout the text remained consistent. This is sometimes a problem in statistics as there are a variety of ways to express the similar statistical concepts. This can be particularly confusing to "beginners."
While to some degree the text is easily and readily divisible into smaller reading sections, I would not recommend that anyone alter the sequence of the content until after Chapters 1, 3, and 4 are completed. Materials in the later sections of the text are scaffolded upon content covered in these initial chapters. The authors point out that Chapter 2, which deals with probabilities, is optional and not a prerequisite for grasping the content covered in the later chapters. Of course, the content in Chapters 5-8 would surely be useful as supplementary materials/refreshers for students who have mastered the basics in previous statistical coursework.
After much searching, I particularly like the scope and sequence of this textbook. As mentioned above, the authors gently introduce students to very basic statistical concepts. These concepts are reinforced by authentic examples that allow students to connect to the material and see how it is applied in the real world. This introductory material then serves as the foundation for later chapters where students are introduced to inferential statistical practices. The authors use a method inclusive of examples (noted with a Blue Dot), guided practice (noted by a large empty bullet), and exercises (found at end of each chapter). I find this method serves to give the students confidence in knowing that they understand concepts before moving on to new material. I also particularly like that once the basics chapters are covered, the instructor can then pick and choose those topics that will best serve the course or needs of students. In some instances, various groups of students may be directed to certain chapters, while others home in on the material relevant to their topic.
I viewed the text as a PDF and was pleasantly surprised at the clarity and the fluid navigation, which is not the norm with many PDFs. The document was very legible. The graphs and diagrams were also clear and provided information in a way that aided in understanding concepts. This was not necessarily the case with some of the tables in the text. I was sometimes confused by tables with missing data or, as was the case on page 11, when the table was sideways on the page.
I did not see any grammatical issues that distract from the content presented.
I did not view any material that I felt would be offensive. The material was culturally relevant to the demographic most likely to use the text in the United States. This is important since the examples used authentic situations to connect to the readers. While the examples did connect with the diversity within our country or, for instance, the U.K., they may not be the best examples that could be used to connect with those from non-Western countries.
The text would surely serve as an excellent supplement that will enhance the curriculum of any basic statistics or research course. While the text could be used in both undergraduate and graduate courses, it is best suited for the social sciences.
There is one section that is under-developed (general concepts about continuous probability distributions), but aside from this, I think the book provides a good coverage of topics appropriate for an introductory statistics course.
I did not see any inaccuracies in the book.
I do not see introductory statistics content ever becoming obsolete.
I think that the book is fairly easy to read. The authors bold important terms, and frequently put boxes around important formulas or definitions. If anything, I would prefer the book to have slightly more mathematical notation.
I did not see any problems in regards to the book's notation or terminology. It appears smooth and seamless.
The book is broken into small sections for each topic. Any significant rearranging of those sections would be incredibly detrimental to the reader, but that is true of any statistics textbook, especially at the introductory level: Earlier concepts provide the basis for later concepts.
For the most part I liked the flow of the book, though there were a few instances where I would have liked to see some different organization. For example, the Central Limit Theorem is introduced and used early in the inference section, and then later examined in more detail. I would tend to group this in with sampling distributions. Also, for how the authors seem to be focusing on practicalities, I was somewhat surprised about some of the organization of the inference sections. The authors use the Z distribution to work through much of the 1-sample inference. The t distribution is introduced much later. I realize this is how some prefer it, but I think introducing the t distribution sooner is more practical. The organization in chapter 5 also seems a bit convoluted to me. The chapter is about "inference for numerical data". The authors already discussed 1-sample inference in chapter 4, so the first two sections in chapter 5 are Paired Data and Difference of Means, then they introduce the t-distribution and go back to 1-sample inference for the mean, and then to inference for two means using the t-distribution. It strikes me as jumping around a bit. Overall the organization is good, so I'm still rating it high, but individual instructors may disagree with some of the order of presentation.
In general I was satisfied. My only complaint is that, unlike a number of "standard" introductory statistics textbooks I have seen, the exercises are organized in a page-wide format instead of, say, in two columns. I assume this is for the benefit of those using mobile devices to view the book, but scrolling through on a computer, the sections and the exercises tend to blend together. Some more separation between sections, and between text vs. exercises would be appreciated.
I think it's fine.
The examples and exercises seem to be USA-centric (though I did spot one or two UK-based examples), but I do not think that it was being insensitive to any group.
In addition to the above item-specific comments: 1) I think that the first chapter has some good content about experiments vs. observational studies, and about sampling. Better than most of the introductory books that I have used thus far (granted, my books were more geared towards engineers). 2) Some of the sections have only a few exercises, and more exercises are provided at the end of chapters. This is similar to many other textbooks, but since there are generally fewer section exercises, they are easy to miss when scrolling through, and provide less selection for instructors. I think it would be better to group all of the chapter's exercises until each section can have a greater number of exercises. 3) I do not think that the exercises focus in on any discipline, nor do they exclude any discipline. This could be either a positive or a negative to individual instructors. I think in general it is a good choice, because it makes the book more accessible to a broad audience. 4) That being said, I frequently teach a course geared toward engineering students and other math-heavy majors, so I'm not sure that this book would be fully suitable for my particular course in its present form (with expanded exercise selection, and expanded chapter 2, I would adopt it almost immediately).
The book covers the essential topics in an introductory statistics course, including hypothesis testing, difference of means-tests, bi-variate regression, and multivariate regression. The authors make effective use of graphs both to illustrate the subject matter and to teach students how to construct and interpret graphs in their own work. Examples from a variety of disciplines are used to illustrate the material. The discussion of data analysis is appropriately pitched for use in introductory quantitative analysis courses in a variety of disciplines in the social sciences. However, to meet the needs of this audience, the book should include more discussion of the measurement of key concepts, construction of hypotheses, and research design (experiments and quasi-experiments). These are essential components of quantitative analysis courses in the social sciences.
The book covers familiar topics in statistics and quantitative analysis and the presentation of the material is accurate and effective.
One of the real strengths of the book is the many examples and datasets that it includes. Some of these will continue to be useful over time, but others may have a shorter shelf life. In particular, examples and datasets about county characteristics, elections, census data, etc., can become outdated fairly quickly.
Given that this is an introductory textbook, it is clearly written and accessible to students with a variety of disciplinary backgrounds. The purpose of the course is to teach students technical material and the book is well-designed for achieving that goal.
Like most statistics books, each topic builds on ones that have come before and readers will have no trouble following the terminology as they progress through the book.
One of the real strengths of the book is that it is nicely separated into coherent chapters and instructors will have no trouble picking and choosing among them. For example, the authors have intentionally included a chapter on probability that some instructors may want to include, but others may choose to exclude without loss of continuity.
The book does build from a good foundation in univariate statistics and graphical presentation to hypothesis testing and linear regression. There are separate chapters on bi-variate and multiple regression and they work well together. The chapter on hypothesis testing is very clear and effectively used in subsequent chapters.
The formatting and interface are clear and effective. There are lots of graphs in the book and they are very readable. There are also pictures in the book and they appear clear and in the proper place in the chapters.
There are no issues with the grammar in the book.
The authors present material from lots of different contexts and use multiple examples. They have done an excellent job choosing ones that are likely to be of interest to and understandable by students with diverse backgrounds.
The supplementary material for this book is excellent, particularly if instructors are familiar with R and Latex. The code and datasets are available to reproduce materials from the book. And, the authors have provided Latex code for slides so that instructors can customize the slides to meet their own needs.
For a Statistics I course at most community colleges and some four-year universities, this text thoroughly covers all necessary topics: for example, types of data, data collection, probability, the normal model, confidence intervals and inference for single proportions. A thoughtful index is provided at the end of the text as well as a strong library of homework / practice questions at the end of each chapter.
The content is accurate in terms of calculations and conclusions and draws on information from many sources, including the U.S. Census Bureau, to introduce topics and for homework sets. No errors have been found as of yet. The content stays unbiased by constantly reminding the reader to consider data, context and what one’s conclusions might mean rather than being partial to an outcome or to conclusions based on one’s personal beliefs. Some examples of this include the discussion of anecdotal evidence, bias in data collection, flaws in thinking using probability and practical significance vs statistical significance.
The text is up to date and the content / data used is able to be modified or updated over time to help with the longevity of the text. For example, a scatterplot involving the poverty rate and federal spending per capita could be updated every year. Another example that would be easy to update and is unlikely to become non-relevant is email and amount of spam, used for numerous topics. The probability section uses a data set on smallpox to discuss inoculation, another relevant topic whose data set could be easily updated. This selection of topics and their respective data sets are layered throughout the book. The book uses relevant topics throughout that could be quickly updated.
The writing style and context do not treat students like PhD academics (too high of a reading level), nor do they treat them like children (too low of a reading level). The text meets students at a nice medium where they are challenged with thoughtful, real situations to consider and how and why statistical methods might be useful. For example, a goodness of fit test begins by having readers consider a situation of whether or not the ethnic representation of a jury is consistent with the ethnic representation of the area. The introduction of jargon is easily streamlined in after this example introduction.
Notation is consistent and easy to follow throughout the text. The text’s selection for notation with common elements such as p-hat, subscripts, complements, standard error and standard deviation is very clear and consistent. Tables and graphs are sensibly annotated and well organized. Distributions and definitions that are defined are consistently referenced throughout the text, along with how they apply or hold in the situations used.
Each chapter consists of 5-10 sections. These sections generally are all under ten pages in total. This easily allows for small sets of reading on a class to class basis or larger sets of reading over a weekend. Each section within a chapter builds on the previous sections, making it easy to align content. For example, the inference for categorical data chapter is broken into five main sections: single proportion, two proportions, goodness of fit, test for independence and small sample hypothesis test for proportions. This keeps all inference for proportions close and concise, helping the reader stay uninterrupted in the topic.
The topics are presented in a logical order with each major topic given a thorough treatment. The text begins with data collection, followed by probability and distributions of a random variable and then finishing (for a Statistics I course) with inference. Perhaps an even stronger structure would see all the types of content mentioned above applied to each type of data collection. That is, do probability and inference topics for an SRS, then do probability and inference for a stratified sample and each time taking your probability and inference ideas further so that they are constantly being built upon, from day one!
Navigation as a PDF document is simple since all chapters and subsections within the table of contents are hyperlinked to the respective section. Graphs and tables are clean and clearly referenced, although they are not hyperlinked in the sections. The only visual issue occurs in some graphs, such as on pages 40-41, which have maps of the U.S. using color to show “intensity”. However with the print version, which can only show varying scales of white through black, it can be hard to compare “intensity”.
No grammatical errors have been found as of yet.
The text would not be found to be culturally insensitive in any way, as a large part of the investigations and questions are introspective of cultures and opinions. For example, income variations in two cities, ethnic distribution across the country, or synthesis of data from Africa.
The book has a great logical order, with concise thoughts and sections. While sections are concise, they are not limited in rigor or depth (as exemplified by a great section on the "power" of a hypothesis test), and there are numerous case studies to introduce topics. The reading of the book will challenge students but at the same time not leave them behind. Overall I like it a lot. The best statistics OER I have seen yet.
More depth in graphs: histograms especially. Percentiles? Also, non-parametric alternatives would be nice, especially Monte Carlo/bootstrapping methods.
The most accurate open-source textbook in statistics I have found. Though I might define p-values and interpret confidence intervals slightly differently. I did not see much explanation on what it means to fail to reject Ho. I would consider this "omission" as almost inaccurate.
Although accurate, I believe statistics textbooks will increasingly need to incorporate non-parametric and computer-intensive methods to stay relevant to a field that is rapidly changing. Also, as fewer people do manual computations, interpretation of computer software output becomes increasingly important.
Quite clear. The text, though dense, is easy to read. More color, diagrams, photos? Marginal notes for key concepts & formulae?
No problems here.
This textbook is nicely parsed. Especially like homework problems clearly divided by concept.
Great job overall. However, the introduction to hypothesis testing is a bit awkward (this is not unusual). Create a clear way to explain this multi-faceted topic and the world will beat a path to your door.
No problems, but again, the text is a bit dense. Reads more like a 300-level text than 100/200-level. More color, diagrams, etc.?
I did not encounter any issues.
Overall it was not offensive to me, but I am a college-educated white guy. Examples of how statistics can address gender bias were appreciated. More examples of how statistics can bring cultural/social/economic issues to light (without being heavy handed) would be very motivating to students.
Overall, this is the best open-source statistics text I have reviewed. Most contain glaring conceptual and pedagogical errors, and are painful to read (don't get me started on percentiles or confidence intervals). Also, a reminder for reviewers to save their work as they complete this review would be helpful.
The coverage of this text conforms to a solid standard (very classical) semester long introductory statistics course that begins with descriptive statistics, basic probability, and moves through the topics in frequentist inference including basic hypothesis tests of means, categories, linear and multiple regression. The regression treatment of categorical predictors is limited to dummy coding (though not identified as such) with two levels in keeping with the introductory nature of the text. There is a bit of coverage on logistic regression appropriate for categorical (specifically, dichotomous) outcome variables that usually is not part of a basic introduction. Within each appears an adequate discussion of underlying assumptions and a representative array of applications. Some of the more advanced topics are treated as 'special topics' within the sections (e.g., power and standard error derivations). Some more modern concepts, such as various effect size measures, are not covered well or at all (for example, eta squared in ANOVA). However, classical measures of effect such as confidence intervals and R squared appear when appropriate though they are not explicitly identified as measures of effect.
Technical accuracy is a strength for this text especially with respect to underlying theory and impacts of assumptions.
The basics of classical inferential statistics change little over time and this text covers that ground exceptionally well. More modern approaches to statistical methods, however, will need to include concepts important to the current replicability crisis in research: measures of effect, extensive applications of power analyses, and Bayesian alternatives. The task of reworking statistical training in response to this crisis will be daunting for any text author, not just this one.
One of the strengths of this text is the use of motivated examples underlying each major technique. These examples and techniques are very carefully described with quality graphical and visual aids to support learning. Too many texts that cover basic theory are organized as theorem/proof/example, which impedes understanding of the beginner. This defect is not present here: this text embraces an 'embodied' view of learning which prioritizes example applications first and then explanation of technique.
The consistency of this text is quite good. Notation, language, and approach are maintained throughout the chapters.
It is difficult for a topic that is inherently cumulative to excel at modularity in the manner that is usually understood. Each topic builds on the one before it in any statistical methods course. This text does indicate that some topics can be omitted by identifying them as 'special topics'.
The structure and organization of this text corresponds to a very classic treatment of the topic. It begins with the basics of descriptive statistics, probability, hypothesis test concepts, tests of numerical variables, categorical, and ends with regression. I have seen other texts begin with correlation and regression prior to tests of means, etc., and wonder which approach is best.
This is the third edition and benefits from feedback from prior versions. I found no negative issues with regard to interface elements. It is a pdf download rather than strictly online so the format is more classical textbook as would be experienced in a print version.
Typos and errors were minimal (I could find none).
It is clear that the largest audience is assumed to be from the United States as most examples draw from regions in the U.S. (e.g., U.S. presidential elections, data from California, data from U.S. colleges, etc.) though some examples come from other parts of the world (Greece economics, Australian wildlife). The language seems to be free of bias.
This text is an excellent choice for an introductory statistics course that has a broad group of students from multiple disciplines. The basic theory is well covered and motivated by diverse examples from different fields. This diversity in discipline comes at the cost of specificity of techniques that appear in some fields such as the importance of measures of effect in psychology.
This book covers topics in a traditional curriculum of an introductory statistics course: probabilities, distributions, sampling distribution, hypothesis tests for means and proportions, linear regression, multiple regression and logistic regression. While the traditional curriculum does not cover multiple regression and logistic regression in an introductory statistics course, this book offers the information in these two areas. The book starts with several examples and case studies to introduce types of variables, sampling designs and experimental designs (chapter 1). It would be nice if the authors started with the big picture of how people perform statistical analysis for a data set. Chapter 2 covers probability, including the definition of probability, the Law of Large Numbers, probability rules, conditional probability and independence, and linear combinations of random variables. However, the linear combination of random variables is too math-focused and may not be good for students at the introductory level. Chapter 3 covers random variables and distributions, including the normal, geometric and binomial distributions. Chapters 4-6 cover inference for means and proportions and the Chi-square test. Chapters 7 and 8 cover linear, multiple and logistic regression. The book uses plenty of examples and includes a lot of tips for understanding basic concepts such as probabilities, p-values and significance levels. The book provides an effective index. The drawback of this book is that it does not cover how to use any computer software or even a graphing calculator to perform the calculations for inferences. All of the calculations covered in this book are performed by hand using the formulas. Given current trends in analysis, students will need to use computer software or a graphing calculator to perform the analyses. Calculations by hand are not realistic.
The content of the book is accurate and unbiased. However, when introducing the basic concepts of null and alternative hypotheses and the p-value, the book used different definitions than other textbooks. For example, when introducing the p-value, the authors used the definition "the probability of observing data at least as favorable to the alternative hypothesis as our current data set, if the null hypothesis is true." The wording "at least as favorable to the alternative hypothesis as our current data" is misleading. Students can easily get confused and think the p-value is in favor of the alternative hypothesis.
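To make the quoted definition concrete, here is a minimal sketch of a one-sided p-value computed under the null hypothesis with a normal approximation. The function name and all numbers are hypothetical and chosen only for illustration; they are not taken from the book.

```python
import math

def one_sided_p_value(sample_mean, null_mean, sd, n):
    """Upper-tail p-value: P(sample mean >= observed value) if H0 is true.

    Uses a normal approximation with a known standard deviation
    (an assumption made for this illustration).
    """
    standard_error = sd / math.sqrt(n)
    z = (sample_mean - null_mean) / standard_error
    # Area to the right of z under the standard normal curve.
    return 0.5 * math.erfc(z / math.sqrt(2))

# Hypothetical data: H0 says the mean is 7.0, and we observe 7.4.
p = one_sided_p_value(sample_mean=7.4, null_mean=7.0, sd=1.6, n=100)
print(round(p, 4))
```

The probability is computed entirely under the null hypothesis, so a small value means the observed data would be unusual if the null were true. It does not measure how much the data "favor" the alternative, which is the confusion the reviewer is concerned about.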
The content that this book focuses on is relatively stable and so changes would be few and far between. The content is up-to-date. In particular, this book covers Bayesian probabilities and false negative and false positive calculations. This textbook did not contain many real-world application data sets, which can be a drawback for its relevance to today's data science trend.
The text is written in lucid, accessible prose, and provides plenty of examples for students to understand the concepts and calculations. The text also provides enough context for students to understand the terminologies and definitions. In particular, the textbook provides plenty of tips for each concept, which is very helpful for students trying to understand the material.
The text is quite consistent in terms of terminology and framework. The organization for each chapter is also consistent.
The text is easily and readily divisible into subsections. Each chapter contains short sections, and each section contains small subsections. The text is easily reorganized and re-sequenced. The later chapters (chapters 4-8) build on the knowledge from the earlier chapters (chapters 1-3), but they are self-contained and can be re-ordered among themselves.
The overall organization of the text is logical. The later chapters on inference and regression (chapters 4-8) build on the earlier chapters (chapters 1-3). But there are instances where similar topics are not arranged very well: 1) when introducing sampling distributions in chapter 4, the authors should introduce both the sampling distribution of the mean and the sampling distribution of the proportion in the same chapter. The authors spend many pages on the sampling distribution of the mean in chapter 4, but only a few sentences on the sampling distribution of the proportion in chapter 6; 2) the authors introduce independence before conditional probability. Introducing independence using the definition based on conditional probability, P(A|B)=P(A), is more accurate and easier for students to understand. The order of introducing independence and conditional probability should be switched. The approach of introducing inference for proportions and the chi-square test in the same chapter is novel. Students can easily see the connections between the two types of tests.
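As an illustration of defining independence through conditional probability, the following sketch enumerates a standard 52-card deck and compares P(A|B) with P(A) for two pairs of events. The helper `prob` and the choice of events are assumptions made for this example, not material from the book.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]
DECK = list(product(RANKS, SUITS))  # 52 equally likely (rank, suit) outcomes


def prob(event, given=None):
    """P(event), or P(event | given) when a conditioning event is supplied."""
    space = DECK if given is None else [card for card in DECK if given(card)]
    return Fraction(sum(1 for card in space if event(card)), len(space))


def is_ace(card):
    return card[0] == "A"


def is_heart(card):
    return card[1] == "hearts"


def is_red(card):
    return card[1] in ("hearts", "diamonds")


# Independent: knowing the card is a heart does not change P(ace).
p_ace = prob(is_ace)
p_ace_given_heart = prob(is_ace, given=is_heart)

# Not independent: knowing the card is red changes P(heart).
p_heart = prob(is_heart)
p_heart_given_red = prob(is_heart, given=is_red)

print(p_ace, p_ace_given_heart, p_heart, p_heart_given_red)
```

Using exact fractions keeps the comparison P(A|B) = P(A) free of floating-point rounding.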
The text is free of significant interface issues. The graphs and tables are well designed and accurate, and they help readers understand the material, especially since most of the graphs are in color.
The text contains no grammatical errors.
There is no evidence that the text is culturally insensitive or offensive. Some examples relate to the United States, but most are general and not culturally specific. The text offers quite a lot of examples from medical research, which probably reflects the authors' backgrounds.
Overall, this is a well-written book for introductory-level statistics. The text provides enough examples, exercises, and tips for readers to understand the material. It also offers enough graphs and tables to facilitate reading. The drawbacks of the textbook are: 1) it does not show how to use any computer software or graphing calculator to perform the calculations and analyses; 2) it does not offer any real-world data analysis examples.
This text provides decent coverage of probability, inference, descriptive statistics, bivariate statistics, as well as introductory coverage of the bivariate and multiple linear regression model and logistic regression. Although there are some materials on experimental and observational data, this is, first and foremost, a book on mathematical and applied statistics. Professors looking for in-depth coverage of research methods and data collection techniques will have to look elsewhere. The coverage of probability and statistics is, for the most part, sound. Most essential materials for an introductory probability and statistics course are covered. The authors do a terrific job in chapter 1 introducing key ideas about data collection, sampling, and rudimentary data analysis. Chapters 4-6 on statistical inference are especially strong, and the discussion of outliers and leverage in the regression chapters should prove useful to students who work with small n data sets. Teachers might quibble with a particular omission here or there (e.g., it would be nice to have kernel densities in chapter 1 to complement the histogram graphics and some more probability distributions for continuous random variables such as the F distribution), but any missing material could be readily supplemented. In other cases I found the omissions curious. For instance, the text shows students how to calculate the variance and standard deviation of an observed variable's distribution, but does not give the actual formula. As well, the authors define probability but this is not connected as directly as it could be to the 3 fundamental axioms that comprise the mathematical definition of probability. The authors limit their discussion on categorical data analysis to the chi square statistic, which centers on inference rather than on the substantive magnitude of the bivariate relationship.
I wish they included measures of association for categorical data analysis that are used in sociology and political science, such as gamma, tau b and tau c, and Somers' d. Finally, I think the book needs to add material on the desirable properties of statistical estimators (i.e., unbiasedness, efficiency, consistency). Appendix A contains solutions to the end of chapter exercises. The index is decent, but there is no glossary of terms or summary of formulas, which is disappointing.
From what I can tell, the book is accurate in terms of what it covers. There are some things that should probably be included in subsequent revisions.
Statistical methods, statistical inference and data analysis techniques do not change much over time; therefore, I suspect the book will be relevant for years to come. The key will be ensuring that the latest research trends/improvements/refinements are added to the book and that omitted materials are added into subsequent editions.
The book is clear and well written. All of the chapters contain a number of useful tips on best practices and common misunderstandings in statistical analysis. There are also a number of exercises embedded in the text immediately after key ideas and concepts are presented. I suspect these will prove quite helpful to students. The authors also make GREAT use of statistical graphics in all the chapters. Overall, the book is heavy on using ordinary language and common sense illustrations to get across the main ideas. They draw examples from sources (e.g., The Daily Show, The Colbert Report) and daily living (e.g., Mario Kart video games) that college students will surely appreciate. There are no proofs that might appeal to the more mathematically inclined. There are lots of great exercises at the end of each chapter that professors can use to reinforce the concepts and calculations appearing in the chapter. I also appreciated that the authors use examples from the hard sciences, life sciences, and social sciences. This will increase the appeal of the text.
The book is very consistent from what I can see.
This book can work in a number of ways. A teacher can sample the germane chapters and incorporate them without difficulty in any research methods class. Things flow together so well that the book can be used as is.
The organization is fine. The book presents all the topics in an appropriate sequence.
The interface is fine. I didn't experience any problems. The color graphics come through clearly and the embedded links work as they should.
I didn't see any errors; it looks fine.
The book is not culturally offensive.
Teachers looking for a text that they can use to introduce students to probability and basic statistics should find this text helpful. It might be asking too much to use it as a standalone text, but it could work very well as a supplement to a more detailed treatment or in conjunction with some really good slides on the various topics. I think it would work well for liberal arts/social science students, but not for economics/math/science students who would need more mathematical rigor.
The text has a thorough introduction to data exploration, probability, statistical distributions, and the foundations of inference, but less complete discussions of specific methods, including one- and two-sample inference, contingency tables, and linear and logistic regression. Supposedly intended for "introductory statistics courses at the high school through university levels", it's not clear where this text would fit in at my institution. It includes too much theory for our undergraduate service courses, but not enough practical details for our graduate-level service courses.
The text is mostly accurate, especially the sections on probability and statistical distributions, but there are some puzzling gaffes. For example, it is claimed that the Poisson distribution is suitable only for rare events (p. 148); the unequal-variances form of the standard error of the difference between means is used in conjunction with the t-distribution, with no mention of the need for the Satterthwaite adjustment of the degrees of freedom (p. 231); and the degrees of freedom in the chi-square goodness-of-fit test are not adjusted for the number of estimated parameters (p. 282).
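For readers unfamiliar with the Satterthwaite adjustment mentioned above, a minimal sketch of the Welch-Satterthwaite degrees-of-freedom formula follows. The function name and the summary statistics are hypothetical, chosen only for illustration; nothing here is taken from the book.

```python
def welch_satterthwaite_df(s1: float, n1: int, s2: float, n2: int) -> float:
    """Approximate degrees of freedom for the unequal-variances two-sample t-test.

    s1, s2 are sample standard deviations; n1, n2 are sample sizes.
    """
    v1 = s1**2 / n1  # estimated variance of the first sample mean
    v2 = s2**2 / n2  # estimated variance of the second sample mean
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))


# Hypothetical summaries: equal spreads and sizes recover n1 + n2 - 2.
df_equal = welch_satterthwaite_df(2.0, 10, 2.0, 10)

# Hypothetical summaries with very different spreads and sample sizes.
df_unequal = welch_satterthwaite_df(1.0, 5, 10.0, 50)

print(f"equal case: {df_equal:.2f}, unequal case: {df_unequal:.2f}")
```

The result always lies between min(n1, n2) - 1 and n1 + n2 - 2, so the adjusted reference t-distribution is never more optimistic than the pooled one.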
Some of the content seems dated. For example, there is a strong emphasis on assessing the normality assumption, even though most of the covered methods work well for non-normal data with reasonable sample sizes. Normal approximations are presented as the tool of choice for working with binomial data, even though exact methods are efficiently implemented in modern computer packages. Fisher's exact test is not even mentioned. The section on model selection, covering just backward elimination and forward selection, seems especially old-fashioned.
The prose is sometimes tortured and imprecise. For example: "Researchers perform an observational study when they collect data in a way that does not directly interfere with how the data arise" (p. 13). "Standard error" is defined as the "standard deviation associated with an estimate" (p. 163), but it is often unclear whether population or sample-based quantities are being referred to. Use of the t-distribution is motivated as a way to "resolve the problem of a poorly estimated standard error", when really it is a way to properly characterize the distribution of a test statistic having a sample-based standard error in the denominator.
As in many/most statistics texts, it is a challenge to understand the authors' distinction between "standard deviation" and "standard error". The title of Chapter 5, "Inference for numerical data", took me by surprise, after the extensive use of numerical data in the discussion of inference in Chapter 4. Some topics seem to be introduced repeatedly, e.g., the Central Limit Theorem (pp. 167, 185, and 222) and the comparison of two proportions (pp. 191 and 268). The authors are sloppy in their use of hat notation when discussing regression models, expressing the fitted value as a function of the parameters, instead of the estimated parameters (pp. 325 and 357).
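For reference, the hat-notation distinction at issue can be written as follows. This uses common textbook notation, which is an assumption here rather than a quotation from the book:

```latex
\begin{align*}
y &= \beta_0 + \beta_1 x + \varepsilon && \text{(model, unknown parameters)} \\
\hat{y} &= \hat{\beta}_0 + \hat{\beta}_1 x && \text{(fitted value, estimated parameters)}
\end{align*}
```

Writing the fitted value with the unhatted parameters mixes the two lines.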
The text includes sections that could easily be extracted as modules. For example, I can imagine using pieces of Chapters 2 (Probability) and 3 (Distributions of random variables) to motivate methods that I discuss in service courses.
Chapters 1 through 4, covering data, probability, distributions, and principles of inference flow nicely, but the remaining chapters seem like a somewhat haphazard treatment of some commonly used methods. One-way analysis of variance is introduced as a special topic, with no mention that it is a generalization of the equal-variances t-test to more than two groups. The final chapter (8) gives superficial treatments of two huge topics, multiple linear regression and logistic regression, with insufficient detail to guide serious users of these methods. It is as if the authors ran out of gas after the first seven chapters and decided to use the final chapter as a catchall for some important, uncovered topics.
The interface is nicely designed. The availability of data sets and functions at a website (www.openintro.org) and as an R package (cran.r-project.org/web/packages/openintro) is a huge plus that greatly increases the usefulness of the text.
There are distracting grammatical errors. "Data" is sometimes singular, sometimes plural in the authors' prose. Other examples: "Each of the conclusions are based on some data" (p. 9); "You might already be familiar with many aspects of probability, however, formalization of the concepts is new for most" (p. 68); and "Sometimes two variables is one too many" (p. 21).
I have no idea how to characterize the cultural relevance of a statistics textbook.
In my opinion, the text is not a strong candidate for an introductory textbook for typical statistics courses, but it contains many sections (particularly on probability and statistical distributions) that could profitably be used as supplemental material in such courses.
Table of Contents
About the Book
OpenIntro Statistics 3rd Edition strives to be a complete introductory textbook of the highest caliber. Its core derives from the classic notions of statistics education and is extended by recent innovations. The textbook meets high quality standards and has been used at Princeton, Vanderbilt, UMass Amherst, and many other schools. We look forward to expanding the reach of the project and working with teachers from all colleges and schools. The chapters of this book are as follows:
- Introduction to data. Data structures, variables, summaries, graphics, and basic data collection techniques.
- Probability (special topic). The basic principles of probability. An understanding of this chapter is not required for the main content in Chapters 3-8.
- Distributions of random variables. Introduction to the normal model and other key distributions.
- Foundations for inference. General ideas for statistical inference in the context of estimating the population mean.
- Inference for numerical data. Inference for one or two sample means using the normal model and t distribution, and also comparisons of many means using ANOVA.
- Inference for categorical data. Inference for proportions using the normal and chi-square distributions, as well as simulation and randomization techniques.
- Introduction to linear regression. An introduction to regression with two variables. Most of this chapter could be covered after Chapter 1.
- Multiple and logistic regression. An introduction to multiple regression and logistic regression for an accelerated course.
OpenIntro Statistics was written to allow flexibility in choosing and ordering course topics. The material is divided into two pieces: main text and special topics. The main text has been structured to bring statistical inference and modeling closer to the front of a course. Special topics, labeled in the table of contents and in section titles, may be added to a course as they arise naturally in the curriculum.
About the Contributors
Authors
David M. Diez is a Quantitative Analyst at Google where he works with massive data sets and performs statistical analyses in areas such as user behavior and forecasting.
Christopher D. Barr is an Assistant Research Professor with the Texas Institute for Measurement, Evaluation, and Statistics at the University of Houston.
Mine Cetinkaya-Rundel is the Director of Undergraduate Studies and Assistant Professor of the Practice in the Department of Statistical Science at Duke University.