# Introductory Statistics with Randomization and Simulation - First Edition

David Diez, Google/YouTube

Christopher Barr, Varadero Capital

Mine Çetinkaya-Rundel, Duke University

Copyright Year: 2014

Publisher: OpenIntro

Language: English

## Conditions of Use

Attribution-NonCommercial-ShareAlike

CC BY-NC-SA

## Reviews

The chapters are comprehensive enough to provide a solid introduction. Supplementary materials add to the robustness of explanation and conceptual grounding.

No errors or omissions were noteworthy during my first month's use of this textbook and accompanying resources.

The authors and authoring agency explain that they are moving from this text to the Introduction to Modern Statistics. Still, I enjoy using this book with my students who need additional support and practice in narrow areas of statistics without requiring them to learn R.

Thorough and well explained examples and problem sets.

Formatting and language are consistent throughout the book. As well, the resources are consistent within each domain (i.e., each presenter for the YouTube videos has their own style, and it is consistent within their thread of videos).

Perhaps the strongest use for this text is the modularity of the chapters, videos, and practice opportunities for students.

The book has a clear structure that leads from fundamental bases of statistics to randomization and sampling.

No issues with navigation or viewing content.

No noteworthy errors noticed.

No concerning content detected. As other reviews have noted, there may be some examples which could be updated to be more contemporary and sensitive to societal norms (e.g., a practice problem with the normal distribution using data reported versus population data of online dating profiles, such as how males report their height). This is not inherently concerning, problematic, or offensive, but perhaps a more sensitive approach could add some context to why this data is being shared this way.

Overall a strong textbook that they've rolled into a larger, R-oriented textbook. I prefer to use this book because the accompanying resources are exceptionally well organized, highly useful, and do not require students to use R. For a more advanced statistics class, I would recommend using the updated Introduction to Modern Statistics book from the same open-source text organization.

The book covers content typical of an introductory statistics course, plus a nice chapter on simulation. Given there are only six chapters, it means a lot of content ends up in each chapter, for better or worse. Exercises are numerous, diverse, interesting, and targeted to key concepts.

The authors accurately describe the concepts covered.

The concepts covered are those most relevant for new students of statistics, and little is likely to change to make the book outdated. Including simulation as a general frame of inference future-proofs the book even more.

The language, examples, and narrative of the book are exceptionally clear. Needless technical language and jargon are minimized. The language strikes the right balance between informal enough to be readable by students, but still accurate.

Chapters tend to use many small examples to highlight concepts. The benefit is exposure to lots of cases, and the potential to include a topic that resonates with individual student interests. However, the downside is a higher cognitive load, with lots of quick dives into understanding situations, rather than the deeper dive possible by using a single, reasonably complex, well-developed example throughout a chapter. Having a single focal "story" that is used to illustrate a set of concepts can also help students remember the content in a block of related memory.

The book is consistent in tone, depth, and quality.

The nature of the topic limits how modular chapters might be, as the concepts develop from one to the next. However, if a particular instructor's students already had prerequisite knowledge, the chapters could serve as independent introductions to the next set of topics.

The progression of topics is orderly and sensible, and matches most other textbooks and course organizations in this discipline.

Perhaps it is just my computer, but visually, the pdf could benefit from more whitespace around the pages. With the footnotes and no left/right margins, section breaks, headers, and lines that separate subsections, the pages are graphically very busy. Also, the links to jump to previous sections or figures (i.e. returning to a previous example) for review are great, but in practice, there is no easy way to "jump back" after reviewing, which makes navigation harder than on paper. There must be a different pdf format or toolset to solve this.

The grammar is solid. There are just a few typos.

I saw no content that was worrisome, though perhaps some opportunities to replace more trivial data/examples with examples that are more meaningful to students. That being said, there were some nice examples (e.g., stents and strokes) where the topic is important and the results unexpected, which should motivate students to learn the underlying ideas.

Introducing inference with simulation is likely easier for new learners, and I appreciate the attempt to modernize introductory statistics. However, I was disappointed randomization and simulation were never really discussed again in subsequent chapters. This seems like a missed opportunity to make comparisons and deepen the student's understanding of each. This of course could be done in class by the faculty, though.

Data from the book are available on the web and in an associated R package, which is a great feature. The inclusion of distribution tables seems antiquated - given the tables are provided on the web, surely there are web resources that are easier and more exact to find the needed values. Is there a pedagogical benefit to having students see the tables?

The exposition of materials allows readers to grasp concepts and apply successful knowledge gained to real world examples.

I believe that the text is accurate. I did not find any error or bias.

The text is very relevant and covers most of the ground in Introductory Statistics and beyond. The best selection of examples provides more strength to the text.

The English composition and its sound are accessible to average readers. I believe my students would like the text.

The text is very consistent and the order of presentation is outstanding; the logical sequence is there.

The division of different pieces of the text is amenable to a deep understanding of the text.

The organization structure is there.

The interface is great. I did not have any issues at all.

The English composition is great and accessible to a broader audience.

The text includes several examples that are relevant and culturally sound.

I suggest creating a separate chapter on probability models, including logit models (in the text), probit models (missing), and linear probability models (LPM) (missing). I would like to see discussion of the probability model assumptions and of how to select the best specification (logit, probit, LPM) using the hit rate or the classification table, plus a series of additional model-selection statistics as appropriate. Another request is to extend the presentation and interpretation of results to the estimated probabilities, the odds ratios (missing), and the marginal effects (missing). For ease of interpretation, I suggest the marginal effects and the elasticities (missing), because they are very informative, as reported in well-received scholarship and empirical work. At the very least, one can apply the estimated probabilities and the odds ratios.
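For readers unfamiliar with the quantities this review asks for, the following is a brief sketch for a logit model with a single predictor. It is not taken from the book; the notation ($\beta_0$, $\beta_1$, $p(x)$) is an assumption introduced here for illustration only.

$$
p(x) = P(y = 1 \mid x) = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}},
\qquad
\frac{p(x)}{1 - p(x)} = e^{\beta_0 + \beta_1 x}
$$

$$
\text{odds ratio for a one-unit increase in } x:\quad
\frac{p(x+1)/\bigl(1 - p(x+1)\bigr)}{p(x)/\bigl(1 - p(x)\bigr)} = e^{\beta_1},
\qquad
\text{marginal effect:}\quad
\frac{\partial p(x)}{\partial x} = \beta_1\, p(x)\,\bigl(1 - p(x)\bigr)
$$

Under this sketch the odds ratio is constant in $x$, while the marginal effect depends on where it is evaluated, which is one reason the two are often reported side by side.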

This text provides thorough coverage of a wide variety of introductory statistical concepts. While no text at this level contains all possible topics, this book provides excellent coverage of important concepts for using statistics to examine data. These topics are covered at a level appropriate for an introductory statistics class, especially for an audience of students in the social, physical, and life sciences. The text also includes an effective index.

This text appears to be free of errors.

The underlying topics covered in this text are up to date and are not expected to become obsolete in the foreseeable future. While a few of the examples are a bit dated, this does not interfere with the utility of the text. Further, the examples are incorporated in a manner that would allow for easy updates.

The text is written clearly and reads easily. The subject specific terminology included is relevant, necessary, and incorporated into the text. When necessary, definitions are included, and examples are provided to illustrate new terms.

This text is internally consistent, without apparent inconsistencies in terminology or organization.

The material found in this text is organized into large topic categories, with sections and subsections of increasing specificity. By its very nature, some topics included in this text must be learned as a foundation for later topics. However, this treatment of the topics does not rely on internal references any more than is necessitated by the nature of the material.

This text organizes concepts into the large categories on the basis of the type of data or methodologies covered in that section. This is an intuitive organization that clusters together methodologies for similar types of data or experiments, which may be desirable for the intended audience and level of this text.

This text appears to be free from any interface issues; the graphics are clear, links are functional, and there are no apparent navigational issues.

This text seems to have only very minor grammatical issues, and these do not interfere with the utility of the document. There are some minor issues with changing tense and voice; on occasion it appears that different authors wrote and proofread the text. However, these issues are minor and do not appear to provide a distraction.

This text does not appear to contain any culturally insensitive or offensive material. The examples included do not generally mention the race, ethnicity or background of any human study participants and are sufficiently general to avoid any inference of these characteristics.

This text is accompanied by data sets in the R programming language for many of the examples. This can be a helpful learning tool in statistics courses, especially given that the R language is a freely available, open source language for statistical computing.

The text covers all areas and ideas of the subject appropriately.

The textbook is accurate, error-free, and unbiased.

The text is written and arranged in such a way that necessary updates are easy and straightforward to implement.

The material is presented in a clear and concise manner.

Yes. The text is internally consistent in terms of terminology and framework.

The text is easily and readily divided into smaller reading sections that can be assigned at different points within the course.

The topics in the text are presented in a logical, clear fashion. However, the first chapter covers part of descriptive statistics, which I think should not be the case; descriptive statistics should have its own chapter.

The text is free of significant interface issues. It is in PDF format and can be printed easily.

No grammatical errors.

The textbook is not culturally insensitive or offensive in any way.

Overall, the textbook is a good one. The authors cover in detail how to interpret computer output for regression and what each coefficient represents. They also cover how to interpret the slope and the y-intercept in the context of the problem at hand.

## Table of Contents

1. Introduction to data.

2. Foundations for inference.

3. Inference for categorical data.

4. Inference for numerical data.

5. Introduction to linear regression.

6. Multiple and logistic regression.

Appendix A. Probability.

## About the Book

We hope readers will take away three ideas from this book in addition to forming a foundation of statistical thinking and methods.

(1) Statistics is an applied field with a wide range of practical applications.

(2) You don't have to be a math guru to learn from interesting, real data.

(3) Data are messy, and statistical tools are imperfect. However, when you understand the strengths and weaknesses of these tools, you can use them to learn interesting things about the world.

**Textbook overview**

The chapters of this book are as follows:

1. **Introduction to data.** Data structures, variables, summaries, graphics, and basic data collection techniques.

2. **Foundations for inference.** Case studies are used to introduce the ideas of statistical inference with randomization and simulations. The content leads into the standard parametric framework, with techniques reinforced in the subsequent chapters. It is also possible to begin with this chapter and introduce tools from Chapter 1 as they are needed.

3. **Inference for categorical data.** Inference for proportions using the normal and chi-square distributions, as well as simulation and randomization techniques.

4. **Inference for numerical data.** Inference for one or two sample means using the t distribution, and also comparisons of many means using ANOVA. A special section for bootstrapping is provided at the end of the chapter.

5. **Introduction to linear regression.** An introduction to regression with two variables. Most of this chapter could be covered immediately after Chapter 1.

6. **Multiple and logistic regression.** An introduction to multiple regression and logistic regression for an accelerated course.

**Appendix A. Probability.** An introduction to probability is provided as an optional reference. Exercises and additional probability content may be found in Chapter 2 of OpenIntro Statistics at openintro.org. Instructor feedback suggests that probability, if discussed, is best introduced at the very start or very end of the course.
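To give a flavor of the randomization approach that Chapter 2 introduces, here is a minimal sketch of a two-group randomization test in Python. It is not code from the book: the function name `randomization_test`, the parameter names, and the 0/1 outcome data are all hypothetical, chosen only to illustrate the idea of shuffling group labels to approximate a null distribution.

```python
import random


def randomization_test(group_a, group_b, n_shuffles=10_000, seed=0):
    """Two-sided randomization test for a difference in group means.

    With 0/1 outcomes, the group means are proportions.
    Returns (observed_difference, estimated_p_value).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n_a = len(group_a)
    n_b = len(group_b)
    observed = sum(group_a) / n_a - sum(group_b) / n_b

    pooled = list(group_a) + list(group_b)
    extreme = 0
    for _ in range(n_shuffles):
        # Shuffling the pooled outcomes mimics reassigning group labels
        # at random, which is what the null hypothesis of "no difference"
        # says we are free to do.
        rng.shuffle(pooled)
        diff = sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b
        # Small tolerance guards against floating-point ties.
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
    return observed, extreme / n_shuffles


# Hypothetical data: 1 = success, 0 = failure.
treatment = [1] * 14 + [0] * 6   # 14 successes out of 20
control = [1] * 7 + [0] * 13     # 7 successes out of 20

observed, p_value = randomization_test(treatment, control)
print(f"observed difference in proportions: {observed:.2f}")
print(f"estimated two-sided p-value: {p_value:.4f}")
```

The estimated p-value is the fraction of shuffles that produce a difference at least as large in magnitude as the observed one; more shuffles give a more stable estimate at the cost of run time.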

## About the Contributors

### Authors

**David Diez** is a Senior Quantitative Analyst at Google/YouTube.

**Christopher Barr** is an Investment Analyst at Varadero Capital.

**Dr. Mine Çetinkaya-Rundel** is the Director of Undergraduate Studies and an Associate Professor of the Practice in the Department of Statistical Science at Duke University. She received her Ph.D. in Statistics from the University of California, Los Angeles, and a B.S. in Actuarial Science from New York University's Stern School of Business. Her work focuses on innovation in statistics pedagogy, with an emphasis on student-centered learning, computation, reproducible research, and open-source education.