# Learning Statistics with R: A tutorial for psychology students and other beginners

Danielle Navarro, University of New South Wales

Copyright Year: 2018

Publisher: Danielle Navarro

Language: English

## Conditions of Use

Attribution-ShareAlike

CC BY-SA

## Reviews

This text (Learning Statistics with R ~ Ed. 0.6) covers every major topic one would expect to encounter in an introductory statistics course, and then some. It will teach its readers everything from levels of measurement, random variables, and probability distributions to p-values, ANOVA, Chi-Square, and multiple regression. In fact, one could argue that it’s a little ambitious in its coverage. For example, do instructors really teach “Bayesian analysis” in an introductory course? Many concepts in statistics (and the way they’re typically taught) are the subject of fierce debates among statisticians. Instead of merely giving students formulas and distribution density plots, Dr. Navarro takes the more daring approach of not avoiding the controversies, and instead informs students as much as possible about why certain camps across the disciplines have their preferences. The author often traces the origin of concepts and their most common applications in a way that can only enhance student learning. This book probably contains the most comprehensive, critical, and unbiased discussion I have seen on the most controversial issues in the contemporary practice of statistics: sampling, p-values, degrees of freedom, hypothesis testing, Student T-test, linear regression, etc. The author gives thorough explanations that move the reader from the most elementary facts to the richest explorations of scientific questions. There is even a little blurb about power analysis – but the author acknowledges that it would need to be more deeply explored in an expanded section or chapter. In any case, I am happy to see that this author is a lot more forthcoming about the real usefulness of power analysis and did not encourage undergrads to blindly join the power analysis craze that statisticians like Andrew Gelman have warned us about.
Here’s a book that covers more than most manuals at this level of instruction usually do and still finds the humility to openly state what it could have done better. Bravo! Overall, this is a truly lively book with good examples and a solid R package to support coding and learning R/RStudio.

As far as I know, the book contains no conceptual errors or technically misleading statements. Its discussion of the concepts is extensive, balanced and impartial.

The text is written in a way that should keep it relevant for years to come. However, I am on the fence about the introductory chapters that walk students through the psychology of statistics. I wonder about the effectiveness of such material – maybe some will get it, most will probably not. My advice? Save it for a more advanced course. Also, this textbook’s emphasis is more on offering simple coding tricks than on students learning to work out the calculations with pen and paper. Yes, I prefer the latter. Or at least I use a combination of both. In my view, the speed and convenience that coding provides is only for students who have earned their stripes by doing the procedures with pen and paper first. I firmly believe that students learn better if they can perform the paired t-test by hand before executing it in their R script. Having an R package `lsr` with pre-loaded functions and data to accompany the text is great for students. lsr as a package also incorporates clever solutions and tricks that overcome and extend some of the limitations of the simple functions in base R for describing data and investigating relationships between two or more variables. However, should there be a time in the future when this lsr package is no longer maintained and updated, then students who continue to refer to it, or new cohorts of students, may be in trouble. The standard Student t-test tools in R/RStudio are not likely to go away, but insular packages become obsolete, developers neglect them, etc. There is a potential hazard in learning the basics of statistics using this lsr package instead of the standard R tools... This nice package has its potential downsides too, and the possibility that it might not be maintained in the future is one of them.

The text is written in lucid prose, using a conversational tone that is likely to draw both students and instructors in. It is clear that Dr. Navarro cares deeply about the reader. However, the explanations are not always as clear as they could be. For example, the explanation for the p-value does not land as well as I hoped. Next to the p-value, I find the concept of degrees of freedom the hardest thing to explain to a student unfamiliar with statistical reasoning. This text attempts to be as comprehensive as possible with both concepts. It does a decent job introducing them but uses too many words and, in the end, students may still be unclear about what the p-value is or what degrees of freedom really are. In both cases the explanation involves too much backtracking and delaying, and it is then spread over several sections. By the time the explanation fully unwraps, it is a little too convoluted to sink in. To be fair, I cannot do a better job; I was simply hoping this otherwise great (free) textbook would do better than what I have seen in every other text (which is to simply mention “df” like it’s a natural fact and use it in a formula without any prefatory remark).

The mathematical shorthand and terminology in this text may not necessarily be what one finds in the vast majority of textbooks, but it is simpler and consistent throughout. More importantly, the text is highly consistent in its approach to how statistics should be taught: by offering students full coverage on how concepts and methods are typically used and why there is disagreement over their usage. The author takes great care to underline the pros and cons of competing methods – for example, see its masterful exposition of the two main methods for doing the One-sample T-test. It’s a joy to read! Another example of the text remaining consistent is in the approach to hypothesis testing. For instance, significance levels are not chosen in advance… Rather, students perform the tests and then figure out at what level of significance their result may be relevant. I find this closer to how we actually determine the significance of results in regression models for scientific publication.

The PDF version of this free text is searchable and very easy to navigate. Some, though not all, of the chapters can be used as standalone units. Unfortunately, many references like “as we saw earlier” or “back in chapter Y, we did ...”, or worse "similarly to how we performed a chi-square test in chapter N..." negatively impact the book’s modularity. While these references may make the book more readable as a whole, they certainly get in the way of the adaptability of the sections and subsections as self-contained units. An instructor can teach some of the topics in a different order than how the author presents them, but it will require a little effort in planning. I, for one, cannot assign the chapters in the order they’re presented. But these self-referential disruptions are not necessarily a dealbreaker. This is still an impressive textbook.

From the polished table of contents to the numbered chapters and sections, to the neat graphs and the data tables used in the elaborate examples – it is clear that the text is well-organized and a lot of thought went into the sequence of the units. The summary section at the end of each chapter is golden. It’s the little things that help students remember what’s worth remembering. However, there are some features that could make the material more accessible. The most glaring omission is the lack of practice sets and homework problems. Granted, the examples in the text are very elaborate and easy to follow, but the text sorely lacks exercises and homework problems with solutions that could really help students solidify their knowledge before moving on to the next unit/chapter. I would have preferred a guided lab and some problems with solutions over the chapter on Bayesian analysis that almost no one is going to teach in an introductory course. The book also lacks statistical tables (Z, T, F distributions etc.) – these usually help students do the hypothesis tests quicker. I want students to learn to do the T-test, ANOVA, and Pearson’s correlation test by hand. That way they get to peek behind the curtain of what R does. Printed statistical tables are one set of tools that facilitate that process. I understand why this book excludes this material: the focus is on doing these procedures in R/RStudio. That’s fine, but I wish it encouraged students to practice doing these procedures with a pen and paper first. Still, this is a FREE book, and perhaps we are (really, I am) asking too much of it.

The text’s presentation is nearly flawless. It’s a highly readable book in PDF format with full search functionality. My only gripe does not concern the book itself, but rather the supplemental materials associated with the lsr package to which students are introduced in this text. More specifically, the book doesn’t always tell students how to get the data they will need in order to follow along with an example. Yes, the data sets are posted online and can be downloaded, but… The data sets are NOT pre-loaded with the lsr library on CRAN. Why not? That would make everything so much simpler! In my experience, most undergraduates are notoriously bad at downloading and opening data sets in R/RStudio. Perhaps other instructors have had better luck, but I have wasted far too many workshop hours helping students figure out where their lost files are hiding in the nooks and crannies of their expensive laptops (unfortunately, you can’t do much statistical analysis on a phone, the device on which they're experts). I have lost that battle, and it’s ok. They will learn how to use their computer in another class beyond mine. For now, I just want them to be able to complete the coding exercises without me facing an avalanche of emails that all begin the same way: “HELP: I can’t find my data”. To Dr. Navarro, my most urgent plea is this one: please include the data sets used in the text with the next version of the lsr library so we can simply load them in RStudio. Then, I will adopt this textbook and not look back. Not for a long time...

It is hardly ever the case that a text of this size contains no typos or grammatical errors. For a free text, however, it contains surprisingly few errors.

This text presents concepts in a way that is culturally neutral for the most part. I simply wish that the word “Psychology” was not splashed on the cover – why potentially limit your audience? I know several social scientists who are likely to skip this text simply because of the title -- they might assume it is better tailored to the needs of psychology students. Utter rubbish: a good intro to statistics book is hard to find across the board, so I find the title and the attempt at intra-disciplinary location a self-imposed limitation that is hardly justified. Giving advice on how to report the results of a hypothesis test, the author writes: “the best advice I can give is to suggest that you look at papers/reports written in your field and see what the convention seems to be” -- this clearly assumes that the book is aimed at readers from a variety of disciplines. Thus, the term “psychology” slapped on the cover is misleading. While some statistical techniques may be more familiar to members of one academic discipline versus another, I don’t see what is to be gained by falsely signaling to students that there is a “statistical science” for psychology, a different one for economics and another for sociology, etc...

Let’s be honest: most introductory statistics textbooks are just not that interesting. As an instructor, I typically dread adopting one almost as much as my students dread reading it. Dr. Navarro has given us a book (literally) that may change that. It is rare that a paid textbook is this good, let alone one that is entirely free. I won’t argue that this book is perfect – it is not, but it is an impressive contribution for which we should be highly grateful. There are a few choices that I found questionable, but most of the content is presented fairly and judiciously. Here is a text that both instructors and students might actually enjoy reading. This book is especially well suited for instructors who teach a sequence of two or three courses in statistics. Because there is so much content, it easily lends itself to continued learning from introductory level to intermediate courses (with some supplemental materials, of course). It’s perhaps the best book I have encountered on the subject – and it’s free! And it uses R. Enough said.

This book covers the fundamentals of both statistics and R programming. I would suggest adding a little touch of Bayesian statistics to the Statistical Theory section, given the broad application of Bayesian inference in psychology.

Content is accurate, error-free and unbiased in my opinion.

Contents are up-to-date.

Delivery of statistical theories in this book is clear, with the support of reproducible examples and relevant practice questions.

Terminology is consistent throughout the book.

Each chapter is divided into accessible modules that can be assigned to the course segments by demand.

It would be better if the author could add a brief "landmark" at the beginning to help readers decide: which chapters do I need to read if I want to learn X given Y time? For example, which chapters should I read if I want to learn about the chi-square test so that I can i) work on a dataset and interpret the results in next week's presentation, or ii) develop an in-depth understanding of the analysis and use it in my thesis in half a year?

Interface is clear and compact, my favorite style!

The text contains no grammatical errors to the best of my knowledge.

I feel the text is not culturally insensitive or offensive in any way.

The book did a very good job of gently working students up to analyses in R. The text was clear and incorporated existing datasets that students (and faculty) could use to engage in hands-on learning.

Because of the plethora of R packages, there will always be some discussion about which packages are "easier" to use or more appropriate for particular sections. However, this text does a good job of guiding students toward the construction of an R package toolbox that is appropriate for social science analysis.

The only significant changes that are likely to be needed in the book are alterations to, or selection of, different packages if and when they arise. Those changes should be relatively easy to make.

This text is very clear. There is some jargon with R that one must become accustomed to, but once students understand the jargon (which is really essential to understanding how the R environment works) the text is clear and easy to understand.

No real comments here. The text is as consistent as one would expect from a book teaching students statistics in R.

The text has clear section delineation. As is true of many "beginner" texts teaching a particular statistical platform, the units largely build off one another. So, although the sections are very well delineated, I would not recommend rearranging the chapters as that would likely not benefit students.

The structure of the book made good sense and made R feel more accessible.

My one comment would be a link (at the end of sections or chapters) to take the reader back to the Table of Contents.

No significant errors.

I did not see any cultural insensitivities in my review of the book.

This text, version 0.6, clocks in at over 600 manuscript pages (to date no version has been typeset) -- but the length is worth it to gain great coverage. Navarro covers not only everything you could expect to learn in a two-course sequence of undergraduate behavioral science statistics -- descriptive statistics, probability, analysis of variance, regression, and a very welcome chapter on Bayesian approaches -- but also how to implement a lot of data description and analysis in R, step by step. It is an effective and useful mashup of these two topics. It does not have an index or glossary. I did not miss them, as it’s easy to use the search function to find the first instance of any term within the text, and Navarro is very good about defining and contextualizing new terms clearly as she goes.

Version 0.6 appeared free of errors as far as I could see. Furthermore, it did a nonpartisan job of framing debates that are throwing out a lot of light at the moment (such as the debates between proponents of frequentist vs. Bayesian approaches). Navarro’s approach is exemplary: she carefully contextualizes the issues at stake, explains why she feels as she does, and provides useful resources to follow up in more detail.

The content of the book seems up-to-date; indeed, at the moment I write this, the book has emerged as a common recommendation on social media for those hoping to learn R, so it’s clear that it is broadly seen as relevant. Navarro has already implemented several revisions, showing that necessary updates are easy and straightforward to incorporate. The core information in the book (statistics) is nearly timeless and should not need constant updating.

Clarity is absolutely paramount when one is attempting to learn a new skill -- or to learn two new skills, R and statistics, as is envisioned here. Navarro is an extremely useful guide to this process: it’s as if she takes your hand and walks you through step by step, so that learning these new skills is quite painless. Version 0.6 features clear and carefully chosen examples, no doubt honed over the prior versions. As noted above, new terms are clearly defined (almost obviating the need for a glossary or index).

Navarro has thought carefully about when/where and how to introduce new concepts, and then is thoughtful in using them consistently. She includes summary sections at the end of each chapter that are more helpful than ‘typical’ summaries, and useful sample R code is provided where appropriate.

Where possible, the book is modular. For example, a reader who is reasonably competent in statistics but using the book to learn R would have no trouble using the Table of Contents and closing chapter summaries to jump right to a specific section (say on graphing) that captures what they need to learn. A reader who is trying to use the book to learn statistics would be best advised to go in order, as later topics build on earlier ones and trying to do an end-run around this organization is ill-advised, but this is not a flaw of the text per se; it is inherent in learning about the topic.

The book is organized carefully and intentionally. There is a ‘received organization’ to most texts that introduce behavioral statistics, building in terms of complexity (for example, covering t-tests before moving to analysis of variance). The book follows this in the relevant sections (most of its second half). There was more scope for choice and design in the book’s first half, which introduces all of the topics the reader will need to understand before diving into implementing statistical learning in R. Here, Navarro has done a fantastic job of making choices that are friendly to the reader.

The text is rendered as a PDF, and everything is laid out quite cleanly, with helpful clickable links in the Table of Contents to each section.

The text was very competently copy-edited (despite still being in, as it were, beta-testing) and did not appear to contain any unintentional errors.

I found nothing culturally insensitive or offensive in the text. The author is not based in North America, which was occasionally lightly apparent, which I consider all to the good.

A valuable asset of this book is its congenial tone. Navarro is chatty and funny, sometimes even a bit irreverent, and the reader benefits quite a bit from this well-calibrated conversational tone.

## Table of Contents

I. Background

- Chapter 1: Why do we learn statistics?
- Chapter 2: A brief introduction to research design

II. An introduction to R

- Chapter 3: Getting started with R
- Chapter 4: Additional R concepts

III. Working with data

- Chapter 5: Descriptive statistics
- Chapter 6: Drawing graphs
- Chapter 7: Pragmatic matters
- Chapter 8: Basic programming

IV. Statistical theory

- Prelude
- Chapter 9: Introduction to probability
- Chapter 10: Estimating unknown quantities from a sample
- Chapter 11: Hypothesis testing

V. Statistical tools

- Chapter 12: Categorical data analysis
- Chapter 13: Comparing two means
- Chapter 14: Comparing several means (one-way ANOVA)
- Chapter 15: Linear regression
- Chapter 16: Factorial ANOVA

VI. Other topics

- Chapter 17: Bayesian statistics
- Chapter 18: Epilogue
- References

## About the Book

*Learning Statistics with R* covers the contents of an introductory statistics class, as typically taught to undergraduate psychology students, focusing on the use of the R statistical software. The book discusses how to get started in R as well as giving an introduction to data manipulation and writing scripts. From a statistical perspective, the book discusses descriptive statistics and graphing first, followed by chapters on probability theory, sampling and estimation, and null hypothesis testing. After introducing the theory, the book covers the analysis of contingency tables, t-tests, ANOVAs and regression. Bayesian statistics are covered at the end of the book.

## About the Contributors

### Author

**Danielle Navarro, PhD** is a computational cognitive scientist at the University of New South Wales. Her research focuses on human concept learning, reasoning and decision making. She is also interested in language and cultural evolution, cognitive development, and statistical methods in the behavioural sciences.