# Collaborative Statistics

Barbara Illowsky, De Anza College

Susan Dean, De Anza College

Pub Date: 2012

ISBN 13: 978-0-9787450-7-3

Publisher: OpenStax CNX

## Conditions of Use

Attribution

CC BY

The text covers most of the areas that would normally be included in an introductory course with a few exceptions that I will note later. The index is definitely not effective and I feel that the glossary, while complete, needs revision. Text: The only major topic that is omitted is experimental design but that is not an important omission unless the course is for science or social science students. There is no section on ethics but very few Statistics texts include such a section. Probability plots are not covered and the chapter on regression makes no reference to residual plots which is highly unusual. In my opinion the biggest thing this textbook is missing is motivation for studying statistics. Statistics plays a huge part in trying to answer many important questions and this text gives little or no indication of this. The examples and problems generally deal with uninteresting questions predominantly with made up data. Even when the data is real there is rarely any motivation given or apparent reason to analyze it. Here is an example (pages 398-399) from Chapter 9, Hypothesis Testing: Single Mean and Single Proportion which is typical of most of the student generated questions in the chapter. “NOTE: The following questions were written by past students. They are excellent problems! Exercise 9.16.18 18. "Asian Family Reunion" by Chau Nguyen Every two years it comes around We all get together from different towns. In my honest opinion It's not a typical family reunion Not forty, or fifty, or sixty, But how about seventy companions! The kids would play, scream, and shout One minute they're happy, another they'll pout. The teenagers would look, stare, and compare From how they look to what they wear. The men would chat about their business That they make more, but never less. Money is always their subject And there's always talk of more new projects. The women get tired from all of the chats They head to the kitchen to set out the mats. 
Some would sit and some would stand Eating and talking with plates in their hands. Then come the games and the songs And suddenly, everyone gets along! With all that laughter, it's sad to say That it always ends in the same old way. They hug and kiss and say "good-bye" And then they all begin to cry! I say that 60 percent shed their tears But my mom counted 35 people this year. She said that boys and men will always have their pride, So we won't ever see them cry. I myself don't think she's correct, So could you please try this problem to see if you object?” I am not sure what hypothesis I am being asked to test here. I would certainly disagree with it being described as an excellent problem. While many of the student generated problems are similar to this one there was one about the endings of Japanese girl’s names (9.16.25 Page 402) that I found quite interesting. Index: The index clearly had little or no human input. As well as reasonable entries the index includes a host of random words. For example, the index includes 80 references for the word “elementary” and 186 references for the word “statistics”. It also includes references for many words such as “answer”, “box”, “word”, “good” and “two” that should not be in any index. Glossary: I would rate the glossary as somewhat effective. The glossary is fairly complete but I believe that many of the entries should be rewritten. It includes some minor errors such as the definition of a geometric distribution “The probability of exactly x failures before the first success is given by the formula: P (X = x)= p (1- p)^(x-1).” In at least one case an entry is given with no definition. Some of the other definitions are somewhat unclear. For example: Mutually Exclusive An observation cannot fall into more than one class (category). Being in more than one category prevents being in a mutually exclusive category. Standard Normal Distribution A continuous random variable (RV) X~N (0, 1) .. 
When X follows the standard normal distribution, it is often noted as Z~N (0, 1). Other definitions just don’t match my preferences. For example the definition of correlation includes the so called computational formula which I feel doesn’t belong in any statistics textbook. I also didn’t like the definition of “Random Variable” being given under the heading “Variable”. Doing that accentuates the confusion between a variable in algebra and a random variable in probability.
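To make the geometric-distribution slip concrete, here is a minimal sketch of the two standard conventions. The function names and the value of `p` are illustrative assumptions, not taken from the textbook. The quoted formula, p(1 - p)^(x-1), matches the "number of trials up to and including the first success" convention, whereas the quoted wording ("exactly x failures before the first success") calls for p(1 - p)^x.

```python
def pmf_failures_before_success(x: int, p: float) -> float:
    """P(X = x) when X counts failures before the first success (x = 0, 1, 2, ...)."""
    return p * (1 - p) ** x


def pmf_trials_until_success(x: int, p: float) -> float:
    """P(X = x) when X counts trials up to and including the first success (x = 1, 2, ...)."""
    return p * (1 - p) ** (x - 1)


p = 0.25  # hypothetical success probability, chosen only for illustration

for failures in range(4):
    trials = failures + 1
    # The two conventions agree once x is shifted by one.
    print(
        failures,
        pmf_failures_before_success(failures, p),
        pmf_trials_until_success(trials, p),
    )
```

The two columns printed for each row are equal, which is why the glossary entry reads as a wording mismatch rather than a conceptual error.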

The content is generally accurate and unbiased, although I am not sure what a biased statistics text would look like. There are some errors such as the previously mentioned definition of the geometric distribution which is not much more than a typo and the occasional more serious error such as the statement: “True random sampling is done with replacement.” on page 20. In my opinion, virtually every graph in the chapter on graphing is done badly but they are not really errors.

This text is a mix, up-to-date in some ways, quite old fashioned in others. It makes good use of graphical calculator technology using the calculator to calculate probabilities rather than using antiquated tables although the tables are still included if an instructor prefers to use them. It also uses the graphical calculator in all aspects of statistical analysis. If you are convinced that a graphical calculator is the best technology to use when teaching introductory statistics, this is one of the primary strengths of the text. The fact that it includes no other technology is a weakness. For example, the text gives long detailed instructions for creating frequency tables and histograms from scratch. I do not feel that this section was done well and even done well it should have disappeared 30 years ago. The text correctly indicates that the normal approximation to the binomial is no longer necessary with the technology that is currently available. However it then uses the same normal approximation when doing inference with proportions. While this is still the norm for introductory classes and should probably be included, it would have been nice to include a justification for using the normal approximation after saying it isn’t necessary. One of the first sections I look at when I review a text for possible adoption is the section on comparing means using independent samples. The more modern texts use the Welch’s t-test. That is the test used by this text so for me that is a positive. However it follows that section with a long section using the assumption that the variances are known. The variances are never known so the only justification for including such a section is as a lead-in for Welch’s t-test. In that case it should be much shorter and should be included first as was done in the single population chapter. 
While the text indicates “In practice, we rarely know the population standard deviation.” (I would replace rarely by never) it devotes more space to the case when the variance or variances are known than when they are unknown. I also check to see if the text differentiates between large and small sample inference for means since there is no reason to do so. This text does not differentiate and it says why which is another plus. As I have mentioned before, this text gives very few examples of what statistics is being used for. Since few of the examples or problems are topical, it will take them a long time to become dated. I would consider this to be a minus but in the context of this question it might be considered a plus. The textbook is written in a way that updates and revisions will be straightforward to implement but in my opinion, so many are needed before I would consider adopting this text that it would not be easy.
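For readers less familiar with the Welch's t-test mentioned above, the following is a minimal sketch of the statistic and its approximate degrees of freedom (the Welch-Satterthwaite formula). The function name and the two samples are hypothetical and are not drawn from the textbook; the point is only that no known or equal variances are assumed.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t statistic, approximate degrees of freedom) for Welch's t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard errors of each sample mean, using sample variances.
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t_stat = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t_stat, df


# Hypothetical data, for illustration only.
group_a = [1, 2, 3, 4, 5]
group_b = [2, 4, 6, 8, 10]
t_stat, df = welch_t(group_a, group_b)
print(t_stat, df)
```

The degrees of freedom always fall between the smaller sample size minus one and the pooled value n_a + n_b - 2, which is one reason the procedure needs technology rather than tables.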

The text is written very clearly in some places, less so in others. It gives a very clear, step-by-step set of instructions for taking a small simple random sample from an already given sampling frame. However, no mention is made of how difficult it is to create a sampling frame for a large population and no mention is made of how a large simple random sample could be taken from a sampling frame. It also gives relatively clear instructions on how to create a frequency table and histogram, including detailed instructions for calculating the number of bars of width 1 required to graph data consisting of the integers 1, 2, 3, 4, 5, and 6. (Spoiler: the answer is 6.) It does a pretty good job of relating decisions using p-values to the concept of rare events. Other parts are less clear. My guess is that no one in a class of tourism students would get anything from the chapter on analysis of variance. It contains lots of jargon with very little context. For example, this is how the description of the F test starts out: “To calculate the F ratio, two estimates of the variance are made. 1. Variance between samples: An estimate of σ^2 that is the variance of the sample means multiplied by n (when there is equal n). If the samples are different sizes, the variance between samples is weighted to account for the different sample sizes. The variance is also called variation due to treatment or explained variation. 2. Variance within samples: An estimate of σ^2 that is the average of the sample variances (also known as a pooled variance). When the sample sizes are different, the variance within samples is weighted. The variance is also called the variation due to error or unexplained variation.” While most of the text is written clearly, I feel that a general shortcoming throughout this textbook is that it does not provide sufficient context for the techniques it looks at.
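The quoted description of the F ratio can be turned into a short calculation, which may show what the jargon refers to. This is a minimal sketch for the equal-sample-size case only; the function name and the three groups are hypothetical and do not come from the textbook.

```python
from statistics import mean, variance


def f_ratio_equal_n(groups):
    """F ratio for one-way ANOVA when every group has the same size n."""
    n = len(groups[0])
    if any(len(group) != n for group in groups):
        raise ValueError("this sketch handles equal group sizes only")
    # Variance between samples: variance of the sample means, multiplied by n.
    between = n * variance([mean(group) for group in groups])
    # Variance within samples: average of the sample variances (pooled variance).
    within = mean([variance(group) for group in groups])
    return between / within


# Hypothetical measurements for three treatments, for illustration only.
treatments = [[1, 2, 3], [2, 3, 4], [6, 7, 8]]
print(f_ratio_equal_n(treatments))
```

A large ratio means the group means differ by more than the scatter within groups would explain, which is the context the quoted passage leaves out.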

The text is consistent in terms of terminology and framework.

The text is easily and readily divisible into smaller reading sections; it is not overly self-referential and should be easily reorganized to the extent that any statistics text could be.

The organization is similar to most old-school intro stats texts and while it is not the same as what I use I am sure that it conforms to the organization that many instructors use. The only really awkward place that I noticed was introducing box-plots before measures of centre or location. It meant that the authors had to define quartiles and medians in that section and then define them again later. It would be easy to move the section on box-plots after the discussion of quartiles and medians.

I was working from the pdf file so I cannot comment on these issues.

I did not notice any grammatical errors.

The text is not culturally insensitive or offensive in any way. The names it uses in its examples are inclusive of a variety of ethnicities.

The six recommendations of the GAISE (Guidelines for the Assessment and Instruction in Statistics Education) college report prepared for the American Statistical Association are:

1. Emphasize statistical literacy and develop statistical thinking
2. Use real data
3. Stress conceptual understanding, rather than mere knowledge of procedures
4. Foster active learning in the classroom
5. Use technology for developing conceptual understanding and analyzing data
6. Use assessments to improve and evaluate student learning

This textbook does an excellent job on points 4 and 5. There are many group exercises throughout the text. It is a conscious focus of the text and is its primary strength. The textbook is also based on the use of a graphic calculator. While I feel that it is a poor tool for doing statistics, it is a reasonable tool for use in an introductory statistics class. This textbook does an excellent job of integrating it into the curriculum. This is the other strength of the textbook. However, as I mentioned earlier, I feel that ignoring other technologies is a weakness. The book also is less successful in stressing conceptual understanding rather than mere knowledge of procedures, point 3. For example, in the chapter on sampling it gives brief descriptions of different sampling methods but says nothing about the conditions under which one method is better than another. It lists possible problems in sampling but gives no context. Another example is that it lists the properties of correlation but doesn’t relate them to data and the only formula given is the computational formula which I feel has no pedagogical value whatsoever. It uses some real data but I don’t feel that it uses enough. The real data it uses does involve the students in the collection of data, making that data more relevant and fostering active learning, obviously a good thing. However, it does not include much data that was used to answer interesting questions.
I feel that the critical failure of this textbook is that it doesn’t do a good job of teaching statistical thinking. Far too often it emphasizes how to do questions in a textbook rather than how to do statistics. This is a consistent focus throughout the text. Here are a few examples: These listed learning outcomes all talk about textbook questions: “By the end of this chapter, the student should be able to:” “Classify discrete word problems by their distributions.”(Chapter 4 Page 159) “Classify continuous word problems by their distributions.”(Chapter 7 Page 281) “Discriminate between problems applying the normal and the student-t distributions.” (Chapter 8 Page 319) As it introduces confidence intervals for proportions it does so in the context of a textbook problem: “How do you know you are dealing with a proportion problem? First, the underlying distribution is binomial. (There is no mention of a mean or average.)” (Page 331) In the discussion of using hypothesis testing to make decisions on page 375: “A systematic way to make a decision of whether to reject or not reject the null hypothesis is to compare the p-value and a preset or preconceived α (also called a "significance level"). A preset α is the probability of a Type I error (rejecting the null hypothesis when the null hypothesis is true). It may or may not be given to you at the beginning of the problem.” When working an example of a test for two means: “Example 10.1: Independent groups The average amount of time boys and girls ages 7 through 11 spend playing sports each day is believed to be the same. … Is there a difference in the mean amount of time boys and girls ages 7 through 11 play sports each day? Test at the 5% level of significance.” “The words "the same" tell you Ho has an "=". Since there are no other words to indicate Ha, then assume "is different." 
This is a two-tailed test.” Another example of the lack of statistical thinking is that while the textbook mentions the assumptions for the various procedures, it never indicates how to assess whether they are reasonable for a particular set of data. The only assumption checking it does is again based on textbook questions rather than data. For example (Page 381): “Example 9.13 Statistics students believe that the mean score on the first statistics test is 65. A statistics instructor thinks the mean score is higher than 65. He samples ten statistics students and obtains the scores 65; 65; 70; 67; 66; 63; 63; 68; 72; 71. He performs a hypothesis test using a 5% level of significance. The data are from a normal distribution. … “Distribution for the test: If you read the problem carefully, you will notice that there is no population standard deviation given. You are only given n = 10 sample data values. Notice also that the data come from a normal distribution. This means that the distribution for the test is a student’s-t.” Since the data are given for the question, the decision on whether to use a t-test should be based on the data, not artificially given in the statement of the question. While this textbook does an excellent job of integrating graphical calculators and includes a large number of collaborative exercises it does not come close to matching my needs for a textbook for an introductory statistics course. I feel that the first three recommendations of the GAISE college report are all critical and I do not believe that this textbook adequately addresses any of the three. I personally would not consider adopting it without extensive revision.
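As an illustration of basing the decision on the data, the sketch below uses the ten scores quoted from Example 9.13. It computes the one-sample t statistic against the hypothesized mean of 65 and builds the coordinates of a normal probability plot. The plotting positions (i - 0.5)/n are one common convention and are an assumption here, not something the textbook specifies.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# Scores quoted in Example 9.13; 65 is the hypothesized mean.
scores = [65, 65, 70, 67, 66, 63, 63, 68, 72, 71]
mu_0 = 65

n = len(scores)
t_stat = (mean(scores) - mu_0) / (stdev(scores) / sqrt(n))
print("t statistic:", round(t_stat, 4))

# Normal probability plot coordinates: theoretical normal quantiles paired
# with the ordered data. Roughly linear points suggest that a normal model
# is not unreasonable for these data.
standard_normal = NormalDist()
normal_scores = [standard_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
plot_points = list(zip(normal_scores, sorted(scores)))
for quantile, score in plot_points:
    print(round(quantile, 3), score)
```

With only ten observations such a plot is a rough check at best, but it is a check made on the data rather than an assumption supplied in the wording of the question.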

The text covers most of the topics I teach in an Introductory Statistics course, and covers them at the appropriate depth. Two omissions are Experimental Design and Bayes' Theorem. I would like to see more detailed coverage in some areas, such as Sampling and Bias, the Central Limit Theorem for Proportions, and a few others. An explicit explanation of the Scales of Measurement would also be helpful in the discussion on Data and Variable. In Regression I would like to see a discussion on why we should not make predictions outside of the data range. On the other hand, some areas receive more coverage than they should. In the Linear Regression and Correlation chapter, for example, I could do with a lot less manual calculation and sketching of the Least Squares line. But overall, coverage and depth are satisfactory. I am able to find what I am looking for in the index. The glossary looks fine.

The definition of Median (p.59) is incorrect if there are repeated values in the data. I understand that, from a pedagogical point of view, it is sometimes preferable to present students, especially at the introductory level, with a 'simplified' definition that they can understand intuitively as opposed to a technically correct one that may confuse or discourage learning. Even so, a footnote explaining how this definition may not work in some situations is needed. I did not find any other 'errors,' although some definitions, in my opinion, could be better worded.

I share the authors' philosophy in making the text contemporary without giving it too short of a shelf life. Most of the examples and exercises are from made-up data. One advantage of this is, unlike real-life data examples, they are not dated, and therefore will not quickly become outdated. Some of the real-life data examples and exercises are student-generated. While this is an excellent way to promote student involvement, I feel that better guidance is needed. For instance, almost all of the student-generated exercises on pp.397-404 were written in verse. Were they instructed to do so? I appreciate originality, and writing Statistics problems in verse is original – unless everyone else is doing it too. Many of these exercises are also lacking from a technical point of view.

The language itself is good. It strikes the right balance of accessibility and technical accuracy. This is very important for an Introductory Statistics text, where the main challenge for the instructor is to explain complicated and subtle concepts to students with limited mathematical background, many of whom are ESL students. But some explanations could be better worded. The definitions of type of data and type of variables are confusing. An explicit discussion on scale of measurement is needed.

I see no problem with respect to consistency of terminology. The group exercises are also consistent with the text's collaborative approach.

I usually think of Correlation as an introduction to Regression, and I treat the two as related but separate topics. In the text, they are enmeshed. But other than this, I see no problem with modularity. Although the sequence of topics I use is different from the text's (e.g., I do Correlation and Regression before Probability), I don't see any problem with this, as I can easily 'jump around' the text.

I see no problem with the text in this respect.

I see no problem with this item.

I found a couple of minor typos: p. 18 last paragraph should read: “Any group of n individuals is equally likely to be chosen as any other group of n individuals.” p. 532, the second sentence in the first bullet under “The assumptions underlying the test of significance are:” should read “In other words the expected value of y for each particular x value lies on a straight line in the population.” But these are minor, and I did not notice any others.

The examples in the text are inclusive of the cultures that made up the Canadian mosaic. Other than race and ethnicity, it is also important to me that a text is inclusive of people from different economic backgrounds. This text does that. In addition to business examples that refer, for example, to sales figures in the millions of dollars, there are also many examples of situations that working class or middle class people would find themselves in. More examples of small businesses or non-profit would be welcomed.

Most if not all of the examples involving politics are American. It would be nice to see examples involving Canadian political institutions, geography, etc. I think the text does an excellent job in facilitating students' participation and collaboration. Where it falls short is in encouraging Statistical thinking. There is too much rote doing and recipe-following (e.g. calculation of the least-squares line), and not enough discussion on why one should choose one statistical procedure over another. This is a serious shortcoming in my opinion. While I consider this text a valuable resource, I will not be adopting it for my class.

This textbook is very long and covers a certain scope of material very completely at the level it targets. The number of procedures covered starting in Chapter 8 and running to Chapter 13 is very large. However, an instructor using a textbook like this would find the comprehensiveness over the top, I believe. For instance, probability runs from page 113 to page 251 or so. There is extensive discussion of special distributions: Binomial, Geometric, Hypergeometric, Uniform, Exponential and finally Normal. This is much more probability than we would ever do in our courses -- other than our calculus-based course. One of the things I like least about elementary statistics courses is that we continue to teach students to use tables when we never ever use them ourselves. I understand that we find it easiest to give tests where students can use tables but we really need, as a discipline, to move beyond that. There is some focus on computing probabilities using tables but the book does see that tables are no longer really useful or used. Unfortunately the solution adopted here relies on calculators rather than computers. This makes it unsuitable for a number of our courses at SFU where computing must be part of the syllabus. For students in the social sciences there are some gaps: the language of scales of measurement (nominal, ordinal, ratio and interval) is missing, and the discussion of cross tabulation, contingency tables and measures of association seems to be limited to illustrating probability calculations and then a short section on tests of independence in Chapter 11. My own view is that the explanation of the interpretation of independence is a bit thin. I also notice that quite a number of the contingency table examples have either rows or columns or both which have ordered categories. The usual Pearson chi-squared test is generally a bad idea in this context. I prefer illustrations where the suggested technique is likely to be a good technique.
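One way to see why the usual Pearson test sits awkwardly with ordered categories is that its statistic ignores the ordering altogether. The sketch below, with a hypothetical 2 x 3 table whose columns are imagined as low, medium and high, computes the statistic for the table and for the same table with its columns shuffled. The function name and counts are illustrative assumptions.

```python
def pearson_chi_squared(table):
    """Pearson chi-squared statistic for a two-way table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Hypothetical counts; columns are ordered categories: low, medium, high.
ordered = [[20, 15, 5], [10, 15, 20]]
# The same columns rearranged as medium, high, low.
shuffled = [[15, 5, 20], [15, 20, 10]]

print(pearson_chi_squared(ordered), pearson_chi_squared(shuffled))
```

Both calls give the same value, so any trend across the ordered categories contributes nothing extra to the test, which is the information a method designed for ordinal data would use.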

I have only a few complaints here. I read phrases I didn't like from time to time but I always feel that way when I read texts. Here are some examples, though, including at least one which bothers me: "True random sampling is done with replacement." Page 20. I would not say this to students -- as if sampling without replacement were some inferior form of survey. "When you analyze data, it is important to be aware of sampling errors and nonsampling errors. The actual process of sampling causes sampling errors. For example, the sample may not be large enough." Page 21. I really don't like joining sample size to the issue of "sampling errors". "For example, in a college population of 10,000 people, suppose you want to randomly pick a sample of 1000 for a survey. For any particular sample of 1000, if you are sampling with replacement, the chance of picking the first person is 1000 out of 10,000 (0.1000); the chance of picking a different second person for this sample is 999 out of 10,000 (0.0999); the chance of picking the same person again is 1 out of 10,000 (very low)." Page 21. I really don't like this one. What does it mean to say "the chance of picking the first person is 1000 out of 10,000 (0.1000)"? This chance seems to distinguish some group of 1000 people from a group of 9000 people. Who are these people? The 1000 people in the sample? What is meant by "the first person"? Then why is 999 out of 10,000 the right probability of anything? It feels like the authors didn't think through what they were saying here very carefully; I hope that does not reflect a general pattern but I confess that I have not read the whole book with the sort of attention needed to spot this sort of problem. "1.8 Answers and Rounding Off" on page 26. I think this is fine but do non-science students these days really understand phrases like "carry your final answer one more decimal place"? Is the bar graph in example 2.4 a good idea?
Indeed is it a good idea to have age groups 13-25 (thirteen years) and then 26-44 (19 years) and 45-64 (20 years)? I don't think so; a histogram here would have quite different bar widths. Even if that is the way the data came from the source we have an obligation to try to help people understand what sort of groups they ought to make. The three dimensional graphs in Example 2.5 probably ought to be discouraged, I think. "Sampling Distributions and Statistic of a Sampling Distribution". This is the title of subsection 2.7.2 on page 69. What is "Statistic of a Sampling Distribution"? This little subsection contains the phrase "If you let the number of samples get very large (say, 300 million or more), the relative frequency table becomes a relative frequency distribution." If you look at Table 2.6 you are entitled to ask if that contains 1 sample or 30 samples and then ask what it means to "let the number of samples get very large". On page 74 I see mu-bar in the formulas for population standard deviation. "The statistic of a sampling distribution was discussed in Descriptive Statistics: Measuring the Center of the Data." Page 74. Really? I still attach no meaning to the first 6 words of that sentence. Section 3.5 on Contingency Tables. In chapter 3 sample data is often used to DEFINE probabilities. I feel this runs the risk of confusing sample values (the statistics in the tables in this chapter) with population values. Since we spend a lot of effort on this distinction I wonder if it is wise to be so vague about the difference in this context. Do others like the discussion, on page 159, of "Random Variable Notation"? Look at "If X is a random variable, then X is written in words. and x is given as a number." And earlier on the page "A random variable describes the outcomes of a statistical experiment in words." I would find this unteachable but others might cope. In Section 4.5.1 on page 166 I see the phrase "The parameters are n and p".
I don't see that "parameter" has been used in this sense before. I think sometimes the authors are not careful about explaining new words as they use them; they appear to forget occasionally that some of these words have multiple technical uses. In particular, n and p in a Binomial model have not been connected to the population values of some numbers, which is the previous meaning assigned to "parameter". "Often real estate prices fit a normal distribution." Page 253. Really? I doubt it profoundly. I am not happy about the "Empirical Rule". "About 68.27% of the x values lie between -1s and +1s of the mean m (within 1 standard deviation of the mean)." That is a lot of digits for an empirical rule and the word "about".
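The digits in the quoted "Empirical Rule" are not empirical at all: they are the probabilities under an exactly normal model, as this short sketch using Python's standard library shows. The helper name is an illustrative assumption.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1


def prob_within(k: float) -> float:
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(k, round(100 * prob_within(k), 2))
```

For k = 1 this prints 68.27, which is a statement about the model; real data that are only roughly bell-shaped justify no more than "about 68%".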

The text discusses computing only in the context of a specific brand of calculator. When we teach intro stats for social science students, for instance, we introduce them to SPSS -- our client departments (sociology and anthropology, criminology, communications and other arts programs) are very anxious that we do such a thing. The calculator references will be out of date rather quickly and I believe strongly that statistics without computing will leave students with no ability to connect our course with the statistics in their own disciplines. On the other hand the use of calculators is substantially confined to specific sections near the ends of units; perhaps these could be replaced by computing units. I don't think the presentation of the material could be called modern but the basic ideas underlying the Neyman-Pearson approach have not changed so this is probably ok.

I think this is true. Occasionally they seem to pick a piece of jargon and re-use it rather than re-explain but generally it is quite all right.

I noticed no problems here.

I don't think it is all that modular. It feels to me that it might be hard to skip the probability sections and get on to the normal curve directly. That would be a problem for our courses -- we have thirteen weeks to complete the one course most of these students will take and the ideas underlying hypothesis testing and confidence intervals seem to me to be more important than mastering jargon like "mutually exclusive".

As Shane Rollans says -- the index is computer generated and not useful. In an on-line / pdf document the page references in an index ought to be active. The actual order is very standard -- that is just fine.

I guess my comment about active links belongs here. I clicked on a number of links in the text and a depressing number did not lead to the objects they should have. This will be a problem for a long time to come in on-line materials and is not limited to this text.

No complaints from me.

No complaints from me.

In the material above I gave some commentary on the specific Review Criteria which we were given. I also want to discuss the issue in terms of who might actually use this text. I am reviewing a textbook for an introductory Statistics course. I have in mind two potential uses of the text: use in some course in my department at SFU; and use in some other post-secondary institution in BC. I am, I think, better qualified to be firm about the value of the text in the former context than in the latter. I will start, then, with the question: is this a useful text for the Statistics and Actuarial Science Department at Simon Fraser University? I think not. Overall I think the text is reasonable and sensible and has no significant technical flaws. But the book is pointed at an audience which is comfortable with more mathematical notation than I think is wise for our non-calculus based courses. At the same time the mathematical level is too low for our calculus based introductions. Thus I doubt that it will be used in any courses we offer. Here are some more details and concerns with respect to actually using the text. We teach three non-calculus introductions (general, social science, and life science students are the three target audiences) and one calculus based introduction. Only in the latter do I use the Greek letters which are used often in this book. I think the formulas and the algebra are not really suitable for the social science non-calculus course and probably would be problematic in our general course as well. Life science students are required to take calculus so the notation may be ok there. In any case I would much prefer a text which did not have so many formulas and symbols for the non-calculus introductions. Look, for instance, at the formula atop page 59 where they solve an equation to find out how many bars are needed in a histogram.
I, for one, certainly avoid even the tiniest bit of algebra since it encourages students to think that the algebra is the important part. On collaborative activities: I guess that a lot of instructors would find many of these activities hard to do in a room with 250 students. They might be a good idea, though. I didn't get the feeling that the collaborative activities were terribly central in spite of the title of the book. If I were using this book for a life sciences audience I would be a bit disappointed by the examples, I feel. There are many which use data which is convenient to find on the web or generate in a class or in a small group. I see the value in this but worry that the result is data which is unconnected with the life science material the students are studying elsewhere. I think there is a serious risk that students in a statistics course will fail to see the relevance of the ideas to their own science.

For an introductory course or a reference, this book has comprehensive coverage of the intended content. Both the table of contents and index are excellent and complete. For my intended use (as a reference book for a senior-level discrete event simulation course), the book covers everything for which I am looking.

For my several hour review, I could not find a single error or typo. I was not able to determine any bias in presenting the topic (fundamental statistics).

The material in this book is the "bread and butter" of fundamental statistics. When I look back at my college reference book, the content has not changed. The examples in the book do not indicate a time period. They are simple, generic, easy to understand examples. I do not believe this book would become obsolete.

For the majority of the content, the clarity is excellent. However, at times, I needed to read through the entire section, then revisit earlier paragraphs to get the entire message. For example, in section 7.1.2 (Introduction to the Central Limit Theorem), the second paragraph discusses "both alternatives." At first, this was very confusing. However, upon finishing the section, then revisiting the paragraph, I did understand the intent. There were several similar examples I found in my review.

The format of the chapters is very, very consistent, from the Learning Outcomes, through the exercises, labs and solutions for each chapter. Extremely consistent.

Modularity was a strong desire as I searched for a reference book. This book has it in spades. It is so modular that one could assign individual sections and, I believe, they would stand independently. For example, one of the sections I will assign is "Histograms", section 2.4. That section will stand alone, without having to assign any other material. Excellent modularity.

The topics are presented in increasing complexity. I believe I will use every chapter except 3, 6, 7 and 12. For my application, I do not need these topics. However, for a fundamental statistics course, these chapters are necessary, so I am glad they are there. I might have put chapter 13 (ANOVA) right after chapter 10 (Hyp Testing with two means). However, with the excellent modularity, this will not be an issue for me.

I reviewed using the pdf version of the book. This does not have a linked table of contents, which would allow direct access to the sections. I wish the pdf file had this functionality. I am pretty sure this would be available on the online version.

I could not find any grammatical errors.

I do not think this criterion applies to this statistics book. I could not perceive any offensive material in the book.

I was very surprised at the clarity and near-perfect match to my requirements for this statistics reference book. What a find for me and my students. Thank you very much for bringing my attention to Open Textbooks and this statistics reference.

The text covers all the areas needed for an Introduction to Statistics or Elementary Statistics. However, there should have been instruction on how students can use Excel, SPSS, or Minitab for some or all of the calculations.

I found the contents in the book to be accurate and unbiased. I didn’t find any errors or inaccuracies.

This text has a varied mix of questions for students to solve, which is a good thing for a student taking an Intro to Statistics course, but the time period in which the data used in this book was obtained should have been stated. As mentioned before in my comprehensive comments, it would be good to have a mix of technology instructions for performing some of the computations, such as with Excel, Minitab or SPSS.

The clarity in the book was excellent for an intro to statistics course. The language in the book is clear and concise. I found most instructions in the book to be very detailed and clear for students to follow. The calculator instructions were very clear and easy for a student to follow.

The content of the book is very consistent from beginning to end.

The text is subdivided well into parts for students to read and understand. Each section can be studied by students. Problems from each section are independent. I found the text to have no modularity problems.

This is a well-organized book and flows extremely well, but I would recommend that the authors place Chapter 13, F Distribution and ANOVA, right after Chapter 11, The Chi-Square Distribution. But overall the organization, structure and flow were well done.

I don’t find any problem with the interface since I can predict that the text (pdf version) was completely done in LaTeX. I would suggest that the authors link the entries in the table of contents to the actual pages in the textbook. Hyperlinks for additional resources were created in the pdf formats, which makes it easy for students to locate those materials online. There are other options, such as reading the book online, which is a good option for some students. Multiple formats were also available.

No grammatical errors were found.

The text was diverse in the examples and explanations used in the book. I do not find any cultural bias here.

The text is a good book for an introduction to statistics or elementary statistics. Some improvements could be made to the graphics in the book to make it more attractive and catch the interest of students reading the book. I would recommend more instructors think about reviewing and possibly adopting this book.

It covers essentially all the topics that would be expected in an introductory statistics course.

I did not notice any meaningful errors in the book.

Statistics at this level of study is considered to be a generally "complete" area of study, i.e., one that has not changed significantly in the recent past, nor is expected to in the future. As such, any statistics book that covers the required topics should not require significant changes.

The book is very inconsistent in this regard. There are times when definitions, contexts, and ideas are made abundantly clear. Chapter 1, Sampling and Data, is a very good example of that. However, there are occasions in which this is not true. A couple of sections leave it to the reader to puzzle out difficult ideas without adequate context, definition, or assistance in defining ideas. For example, the section on hypothesis testing makes it difficult for the student to figure out what they're doing, let alone how to do it. The discussion weaves from writing null and alternative hypotheses to errors to types of distributions to underlying assumptions to "rare events". As one who understands hypothesis tests, I see where all of these pieces fit in, but I can only imagine that the uninitiated would probably have a difficult time understanding all of these very different pieces of this puzzle. As another

With many different topics in statistics, it may not always be best to treat them all the same. However, notation, vocabulary, and the overall presentation do not vary widely in this book.

Some sections are easier to take apart than others. This is to be expected.

By and large, the topics (in a big picture sense) are presented in a logical fashion, and prerequisite material is presented before it becomes necessary.

Overall, the interface of the book, as judged by the appearance of the pages, is very plain and monotonous. It consists of plain black text on plain white pages. There's essentially no "prettiness" within the book. I have a difficult time imagining that students would find this interesting or something they would want to read, spend time with, and try to understand. There are also small details of the typesetting that make this even more difficult. Often, titles of charts or sections will be "orphaned", i.e., the title of a chart will be on one page and the actual chart on the next page. Many of the graphics have inconsistent, random, and/or out of place features and/or fonts. It is difficult to determine the scale of some of the graphs, which could hinder understanding. As an example of a place in the text where the writing could make things easier, on page 89, in a summary of formulas, the term "#ofSTDEVs" is used as a variable. It is defined, but the letter z is often used to mean the same thing as a variable. Using z would be much clearer and more consistent with the uses of that idea later on in the book.
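To make the notational point concrete, here is a rough sketch of the relationship in question. It is the standard z-score identity, not a quotation of the formula on page 89; the symbols $x$ (a data value), $\bar{x}$ (the sample mean), and $s$ (the sample standard deviation) are assumptions about what that summary contains.

```latex
x = \bar{x} + (\#\text{ofSTDEVs})\, s
\quad\Longleftrightarrow\quad
\#\text{ofSTDEVs} = \frac{x - \bar{x}}{s} = z
```

Read this way, "#ofSTDEVs" and z name the same quantity: the number of standard deviations a value lies above or below the mean.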

The text is completely understandable (to one who has studied statistics) and generally clear. There are, however, enough minor errors and inconsistencies to warrant notice.

There is very little cultural reference in this book, which is generally appropriate for a statistics book.

It is perfectly adequate as a textbook to guide students through a journey into introductory statistics. The homework assignments foster understanding, and the labs are an appropriate way to understand the ideas presented in this book. When I compare it to the more expensive textbook I currently use, the differences are clear. The text is much plainer, the prose can be inaccessible, the graphics can be inadequate and plain, and the more expensive textbook simply has more of these features to help the instructor and student reach an understanding of statistics. Whether the difference in quality is worth the difference in price is a very debatable question.

This text covers most standard topics in the introductory course in statistics, including sampling, probability, descriptive statistics, and inference. Experimental design receives little attention in the text, but ANOVA is a notable addition. A conceptual understanding of ideas is privileged while computation is deemphasized. Each chapter contains lessons, discussion prompts, collaborative exercises, labs, practice problems, and solutions. Technology tutorials are limited to TI calculators; other statistical packages are not supported. The table of contents provides a nice orientation to the text and the volume is nicely indexed.

Not only is the content of this text accurate, it is clearly presented and accessible to a wide audience.

The content of the text is quite standard and the general topics (notwithstanding some skepticism in some circles about the hegemony of p-values) will likely remain relevant for several years. The topical chapters are modular such that most can be taught in the order of an instructor's choosing and labs, projects, and data are not particularly time-sensitive. A major limitation of the book is its attachment to TI calculator usage. Given the widespread access to other computational tools--and the relative obsolescence of graphing calculators--the text as written may lack staying power.

This is an unequivocal strength of this book. I have taught the introductory course in statistics using several texts and my students have been critical of them all. This text, on the other hand, is readable and targeted to diverse non-majors in a community college setting. I would absolutely feel comfortable assigning reading from this text with the expectation that students come away with an understanding of basic principles.

While the book is internally consistent, it could be improved by making more acknowledgements of alternative notations and vocabulary found across disciplines.

The chapters are topical and lend themselves to modularity. This is a strength of the text. Given that much of the data is generated by students via collaboration, the modularity may actually pose problems since data is at risk of disappearing if not clearly recorded by students.

I LOVE that solutions to problems are found immediately after the problem sets (rather than at the end of the book). This is a phenomenal innovation! I'm also excited that discussion prompts, collaborative explorations, and labs are integrated through each topical chapter rather than relegated to the end of the chapter. It's as if students have access to the lesson plans and can follow along with classroom prompts and exercises appearing along the way. I no longer have to worry about projecting problems or displaying discussion prompts to accommodate all learners. It's all right in front of them in the text!

I have no major concerns in this respect. The text is not interactive (as some other statistics texts are) so students must manually turn to large data sets and/or to appendices. Bookmarking would be a great addition so that these can be accessed by clicking directly from an exercise.

The book consistently employs the English language correctly.

I certainly found no evidence of insensitivity in the text. In fact, the primary audience of the text is a racially diverse community college student body. The problems and activities reflect this diversity and promote a level of cultural competency rivaled by very few, if any, texts.

It's not entirely clear to me what makes this text more collaborative than other instructional materials. While I certainly find the lessons in the book to be active, that does not necessarily imply the titular (collaborative) characteristic. Students collect much of their own data and are charged with working problems in groups or together as a class, but few of the labs and exercises motivate a need for collaboration. Furthermore, students are not given strategies for collaborating statistically and/or mathematically and are not given a satisfactory justification for why collaboration is merited. I don't find this to be a weakness; the active dimensions are a strength of the book, but not wholly collaborative.

This text covers all of the topics required in most introductory statistics courses – at least social science statistics. Students typically struggle with hypothesis testing. This textbook provides thorough coverage of this topic and many practice questions to allow students to improve their understanding of this topic. In my opinion, it covers probability theory a bit more than necessary for undergraduates, but it is better to err on the side of including rather than excluding. I do wish, however, that this textbook used R (which is free statistical software) rather than a graphing calculator, which can be expensive and is rarely used in graduate programs. It is relatively easy, however, to create lab work using R rather than a graphing calculator as used in this textbook. Overall, this text is substantively comprehensive and includes a useful index and glossary.

This text is accurate. I may have encountered one small error in the answers provided for a practice problem, but that occurs in expensive statistics textbooks as well and the important points and concepts are accurate, error-free, and unbiased.

Introductory statistics is unlikely to change very much in the coming decades. This textbook will not become obsolete anytime in the near future. The specific examples and problems also use current issues and, although the relevance of the issues or specific data used in practice or homework problems may become outdated, it will be very easy to update those things. In some cases, the questions are not always very applicable to students’ lives. To make the examples and data in the problems more interesting to students, I often change the text of the problems. This is very easy to do given the impressive number of questions or problems provided in the textbook. Graphing calculators seem more likely than the content to become outdated in the near future. Creating lab work using R rather than a graphing calculator is not difficult.

This textbook provides impressively clear explanations of concepts and methods. I was concerned about this when using this textbook, but was pleased with the clarity of the text for student understanding. The graphics are not as impressive as in expensive textbooks, but this trade-off seems well worth the difference in price.

This text uses consistent terminology and framework. For example, each chapter follows a consistent structure and provides practice and homework problems in a similar format. This consistency makes using this textbook easier for faculty.

This textbook is organized in a logical progression through introductory statistics. However, if instructors chose to teach chapters in a different order, that seems possible. For example, I believe Chapter 12 Linear Regression and Correlation could be covered earlier than some of the other topics without tremendous confusion on the part of students. There are certainly some exceptions, however. For example, Chapter 10 Hypothesis Testing: Two Means, Paired Data, Two Proportions logically follows Chapter 9: Hypothesis Testing: Single Mean and Single Proportion. Covering these chapters in a different order would likely be more difficult and confuse students. It is also possible to omit certain topics (e.g., covering less probability theory than the textbook does), with limited problems because the chapters often stand on their own as individual units.

This textbook is well organized and follows a logical and clear progression through the concepts and skills required for introductory statistics.

The text is available as a pdf, which is easy to use and search and students should all be able to access it readily. Its availability in electronic book format is convenient and students may prefer that version. The graphics are not as impressive as in expensive textbooks, but this trade-off seems well worth the difference in price.

The grammar in the textbook is fine and has no noticeable problems.

The textbook is not culturally insensitive. The questions and examples are inclusive of individuals from a variety of backgrounds. Having said that, the question content is not always highly applicable to students’ lives. However, I am impressed with the number of practice questions and had no problem editing some of the questions to fit students’ lives better and make the material more interesting for them.

Sometimes this textbook loses sight of the bigger picture. For example, I want students to be able to critically assess statistics they hear in the news or in advertisements. For example, I think this text could do a better job of emphasizing and illustrating that correlation does not equal causation, or what they should think about when they hear a statistic quoted on television – how did they sample, are there any hidden design issues that raise doubt about that statistic? I am not suggesting these topics are not covered at all (e.g., there is a note on page 534 that correlation does not imply causation), just that instructors may need to emphasize those bigger picture perspectives on their own since the textbook does not always remind students about those things.

The text is very comprehensive of the materials I teach in a first semester statistics course. I sometimes include Two-Way ANOVA, but not always, depending on how well students progress through the preceding materials. This would not, however, impact my decision to use this text.

I found the book to be very accurate. The formulas were well presented. Definitions of terms were accurate and easily understood even by those with limited mathematical background.

The content is current and the examples involve items, events, and situations common to almost everyone's lives. This should give the text a very high level of longevity and the relevance is fantastic across a wide range of academic programs.

The clarity is exceptional. The definitions, descriptions, formulas, examples, and problems all were presented in very accessible terms and were easy to understand. The material was presented using very creative approaches that made it interesting and (dare I say) entertaining to read.

The text is consistent. The material was presented in a logical flow that built appropriately from one topic to the next.

The layout and organization are consistent with many other statistics texts. The chapters are logical and well ordered. The index makes it particularly easy to locate specific terms, tests, practice problems, and homework assignments.

Very appropriate and logical organization. Information is presented from the simple to the more complex types of analyses. Each chapter builds on the material presented in earlier chapters in an appropriate order.

Everything appeared to be clear, appropriately sized, etc. No real issues noted related to the interface. There were some diagrams that seemed a bit over-sized, but that certainly is not a problem.

The only thing I noted was in the very first paragraph where a sentence ended with ?". That period outside the end quote should not be present. Other than that, I did not note any other grammatical or punctuation problems. And the readability was outstanding for a statistics text.

No problems noted in this area. The text appropriately refers to a number of minority cultures.

LOVE this text! The definitions and formulas are all explained in clear terms. The examples are interesting and clearly illustrate the intended concept. The formulas are presented in both words and symbols, which is wonderful for students who do not have a strong mathematical background. The problems are interesting. I loved the Hamlet exercise that represented many statistical concepts. Extremely creative educational methods on the part of these authors. This text definitely busts the myth that statistical texts have to be dull, boring tomes that are better suited to ending insomnia. This one is interesting and vibrant, while still providing good, solid instruction. I'm sure my students would love this book and I would love to teach from it. Well done!

## Table of Contents

- Preface
- Additional Resources
- Author Acknowledgements
- Student Welcome Letter
- 1. Sampling and Data
- 2. Descriptive Statistics
- 3. Probability Topics
- 4. Discrete Random Variables
- 5. Continuous Random Variables
- 6. The Normal Distribution
- 7. The Central Limit Theorem
- 8. Confidence Intervals
- 9. Hypothesis Testing: Single Mean and Single Proportion
- 10. Hypothesis Testing: Two Means, Paired Data, Two Proportions
- 11. The Chi-Square Distribution
- 12. Linear Regression and Correlation
- 13. F Distribution and ANOVA
- 14. Appendix
- 15. Tables

## About the Book

Collaborative Statistics was written by Barbara Illowsky and Susan Dean, faculty members at De Anza College in Cupertino, California. The textbook was developed over several years and has been used in regular and honors-level classroom settings and in distance learning classes. Courses using this textbook have been articulated by the University of California for transfer of credit. The textbook contains full materials for course offerings, including expository text, examples, labs, homework, and projects. A Teacher’s Guide is currently available in print form and on the Connexions site at and supplemental course materials including additional problem sets and video lectures are available. The on-line text for each of these collections will meet the Section 508 standards for accessibility.

An on-line course based on the textbook was also developed by Illowsky and Dean. It has won an award as the best on-line California community college course. The on-line course will be available at a later date as a collection in Connexions, and each lesson in the on-line course will be linked to the on-line textbook chapter. The on-line course will include, in addition to expository text and examples, videos of course lectures in captioned and non-captioned format.

The original preface to the book, as written by Professors Illowsky and Dean, now follows:

This book is intended for introductory statistics courses being taken by students at two– and four–year colleges who are majoring in fields other than math or engineering. Intermediate algebra is the only prerequisite. The book focuses on applications of statistical knowledge rather than the theory behind it. The text is named Collaborative Statistics because students learn best by doing. In fact, they learn best by working in small groups. The old saying “two heads are better than one” truly applies here.

**Our emphasis in this text is on four main concepts:**

- thinking statistically
- incorporating technology
- working collaboratively
- writing thoughtfully

These concepts are integral to our course. Students learn the best by actively participating, not by just watching and listening. Teaching should be highly interactive. Students need to be thoroughly engaged in the learning process in order to make sense of statistical concepts. Collaborative Statistics provides techniques for students to write across the curriculum, to collaborate with their peers, to think statistically, and to incorporate technology.

This book takes students step by step. The text is interactive. Therefore, students can immediately apply what they read. Once students have completed the process of problem solving, they can tackle interesting and challenging problems relevant to today’s world. The problems require the students to apply their newly found skills. In addition, technology (TI-83 graphing calculators are highlighted) is incorporated throughout the text and the problems, as well as in the special group activities and projects. The book also contains labs that use real data and practices that lead students step by step through the problem solving process.

At De Anza, along with hundreds of other colleges across the country, the college audience involves a large number of ESL students as well as students from many disciplines. The ESL students, as well as the non-ESL students, have been especially appreciative of this text. They find it extremely readable and understandable. Collaborative Statistics has been used in classes that range from 20 to 120 students, and in regular, honor, and distance learning classes.

## About the Contributors

### Author(s)

**Barbara Illowsky** is a Professor of Mathematics & Statistics at De Anza College in Cupertino, California. She holds a PhD in Education from Capella University.

**Susan Dean** is a mathematics professor at De Anza College in Cupertino, California.