# Linear Regression Using R: An Introduction to Data Modeling

David J. Lilja, University of Minnesota

Copyright Year: 2016

ISBN 13: 9781946135001

Publisher: University of Minnesota Libraries Publishing

Language: English

## Conditions of Use

Attribution-NonCommercial

CC BY-NC

## Reviews

As an introduction/tutorial to linear regression with R, this book quickly guides a novice through building and testing a linear model.

My only problem is that the author calls variables in data sets "parameters". In the context of linear regression, I believe the term "parameters" should be reserved for the model coefficients that are estimated.

By demonstrating linear regression with the statistical software R, the book gives a modern, hands-on approach to the material.

I think the best thing about this book is its clarity. The clear and concise language makes it very friendly to readers.

Modeling a single variable throughout the entire book creates a nice connection between chapters.

The book is nicely broken down into easily digestible parts.

Topics appropriately build on one another.

I experienced no interface issues.

I detected no grammar issues.

The book is free of any culturally sensitive topics.

For the potential reader with little R programming or data science background, this book quickly allows someone to build a linear model from a given data set. The book also has a nice introduction to training and testing a linear model. With the author's clear and easy-to-read explanations, this is a text I will refer people to for quickly running linear regressions in R.

There are basic functions, such as class() or typeof(), that should be introduced early on for any user of R. Also, a practical explanation of the residual standard error, or of what a nonsensical model would look like for the example used throughout the text, would be helpful for a beginner.

Using vocabulary that helps students differentiate between an assumed model and a prediction equation would be helpful if you are planning to use this as a classroom text. Depending on how you are used to teaching regression, you may find many problematic uses of vocabulary or you may find none.

This text could easily be updated by either replacing parts or by adding new material.

The main vocabulary is touched on and explained well, apart from some possible misuse of terminology depending on how one teaches regression. As for technical R vocabulary, the early use of 'row' to describe the header of a data frame could also be problematic, since the first row of a data frame typically refers to the first row of data, not the column names.

Delivery and organization are consistent throughout.

Though additional material is needed in between, what is presented is nicely laid out.

The text follows the typical presentation of a traditional look at regression, which makes for a text that is clear and well organized.

There are places where code chunks unnecessarily spill over onto the next page, and some figures/tables need to be relocated so that the reader does not come to them before they have been referred to in the text.

No issues

I think all is fine. I don't see any particular computer processor feeling that it has been misrepresented or purposely left out.

I hope this review actually goes through this time. It is my third attempt at completing it before Qualtrics times me out; sorry that I am so slow. Also, below is a paragraph-style review, which I wrote before seeing that the format would be a survey-type setup.

Though well written and organized, this book may not be your "go to" resource if you're looking for a textbook or supplementary material when teaching an intro R and/or regression course. The author does use an example throughout that many can understand at least to some degree (influencers of computer performance), which exposes the reader to the useful idea that knowledge about a data set can be extremely valuable. Further, the introduction of functions like attach() and update() shows how the author has nicely woven into the content a practical view of coding as part of analysis. The exploratory use of plot() to visualize the data before introducing a one-factor regression is another positive example of this.

However, there are some places throughout the book that might make you seriously question whether you could teach a course using it, either as a stand-alone resource or as a supplementary one. The wording in some places can be confusing or even contradictory depending on how you present regression, especially in an introduction to the topic, where consistent use of vocabulary can be crucial. For example, consider Section 3.2, where the mathematical form in (3.1) is called the 'simplest regression model' yet the regression equation in (3.2) is similarly called the 'final regression model'. Personally, I try to differentiate these two things for students first learning these concepts by stressing the 'assumed model', with y as the response, and the 'prediction equation', with y_hat as the response.

Maybe this example isn't problematic for you; nonetheless, I still suggest you carefully look through the entire book before adopting it as a resource for your students.

This is a tutorial that covers the basic areas and ideas of linear regression through carefully selected examples. R, the software used to present the examples in the text, is open-source software, which is appropriate and convenient for an open textbook. The book provides an effective and complete index and table of contents, with page numbers that link to the text.

The open source software (R) used to present data is as accurate as any commercially available software. The rest of the content is accurate and error-free.

As an introductory text, the content is up to date. As a basic topic in regression theory, linear regression is here to stay. With the current growth of data mining, it is difficult to imagine the future of data analytics without linear regression. The text is written and arranged in such a way that important updates will be easy to implement.

The text is clear and accessible to readers with standard elementary statistical background. It provides explicit guidance for R and the context for statistical terms is clear. The concepts are well explained.

The exposition is consistently clear and well motivated by examples. The level and presentation are consistent as well. The text uses consistent, standard, and elementary terminology, appropriately introduced, to deal with linear regression models.

The text, not overly self-referential, is presented in eight chapters, each with a hyperlink to the text. Each chapter has short sections. In addition, each page number in the Index is a hyperlink to the text.

The topics in the text are well motivated by examples that should make the subject more interesting to the reader. The organization is excellent, making each topic clear and easy to read.

It would have been nice to have color images in the figures. Also, Figure 4.1 (Chapter 4, Multi-factor Regression) would be clearer if it showed only a few of the pairwise comparisons for the Int2000 data frame. But these are just two minor display issues.

I did not find grammatical errors.

The text is not culturally insensitive or offensive in any way. It uses examples that are culturally neutral.

I would use this tutorial in any undergraduate course dealing with linear regression.

## Table of Contents

1 Introduction

- 1.1 What is a Linear Regression Model?
- 1.2 What is R?
- 1.3 What's Next?

2 Understand Your Data

- 2.1 Missing Values
- 2.2 Sanity Checking and Data Cleaning
- 2.3 The Example Data
- 2.4 Data Frames
- 2.5 Accessing a Data Frame

3 One-Factor Regression

- 3.1 Visualize the Data
- 3.2 The Linear Model Function
- 3.3 Evaluating the Quality of the Model
- 3.4 Residual Analysis

4 Multi-factor Regression

- 4.1 Visualizing the Relationships in the Data
- 4.2 Identifying Potential Predictors
- 4.3 The Backward Elimination Process
- 4.4 An Example of the Backward Elimination Process
- 4.5 Residual Analysis
- 4.6 When Things Go Wrong

5 Predicting Responses

- 5.1 Data Splitting for Training and Testing
- 5.2 Training and Testing
- 5.3 Predicting Across Data Sets

6 Reading Data into the R Environment

- 6.1 Reading CSV files

7 Summary

8 A Few Things to Try Next

Bibliography

Index

## About the Book

*Linear Regression Using R: An Introduction to Data Modeling* presents one of the fundamental data modeling techniques in an informal tutorial style. Learn how to predict system outputs from measured data using a detailed step-by-step process to develop, train, and test reliable regression models. Key modeling and programming concepts are intuitively described using the R programming language. All of the necessary resources are freely available online.

## About the Contributors

### Author

**David J. Lilja** received a Ph.D. and an M.S., both in Electrical Engineering, from the University of Illinois at Urbana-Champaign, and a B.S. in Computer Engineering from Iowa State University in Ames. He is currently the Louis John Schnell Professor of Electrical and Computer Engineering at the University of Minnesota in Minneapolis, where he also serves as a member of the graduate faculties in Computer Science, Scientific Computation, and Data Science. Previously, he served ten years as the head of the ECE department at the University of Minnesota, worked as a research assistant at the Center for Supercomputing Research and Development at the University of Illinois, and worked as a development engineer at Tandem Computers Incorporated in Cupertino, California. He received a Fulbright Senior Scholar Award to visit the University of Western Australia, and was awarded a McKnight Land-Grant Professorship by the Board of Regents of the University of Minnesota. He has chaired and served on the program committees of numerous conferences, and was a distinguished visitor of the IEEE Computer Society. He was elected a Fellow of the Institute of Electrical and Electronics Engineers (IEEE) and a Fellow of the American Association for the Advancement of Science (AAAS) for contributions to the statistical analysis of computer performance. He is also a member of the ACM and a registered Professional Engineer. His main research interests include computer architecture, parallel processing, computer systems performance analysis, approximate computing, and storage systems.