Python for Everybody: Exploring Data Using Python 3
Charles Severance, University of Michigan
Pub Date: 2016
ISBN 13: 978-1-5300511-2-0
Conditions of Use
This book is a remix of the excellent Think Python book by Allen Downey. The book keeps the clarity of the original while including examples skewed towards data applications, particularly text processing. The remix adds chapters on regular expressions, web services, databases and visualization. It drops topics like algorithm analysis and GUIs, and slims down the discussion of classes significantly. These changes make this a good information science textbook and less of a computer science textbook. Students are led on the path of developing web-scraping programs: programs that can pull raw data from online sources and process it in a useful way. The book does not cover data science, plotting, or Python libraries like pandas. The coverage of the Python language is generally thorough, but misses topics like list comprehensions and lambda expressions. The additions are well thought out and provide students with a useful toolkit that they can start applying right away. The visualization chapter is the only one that is lacking. It provides three well-documented examples of web-scraping programs that use visualization, but it does not provide a general treatment of visualization tools or a discussion of how to use them effectively.
The overview of the Python language is accurate. The discussion of applications is accurate with regard to common practices in web-scraping programs.
The use of Python 3 ensures that chapters regarding syntax and data structures will remain valid for the foreseeable future. Chapters regarding web services, databases and visualization are more at risk. The author plays it conservatively by discussing XML and JSON for web services and SQLite for databases. These are good choices because they are widely used, but XML is increasingly falling by the wayside, and tasks that used to be handled with relational databases are instead being run on NoSQL systems. One of the three visualization examples is based on the Gmane interface to mailing lists, which is likely not very relevant for students, and Gmane's continued existence is in doubt. These chapters may need to be updated in a few years.
The book does an excellent job of explaining the Python language, always providing a context in which topics are useful. Information is imparted, not just to be comprehensive, but to help the reader be a better programmer. The examples are well-explained and motivated. The author frequently includes interludes on understanding errors and sections on debugging, providing valuable information for a novice programmer.
The chapters have a consistent style and use of terminology. The Python in the book follows the conventions in the Style Guide for Python.
There is a limit to how modular an introductory textbook on programming can be. The book generally strikes a good balance. Chapters do build on each other, but a course could skip some chapters without encountering much loss of continuity. The later chapters that focus on building up to web-scraping programs are not particularly modular and would need to be taught in order. The chapter on visualization is unfortunately dependent on the database chapter. The book would benefit from making visualization stand more on its own.
The book is well-organized and has a coherent flow through the chapters. Some topics, such as exception handling, are introduced earlier than is typical. But these introductions are done with a light touch and with an eye towards why the topic is immediately useful.
The links to code and outside sites worked. Code downloads nicely into a directory with a helpful Readme file.
No grammatical errors were found by this reviewer.
The book doesn't make use of many cultural references. The examples of text processing are clear and straightforward and shouldn't be an issue for readers whose first language is not English.
A clear, well-constructed book that would serve an information science curriculum well.
Table of Contents
- 1 Why should you learn to write programs?
- 2 Variables, expressions, and statements
- 3 Conditional execution
- 4 Functions
- 5 Iteration
- 6 Strings
- 7 Files
- 8 Lists
- 9 Dictionaries
- 10 Tuples
- 11 Regular expressions
- 12 Networked programs
- 13 Using Web Services
- 14 Object-Oriented Programming
- 15 Using databases and SQL
- 16 Visualizing data
- A Contributions
- B Copyright Detail
About the Book
I never seemed to find the perfect data-oriented Python book for my course, so I set out to write just such a book. Luckily at a faculty meeting three weeks before I was about to start my new book from scratch over the holiday break, Dr. Atul Prakash showed me the Think Python book which he had used to teach his Python course that semester. It is a well-written Computer Science text with a focus on short, direct explanations and ease of learning. The overall book structure has been changed to get to doing data analysis problems as quickly as possible and have a series of running examples and exercises about data analysis from the very beginning.
Chapters 2–10 are similar to the Think Python book, but there have been major changes. Number-oriented examples and exercises have been replaced with data-oriented exercises. Topics are presented in the order needed to build increasingly sophisticated data analysis solutions. Some topics like try and except are pulled forward and presented as part of the chapter on conditionals. Functions are given very light treatment until they are needed to handle program complexity rather than introduced as an early lesson in abstraction. Nearly all user-defined functions have been removed from the example code and exercises outside of Chapter 4. The word “recursion” does not appear in the book at all.
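As a rough illustration of why try and except can sit naturally alongside conditionals, the sketch below treats a failed conversion as one more branch of the program. It is not taken from the book; the function name parse_temperature and the sample inputs are hypothetical.

```python
def parse_temperature(text):
    """Return text as a float, or None if it is not a valid number."""
    try:
        return float(text)
    except ValueError:
        # float() raises ValueError for input such as "warm"
        return None


# Hypothetical sample inputs: one valid number, one not.
for raw in ["72.5", "warm"]:
    value = parse_temperature(raw)
    if value is None:
        print(f"Could not read {raw!r} as a number")
    else:
        print(f"Read {raw!r} as {value}")
```

The except branch plays the same role as an else branch: it decides what the program does when the expected case does not hold.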
In chapters 1 and 11–16, all of the material is brand new, focusing on real-world uses and simple examples of Python for data analysis including regular expressions for searching and parsing, automating tasks on your computer, retrieving data across the network, scraping web pages for data, object-oriented programming, using web services, parsing XML and JSON data, creating and using databases using Structured Query Language, and visualizing data.
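To give a flavor of how two of these topics fit together, here is a minimal sketch that parses JSON data and stores it in a SQLite database using only the Python standard library. It is not an example from the book; the payload, the table name counts, and its column names are made up for illustration, and the JSON string stands in for a response that would normally come from a web service.

```python
import json
import sqlite3

# Hypothetical JSON text standing in for a web service response.
payload = '[{"name": "alice", "messages": 3}, {"name": "bob", "messages": 5}]'
records = json.loads(payload)

# An in-memory database keeps the sketch self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE counts (name TEXT, messages INTEGER)")
conn.executemany(
    "INSERT INTO counts (name, messages) VALUES (?, ?)",
    [(r["name"], r["messages"]) for r in records],
)

# Query the stored rows, largest message count first.
rows = conn.execute(
    "SELECT name, messages FROM counts ORDER BY messages DESC"
).fetchall()
for name, messages in rows:
    print(name, messages)

conn.close()
```

Once the data is in a table, questions such as "who sent the most messages?" become a single SQL query rather than custom Python code.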
The ultimate goal of all of these changes is to shift from a Computer Science to an Informatics focus and to include in a first technology class only those topics that can be useful even if one chooses not to become a professional programmer.
About the Contributors
Charles Severance is a Clinical Associate Professor at the University of Michigan School of Information.