Practical Statistics – Statistical methods are a key part of data science, yet very few data scientists have any formal statistics training. Courses and books on basic statistics rarely cover the topic from a data science perspective. This practical guide explains how to apply various statistical methods to data science, tells you how to avoid their misuse, and gives you advice on what’s important and what’s not.
Many data science resources incorporate statistical methods but lack a deeper statistical perspective. If you’re familiar with the R programming language, and have some exposure to statistics, this quick reference bridges the gap in an accessible, readable format.
With this book, you’ll learn:
- Why exploratory data analysis is a key preliminary step in data science
- How random sampling can reduce bias and yield a higher quality dataset, even with big data
- How the principles of experimental design yield definitive answers to questions
- How to use regression to estimate outcomes and detect anomalies
- Key classification techniques for predicting which categories a record belongs to
- Statistical machine learning methods that “learn” from data
- Unsupervised learning methods for extracting meaning from unlabeled data
Table of Contents
About this Book
Data science is a fusion of multiple disciplines, including statistics, computer science, information technology and domain specific fields. As a result, a several different terms could be used to reference a given concept. Key terms and their synonyms will be highlighted throughout the book in a sidebar within the text.
This book is aimed at the data scientist with some familiarity with the R programming language, and with some prior (perhaps spotty or ephemeral) exposure to statistics. Both of us came to the world of data science from the world of statistics, and have some appreciation of the contribution that statistics can make to the art of data science. At the same time, we are well aware of the limitations of traditional statistics instruction: statistics as a disciple is a century and a half old, and most statistics textbooks and courses are laden with the momentum and inertia worthy of an ocean liner.
Two goals underlie this book:
- To lay out, in digestible, navigable and easily referenced form, key concepts from statistics that are relevant to data science.
- To explain which concepts are important and useful from a data science perspective, which are less so, and why.
About the Author
Peter Bruce founded and grew the Institute for Statistics Education at Statistics.com, which now offers about 100 courses in statistics, roughly a third of which are aimed at the data scientist. In recruiting top authors as instructors and forging a marketing strategy to reach professional data scientists, Peter has developed both a broad view of the target market, and his own expertise to reach it.
Andrew Bruce has over 30 years of experience in statistics and data science in academia, government and business. He has a Ph.D. in statistics from the University of Washington and published numerous papers in refereed journals. He has developed statistical-based solutions to a wide range of problems faced by a variety of industries, from established financial firms to internet startups, and offers a deep understanding the practice of data science