Book Review: Data Science for Business

Data Science for Business by Foster Provost and Tom Fawcett is a very important book about data mining and data-analytic thinking.  In 1971, Abbie Hoffman shocked the world when he demanded that hippie readers (at the time, a likely oxymoron) “Steal This Book”.  While I wouldn’t go so far as to encourage current and future data scientists to shoplift, I will demand that they READ THIS BOOK!

When I began my Statistics career in the mid-’80s, data was difficult and expensive to come by.  Today, we’re living in a world of far too much data, vast amounts of cheap computing power, and way too many poorly defined questions.  Mix them all together and you’re guaranteed to make a mess.

Going from data dearth to data plethora presents substantive issues.  In business, the balance between gut-feel decision-making and analysis paralysis is changing rapidly.  Whether it moves too far from gut to paralysis, only time will tell.  Through Data Science for Business, Provost and Fawcett offer practitioners a guide to equilibrium.

Data scientists are all the rage in today’s world of giga-, tera-, and petabyte databases.  Thirty years ago, we were called statisticians.  For today’s Internet/iPhone/Android/App-obsessed “X”ers and “Y”ers, I suppose “Statistician” conjures up images of old balding men and pocket protectors.  It’s just not sexy enough.  Frankly, I don’t care what you call them as long as they’re good at what they do!  It would appear that Provost and Fawcett are of a similar mindset.

Read this book and you’ll find yourself moving briskly down the road towards data-analytic enlightenment.  While not highly technical, the authors cover each topic with enough rigor for the reader to appreciate the tools being presented and the insights being offered.

From the outset, the authors are clear about the book’s objectives: “The primary goals of this book are to help you view business problems from a data perspective and understand principles of extracting useful knowledge from data.  There is fundamental structure to data-analytic thinking, and basic principles that should be understood.  There are also particular areas where intuition, creativity, common sense, and domain knowledge must be brought to bear…  As you get better at data-analytic thinking you will develop intuition as to how and where to apply creativity and domain knowledge.”

Bravo!  This paragraph sent chills down my spine.  I thought of all those undergraduate and graduate students studying Statistics at universities all over the world, my daughter included, who are being bombarded by one math or statistics class after another (Calculus III, Math Stat I and II, Linear Algebra, etc.).  Yet, far too often, they enter the real world lacking “data-analytic thinking” or a sense of “basic principles”.  They do, however, have a sense of being overwhelmed and underprepared.  Place five word problems before them at the beginning of each day, tell them they need to be solved by 5 p.m., and they’ll do pretty well.  Unfortunately, that’s not how the real world works.  The epic battle between “frequentists” and “Bayesians” takes a back seat to what should be the real controversy in statistics departments around the world: the balance between “application” and “theory”.

I’d have gladly sacrificed the heart-stopping experience of deriving the Dirichlet distribution in exchange for learning how to manage the process of modeling just about any real-world consumer behavior.  Far too often, graduating data scientists are sent into the real world with nothing more than a cursory knowledge of how to approach, tackle, or manage even the most basic real-world problems.  It’s sort of like learning to fly by reading a book and then being asked to jump into the cockpit.  “Room for one more!”

The book’s “primary goal” should be the marching orders of every statistics program at every college and university.  Theoretical statisticians certainly have their place, but not to the exclusion of well-prepared data scientists with real-world experience.  While The Big Bang Theory’s Sheldon, the theoretical physicist, endlessly belittles the work of his roommate Leonard, the experimental physicist, and Howard, the M.I.T. engineer, there’s little doubt that each requires the others in order to survive and thrive.

Early on (page 2), the authors state, “Data mining is a craft.  It involves the application of a substantial amount of science and technology, but the proper application still involves art as well.”  Absolutely true!  It’s great to read this stuff!  This is followed by a concise discussion of CRISP-DM (the Cross-Industry Standard Process for Data Mining), a well-defined data mining process whose concepts are elementary, essential, and integral to the responsible, proper, and successful practice of data mining.

From this point on, the authors proceed to accomplish their primary goal.  They present such topics as predictive modeling, correlation, classification, clustering, regression, logistic regression, linear discriminants, and much more.  Their presentations are user-friendly, their real-world examples are interesting, and their guidance and insights are extremely valuable.

My criticisms are limited to their website.  The Data Science for Business site leaves me wanting more real-world examples to enjoy, access to more resources and tools of the trade, more references to peruse, and a more rigorous approach to some of the solutions.  Perhaps a Data Science for Business sequel is on the horizon?

Whether you’re a seasoned Statistician (or Data Scientist), a young aspiring novice, or an adventurous business person looking to expand his or her horizons, Data Science for Business by Foster Provost and Tom Fawcett is well worth the price of admission and the reading time you’ll invest.

Foster Provost and Tom Fawcett state, “[i]deally, we envision a book that any data scientist would give to his collaborators…”  I’ll do them one better: I’m giving it to my daughter!

Copyright (c) 2015 Bayes, LLC.  All Rights Reserved.