Important: The focus of this course is on math (specifically, data-analysis concepts and methods), not on Excel for its own sake. We use Excel to do our calculations, and all math formulas are given as Excel spreadsheets, but we do not attempt to cover Excel macros, Visual Basic, Pivot Tables, or other intermediate-to-advanced Excel functionality. This course will prepare you to design and implement realistic predictive models based on data. In the Final Project (module 6) you will assume the role of a business data analyst for a bank and develop two different predictive models to determine which credit card applicants should be accepted and which rejected. Your first model will focus on minimizing default risk, and your second on maximizing bank profits. The two models should demonstrate in a practical, hands-on way that your choice of business metric drives your choice of an optimal model.
The second big idea this course seeks to demonstrate is that your data-analysis results cannot and should not aim to eliminate all uncertainty. Your role as a data analyst is to reduce uncertainty for decision-makers by a financially valuable increment, while quantifying how much uncertainty remains. You will learn to calculate and apply to real-world examples the most important uncertainty measures used in business, including classification error rates, entropy of information, and confidence intervals for linear regression. All the data you need is provided within the course, all assignments are designed to be done in MS Excel, and you will learn enough Excel to complete all assignments. The course will give you enough practice with Excel to become fluent in its most commonly used business functions (module 1), and you’ll be ready to learn any other Excel functionality you might need in the future. The course does not cover Visual Basic or Pivot Tables, and you will not need them to complete the assignments. All advanced concepts are demonstrated in individual Excel spreadsheet templates that you can use to answer relevant questions.
The Analysis ToolPak is an Excel add-in (a supplemental program that adds custom commands or custom features to Microsoft Office) that is available when you install Microsoft Office or Excel. To use it in Excel, however, you need to load it first. On the Tools menu, click Add-Ins.
You will emerge with substantial vocabulary and practical knowledge of how to apply business data analysis methods based on binary classification (module 2), information theory and entropy measures (module 3), and linear regression (modules 4 and 5), all using no software tools more complex than Excel.
Separating collections into two categories, such as “buy this stock, don’t buy that stock” or “target this customer with a special offer, but not that one,” is the ultimate goal of most business data-analysis projects.
There is a specialized vocabulary of measures for comparing and optimizing the performance of the algorithms used to classify collections into two groups. You will learn how and why to apply these different metrics, including how to calculate the all-important AUC: the area under the Receiver Operating Characteristic (ROC) Curve.
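As a rough illustration of the AUC calculation mentioned above, the sketch below sweeps a decision threshold over a set of model scores, collects the resulting (false positive rate, true positive rate) points, and sums the area under them with the trapezoid rule. The course itself does this in Excel; the Python function names (`roc_points`, `auc`) and the eight scores and labels are invented for illustration only.

```python
def roc_points(scores, labels):
    """Return ROC points (FPR, TPR) found by lowering the threshold step by step.

    labels use 1 for the positive class and 0 for the negative class.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing is flagged
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under a piecewise-linear ROC curve (trapezoid rule)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical data: model scores and true outcomes for eight cases.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

area = auc(roc_points(scores, labels))
print(f"AUC = {area:.4f}")
```

An AUC of 0.5 corresponds to a classifier no better than chance, and 1.0 to one that ranks every positive case above every negative case.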
In module 3, you will learn how to calculate and apply the vitally useful uncertainty metric known as “entropy.” In contrast to the more familiar “probability,” which represents the uncertainty that a single outcome will occur, “entropy” quantifies the aggregate uncertainty of all possible outcomes. The entropy measure provides the framework for accountability in data-analytic work. Entropy gives you the power to quantify the uncertainty of future outcomes relevant to your business twice: using the best-available estimates before you begin a project, and then again after you have built a predictive model. The difference between the two measures is the Information Gain contributed by your work.
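The before-and-after comparison described above can be sketched in a few lines. The example computes the entropy of the outcome before any model is used, then the average entropy that remains once cases are split by a model's prediction, and reports the difference as the information gain. The counts in `joint_counts` and the function names are hypothetical, and entropy is measured in bits (base-2 logarithm).

```python
import math


def entropy(probs):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def conditional_entropy(joint_counts):
    """Average outcome entropy remaining after seeing the model's prediction.

    joint_counts has one row per prediction group; each row holds the
    number of cases with each actual outcome.
    """
    total = sum(sum(row) for row in joint_counts)
    remaining = 0.0
    for row in joint_counts:
        n = sum(row)
        if n == 0:
            continue
        remaining += (n / total) * entropy([c / n for c in row])
    return remaining


# Hypothetical counts: rows = model says "accept" / "reject",
# columns = applicant later repays / defaults.
joint_counts = [[75, 25],
                [25, 75]]

total = sum(sum(row) for row in joint_counts)
outcome_totals = [sum(col) for col in zip(*joint_counts)]

before = entropy([c / total for c in outcome_totals])  # uncertainty with no model
after = conditional_entropy(joint_counts)              # uncertainty given the model
gain = before - after

print(f"before = {before:.4f} bits, after = {after:.4f} bits, gain = {gain:.4f} bits")
```

In this made-up case the outcome starts at a full bit of uncertainty (a 50/50 split), and the model removes a little under a fifth of a bit.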
The Linear Correlation measure is a much richer metric for evaluating associations than is commonly realized. You can use it to quantify how much a linear model reduces uncertainty. When used to forecast future outcomes, it can be converted into a “point estimate” plus a “confidence interval,” or converted into an information gain measure. You will develop a fluent knowledge of these concepts and the many valuable uses to which linear regression is put in business data analysis. This module also teaches how to use the Central Limit Theorem (CLT) to solve practical problems.
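To make the "point estimate plus confidence interval" idea concrete, here is a minimal sketch under simplifying assumptions. It computes the linear correlation r, fits a least-squares line, and reports a forecast with a rough 95% band of plus or minus 1.96 residual standard deviations. That band treats the errors as Gaussian and ignores the uncertainty in the fitted slope and intercept, so a proper prediction interval would be somewhat wider, especially for small samples. The data and the names `fit_line` and `forecast` are invented for illustration.

```python
import math


def fit_line(xs, ys):
    """Return (r, slope, intercept, residual_sd) for a least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    r = sxy / math.sqrt(sxx * syy)        # linear (Pearson) correlation
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Unexplained variation is syy * (1 - r^2); divide by n - 2 degrees of freedom.
    residual_sd = math.sqrt(max(0.0, syy * (1 - r * r)) / (n - 2))
    return r, slope, intercept, residual_sd


def forecast(x_new, slope, intercept, residual_sd, z=1.96):
    """Point estimate with an approximate Gaussian band (z = 1.96 for about 95%)."""
    point = intercept + slope * x_new
    return point, point - z * residual_sd, point + z * residual_sd


# Hypothetical data: five paired observations.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

r, slope, intercept, residual_sd = fit_line(xs, ys)
point, low, high = forecast(6, slope, intercept, residual_sd)

print(f"r = {r:.4f}, r^2 = {r * r:.4f}")
print(f"forecast at x = 6: {point:.2f} (roughly {low:.2f} to {high:.2f})")
```

The quantity r² is the share of the variation in y that the line accounts for, which is one way to read the correlation as a measure of uncertainty reduction.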
Linear regression and the CLT are closely related because both make use of a special family of probability distributions called “Gaussians.” You will learn everything you need to know to work with Gaussians in these and other contexts.
Formulate data questions, explore and visualize large datasets, and inform strategic decisions. In this Specialization, you’ll learn to frame business challenges as data questions.
You’ll use powerful tools and methods such as Excel, Tableau, and MySQL to analyze data, create forecasts and models, design visualizations, and communicate your insights. In the final Capstone Project, you’ll apply your skills to explore and justify improvements to a real-world business process. The Capstone Project focuses on optimizing revenues from residential property, and Airbnb, our Capstone’s official Sponsor, provided input on the project design. Airbnb is the world’s largest marketplace connecting property-owner hosts with travelers to facilitate short-term rental transactions. The top 10 Capstone completers each year will have the opportunity to present their work live to senior data scientists at Airbnb for feedback and discussion.