In an era where data drives decisions in every field, the importance of statistical reasoning and data analytics has never been greater. This book is a teaching resource that connects statistical techniques to practical business contexts in a way that is intuitive, relevant, and thoughtful. It strives to explain methods clearly and rigorously, enabling students to develop a strong conceptual foundation and a critical approach to data-driven decision-making.
Shubhabrata Das has been a faculty member at the Indian Institute of Management Bangalore (IIMB) since December 1999 and has served as a full professor since 2005. He has been teaching the material presented in this book across all of IIMB’s degree-granting programmes as well as in its short- and long-duration executive education programmes. He has provided training and consultancy services in the areas of Business Statistics, Business Analytics, Market Research, Business Forecasting, and Insurance for a range of government organisations and leading corporate clients in India. His academic honours include the inaugural Research Professor Award from IIMB, the IBM Faculty Award, the Best Paper Award at the APRIA conference, and multiple scholastic accolades from the West Bengal Board of Higher Secondary Education, ISI, and UNC.
Soudeep Deb is an Associate Professor in the Decision Sciences Area at the Indian Institute of Management Bangalore (IIMB). He obtained his Ph.D. in Statistics from the University of Chicago, and then worked as the Senior Lead Data Scientist at NBC Universal in New York, USA, for around two years. Dr. Deb’s core research focus is on Bayesian methods, time series, spatio-temporal modelling, machine learning approaches, and their applications in various fields, such as environmental science, finance, social studies, and sports. He primarily teaches courses on introductory statistics, inference, and multivariate data analysis, as well as fun topics like sports analytics. For more details, see his webpage: https://soudeepd.github.io/
About the Authors | Testimonials | Foreword | Preface | Acknowledgements
PART A: FOUNDATIONS
Chapter 1: Introduction and Overview 1.1 Introduction to Business Data Analytics – Data to Decisions 1.2 History and Interconnections Between Branches of Analytics 1.3 An Overview of Statistical Methods 1.4 Applications of Data-driven Analytics in Business Fast-Moving Consumer Goods (FMCG) | Aggregator Industry or Platform Economy | Banking and Financial Services | Retail and E-commerce | Energy (Oil and Natural Gas, Renewable Energy) | Pharmaceuticals and Healthcare | Sports and Entertainment | Automotive | Real Estate | Tourism | Information Technology | Agriculture | Manufacturing | Logistics and Transportation | Telecommunications | Education 1.5 Role of Analytics in Management Disciplines 1.6 Structure of the Book 1.7 Software and Computational Aspects 1.8 Description of Data Sets for Running Case Studies Exercise
Chapter 2: Data Representation and Descriptive Statistics 2.1 Overview 2.2 Types of Data 2.3 Organising Data Using Arrays, Graphs, and Tables Stem and Leaf Display, Bar Charts, and Pie Charts | Frequency Distribution, Frequency Polygon, Histogram, and Ogive 2.4 Data Summarisation Quantiles, Percentiles, and Quartiles | Measures of Central Tendency | Measures of Dispersion or Variability | Skewness and Kurtosis | Important Results | Outlier Detection 2.5 Summary Statistics for Bivariate Data Cross-tabulation | Scatter Plot, Covariance, and Correlation | Measure of Dependency When One Variable is Quantitative and the Other Qualitative 2.6 Pivot Table Case Study: Supermarket Sales Analysis 2.7 Advanced Data Visualisation Time Series Data | Cross-Sectional Data | Spatial Data | Case Study: Spatial Analysis of Indian Weather 2.8 Best Practices for Data Handling and Cleaning Planning for Data Collection | Why is Data Cleaning Necessary? | Dealing with Missing Data | Data Transformation | General Guidelines for Graphical Data Representations 2.9 Case Study: Indian Start-up Funding Summary of Key Concepts and Formulae Practice Problems
Chapter 3: Vectors and Matrices 3.1 Overview 3.2 Sets, Relations, Functions 3.3 Vector Space 3.4 Matrices Introduction to Matrix Structure | Operations on Matrices | Matrices with Special Structures | Rank and Inverse | Determinants 3.5 Inner Product and Orthogonality 3.6 Eigenvalues Characteristic Roots, Eigenvalues, and Eigenvectors | Spectral Representation | Singular Value Decomposition 3.7 Linear and Quadratic Forms Linear System of Equations | Quadratic Forms | Positive Definiteness 3.8 Case Studies Unveiling Matrix Structures of Equal Correlation | Matrices in Action: Analysing Swiggy Data Summary of Key Concepts and Formulae Practice Problems
PART B: PROBABILITY AND DISTRIBUTIONS
Chapter 4: Probability 4.1 Introduction to Probability 4.2 Formalising Probability Notions Random Experiment and Events – Union, Intersection, Complementation – Mutually Exclusive (Disjoint) and Exhaustive Set of Events | Approaches to Defining Probability | Axioms of Probability – Probability Rules 4.3 Conditional Probability and the Notion of Independent Events Simpson’s Paradox 4.4 Bayes’ Theorem 4.5 Case Studies KJSS | A Tale of Two Mothers Summary of Key Concepts and Formulae Practice Problems
Chapter 5: Discrete Random Variables and Probability Distributions 5.1 Random Variables and Probability Distributions Discrete Versus Continuous Random Variable 5.2 General Discrete Distributions: Expected Value, Variance, and Other Characteristics Probability Mass/Density Function and Cumulative Distribution Function of a Discrete Random Variable | Expected Value (Mean) and Variance of a Discrete Random Variable | Higher Order Moments, Skewness, and Kurtosis | Mean (Expected Value) and Variance of a Linear Combination of Random Variables | Chebyshev’s Inequality | Simulation from a Discrete Probability Distribution | Percentiles and Other Measures of Central Tendency and Dispersion of a (Discrete) Probability Distribution 5.3 Discrete Uniform Distribution 5.4 Binomial Distribution 5.5 Poisson Distribution Poisson Approximation to Binomial Distribution | Simulation from Poisson Distribution 5.6 Joint and Conditional Distribution for Discrete Variables 5.7 Other Popular Discrete Distributions Geometric Distribution | Negative Binomial Distribution | Hypergeometric Distribution | Multinomial Distribution 5.8 Relationship between Different Discrete Distributions Summary of Key Concepts and Formulae Practice Problems
Chapter 6: Continuous Probability Distributions 6.1 Overview 6.2 General Continuous Probability Distributions and their Characteristics 6.3 Uniform Distribution 6.4 Normal (Gaussian) Distribution Standard Normal Distribution and Tables | Discussion: Stock-out at PAINTMART | Discussion: How Much to Pack? | Normal Approximation of Binomial and Poisson Distributions | Whither Approximation to Binomial/Poisson Distribution? 6.5 Exponential Distribution Memoryless Property of Exponential Distribution 6.6 Poisson Process: Inter-linkage between the Exponential and Poisson Distributions 6.7 Distributions Related to Normal Distributions Lognormal Distribution | Chi-Square Distribution | (Student’s) t Distribution | F Distribution 6.8 Joint and Conditional Distribution for Continuous Variables Joint Density Function of Continuous Variables | Marginal Density Function | Conditional Density | Independent Random Variables | Function of (Two) Random Variables and Its Expected Values | Covariance and Correlation | Illustrating the Computation in Example 6.24 6.9 Other Continuous Probability Distributions Gamma Distribution | Beta Distribution | Weibull Distribution | Pareto Distribution | Logistic Distribution 6.10 Interconnections between Different Probability Distributions Summary of Key Concepts and Formulae Practice Problems
Chapter 7: Direct Applications of Probability in Business Management 7.1 Overview Introduction to Optimisation | Simulation 7.2 Decision-making under Uncertainty in a Tree Structure Decision Analysis Without Probabilistic Assessment | Decision-Making with Expected Value Approach | Expected Value with Perfect Information and Expected Value of Perfect Information (EVPI) | Expected Value of Sample Information (EVSI) and Sampling Efficiency (SI) 7.3 Portfolio Diversification and Asset Allocation 7.4 The PERT-CPM Model 7.5 Dream11 – Probability in Fantasy Sport Choosing the Best Possible Fantasy Eleven | Which Dream11 Tournament(s) Should One Participate In? 7.6 Overbooking in Airlines Probabilistic Formulation of the Problem | Assuming Equal Chance of Cancellation at All Times | Assuming Chance of Cancellation Changes as a Function of Time 7.7 Pricing and Promotions Pricing of Airline Seats | Assessing the Cost and Benefit of a Promotion for F&H | Retail Markdown 7.8 Queuing Theory Probability and Distributions in Queuing | Categorisation of Queuing Models 7.9 Actuarial Science Summary of Key Concepts and Formulae Practice Problems
PART C: INFERENCE AND REGRESSION
Chapter 8: Sampling 8.1 Introduction to Sampling History and Terminology | Sampling Versus Complete Enumeration – Sampling and Non-Sampling Error | Merits of Sampling 8.2 Random Sampling Methods Simple Random Sampling With Replacement (SRSWR) and Simple Random Sampling Without Replacement (SRSWOR) | Systematic Sampling | Stratified Sampling | Cluster Sampling | Other Random Sampling Methods 8.3 Non-random Sampling Methods Convenient Sampling | Purposive or Judgemental Sampling | Quota Sampling | Snowball Sampling | Voluntary Sampling 8.4 Questionnaire Design for Surveys 8.5 Other Issues and Challenges Related to Sampling Sampling When the Population is Infinite and/or Hypothetical in Nature | Channel of Collecting Survey Information | Process Versus Outcome | Paired Sampling | What Should the Sample Size Be? Summary of Key Concepts and Formulae Practice Problems
Chapter 9: Point Estimation and Sampling Distribution 9.1 Introduction to Estimation 9.2 Basic Properties of (Point) Estimators Unbiased Estimator | Bias of an Estimator | Standard Error of an Estimator | Comparing Two Groups or Populations 9.3 Other Desirable Properties of Estimators Minimum Variance Unbiased Estimator | Asymptotically Unbiased Estimators | Consistent Estimator 9.4 Sampling Distribution and Central Limit Theorem What is a Sampling Distribution? | Sampling Distribution of Sample Mean When the Sample Size is Large | Sampling Distribution of Sample Proportion 9.5 Estimating Population Variance and Sampling Distribution of Sample Variance 9.6 Revisiting Student’s t Distribution: Distribution of (Standardised) Sample Mean while Sampling from a Normal Population 9.7 Sampling Distribution of Difference in Sample Means based on Independent Samples Drawn from Two Populations Both Sample Sizes are Large with Known/Unknown Population Standard Deviations | Both Populations are Normal with Known Standard Deviation | Population Standard Deviations are Unknown and At Least One of Two Sample Sizes is/are Not Large – Normal Populations 9.8 Sampling Distribution of Difference in Sample Proportions based on Independent Samples drawn from Two Populations 9.9 Sampling Distribution of Ratio of Variances 9.10 Understanding CLT and Sampling Distribution Concepts Using Simulation 9.11 Case Studies Engagement and Job Fit of Retail Salespeople | Supermarket Sales Analysis | Placement of Students Summary of Key Concepts and Formulae Practice Problems
Chapter 10: Confidence Interval (CI) Estimation 10.1 Introduction 10.2 Confidence Interval Estimation for Population Proportion Appropriate Interpretation of Confidence Coefficient in Confidence Interval Estimation | Relationship Between Margin of Error, Sample Size, and Confidence Coefficient | Determining the Requisite Sample Size in the Context of CI Estimation of Population Proportion | Adjustment for SRSWOR Sampling from a Finite Population 10.3 Confidence Interval Estimation for Population Mean MOE, Sample Size Determination, Finite Population Adjustment 10.4 Confidence Interval Estimation for Standard Deviation 10.5 Confidence Interval Estimation in Two-sample Problems Framework and Notation | CI for Difference in Proportion Between Two Populations | CI for Difference Between Mean of Two Populations | CI for Ratio of Standard Deviations of Two Populations 10.6 Paired Sampling 10.7 Ancillary Discussion on Confidence Interval Estimation How Large Do the Sample Sizes Need to be in Two-Sample CI Estimation Problems? | One-Sided Confidence Interval | Verify By Simulation that Confidence Interval Works 10.8 Case Studies Supermarket Sales Analysis | Placement of Students | Indian Stock Market Data Summary of Key Concepts and Formulae Practice Problems
Chapter 11: Testing of Hypothesis 11.1 Framework of Testing of Hypothesis Null and Alternative Hypotheses | Simple and Composite Hypotheses | Type I and Type II Errors | Which Hypothesis is Null and Which Hypothesis is Alternative? | Initial Discussion on Case Studies | Test Statistics, Critical Region, and Acceptance Region | p-value (Probability Value) of a Test | Steps in Conducting a Statistical Test | Drawing Parallels with the Judicial System 11.2 One-sample Testing of Hypothesis for Population Mean Testing for μ When σ is Known | Testing for μ When σ is Unknown and n is Large | Testing μ When σ is Unknown and n is Small, Population is Normal 11.3 One-sample Testing of Hypothesis for Population Proportion 11.4 One-sample Testing of Hypothesis for Population Standard Deviation 11.5 Computing Probability of Type II Error and Power of One-sample Tests Power of the One-Sample Mean Test when Population Standard Deviation is Known | Power of the One-Sample Test for Population Proportion | Power of the One-Sample Test for Standard Deviation 11.6 Sample Size Determination in One-sample TOH Problems Testing for Mean | Testing for Proportion | Testing for Standard Deviation 11.7 Two-sample Testing of Hypothesis Two-Sample Testing Comparing the Proportion of Two Populations | Two-Sample TOH Comparing the Mean of Two Populations | Two-Sample TOH Comparing Variances or SDs of Two Populations 11.8 Paired Test for Mean 11.9 Ancillary Discussion on Testing of Hypothesis Implementation in Excel and R | Conducting Testing of Hypothesis from Confidence Interval Estimation | Using Simulation in Testing of Hypothesis | Testing for Mean in Non-Normal Populations Based on Small Sizes 11.10 Case Studies Placement of Students | Supermarket Sales Analysis | HR Analytics Summary of Key Concepts and Formulae Practice Problems
Chapter 12: Analysis of Variance (ANOVA) 12.1 Overview 12.2 One-way ANOVA Why Use ANOVA and Not Pairwise Comparisons? | The Model and Assumptions in ANOVA | The Sum of Squares and Algebraic Split | The Test Statistic and Null Distribution | The ANOVA Table | ANOVA: Comparison Between Three Estimates of σ² | Parameter Estimates | ANOVA as an Extension of the Two-sample t-test (Case k = 2) | Post-hoc Analysis: Paired Comparison | Case Study: Shuddho Chinton: Part I | Case Study: Shuddho Chinton: Part II 12.3 Two-way ANOVA Two-way ANOVA without Replication | Two-way ANOVA with Replication and without Interaction Between Factors | Model in Two-way ANOVA with Interaction | Implementation of Two-way ANOVA Using Excel | Implementation of Two-way ANOVA Using R | Case Study: Shuddho Chinton: Part III 12.4 Levene’s Test for Equality of Variance Across Multiple Populations Conducting Levene’s Test | Variations of Levene’s Test | Executing Levene’s Test Using Software 12.5 Case Studies HR Analytics | Supermarket Sales Analysis 12.6 Extending ANOVA to Multi-Way Analysis and Connections with Regression Summary of Key Concepts and Formulae Practice Problems
Chapter 13: Goodness-of-Fit Tests and Nonparametric Methods 13.1 Overview 13.2 Chi-square Goodness-of-Fit Tests For Distributions of Qualitative and Discrete Variables | For Continuous Distributions 13.3 Chi-square Test of Independence in Contingency Tables Test of Independence or Homogeneity | Test of Independence versus Test of Homogeneity | Marascuilo Procedure and Multiple Comparison as Post-hoc Analysis to Chi-square Test of Independence 13.4 Kolmogorov–Smirnov and Other Tests for Goodness of Fit How the K–S Test Works | Null Hypothesis and Test Statistic for One-Sample K–S Test | Two-Sample Kolmogorov–Smirnov Test | The Anderson–Darling Test | The Shapiro–Wilk Test 13.5 Nonparametric Methods: Introduction and Overview 13.6 Sign Test and Signed-rank Test Sign Test | Wilcoxon Signed-rank Test for Median and Other Percentiles 13.7 Wilcoxon Rank-sum Test/Mann–Whitney Test 13.8 Kruskal–Wallis Test 13.9 Run Test for Independence 13.10 Spearman’s Rank Correlation 13.11 Nonparametric Density Estimation Kernel Density Estimation (KDE) | Nearest Neighbour Density Estimation (NNDE) | Spline-based Density Estimation (SDE) | Orthogonal Series-based Density Estimation (OSDE) 13.12 Case Studies Supermarket Sales Analysis | HR Analytics Summary of Key Concepts and Formulae Practice Problems
Chapter 14: Correlation and Regression 14.1 Overview 14.2 Simple Linear Regression 14.3 Multiple Linear Regression 14.4 Inference in the Regression Problem Inference for Effects of Continuous Predictors | Inference for Effects of Categorical Predictors | Inference for Interaction Effects | Multiple R and R-squared | Analysis of Variance in Linear Regression | Prediction Interval 14.5 Regression Diagnostics Variance Inflation Factor: A Check for Multicollinearity | Visualising Residuals: A Check for Error Assumptions | Detecting Unusual Observations 14.6 Improving the Linear Regression Models Variable Selection | Outlier Management | Transformation of Variables | Advanced Modelling Techniques 14.7 Case Studies Swiggy (What Impacts the Ratings)? | Finding a Good Model for Monthly Demand of WonderWidget Summary of Key Concepts and Formulae Practice Problems
Chapter 15: Logistic Regression 15.1 Overview 15.2 Binomial Distribution and Odds 15.3 The Logistic Regression Model Logistic Regression with a Single Predictor | Logistic Regression with Multiple Predictors 15.4 Inference for Logistic Regression Inference for the Coefficients | Assessment of Model Fit | Prediction and Classification 15.5 Diagnostics and Improvement Checking Model Assumptions | Variable Selection | Other Diagnostics 15.6 Logistic Regression with Probit Link Function 15.7 Multinomial Logistic Regression 15.8 Case Studies Shark Tank India | T20 Cricket Matches Summary of Key Concepts and Formulae Practice Problems
PART D: ADVANCED ANALYTICS (Available on the Orient BlackSwan Smart App)
Chapter 16: Advanced Regression Models 16.1 Overview 16.2 Generalised Linear Model (GLM) Linear Regression as GLM | Logistic Regression as GLM | Poisson Regression as GLM | Other Types of GLM 16.3 Improvements over Ordinary Least Squares Linear Regression Generalised Least Squares | Locally Weighted Regression | Non-linear Regression Models 16.4 Shrinkage and Penalised Regression The Concept of Shrinkage | Ridge Regression | LASSO Regression | Other Types of Penalised Regression Models 16.5 Case Study Analysis of Engineering Colleges in India | Analysis of Start-up Funding in India in 2015–2020 Practice Problems
Chapter 17: Supervised Learning 17.1 Overview 17.2 Supervised versus Unsupervised Learning 17.3 Tree-based Methods CART and CHAID | Ensemble Learning and Random Forests 17.4 Some Classification Algorithms Logistic Regression in Classification | KNN Classification | Support Vector Machine (SVM) | Naïve Bayes method 17.5 Discriminant Analysis Linear Discriminant Analysis | Quadratic Discriminant Analysis 17.6 Case Studies Understanding the Musical Cure | Decoding Comments on a YouTube Channel Practice Problems
Chapter 18: Unsupervised Learning 18.1 Dimensionality Reduction Principal Component Analysis | Principal Component Regression | t-Distributed Stochastic Neighbour Embedding 18.2 Clustering Evaluating a Clustering Algorithm | k-means and k-medoids Algorithm | DBSCAN Algorithm | Hierarchical Clustering 18.3 Anomaly Detection Simple Statistical Methods | Isolation Forest 18.4 Case Studies Does Weather Affect the Demand for Bikes? | Understanding the Indian Stock Market Practice Problems
Chapter 19: Forecasting 19.1 Introduction to Forecasting Problems 19.2 Time Series Data and Forecasting Important Features of Time Series Data | Stationarity and Temporal Autocorrelation | Decomposition of Time Series 19.3 Some Simple Forecasting Approaches Naïve and Seasonal Naïve Method | Mean Method | Drift Method | Decomposition Method 19.4 Exponential Smoothing Technique 19.5 The ARIMA and SARIMA Models 19.6 Classification and Regression Models in Forecasting Case Study 1: The Ed-tech Story | Case Study 2: Where Should You Buy a House? | The SARIMAX Model | Case Study 3: Analysing the Sales of the French Bakery Practice Problems
Chapter 20: Comprehensive Data Analysis and the Way Forward 20.1 ESG Insights for Policymakers – A Statistical Investigation Comprehensive Data Analysis | Scope of Advanced Analytics with GAM or Multilevel Models 20.2 Data-Driven Insights at UrbanMart Superstore Comprehensive Data Analysis | Scope of Advanced Analytics with Dashboards, Neural Networks, and Causal Inference 20.3 Air Pollution in Delhi: Can it be Managed with Analytical Tools? Comprehensive Data Analysis | Scope of Advanced Analytics with Spatio-Temporal Models and Extreme Value Theory 20.4 Decoding NIFTY 50 through Risk, Return, and Portfolio Strategy Comprehensive Data Analysis | Scope of Advanced Analytics Using GARCH Models and Other Techniques 20.5 How Can Victory Sports Increase their Sales? Comprehensive Data Analysis | Scope of Advanced Analytics with Digital Behaviour Data 20.6 Concluding Remarks
Appendix Probability Tables
Table A.1 CDF of binomial distribution, that is, P(X ≤ x), where X follows binomial distribution for some choices of n and p
Table A.2 CDF of Poisson distribution, that is, P(X ≤ x), where X follows Poisson distribution for some choices of m
Table A.3 Percentiles (cut-off points corresponding to left-tail area) of chi-square distribution
Table A.4 Percentiles (cut-off points corresponding to left-tail area) of t distribution
Table A.5 Cut-off points from right-tail area (p = 1%, 2.5%, 5%, 10%) of F distribution with numerator d.f. n1 and denominator d.f. n2
Table A.6 Mass/Density function of Mann–Whitney test statistic when m (larger sample size) is 2, 3, or 4 and n ≤ m
Index
https://www.universitiespress.com/BusinessAnalytics