Universities Press

Business Analytics: Data to Decisions

Shubhabrata Das, Soudeep Deb (Authors)

ISBN: 9789393330994 | Year: 2026 | Paperback | Pages: 1124 | Language: English

Book Size: 180 x 240 mm | Territorial Rights: World

Price: 1250.00

About the Book

In an era where data drives decisions in every field, the importance of statistical reasoning and data analytics has never been greater. This book is a teaching resource that connects statistical techniques to practical business contexts in a way that is intuitive, relevant, and thoughtful. It strives to explain methods clearly and rigorously, enabling students to develop a strong conceptual foundation and a critical approach to data-driven decision-making.

Salient features:

Designed with Indian students and practitioners in mind
Examples and exercises worked out using both Microsoft Excel and R
Includes exercises and case studies at multiple levels of complexity to bridge the gap between classroom learning and real-world application
Exercise questions classified as Part I, comprising basic numerical and conceptual questions; Part II, which includes problems requiring deeper understanding and involving larger data sets (with computer-based analysis); and Part III, which presents case-based or advanced open-ended problems
Android app with additional chapters, case studies, questions and answers to selected chapter-end exercises
Online resources available at: https://www.universitiespress.com/BusinessAnalytics

Contributors (Author(s), Editor(s), Translator(s), Illustrator(s) etc.)

Shubhabrata Das has been a faculty member at the Indian Institute of Management Bangalore (IIMB) since December 1999 and has served as a full professor since 2005. He has been teaching the material presented in this book across all of IIMB’s degree-granting programmes as well as in its short- and long-duration executive education programmes. He has provided training and consultancy services in the areas of Business Statistics, Business Analytics, Market Research, Business Forecasting, and Insurance for a range of government organisations and leading corporate clients in India. His academic honours include the inaugural Research Professor Award from IIMB, the IBM Faculty Award, the Best Paper Award at the APRIA conference, and multiple scholastic accolades from the West Bengal Board of Higher Secondary Education, ISI, and UNC.

Soudeep Deb is an Associate Professor in the Decision Sciences Area at the Indian Institute of Management Bangalore (IIMB). He obtained his Ph.D. in Statistics from the University of Chicago, and then worked as the Senior Lead Data Scientist at NBC Universal in New York, USA, for around two years. Dr. Deb’s core research focus is on Bayesian methods, time series, spatio-temporal modelling, machine learning approaches, and their applications in various fields, such as environmental science, finance, social studies, and sports. He primarily teaches courses on introductory statistics, inference, and multivariate data analysis, as well as fun topics like sports analytics. Please refer to his webpage https://soudeepd.github.io/ for more details.

Table of Contents

About the Authors
Testimonials
Foreword
Preface
Acknowledgements

PART A: FOUNDATIONS

Chapter 1: Introduction and Overview
1.1 Introduction to Business Data Analytics – Data to Decisions
1.2 History and Interconnections Between Branches of Analytics
1.3 An Overview of Statistical Methods
1.4 Applications of Data-driven Analytics in Business
Fast-Moving Consumer Goods (FMCG) | Aggregator Industry or Platform Economy | Banking and Financial Services | Retail and E-commerce | Energy (Oil and Natural Gas, Renewable Energy) | Pharmaceuticals and Healthcare | Sports and Entertainment | Automotive | Real Estate | Tourism | Information Technology | Agriculture | Manufacturing | Logistics and Transportation | Telecommunications | Education
1.5 Role of Analytics in Management Disciplines
1.6 Structure of the Book
1.7 Software and Computational Aspects
1.8 Description of Data Sets for Running Case Studies
Exercise

Chapter 2: Data Representation and Descriptive Statistics
2.1 Overview
2.2 Types of Data
2.3 Organising Data Using Arrays, Graphs, and Tables
Stem and Leaf Display, Bar Charts, and Pie Charts | Frequency Distribution, Frequency Polygon, Histogram, and Ogive
2.4 Data Summarisation
Quantiles, Percentiles, and Quartiles | Measures of Central Tendency | Measures of Dispersion or Variability | Skewness and Kurtosis | Important Results | Outlier Detection
2.5 Summary Statistics for Bivariate Data
Cross-tabulation | Scatter Plot, Covariance, and Correlation | Measure of Dependency When One Variable is Quantitative and the Other Qualitative
2.6 Pivot Table
Case Study: Supermarket Sales Analysis
2.7 Advanced Data Visualisation
Time Series Data | Cross-Sectional Data | Spatial Data | Case Study: Spatial Analysis of Indian Weather
2.8 Best Practices for Data Handling and Cleaning
Planning for Data Collection | Why is Data Cleaning Necessary? | Dealing with Missing Data | Data Transformation | General Guidelines for Graphical Data Representations
2.9 Case Study: Indian Start-up Funding
Summary of Key Concepts and Formulae
Practice Problems

Chapter 3: Vectors and Matrices
3.1 Overview
3.2 Sets, Relations, Functions
3.3 Vector Space
3.4 Matrices
Introduction to Matrix Structure | Operations on Matrices | Matrices with Special Structures | Rank and Inverse | Determinants
3.5 Inner Product and Orthogonality
3.6 Eigenvalues
Characteristic Roots, Eigenvalues, and Eigenvectors | Spectral Representation | Singular Value Decomposition
3.7 Linear and Quadratic Forms
Linear System of Equations | Quadratic Forms | Positive Definiteness
3.8 Case Studies
Unveiling Matrix Structures of Equal Correlation | Matrices in Action: Analysing Swiggy Data
Summary of Key Concepts and Formulae
Practice Problems

PART B: PROBABILITY AND DISTRIBUTIONS

Chapter 4: Probability
4.1 Introduction to Probability
4.2 Formalising Probability Notions
Random Experiment and Events – Union, Intersection, Complementation – Mutually Exclusive (Disjoint) and Exhaustive Set of Events | Approaches to Defining Probability | Axioms of Probability – Probability Rules
4.3 Conditional Probability and the Notion of Independent Events
Simpson’s Paradox
4.4 Bayes’ Theorem
4.5 Case Studies
KJSS | A Tale of Two Mothers
Summary of Key Concepts and Formulae
Practice Problems

Chapter 5: Discrete Random Variables and Probability Distributions
5.1 Random Variables and Probability Distributions
Discrete Versus Continuous Random Variable
5.2 General Discrete Distributions: Expected Value, Variance, and Other Characteristics
Probability Mass/Density Function and Cumulative Distribution Function of a Discrete Random Variable | Expected Value (Mean) and Variance of a Discrete Random Variable | Higher Order Moments, Skewness, and Kurtosis | Mean (Expected Value) and Variance of a Linear Combination of Random Variables | Chebyshev’s Inequality | Simulation from a Discrete Probability Distribution | Percentiles and Other Measures of Central Tendency and Dispersion of a (Discrete) Probability Distribution
5.3 Discrete Uniform Distribution
5.4 Binomial Distribution
5.5 Poisson Distribution
Poisson Approximation to Binomial Distribution | Simulation from Poisson Distribution
5.6 Joint and Conditional Distribution for Discrete Variables
5.7 Other Popular Discrete Distributions
Geometric Distribution | Negative Binomial Distribution | Hypergeometric Distribution | Multinomial Distribution
5.8 Relationship between Different Discrete Distributions
Summary of Key Concepts and Formulae
Practice Problems

Chapter 6: Continuous Probability Distributions
6.1 Overview
6.2 General Continuous Probability Distributions and their Characteristics
6.3 Uniform Distribution
6.4 Normal (Gaussian) Distribution
Standard Normal Distribution and Tables | Discussion: Stock-out at PAINTMART | Discussion: How Much to Pack? | Normal Approximation of Binomial and Poisson Distributions | Whither Approximation to Binomial/Poisson Distribution?
6.5 Exponential Distribution
Memoryless Property of Exponential Distribution
6.6 Poisson Process: Inter-linkage between the Exponential and Poisson Distributions
6.7 Distributions Related to Normal Distributions
Lognormal Distribution | Chi-Square Distribution | (Student’s) t Distribution | F Distribution
6.8 Joint and Conditional Distribution for Continuous Variables
Joint Density Function of Continuous Variables | Marginal Density Function | Conditional Density | Independent Random Variables | Function of (Two) Random Variables and Its Expected Values | Covariance and Correlation | Illustrating the Computation in Example 6.24
6.9 Other Continuous Probability Distributions
Gamma Distribution | Beta Distribution | Weibull Distribution | Pareto Distribution | Logistic Distribution
6.10 Interconnections between Different Probability Distributions
Summary of Key Concepts and Formulae
Practice Problems

Chapter 7: Direct Applications of Probability in Business Management
7.1 Overview
Introduction to Optimisation | Simulation
7.2 Decision-making under Uncertainty in a Tree Structure
Decision Analysis Without Probabilistic Assessment | Decision-Making with Expected Value Approach | Expected Value with Perfect Information and Expected Value of Perfect Information (EVPI) | Expected Value of Sample Information (EVSI) and Sampling Efficiency (SI)
7.3 Portfolio Diversification and Asset Allocation
7.4 The PERT-CPM Model
7.5 Dream11 – Probability in Fantasy Sport
Choosing the Best Possible Fantasy Eleven | Which Dream11 Tournament(s) Should One Participate In?
7.6 Overbooking in Airlines
Probabilistic Formulation of the Problem | Assuming Equal Chance of Cancellation at All Times | Assuming Chance of Cancellation Changes as a Function of Time
7.7 Pricing and Promotions
Pricing of Airline Seats | Assessing the Cost and Benefit of a Promotion for F&H | Retail Markdown
7.8 Queuing Theory
Probability and Distributions in Queuing | Categorisation of Queuing Models
7.9 Actuarial Science
Summary of Key Concepts and Formulae
Practice Problems

PART C: INFERENCE AND REGRESSION

Chapter 8: Sampling
8.1 Introduction to Sampling
History and Terminology | Sampling Versus Complete Enumeration – Sampling and Non-Sampling Error | Merits of Sampling
8.2 Random Sampling Methods
Simple Random Sampling With Replacement (SRSWR) and Simple Random Sampling Without Replacement (SRSWOR) | Systematic Sampling | Stratified Sampling | Cluster Sampling | Other Random Sampling Methods
8.3 Non-random Sampling Methods
Convenient Sampling | Purposive or Judgemental Sampling | Quota Sampling | Snowball Sampling | Voluntary Sampling
8.4 Questionnaire Design for Surveys
8.5 Other Issues and Challenges Related to Sampling
Sampling When the Population is Infinite and/or Hypothetical in Nature | Channel of Collecting Survey Information | Process Versus Outcome | Paired Sampling | What Should the Sample Size Be?
Summary of Key Concepts and Formulae
Practice Problems

Chapter 9: Point Estimation and Sampling Distribution
9.1 Introduction to Estimation
9.2 Basic Properties of (Point) Estimators
Unbiased Estimator | Bias of an Estimator | Standard Error of an Estimator | Comparing Two Groups or Populations
9.3 Other Desirable Properties of Estimators
Minimum Variance Unbiased Estimator | Asymptotically Unbiased Estimators | Consistent Estimator
9.4 Sampling Distribution and Central Limit Theorem
What is a Sampling Distribution? | Sampling Distribution of Sample Mean When the Sample Size is Large | Sampling Distribution of Sample Proportion
9.5 Estimating Population Variance and Sampling Distribution of Sample Variance
9.6 Revisiting Student’s t Distribution: Distribution of (Standardised) Sample Mean while Sampling from a Normal Population
9.7 Sampling Distribution of Difference in Sample Means based on Independent Samples Drawn from Two Populations
Both Sample Sizes are Large with Known/Unknown Population Standard Deviations | Both Populations are Normal with Known Standard Deviation | Population Standard Deviations are Unknown and At Least One of the Two Sample Sizes is Not Large – Normal Populations
9.8 Sampling Distribution of Difference in Sample Proportions based on Independent Samples drawn from Two Populations
9.9 Sampling Distribution of Ratio of Variances
9.10 Understanding CLT and Sampling Distribution Concepts Using Simulation
9.11 Case Studies
Engagement and Job Fit of Retail Salespeople | Supermarket Sales Analysis | Placement of Students
Summary of Key Concepts and Formulae
Practice Problems

Chapter 10: Confidence Interval (CI) Estimation
10.1 Introduction
10.2 Confidence Interval Estimation for Population Proportion
Appropriate Interpretation of Confidence Coefficient in Confidence Interval Estimation | Relationship Between Margin of Error, Sample Size, and Confidence Coefficient | Determining the Requisite Sample Size in the Context of CI Estimation of Population Proportion | Adjustment for SRSWOR Sampling from a Finite Population
10.3 Confidence Interval Estimation for Population Mean
MOE, Sample Size Determination, Finite Population Adjustment
10.4 Confidence Interval Estimation for Standard Deviation
10.5 Confidence Interval Estimation in Two-sample Problems
Framework and Notation | CI for Difference in Proportion Between Two Populations | CI for Difference Between Mean of Two Populations | CI for Ratio of Standard Deviations of Two Populations
10.6 Paired Sampling
10.7 Ancillary Discussion on Confidence Interval Estimation
How Large Do the Sample Sizes Need to be in Two-Sample CI Estimation Problems? | One-Sided Confidence Interval | Verify By Simulation that Confidence Interval Works
10.8 Case Studies
Supermarket Sales Analysis | Placement of Students | Indian Stock Market Data
Summary of Key Concepts and Formulae
Practice Problems

Chapter 11: Testing of Hypothesis
11.1 Framework of Testing of Hypothesis
Null and Alternative Hypotheses | Simple and Composite Hypotheses | Type I and Type II Errors | Which Hypothesis is Null and Which Hypothesis is Alternative? | Initial Discussion on Case Studies | Test Statistics, Critical Region, and Acceptance Region | p-value (Probability Value) of a Test | Steps in Conducting a Statistical Test | Drawing Parallels with the Judicial System
11.2 One-sample Testing of Hypothesis for Population Mean
Testing for μ When σ is Known | Testing for μ When σ is Unknown and n is Large | Testing μ When σ is Unknown and n is Small, Population is Normal
11.3 One-sample Testing of Hypothesis for Population Proportion
11.4 One-sample Testing of Hypothesis for Population Standard Deviation
11.5 Computing Probability of Type II Error and Power of One-sample Tests
Power of the One-Sample Mean Test when Population Standard Deviation is Known | Power of the One-Sample Test for Population Proportion | Power of the One-Sample Test for Standard Deviation
11.6 Sample Size Determination in One-sample TOH Problems
Testing for Mean | Testing for Proportion | Testing for Standard Deviation
11.7 Two-sample Testing of Hypothesis
Two-Sample Testing Comparing the Proportion of Two Populations | Two-Sample TOH Comparing the Mean of Two Populations | Two-Sample TOH Comparing Variances or SDs of Two Populations
11.8 Paired Test for Mean
11.9 Ancillary Discussion on Testing of Hypothesis
Implementation in Excel and R | Conducting Testing of Hypothesis from Confidence Interval Estimation | Using Simulation in Testing of Hypothesis | Testing for Mean in Non-Normal Populations Based on Small Sizes
11.10 Case Studies
Placement of Students | Supermarket Sales Analysis | HR Analytics
Summary of Key Concepts and Formulae
Practice Problems

Chapter 12: Analysis of Variance (ANOVA)
12.1 Overview
12.2 One-way ANOVA
Why Use ANOVA and Not Pairwise Comparisons? | The Model and Assumptions in ANOVA | The Sum of Squares and Algebraic Split | The Test Statistic and Null Distribution | The ANOVA Table | ANOVA: Comparison Between Three Estimates of σ² | Parameter Estimates | ANOVA as an Extension of the Two-sample T-test (Case k = 2) | Post-hoc Analysis: Paired Comparison | Case Study: Shuddho Chinton: Part I | Case Study: Shuddho Chinton: Part II
12.3 Two-way ANOVA
Two-way ANOVA without Replication | Two-way ANOVA with Replication and without Interaction Between Factors | Model in Two-way ANOVA with Interaction | Implementation of Two-way ANOVA Using Excel | Implementation of Two-way ANOVA Using R | Case Study: Shuddho Chinton: Part III
12.4 Levene’s Test for Equality of Variance Across Multiple Populations
Conducting Levene’s Test | Variations of Levene’s Test | Executing Levene’s Test Using Software
12.5 Case Studies
HR Analytics | Supermarket Sales Analysis
12.6 Extending ANOVA to Multi-Way Analysis and Connections with Regression
Summary of Key Concepts and Formulae
Practice Problems

Chapter 13: Goodness-of-Fit Tests and Nonparametric Methods
13.1 Overview
13.2 Chi-square Goodness-of-Fit Tests
For Distributions of Qualitative and Discrete Variables | For Continuous Distributions
13.3 Chi-square Test of Independence in Contingency Tables
Test of Independence or Homogeneity | Test of Independence versus Test of Homogeneity | Marascuilo Procedure and Multiple Comparison as Post-hoc Analysis to Chi-square Test of Independence
13.4 Kolmogorov–Smirnov and Other Tests for Goodness of Fit
How the K–S Test Works | Null Hypothesis and Test Statistic for One-Sample K–S Test | Two-Sample Kolmogorov–Smirnov Test | The Anderson–Darling Test | The Shapiro–Wilk Test
13.5 Nonparametric Methods: Introduction and Overview
13.6 Sign Test and Signed-rank Test
Sign Test | Wilcoxon Signed-rank Test for Median and Other Percentiles
13.7 Wilcoxon Rank-sum Test/Mann–Whitney Test
13.8 Kruskal–Wallis Test
13.9 Run Test for Independence
13.10 Spearman’s Rank Correlation
13.11 Nonparametric Density Estimation
Kernel Density Estimation (KDE) | Nearest Neighbour Density Estimation (NNDE) | Spline-based Density Estimation (SDE) | Orthogonal Series-based Density Estimation (OSDE)
13.12 Case Studies
Supermarket Sales Analysis | HR Analytics
Summary of Key Concepts and Formulae
Practice Problems

Chapter 14: Correlation and Regression
14.1 Overview
14.2 Simple Linear Regression
14.3 Multiple Linear Regression
14.4 Inference in the Regression Problem
Inference for Effects of Continuous Predictors | Inference for Effects of Categorical Predictors | Inference for Interaction Effects | Multiple R and R-squared | Analysis of Variance in Linear Regression | Prediction Interval
14.5 Regression Diagnostics
Variance Inflation Factor: A Check for Multicollinearity | Visualising Residuals: A Check for Error Assumptions | Detecting Unusual Observations
14.6 Improving the Linear Regression Models
Variable Selection | Outlier Management | Transformation of Variables | Advanced Modelling Techniques
14.7 Case Studies
Swiggy (What Impacts the Ratings?) | Finding a Good Model for Monthly Demand of WonderWidget
Summary of Key Concepts and Formulae
Practice Problems

Chapter 15: Logistic Regression
15.1 Overview
15.2 Binomial Distribution and Odds
15.3 The Logistic Regression Model
Logistic Regression with a Single Predictor | Logistic Regression with Multiple Predictors
15.4 Inference for Logistic Regression
Inference for the Coefficients | Assessment of Model Fit | Prediction and Classification
15.5 Diagnostics and Improvement
Checking Model Assumptions | Variable Selection | Other Diagnostics
15.6 Logistic Regression with Probit Link Function
15.7 Multinomial Logistic Regression
15.8 Case Studies
Shark Tank India | T20 Cricket Matches
Summary of Key Concepts and Formulae
Practice Problems

PART D: ADVANCED ANALYTICS (Available on the Orient BlackSwan Smart App)

Chapter 16: Advanced Regression Models
16.1 Overview
16.2 Generalised Linear Model (GLM)
Linear Regression as GLM | Logistic Regression as GLM | Poisson Regression as GLM | Other Types of GLM
16.3 Improvements over Ordinary Least Squares Linear Regression
Generalised Least Squares | Locally Weighted Regression | Non-linear Regression Models
16.4 Shrinkage and Penalised Regression
The Concept of Shrinkage | Ridge Regression | LASSO Regression | Other Types of Penalised Regression Models
16.5 Case Study
Analysis of Engineering Colleges in India | Analysis of Start-up Funding in India in 2015–2020
Practice Problems

Chapter 17: Supervised Learning
17.1 Overview
17.2 Supervised versus Unsupervised Learning
17.3 Tree-based Methods
CART and CHAID | Ensemble Learning and Random Forests
17.4 Some Classification Algorithms
Logistic Regression in Classification | KNN Classification | Support Vector Machine (SVM) | Naïve Bayes Method
17.5 Discriminant Analysis
Linear Discriminant Analysis | Quadratic Discriminant Analysis
17.6 Case Studies
Understanding the Musical Cure | Decoding Comments on a YouTube Channel
Practice Problems

Chapter 18: Unsupervised Learning
18.1 Dimensionality Reduction
Principal Component Analysis | Principal Component Regression | t-Distributed Stochastic Neighbour Embedding
18.2 Clustering
Evaluating a Clustering Algorithm | k-means and k-medoids Algorithm | DBSCAN Algorithm | Hierarchical Clustering
18.3 Anomaly Detection
Simple Statistical Methods | Isolation Forest
18.4 Case Studies
Does Weather Affect the Demand for Bikes? | Understanding the Indian Stock Market
Practice Problems

Chapter 19: Forecasting
19.1 Introduction to Forecasting Problems
19.2 Time Series Data and Forecasting
Important Features of Time Series Data | Stationarity and Temporal Autocorrelation | Decomposition of Time Series
19.3 Some Simple Forecasting Approaches
Naïve and Seasonal Naïve Method | Mean Method | Drift Method | Decomposition Method
19.4 Exponential Smoothing Technique
19.5 The ARIMA and SARIMA Models
19.6 Classification and Regression Models in Forecasting
Case Study 1: The Ed-tech Story | Case Study 2: Where Should You Buy a House? | The SARIMAX Model | Case Study 3: Analysing the Sales of the French Bakery
Practice Problems

Chapter 20: Comprehensive Data Analysis and the Way Forward
20.1 ESG Insights for Policymakers – A Statistical Investigation
Comprehensive Data Analysis | Scope of Advanced Analytics with GAM or Multilevel Models
20.2 Data-Driven Insights at UrbanMart Superstore
Comprehensive Data Analysis | Scope of Advanced Analytics with Dashboards, Neural Networks, and Causal Inference
20.3 Air Pollution in Delhi: Can it be Managed with Analytical Tools?
Comprehensive Data Analysis | Scope of Advanced Analytics with Spatio-Temporal Models and Extreme Value Theory
20.4 Decoding NIFTY 50 through Risk, Return, and Portfolio Strategy
Comprehensive Data Analysis | Scope of Advanced Analytics Using GARCH Models and Other Techniques
20.5 How Can Victory Sports Increase their Sales?
Comprehensive Data Analysis | Scope of Advanced Analytics with Digital Behaviour Data
20.6 Concluding Remarks

Appendix Probability Tables

Table A.1 CDF of binomial distribution, that is, P(X ≤ x), where X follows binomial distribution for some choices of n and p
Table A.2 CDF of Poisson distribution, that is, P(X ≤ x), where X follows Poisson distribution for some choices of m
Table A.3 Percentiles (cut-off points corresponding to left-tail area) of chi-square distribution
Table A.4 Percentiles (cut-off points corresponding to left-tail area) of t distribution
Table A.5 Cut-off points from right-tail area (p = 1%, 2.5%, 5%, 10%) of F distribution with numerator d.f. n1 and denominator d.f. n2
Table A.6 Mass/Density function of Mann–Whitney test statistic when m (larger sample size) is 2, 3, or 4 and n ≤ m

Index

Resources

https://www.universitiespress.com/BusinessAnalytics?isbn=9789393330994