Machine Learning
Preface
1
Projects
2
Machine Learning Fundamentals
2.1
Definitions
2.1.1
Data Science
3
Machine Learning Fundamentals
3.1
Overfitting
3.2
Underfitting
3.3
Bias-Variance Trade-off
3.3.1
Bias
3.3.2
Variance
3.3.3
Bias vs. Variance Trade-Off:
3.3.4
Bias vs. Variance Trade-Off:
4
Machine Learning
4.1
ML Algorithms Intro
4.1.1
Binary Classification:
4.1.2
Multi-Class Classification:
4.1.3
Continuous Outcome (Regression):
4.1.4
Random Forest vs Decision Trees
4.1.5
Random Forest vs Gradient Boosting
4.1.6
Overall Considerations:
4.2
ML Libraries in Python
4.2.1
TensorFlow
4.2.2
PyTorch
4.2.3
Big data solutions
4.2.4
Databricks
4.2.5
TensorFlow
4.2.6
PyTorch
4.2.7
Ensemble Learning in Machine Learning
4.2.8
Key Concepts of Ensemble Learning:
4.2.9
1. Bagging (Bootstrap Aggregating):
4.2.10
2. Boosting:
4.2.11
3. Stacking (Stacked Generalization):
4.2.12
Other Ensemble Methods:
4.2.13
Advantages of Ensemble Learning:
4.2.14
Disadvantages of Ensemble Learning:
4.2.15
Summary:
4.3
Regularization
4.3.1
Lasso Regression
4.3.2
Bayesian Models
4.3.3
Correlation Between Lasso and Bayesian Models
4.3.4
Summary
4.4
Logistic Regression: Key Concepts for Data Science Interviews
4.4.1
What You Need to Know:
4.5
Gradient Boosting Trees (GBT)
4.6
Random Forest
4.7
XGBoost: Key Concepts for Data Science Interviews
4.7.1
What You Need to Know:
4.8
Neural Networks: Key Concepts for Data Science Interviews
4.8.1
Basic Structure:
4.8.2
Activation Functions:
4.8.3
Forward and Backpropagation:
4.8.4
Loss Functions:
4.8.5
Optimization Algorithms:
4.8.6
Regularization Techniques:
4.8.7
Common Architectures:
4.8.8
Overfitting and Underfitting:
4.8.9
What You Need to Know:
4.9
Naive Bayes
4.9.1
Bayesian Classification
5
Extract-Transform-Load
5.1
Outlier Detection
6
ML Modeling
6.1
Objective
6.2
Data Processing
6.2.1
Data collection
6.2.2
Data Cleaning
6.2.3
Feature Engineering
6.2.4
Implementation and Impact
6.2.5
Lessons Learned and Future Work
6.3
Model Selection
6.3.1
Understand the Problem Type
6.3.2
Understand the Data
6.3.3
Select Models Based on Interpretability vs. Performance Trade-Off
6.3.4
Evaluate Model Complexity and Training Time
6.3.5
Experiment and Cross-Validation
6.3.6
Consider Domain Knowledge and Business Constraints
6.3.7
Model Ensembling
6.3.8
Evaluate and Iterate
6.3.9
Deployment Considerations
6.4
Feature Selection
6.4.1
Recursive Feature Elimination (RFE)
6.4.2
LASSO regularization
6.4.3
Mutual Information
6.4.4
Mutual information vs Correlation Coefficient
6.5
Important Features
6.5.1
Feature Importance in Random Forest
6.5.2
Using Feature Importance for Selection
6.5.3
Advantages of Using Random Forest for Feature Selection
6.5.4
Summary
6.6
Fine-tuning hyperparameters
6.6.1
Key Hyperparameters for Tree-Based Models
6.6.2
Fine-Tuning Strategy
6.7
Cross Validation
6.7.1
How Cross-Validation Works
6.7.2
K-Fold Cross-Validation
6.7.3
Advantages of K-Fold Cross-Validation:
6.7.4
Choosing the Value of \(k\):
6.7.5
Alternative Methods:
7
Model Evaluation
7.1
Classification Models: Evaluation
7.1.1
Thresholding
7.1.2
Confusion Matrix
7.2
ROC Curve
7.2.1
Components of the ROC Curve:
7.2.2
How to Read the ROC Curve:
7.2.3
Area Under the ROC Curve (AUC):
7.2.4
Applications of ROC Curve:
7.2.5
Using the ROC Curve in Real Examples
7.2.6
Selecting the Probability Threshold:
7.2.7
ROC Curve Example:
7.3
Overfitting
7.3.1
How Do You Overcome Overfitting?
7.3.2
Data Stratification Technique
7.3.3
Any Other Way to Simplify the Model?
7.3.4
Are You Using a Cross-Validation Method?
7.4
Bias-Variance Tradeoff
7.4.1
Key Concepts in Bias-Variance Tradeoff
7.4.2
Error Decomposition and Tradeoff
7.4.3
Managing the Bias-Variance Tradeoff
7.4.4
Conclusion
7.4.5
Lift Chart
7.4.6
ROC Curve (Receiver Operating Characteristic Curve)
7.4.7
Summary
7.4.8
Bootstrapping
8
Interview Questions
8.0.1
Tell me how you train a model and evaluate it
8.0.2
Tell me how you can use LLMs in marketing/healthcare
8.0.3
Objective function in logistic regression
8.1
Do you prefer R or python?
8.2
What is your main domain?
8.3
Is this work culture fast-paced? Do you deliver value quickly or what?
8.4
Are you involved in any efforts to convince business stakeholders to adopt the models or analyses you produce?
8.5
Have you been in a situation where you felt the model was the right way to go but needed to convince a client or manager?
9
Interview Prep
9.1
Look-alike Model Walkthrough
9.1.1
Situation
9.1.2
Task
9.1.3
Action
9.1.4
Result