Machine Learning Catalog (Part 1)
A for Associations, B for Bias, C for Classification … Z for z-score
With so many new terms & new concepts I definitely need a cheat-sheet to refer to again & again.
Ensemble is done. Let's move on to the next topic: Machine Learning Catalog. Every time I needed to refer to/recall/relate ML terms, Dr. Google was the place. But at the same time I need a quick refresher, not a detailed thesis :-)
Association : Relationship between objects/variables.
Bias : An error from flawed assumptions in the algorithm. High bias can cause an algorithm to miss important relationships between features and target outputs, resulting in underfitting of the model.
Bias-Variance trade-off : Look out for the “sweet spot” on the learning curve (sklearn.model_selection.learning_curve) where bias & variance are both low.
Box plot : Graphical display of 5 numbers : Minimum, 1st Quartile (Q1), Median (Q2), 3rd Quartile (Q3) & Maximum.
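A quick sketch of the 5 numbers behind a box plot, using only the Python standard library. The helper name five_number_summary is my own, and quartiles can be computed by several conventions; here I assume the "inclusive" method of statistics.quantiles, so other tools may give slightly different Q1/Q3.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    ordered = sorted(values)
    # n=4 splits the data into quartiles; "inclusive" interpolates
    # between the smallest and largest observed values.
    q1, q2, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, q2, q3, ordered[-1]

summary = five_number_summary([9, 1, 5, 3, 7, 2, 8, 4, 6])
print(summary)
```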
Covariance : Indicates the direction of the linear relationship between two variables (positive : they move together, negative : they move in opposite directions).
Confusion Matrix : Summary of prediction results on a Classification problem.
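For a binary problem the confusion matrix boils down to four counts. Below is a minimal sketch that tallies them by hand; the function name confusion_counts and the toy labels are made up for illustration.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, FN and TN for a binary classification problem."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1  # predicted positive, really positive
        elif predicted == positive:
            fp += 1  # predicted positive, really negative
        elif actual == positive:
            fn += 1  # predicted negative, really positive
        else:
            tn += 1  # predicted negative, really negative
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}

y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # toy ground truth
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]  # toy model output
counts = confusion_counts(y_true, y_pred)
print(counts)
```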
Dataset : High quality data is the key ingredient for any data science project.
Data quality : Data should be consistent with the business problem & accurate (selection of features). Watch for noise (high fluctuations), missing values, outliers, bias, variance etc. (all possible combinations are expected in real time/production).
Dimensionality = no. of features in a dataset
Distance : A way to measure similarity. Euclidean, Manhattan, Jaccard, Cosine & Mahalanobis are various flavors of distance used across ML algorithms.
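To make the flavors concrete, here is a rough sketch of four of them in plain Python. The function names are my own, and Mahalanobis is left out because it needs the covariance matrix of the data.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of absolute differences along each axis."""
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_distance(a, b):
    """1 minus the cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1 - dot / (norm_a * norm_b)

def jaccard_distance(s, t):
    """1 minus (size of intersection / size of union) of two sets."""
    return 1 - len(s & t) / len(s | t)

print(euclidean((0, 0), (3, 4)), manhattan((0, 0), (3, 4)))
print(cosine_distance((1, 0), (0, 1)), jaccard_distance({1, 2, 3}, {2, 3, 4}))
```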
Exploratory Data Analysis : Understanding the length & breadth of your data.
Feature Extraction : Using algorithms to combine the original features into a set of new features to be used in the model. The number of new features is generally less than the original number of features. PCA does feature extraction using eigenvectors as the natural axes of the data.
Feature Selection : Using algorithms to remove some of the features from the model so that the remaining features give better performance. The selected features themselves are not changed.
Feature = attribute = independent variable = predictor
Gaussian Distribution : Popularly also known as Bell Curve or Normal Distribution. A continuous, symmetric distribution with mean=mode=median. The standard normal distribution has mean = 0 & Standard Deviation = 1.
Gradient Descent : Optimization algorithm used to minimize some function by iteratively moving in the direction of steepest descent, as defined by the negative of the gradient (until a local minimum is reached).
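A bare-bones sketch of the idea on a one-variable function. I assume f(x) = (x - 3)^2, whose gradient is 2(x - 3) and whose minimum is at x = 3; the learning rate and step count are arbitrary picks for illustration.

```python
def gradient_descent(gradient, start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from `start`."""
    x = start
    for _ in range(steps):
        x -= learning_rate * gradient(x)  # move in the direction of steepest descent
    return x

# Gradient of the assumed example function f(x) = (x - 3)^2
minimum = gradient_descent(lambda x: 2 * (x - 3), start=0.0)
print(minimum)
```

With a learning rate that is too large the steps overshoot and can diverge, which is why the learning rate is usually tuned as a hyperparameter.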
Hyperparameter : An external configuration whose value cannot be estimated from the data. Random Search & Grid Search are two ways to look for the best combination of hyperparameters in any model. Ex. learning rate/nodes/layers in a neural network.
Hypothesis : Statement / assumption about a population (not about the sample data), made before looking at the data. Null Hypothesis (H0) is the status quo; rejecting it leads to the desired conclusion. Alternate Hypothesis (H1) is what a person believes is true or wants to prove true. All statistical conclusions are made in reference to the Null Hypothesis.
Evidence consistent with H0 = Fail to reject Null hypothesis (this does not prove H0 is true)
Evidence against H0 = Reject Null hypothesis
Kurtosis : Peakedness (and tail heaviness) of the distribution curve. Has 3 variations : Leptokurtic, Mesokurtic & Platykurtic.
Label = target = outcome = class = dependent variable = response = y
Linear Regression : y = mx + c + error (univariate linear model), where m is the slope & c is the intercept (also called the bias term). Simplest form. Target output is numerical. sklearn.linear_model.LinearRegression
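For the univariate case the slope and intercept have a closed-form least-squares solution, sketched below in plain Python. The helper name fit_line and the toy data are hypothetical; in practice sklearn.linear_model.LinearRegression does this for you.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = m*x + c with a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = sum of cross-deviations / sum of squared x-deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    c = mean_y - m * mean_x  # the fitted line passes through the point of means
    return m, c

m, c = fit_line([1, 2, 3, 4], [3, 5, 7, 9])  # toy data lying exactly on y = 2x + 1
print(m, c)
```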
Logistic Regression : Target output is categorical. Uses the Sigmoid Curve to represent a probability (valued between 0 and 1). Vulnerable to outliers. sklearn.linear_model.LogisticRegression
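The sigmoid is what squeezes any real number into the 0 to 1 range. Here is a small sketch; the weight and intercept values are made up rather than fitted, and the two-branch form is just one common way to avoid overflow for large negative inputs.

```python
import math

def sigmoid(z):
    """Map any real number to the open interval (0, 1)."""
    if z >= 0:
        return 1 / (1 + math.exp(-z))
    e = math.exp(z)  # for negative z, exp(z) cannot overflow
    return e / (1 + e)

def predict_proba(x, weight, intercept):
    """Probability of the positive class for a single feature value x."""
    return sigmoid(weight * x + intercept)

# Hypothetical coefficients, chosen only for illustration
probability = predict_proba(2.0, weight=1.5, intercept=-1.0)
label = 1 if probability >= 0.5 else 0
print(probability, label)
```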
Loss : Measure of error. The loss function tells how bad the model is. Training minimizes the loss on the training data. Good loss functions are convex : any local minimum is also the global minimum.
Parameter : An internal configuration whose value can be estimated from data. Ex. coefficients derived in linear regression.
Pearson’s correlation coefficient : Measures linear association between two variables & ranges from -1 to +1. Bizarre things can be compared with correlation as no unit is associated with it.
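A small sketch showing why units do not matter: covariance changes when a variable is rescaled, while Pearson's coefficient does not. The function names and toy numbers are my own.

```python
import math

def covariance(xs, ys):
    """Sample covariance (divides by n - 1)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def pearson(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
xs_rescaled = [100 * x for x in xs]  # e.g. metres converted to centimetres

print(covariance(xs, ys), covariance(xs_rescaled, ys))  # covariance depends on units
print(pearson(xs, ys), pearson(xs_rescaled, ys))        # correlation does not
```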
Pipeline : Sequentially apply a list of transforms & a final estimator (sklearn.pipeline.make_pipeline())
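P-value : Used in hypothesis testing to decide whether to reject the null hypothesis. It is the probability of seeing data at least as extreme as what was observed if the null hypothesis were true; the smaller the p-value, the stronger the evidence against the null hypothesis.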
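A toy example, assuming H0 is "the coin is fair" and we observe 9 heads in 10 flips. The sketch computes an exact one-sided p-value from the binomial distribution; the function name and the 0.05 significance level are assumptions for illustration.

```python
from math import comb

def one_sided_p_value(heads, flips):
    """P(at least `heads` heads in `flips` flips) if the coin is fair."""
    favourable = sum(comb(flips, k) for k in range(heads, flips + 1))
    return favourable / 2 ** flips

p_value = one_sided_p_value(9, 10)  # 11 of the 1024 equally likely outcomes
alpha = 0.05                        # assumed significance level
reject_h0 = p_value < alpha
print(p_value, reject_h0)
```

Here the p-value is about 0.011, so at the 0.05 level we would reject H0. With 6 heads in 10 flips the same function gives about 0.377 and we would fail to reject.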
Pruning : Prevents overfitting in Decision Trees. A form of regularization of the model.
Precision : Out of all predicted positives (TP + FP), how many are actually +ve : TP / (TP + FP)
Mean, Mode, Median : Mean is the average of a data set, Mode is the number that occurs most often in a data set & Median is the middle value when a data set is ordered from least to greatest.
Recall : Out of all actual positives (TP + FN), how many were identified by the model : TP / (TP + FN)
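Precision and recall side by side, computed from raw counts. The counts below are invented for illustration; note the brackets in the denominators.

```python
def precision(tp, fp):
    """Share of predicted positives that are truly positive."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Share of actual positives that the model found."""
    return tp / (tp + fn)

# Hypothetical counts: 8 true positives, 2 false positives, 4 false negatives
p = precision(tp=8, fp=2)
r = recall(tp=8, fn=4)
print(p, r)
```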
Regularization : Way to reduce overfitting by adding a penalty term to the cost function, which gives an automated way to balance fitting the important features against overfitting the model. 2 standard types : L1 (Lasso in linear regression) & L2 (Ridge in linear regression)
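A minimal sketch of what the "penalty term" looks like for the two types. The weights, the strength alpha and the base loss value are made-up numbers, and real libraries may scale the penalty slightly differently.

```python
def l1_penalty(weights, alpha):
    """Lasso-style penalty: alpha times the sum of absolute weights."""
    return alpha * sum(abs(w) for w in weights)

def l2_penalty(weights, alpha):
    """Ridge-style penalty: alpha times the sum of squared weights."""
    return alpha * sum(w * w for w in weights)

weights = [0.5, -2.0, 0.0]  # hypothetical model coefficients
base_loss = 1.2             # hypothetical unregularized loss (e.g. MSE)
alpha = 0.1                 # assumed regularization strength

lasso_cost = base_loss + l1_penalty(weights, alpha)
ridge_cost = base_loss + l2_penalty(weights, alpha)
print(lasso_cost, ridge_cost)
```

Because the L2 penalty squares each weight, the large weight (-2.0) is punished much harder than the small one (0.5).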
Standard Deviation : Measures the spread of the data set around the mean.
Within 1 Std. Deviation of the mean : about 68% of the measurements of normally distributed data.
Within 2 Std. Deviations of the mean : about 95% (68%+27%) of the measurements of normally distributed data.
Within 3 Std. Deviations of the mean : about 99.7% (68%+27%+4.7%) of the measurements of normally distributed data.
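The percentages can be checked with the standard library's NormalDist. This is a quick sketch; the helper name share_within is my own.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

shares = {k: round(share_within(k), 4) for k in (1, 2, 3)}
print(shares)
```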
Sampling : Selecting a subset of data for training, testing & validation. Sampling could be random or stratified (every class keeps the same proportion as in the full data set). Factors such as seasonality, trends etc. should be looked at.
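A rough sketch of a stratified train/test split in plain Python: each class is shuffled and split separately, so the class proportions are kept in both parts. The function name, the 25% test fraction and the seed are my own choices for illustration.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_fraction=0.25, seed=0):
    """Return (train_indices, test_indices), preserving class proportions."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)  # fixed seed so the split is repeatable
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return train, test

labels = ["a"] * 8 + ["b"] * 4  # toy data set with a 2:1 class ratio
train, test = stratified_split(labels)
print(sorted(labels[i] for i in test))
```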
Skewness : Lack of symmetry in the data. (+) skewness typically means mode < median < mean & (-) skewness typically means mode > median > mean
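A quick check of the ordering on a small right-skewed (positively skewed) toy data set, using the statistics module. The numbers are invented, and the ordering is a rule of thumb rather than a guarantee for every data set.

```python
import statistics

# Toy data with a long tail to the right
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 12]

mode = statistics.mode(right_skewed)      # most frequent value
median = statistics.median(right_skewed)  # middle value of the ordered data
mean = statistics.fmean(right_skewed)     # arithmetic average

print(mode, median, mean)
```

The few large values (9 and 12) pull the mean to the right of the median, while the mode stays at the peak.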
Supervised Learning : Models learn from training data that has been labeled i.e. the training dataset provides the correct label ( y ) against the listed features. Label quality will have a direct impact on model prediction. Linear Discriminant Analysis (LDA) is a supervised approach to feature extraction.
Supervised Learning Categories : Regression & Classification
SMOTE : Synthetic Minority Over-sampling Technique. Creates synthetic data points for the minority class to address imbalanced data sets.
Type I Error : (False +) Rejecting the null hypothesis when it is true.
Type II Error : (False -) Failing to reject the null hypothesis when it is false.
Underfitting : Failure to capture important trends in the training data. It goes with high bias & leads to poor performance in production.
Unsupervised Learning : Models learn from data that has not been labeled. Must find patterns/clusters in data to perform downstream analysis using for ex. Principal Component Analysis (PCA), Clustering (k-means)
Variance: An error from sensitivity to small variations in the training data. High variance can cause an algorithm to model random noise in the training set, resulting in overfitting.
Z-score : Statistical measurement of a score’s relationship to the mean in a group of scores. Tells how many standard deviations a data point is away from the mean.
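A short sketch of the calculation: z = (value - mean) / standard deviation. I assume the population standard deviation here (statistics.pstdev); use statistics.stdev instead if the data is a sample. The function name and toy scores are my own.

```python
import statistics

def z_scores(values):
    """Number of standard deviations each value lies from the mean."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population standard deviation (assumption)
    return [(v - mean) / sd for v in values]

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # toy data: mean 5, standard deviation 2
z = z_scores(scores)
print(z)
```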
This is all for now. Please let me know your views/corrections/suggestions.