1. Regularization
1.1. L1 norm/Manhattan distance
1.2. L2 norm/Euclidean distance
1.3. Early stopping
1.4. Dropout
1.5. Data augmentation
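The penalty-based regularizers above (1.1, 1.2) can be sketched with scikit-learn; the synthetic data and alpha values here are illustrative assumptions. Dropout and early stopping are training-time techniques configured in the network framework instead.

```python
# L1 (Lasso) vs. L2 (Ridge) regularization in scikit-learn.
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=100)  # only feature 0 matters

lasso = Lasso(alpha=0.1).fit(X, y)  # L1 penalty: drives many weights to exactly 0
ridge = Ridge(alpha=0.1).fit(X, y)  # L2 penalty: shrinks all weights toward 0

print("nonzero weights, L1:", np.count_nonzero(lasso.coef_))
print("nonzero weights, L2:", np.count_nonzero(ridge.coef_))
```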
2. Data processing (https://github.com/dformoso/machine-learning-mindmap)
2.1. Data types
2.2. Data exploration
2.3. Feature cleaning
2.4. Feature imputing
2.4.1. Hot-deck
2.4.2. Cold-deck
2.4.3. Mean substitution
2.4.4. Regression
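A minimal sketch of mean substitution (2.4.3) with scikit-learn's SimpleImputer; the toy matrix is made up. Regression-based imputation (2.4.4) follows the same pattern with the experimental IterativeImputer.

```python
# Replace missing values (NaN) with the mean of each column.
import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[1.0, 2.0],
              [np.nan, 3.0],
              [7.0, np.nan]])

imputer = SimpleImputer(strategy="mean")
print(imputer.fit_transform(X))  # NaNs become the column means, 4.0 and 2.5
```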
2.5. Feature engineering
2.5.1. Decompose
2.5.2. Crossing
2.6. Feature selection
2.6.1. Correlation
2.6.2. Dimensionality reduction
2.6.3. Importance
2.7. Feature encoding
2.8. Feature normalization/scaling
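A sketch of feature encoding (2.7) and normalization/scaling (2.8) with scikit-learn; the column contents are illustrative assumptions.

```python
# One-hot encode a categorical column, standardize a numeric column.
import numpy as np
from sklearn.preprocessing import OneHotEncoder, StandardScaler

colors = np.array([["red"], ["green"], ["red"]])
heights = np.array([[150.0], [160.0], [170.0]])

encoded = OneHotEncoder().fit_transform(colors).toarray()  # one column per category
scaled = StandardScaler().fit_transform(heights)           # zero mean, unit variance
print(encoded)
print(scaled.ravel())
```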
2.9. Dataset construction
2.9.1. Training dataset
2.9.2. Test dataset
2.9.3. Validation dataset
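One common way to carve the three datasets above out of a single dataset, sketched with scikit-learn; the 60/20/20 split ratio is an illustrative assumption.

```python
# Split data into train (60%), validation (20%), and test (20%) sets.
import numpy as np
from sklearn.model_selection import train_test_split

X, y = np.arange(100).reshape(-1, 1), np.arange(100)

X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)
print(len(X_train), len(X_val), len(X_test))  # 60 20 20
```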
3. Performance
3.1. Loss
3.1.1. Cross-Entropy
3.1.2. Logistic
3.1.3. Quadratic
3.1.4. 0-1 loss
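The losses above written out in NumPy for a single prediction (reading 3.1.4 as the 0-1 misclassification loss); the target and predicted distributions are made-up examples.

```python
import numpy as np

y = np.array([0.0, 1.0, 0.0])  # one-hot target
p = np.array([0.1, 0.7, 0.2])  # predicted probabilities

cross_entropy = -np.sum(y * np.log(p))                  # 3.1.1
quadratic = np.sum((y - p) ** 2)                        # 3.1.3 (squared error)
zero_one = float(np.argmax(p) != np.argmax(y))          # 3.1.4: 1 if misclassified, else 0

t, s = 1.0, 0.8                                         # binary label, sigmoid output
logistic = -(t * np.log(s) + (1 - t) * np.log(1 - s))   # 3.1.2
print(cross_entropy, quadratic, zero_one, logistic)
```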
3.2. Metrics
3.2.1. Classification
3.2.1.1. Accuracy
3.2.1.2. Precision
3.2.1.2.1. Precision = TP / (TP + FP)
3.2.1.3. Recall
3.2.1.3.1. Recall = TP / (TP + FN)
3.2.1.3.2. Sensitivity
3.2.1.4. True negative rate
3.2.1.4.1. TNR = TN / (TN + FP)
3.2.1.4.2. Specificity
3.2.1.5. F1-score
3.2.1.5.1. Harmonic mean of precision and recall
3.2.1.6. ROC
3.2.1.6.1. For binary classifiers
3.2.1.6.2. Plots true positive rate vs. false positive rate
3.2.1.6.3. Across varying decision thresholds
3.2.1.7. AUC
3.2.1.7.1. Area under the ROC curve
3.2.1.7.2. Between 0 and 1 (0.5 corresponds to random guessing)
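The classification metrics above (3.2.1), computed with scikit-learn on a tiny made-up label set.

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]                    # hard class predictions
y_score = [0.9, 0.2, 0.4, 0.8, 0.1, 0.6, 0.7, 0.3]   # scores for the ROC/AUC

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))  # TP / (TP + FP)
print("recall   :", recall_score(y_true, y_pred))     # TP / (TP + FN)
print("f1       :", f1_score(y_true, y_pred))         # harmonic mean of the two
print("auc      :", roc_auc_score(y_true, y_score))   # area under the ROC curve
```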
3.2.2. Regression
3.2.2.1. Mean squared error
3.2.2.2. Mean absolute error
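Both regression metrics in plain NumPy, on made-up values.

```python
import numpy as np

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

mse = np.mean((y_true - y_pred) ** 2)   # mean squared error
mae = np.mean(np.abs(y_true - y_pred))  # mean absolute error
print(mse, mae)  # 0.375 0.5
```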
3.2.3. Ranking
3.2.3.1. MRR
3.2.3.2. DCG
3.2.3.3. NDCG
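A NumPy sketch of DCG and NDCG for a single ranked result list; the relevance grades are illustrative. MRR is simply the average, over queries, of 1/rank of the first relevant result.

```python
import numpy as np

def dcg(relevances):
    """Discounted cumulative gain: graded relevance discounted by log2 of rank."""
    relevances = np.asarray(relevances, dtype=float)
    ranks = np.arange(1, len(relevances) + 1)
    return np.sum(relevances / np.log2(ranks + 1))

ranked = [3, 2, 3, 0, 1]              # relevance grades in ranked order
ideal = sorted(ranked, reverse=True)  # best possible ordering of the same grades
print(dcg(ranked), dcg(ranked) / dcg(ideal))  # DCG and NDCG
```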
3.2.4. Statistical
3.2.4.1. Correlation
3.2.5. Computer vision
3.2.6. NLP
3.2.7. Deep learning
3.2.7.1. Inception score
3.2.7.2. Fréchet Inception Distance
4. Statistics
4.1. Bayesian statistics
4.2. Hypothesis testing
4.2.1. Significance level
4.2.1.1. The significance level (α) is the threshold probability of rejecting the null hypothesis when it is actually true, i.e. the accepted Type I error rate
4.2.2. P-value
4.2.2.1. The p-value is the probability of observing a result at least as extreme as the one measured, assuming the null hypothesis is true
4.2.3. Population, sample, estimator
4.2.4. Probability density distribution
4.2.4.1. The probability density function (PDF) specifies the probability of a continuous random variable falling within a particular range of values
4.2.5. Central limit theorem
4.2.5.1. The Central Limit Theorem states that, as the sample size grows, the distribution of the sample mean of independent, identically distributed random variables approaches a normal distribution, regardless of the underlying distribution
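A quick numerical illustration of the theorem: means of samples drawn from a skewed exponential distribution are themselves approximately normally distributed.

```python
import numpy as np

rng = np.random.default_rng(0)
# 10,000 sample means, each over 50 draws from an exponential distribution.
sample_means = rng.exponential(scale=1.0, size=(10_000, 50)).mean(axis=1)

print(sample_means.mean())  # close to the population mean, 1.0
print(sample_means.std())   # close to 1.0 / sqrt(50) ~= 0.141
```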
4.3. Statistical tests
4.3.1. Difference?
4.3.1.1. comparisons of
4.3.1.1.1. means
4.3.1.1.2. variance
4.3.2. Relationship?
4.3.2.1. Independent variable?
4.3.2.1.1. yes
4.3.2.1.2. no
4.3.3. Categorical
4.3.3.1. Chi-Square
4.3.3.1.1. Chi-square tests check if distributions of categorical variables differ from each other
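A chi-square independence test sketched with SciPy on a made-up 2x2 contingency table.

```python
from scipy.stats import chi2_contingency

table = [[30, 10],   # group A: outcome yes / no
         [20, 25]]   # group B: outcome yes / no

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2={chi2:.2f}, p={p:.4f}, dof={dof}")  # small p: distributions differ
```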
4.4. Gaussian process
4.4.1. Gives uncertainty estimates along with predictions
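A minimal scikit-learn sketch: a Gaussian process regressor returns a standard deviation alongside each prediction, which is the uncertainty estimate mentioned above. The data and default kernel are illustrative.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

X = np.array([[1.0], [3.0], [5.0], [6.0]])
y = np.sin(X).ravel()

gp = GaussianProcessRegressor().fit(X, y)
mean, std = gp.predict(np.array([[2.0], [4.0]]), return_std=True)
print(mean, std)  # std grows where training data is sparse
```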
4.5. Descriptive statistics
4.5.1. Central tendency
4.5.1.1. mean
4.5.1.2. median
4.5.2. Spread
4.5.2.1. standard deviation
4.5.3. Percentiles
4.5.4. Skewness
4.5.5. Covariance & Correlation
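The descriptive statistics above, computed on a toy sample with NumPy and SciPy.

```python
import numpy as np
from scipy.stats import skew

x = np.array([2.0, 3.0, 3.0, 5.0, 9.0])
y = np.array([1.0, 2.0, 2.5, 4.0, 8.0])

print("mean:", x.mean(), "median:", np.median(x))  # central tendency
print("std:", x.std(ddof=1))                       # spread
print("90th percentile:", np.percentile(x, 90))
print("skewness:", skew(x))                        # asymmetry of the distribution
print("cov:", np.cov(x, y)[0, 1], "corr:", np.corrcoef(x, y)[0, 1])
```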
4.6. Parametric assumptions
4.6.1. Independent, unbiased samples
4.6.2. Normally distributed
4.6.3. Equal variance
5. Hyperparameter tuning
5.1. Bayesian optimization
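A sketch of Bayesian optimization using the third-party scikit-optimize package (an assumption, not implied by the outline); the toy objective stands in for "train a model with these hyperparameters and return its validation loss".

```python
from skopt import gp_minimize  # pip install scikit-optimize

def objective(params):
    (learning_rate,) = params
    # Stand-in for: train with this learning rate, return validation loss.
    return (learning_rate - 0.01) ** 2

# A Gaussian process models the objective and proposes promising points to try next.
result = gp_minimize(objective, dimensions=[(1e-4, 1.0)], n_calls=20, random_state=0)
print(result.x, result.fun)  # best hyperparameters and best loss found
```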
6. Backpropagation
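Backpropagation written out by hand for a one-hidden-layer network with a sigmoid hidden layer and squared-error loss; all shapes, data, and the learning rate are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))  # 8 samples, 3 features
y = rng.normal(size=(8, 1))
W1 = rng.normal(size=(3, 4))
W2 = rng.normal(size=(4, 1))

for _ in range(200):
    # Forward pass.
    h = 1.0 / (1.0 + np.exp(-(X @ W1)))   # sigmoid hidden layer
    y_hat = h @ W2                        # linear output layer
    # Backward pass: apply the chain rule from the loss to each weight matrix.
    d_out = 2.0 * (y_hat - y) / len(X)    # d(MSE)/d(y_hat)
    dW2 = h.T @ d_out
    d_h = (d_out @ W2.T) * h * (1.0 - h)  # through the sigmoid derivative
    dW1 = X.T @ d_h
    W1 -= 0.1 * dW1                       # gradient descent update
    W2 -= 0.1 * dW2

h = 1.0 / (1.0 + np.exp(-(X @ W1)))
print("final MSE:", np.mean((h @ W2 - y) ** 2))
```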
7. Activation functions
7.1. ReLU
7.2. Sigmoid/logistic
7.3. Binary
7.4. Tanh
7.5. Softmax
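The activation functions above in plain NumPy.

```python
import numpy as np

def relu(x):    return np.maximum(0.0, x)
def sigmoid(x): return 1.0 / (1.0 + np.exp(-x))
def binary(x):  return (x > 0).astype(float)  # step function
def softmax(x):
    e = np.exp(x - np.max(x))  # subtract the max for numerical stability
    return e / e.sum()

x = np.array([-1.0, 0.0, 2.0])
print(relu(x), sigmoid(x), np.tanh(x), binary(x), softmax(x))
```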
8. Neural network
8.1. Types
8.1.1. CNN
8.1.1.1. Visualize layers
8.1.2. RNN
8.1.2.1. Recurrent units
8.1.2.1.1. GRU
8.1.2.1.2. LSTM
8.1.2.1.3. Nested LSTM
8.1.3. NLP
8.1.3.1. Attention
8.1.3.2. Term-document matrix
8.1.3.3. Topic modelling
8.1.3.3.1. Matrix decomposition techniques
8.1.3.4. Matrix factorizations
8.1.3.4.1. Singular Value Decomposition (SVD)
8.1.3.4.2. Non-negative Matrix Factorization (NMF)
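Topic modelling as matrix factorization, sketched with scikit-learn: build a term-document matrix and decompose it with NMF. The four-document corpus is made up.

```python
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import CountVectorizer

docs = ["cats and dogs", "dogs chase cats", "stocks and bonds", "bonds rise"]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)         # document x term count matrix
nmf = NMF(n_components=2, random_state=0)  # factor into 2 "topics"
doc_topics = nmf.fit_transform(X)          # document x topic weights
print(vectorizer.get_feature_names_out())
print(doc_topics.round(2))                 # pet vs. finance documents separate
```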
8.1.3.5. Pre-processing
8.1.3.5.1. Removing stop words
8.1.3.5.2. Getting the root of a word (stemming, lemmatization)
8.1.3.5.3. Normalization of term counts
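A sketch of those three pre-processing steps: stop-word removal, stemming to word roots (NLTK's PorterStemmer, assumed installed), and TF-IDF normalization of term counts via scikit-learn.

```python
from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS, TfidfVectorizer

text = "the cats are chasing the dogs"
stemmer = PorterStemmer()
tokens = [stemmer.stem(w) for w in text.split() if w not in ENGLISH_STOP_WORDS]
print(tokens)  # stop words dropped, remaining words reduced to their roots

tfidf = TfidfVectorizer().fit_transform(["cats chase dogs", "dogs sleep"])
print(tfidf.toarray().round(2))  # raw term counts normalized by TF-IDF
```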
8.2. Speech recognition
8.2.1. Connectionist Temporal Classification
8.2.1.1. Align text to audio
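A minimal PyTorch sketch of the CTC (Connectionist Temporal Classification) loss, which learns the alignment between a long audio-frame sequence and a shorter label sequence; every size here is an illustrative assumption.

```python
import torch
import torch.nn as nn

T, N, C, S = 50, 2, 20, 10  # frames, batch size, classes (incl. blank), target length
log_probs = torch.randn(T, N, C).log_softmax(dim=2)  # per-frame class log-probs
targets = torch.randint(1, C, (N, S))                # class 0 is reserved as blank
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), S, dtype=torch.long)

ctc = nn.CTCLoss(blank=0)
print(ctc(log_probs, targets, input_lengths, target_lengths).item())
```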
9. Monte Carlo tree search
10. Methods
10.1. Tree-based
10.1.1. XGBoost
10.2. Regression
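A minimal gradient-boosted-tree classifier via XGBoost's scikit-learn-style API (the xgboost package is assumed installed); the data is synthetic.

```python
import numpy as np
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # synthetic binary target

model = XGBClassifier(n_estimators=50, max_depth=3)
model.fit(X, y)
print(model.predict(X[:5]))
```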
11. Gradient descent
11.1. Stochastic gradient descent
11.1.1. SGD with restarts
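A sketch of gradient descent with an SGDR-style cosine learning-rate schedule with warm restarts (Loshchilov & Hutter); plain full-batch descent on a 1-D quadratic stands in for SGD here, and all constants are illustrative.

```python
import numpy as np

def cosine_restart_lr(step, period=20, lr_max=0.3, lr_min=0.01):
    """Cosine-annealed learning rate that restarts every `period` steps."""
    t = step % period
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + np.cos(np.pi * t / period))

x = 10.0  # minimize f(x) = x**2, so grad f(x) = 2x
for step in range(100):
    x -= cosine_restart_lr(step) * 2.0 * x
print(x)  # close to 0
```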