VIDEO SEGMENTS BY TOPIC

Here is the map of machine learning

Index of Topics

-- Use your browser's "Find" to look for keywords below --

Aggregation

Overview of ensemble learning (boosting, blending, before and after the fact)

Bayesian Learning

Validity of the Bayesian approach (prior, posterior, unknown versus probabilistic)

Bias-Variance Tradeoff

Basic derivation (overfit and underfit, approximation-generalization tradeoff)
Example (sinusoidal target function)
Noisy case (bias-variance-noise decomposition)

Bin Model

Hoeffding Inequality (law of large numbers, sample, PAC)
Relation to learning (from bin to hypothesis, training data)
Multiple bins (finite hypothesis set, learning: search for green sample)
Union Bound (uniform inequality, M factor)

Data Snooping

Definition and analysis (data contamination, model selection)

Error Measures

User-specified error function (pointwise error, CIA, supermarket)

Gradient Descent

Basic method (Batch GD) (first-order optimization)
Discussion (initialization, termination, local minima, second-order methods)
Stochastic Gradient Descent (the algorithm, SGD in action)
Initialization - Neural Networks (random weights, perfect symmetry)

Learning Curves

Definition and illustration (complex models versus simple models)
Linear Regression example (learning curves for noisy linear target)

Learning Diagram

Components of learning (target function, hypothesis set, learning algorithm)
Input probability distribution (unknown distribution, bin, Hoeffding)
Error measure (role in learning algorithm)
Noisy targets (target distribution)
Where the VC analysis fits (affected blocks in learning diagram)

Learning Paradigms

Types of learning (supervised, reinforcement, unsupervised, clustering)
Other paradigms (review, active learning, online learning)

Linear Classification

The Perceptron (linearly separable data, PLA)
Pocket algorithm (non-separable data, comparison with PLA)

Linear Regression

The algorithm (real-valued function, mean-squared error, pseudo-inverse)
Generalization behavior (learning curves for linear regression)

Logistic Regression

The model (soft threshold, sigmoid, probability estimation)
Cross entropy error (maximum likelihood)
The algorithm (gradient descent)

Netflix Competition

Movie rating (singular value decomposition, essence of machine learning)
Applying SGD (stochastic gradient descent, SVD factors)

Neural Networks

Biological inspiration (limits of inspiration)
Multilayer perceptrons (the model and its power and limitations)
Neural Network model (feedforward layers, soft threshold)
Backpropagation algorithm (SGD, delta rule)
Hidden layers (interpretation)
Regularization (weight decay, weight elimination, early stopping)

Nonlinear Transformation

Basic method (linearity in the parameters, Z space)
Illustration (non-separable data, quadratic transform)
Generalization behavior (VC dimension of a nonlinear transform)

Occam's Razor

Definition and analysis (definition of complexity, why simpler is better)

Overfitting

The phenomenon (fitting the noise)
A detailed experiment (Legendre polynomials, types of noise)
Deterministic noise (target complexity, stochastic noise)

Radial Basis Functions

Basic RBF model (exact interpolation, nearest neighbor)
K Centers (Lloyd's algorithm, unsupervised learning, pseudo-inverse)
RBF network (neural networks, local versus global, EM algorithm)
Relation to other techniques (SVM kernel, regularization)

Regularization

Introduction (putting the brakes, function approximation)
Formal derivation (Legendre polynomials, soft-order constraint, augmented error)
Weight decay (Tikhonov, smoothness, neural networks)
Augmented error (proxy for out-of-sample error, choosing a regularizer)
Regularization parameter (deterministic noise, stochastic noise)

Sampling Bias

Definition and analysis (Truman versus Dewey, matching the distributions)

Support Vector Machines

SVM basic model (hard margin, constrained optimization)
The solution (KKT conditions, Lagrange, dual problem, quadratic programming)
Soft margin (non-separable data, slack variables)
Nonlinear transform (Z space, support vector pre-images)
Kernel methods (generalized inner product, Mercer's condition, RBF kernel)

Validation

Introduction (validation versus regularization, optimistic bias)
Model selection (data contamination, validation set versus test set)
Cross Validation (leave-one-out, 10-fold cross validation)

VC Dimension

Growth function (dichotomies, Hoeffding Inequality)
Examples (growth function for simple hypothesis sets)
Break points (polynomial growth functions)
Bounding the growth function (mathematical induction, polynomial bound)
Definition of VC Dimension (shattering, distribution-free, Vapnik-Chervonenkis)
VC Dimension of Perceptrons (number of parameters, lower and upper bounds)
Interpreting the VC Dimension (degrees of freedom, number of examples)

The 18 lectures are about 60 minutes each plus Q&A. The content of each lecture is color coded:

theory; mathematical
technique; practical
analysis; conceptual

Lecture 1: The Learning Problem
Lecture 2: Is Learning Feasible?
Lecture 3: The Linear Model I
Lecture 4: Error and Noise
Lecture 5: Training versus Testing
Lecture 6: Theory of Generalization
Lecture 7: The VC Dimension
Lecture 8: Bias-Variance Tradeoff
Lecture 9: The Linear Model II
Lecture 10: Neural Networks
Lecture 11: Overfitting
Lecture 12: Regularization
Lecture 13: Validation
Lecture 14: Support Vector Machines
Lecture 15: Kernel Methods
Lecture 16: Radial Basis Functions
Lecture 17: Three Learning Principles
Lecture 18: Epilogue

You can also look for a particular topic within the lectures in the Machine Learning Video Library.

Lecture 1 (The Learning Problem)
Lecture (some audio drops, sorry!) - Q&A - Slides

Lecture 2 (Is Learning Feasible?)
Review - Lecture - Q&A - Slides

Lecture 3 (The Linear Model I)
Review - Lecture - Q&A - Slides

Lecture 4 (Error and Noise)
Review - Lecture - Q&A - Slides

Lecture 5 (Training versus Testing)
Review - Lecture - Q&A - Slides

Lecture 6 (Theory of Generalization)
Review - Lecture - Q&A - Slides

Lecture 7 (The VC Dimension)
Review - Lecture - Q&A - Slides

Lecture 8 (Bias-Variance Tradeoff)
Review - Lecture - Q&A - Slides

Lecture 9 (The Linear Model II)
Review - Lecture - Q&A - Slides

Lecture 10 (Neural Networks)
Review - Lecture - Q&A - Slides

Lecture 11 (Overfitting)
Review - Lecture - Q&A - Slides

Lecture 12 (Regularization)
Review - Lecture - Q&A - Slides

Lecture 13 (Validation)
Review - Lecture - Q&A - Slides

Lecture 14 (Support Vector Machines)
Review - Lecture - Q&A - Slides

Lecture 15 (Kernel Methods)
Review - Lecture - Q&A - Slides

Lecture 16 (Radial Basis Functions)
Review - Lecture - Q&A - Slides

Lecture 17 (Three Learning Principles)
Review - Lecture - Q&A - Slides

Lecture 18 (Epilogue)
Review - Lecture - Acknowledgment - Slides

TEXTBOOK

The recommended textbook covers 14 of the 18 lectures. The remaining lectures are covered by online material that is freely available to readers of the book.

Here is the book's table of contents, and here is the notation used in the course and the book.

Big Insights with Big Data

Monday, November 24, 2014

Very useful site for learning machine learning
