Machine Learning and Artificial Intelligence Video Lectures
I found some great Machine Learning and Artificial Intelligence (AI)
video lecture courses recently and I will share them with you this
month.
Here are lectures from Machine Learning Summer School 2003, 2005 and
2006.
Machine Learning is a foundational discipline of the Information
Sciences. It combines deep theory from areas as diverse as Statistics,
Mathematics, Engineering, and Information Technology with many
practical and relevant real life applications.
Some Mathematical Tools for Machine Learning
* Video Lectures (by Chris Burges from Microsoft Research)
Lectures contain:
1. Lagrange multipliers: an indirect approach
can be easier; Multiple Equality Constraints; Multiple Inequality
Constraints; Two points on a d-sphere; The Largest Parallelogram;
Resource allocation; A convex combination of numbers is maximized by
choosing the largest; The Isoperimetric problem; For fixed mean and
variance, which univariate distribution has maximum entropy? An exact
solution for an SVM living on a simplex
2. Notes on some Basic Statistics; Probabilities can be
Counter-Intuitive (Simpson's paradox; the Monty Hall puzzle);
IID-ness: Measurement Error decreases as 1/sqrt(n); Correlation versus
Independence; The Ubiquitous Gaussian: Product of Gaussians is
Gaussian; Convolution of two Gaussians is a Gaussian; Projection of a
Gaussian is a Gaussian; Sum of Gaussian random variables is a Gaussian
random variable; Uncorrelated Gaussian variables are also
independent; Maximum Likelihood Estimates for mean and covariance
(prove required matrix identities); Aside: For 1-dim Laplacian, max.
likelihood gives the median; Using cumulative distributions to derive
densities
3. Principal Component Analysis and Generalizations: Ordering by
Variance; Does Grouping Change Things? PCA Decorrelates the Samples;
PCA gives Reconstruction with Minimal Mean Squared Error; PCA
preserves Mutual Information on Gaussian data; PCA directions lie in
the span of the data; PCA: second order moments only; The Generalized
Rayleigh Quotient; Non-orthogonal principal directions; OPCA; Fisher
Linear Discriminant; Multiple Discriminant Analysis
4. Elements of Functional Analysis: High Dimensional Spaces; Is
Winning Transitive?; Most of the Volume is Near the Surface: Cubes;
Spheres in n-dimensions; Banach Spaces, Hilbert Spaces, Compactness;
Norms; Useful Inequalities (Minkowski and Hölder); Vector Norms;
Matrix Norms; The Hamming Norm; L1, L2, L_infinity norms - is L0 a norm?
Example: Using a Norm as a Constraint in Kernel Algorithms
These are lectures on some fundamental mathematics underlying many
approaches and algorithms in machine learning. They are not about
particular learning algorithms; they are about the basic concepts and
tools upon which such algorithms are built. Often students feel
intimidated by such material: there is a vast amount of "classical
mathematics", and it can be hard to find the wood for the trees. The
main topics of these lectures are Lagrange multipliers, functional
analysis, some notes on matrix analysis, and convex optimization. I've
concentrated on things that are often not dwelt on in typical CS
coursework. Lots of examples are given; if it's green, it's a puzzle
for the student to think about. These lectures are far from complete:
perhaps the most significant omissions are probability theory,
statistics for learning, information theory, and graph theory. I hope
eventually to turn all this into a series of short tutorials.
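To make one item from the statistics notes above concrete, here is a minimal Python sketch (my own illustration, not taken from the lectures) of the claim that measurement error decreases as 1/sqrt(n). It assumes IID standard normal measurements; the function name std_of_sample_mean, the seed, and the trial count are hypothetical choices.

```python
import random
import statistics


def std_of_sample_mean(n, trials=2000, seed=0):
    """Empirical standard deviation of the mean of n IID N(0, 1) draws."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(0.0, 1.0) for _ in range(n))
        for _ in range(trials)
    ]
    return statistics.pstdev(means)


# Quadrupling n should roughly halve the error of the averaged measurement.
err_100 = std_of_sample_mean(100)
err_400 = std_of_sample_mean(400)
ratio = err_100 / err_400
print(f"n=100: {err_100:.4f}  n=400: {err_400:.4f}  ratio: {ratio:.2f}")
```

In theory the two errors are 1/sqrt(100) = 0.1 and 1/sqrt(400) = 0.05, so the printed ratio should land close to 2, up to simulation noise.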
Introduction to Learning Theory
* Video Lectures (by Olivier Bousquet from Max Planck Institute for
Biological Cybernetics)
Description:
The goal of this course is to introduce the key concepts of learning
theory. It will not be restricted to Statistical Learning Theory but
will mainly focus on statistical aspects. Instead of giving detailed
proofs and precise statements, this course will aim at providing some
conceptual tools and ideas useful for practitioners as well as
for theoretically-driven people.
An Introduction to Pattern Classification
* Video Lectures (by Elad Yom Tov from Technion)
Lectures contain:
Pattern classification algorithms, classification procedures,
supervised learning, unsupervised learning, classifier and
preprocessing algorithms, errors, classifier and computational
complexity, dimensionality reduction, approaches for dimensionality
reduction: feature reduction, feature selection; genetic programming,
whitening transform, nearest neighbor editing algorithm, Voronoi
diagram, clusters, clustering techniques: agglomerative, partitional,
minimum spanning tree, AGHC, Kohonen maps, k-means, fuzzy k-means,
competitive learning; Bayes rule, heuristic algorithms, tree-based
algorithms, optimization algorithms, neural networks, training
methods, perceptrons, radial-basis function networks, support-vector
machines, error estimation methods.
Statistical Learning Theory
* Video Lectures (by Olivier Bousquet from Max Planck Institute for
Biological Cybernetics)
Description:
This course will give a detailed introduction to learning theory with
a focus on the classification problem. It will be shown how to obtain
(probabilistic) bounds on the generalization error for certain types of
algorithms. The main themes will be:
* probabilistic inequalities and concentration inequalities
* union bounds and chaining
* measuring the size of a function class
* Vapnik-Chervonenkis dimension
* shattering dimension and Rademacher averages
* classification with real-valued functions
Some knowledge of probability theory would be helpful but not required
since the main tools will be introduced.
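As a small worked illustration of how the first two themes fit together (my own addition, not part of the course abstract), consider the textbook case of a finite function class F, a loss bounded in [0, 1], and n IID samples. R(f) denotes the true risk and R_n(f) the empirical risk; these symbols are my notation, not the lecturer's.

```latex
% Hoeffding's inequality for one fixed f:
\[
  P\bigl( R(f) - R_n(f) > \varepsilon \bigr) \le \exp(-2 n \varepsilon^2).
\]
% Union bound over the |F| functions in the class:
\[
  P\bigl( \exists f \in F : R(f) - R_n(f) > \varepsilon \bigr)
  \le |F| \exp(-2 n \varepsilon^2).
\]
% Setting the right-hand side to \delta and solving for \varepsilon:
% with probability at least 1 - \delta, for all f in F,
\[
  R(f) \le R_n(f) + \sqrt{\frac{\ln |F| + \ln(1/\delta)}{2n}}.
\]
```

The ln |F| term is the simplest way of measuring the size of a function class; the VC dimension and Rademacher averages listed above play a similar role for infinite classes.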
Stochastic Learning
* Video Lectures (by Léon Bottou from NEC Research)
These lectures contain:
Early learning systems, recursive adaptive algorithms, risks, batch
gradient descent, stochastic gradient descent, non-differentiable loss
functions, Rosenblatt's perceptrons, k-means, vector quantization,
stochastic noise, multilayer networks.
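The difference between batch and stochastic gradient descent from the list above can be sketched in a few lines of Python. This is my own toy example, not code from the lectures: it assumes a one-dimensional least-squares problem with noiseless data, and the function names, learning rate, and epoch count are hypothetical choices.

```python
import random


def batch_gd(xs, ys, lr=0.1, epochs=50):
    """One parameter update per pass, using the gradient averaged over all data."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errs = [(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errs, xs)) / n
        grad_b = sum(errs) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def sgd(xs, ys, lr=0.1, epochs=50, seed=0):
    """One parameter update per example, visiting examples in random order."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            err = (w * xs[i] + b) - ys[i]
            w -= lr * err * xs[i]
            b -= lr * err
    return w, b


# Noiseless toy data generated from y = 2x + 1 (an assumption of this sketch).
xs = [i / 10 for i in range(-10, 11)]
ys = [2.0 * x + 1.0 for x in xs]

w_batch, b_batch = batch_gd(xs, ys)
w_sgd, b_sgd = sgd(xs, ys)
print("batch:", w_batch, b_batch)
print("sgd:  ", w_sgd, b_sgd)
```

With the same number of passes over the data, the stochastic version has made 21 times as many updates, which is why it ends up closer to the true slope here. On noisy data the picture is less one-sided, since the stochastic updates keep fluctuating around the minimum unless the learning rate is decreased.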
Bayesian Learning
* Video Lectures (by Zoubin Ghahramani from University College
London)
Description of video course:
Bayes Rule provides a simple and powerful framework for machine
learning. This tutorial will be organised as follows:
1. The lecturer will give motivation for the Bayesian framework from the
point of view of rational coherent inference, and highlight the
important role of the marginal likelihood in Bayesian Occam's Razor.
2. He will discuss the question of how one should choose a sensible
prior. When Bayesian methods fail it is often because no thought has
gone into choosing a reasonable prior.
3. Bayesian inference usually involves solving high dimensional
integrals and sums. He will give an overview of numerical
approximation techniques (e.g. Laplace, BIC, variational bounds, MCMC,
EP...).
4. Mr. Ghahramani will talk about more recent work in non-parametric
Bayesian inference such as Gaussian processes (i.e. Bayesian kernel
"machines"), Dirichlet process mixtures, etc.
Learning on Structured Data
* Video Lectures (by Yasemin Altun from TTI)
Lectures description:
The discriminative learning framework is one of the most successful areas
of machine learning. The methods of this paradigm, such as Boosting
and Support Vector Machines, have significantly advanced the
state-of-the-art for classification by improving the accuracy and by
increasing the applicability of machine learning methods. One of the
key benefits of these methods is their ability to learn efficiently in
high dimensional feature spaces, either by the use of implicit data
representations via kernels or by explicit feature induction. However,
traditionally these methods do not exploit dependencies between class
labels where more than one label is predicted. Many real-world
classification problems involve sequential, temporal or structural
dependencies between multiple labels. We will investigate recent
research on generalizing discriminative methods to learning in
structured domains. These techniques combine the efficiency of dynamic
programming methods with the advantages of the state-of-the-art
learning methods.
Information Retrieval and Text Mining
* Video Lectures (by Thomas Hofmann from Brown University)
Description:
This four hour course will provide an overview of applications of
machine learning and statistics to problems in information retrieval
and text mining. More specifically, it will cover tasks like document
categorization, concept-based information retrieval,
question-answering, topic detection and document clustering,
information extraction, and recommender systems. The emphasis is on
showing how machine learning techniques can help to automatically
organize content and to provide efficient access to information in
textual form.
Foundations of Learning
* Video Lectures (by Steve Smale from University of California)
An introduction to grammars and parsing
* Video Lecture (by Mark Johnson from Brown Laboratory for
Linguistic Information Processing)
Video Lecture contains:
Computational linguistics, syntactic and semantic structure,
context-free grammars and their derivations, probabilistic CFGs (PCFGs),
dynamic programming, expectation maximization, the EM algorithm for
PCFGs, top-down parsing, bottom-up parsing, left-corner parsing.
Information Geometry
* Video Lectures (by Sanjoy Dasgupta from University of California)
Description:
This tutorial will focus on entropy, exponential families, and
information projection. We'll start by seeing the sense in which
entropy is the only reasonable definition of randomness. We will then
use entropy to motivate exponential families of distributions -- which
include the ubiquitous Gaussian, Poisson, and Binomial distributions,
but also very general graphical models. The task of fitting such a
distribution to data is a convex optimization problem with a geometric
interpretation as an "information projection": the projection of a
prior distribution onto a linear subspace (defined by the data) so as
to minimize a particular information-theoretic distance measure. This
projection operation, which is more familiar in other guises, is a
core optimization task in machine learning and statistics. We'll study
the geometry of this problem and discuss two popular iterative
algorithms for it.
Tutorial on Machine Learning Reductions
* Video Lectures (by John Langford from Yahoo Research)
Tutorial description:
There are several different classification problems commonly
encountered in real world applications such as 'importance weighted
classification', 'cost sensitive classification', 'reinforcement
learning', 'regression' and others. Many of these problems can be
related to each other by simple machines (reductions) that transform
problems of one type into problems of another type. Finding a
reduction from your problem to a more common problem allows the reuse
of simple learning algorithms to solve relatively complex problems. It
also induces an organization on learning problems -- problems that can
be easily reduced to each other are 'nearby' and problems which
cannot be so reduced are not close.
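To give a feel for what such a reduction looks like, here is a minimal Python sketch of one known approach: turning an importance weighted classification problem into an ordinary one by rejection sampling. This is my own illustration and not necessarily how the tutorial presents it; the function name and the toy data are hypothetical.

```python
import random


def rejection_sample(examples, rng):
    """Reduce importance weighted examples (x, y, weight) to plain (x, y) pairs.

    Each example is kept with probability weight / max_weight, so any
    ordinary classifier trained on the output sees important examples
    proportionally more often.
    """
    max_weight = max(w for _, _, w in examples)
    if max_weight <= 0:
        return []
    return [(x, y) for x, y, w in examples if rng.random() < w / max_weight]


# Hypothetical toy data: (features, label, importance weight).
weighted = [("a", 1, 4.0), ("b", 0, 0.0), ("c", 1, 4.0), ("d", 0, 1.0)]
plain = rejection_sample(weighted, random.Random(0))
print(plain)
```

The output can be handed to any off-the-shelf classifier that knows nothing about weights, which is the point of a reduction: the complex problem is solved by reusing a learner for the simpler one.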
Online Learning and Game Theory
* Video Lectures (by Adam Kalai from Toyota Technological Institute)
Description:
We consider online learning and its relationship to game theory. In an
online decision-making problem, as in Singer's lecture, one typically
makes a sequence of decisions and receives feedback immediately after
making each decision. As far back as the 1950s, game theorists gave
algorithms for these problems with strong regret guarantees. Without
making statistical assumptions, these algorithms were guaranteed to
perform nearly as well as the best single decision, where the best is
chosen with the benefit of hindsight. We discuss applications of these
algorithms to complex learning problems where one receives very little
feedback. Examples include online routing, online portfolio selection,
online advertising, and online data structures. We also discuss
applications to learning Nash equilibria in zero-sum games and
learning correlated equilibria in general two-player games.
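The idea of performing nearly as well as the best single decision in hindsight can be sketched with the exponential weights (Hedge) algorithm. This is my own minimal example, not code from the lectures, and it assumes the full-information setting in which the loss of every decision is revealed after each round, with losses in [0, 1]. The function name hedge and the step size eta are hypothetical choices.

```python
import math


def hedge(loss_rounds, eta=0.5):
    """Run exponential weights over rounds of per-decision losses in [0, 1].

    Returns (expected loss of the learner, loss of the best fixed decision).
    """
    n = len(loss_rounds[0])
    weights = [1.0] * n
    learner_loss = 0.0
    for losses in loss_rounds:
        total = sum(weights)
        probs = [w / total for w in weights]
        # Expected loss when a decision is sampled according to probs.
        learner_loss += sum(p * l for p, l in zip(probs, losses))
        # Decisions that did badly lose weight exponentially fast.
        weights = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
    best_loss = min(sum(r[i] for r in loss_rounds) for i in range(n))
    return learner_loss, best_loss


# Toy sequence: decision 0 is always right, decision 1 is always wrong.
rounds = [[0.0, 1.0]] * 100
learner_loss, best_loss = hedge(rounds)
regret = learner_loss - best_loss
print(f"learner: {learner_loss:.3f}  best: {best_loss:.3f}  regret: {regret:.3f}")
```

No statistical assumption about the loss sequence is used anywhere, yet the regret stays small while the number of rounds grows. The lectures also cover settings with much less feedback than this sketch assumes.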
On the Borders of Statistics and Computer Science
* Video Lectures (by Peter Bickel from University of California, Berkeley)
Description:
Machine learning in computer science and prediction and classification
in statistics are essentially equivalent fields. I will try to
illustrate the relation between theory and practice in this huge area
by a few examples and results. In particular I will try to address an
apparent puzzle: Worst case analyses, using empirical process theory,
seem to suggest that even for moderate data dimension and reasonable
sample sizes good prediction (supervised learning) should be very
difficult. On the other hand, practice seems to indicate that even
when the number of dimensions is very much higher than the number of
observations, we can often do very well. We also discuss a new method
of dimension estimation and some features of cross validation.
Decision Maps
* Video Lecture (by Boaz Nadler from Toyota Technological Institute)
Measures of Statistical Dependence
* Video Lectures (by Arthur Gretton from Max Planck Institute for
Biological Cybernetics)
Description:
A number of important problems in signal processing depend on measures
of statistical dependence. For instance, this dependence is minimised
in the context of instantaneous ICA, in which linearly mixed signals
are separated using their (assumed) pairwise independence from each
other. A number of methods have been proposed to measure this
dependence; however, they generally assume a particular parametric
model for the densities generating the observations. Recent work
suggests that kernel methods may be used to find estimates that adapt
according to the signals they compare. These methods are currently
being refined, both to yield greater accuracy and to permit the use
of the signal properties over time in improving signal separability.
In addition, these methods can be applied in cases where the
statistical dependence between observations must be maximised, which
is true for certain classes of clustering algorithms.
Anti-Learning
* Video Lectures (by Adam Kowalczyk from National ICT)
Description:
The biological domain poses new challenges for statistical learning.
In the talk we shall analyze and theoretically explain some
counter-intuitive experimental and theoretical findings that
systematic reversal of classifier decisions can occur when switching
from training to independent test data (the phenomenon of
anti-learning). We demonstrate this on both natural and synthetic data
and show that it is distinct from overfitting. The natural datasets
discussed will include: prediction of response to chemo-radio-therapy
for esophageal cancer from gene expression (measured by
cDNA-microarrays); prediction of genes affecting the aryl hydrocarbon
receptor pathway in yeast. The main synthetic classification problem
will be the approximation of samples drawn from high dimensional
distributions, for which a theoretical explanation will be outlined.
Brain Computer Interfaces
* Video Lectures (by Klaus-Robert Müller from Fraunhofer FIRST)
Description:
Brain Computer Interfacing (BCI) aims at making use of brain signals
for e.g. the control of objects, spelling, gaming and so on. This
tutorial will first provide a brief overview of the current BCI
research activities and provide details in recent developments on both
invasive and non-invasive BCI systems. In a second part -- taking a
physiologist's point of view -- the necessary neurological/neurophysical
background is provided and medical applications are discussed. The
third part -- now from a machine learning and signal processing
perspective -- shows the wealth, the complexity and the difficulties
of the data available, a truly enormous challenge. In real time, a
multivariate, heavily noise-contaminated data stream must be processed
and classified. The main emphasis of this part of the tutorial is placed
on feature extraction/selection and preprocessing which includes among
other techniques CSP and ICA methods. Finally, I report in more
detail on the Berlin Brain Computer Interface (BBCI), which is based
on EEG signals and take the audience all the way from the measured
signal, the preprocessing and filtering, the classification to the
respective application. BCI communication is discussed in a clinical
setting and for gaming.
Introduction to Kernel Methods
* Video Lecture (by Mikhail Belkin from University of Chicago)
Lecture contains:
Kernel-based algorithms, regression / classification, regularization,
RKHS, representer theorem, RLS algorithm, SVMs, feature map.
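Several of these topics meet in kernel regularized least squares, so here is a minimal pure-Python sketch of it (my own illustration, not code from the lecture). By the representer theorem the solution has the form f(x) = sum_i alpha_i k(x_i, x); this sketch assumes a Gaussian kernel and the convention alpha = (K + lam * I)^-1 y, and all function names and parameter values are hypothetical.

```python
import math


def gaussian_kernel(a, b, gamma=1.0):
    return math.exp(-gamma * (a - b) ** 2)


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_kernel_rls(xs, ys, lam=1e-6, gamma=1.0):
    """Return the coefficients alpha solving (K + lam * I) alpha = y."""
    n = len(xs)
    k = [
        [gaussian_kernel(xs[i], xs[j], gamma) + (lam if i == j else 0.0)
         for j in range(n)]
        for i in range(n)
    ]
    return solve(k, ys)


def predict(xs, alpha, x, gamma=1.0):
    """Evaluate f(x) = sum_i alpha_i * k(x_i, x)."""
    return sum(a * gaussian_kernel(xi, x, gamma) for a, xi in zip(alpha, xs))


# Toy data: four points on y = x^2.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 1.0, 4.0, 9.0]
alpha = fit_kernel_rls(xs, ys)
fitted = [predict(xs, alpha, x) for x in xs]
print(fitted)
```

With a regularization parameter this small the fit nearly interpolates the training points; a larger lam trades training accuracy for a smoother function.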