Sunday, 10 February 2008

Machine Learning and Artificial Intelligence Video Lectures

I found some great Machine Learning and Artificial Intelligence (AI)

video lecture courses recently and I will share them with you this

month.

Here are lectures from Machine Learning Summer School 2003, 2005 and

2006.

Machine Learning is a foundational discipline of the Information

Sciences. It combines deep theory from areas as diverse as Statistics,

Mathematics, Engineering, and Information Technology with many

practical and relevant real life applications.

Some Mathematical Tools for Machine Learning

* Video Lectures (by Chris Burges from Microsoft Research)

Lectures contain:

1. Lagrange multipliers: an indirect approach can be easier; Multiple Equality Constraints; Multiple Inequality Constraints; Two points on a d-sphere; The Largest Parallelogram; Resource allocation; A convex combination of numbers is maximized by choosing the largest; The Isoperimetric problem; For fixed mean and variance, which univariate distribution has maximum entropy? An exact solution for an SVM living on a simplex (a worked Lagrange-multiplier example appears after this list)

2. Notes on some Basic Statistics: Probabilities can be Counter-Intuitive (Simpson's paradox; the Monty Hall puzzle); IID-ness: Measurement Error decreases as 1/sqrt(n); Correlation versus Independence; The Ubiquitous Gaussian: Product of Gaussians is Gaussian; Convolution of two Gaussians is a Gaussian; Projection of a Gaussian is a Gaussian; Sum of Gaussian random variables is a Gaussian random variable; Uncorrelated Gaussian variables are also independent; Maximum Likelihood Estimates for mean and covariance (prove required matrix identities); Aside: For the 1-dim Laplacian, maximum likelihood gives the median; Using cumulative distributions to derive densities (a quick 1/sqrt(n) simulation appears after this list)

3. Principal Component Analysis and Generalizations: Ordering by Variance; Does Grouping Change Things? PCA Decorrelates the Samples; PCA gives Reconstruction with Minimal Mean Squared Error; PCA preserves Mutual Information on Gaussian data; PCA directions lie in the span of the data; PCA: second order moments only; The Generalized Rayleigh Quotient; Non-orthogonal principal directions; OPCA; Fisher Linear Discriminant; Multiple Discriminant Analysis (a PCA decorrelation sketch appears after this list)

4. Elements of Functional Analysis: High Dimensional Spaces; Is

Winning Transitive?; Most of the Volume is Near the Surface: Cubes;

Spheres in n-dimensions; Banach Spaces, Hilbert Spaces, Compactness;

Norms; Useful Inequalities (Minkowski and Holder); Vector Norms;

Matrix Norms; The Hamming Norm; L1, L2, L_infty norms - is L0 a norm?

Example: Using a Norm as a Constraint in Kernel Algorithms
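Picking up the Lagrange-multiplier topics from item 1 above, here is a minimal worked sketch of my own (not taken from the lectures): maximize x + y on the unit circle by solving the stationarity conditions of the Lagrangian with SymPy.

```python
import sympy as sp

x, y, lam = sp.symbols('x y lam', real=True)
f = x + y                       # objective to maximize
g = x**2 + y**2 - 1             # equality constraint g = 0 (unit circle)
L = f - lam * g                 # the Lagrangian

# Stationarity of L in x and y, together with the constraint itself
sols = sp.solve([sp.diff(L, x), sp.diff(L, y), g], [x, y, lam], dict=True)
print(sols)   # x = y = +-1/sqrt(2); the positive solution maximizes f
```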
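And a quick Monte Carlo check, again my own toy example rather than lecture material, of the claim in item 2 that measurement error decreases as 1/sqrt(n): the standard deviation of the sample mean of n unit-variance draws tracks 1/sqrt(n).

```python
import numpy as np

rng = np.random.default_rng(0)
for n in (10, 100, 1000, 10000):
    # 5000 repeated experiments, each averaging n unit-variance samples
    sample_means = rng.normal(size=(5000, n)).mean(axis=1)
    print(n, round(sample_means.std(), 4), round(1 / np.sqrt(n), 4))
```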
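Finally, a small illustration of the claim in item 3 that PCA decorrelates the samples; the covariance matrix and data below are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.multivariate_normal([0, 0], [[3.0, 1.5], [1.5, 1.0]], size=1000)
Xc = X - X.mean(axis=0)                      # centre the data
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt.T                                # coordinates in the principal directions
print(np.cov(Z, rowvar=False).round(3))      # off-diagonal entries ~ 0: decorrelated
```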

These are lectures on some fundamental mathematics underlying many

approaches and algorithms in machine learning. They are not about

particular learning algorithms; they are about the basic concepts and

tools upon which such algorithms are built. Often students feel

intimidated by such material: there is a vast amount of "classical

mathematics", and it can be hard to find the wood for the trees. The

main topics of these lectures are Lagrange multipliers, functional

analysis, some notes on matrix analysis, and convex optimization. I've

concentrated on things that are often not dwelt on in typical CS

coursework. Lots of examples are given; if it's green, it's a puzzle

for the student to think about. These lectures are far from complete:

perhaps the most significant omissions are probability theory,

statistics for learning, information theory, and graph theory. I hope

eventually to turn all this into a series of short tutorials.

Introduction to Learning Theory

* Video Lectures (by Olivier Bousquet from Max Planck Institute for

Biological Cybernetics)

Description:

The goal of this course is to introduce the key concepts of learning

theory. It will not be restricted to Statistical Learning Theory but

will mainly focus on statistical aspects. Instead of giving detailed

proofs and precise statements, this course will aim at providing conceptual tools and ideas useful for practitioners as well as for theoretically-minded people.

An Introduction to Pattern Classification

* Video Lectures (by Elad Yom-Tov from Technion)

Lectures contain:

Pattern classification algorithms, classification procedures,

supervised learning, unsupervised learning, classifier and

preprocessing algorithms, errors, classifier and computational

complexity, dimensionality reduction, approaches for dimensionality

reduction: feature reduction, feature selection; genetic programming,

whitening transform, nearest neighbor editing algorithm, Voronoi diagram, clusters, clustering techniques: agglomerative, partitional, minimum spanning tree, AGHC, Kohonen maps, k-means, fuzzy k-means, competitive learning; Bayes rule, heuristic algorithms, tree-based algorithms, optimization algorithms, neural networks, training methods, perceptrons, radial-basis function networks, support-vector machines, error estimation methods.

Statistical Learning Theory

* Video Lectures (by Olivier Bousquet from Max Planck Institute for

Biological Cybernetics)

Description:

This course will give a detailed introduction to learning theory with

a focus on the classification problem. It will be shown how to obtain

(probabilistic) bounds on the generalization error for certain types of

algorithms. The main themes will be:

* probabilistic inequalities and concentration inequalities

* union bounds and chaining

* measuring the size of a function class

* Vapnik-Chervonenkis dimension

* shattering dimension and Rademacher averages

* classification with real-valued functions

Some knowledge of probability theory would be helpful but not required

since the main tools will be introduced.
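For a taste of what such (probabilistic) bounds look like, here is a back-of-the-envelope sketch of my own (not part of the course): a Hoeffding bound on the gap between training and test error of a fixed classifier, with a union bound over a finite class.

```python
import numpy as np

def generalization_gap(n, delta, num_classifiers=1):
    """Hoeffding + union bound: with probability at least 1 - delta, every one of the
    num_classifiers fixed classifiers satisfies
    |test error - training error| <= sqrt(ln(2 * num_classifiers / delta) / (2 n))."""
    return np.sqrt(np.log(2 * num_classifiers / delta) / (2 * n))

print(generalization_gap(n=1000, delta=0.05))                       # one classifier
print(generalization_gap(n=1000, delta=0.05, num_classifiers=1000)) # union bound over 1000
```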

Stochastic Learning

* Video Lectures (by Léon Bottou from NEC Research)

These lectures contain:

Early learning systems, recursive adaptive algorithms, risks, batch

gradient descent, stochastic gradient descent, non-differentiable loss functions, Rosenblatt's perceptrons, k-means, vector quantization, stochastic noise, multilayer networks.
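To make the contrast with batch gradient descent concrete, here is a minimal stochastic gradient descent sketch for least-squares regression on synthetic data (my own toy setup, not taken from the lectures).

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 3))
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=1000)

w = np.zeros(3)
for t, i in enumerate(rng.integers(0, len(X), size=20000)):
    step = 0.5 / (t + 100)                 # decaying step size
    grad = (X[i] @ w - y[i]) * X[i]        # gradient of the one-sample squared loss
    w -= step * grad
print(w.round(3))                          # close to w_true
```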

Bayesian Learning

* Video Lectures (by Zoubin Ghahramani from University College

London)

Description of video course:

Bayes Rule provides a simple and powerful framework for machine

learning. This tutorial will be organised as follows:

1. The lecturer will motivate the Bayesian framework from the point of view of rational coherent inference, and highlight the important role of the marginal likelihood in Bayesian Occam's Razor (a tiny worked Bayes-rule example appears after this list).

2. He will discuss the question of how one should choose a sensible

prior. When Bayesian methods fail it is often because no thought has

gone into choosing a reasonable prior.

3. Bayesian inference usually involves solving high dimensional

integrals and sums. He will give an overview of numerical

approximation techniques (e.g. Laplace, BIC, variational bounds, MCMC,

EP...).

4. Mr. Ghahramani will talk about more recent work in non-parametric

Bayesian inference such as Gaussian processes (i.e. Bayesian kernel

"machines"), Dirichlet process mixtures, etc.

Learning on Structured Data

* Video Lectures (by Yasemin Altun from TTI)

Lectures description:

The discriminative learning framework is one of the most successful paradigms in machine learning. Methods of this paradigm, such as Boosting and Support Vector Machines, have significantly advanced the

state-of-the-art for classification by improving the accuracy and by

increasing the applicability of machine learning methods. One of the

key benefits of these methods is their ability to learn efficiently in

high dimensional feature spaces, either by the use of implicit data

representations via kernels or by explicit feature induction. However,

traditionally these methods do not exploit dependencies between class

labels where more than one label is predicted. Many real-world

classification problems involve sequential, temporal or structural

dependencies between multiple labels. We will investigate recent

research on generalizing discriminative methods to learning in

structured domains. These techniques combine the efficiency of dynamic

programming methods with the advantages of the state-of-the-art

learning methods.
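To make "combining dynamic programming with discriminative learning" concrete, here is a hedged sketch of the Viterbi dynamic program that structured methods (e.g. structured SVMs or CRFs) use at prediction time; all scores below are invented for illustration.

```python
import numpy as np

scores = np.array([[2.0, 0.5],     # per-position scores for labels 0 and 1 (made up)
                   [0.2, 1.5],
                   [1.0, 1.2]])
trans = np.array([[0.5, -0.5],     # label-to-label transition scores (made up)
                  [-0.5, 0.5]])

T, K = scores.shape
best = scores[0].copy()
back = np.zeros((T, K), dtype=int)
for t in range(1, T):
    cand = best[:, None] + trans + scores[t]   # previous label (rows) -> current label (cols)
    back[t] = cand.argmax(axis=0)
    best = cand.max(axis=0)

path = [int(best.argmax())]
for t in range(T - 1, 0, -1):
    path.append(int(back[t, path[-1]]))
print(path[::-1])                              # highest-scoring label sequence
```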

Information Retrieval and Text Mining

* Video Lectures (by Thomas Hofmann from Brown University)

Description:

This four-hour course will provide an overview of applications of

machine learning and statistics to problems in information retrieval

and text mining. More specifically, it will cover tasks like document

categorization, concept-based information retrieval,

question-answering, topic detection and document clustering,

information extraction, and recommender systems. The emphasis is on

showing how machine learning techniques can help to automatically

organize content and to provide efficient access to information in

textual form.
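A tiny, hedged sketch of the retrieval side, assuming scikit-learn is available (the documents and query are invented): rank documents against a query by TF-IDF cosine similarity.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = ["machine learning for text mining",
        "information retrieval and search engines",
        "a recipe collection for pasta dishes"]
vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(docs)    # one TF-IDF row per document
query = vectorizer.transform(["text retrieval"])
print(cosine_similarity(query, doc_matrix))    # higher score = more relevant document
```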

Foundations of Learning

* Video Lectures (by Steve Smale from University of California)

An introduction to grammars and parsing

* Video Lecture (by Mark Johnson from Brown Laboratory for

Linguistic Information Processing)

Video Lecture contains:

computational linguistics, syntactic and semantic structure, context-free grammars and their derivations, probabilistic CFGs (PCFGs), dynamic programming, expectation maximization, the EM algorithm for PCFGs, top-down parsing, bottom-up parsing, left-corner parsing.
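As a pointer to what context-free grammars and their derivations look like in practice, here is a toy grammar parsed with NLTK (assuming NLTK 3 is installed; the grammar and sentence are invented for illustration).

```python
import nltk

grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> 'dogs' | 'cats'
VP -> V NP
V -> 'chase'
""")
parser = nltk.ChartParser(grammar)
for tree in parser.parse(['dogs', 'chase', 'cats']):
    print(tree)   # (S (NP dogs) (VP (V chase) (NP cats)))
```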

Information Geometry

* Video Lectures (by Sanjoy Dasgupta from University of California)

Description:

This tutorial will focus on entropy, exponential families, and

information projection. We'll start by seeing the sense in which

entropy is the only reasonable definition of randomness. We will then

use entropy to motivate exponential families of distributions -- which

include the ubiquitous Gaussian, Poisson, and Binomial distributions,

but also very general graphical models. The task of fitting such a

distribution to data is a convex optimization problem with a geometric

interpretation as an "information projection": the projection of a

prior distribution onto a linear subspace (defined by the data) so as

to minimize a particular information-theoretic distance measure. This

projection operation, which is more familiar in other guises, is a

core optimization task in machine learning and statistics. We'll study

the geometry of this problem and discuss two popular iterative

algorithms for it.
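A small numerical sketch of one such information projection, under my own toy setup rather than anything from the tutorial: among distributions on {0,...,5} with mean 3.5, the one closest in KL divergence to the uniform prior is an exponential-family tilt p(x) proportional to exp(theta * x); the natural parameter theta is found by one-dimensional root finding.

```python
import numpy as np
from scipy.optimize import brentq

xs = np.arange(6)                       # support {0, ..., 5}
target_mean = 3.5

def tilted_mean(theta):
    w = np.exp(theta * xs)              # unnormalized exponential-family weights
    return (w * xs).sum() / w.sum()

theta = brentq(lambda t: tilted_mean(t) - target_mean, -10, 10)
p = np.exp(theta * xs)
p /= p.sum()
print(theta, p.round(4), (p * xs).sum())   # the projected distribution has mean 3.5
```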

Tutorial on Machine Learning Reductions

* Video Lectures (by John Langford from Yahoo Research)

Tutorial description:

There are several different classification problems commonly

encountered in real world applications such as 'importance weighted

classification', 'cost sensitive classification', 'reinforcement

learning', 'regression' and others. Many of these problems can be

related to each other by simple machines (reductions) that transform

problems of one type into problems of another type. Finding a

reduction from your problem to a more common problem allows the reuse

of simple learning algorithms to solve relatively complex problems. It

also induces an organization on learning problems -- problems that can

be easily reduced to each other are 'nearby' and problems which can

not be so reduced are not close.
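One concrete example of such a machine, sketched under my own assumptions (a rejection-sampling style reduction in the spirit of "costing", from importance-weighted classification to ordinary classification): keep each example with probability proportional to its weight and hand the result to any unweighted learner.

```python
import numpy as np

rng = np.random.default_rng(3)

def importance_weighted_to_unweighted(X, y, w, rng):
    """Keep example i with probability w[i] / max(w); the kept, unweighted sample
    can then be fed to any ordinary binary classifier."""
    keep = rng.uniform(size=len(w)) < (w / w.max())
    return X[keep], y[keep]

X = rng.normal(size=(10, 2))
y = rng.integers(0, 2, size=10)
w = rng.uniform(0.1, 5.0, size=10)          # made-up importance weights
Xr, yr = importance_weighted_to_unweighted(X, y, w, rng)
print(f"kept {len(yr)} of {len(y)} examples for the unweighted learner")
```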

Online Learning and Game Theory

* Video Lectures (by Adam Kalai from Toyota Technological Institute)

Description:

We consider online learning and its relationship to game theory. In an

online decision-making problem, as in Singer's lecture, one typically

makes a sequence of decisions and receives feedback immediately after

making each decision. As far back as the 1950's, game theorists gave

algorithms for these problems with strong regret guarantees. Without

making statistical assumptions, these algorithms were guaranteed to

perform nearly as well as the best single decision, where the best is

chosen with the benefit of hindsight. We discuss applications of these

algorithms to complex learning problems where one receives very little

feedback. Examples include online routing, online portfolio selection,

online advertising, and online data structures. We also discuss

applications to learning Nash equilibria in zero-sum games and

learning correlated equilibria in general two-player games.
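For a feel of these guarantees, here is a hedged sketch of the multiplicative-weights ("Hedge") algorithm over K experts, run on random losses of my own invention; its regret against the best single expert in hindsight grows only like sqrt(T log K).

```python
import numpy as np

rng = np.random.default_rng(4)
K, T, eta = 5, 2000, 0.05
losses = rng.uniform(size=(T, K))             # per-round losses (could be adversarial)

weights = np.ones(K)
algo_loss, expert_loss = 0.0, np.zeros(K)
for t in range(T):
    p = weights / weights.sum()               # play experts with these probabilities
    algo_loss += p @ losses[t]
    expert_loss += losses[t]
    weights *= np.exp(-eta * losses[t])       # downweight experts that did badly
print("regret vs. best expert:", round(algo_loss - expert_loss.min(), 2))
```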

On the Borders of Statistics and Computer Science

* Video Lectures (by Peter Bickel from the University of California, Berkeley)

Description:

Machine learning in computer science and prediction and classification

in statistics are essentially equivalent fields. I will try to

illustrate the relation between theory and practice in this huge area

by a few examples and results. In particular I will try to address an

apparent puzzle: Worst case analyses, using empirical process theory,

seem to suggest that even for moderate data dimension and reasonable

sample sizes good prediction (supervised learning) should be very

difficult. On the other hand, practice seems to indicate that even

when the number of dimensions is very much higher than the number of

observations, we can often do very well. We also discuss a new method

of dimension estimation and some features of cross validation.

Decision Maps

* Video Lecture (by Boaz Nadler from Toyota Technological Institute)

Measures of Statistical Dependence

* Video Lectures (by Arthur Gretton from Max Planck Institute for

Biological Cybernetics)

Description:

A number of important problems in signal processing depend on measures

of statistical dependence. For instance, this dependence is minimised

in the context of instantaneous ICA, in which linearly mixed signals

are separated using their (assumed) pairwise independence from each

other. A number of methods have been proposed to measure this

dependence; however, they generally assume a particular parametric

model for the densities generating the observations. Recent work

suggests that kernel methods may be used to find estimates that adapt

according to the signals they compare. These methods are currently

being refined, both to yield greater accuracy and to permit the use

of the signal properties over time in improving signal separability.

In addition, these methods can be applied in cases where the

statistical dependence between observations must be maximised, which

is true for certain classes of clustering algorithms.
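A one-screen illustration of why plain correlation is not enough here (a toy example of mine, not from the lectures): x and x**2 are essentially uncorrelated for symmetric x, yet completely dependent, which is the kind of dependence richer kernel-based measures are designed to detect.

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(size=100_000)
y = x ** 2                                 # a deterministic function of x
print(round(np.corrcoef(x, y)[0, 1], 4))   # ~ 0, despite complete dependence
```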

Anti-Learning

* Video Lectures (by Adam Kowalczyk from National ICT Australia)

Description:

The biological domain poses new challenges for statistical learning. In the talk we shall analyze and theoretically explain the counter-intuitive experimental and theoretical finding that a systematic reversal of classifier decisions can occur when switching from training to independent test data (the phenomenon of anti-learning). We demonstrate this on both natural and synthetic data

and show that it is distinct from overfitting. The natural datasets

discussed will include: prediction of response to chemo-radio-therapy

for esophageal cancer from gene expression (measured by

cDNA-microarrays); prediction of genes affecting the aryl hydrocarbon

receptor pathway in yeast. The main synthetic classification problem

will be the approximation of samples drawn from high dimensional

distributions, for which a theoretical explanation will be outlined.

Brain Computer Interfaces

* Video Lectures (by Klaus-Robert Müller from Fraunhofer FIRST)

Description:

Brain Computer Interfacing (BCI) aims at making use of brain signals

for, e.g., the control of objects, spelling, gaming and so on. This

tutorial will first provide a brief overview of the current BCI

research activities and provide details in recent developments on both

invasive and non-invasive BCI systems. In a second part -- taking a

physiologist's point of view -- the necessary neurological/neurophysiological background is provided and medical applications are discussed. The

third part -- now from a machine learning and signal processing

perspective -- shows the wealth, the complexity and the difficulties

of the data available, a truly enormous challenge: a multivariate, heavily noise-contaminated data stream must be processed and classified in real time. The main emphasis of this part of the tutorial is placed on feature extraction/selection and preprocessing, which includes among

other techniques CSP and also ICA methods. Finally, I report in more

detail on the Berlin Brain-Computer Interface (BBCI), which is based on EEG signals, and take the audience all the way from the measured signal, through preprocessing, filtering and classification, to the respective application. BCI communication is discussed in a clinical setting and for gaming.
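As a flavour of the feature-extraction step mentioned above, here is a minimal sketch of Common Spatial Patterns (CSP) as a generalized eigenproblem on two class covariance matrices; the "EEG" data below are random placeholders, not real recordings.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(6)
class_a = rng.normal(size=(500, 8))                                   # fake 8-channel epochs
class_b = rng.normal(size=(500, 8)) @ np.diag(np.linspace(0.5, 2, 8))
Ca = np.cov(class_a, rowvar=False)
Cb = np.cov(class_b, rowvar=False)

# CSP filters: solve Ca w = lambda (Ca + Cb) w; the extreme eigenvalues give the
# spatial filters whose variance best discriminates the two classes.
eigvals, W = eigh(Ca, Ca + Cb)
print(eigvals.round(2))
```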

Introduction to Kernel Methods

* Video Lecture (by Mikhail Belkin from University of Chicago)

Lecture contains:

Kernel-based algorithms, regression/classification, regularization, RKHS, the representer theorem, the RLS algorithm, SVMs, feature maps.
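A compact sketch tying these together, under assumptions of my own (an RBF kernel and noisy sine data, not anything from the lecture): by the representer theorem the regularized least squares (RLS) solution is f(x) = sum_i alpha_i k(x_i, x), with alpha = (K + lambda n I)^{-1} y.

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.uniform(-3, 3, size=(40, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=40)

def rbf(A, B, gamma=0.5):
    # Gaussian (RBF) kernel matrix between row-sets A and B
    return np.exp(-gamma * ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1))

lam = 1e-2
K = rbf(X, X)
alpha = np.linalg.solve(K + lam * len(X) * np.eye(len(X)), y)   # representer-theorem weights
x_test = np.array([[0.0]])
print(float(rbf(x_test, X) @ alpha), np.sin(0.0))               # prediction vs. truth at 0
```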

Labels: ai, artificial intelligence, bayesian learning, data mining,

game theory, learning theory, machine learning, mathematics, pattern

classification, statistical learning, stochastic learning, text mining

