Primer on Machine Learning
Author: Susanne Lomatch
Machine learning (ML) is the capture and transformation of information (from sensors, databases) into a usable form to improve performance. Pattern/object recognition, decision making/planning and communication are three key application areas for ML. Performance can be evaluated as the ability to make accurate predictions using known data (as in supervised learning or training), or the ability to discover new information with value (as in unsupervised learning).
There are many types of ML approaches and algorithms. Below I list as many as I could find, with a short descriptor for each. I have placed Wiki links on concepts that deserve more depth, so that readers can dig deeper to understand them.
One point to be made: optimization and filtering are not themselves ML, though they may be used within ML (optimization in reinforcement learning, and filtering in unsupervised learning). This is a common source of confusion when reading through what some call ML.
I also emphasize a second point, made by others as well [1]: the term "machine" refers to both machines and living organisms (or alternately, systems or agents). The same mathematical theory of learning applies regardless of what we choose to call the learner, whether it is artificial or biological.
Types of ML approaches and algorithms:
• Supervised Learning
  o Learning an input-output relationship from examples
    § Given pairs of input and output patterns, learn the dependencies between input and output
    § Algorithm analyzes the training data and produces an inferred function (classifier or regression function), used to predict an output from valid input
  o Problem/Tasks: Regression (continuous), classification (discrete), ranking
  o Underlying disciplines: Statistical regression and classification analysis
  o Applications: Skill estimation, behavioral cloning, recognition (pattern, object, optical character, handwriting, speech), information retrieval (rankers)
  o Advantages: Minimizes expected error on as yet unseen data
  o Tradeoffs: Approximation (bias), generalization (variance) and quality of training data
  o Specific algorithms: Linear regression, logistic regression, naive Bayes classifier, linear discriminant analysis, decision trees, instance learning (k-nearest neighbor), inductive logic programming, artificial neural networks (backpropagation), support vector machines, boosting learners, maximum entropy Markov model
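As a minimal sketch of the supervised setting described above, the following Python fits a one-variable linear regression by ordinary least squares and returns the "inferred function" used to predict an output for an unseen input. The training pairs and the function name `fit_linear_regression` are made up for illustration; they are assumptions, not part of any particular toolkit.

```python
from statistics import mean


def fit_linear_regression(xs, ys):
    """Fit y ~ slope * x + intercept by ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # The returned closure is the inferred function learned from the examples.
    def predict(x):
        return slope * x + intercept

    return predict


# Hypothetical training pairs (input, output), generated from y = 2x + 1.
train_x = [0.0, 1.0, 2.0, 3.0]
train_y = [1.0, 3.0, 5.0, 7.0]

predict = fit_linear_regression(train_x, train_y)
prediction = predict(4.0)  # an input that was not in the training data
```

Because the toy data are noise-free, the fitted function reproduces the generating line; with noisy data the same code would return the least-squares compromise instead.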
• Reinforcement Learning
  o Learning from state-action-reward sequences
    § Given observations and rewards from an environment, learn how to act in given situations
    § Algorithm finds out which actions are optimal based on past experiences and a feedback of rewards
  o Problem/Tasks: Control, value estimation, policy learning, optimal decision making
  o Underlying disciplines: Control theory, game theory, decision theory, operations research, evolutionary computation
  o Applications: Learning to walk, drive, fly an airplane, play a game; object recognition (control of search), robotic or autonomous control, critical path analysis, planning, scheduling, pricing, trading, natural language processing
  o Advantages: Maximize expected reward over time
  o Tradeoffs: Exploration and exploitation
  o Specific algorithms: Dynamic programming (optimization, shortest path), Markov decision processes, Monte Carlo, temporal difference, Q-learning, genetic programming, value-dependent learning (see special note below and [2])
  o Special notes:
    § Neuroscience researchers have found that the firing rate of dopamine neurons in the brain appears to mimic the error function of the temporal difference algorithm. The error function reports the difference between the estimated reward at a given state or time step and the actual reward received: the larger the error, the larger the gap between expected and actual reward. When this is paired with a stimulus that accurately reflects a future reward, the error can be used to associate the stimulus with the future reward.
    § Reinforcement learning is commonly used in robotics and autonomous system control (e.g. traffic, airplane), and has been proposed for the training of brain-machine interfaces (BMIs).
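The temporal difference error mentioned in the special note can be sketched as follows. This is a minimal TD(0) value-estimation example on a hypothetical three-state chain (each state moves deterministically to the next, and only the final transition pays a reward of 1). The chain, the function name `td0_chain` and the parameter values are assumptions chosen for illustration.

```python
def td0_chain(num_states=3, episodes=500, alpha=0.1, gamma=0.9):
    """Estimate state values on a deterministic chain with TD(0)."""
    # values[num_states] is the terminal state; its value stays at 0.
    values = [0.0] * (num_states + 1)
    last_step_errors = []  # TD error of the rewarded transition, per episode
    for _ in range(episodes):
        for state in range(num_states):
            next_state = state + 1
            reward = 1.0 if next_state == num_states else 0.0
            # TD error: (reward + discounted next estimate) - current estimate.
            td_error = reward + gamma * values[next_state] - values[state]
            values[state] += alpha * td_error
        last_step_errors.append(td_error)
    return values[:num_states], last_step_errors


values, errors = td0_chain()
```

In this sketch the error on the rewarded step starts large, when the reward is unexpected, and shrinks toward zero as the value estimates come to predict the reward, while the earlier states acquire discounted value.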
• Unsupervised Learning
  o Learning the underlying structure from examples
    § Algorithm finds hidden structure in unlabeled information/data
  o Problem/Tasks: Cluster analysis, manifold learning, density estimation, blind signal separation (statistical FA, PCA, ICA, etc.), inference
  o Underlying disciplines: see Tasks
  o Applications: Modeling motion capture data and user behavior, data mining, recognition (pattern, object, image, speech)
  o Advantages: Information and knowledge discovery, reasoning under uncertainty
  o Disadvantages: No error or reward signal to evaluate a potential solution
  o Tradeoffs: Computational tractability, parameter estimation
  o Specific algorithms: Hierarchical clustering, k-means (centroid) clustering, distribution clustering (expectation-maximization), association rule learning (Apriori), artificial neural networks (self-organizing map, adaptive resonance), Bayesian networks (belief propagation), nonlinear dimensionality reduction (manifold learning or mapping), hidden Markov model
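To illustrate finding hidden structure in unlabeled data, here is a minimal sketch of k-means (centroid) clustering on one-dimensional points. The data, the starting centroids and the function name `k_means_1d` are hypothetical; a practical implementation would also handle multi-dimensional data, random restarts and a convergence check.

```python
def k_means_1d(points, initial_centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and
    moving each centroid to the mean of its assigned points."""
    centroids = list(initial_centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: an empty cluster keeps its previous centroid.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters


# Hypothetical unlabeled data with two visible groups.
data = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
centroids, clusters = k_means_1d(data, initial_centroids=[0.0, 6.0])
```

No labels or rewards are used; the grouping comes entirely from the distances between the points, which is also why there is no error signal against which to judge the result.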
• Semi-supervised Learning
  o Supervised learning from a small amount of labeled information/data, combined with unsupervised learning from a large amount of unlabeled information/data
  o Applications: Recognition (pattern, object, character, image, speech), data mining, information retrieval, question-answering
  o Advantages: Improved learning accuracy in certain cases (e.g. with co-training, or when the unlabeled data are relevant to the task)
  o Disadvantages: Worsening of learning accuracy if unsupervised learning leads to excessive noise
  o Tradeoffs: Cost of supervised training and unsupervised noise
  o Specific algorithms: Co-training, constrained clustering, transduction or transductive inference
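The idea of combining a few labels with many unlabeled examples can be sketched with self-training. Note that this is a simpler technique than the co-training and transductive methods listed above, swapped in here only because it fits in a few lines. A nearest-centroid classifier is fit on the labeled points, used to pseudo-label the unlabeled points, and then refit on both. All data and function names are hypothetical.

```python
def class_centroids(pairs):
    """Mean of the inputs for each label, from (x, label) pairs."""
    groups = {}
    for x, label in pairs:
        groups.setdefault(label, []).append(x)
    return {label: sum(xs) / len(xs) for label, xs in groups.items()}


def nearest_label(x, centroids):
    """Label whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


def self_train(labeled, unlabeled, rounds=5):
    centroids = class_centroids(labeled)  # supervised start
    for _ in range(rounds):
        # Pseudo-label the unlabeled points with the current classifier,
        # then refit on labeled and pseudo-labeled data together.
        pseudo = [(x, nearest_label(x, centroids)) for x in unlabeled]
        centroids = class_centroids(labeled + pseudo)
    return centroids


labeled = [(1.0, "low"), (4.0, "high")]          # small labeled set
unlabeled = [0.0, 0.5, 1.5, 6.0, 6.5, 7.0, 3.2]  # larger unlabeled set

supervised_only = class_centroids(labeled)
semi_supervised = self_train(labeled, unlabeled)
```

In this toy setup the unlabeled points shift the class centroids, so a query such as 2.8 is classified differently by the two models. The same mechanism is what makes the approach risky: wrong pseudo-labels feed back into the classifier as noise.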
• Deep Learning and Cortical Learning
  o The modeling of learning processes performed in the mammalian or human brain; deep learning draws inspiration from learning processes in the mammalian/human visual system; cortical learning is inspired by cortical-thalamic anatomy/function and the “Mountcastle principle” of a hierarchical cortical columnar organizational structure
  o Problem/Tasks: Unsupervised learning of representations (and features), inference; discriminative (supervised), reinforcement, semi-supervised and multi-task learning are also utilized
  o Underlying disciplines: Computational neuroscience (much as I dislike the term – I prefer “theoretical neuroscience”), neuromorphic engineering
  o Applications: Recognition (pattern, object, image, speech), natural language processing and understanding (communication and dialogue), machine vision, data mining, information retrieval, question-answering, decision making, planning, artificial or biomimetic imagination
  o Advantages: Allows for a “deep learning” approach, incorporating a hierarchy of features to efficiently represent and learn the complex abstractions needed for AI and mammalian intelligence (computational and statistical efficiency); particularly suited for multi-task learning, transfer learning, domain adaptation, self-taught learning, and semi-supervised learning with few labels; may also be used to solve NP-complete problems
  o Disadvantages: A common algorithm may not represent all regions in the neocortex, leading to model or algorithmic complexity
  o Tradeoffs: Complexity, efficiency
  o Specific algorithms: Hierarchical temporal memory, artificial neural networks (adaptive resonance theory, Boltzmann machines), energy-based learning, hierarchical greedy learning for deep belief networks
  o Deep learning: See [3], [4] and [5]
  o Cortical learning: See [6] and [7]
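For the energy-based and Boltzmann machine entries above, a small sketch may help show what "energy" means. The code below evaluates the standard energy function of a restricted Boltzmann machine, E(v, h) = -a·v - b·h - Σ v_i W_ij h_j, for a pair of binary visible and hidden vectors. The weights and biases are made-up numbers, and the sketch covers only the energy evaluation, not the learning procedure.

```python
def rbm_energy(visible, hidden, weights, visible_bias, hidden_bias):
    """Energy of a joint (visible, hidden) configuration.

    Lower energy means the configuration is more compatible with the model.
    weights[i][j] couples visible unit i to hidden unit j.
    """
    bias_term = (sum(a * v for a, v in zip(visible_bias, visible))
                 + sum(b * h for b, h in zip(hidden_bias, hidden)))
    interaction = sum(visible[i] * weights[i][j] * hidden[j]
                      for i in range(len(visible))
                      for j in range(len(hidden)))
    return -(bias_term + interaction)


# Hypothetical model: two visible units, one hidden unit.
weights = [[2.0], [-1.0]]
visible_bias = [0.5, -0.2]
hidden_bias = [0.1]

energy_on = rbm_energy([1, 0], [1], weights, visible_bias, hidden_bias)
energy_off = rbm_energy([1, 0], [0], weights, visible_bias, hidden_bias)
```

With these numbers the configuration with the hidden unit switched on has the lower energy, so the model treats that hidden feature as a good explanation of the visible pattern. Stacking such layers greedily is the idea behind the deep belief networks cited in [4].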
An example list of open or commercial toolkits (by no means complete, and will be revised periodically):
ML software toolkits: Torch5, APML, Shogun, SIGMA-MSFT, Google Prediction API, MALLET, Spider, Deep Learning SW
(Disclaimer: This primer is meant to inform. I encourage readers who find factual errors or deficits to contact me (click on contact link below). I also welcome constructive and friendly comments, suggestions and dialogue.)
References and Endnotes:
[1] “Advanced Lectures on Machine Learning,” ed. O. Bousquet and G. Rätsch, Springer-Verlag, 2004.
[2] “Value-Dependent Selection in the Brain: Simulation in a Synthetic Neural Model,” K.J. Friston et al., Neuroscience, vol. 59, 1994.
[3] “Learning Deep Architectures for AI,” Y. Bengio, Foundations and Trends in Machine Learning, vol. 2 (1), 2009. See also the website dedicated to Deep Learning.
[4] “A Fast Learning Algorithm for Deep Belief Nets,” G.E. Hinton et al., Neural Computation, vol. 18, p. 1527, 2006.
[5] “A Tutorial on Energy-Based Learning,” Y. LeCun et al., Predicting Structured Data, 2006.
[6] “Learning and Inference in the Brain,” K. Friston, Neural Networks, vol. 16, 2003.
“A Theory of Cortical Responses,” K. Friston, Phil. Trans. R. Soc. B, vol. 360, 2005.
“Hierarchical Models in the Brain,” K. Friston, PLoS Computational Biology, vol. 4, 2008.
[7] “Towards a Mathematical Theory of Cortical Micro-circuits,” D. George and J. Hawkins, PLoS Computational Biology, vol. 5, 2009.