Artificial Intelligence ~ Artificial Cognitive Systems

Primer on Machine Learning

Author: Susanne Lomatch

Machine learning (ML) is the capture and transformation of information (from sensors, databases) into a usable form to improve performance. Pattern/object recognition, decision making/planning and communication are three key application areas for ML. Performance can be evaluated as the ability to make accurate predictions using known data (as in supervised learning or training), or the ability to discover new information with value (as in unsupervised learning).

There are many types of ML approaches and algorithms. Below I list as many as I could find, with a short descriptor for each. I have placed Wiki links on concepts that deserve more depth, so that readers can dig deeper to understand those concepts.

One point to be made: optimization and filtering are not ML, though they may be used in ML (specifically reinforcement learning in the case of optimization, and unsupervised learning in the case of filtering). This is a common confusion when reading through what some call ML.

I emphasize a second point, also made by others [1]: the term “machine” refers both to machines and to living organisms (or alternately, systems or agents). The same mathematical theory of learning applies regardless of what we choose to call the learner, whether it is artificial or biological.

Types of ML approaches and algorithms:

• Supervised Learning

o Learning an input-output relationship from examples

§ Given pairs of input and output patterns, learn the dependencies between input and output

§ Algorithm analyzes the training data and produces an inferred function (classifier or regression function), which is then used to predict the output for any valid input

o Problem/Tasks: Regression (continuous), classification (discrete), ranking

o Underlying disciplines: Statistical regression and classification analysis

o Applications: Skill estimation, behavioral cloning, recognition (pattern, object, optical character, handwriting, speech), information retrieval (rankers)

o Advantages: Minimize expected error on as yet unseen data

o Tradeoffs: Approximation (bias), generalization (variance) and quality of training data

o Specific algorithms: Linear regression, logistic regression, naive Bayes classifier, linear discriminant analysis, decision trees, instance learning (k-nearest neighbor), inductive logic programming, artificial neural networks (backpropagation), support vector machines, boosting learners, maximum entropy Markov model
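
To make the input-output idea concrete, here is a minimal sketch of one algorithm from the list above, instance learning with k-nearest neighbors. The training pairs, the labels "small" and "large", and the function names are hypothetical and chosen only for illustration; this is not a tuned or definitive implementation.

```python
from collections import Counter
import math


def euclidean(p, q):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def knn_predict(train, query, k=3):
    """Predict a label for `query` by majority vote among the k nearest
    (input, output) training pairs."""
    neighbors = sorted(train, key=lambda pair: euclidean(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical labeled examples: (input pattern, output label)
train = [
    ((1.0, 1.0), "small"), ((1.5, 2.0), "small"), ((2.0, 1.0), "small"),
    ((8.0, 8.0), "large"), ((9.0, 8.5), "large"), ((8.5, 9.0), "large"),
]

print(knn_predict(train, (1.2, 1.4)))
print(knn_predict(train, (8.7, 8.2)))
```

There is no explicit training step here: the "inferred function" is simply the stored examples plus the voting rule, which is why k-nearest neighbor is called instance learning.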

• Reinforcement Learning

o Learning from state-action-reward sequences

§ Given observations and rewards from an environment, learn how to act in given situations

§ Algorithm finds out which actions are optimal based on past experience and reward feedback

o Problem/Tasks: Control, value estimation, policy learning, optimal decision making

o Underlying disciplines: Control theory, game theory, decision theory, operations research, evolutionary computation

o Applications: Learning to walk, drive, fly an airplane, play a game; object recognition (control of search), robotic or autonomous control, critical path analysis, planning, scheduling, pricing, trading, natural language processing

o Advantages: Maximize expected reward over time

o Tradeoffs: Exploration and exploitation

o Specific algorithms: Dynamic programming (optimization, shortest path), Markov decision processes, Monte Carlo, temporal difference, Q-learning, genetic programming, value-dependent learning (see special note below and [2])

o Special notes:

§ Neuroscience researchers have found that the firing rate of dopamine neurons in the brain appears to mimic the error function of the temporal difference algorithm. The error function reports back the difference between the estimated reward at any given state or time step and the actual reward received. The larger the error function, the larger the difference between the expected and actual reward. When this is paired with a stimulus that accurately reflects a future reward, the error can be used to associate the stimulus with the future reward.

§ Reinforcement learning is commonly used in robotics and autonomous system control (e.g. traffic and airplane control), and has been proposed for the training of brain-machine interfaces (BMIs).
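
The temporal difference error described in the special note can be sketched in a few lines using tabular Q-learning. The five-state chain environment, the `step` function, and the values of `ALPHA`, `GAMMA` and `EPSILON` below are assumptions made purely for illustration; they do not come from any of the cited references.

```python
import random

N_STATES = 5                            # states 0..4; state 4 is the goal (terminal)
ACTIONS = (0, 1)                        # 0 = step left, 1 = step right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2   # illustrative learning parameters


def step(state, action):
    """Hypothetical chain environment: reward 1 only on reaching the goal."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward


def td_error(q, state, action, reward, nxt):
    """Actual reward plus discounted estimate of what follows,
    minus the estimate held before acting."""
    future = 0.0 if nxt == N_STATES - 1 else max(q[nxt])
    return reward + GAMMA * future - q[state][action]


def train(episodes=500, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(1000):           # safety cap on episode length
            if rng.random() < EPSILON:
                action = rng.choice(ACTIONS)          # explore
            else:                                     # exploit, random tie-break
                action = max(ACTIONS, key=lambda a: (q[state][a], rng.random()))
            nxt, reward = step(state, action)
            q[state][action] += ALPHA * td_error(q, state, action, reward, nxt)
            state = nxt
            if state == N_STATES - 1:
                break
    return q


q = train()
policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
print(policy)
```

After training, the greedy policy in this toy chain is to step right in every non-terminal state, so the script prints `[1, 1, 1, 1]`. The `EPSILON` parameter is where the exploration/exploitation tradeoff shows up: a small fraction of actions are random so that the agent keeps testing alternatives.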

• Unsupervised Learning

o Learning the underlying structure from examples

§ Algorithm finds hidden structure in unlabeled information/data

o Problem/Tasks: Cluster analysis, manifold learning, density estimation, blind signal separation (statistical FA, PCA, ICA, etc.), inference

o Underlying disciplines: see Tasks

o Applications: Modeling motion capture data and user behavior, data mining, recognition (pattern, object, image, speech)

o Advantages: Information and knowledge discovery, reasoning under uncertainty

o Disadvantages: No error or reward signal to evaluate a potential solution

o Tradeoffs: Computational tractability, parameter estimation

o Specific algorithms: Hierarchical clustering, k-means (centroid) clustering, distribution clustering (expectation-maximization), association rule learning (Apriori), artificial neural networks (self-organizing map, adaptive resonance), Bayesian networks (belief propagation), nonlinear dimensionality reduction (manifold learning or mapping), hidden Markov model
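
As an illustration of finding structure in unlabeled data, here is a minimal sketch of k-means (centroid) clustering from the list above. The sample points and the choice of initial centroids are hypothetical; a practical version would also need a strategy for choosing the starting centroids and the number of clusters.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, centroids, max_iters=100):
    """Alternate between assigning points to the nearest centroid and
    moving each centroid to the mean of its assigned points."""
    centroids = list(centroids)
    labels = None
    for _ in range(max_iters):
        new_labels = [
            min(range(len(centroids)),
                key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:        # assignments stable: converged
            break
        labels = new_labels
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:                 # keep the old centroid if a cluster empties
                centroids[j] = mean(members)
    return centroids, labels


# Hypothetical unlabeled data forming two well-separated groups
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
          (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, labels = k_means(points, centroids=[points[0], points[3]])
print(labels)
print(centroids)
```

Note that nothing in the loop scores the result against a known answer, which is the disadvantage listed above: the algorithm stops when assignments stop changing, not when the clusters are known to be correct.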

• Semi-supervised Learning

o Supervised learning from a small amount of labeled information/data, combined with unsupervised learning from a large amount of unlabeled information/data

o Applications: Recognition (pattern, object, character, image, speech), data mining, information retrieval, question-answering

o Advantages: Improvement in learning accuracy for certain cases: co-training, relevant unlabeled data

o Disadvantages: Worsening of learning accuracy if unsupervised learning leads to excessive noise

o Tradeoffs: Cost of supervised training and unsupervised noise

o Specific algorithms: Co-training, constrained clustering, transduction or transductive inference
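
The sketch below shows how unlabeled data can change a prediction. It uses self-training with a nearest-neighbor rule, which is a simpler relative of the algorithms listed above and not one of them; the one-dimensional data, the labels "a" and "b", and the function names are hypothetical.

```python
def nearest_label(x, labeled):
    """Return (distance, label) of the labeled point closest to x."""
    return min((abs(x - lx), lab) for lx, lab in labeled)


def self_train(labeled, unlabeled):
    """Repeatedly label the unlabeled point that lies closest to any
    labeled point, then treat it as labeled in later rounds."""
    labeled = list(labeled)
    remaining = list(unlabeled)
    assigned = {}
    while remaining:
        # The most confident candidate is the one with the smallest distance
        dist, lab, x = min((*nearest_label(x, labeled), x) for x in remaining)
        labeled.append((x, lab))
        remaining.remove(x)
        assigned[x] = lab
    return assigned


labeled = [(0.0, "a"), (10.0, "b")]     # small amount of labeled data
unlabeled = [2.0, 4.0, 6.0]             # unlabeled points forming a chain from 0

# Supervised only: 6.0 is closer to the "b" example at 10.0
print(nearest_label(6.0, labeled)[1])

# Semi-supervised: labels spread along the chain 2.0 -> 4.0 -> 6.0
print(self_train(labeled, unlabeled))
```

With the labeled data alone, the point 6.0 is assigned "b"; once the chain of unlabeled points is taken into account it is assigned "a". The same mechanism explains the disadvantage noted above: one early mislabeled point would be propagated to its neighbors as noise.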

• Deep Learning and Cortical Learning

o The modeling of learning processes performed in the mammalian or human brain; deep learning has roots of inspiration from learning processes in the mammalian/human visual system; cortical learning is inspired by cortical-thalamic anatomy/function and the “Mountcastle principle” of a hierarchical cortical columnar organizational structure

o Problem/Tasks: Unsupervised learning of representations (and features), inference; discriminative (supervised), reinforcement, semi-supervised and multi-task learning are also utilized

o Underlying disciplines: Computational neuroscience (much as I dislike the term – I prefer “theoretical neuroscience”), neuromorphic engineering

o Applications: Recognition (pattern, object, image, speech), natural language processing and understanding (communication and dialogue), machine vision, data mining, information retrieval, question-answering, decision making, planning, artificial or biomimetic imagination

o Advantages: Allows for a “deep learning” approach, incorporating a hierarchy of features to efficiently represent and learn the complex abstractions needed for AI and mammalian intelligence (computational and statistical efficiency); particularly suited for multi-task learning, transfer learning, domain adaptation, self-taught learning, and semi-supervised learning with few labels; may also be used to find approximate solutions to NP-complete problems

o Disadvantages: A common algorithm may not represent all regions in the neocortex, leading to model or algorithmic complexity

o Tradeoffs: Complexity, efficiency

o Specific algorithms: Hierarchical temporal memory, artificial neural networks (adaptive resonance theory, Boltzmann machines), energy-based learning, hierarchical greedy learning for deep belief networks

o Deep learning: See [3], [4] and [5]

o Cortical learning: See [6] and [7]

An example list of open or commercial toolkits (by no means complete, and will be revised periodically):

ML software toolkits: Torch5, APML, Shogun, SIGMA-MSFT, Google Prediction API, MALLET, Spider, Deep Learning SW

(Disclaimer: This primer is meant to inform. I encourage readers who find factual errors or deficits to contact me (click on contact link below). I also welcome constructive and friendly comments, suggestions and dialogue.)

References and Endnotes:

[1] “Advanced Lectures on Machine Learning,” ed. O. Bousquet and G. Rätsch, Springer-Verlag, 2004.

[2] “Value-Dependent Selection in the Brain: Simulation in a Synthetic Neural Model,” K.J. Friston et al., Neuroscience, vol. 59, 1994. Link HERE.

[3] “Learning Deep Architectures for AI,” Y. Bengio, Foundations and Trends in Machine Learning, vol. 2 (1), 2009. Link HERE. See also the site dedicated to Deep Learning: HERE.

[4] “A Fast Learning Algorithm for Deep Belief Nets,” G.E. Hinton et al., Neural Computation, vol. 18, p.1527, 2006. Link HERE.

[5] “A Tutorial on Energy-Based Learning,” Y. LeCun et al., Predicting Structured Data, 2006. Link HERE.

[6] “Learning and Inference in the Brain,” K. Friston, Neural Networks, vol. 16, 2003. Link HERE.

“A Theory of Cortical Responses,” K. Friston, Phil. Trans. R. Soc. B, vol. 360, 2005. Link HERE.

“Hierarchical Models in the Brain,” K. Friston, PLoS Computational Biology, vol. 4, 2008. Link HERE.

[7] “Towards a Mathematical Theory of Cortical Micro-circuits,” D. George and J. Hawkins, PLoS Computational Biology, vol. 5, 2009. Link HERE.

EidolonSpeak.com ~ Artificial Intelligence
