Artificial Intelligence ~ Artificial Cognitive Systems

Primer on Knowledge Representations and Acquisition

Author: Susanne Lomatch

How knowledge is represented is central to an AI or cognitive system, and is used to facilitate sensing, acting, learning, reasoning, planning, decision making, problem solving, perception and thought (see Fig. 1). There are varying definitions for knowledge representations, but the one I prefer is a generalized one based on symbolic and/or structural forms or frameworks:

Knowledge Representations:

• Symbolic

• Rule-based systems

• Automata

• Logic (propositional, first-order,…)

• Functional or Operational

• Structural or Hybrid

• Decision Trees

• Classifiers or Clustering

• Ontologies

• Neural Networks

• Markov Random Fields

• Bayesian Networks

• Memory-Prediction Frameworks (MPFs)

• Cognitive Memory

This list and its taxonomy are my own and do not exhaust the number of possibilities.

Knowledge and information are used interchangeably, though information is notably viewed as the more fundamental generalization with the tenets of information theory forming a rigorous basis. I do not take such narrow view here, and I think that in considering artificial cognitive systems (ACS) one must broaden the view beyond those constructs commonly discussed in the context of information theory (most particularly coding theory) which may prejudice a designer of an ACS to architectures that are more closely associated with non-biologically based systems than that of uniquely biological systems, such as the brain.

The application of the knowledge representations listed above can get more specific, depending on the knowledge domain. For example, natural language knowledge can be represented by symbolic rule-based and corresponding automata models (e.g. the Chomsky hierarchy), or structural neural or Bayesian network models, or both (hybrids). Natural speech is commonly represented by stochastic models, such as Markov random field models. Semantic information is commonly represented by ontologies. An artificial or machine vision system may represent knowledge in a memory-prediction framework, such as hierarchical temporal memory. A cognitive system will represent knowledge in a somewhat more complex cognitive memory model, perhaps incorporating a hybrid of structural and symbolic representations, and based on biologically inspired systems (e.g., the human or mammalian neural functional architecture == brain).

Knowledge acquisition is also central to an AI or cognitive system, and I keep the scope fairly simple:

Knowledge Acquisition:

• Sensing & Acting

• Machine Learning

• Machine Reasoning

Sensing and acting are basic to any AI or cognitive system, and the extent is broad (see list in Fig. 1). I offer a separate primer on machine learning HERE. Machine reasoning is highly dependent on the type of knowledge representation used. For example, with ontologies, the relational structure enables basic embedded reasoning about the entities within the ontological domain. Likewise, logic and functional/operational representations have embedded mathematical reasoning. With neural networks, reasoning is not so straightforward, as the network explicitly encodes information or knowledge (through various machine learning modes), but does not explicitly indicate how that knowledge might be used for reasoning or predictions. Bayesian belief networks both encode information and explicitly include probabilistic reasoning. Memory-prediction frameworks are a type of Bayesian net inspired by the human-mammalian neocortex (see Part 3). Cognitive memories may be a generalization of an MPF to include hetero-associative memories and relational reasoning, important for representing and processing semantic structures and other complexities in natural language and speech.

One additional note on the importance of knowledge representations, in the form of a reiteration of a point originally made by Minsky and Papert: “no machine can learn to recognize X unless it possesses, at least potentially, some scheme for representing X.” As it turns out, this is just the beginning of the problem. Knowledge representations are integrally tied to learning and reasoning propensities, and in practice, many representations lead to intractable machine learning and reasoning. For the harder problems such as natural language and speech, the goal is to find suitable compositional representations that are “symbiotic” with tractable and efficient machine learning and reasoning, i.e., fast structural processing, transformations, mapping and inferencing – a challenging problem, given the nonlinear, deep structure nature of natural language and speech.

For those looking for more detail, I have expanded the outline at the top, and have included wiki or external links where possible for even further edification.

Knowledge Representations:

• Symbolic

• Rule-based systems

• Based on explicit representations of facts about the knowledge domain

• Reasoning: Not embedded, must be implemented separately

• Automata

• Abstract, self-acting mathematical machines

• Knowledge/information is encoded in the automaton states and transitions

• A mathematical formalization of rule-based systems

• Reasoning: Not embedded, must be implemented separately

• Logic (propositional, first-order,…)

• Knowledge/information is expressed in a mathematical logic formalism

• Reasoning: Embedded in the specific axiomatic formalism used

• Note: There is a distinction between formal logic and informal logic; the mathematical reasoning derived from formal logic is based on deduction, while reasoning in informal logic can take on various forms, deductive, inductive, abductive, analogical, etc.; informal logic is sometimes termed as “critical thinking”

• Functional or Operational

• Knowledge/information is represented in functional or operational form

• Reasoning: Can be embedded in the specific functional or operation, but may also be applied separately

• Structural or Hybrid

• Decision Trees

• Knowledge/information is represented in a graphical model of decisions and their possible outcomes

• Reasoning: Embedded in the possible outcomes, but can also be applied separately to find the optimal outcome (along with machine learning techniques)

• Classifiers or Clustering

• Classifiers represent labeled knowledge/information in groups based on statistical sampling or quantitative criteria, such as traits and characteristics, i.e., there are “supervisory” constraints in the grouping, and an inferred relationship between inputs and outputs

• Clusters represent unlabeled knowledge/information in groups of data objects according to a cluster model, i.e., in an unsupervised manner, with possible display of hidden or underlying structure

• Reasoning: Not embedded, must be implemented separately

• Ontologies

• Represents knowledge/information as a set of concepts within a domain, and the relationships between those concepts

• Many ontologies are graphical-based knowledge representation languages, and some are based on formal logic (see ontology languages)

• Specific examples: Semantic networks, description logic

• Reasoning: Embedded in the relational structure between entities in the domain

• Neural Networks (ANNs)

• Structured and taught to represent knowledge/information in the form of sequences or more complicated data structures

• Network nodes are artificial neurons

• Network connections are (synaptic) weights encoding the strength of the connection

• Each network node is associated with an activation or transfer function F(w(x)) that takes as input a set of activation weights for the node's parent variables (dendrites) and outputs a threshold value (axon) that propagates to the input of the next layer through a (synaptic) weight function

• Acquired knowledge through training using machine learning techniques is stored in the pattern of interconnection weights among components – weights are updated/changed through learning

• Specific examples: Multilayer feedforward (directed acyclic) and recurrent (directed cyclic or undirected) networks

• Reasoning: Not embedded, must be implemented separately

• Markov Random Fields

• Structured and taught to represent knowledge/information in the form of sequences or more complicated data structures

• Knowledge/information is represented by a set of random variables having a Markov property described by an undirected graph or network (a.k.a. Markov network)

• Network connections or nodal edges represent a probabilistic dependency between variables

• Markov networks can represent certain dependencies that a Bayesian network cannot (such as cyclic dependencies); it cannot represent certain dependencies that a Bayesian network can (such as induced dependencies); Markov models are (generally) noncausal, though hybrid approaches can incorporate some causality

• Specific examples: Boltzmann machines (stochastic version of Hopfield networks; also a stochastic ANN), hidden Markov random fields

• Special cases/hybrid random fields: Markov chains, hidden Markov models (HMMs), conditional random fields, maximum entropy Markov models, deep/sigmoid belief networks

• Markov chains are a special hybrid case, representing a sequence of random variables having a Markov property, usually described by a directed graph (cyclic or acyclic); connections are the probability of going from one state to the next, without dependence on states or sequence of states that preceded it; a Markov chain is also a stochastic Markov process implementation of a weighted finite-state automaton

• HMMs are sometimes classified as simple dynamic Bayesian networks, and are Markov chains with a noisy observation about the state at each time step added

• Reasoning: Embedded probabilistic inferencing: Networks calculate the conditional distribution of a set of nodes given values to another set of nodes by summing over all possible assignments (exact inference); approximate inference can be achieved through various algorithmic approaches, including belief propagation

• Hybrid random fields: Representations that use latent variables to propagate information through time can be divided into two classes: tractable models for which there is an efficient procedure for inferring the exact posterior distribution over the latent variables (such as HMMs), and intractable models for which there is no exact and efficient inference procedure, or are NP-hard (such as generalized Bayesian nets, MRFs and associated hybrids)

• Bayesian Networks

• Structured and taught to represent knowledge/information in the form of sequences or more complicated data structures

• Knowledge/information is represented by nodes and connections via a directed acyclic graph

• Network nodes are Bayesian random variables

• Network connections represent probabilistic dependence between variables, with conditional probabilities encoding the strength of the dependencies

• Each network node is associated with a conditional probability distribution function P(x|Parents(x)) that takes as input a particular set of values for the node's parent variables and gives the probability of the variable represented by the node

• Acquired knowledge is stored in the pattern of conditional probabilities, set by a priori knowledge (nodal evidence) and changed through learning or inferencing

• Bayesian networks/models are causal, and are often referred to as belief networks

• Specific example: Network could represent the probabilistic relationships between diseases and symptoms; given symptoms, the network can be used to compute the probabilities of the presence of various diseases

• Hybrid random fields (see above) are special classes of Bayesian nets

• Reasoning: Embedded probabilistic inferencing, specifically evidential reasoning: Networks calculate posterior probabilities of an event as output, given a priori nodal evidence; probabilistic inferencing in generalized fully Bayesian nets can be intractable (NP-hard), requiring approximate inference algorithmic approaches (see HERE and HERE for a list of approaches)

• Bayesian and Markov nets differ from deterministic, non-stochastic neural nets:

• Bayesian/Markov nets generate a probabilistic output of event occurrences

• Network structure (connectivity, nodes) can change through inferencing (probabilistic reasoning) and learning

• Bayesian belief nets can represent causal information

• Bayesian and Markov nets are types of stochastic ANNs, and in general they together represent hybrid random fields

• Memory-Prediction Frameworks (MPFs)

• Knowledge/information is represented as a bi-directional hierarchy of memories, from which predictions can be made, given sensory inputs

• Inspired by the dense hierarchical interconnected structure of the neocortex, modeled to classify inputs, learn sequences of patterns, form a constant pattern or an “invariant representation” for a sequence, and make specific predictions

• Memories are represented auto-associatively: higher levels predict future input by matching partial sequences and projecting their expectations to the lower levels; recall is based upon partial information of stored representations

• (As such, MPFs may be suited best for vision systems, but have more limited use for natural language and speech, or for “general cognitive intelligence” where data structures beyond local sequences are required)

• Learning occurs through a mismatch between bottom-up inputs and top-down expectations: newly learned representations get “propagated” upward through the hierarchy to form a more complete, invariant representation from which future predictions can be made

• Specific examples: Hierarchical temporal memory (HTM)

• Reasoning: Embedded probabilistic reasoning and inferencing

• Cognitive Memory

• Knowledge/information is represented as closely as possible by the functional memory and “encodings” of a cognitive system, such as the human or mammalian neocortex/old brain

• Functional memory in general may include associative memory (auto and hetero) and non-associative memory, or alternatively, short-term memory and long-term memory

• “Encoding” schemes in general may include those identified with actual neural encoding schemes found in conjunction with short and long-term functional memory, and have structure beyond local sequences (i.e., constituent structures, long-term dependencies, sparse relational structures, etc.)

• (Note I qualify the term “encoding” since some in the neuroscience community persuasively argue the problems with linking artificial computational encoding schemes with those of the brain, particularly those used in higher-level processing vs. those used in early sensory processing)

• “Neurocognitive networks” are biologically inspired models incorporating the large-scale structure and dynamics of the brain, particularly the processing in the thalamocortical and corticocortical system, nominally aimed at the neural circuit/network scale – some of the efforts within this thrust can be generalized to more algorithmic approaches that would essentially be similar to the HTM model discussed above, or more complex generalizations of it

• Readers should note key considerations for artificial cognitive systems (representations and architectures), including some of the more unconventional observations and properties

• These representation models will be required for artificial cognitive systems and cognitive architectures that exhibit natural language understanding and other grand challenge functions shown in Fig. 1

• The goal is to continually develop cognitive representations and associated architectures as we learn more about the mammalian/human neural structure (functional and physical)

Readers looking for more detail on how some of these representations are used in natural language processing (NLP) and automatic speech recognition (ASR) may want to review the Primer on NLP. (I am also working on a primer on machine vision, which will also include such detail for that application, and plan a future primer on cognitive systems and architectures.) The Primer on Machine Learning may also be of side benefit.

(Disclaimer: This primer is meant to inform. I encourage readers who find factual errors or deficits to contact me (click on contact link below). I also welcome constructive and friendly comments, suggestions and dialogue.)

EidolonSpeak.com ~ Artificial Intelligence

AI & Artificial Cognitive Systems

External Sites

Primer on Knowledge Representations and Acquisition