An introduction to knowledge engineering, Simon Kendal & Malcom Creen, Springer 2005.

This book provides a gentle introduction to knowledge engineering which encompasses the acquisition, representation and management of so-called "knowledge". After reviewing the basic tools for managing knowledge-based systems, namely Expert Systems, Neural Networks, Case-Based Reasoning, Genetic Algorithms, Intelligent Agents and Data Mining, the authors develop useful concepts relating to knowledge acquisition and representation. Then, dedicated programming languages are reviewed, including expert systems shells and PROLOG, before tackling the design of common knowledge-based systems (architecture, life cycle and the like). Finally, the rest of the book is devoted to uncertain reasoning and hybrid knowledge-based systems, with a particular emphasis on probabilistic reasoning, fuzzy logic, and the integration of symbolic and connectionist systems.


Although this book should be viewed as an elementary book on such an extensive field as Knowledge-Based Systems (KBS), I shall use it as the basis for illustrating some of the classical tools tuned to Artificial Intelligence (AI) programming. The open-source statistical package R will be used in the following applications.

Other reference textbooks related to AI and KBS include, but is not limited to:

Some additional pointers can be found on the free on-line encyclopedia Wikipedia.

We will mainly focus our attention on Neural Networks, Genetic Algorithms and Data Mining. These computational frameworks will be used as our starting point for

Neural networks

Before going down to the statistical properties of the NN approach, in particular its link to the more usual regression approach, let's remind the reader some of the main properties of an artificial NN. A classical textbook on this subject obviously is [Ripley1996].

Principles of neural modeling

It can be shown that a two-layer feedforward neural network can implement any Boolean function. Figure 1 illustrate how a two-layer network can solve the XOR problem (reproduced from [Berthold2003], p. 279).

Figure 2 : A two-layer network that can solve the XOR problem

You may recall that the XOR problem, i.e. the exclusive-OR function whose truth table is given below, usually cannot be resolved by the basic artificial neuron proposed by [McCulloch1943].

Table: The XOR function
x1 x2 y
0 0 0
0 1 1
1 0 1
1 1 0

Some applications

The R package nnet (now bundled in the VR package) allows to fit a single-hidden-layer neural network to a data matrix. To illustrate how such NN can uncover the properties of a dataset, we will use the well-known Fisher Iris data (Fisher, 1936). These data are measurements of the sepal length and width and petal length and width in centimetres of fifty plants for each of three types of iris; Iris setosa, Iris versicolor and Iris virginica. They most often are used to illustrate the principles underlying discriminant analysis. Indeed, though the data were collected by Dr. Edgar Anderson, R. A. Fisher published the data on Iris setosa and Iris versicolor to demonstrate the use of discriminant functions. The Iris virginica data are used to extend Fisher's technique and to test Randolph's (1934) hypothesis that Iris versicolor is a polyploid hybrid of the two other species which is related to the fact that Iris setosa is a diploid species with 38 chromosomes, Iris virginica a tetraploid and Iris versicolor having 108 chromosomes is a hexaploid. Here is what it looks like:

> data(iris)
> pairs(iris)
Figure 1 : The Fisher's iris

The program nn_ex1.java implements a basic hopfield neural network.

Knowledge representation and management

Rule-based systems

Representing knowledge through a KBS can be handled using two kind of programming approach: Procedural and Declarative. Procedural programming refers to a set of procedures, or fixed instructions, that have to be performed in a specific order, while declarative programing mainly involves a set of rules (statements about given facts) for which the processing sequences are not defined by the engineer. In short, the statements provide information regarding the association between several objects, or entities, and the system decides, through its inference engine, when to apply selected rules.

Among others, forward and backward chaining (also see The AI Dictionary) are generally found in any rule-based system. Such a system uses the basics of propositional logic to manipulate data which in turn is stored in the system through symbols or entities related one to to each other. Relationships between entites and values are mostly represented using classical symbols, such as AND, OR, NOT, IMPLIES, FOR ALL, EXIST, etc.

In forward chaining, the inferential procedure starts with a set of facts (i.e. logical assertions of the form IF fact1 THEN fact2) and processes them to reach conclusions about the domain of expertise. Forward chaining rules are fired for each new data that is presented to the system until it cannot reach any further conclusion.

On the contrary, in backward chaining, the system is initialized with a given hypothesis, and, then, the veracity of this hypothesis is proved by checking the rules within the domain. In other words, the system is driven from the goal to the data while the reverse holds in the preceding case.

Quoting the Online Computer Dictionnary:

An algorithm for proving a goal by recursively braking it down into sub-goals and trying to prove these until facts are reached. Facts are goals with no sub-goals which are therefore always true. Backward training is the program execution mechanism used by most logic programming language like Prolog.

Opposite: forward chaining.

— backward chaining

One can find an implementation in Java of such rule-based programming on the Algernon's project webpage. There is also a complete implementation of an expert system in PROLOG. Note that among Algernon code samples, there is a Mycin-like reasoning sheme example. Mycin [Buchanan1984] is a well-known example of the use of Expert System in the biomedical domain, and it was mainly used for training the becoming physician (other links: 1, 2).

The following example is taken from the Algernon example. It shows how one of the decision rules is intanciated using this KBS:

;; Rule 4  "A patient who has renal_abnormality has abnormal_urologic_anatomy"
(tell ((:add-rule Assertion
           ;; test slots of the modified or new Assertion
           ((concept ?assertion "renal_abnormality")
            (value   ?assertion :TRUE)
            (:add-instance (?assertion Assertion ) ;; Add a new assertion
                (concept ?assertion "abnormal_urologic_anatomy")
                (value   ?assertion :TRUE))

The following articles, related to various scientifical fields, should be of relevant interest:

  • S. S. Joshi and B. Guilhabert, Sequence-Learning Algorithm Based on Backward Chaining, Adaptive Behavior, 14(1): 53-71 (2006) [abstract]

  • R. Poli and W. B. Langdon, Backward-chaining evolutionary algorithms, Artificial Intelligence, 170: 953-982 (2006) [pdf paper]

  • T. Mszros and B. Vadsz, An Extension to the RETE Match Algorithm: Supporting both Forward and Backward Chaining, TEMPUS JEP3815, Budapest, Hungary (1994) [pdf paper]

  • D. H. Fisher, M. E. Edgerton, Z. Chen, L. Tang, and L. Frey, Backward Chaining Rule Induction [pdf paper]

  • L. Aversano, G. Canfora, and A. Clampi, An Algorithm for Web Service Discovery through Their Composition, IEEE International Conference on Web Services (ICWS'04): 332 [abstract]

  • R. Poli and W. B. Langdon, Backward-chaining Genetic Programming, GECCO'05, June 25-29 (2005) [pdf paper]

Semantic networks and frames

Both methodologies— semantic network and frames—can be thought as the precursors of the actual high-level programming languages, such as C++ or Java, which are fundamentally object-oriented languages.

Semantic networks are mostly a convenient way to graphically represent associations between entities in the knowledge domain. In fact, associations allow to describe the hierarchical relations between all of the entities. Such a graphical network is illustrated in the following figure (reproduced from Kendal & Creen, p. 143).

Figure 2 : A small semantic network

Relations, in particular inheritance relationship, can be represented using a simple oriented graph whose node contains the entities and link represent the relation between two entities.

However, as shown in Figure 3 (Kendal & Creen, p. 144), adding a single property to the network can drastically reduce the power of the inference that can be made about the domain. Indeed, after setting that grass snake eats meat, we now arrive at differing conclusions depending on when we start to read the graph.

Figure 3 : The same semantic network after a slight modification

For the interested reader, some links are given below:

Frames technology offer a way to circumvent some of the limitations of the semantic network approach. More precisely, frames allow

Figure 4 : Example of a set of frames

Dedicated programming language

Lisp and PROLOG are certainly the most promoted programming languages for AI applications. They were created in the 1958 and 1972 and differ from procedural language in that they allow the programmer to use declarative assertion rather than inputting a raw sequence of instructions.


  1. [Berthold2003] Berthold, M. and Hand, D. J. (2003). Intelligent Data Analysis. An Introduction. Springer.

  2. [Ripley1996] Ripley, B. D. (1996) Pattern Recognition and Neural Networks. Cambridge.

  3. [Buchanan1984] Buchanan, B. G. and Shortliffe, E. H. (1984). Rule-Based Expert Systems, The Mycin experiments of the Stanford Heuristic Programming Project. Addison-Wesley Publishing Company. [available online at www.aaaipress.org]

  4. [McCulloch1943] McCulloch, W. S. and Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics, 5, 115-133.