1. Overview
2. Introduction
3. Neural networks
3.1. What are neural networks?
3.2. Why neural networks?
3.3. Learning
3.4. Letter Learner
3.4.1. What is Letter Learner?
3.4.2. How does it work?
3.4.3. What can Letter Learner do?
3.4.4. What can Letter Learner not do?
4. Classical machine learning
4.1. Learning strategies
4.2. ID3
4.2.1. How does ID3 work?
4.2.2. The weaknesses of ID3
6. References
7. About the author
This paper compares the connectionist approach with the classical approach to machine learning. Two applications, Letter Learner and ID3, are introduced to evaluate each approach. A discussion and several further ideas are given at the end of the paper.
The first question we should be clear about is what learning is. Simon describes it as "... changes in the system that are adaptive in the sense that they enable the system to do the same task or tasks drawn from the same population more efficiently and more effectively the next time". In other words, we can say that a computer has learned something when its performance at a task improves without its program being changed.
Getting a machine to learn to do something is a key problem in the field of Artificial Intelligence. Some people even say that machines can only be considered intelligent when they are able to learn to do things by themselves, rather than having everything programmed into them.
In the late 1980s more attention was focused on neural networks, which are based on brain-like learning rather than on the traditional approach of explicit programming. This may mean that people have found a new way to tackle the machine learning problem.
Neural networks, also called connectionist models or parallel distributed processing, are based on one of the simplest realistic brain models known today. They consist of a large number of connected simple computational units (like neurons). Each unit examines its inputs and calculates an activation, which is transferred to other units. Each connection carries a signed numeric weight that determines whether an activation travelling along it excites or inhibits the receiving unit. The size of the weight determines the magnitude of the influence of a sending unit's activation upon the receiving unit. The output of a neural network is therefore determined by the connections of the model and their weights.
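To make this concrete, here is a minimal sketch of one such unit in Python. The names, the sigmoid activation function and the example numbers are illustrative assumptions, not taken from any particular network discussed in this paper.

    import math

    def unit_activation(inputs, weights, bias=0.0):
        # weighted sum of incoming activations, squashed by a sigmoid into the range 0..1
        total = sum(x * w for x, w in zip(inputs, weights)) + bias
        return 1.0 / (1.0 + math.exp(-total))

    # three incoming activations, each arriving over a connection with its own weight
    print(unit_activation([0.5, -1.0, 0.25], [0.8, -0.3, 1.2]))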
To make a machine learn something is extremely difficult. For over twenty years, people could not find a good way to do it with traditional methods. There are things that are very easy for humans but very difficult for machines. Consider the task of recognizing a person. We can do it whether the person has had a haircut or not, is in new clothes or not. But it is quite difficult for a computer to have such an ability. We can sometimes even read a person's thoughts from his or her eyes; it still looks impossible for a machine to do that. Naturally, people began to think: why not let the machine learn the way humans do? That is why neural networks are attractive: they are computational models with similarities to human brain processing.
Since we have limited knowledge about how our brains work, neural networks are not guaranteed to solve the machine learning problem. Nevertheless, in some areas they can already do much better than is currently possible using classical approaches.
3.3. Learning
The following are the main learning models used in neural networks. For more information about them, please see the references.
A backpropagation network normally begins with a random set of weights. The network adjusts its weights whenever it sees an input-output pair. Two stages, a forward pass and a backward pass, are required for each pair. The forward pass involves feeding a sample input to the network and allowing activations to flow through to the output layer. The backward pass involves comparing the network's actual output (from the forward pass) with the target output and computing error estimates for the output units. To reduce those errors, the weights connected to the output units are adjusted first. The error estimates of the output units can then be used to derive error estimates for the units in the hidden layers. In the end, errors are propagated back to the connections coming from the input units. The backpropagation network continuously adjusts its weights after seeing each input-output pair.
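The sketch below illustrates these two stages for a tiny network with one hidden layer, trained on the XOR problem. The network sizes, the learning rate and the training data are illustrative assumptions; this is not the code of any system described in this paper.

    import math, random

    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    random.seed(0)
    n_in, n_hid, n_out = 2, 3, 1
    # random initial weights; the last entry of each list is a bias weight
    w_hid = [[random.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hid)]
    w_out = [[random.uniform(-1, 1) for _ in range(n_hid + 1)] for _ in range(n_out)]
    rate = 0.5

    pairs = [([0, 0], [0]), ([0, 1], [1]), ([1, 0], [1]), ([1, 1], [0])]   # XOR

    for epoch in range(5000):
        for inputs, target in pairs:
            # forward pass: let activations flow towards the output layer
            hid = [sigmoid(sum(w[i] * x for i, x in enumerate(inputs)) + w[-1]) for w in w_hid]
            out = [sigmoid(sum(w[i] * h for i, h in enumerate(hid)) + w[-1]) for w in w_out]
            # backward pass: error estimates for the output units, then for the hidden units
            out_err = [o * (1 - o) * (t - o) for o, t in zip(out, target)]
            hid_err = [h * (1 - h) * sum(out_err[k] * w_out[k][j] for k in range(n_out))
                       for j, h in enumerate(hid)]
            # adjust the weights feeding the output units, then those feeding the hidden units
            for k, w in enumerate(w_out):
                for j, h in enumerate(hid):
                    w[j] += rate * out_err[k] * h
                w[-1] += rate * out_err[k]
            for j, w in enumerate(w_hid):
                for i, x in enumerate(inputs):
                    w[i] += rate * hid_err[j] * x
                w[-1] += rate * hid_err[j]

    # after training, inspect what the network now produces for each input
    for inputs, target in pairs:
        hid = [sigmoid(sum(w[i] * x for i, x in enumerate(inputs)) + w[-1]) for w in w_hid]
        out = [sigmoid(sum(w[i] * h for i, h in enumerate(hid)) + w[-1]) for w in w_out]
        print(inputs, target, [round(o, 2) for o in out])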
Unlike Hopfield networks, which can only reach a locally stable state, Boltzmann machines try to find globally optimal solutions to combinatorial problems.
In reinforcement learning, the network is trained by a punishment and reward system rather than by sample outputs. When the teacher gives a positive real-valued judgement, it means good performance, while a negative value indicates bad performance. The network tries to find a set of weights that will avoid negative reinforcement in the future.
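As a toy illustration only, the following sketch shows a network whose weights are adjusted by a real-valued judgement rather than by target outputs. The judge() function and the trial-and-keep update rule are invented for this example; they are not a standard reinforcement learning algorithm.

    import random

    random.seed(1)
    weights = [0.0, 0.0, 0.0]

    def act(inputs, w):
        # the network's behaviour: a simple weighted sum of its inputs
        return sum(x * wi for x, wi in zip(inputs, w))

    def judge(inputs, output):
        # hypothetical teacher: positive values mean good performance, negative mean bad
        # (here "good" means the output is close to the sum of the inputs)
        return 1.0 - abs(sum(inputs) - output)

    for step in range(2000):
        inputs = [random.uniform(-1, 1) for _ in range(3)]
        trial = [w + random.gauss(0, 0.1) for w in weights]        # try a small random change
        if judge(inputs, act(inputs, trial)) > judge(inputs, act(inputs, weights)):
            weights = trial                                        # keep changes that earn more reward

    print(weights)   # the weights drift towards values that avoid negative judgements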
Unlike reinforcement learning, which has a teacher to feed back a real-valued judgement, unsupervised learning takes place without a teacher and therefore without any feedback on its outputs.
The Letter Learner model is a kind of neural network for pattern recognition. It is designed to let the machine recognize your handwriting after you have trained it to become familiar with your handwriting.
Architecture
The model consists of three parts: input, processing unit and output. The input is often a 5 x 5 grid (or larger) in which you write a letter. The processing unit is similar to a neuron, and processes the input information. The output is the result, which here is what the computer guesses to be the letter you have input.
Training
To use the model, you should first make the machine familiar with your handwriting, i.e. train it. When you draw a letter in the input box, Letter Learner sees the letter as a number for each square of the grid. If you have drawn anything in a square of the box, that square is assigned a value of +1. Otherwise, the square gets a value of -1. These numbers are called the input vector. The input vector is fed into the processing unit to calculate a value. The weight vector is a series of numbers in the processing unit, one for each of the numbers in the input vector. Before you press the letter button to teach the machine to recognise your letter, all the values in the weight vector are zeroes. The key function of the processing unit is to adjust the weight vector so that the output matches the letter you want. After the machine has learned the letter, i.e. after you have pressed the letter button, the weight vector has its own numbers. You can also reinforce the training by teaching the same letter again; the weight vector may then change slightly. After you finish the training, you can check the machine's ability to recognize your handwriting.
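The following sketch shows roughly what such a training step could look like in Python. The exact update rule of the real Letter Learner is not given here, so the running-average update and the 5 x 5 grid size are assumptions made purely for illustration.

    GRID = 5
    LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

    def input_vector(grid):
        # flatten a 5 x 5 grid of booleans (drawn / not drawn) into 25 values of +1 or -1
        return [1 if cell else -1 for row in grid for cell in row]

    # one processing unit (weight vector) per letter; all zeroes before any training
    weights = {letter: [0.0] * (GRID * GRID) for letter in LETTERS}
    counts = {letter: 0 for letter in LETTERS}

    def train(letter, grid):
        # teaching a letter moves its weight vector towards the new input vector;
        # repeated training on similar drawings changes the weights only slightly
        x = input_vector(grid)
        counts[letter] += 1
        w = weights[letter]
        for i in range(len(w)):
            w[i] += (x[i] - w[i]) / counts[letter]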
Recognize
Testing what the machine has learned is the most exciting part of using this model. Just input your handwriting, press the Guess button, and see what happens.
The Letter Learner works by finding the processing unit whose weight vector is closest to the input vector for the letter you have written. First, it takes the input vector for the letter you wrote and gives it as input to the processing units of all the letters you have taught the Letter Learner. If the numbers in the input vector and the weight vector are very similar, the processing unit gives a number close to +5. If the numbers are very different, it gives a number close to -5. These numbers given by the processing units are simply a measure of how close the input vector is to each unit's weight vector. The Letter Learner guesses the letter that goes with the processing unit with the highest output. If the output values are all below +2, then the Letter Learner assumes that it has not learned that letter yet, and guesses a question mark instead.
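Continuing the training sketch above, recognition could look roughly like this. The scaling of the score by 5 is an assumption chosen so that a perfect match scores about +5 and a complete mismatch about -5, in line with the ranges described above; the +2 cut-off comes from the text.

    def score(letter, x):
        # scaled match between the input vector and a letter's weight vector (about -5 .. +5)
        return sum(wi * xi for wi, xi in zip(weights[letter], x)) / GRID

    def guess(grid):
        x = input_vector(grid)
        trained = [l for l in LETTERS if counts[l] > 0]
        if not trained:
            return "?"
        best = max(trained, key=lambda l: score(l, x))
        # if even the best match scores below +2, assume the letter has not been learned yet
        return best if score(best, x) >= 2 else "?"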
It can recognize your handwriting no matter how informal it is, even when people other than yourself cannot guess it at all. The only precondition is that you train the Letter Learner first. How smart is the Letter Learner, you may wonder?
The letter you input should be the same size, and in the same place, as the letters you trained the Letter Learner on. If it is not, then no matter how similar the letter you input is to the one you drew before, and no matter how easy it is for humans to recognise, Letter Learner cannot get the right answer. You can also try slightly rotating the letter you input: it will be easy for people to guess what you have written, but for Letter Learner it will be impossible. You will probably think how stupid Letter Learner is.
Future learning problems will probably be solved with connectionist techniques. However, a large number of difficult problems have already been tackled by symbolic systems (classical machine learning).
The following is a brief introduction to the main learning strategies:
Learning by taking advice
When the computer does something according to instructions, such as an expert's suggestions encoded in the program, we say the machine is learning by taking advice.
Learning from examples
In this approach, the program is trained with a set of positive and negative examples. It generates rules and induces new understanding from them.
Explanation-Based Learning
A lot of recent research in machine learning has abandoned the empirical, data-intensive approach in favour of a more analytical, knowledge-intensive one. The latter is referred to as explanation-based learning. Such a system tries to learn from a single example, say x, by explaining why x is an example of the target concept. The explanation is then developed into a general guiding rule, and with this knowledge the system's performance improves.
Discovery
Learning itself is a problem-solving process. Discovery is a restricted form of
learning in which one entity acquires knowledge without the help of a teacher.
ID3 (Iterative Dichotomiser 3) uses the learning-from-examples strategy. It is an algorithm, developed by J. Ross Quinlan, that finds rules to describe a set of examples.
ID3 uses a tree representation for concepts. It is a program that builds decision trees automatically over the examples, preferring simple trees to complex ones, on the theory that simple trees are more accurate classifiers of future inputs. If all the examples are classified correctly, the algorithm halts. Otherwise, it adds a number of training examples and the process repeats.
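The sketch below shows roughly what this tree-building loop looks like; it is not Quinlan's actual code. Each example is assumed to be a dictionary of attribute values plus a "class" label, and choose_attribute is only a placeholder for the entropy-based selection explained in the next section.

    def choose_attribute(examples, attributes):
        # placeholder: real ID3 picks the attribute with the highest information gain
        # (an entropy-based sketch of that choice follows the worked example below)
        return attributes[0]

    def build_tree(examples, attributes):
        classes = {e["class"] for e in examples}
        if len(classes) == 1:              # every example agrees: return a leaf
            return classes.pop()
        if not attributes:                 # nothing left to split on: return the majority class
            return max(classes, key=lambda c: sum(e["class"] == c for e in examples))
        best = choose_attribute(examples, attributes)
        tree = {"attribute": best, "branches": {}}
        for value in {e[best] for e in examples}:
            subset = [e for e in examples if e[best] == value]
            tree["branches"][value] = build_tree(subset, [a for a in attributes if a != best])
        return tree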
4.2.1. How does ID3 work?
The best way to illustrate how ID3 works is with an example. Let us imagine the following simple problem:
The most important factors affecting the performance of shares are:
1. The government party
- Labour
- Conservative
2. The interest rate of bank
- high
- normal
- low
Government   | Interest Rate of Bank | Shares
Labour       | high                  | down
Conservative | high                  | down
Labour       | normal                | up
Conservative | normal                | down
Labour       | low                   | up
Conservative | low                   | up
Find rules from the above records.
The rules can be found by splitting the examples by government party:
If government = Labour and interest rate = high then down
If government = Labour and interest rate = normal then up
If government = Labour and interest rate = low then up
If government = Conservative and interest rate = high then down
If government = Conservative and interest rate = normal then down
If government = Conservative and interest rate = low then up
These rules seem complicated. We can try another way to derive rules, i.e. splitting the examples by the interest rate of the bank:
If interest rate = high then down
If interest rate = low then up
If interest rate = normal and government = Labour then up
If interest rate = normal and government = Conservative then down
This time the rules look simpler. ID3 always tries to find the simplest rules by using a measure of information called "entropy". If you are interested in the details, please see the reference books; most AI textbooks cover ID3.
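As a rough illustration of that measure, the sketch below computes the information gain of each attribute for the share example above, using the standard entropy formula. A function like this could also play the role of the choose_attribute placeholder in the earlier sketch.

    from math import log2

    examples = [
        {"government": "Labour",       "interest": "high",   "class": "down"},
        {"government": "Conservative", "interest": "high",   "class": "down"},
        {"government": "Labour",       "interest": "normal", "class": "up"},
        {"government": "Conservative", "interest": "normal", "class": "down"},
        {"government": "Labour",       "interest": "low",    "class": "up"},
        {"government": "Conservative", "interest": "low",    "class": "up"},
    ]

    def entropy(rows):
        # uncertainty (in bits) of the class labels in this set of examples
        counts = {}
        for r in rows:
            counts[r["class"]] = counts.get(r["class"], 0) + 1
        total = len(rows)
        return -sum((n / total) * log2(n / total) for n in counts.values())

    def information_gain(rows, attribute):
        # how much splitting on this attribute reduces the uncertainty
        values = {r[attribute] for r in rows}
        remainder = sum(
            len(subset) / len(rows) * entropy(subset)
            for value in values
            for subset in [[r for r in rows if r[attribute] == value]]
        )
        return entropy(rows) - remainder

    print(information_gain(examples, "government"))   # about 0.08 bits
    print(information_gain(examples, "interest"))     # about 0.67 bits: the better split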
Test
If rules are successfully produced by ID3, you can test what ID3 has learned. The right answer always seems to be there, simply because the answer is indeed encoded in its rules.
The results are over-sensitive to small alterations to the training examples, so ID3 is not particularly robust in the face of noisy data. In some cases the decision tree becomes very large because the rules must classify the training data 100% correctly, and a large decision tree is difficult for people to understand.
There are lots of learning models based on different theories. We can divide them into two basic categories: the connectionist approach and the classical (symbolic) approach. Each approach has its strengths and weaknesses.
Connectionist systems seem to be more capable of learning than their symbolic counterparts. Connectionist models offer several ways of maintaining multiple meanings, and they can deal better with noise than the symbolic approach: you can input different shapes of letters to Letter Learner and it can learn from them. However, neural network learning algorithms usually involve a large number of training examples and long training periods compared to their symbolic cousins. Letter Learner is just a simple example of the connectionist approach, yet you still need to train each letter more than once if you want a satisfactory result. And after a network has learned to perform a difficult task over a long training period, its knowledge is usually quite opaque, much as a human being's is; unlike ID3, it has no clear rules that explain why it gives the answers it does.
The connectionist approach seems more appealing than the classical approach because connectionist models come from mimicking human brains: units correspond to neurons, activations correspond to neuronal firing rates, connections correspond to synapses, and connection weights correspond to synaptic strengths. Although connectionist models are far too simple to serve as realistic brain models at the cell level, they might serve as very good models of the essential information-processing tasks that brains perform. This is of course open to further study, because we have so little understanding of how the brain actually works. However, what we learn from animals does not always help us achieve more. A case in point: we could not have invented the aeroplane if we had mimicked birds' flight by flapping wings.
Some people try to find a "universal" learning algorithm and believe that such an algorithm will perform well on any application. But these can always be outperformed by a second class of algorithms selected and modified for the particular application.
A good idea is to combine the two approaches. That means we can use symbolic rules to guide the performance of a neural network in addition to what it has learnt from training. In fact, people also learn with the help of rules; we cannot learn everything through practice. Such a combined approach could shorten the training time of neural networks.
The connectionist approach is still in its infancy, and it is certain that better learning strategies will be developed in this domain. Obviously, though, a breakthrough is very unlikely unless we acquire a fuller knowledge of how our brains work.
Last Change Thu, Jun 5, 1997