1. Overview
2. Introduction
3. Neural networks
3.1. What are neural networks?
3.2. Why neural networks?
3.3. Learning
3.4. Letter Learner
3.4.1. What is Letter Learner?
3.4.2. How does it work?
3.4.3. What can Letter Learner do?
3.4.4. What can Letter Learner not do?
4. Classical machine learning
4.1. Learning strategies
4.2. ID3
4.2.1. How does ID3 work?
4.2.2. The weaknesses of ID3
6. References
7. About the author
This paper compares the connectionist approach with the classical approach to machine learning. Two applications, Letter Learner and ID3, are introduced to evaluate each approach. A discussion and several further ideas are given at the end of the paper.
The first question we should be clear about is what learning is. Simon describes it as "... changes in the system that are adaptive in the sense that they enable the system to do the same task or tasks drawn from the same population more efficiently and more effectively the next time". In other words, we can say that a computer has learned something when its performance at a task improves without its program being changed.
Getting a machine to learn to do something is a key problem in the field of Artificial Intelligence. Some people even say that machines can only be considered intelligent when they are able to learn to do things by themselves, rather than having everything programmed into them.
In the late 1980s more attention was focused on neural networks, which are based on brain-like learning rather than on the traditional approach of explicit programming. This may mean that people have found a new way to tackle the machine learning problem.
Neural networks, also called connectionist models or parallel distributed processing, are based on one of the simplest realistic brain models known today. They consist of a large number of connected simple computational units (like neurons). Each unit examines its inputs and calculates an activation, which is transferred to other units. Each connection carries a signed numeric weight that determines whether an activation travelling along it excites or inhibits the receiving unit. The size of the weight determines the magnitude of the influence of a sending unit's activation upon the receiving unit. The output of a neural network is therefore determined by the connections of the model and their weights.
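To make this concrete, here is a minimal sketch of one such unit in Python. The names, the sigmoid activation function and the example numbers are illustrative assumptions, not taken from any particular network discussed in this paper.

    import math

    def unit_activation(inputs, weights, bias=0.0):
        # weighted sum of incoming activations, squashed by a sigmoid into the range 0..1
        total = sum(x * w for x, w in zip(inputs, weights)) + bias
        return 1.0 / (1.0 + math.exp(-total))

    # three incoming activations, each arriving over a connection with its own weight
    print(unit_activation([0.5, -1.0, 0.25], [0.8, -0.3, 1.2]))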
To make a machine learn something is extremely difficult. For over twenty years, people could not find a good way to do it with traditional methods. There are things that are very easy for humans but very difficult for machines. Consider the task of recognizing a person. We can do it whether the person has had a haircut or not, is in new clothes or not. But it is quite difficult for a computer to have such an ability. We can sometimes even read a person's thoughts from his or her eyes; it still looks impossible for a machine to do that. Naturally, people began to think: why not let the machine learn the way humans do? That is why neural networks are attractive: they are computational models with similarities to human brain processing.
Since we have limited knowledge about how our brains work, neural networks are not guaranteed to solve the machine learning problem. Nevertheless, in some areas they can already do much better than is currently possible using classical approaches.
3.3. Learning
The following are the main learning models used in neural networks. For more information about them, please see the references.
A backpropagation network normally begins with a random set of weights. The network adjusts its weights whenever it sees an input-output pair. Two stages, a forward pass and a backward pass, are required for each pair. The forward pass involves feeding a sample input to the network and allowing activations to flow through to the output layer. The backward pass involves comparing the network's actual output (from the forward pass) with the target output and computing error estimates for the output units. To reduce those errors, the weights connected to the output units are adjusted first. The error estimates of the output units can then be used to derive error estimates for the units in the hidden layers. In the end, errors are propagated back to the connections coming from the input units. The backpropagation network continuously adjusts its weights after seeing each input-output pair.
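The sketch below illustrates these two stages for a tiny network with one hidden layer, trained on the XOR problem. The network sizes, the learning rate and the training data are illustrative assumptions; this is not the code of any system described in this paper.

    import math, random

    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    random.seed(0)
    n_in, n_hid, n_out = 2, 3, 1
    # random initial weights; the last entry of each list is a bias weight
    w_hid = [[random.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hid)]
    w_out = [[random.uniform(-1, 1) for _ in range(n_hid + 1)] for _ in range(n_out)]
    rate = 0.5

    pairs = [([0, 0], [0]), ([0, 1], [1]), ([1, 0], [1]), ([1, 1], [0])]   # XOR

    for epoch in range(5000):
        for inputs, target in pairs:
            # forward pass: let activations flow towards the output layer
            hid = [sigmoid(sum(w[i] * x for i, x in enumerate(inputs)) + w[-1]) for w in w_hid]
            out = [sigmoid(sum(w[i] * h for i, h in enumerate(hid)) + w[-1]) for w in w_out]
            # backward pass: error estimates for the output units, then for the hidden units
            out_err = [o * (1 - o) * (t - o) for o, t in zip(out, target)]
            hid_err = [h * (1 - h) * sum(out_err[k] * w_out[k][j] for k in range(n_out))
                       for j, h in enumerate(hid)]
            # adjust the weights feeding the output units, then those feeding the hidden units
            for k, w in enumerate(w_out):
                for j, h in enumerate(hid):
                    w[j] += rate * out_err[k] * h
                w[-1] += rate * out_err[k]
            for j, w in enumerate(w_hid):
                for i, x in enumerate(inputs):
                    w[i] += rate * hid_err[j] * x
                w[-1] += rate * hid_err[j]

    # after training, inspect what the network now produces for each input
    for inputs, target in pairs:
        hid = [sigmoid(sum(w[i] * x for i, x in enumerate(inputs)) + w[-1]) for w in w_hid]
        out = [sigmoid(sum(w[i] * h for i, h in enumerate(hid)) + w[-1]) for w in w_out]
        print(inputs, target, [round(o, 2) for o in out])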
Unlike Hopfield networks, which can only reach a locally stable state, Boltzmann machines try to find globally optimal solutions to combinatorial problems.
In reinforcement learning, the network is trained by a punishment and reward system rather than by sample outputs. When the teacher gives a positive real-valued judgement, it means good performance, while a negative value indicates bad performance. The network tries to find a set of weights that will avoid negative reinforcement in the future.
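As a toy illustration only, the following sketch shows a network whose weights are adjusted by a real-valued judgement rather than by target outputs. The judge() function and the trial-and-keep update rule are invented for this example; they are not a standard reinforcement learning algorithm.

    import random

    random.seed(1)
    weights = [0.0, 0.0, 0.0]

    def act(inputs, w):
        # the network's behaviour: a simple weighted sum of its inputs
        return sum(x * wi for x, wi in zip(inputs, w))

    def judge(inputs, output):
        # hypothetical teacher: positive values mean good performance, negative mean bad
        # (here "good" means the output is close to the sum of the inputs)
        return 1.0 - abs(sum(inputs) - output)

    for step in range(2000):
        inputs = [random.uniform(-1, 1) for _ in range(3)]
        trial = [w + random.gauss(0, 0.1) for w in weights]        # try a small random change
        if judge(inputs, act(inputs, trial)) > judge(inputs, act(inputs, weights)):
            weights = trial                                        # keep changes that earn more reward

    print(weights)   # the weights drift towards values that avoid negative judgements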
Unlike reinforcement learning, which has a teacher to feed back a real-valued judgement, unsupervised learning takes place without a teacher and therefore without any feedback on its outputs.
The Letter Learner model is a kind of neural network for pattern recognition. It is designed to let the machine recognize your handwriting after you have trained it to become familiar with your handwriting.
Architecture
The model consists of three parts: input, processing unit and output. The input is often a 5 x 5 grid (or larger) in which you write a letter. The processing unit is similar to a neuron, and processes the input information. The output is the result, which here is what the computer guesses to be the letter you have input.
Training
To use the model, you should first make the machine familiar with your handwriting, i.e. train it. When you draw a letter in the input box, Letter Learner sees the letter as a number for each square of the grid. If you have drawn anything in a square of the box, that square is assigned a value of +1. Otherwise, the square gets a value of -1. These numbers are called the input vector. The input vector is fed into the processing unit to calculate a value. The weight vector is a series of numbers in the processing unit, one for each of the numbers in the input vector. Before you press the letter button to teach the machine to recognise your letter, all the values in the weight vector are zeroes. The key function of the processing unit is to adjust the weight vector so that the output matches the letter you want. After the machine has learned the letter, i.e. after you have pressed the letter button, the weight vector has its own numbers. You can also reinforce the training by teaching the same letter again; the weight vector may then change slightly. After you finish the training, you can check the machine's ability to recognize your handwriting.
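The following sketch shows roughly what such a training step could look like in Python. The exact update rule of the real Letter Learner is not given here, so the running-average update and the 5 x 5 grid size are assumptions made purely for illustration.

    GRID = 5
    LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

    def input_vector(grid):
        # flatten a 5 x 5 grid of booleans (drawn / not drawn) into 25 values of +1 or -1
        return [1 if cell else -1 for row in grid for cell in row]

    # one processing unit (weight vector) per letter; all zeroes before any training
    weights = {letter: [0.0] * (GRID * GRID) for letter in LETTERS}
    counts = {letter: 0 for letter in LETTERS}

    def train(letter, grid):
        # teaching a letter moves its weight vector towards the new input vector;
        # repeated training on similar drawings changes the weights only slightly
        x = input_vector(grid)
        counts[letter] += 1
        w = weights[letter]
        for i in range(len(w)):
            w[i] += (x[i] - w[i]) / counts[letter]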
Recognize
Testing what the machine has learned is the most exciting part of using this model. Just input your handwriting, press the Guess button, and see what happens.
The Letter Learner works by finding the processing unit whose weight vector is closest to the input vector for the letter you have written. First, it takes the input vector for the letter you wrote and gives it as input to the processing units of all the letters you have taught the Letter Learner. If the numbers in the input vector and the weight vector are very similar, the processing unit gives a number close to +5. If the numbers are very different, it gives a number close to -5. These numbers given by the processing units are simply a measure of how close the input vector is to each unit's weight vector. The Letter Learner guesses the letter that goes with the processing unit with the highest output. If the output values are all below +2, then the Letter Learner assumes that it has not learned that letter yet, and guesses a question mark instead.
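Continuing the training sketch above, recognition could look roughly like this. The scaling of the score by 5 is an assumption chosen so that a perfect match scores about +5 and a complete mismatch about -5, in line with the ranges described above; the +2 cut-off comes from the text.

    def score(letter, x):
        # scaled match between the input vector and a letter's weight vector (about -5 .. +5)
        return sum(wi * xi for wi, xi in zip(weights[letter], x)) / GRID

    def guess(grid):
        x = input_vector(grid)
        trained = [l for l in LETTERS if counts[l] > 0]
        if not trained:
            return "?"
        best = max(trained, key=lambda l: score(l, x))
        # if even the best match scores below +2, assume the letter has not been learned yet
        return best if score(best, x) >= 2 else "?"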
It can recognize your handwriting no matter how informal it is, even when people other than yourself cannot guess it at all. The only precondition is that you train the Letter Learner first. How smart is the Letter Learner, you may wonder?
The letter you input should be the same size, and in the same place, as the letters you trained the Letter Learner on. If it is not, then no matter how similar the letter you input is to the one you drew before, and no matter how easy it is for humans to recognise, Letter Learner cannot get the right answer. You can also try slightly rotating the letter you input: it will be easy for people to guess what you have written, but for Letter Learner it will be impossible. You will probably think how stupid Letter Learner is.
Future learning problems will probably be solved with connectionist techniques. However, a large number of difficult problems have already been tackled by symbolic systems (classical machine learning).
The following is a brief introduction to the main learning strategies:
Learning by taking advice
When the computer does something according to instructions, such as an expert's suggestions encoded in the program, we say the machine is learning by taking advice.
Learning from examples
In this approach, the program is trained with a set of positive and negative examples. It generates rules and induces new understanding from them.
Explanation-Based Learning
A lot of recent research in machine learning has abandoned the empirical, data-intensive approach in favour of a more analytical, knowledge-intensive one. The latter is referred to as explanation-based learning. Such a system tries to learn from a single example, say x, by explaining why x is an example of the target concept. The explanation is then developed into a general guiding rule, and with this knowledge the system's performance improves.
Discovery
Learning itself is a problem-solving process. Discovery is a restricted form of
learning in which one entity acquires knowledge without the help of a teacher.
ID3 (Iterative Dichotomiser 3) uses the learning-from-examples strategy. It is an algorithm, developed by J. Ross Quinlan, that finds rules to describe a set of examples.
ID3 uses a tree representation for concepts. It is a program that builds decision trees automatically over the examples, preferring simple trees to complex ones, on the theory that simple trees are more accurate classifiers of future inputs. If all the examples are classified correctly, the algorithm halts. Otherwise, it adds a number of training examples and the process repeats.
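The sketch below shows roughly what this tree-building loop looks like; it is not Quinlan's actual code. Each example is assumed to be a dictionary of attribute values plus a "class" label, and choose_attribute is only a placeholder for the entropy-based selection explained in the next section.

    def choose_attribute(examples, attributes):
        # placeholder: real ID3 picks the attribute with the highest information gain
        # (an entropy-based sketch of that choice follows the worked example below)
        return attributes[0]

    def build_tree(examples, attributes):
        classes = {e["class"] for e in examples}
        if len(classes) == 1:              # every example agrees: return a leaf
            return classes.pop()
        if not attributes:                 # nothing left to split on: return the majority class
            return max(classes, key=lambda c: sum(e["class"] == c for e in examples))
        best = choose_attribute(examples, attributes)
        tree = {"attribute": best, "branches": {}}
        for value in {e[best] for e in examples}:
            subset = [e for e in examples if e[best] == value]
            tree["branches"][value] = build_tree(subset, [a for a in attributes if a != best])
        return tree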
4.2.1. How does ID3 work?
The best way to illustrate how ID3 works is with an example. Let us imagine the following simple problem:
The most important factors affecting the performance of shares are:
1. The government party
- Labour
- Conservative
2. The interest rate of bank
- high
- normal
- low
Government   | Interest Rate of Bank | Shares
Labour       | high                  | down
Conservative | high                  | down
Labour       | normal                | up
Conservative | normal                | down
Labour       | low                   | up
Conservative | low                   | up
Find rules from the above records.
The rules can be found by splitting the examples by government party:
If government = Labour and interest rate = high then down
If government = Labour and interest rate = normal then up
If government = Labour and interest rate = low then up
If government = Conservative and interest rate = high then down
If government = Conservative and interest rate = normal then down
If government = Conservative and interest rate = low then up
These rules seem complicated. We can try another way to derive rules, i.e. splitting the examples by the interest rate of the bank:
If interest rate = high then down
If interest rate = low then up
If interest rate = normal and government = Labour then up
If interest rate = normal and government = Conservative then down
This time the rules look simpler. ID3 always tries to find the simplest rules by using a measure of information called "entropy". If you are interested in the details, please see the reference books; most AI textbooks cover ID3.
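As a rough illustration of that measure, the sketch below computes the information gain of each attribute for the share example above, using the standard entropy formula. A function like this could also play the role of the choose_attribute placeholder in the earlier sketch.

    from math import log2

    examples = [
        {"government": "Labour",       "interest": "high",   "class": "down"},
        {"government": "Conservative", "interest": "high",   "class": "down"},
        {"government": "Labour",       "interest": "normal", "class": "up"},
        {"government": "Conservative", "interest": "normal", "class": "down"},
        {"government": "Labour",       "interest": "low",    "class": "up"},
        {"government": "Conservative", "interest": "low",    "class": "up"},
    ]

    def entropy(rows):
        # uncertainty (in bits) of the class labels in this set of examples
        counts = {}
        for r in rows:
            counts[r["class"]] = counts.get(r["class"], 0) + 1
        total = len(rows)
        return -sum((n / total) * log2(n / total) for n in counts.values())

    def information_gain(rows, attribute):
        # how much splitting on this attribute reduces the uncertainty
        values = {r[attribute] for r in rows}
        remainder = sum(
            len(subset) / len(rows) * entropy(subset)
            for value in values
            for subset in [[r for r in rows if r[attribute] == value]]
        )
        return entropy(rows) - remainder

    print(information_gain(examples, "government"))   # about 0.08 bits
    print(information_gain(examples, "interest"))     # about 0.67 bits: the better split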
Test
If rules are successfully produced by ID3, you can test what ID3 has learned. The right answer always seems to be there, simply because the answer is indeed encoded in its rules.
The results are over-sensitive to small alterations to the training examples, so ID3 is not particularly robust in the face of noisy data. In some cases the decision tree becomes very large because the rules must classify the training data 100% correctly, and a large decision tree is difficult for people to understand.
There are lots of learning models based on different theories. We can divide them into two basic categories: the connectionist approach and the classical (symbolic) approach. Each approach has its strengths and weaknesses.
Connectionist systems seem to be more capable of learning than their symbolic counterparts. Connectionist models offer several ways of maintaining multiple meanings, and they can deal better with noise than the symbolic approach: you can input different shapes of letters to Letter Learner and it can learn from them. However, neural network learning algorithms usually involve a large number of training examples and long training periods compared to their symbolic cousins. Letter Learner is just a simple example of the connectionist approach, yet you still need to train each letter more than once if you want a satisfactory result. And after a network has learned to perform a difficult task over a long training period, its knowledge is usually quite opaque, much as a human being's is; unlike ID3, it has no clear rules that explain why it gives the answers it does.
The connectionist approach seems more appealing than the classical approach because connectionist models come from mimicking human brains: units correspond to neurons, activations correspond to neuronal firing rates, connections correspond to synapses, and connection weights correspond to synaptic strengths. Although connectionist models are far too simple to serve as realistic brain models at the cell level, they might serve as very good models of the essential information-processing tasks that brains perform. This is of course open to further study, because we have so little understanding of how the brain actually works. However, what we learn from animals does not always help us achieve more. A case in point: we could not have invented the aeroplane if we had mimicked birds' flight by flapping wings.
Some people try to find a "universal" learning algorithm and believe that such an algorithm will perform well on any application. But these can always be outperformed by a second class of algorithms selected and modified for the particular application.
A good idea is to combine the two approaches. That means we can use symbolic rules to guide the performance of a neural network in addition to what it has learnt from training. In fact, people also learn with the help of rules; we cannot learn everything through practice. Such a combined approach could shorten the training time of neural networks.
The connectionist approach is still in its infancy, and it is certain that better learning strategies will be developed in this domain. Obviously, though, a breakthrough is very unlikely unless we acquire a fuller knowledge of how our brains work.
Last Change Thu, Jun 5, 1997