Disclaimer: The Eqbal Ahmad Centre for Public Education (EACPE) encourages critical and independent thinking and believes in a free expression of one’s opinion. However, the views expressed in contributed articles are solely those of their respective authors and do not necessarily reflect the position or policy of the EACPE.
Intelligent systems with bounded capacities often outperform those with far greater resources at their disposal. A mere expansion in structure and resources, in other words, does not necessarily deliver better outcomes. In fact, extracting high performance from a machine learning engine is an intricate exercise that requires fine-tuning of its various components, and these adjustments consist of either imposing or relaxing restrictions on different aspects of the design. One of the major challenges faced by architects of Artificial Intelligence (AI) systems is therefore to configure the right level of constraint to achieve the desired outcomes.
Within the spectrum of AI solutions, neural networks, along with their larger variants known as deep learning systems, are gaining considerable attention. These are massively parallel, interconnected systems inspired by the workings of the brain. An input triggers a reaction in the receiving artificial neurons, which pass the incoming pulse on to their connected nodes. The signals propagate further and the activation spreads until it settles, and this final state represents the output. Within the network, every link carries a weight, a number that indicates the strength of the connection. The transfer of activation between nodes is therefore either dampened or amplified depending on the weight of the respective connection.
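As a rough illustration of this propagation (the layer sizes, weights, and activation function below are made up for the sketch, not drawn from any particular system), a tiny network in Python might pass a signal forward as follows:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative two-layer network: 4 inputs -> 3 hidden nodes -> 1 output.
# Each weight is a number that dampens or amplifies the signal on its link.
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(3, 1))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(x):
    hidden = sigmoid(x @ W1)       # activation spreads from the input to the hidden nodes
    output = sigmoid(hidden @ W2)  # and then on to the output node
    return output

print(forward(np.array([0.5, -1.0, 0.3, 0.8])))
```

Changing any single weight dampens or amplifies the signal flowing along that link, which is exactly the lever that training adjusts.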
Although the initial weights are assigned randomly, they undergo successive modifications during the training phase to minimise erroneous conclusions. For example, if we want the system to recognise a “car”, we would first compile a set of labelled images of cars as well as of other objects. When the network reads an input, it will at first converge to a wrong answer because the original weights were assigned randomly. However, every mistake triggers an automated process by which the weights are tweaked towards a valid result. This calibration is carried out repeatedly over multiple epochs to reach a high level of accuracy prior to deployment. The process typifies supervised learning, in which generalisations are evolved from a set of examples and counter-examples.
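A minimal sketch of this calibration, assuming a toy single-neuron classifier trained by gradient descent on a handful of made-up labelled points (real systems would use many labelled images and far larger networks), could look like this:

```python
import numpy as np

# Toy labelled data: two features per item, label 1 for "car", 0 otherwise.
# (Entirely made up for illustration.)
X = np.array([[0.9, 0.8], [0.8, 0.9], [0.1, 0.2], [0.2, 0.1]])
y = np.array([1, 1, 0, 0])

rng = np.random.default_rng(1)
weights = rng.normal(size=2)   # initial weights are random ...
bias = 0.0

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

learning_rate = 0.5
for epoch in range(200):                      # ... and are tweaked over many epochs
    predictions = sigmoid(X @ weights + bias)
    error = predictions - y                   # every mistake drives an adjustment
    weights -= learning_rate * (X.T @ error) / len(y)
    bias -= learning_rate * error.mean()

print(np.round(sigmoid(X @ weights + bias)))  # should now match the labels [1, 1, 0, 0]
```

Each pass over the data is one epoch; the gap between prediction and label drives the small corrections made to the weights.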
With massive advances in hardware and cloud technologies, it is now feasible to deploy significantly larger neural networks than ever before. Because these giant configurations have immense capacity to capture complex relationships, they are often preferred by organisations striving to generate insights from their big data. Surprisingly though, despite possessing tremendous capabilities, bigger compositions do not guarantee improved outcomes when compared with relatively shallower structures. In fact, as the size of the network increases, there is a greater likelihood that it will unintentionally memorise its training data without forming valid generalisations. The situation is analogous to learning by rote, in which the system achieves high accuracy on known cases but performs poorly when presented with novel instances.
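One way to observe this rote behaviour, assuming scikit-learn is available and using a small synthetic dataset purely for illustration, is to train a deliberately oversized network and compare its accuracy on the training data with its accuracy on unseen data:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Small, noisy dataset: easy for a big network to memorise.
X, y = make_classification(n_samples=200, n_features=20, n_informative=5,
                           flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Deliberately oversized network relative to the amount of data.
big_net = MLPClassifier(hidden_layer_sizes=(512, 512), max_iter=2000, random_state=0)
big_net.fit(X_train, y_train)

print("train accuracy:", big_net.score(X_train, y_train))  # typically near 1.0 (rote recall)
print("test accuracy: ", big_net.score(X_test, y_test))    # typically noticeably lower
```

A large gap between the two scores is the signature of memorisation rather than genuine learning.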
A gigantic network, by virtue of possessing a massive number of neurons, also has an immense capacity to store information. However, although memory is a critical system component, its growth beyond a threshold can be detrimental to learning. The system's output initially improves as storage expands, but beyond a tipping point further increases cause results to decline. Compositions that are either minuscule or humongous can therefore both lead to poor output, whereas the optimum size lies between the two extremes. This is one of the main reasons why system designers carry out many trials in search of the network configuration that delivers the best results.
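The tipping point can be probed empirically. The sketch below, again assuming scikit-learn and an illustrative synthetic dataset, tries a range of hidden-layer sizes and reports accuracy on held-out data; in experiments of this kind the held-out score often peaks at an intermediate size rather than at the largest one:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Trial different capacities, from minuscule to oversized.
for width in (2, 8, 32, 128, 512):
    net = MLPClassifier(hidden_layer_sizes=(width,), max_iter=2000, random_state=0)
    net.fit(X_train, y_train)
    print(f"width={width:4d}  train={net.score(X_train, y_train):.2f}  "
          f"test={net.score(X_test, y_test):.2f}")
```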
The unique relationship between learning and memory can also be explained with a simple hypothetical example. Suppose we present 100 data items to train a network that has the capacity to store more than 100 objects. As the storage exceeds the size of the training set, the network can easily find a specific slot for each of its inputs. In this instance no generalisation occurs, because the system can simply memorise the data it is fed and still achieve perfect recall.
However, let us now assume that the network's capacity is reduced to 50 items and the same input is presented to it. This reduction in memory compels the system to find commonalities amongst the inputs in order to place two items in a single slot. The imposed constraint therefore forces some form of generalisation and curbs rote learning.
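The effect of such a constraint can be mimicked, loosely, with a clustering analogy rather than an actual neural network: assuming scikit-learn, k-means with 100 centroids can give each of 100 items its own slot, whereas limiting it to 50 centroids forces similar items to share one:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
items = rng.normal(size=(100, 5))   # 100 training items with 5 features each (illustrative)

for slots in (100, 50):
    km = KMeans(n_clusters=slots, n_init=10, random_state=0).fit(items)
    # inertia_ measures how far items sit from their assigned slot:
    # near zero when every item has its own slot (memorisation),
    # larger when items must be grouped by similarity (generalisation).
    print(f"slots={slots:3d}  total distortion={km.inertia_:.3f}")
```

The rise in distortion when the slots are halved is the price of grouping, and grouping by similarity is precisely the rudimentary generalisation the constraint is meant to induce.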
In summary, if the configuration is tiny with very little memory, it may make mistakes due to over-generalisation, wrongly classifying almost every instance as the target concept. On the other hand, if the composition is massive with huge memory, it may commit errors due to over-specialisation, recognising only the exact training inputs and incorrectly rejecting everything that falls outside the training set.
The relationship of memory with learning follows the idiom that too much of a good thing can be harmful. Although storage is an important prerequisite for learning, it should be expanded cautiously, since excessive levels can lead to poor results. A hypothetical system with unlimited memory could cache everything presented to it without undergoing any form of generalisation. Effective learning can therefore only develop in an environment that restricts or controls unchecked growth in memory.
It is generally believed that neural networks offer an immense opportunity to deliver innovative solutions when dealing with complex information. However, as corporations move to explore their potential, it is important to recognise that high performance cannot be achieved solely by expanding their configurations ad infinitum. To reap the maximum benefit, a balanced composition is essential, one that often depends on the nature of the problem being addressed. Several rounds of iterative experiments are therefore required to architect and deploy an ideal structure. The effort towards achieving this optimum is worthwhile, since aptly designed networks have the potential to live up to their lofty promises.
Vaqar Khamisani is based in London and works as a Global Director of Insights for a leading information-based analytics company.