PART 1 - NEURAL CODE

The greatest uncertainties in contemporary views of cognitive function lie in the area of neural coding.
It is a central tenet of all Computational Theory of Mind (CTM) models, including the TDE, that the following statements are true:
1. brains are hardware
2. minds are software
The success of the TDE theory lies in its high level of agreement with the data from peer-reviewed empirical studies. This success then reflects back positively upon the central tenets of the theory, ie it tends to lend support to these CTM-derived statements. The TDE is not the only CTM model which shows a high degree of (qualitative as well as quantitative) agreement with empirical data- ACT-R (Adaptive Control of Thought - Rational) is a cognitive architecture which also presents the mind as a so-called 'production system' (PS). The key difference between ACT-R and the TDE is that ACT-R assumes the entire mind, both conscious and non-conscious aspects, uses a PS as the basis of its 'processor', while the TDE only posits a PS as the correct model for the cerebellar (unconscious, automatic) mind.
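A production system of the kind appealed to here can be reduced to a very small sketch- condition-action rules firing against a working memory of facts until nothing new can be added. The rule names and facts below are purely illustrative, not drawn from ACT-R or the TDE:

```python
# A minimal production system: each rule is a (condition, action) pair.
# Rules fire against a working memory (a set of facts) until quiescence.
rules = [
    # if we have the goal and an open hand, command a grasp
    (lambda wm: "goal:grasp" in wm and "hand:open" in wm,
     lambda wm: wm.add("action:close-hand")),
    # the grasp command produces a closed hand
    (lambda wm: "action:close-hand" in wm,
     lambda wm: wm.add("hand:closed")),
]

wm = {"goal:grasp", "hand:open"}
changed = True
while changed:                      # keep cycling until no rule adds a fact
    changed = False
    for cond, act in rules:
        before = len(wm)
        if cond(wm):
            act(wm)
        changed = changed or len(wm) != before

print("hand:closed" in wm)          # -> True
```

The essential point is the match-fire-repeat cycle; real architectures add conflict resolution and sub-goaling on top of this skeleton.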
The TDE uses a state-vector array of teleological (feedback) 'loops' to think consciously, each loop controlling the movement (physical or virtual) of one affine feature. All the loops are linked by a shared state-vector array called a Situation Image, which in turn is divided into two parts- a self-oriented part and a world-oriented part. Because all the loops are linked by a common goal, consciousness is a serial computation. In effect, consciousness is a single multi-threaded cybernetic loop. Once these individual threads become sufficiently well-rehearsed, they 'know their own goals'. Their dynamics become self-managed, and their underlying constituent processes are 'paged out' to the cerebellum.
Data representation in computers
It best serves our purpose to first analyse computers. Conventional computers represent reasoning as boolean variables, and reality as floating point numbers. In between the computer hardware layer below and the software architecture above, there is a thin layer of arithmetic- mathematics, if you like- in which these numbers dwell. There is a dictum that applies to all complex computational machines- only use discrete representations. Binary is such a useful representation because its '1's and '0's can be economically used to represent 'true' and 'false' with one binary digit (bit). To represent floating point numbers, which do the 'heavy lifting' when it comes to representing reality, two discrete numbers are stored, the mantissa and the exponent.
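The mantissa/exponent split is easy to inspect directly- Python's standard math.frexp returns exactly this pair for any float:

```python
import math

# math.frexp splits a float x into a mantissa m and an integer exponent e
# such that x == m * 2**e, which is the mantissa/exponent storage scheme
# described above (here in its base-2 form).
m, e = math.frexp(4.22)
print(m, e)          # mantissa lies in [0.5, 1); exponent is an integer
print(m * 2 ** e)    # multiplying back by 2**e reconstructs the float
```

Because scaling by a power of two is exact in binary, the reconstruction is lossless- the error the text refers to enters earlier, when the decimal literal 4.22 is first rounded to the nearest representable binary float.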
The crunch with data structures comes with the balance between usability and memory efficiency. For example, dynamic arrays are gaining increasing popularity. A form of dynamic array which has optimum memory usage patterns is the association list. A popular implementation is the Hashed Array Tree (HAT). The example shown is the representation of the floating point number 4.22 using a HAT.
In the early development of Lisp, association lists (A-lists) were used to resolve references to free variables in procedures. A-lists employ a type of memoization, ie an enumeration of all the possible values that a variable, or part of a variable, can adopt. They are a type of map.
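A minimal A-list in the Lisp sense is just a linearly searched list of pairs. The sketch below (names illustrative) shows the two properties the text relies on- lookup by enumeration, and shadowing of old bindings by simply prepending new ones:

```python
# An association list: (key, value) pairs searched front-to-back.
# The first match wins, so a newer binding shadows an older one.
def assoc(key, alist):
    for k, v in alist:
        if k == key:
            return v
    return None          # no binding found

env = [("x", 42), ("y", 7)]
env = [("x", 99)] + env      # rebind x by prepending; old pair still present
print(assoc("x", env))       # -> 99 (the shadowing binding)
print(assoc("y", env))       # -> 7
```

This is exactly a map realised by enumeration- every possible binding is listed explicitly rather than computed.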
The point made by this diagram may be an abstract one, but it should not be overlooked by the reader- note the use of a discrete (multi-level) representation, even when the variable being represented is an analog one (a float). The representation of a float (a number type whose native floating precision makes it error-prone) is made into a form which uses only integers, ie 4x10^0 + 2x10^(-1) + 2x10^(-2), and which is therefore as error-free as any other number type used. This representation is to be contrasted with the usual computer practice of storing a mantissa and an exponent. The key is the use of a 'root alphabet' to represent all entities in this part of the system. The root alphabet used for decimal numbers is the set of integers [0..9], numerical symbols whose semantics are tightly defined. Numbers of arbitrary precision can be accurately represented by integer trees of arbitrarily deep branching. Floating point representations, for example, which have a mathematical rather than linguistic basis, usually contain some (small but implementation-dependent) error.
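The integer-digit scheme can be checked mechanically. The sketch below (a simplification of the HAT diagram, not a reproduction of it) stores 4.22 as (digit, power-of-ten) integer pairs and reconstructs it with exact rational arithmetic, so no binary rounding error ever enters:

```python
from fractions import Fraction

# 4.22 as integers only: 4*10^0 + 2*10^-1 + 2*10^-2, per the text.
digits = [(4, 0), (2, -1), (2, -2)]

# Rational arithmetic keeps every intermediate value exact.
value = sum(Fraction(d) * Fraction(10) ** p for d, p in digits)
print(value)          # -> 211/50, ie exactly 4.22
print(float(value))   # -> 4.22
```

Extending the digit list extends the precision indefinitely, which is the sense in which "integer trees of arbitrarily deep branching" can represent numbers of arbitrary precision.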
The importance of 'pose' and robotics
In evolution, our brains evolved as the robotic controllers of a physical body. Therefore, it is to the sub-discipline of robotics that we should go when looking for a suitable candidate neural code. In robotics, the word 'pose' is often used. It means the combined position and orientation of the robot. Mathematically, we could use two points to do the same job, but there is much less meaning inherent in the two-point representation, compared to pose. (Pose is a dyad, a tensor of order 2 and rank 1).
Affine Geometry separates absolute position (place) from relative position (shape)
Robots theoretically live in a 3D world (our world) but in reality, most robots are wheeled vehicles and are constrained to move in a local 2D environment. To represent 3D rotation, a 3x3 matrix (a rank-2 tensor) is required. However, this is not enough to cope with translation. Why would we care? Surely robots move around all the time.
Well, that is precisely the point- robots carry a subjective frame of reference, which means that most of the computations needed to be done by their brains must use a moving 'zero point'. Somehow, we need to get rid of extraneous variables, and as elegantly as possible. The solution that has been found is called 'affine' geometry. At first, it sounds impossible- it is geometry without the notion of absolute position. There is another way of putting it- it is a specialised geometry concerned only with (statically) 'shapes' and/or (dynamically) 'differences' in position. This is exactly what the doctor ordered- any 'affine' computations performed by the robot's brain will apply no matter what the absolute frame of reference. If any geometry is a prime candidate for that used by our brains, it is affine maths.
Conveniently, the means of representation of affine variables is a system called 'homogeneous' (ie "everywhere the same") coordinates, abbreviated as HC. Using HCs, planar robotics (eg wheeled vehicles or walking people) reduces to a 3x3 matrix whose bottom row is constant, so only a 2x3 block of values need be stored. This just represents position and rotation within the plane, and nothing else. This 'pose cell' is a very economical way of storing planar robotic data.
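As a concrete sketch of such a planar pose cell (the function name is mine, not from the TDE documents)- the rotation occupies the top-left 2x2 block, the translation the last column, and the bottom row never changes:

```python
import math

def planar_pose(x, y, theta):
    """3x3 homogeneous-coordinate matrix for a planar pose:
    rotation by theta plus translation (x, y).
    Only the top 2x3 block carries information; the bottom
    row is the constant (0, 0, 1)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, x],
            [s,  c, y],
            [0,  0, 1]]

# A robot at (2, 1), facing 90 degrees left of the x-axis.
pose = planar_pose(2.0, 1.0, math.pi / 2)
print(pose[0][2], pose[1][2])   # the translation part: 2.0 1.0
```

Storing only the variable 2x3 block is what makes the representation economical.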
In 3D affine space, the situation gets even better! Using homogeneous coordinates, we can represent ALL affine transformations within the same, single 4x4 matrix. Translation in 3 dimensions is represented by a shear in 4 dimensions. We do not need to add matrices, as is the case with non-homogeneous 3x3 matrices, to cope with translations or shear transformations. We can use (n+1)-dimensional matrix multiplications, which can be concatenated, like lines of code in a computer program. The physical meaning is the 'chaining' or concatenation of relative (not absolute) 3D transformations, which are commutative (ie order is unimportant) in many important situations.
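The 'chaining' claim can be shown with plain matrix multiplication. In this minimal sketch, two 4x4 translations composed by multiplication simply add up- no matrix addition appears anywhere, because the translation lives in the last column (the shear in the 4th dimension):

```python
# 4x4 homogeneous transforms composed purely by multiplication.
def matmul(A, B):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def translation(tx, ty, tz):
    """Translation as a 4D shear: identity plus a last column."""
    return [[1, 0, 0, tx],
            [0, 1, 0, ty],
            [0, 0, 1, tz],
            [0, 0, 0, 1]]

# Chaining two relative moves is one multiplication; the translations add.
M = matmul(translation(1, 0, 0), translation(0, 2, 0))
print([row[3] for row in M[:3]])   # -> [1, 2, 0]
```

Pure translations like these commute; once rotations enter the chain, the order of multiplication starts to matter, which is why the text hedges with "in many important situations".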
OK, so we represent 'places' (a word that we choose to mean 'subjective locations') robotically, as affine points. What kind of neural apparatus will do this job? The diagram below shows a neural network in which orientation and position are represented separately. It has other advantages which will be discussed later. Note that the neurode model has a threshold bias of T, and so its normalised equivalent is 1 (unity). This makes it the physically realised equivalent of the homogeneous coordinate matrix for a point in projective space. In the diagram, a 'table top' is represented neurally as an intersection between four edge lines. Note that there is an extra, affine, dimension involved. You could think of it as representing the 'visibility' variable. If, for a given edge, ax + by > T, then the area is visible (or invisible, depending on the sense of the inequality in T).
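The tabletop test can be sketched as four thresholded edge tests of the form ax + by > T, one per neurode. The edge coefficients below describe a unit square and are purely illustrative- the diagram's actual values are not reproduced here:

```python
def inside(x, y, edges):
    """A point is in the region when it lies on the correct side of
    every edge half-plane, each tested as a*x + b*y > T."""
    return all(a * x + b * y > T for (a, b, T) in edges)

# Four edges of a unit square centred at the origin, as (a, b, T) triples.
# Eg (1, 0, -0.5) encodes the half-plane x > -0.5.
square = [(1, 0, -0.5), (-1, 0, -0.5), (0, 1, -0.5), (0, -1, -0.5)]

print(inside(0.0, 0.0, square))   # -> True  (centre of the tabletop)
print(inside(1.0, 0.0, square))   # -> False (beyond the right edge)
```

Each triple is one thresholded neurode; the AND across all four is the intersection of edge lines the diagram depicts.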
Affine Feature Maps
The TDE system demonstrates the use of affine geometry as a means of representing the salient features of reality as lines, not points (see ). Feature maps don't just provide a means of understanding reality, they provide a way of composing it. As noted by Barnden (1982), "It is these properties of the drawing which allow one to plan the positions...so that no two pieces..occupy the same space", a function he calls 'spatially indexed'. Using a feature map allows one to quickly and efficiently traverse the scene in a 'high level' sense, permitting the attention to jump from feature to feature.
Features appear as points in the feature map, but as lines representing the edges of half-planes in a conceptual, or abstract, 'predicate space'. According to Gaerdenfors, a conceptual space can be defined as a collection of one or more 'quality' (presumably qualitative, rather than quantitative) dimensions. Gaerdenfors' diagram and original text are reprinted here, to indicate that his idea and the idea underlying the TDE predicate (that of a half-plane) are the same. This idea is the key to semantic grounding.
The dimensions of a conceptual space are not necessarily orthogonal. In the general case, they are not independent entities, but they are correlated in various ways since the properties of the objects modeled in the space co-vary. For example, the density and the color dimensions co-vary in the space of metals. The idea of a linear combination of predicate dimensions can be built up, as in the example of the tabletop. Once a feature is grounded, it can be grouped together in a class of other like features. This class is a symbol (see C.S. Peirce). Symbol manipulation works because it gets grounded at the final instantiation phase- it is in the compilation of feature classes that the meaning gets 'built in'.
PART 2 - NEURAL PLASTICITY
The literature on neural mechanisms seems to be divided into two broad schools of opinion-
1. those which assume LTP (long-term potentiation) or LTD (long-term depression) and proceed accordingly, because every significant ANN model also makes the same implicit but unfounded assumption (the majority is never wrong!). Assumption is not evidence, needless to say.
2. those who try to find proof for the bioplausibility of some other neural signalling mechanism, eg time-based integration or differentiation of analog depolarisation waveforms, as in so-called 'pulse'-coded theories.
There is still no real evidence for any kind of computational-based synaptic variation at all, especially in the cerebrum, in spite of decades of well-funded, though ill-informed investigation. There is tentative evidence for the existence of some kind of LTD in the granule cell-parallel fibre-Purkinje cell circuit, but it is not conclusive by any means, and is not so strong as to rule out error in the concept or the data. This is because significant doubt still exists concerning the detailed nature of the granule and climbing fibre circuit functions. Most researchers agree that the climbing fibre provides a 'teaching' signal, a discrete reinforcer for a set of parameters.
Counter Argument based on viability of fixed 'alphabetic' encoding
In fact, there are compelling reasons why precisely the converse is true- ie there are many strong arguments against 'plastic' or analog variation of synaptic efficacies in neural nets as a way to encode information. In TDE theory, neural nets encode information in the same way language does, by using fixed sets of 'alphabet' values with which to build up more detailed semantic pictures (see the previous 'decade' example). This tried-and-true linguistic encoding hierarchy is based upon Marr's tri-layer (called Marr's Prism in TDE documents). The bottom layer of Marr's Prism consists of an 'alphabet' of fixed semantic values, whose actual form depends on the domain, of course.
These sets of fixed values are adjusted in infancy to suit an individual's adaptational environment, much as vertical and horizontal edges in the world adaptively enable the newborn kitten's ability to correctly interpret linear visual intensity gradients as real-life steps, tabletop edges, and other potentially dangerous ground features (see diagram above).
Counter Argument based on lack of available time
Further evidence against the viability of load-based synapse variation concerns the insufficient time intervals involved. So-called 'snapshot' memory, eg the memorisation of a single image, face, number or word, occupies a very small time, a few seconds at most, under the right conditions of high arousal and focussed attention. There are no known synaptic mechanisms which can produce the required long-term synaptic efficacy changes in such a short time. Even the fastest synapses need a timescale of minutes to hours to create permanent changes (eg in response to Hebbian-type simultaneous pre- and post-synaptic activation). To change synaptic efficacy significantly enough, many (possibly hundreds of) copies of new receptor molecules must be synthesized in the nucleus, then moved out to the membrane. It just can't happen in the time available.
To counter this protest, synaptic variation theorists employ the use of 'memory consolidation' mechanisms, in which important 'snapshot' memories are somehow buffered or sequestered in a 'holding area' until the permanent memory is fully-formed, in the same manner as a silver-emulsion based photographic 'positive' must spend several minutes being 'developed' from the exposed 'negative', usually with the assistance of a tray of liquid developer solution.
While currently plausible as a counter to the time argument in the cerebellum (currently being analysed), this mechanism still suffers from the main problem of all non-discrete coding methods- the accumulation of errors due to successive per-stage amplification of analog noise.
Generalised Linguistic Model (GLM)
Because of the compelling nature of the arguments presented above, there is really only one conclusion remaining- the general linguistic model (GLM), ie the idea of a language made from finite resources (a fixed alphabet), but with infinite combinatorial possibilities at a higher, syntactic level, is the only viable model even for a neural network. In existing terminology, this means sub-symbolic computation, in which real-time variation in analog neuro-synaptic weights is used to encode numerical and logical quantities, is not a realistic option. In the simplest animals (eg Aplysia), an exception to this rule seems to apply, but it would not surprise me at all if this work (by Eric Kandel) were later found to be flawed.
A population of neurons is produced in infancy, and selectively pruned by mechanisms similar to excitotoxicity (to cull the overactive ones which fire too much, all the time) and glial neglect (to cull the underactive ones which never fire, no matter what situation). The result is a network in which some of the neurons fire some of the time, spread as efficiently as possible over all common situations. Once this has been achieved, the result is an 'evenly coded' network, in which the same combinations of cells reliably code for the same situations. As in the retina, mutually inhibitory networks isolate individual neurodes and reduce error rates.
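The mutually inhibitory isolation mentioned above can be caricatured in a few lines. This toy winner-take-all loop (an illustration of the principle, not a claim about the actual retinal circuit) suppresses each unit by its neighbours' total activity until only the strongest responder survives:

```python
def winner_take_all(acts, inhibition=0.2, steps=10):
    """Iteratively subtract from each unit a fraction of the summed
    activity of all OTHER units, clamping at zero. Weak responders
    are driven out; the strongest is left isolated."""
    a = list(acts)
    for _ in range(steps):
        total = sum(a)
        a = [max(0.0, x - inhibition * (total - x)) for x in a]
    return a

# Three neurodes respond to a stimulus with different strengths.
result = winner_take_all([0.9, 0.8, 0.3])
print(result)   # only the first unit remains active
```

The surviving unit is the 'evenly coded' responder for this situation; the clamping at zero is the discrete, error-suppressing step.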
In such a network, temporary and permanent memory mechanisms are identical in principle, the longer-term memories occurring by virtue of a repeated activation pattern (ie a set of coded neurodes) being 'switched-in'. The analog changes in synapses are now not time-consuming multi-cycle iterations required to carry many bits of error-reduced data, but just a simple set of switching bits which can be switched fast because they must only respond to a simple question- "do I switch the neural code block I am addressing in or out for a given situation?". This is essentially the same conceptual set-up that Claude Shannon used in his seminal paper on information theory. Shannon was faced with the same problem nature has- how to send a signal without noise. Shannon showed a better way than increasing signal strength to avoid errors without wasting so much energy and time: coding.
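Shannon's remedy can be illustrated with the simplest possible code- a 3x repetition code that corrects any single flipped bit per symbol by majority vote, trading bandwidth for reliability rather than raising signal strength:

```python
def encode(bits):
    """Send each bit three times."""
    return [b for bit in bits for b in (bit, bit, bit)]

def decode(coded):
    """Majority vote over each group of three received bits."""
    return [int(sum(coded[i:i + 3]) >= 2) for i in range(0, len(coded), 3)]

msg = [1, 0, 1, 1]
sent = encode(msg)
sent[4] ^= 1                 # channel noise: flip one transmitted bit
print(decode(sent) == msg)   # -> True: the error is voted away
```

Shannon's actual codes are far more efficient than repetition, but the principle- spend structure, not signal power, to defeat noise- is the one the text appeals to.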
The structure known as Marr's tri-layer, or, as I prefer to call it, Marr's Prism, makes the same point as the GLM, and is depicted in the diagram below-
1. if ever there was a word with multiple problematic meanings, it is 'architecture' (in the information context, that is). I like to think of information as 'sand' and architecture as a sandcastle, but each reader should use whatever works best for them.
2. One decade is a factor of 10 difference between two numbers (an order of magnitude difference) usually, but not always, measured on a logarithmic scale. In this case, an indexed (linked) list is used for each decade.
3. see Schulz, R., Stockwell, P., Wakabayashi, M. & Wiles, J. (2006) 'Towards a Spatial Language for Mobile Robots', SITEE, Uni. Queensland, Australia.
4. the key difference is between stress (the force per unit area) and strain (the elongation per unit length, due to the application of the force). The former is a cause, the latter its effect.
5. SSRI's are drugs which selectively inhibit the re-uptake rate of serotonin (5-HT) at some types of serotonin-gated post-synaptic interneurons. Increasing SSRI dose causes more 5-HT to remain within the synaptic region, and results in a higher base rate of neural firing.
6. Wiesel TN, Hubel DH. (1963) 'Effects of visual deprivation on morphology and physiology of cells in the cat's lateral geniculate body'. J Neurophysiol 26: 978–993.
7. Shannon, C.E. (1948) 'A Mathematical Theory of Communication'. The Bell System Technical Journal, Vol. 27, pp. 379–423, 623–656.
8. Barnden, J.A. (1982). A Continuum of Diagrammatic Data Structures in Human Cognition
9. P.V.C. Hough (1959) Machine Analysis of Bubble Chamber Pictures, Proc. Int. Conf. High Energy Accelerators and Instrumentation.
10. Gärdenfors P. (2000): Conceptual Spaces: On the Geometry of Thought, MIT. Press, Cambridge, MA
------------------------------ Copyright 2013 Charles Dyer------------------------------