Informatica 35 (2011) 407-417

Distributed Representations Based on Geometric Algebra: The Continuous Model

Agnieszka Patyk-Lonska, Marek Czachor
Gdansk University of Technology, ul. Narutowicza 11/12, Gdansk 80-233, Poland
E-mail: {patyk, mczachor}@pg.gda.pl, http://www.pg.gda.pl/~patyk, http://www.mif.pg.gda.pl/kft/czachor.html

Diederik Aerts
Centrum Leo Apostel (CLEA), Vrije Universiteit Brussel, Krijgskundestraat 33, 1160 Brussels, Belgium
E-mail: diraerts@vub.ac.be, http://www.vub.ac.be/CLEA/aerts/

Keywords: distributed representation of data, geometric algebra, HRR, BSC, scaling

Received: October 23, 2011

This paper is based on: A. Patyk-Lonska, M. Czachor and D. Aerts, Some tests on geometric analogues of Holographic Reduced Representations and Binary Spatter Codes, published in the proceedings of the 1st International Workshop on Advances in Semantic Information Retrieval (part of the FedCSIS 2011 conference).

The authors revise the concept of a distributed representation of data as well as two previously developed models: Holographic Reduced Representations (HRR) and Binary Spatter Codes (BSC). A geometric analogue of HRR, called GAc ("c" stands for continuous, as opposed to a discrete version), is introduced; it employs role-filler binding based on geometric products. Atomic objects are real-valued vectors in n-dimensional Euclidean space, while complex data structures belong to a hierarchy of multivectors. The paper reports on a test comparing GAc with HRR and BSC, analogous to the test proposed by Tony Plate in the mid-1990s. We repeat Plate's test on GAc and compare the results with the original HRR and BSC, concentrating on the recognition percentage achieved by the three models for comparable data sizes rather than on the time taken to reach a high percentage. The results show that the best models for storing and recognizing multiple similar structures are GAc and BSC, with recognition percentages well above 90. The paper ends with remarks on prospective applications of geometric algebra to quantum algorithms.

Povzetek: The paper deals with a distributed representation of data that uses geometric algebra.

1 Introduction

Distributed representations of data are very different from traditional structures (e.g. trees, lists), and complex structures bear little resemblance to their components; therefore, great care must be taken when composing or decomposing a complex structure. The most widely used definition of a distributed representation is due to Hinton et al. [13]: each concept is represented over a number of units, and each unit participates in the representation of some number of concepts. The size of a distributed representation is usually fixed, and the units take either binary or continuous values. In most distributed representations, only the overall pattern of activated units has a meaning.

Let us consider an example of storing the following information: "Fido bit Pat". The action in this statement is bite, and the features (i.e. roles) of this action are an agent and an object, denoted $bite_{agt}$ and $bite_{obj}$, while their fillers are Fido and Pat, respectively. If we also wish to store the way the action is performed, we can add a third role, e.g. $bite_{way}$. If we store Fido, Pat, $bite_{agt}$ and $bite_{obj}$ as vectors, we are able to encode "Fido bit Pat" as

$$bite_{agt} * Fido + bite_{obj} * Pat.$$

The operation of binding, denoted by "*", takes two vectors and produces another vector, often called a chunk of a sentence. Ideally, the resulting vector should not be similar to the original vectors, but it should have the same dimension as them. Superposition, denoted by "+", is an operation that takes any number of vectors and creates another one that is similar to the originals; usually, the superposed vectors are themselves results of the binding operation. To decode information, we use the operation of unbinding, the inverse of binding (an exact inverse or a pseudo-inverse), which enables us to extract information from a complex statement, provided that we have one of the bound vectors, or a very similar vector, as a cue. Denoting the unbinding operation by "♯", we obtain the following answer to the question "Who bit Pat?":

$$(bite_{agt} * Fido + bite_{obj} * Pat) \;\sharp\; bite_{agt} = Fido'.$$

We cannot expect the resulting vector Fido' to be an exact copy of Fido: even an optimal scheme will generate a considerable amount of noise.
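Before turning to concrete models, the three operations can be illustrated in a few lines of code. The sketch below is ours and is not one of the models compared in this paper: it binds random ±1 vectors by componentwise multiplication, which for such vectors is its own exact inverse, so unbinding reuses the same operation; all identifiers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1024

# atomic objects: random +/-1 vectors; componentwise multiplication plays
# the role of "*" and, for +/-1 vectors, is its own exact inverse
bite_agt, bite_obj, Fido, Pat = rng.choice([-1.0, 1.0], size=(4, n))

trace = bite_agt * Fido + bite_obj * Pat   # encode "Fido bit Pat"
Fido_prime = trace * bite_agt              # unbind with the cue bite_agt

def similarity(u, v):
    """Cosine similarity, one possible measure for a clean-up memory."""
    return (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Fido' is noisy, yet far more similar to Fido than to anything else stored
print(similarity(Fido_prime, Fido), similarity(Fido_prime, Pat))
```

In expectation the first similarity is about $1/\sqrt{2} \approx 0.7$ while the second is near 0, which is exactly the gap a clean-up memory relies on.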
Since noisy decoded information will never be identical to what was encoded, we have to rely heavily on similarity measures; available measures differ mostly in computation time and accuracy. A clean-up memory is an auto-associative collection of all atomic objects and complex statements produced by the system. Given a noisy extracted vector, such a structure must be able to recall the most similar item stored, or to indicate that no matching object has been found.

Irrespective of the mathematical model, the above operations are defined in a way that allows one to build complex statements, such as "John saw Fido bit Pat":

$$John * see_{agt} + (bite_{agt} * Fido + bite_{obj} * Pat) * see_{obj}.$$

Independently of the scheme considered, any representation should possess the following qualities:

- composition and decomposition: the rules of composition and decomposition must be applicable to all elements of the domain, irrespective of how complicated a given element is. Further, decomposition should support structure-sensitive processing.
- fixed size: structures of different degrees of complication should take up the same amount of space, in order to facilitate generalization. In the GAc model this feature has been given up; still, structures of different complexity are of the same type.
- similarity: the representation scheme should provide a quick way to compute the similarity of analogous structures (e.g. "Fido bit Pat Smith" and "Fido bit John").
- noise reduction: decomposed statements should resemble their original counterparts.
- productivity: the model should be able to construct complex nested structures using only a small set of rules.

As far as previously developed models are concerned, Holographic Reduced Representations (HRR), Binary Spatter Codes (BSC) and Associative-Projective Neural Networks (APNN) are distributed representations of cognitive structures in which the binding of role and filler codevectors maintains a predetermined data size. In HRR [23] binding is performed by means of the circular convolution

$$(x \circledast y)_j = \sum_{k=0}^{n-1} x_k y_{(j-k) \bmod n}$$

of real n-tuples or, in the "frequency domain", by componentwise multiplication of (complex) n-tuples:

$$(x_1, \dots, x_n) \circledast (y_1, \dots, y_n) = (x_1 y_1, \dots, x_n y_n).$$

Bound n-tuples are superposed by addition, and unbinding is performed by an approximate inverse. A dual formalism, in which real data are bound by componentwise multiplication, was discussed by Gayler [9].
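For concreteness, here is a minimal HRR-style sketch, assuming random n-tuples of expected unit norm and using Plate's involution as the approximate inverse; the function names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 512

def bind(x, y):
    """Circular convolution, computed in O(n log n) via the FFT."""
    return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(y)))

def involution(x):
    """Plate's approximate inverse: x*[j] = x[(n - j) mod n]."""
    return np.concatenate(([x[0]], x[1:][::-1]))

def unbind(trace, cue):
    return bind(trace, involution(cue))

# elements drawn from N(0, 1/n), so vector norms are close to 1
bite_agt, bite_obj, Fido, Pat = rng.normal(0.0, 1.0 / np.sqrt(n), (4, n))
trace = bind(bite_agt, Fido) + bind(bite_obj, Pat)
Fido_prime = unbind(trace, bite_agt)
print(Fido_prime @ Fido, Fido_prime @ Pat)  # the first value is clearly larger
```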
In BSC [14, 15] one works with binary n-tuples, bound by componentwise addition mod 2,

$$(x_1, \dots, x_n) \oplus (y_1, \dots, y_n) = (x_1 \oplus y_1, \dots, x_n \oplus y_n), \qquad x_j \oplus y_j = x_j + y_j \bmod 2, \qquad (1)$$

and superposed by pointwise majority-rule addition; unbinding is performed by the same operation as binding. APNN, introduced and further developed by Kussul [16] and his collaborators [17], employ binding and superposition realized by context-dependent thinning and bitwise disjunction, respectively. As opposed to HRR and BSC, APNN do not require an unbinding procedure to retrieve component codevectors from their bindings. A detailed comparison of HRR, BSC and APNN can be found in [24].

2 Geometric Algebra

One often reads that the above models represent data by vectors, which is not exactly true. Given two vectors, one does not know how to perform, say, their convolution or componentwise multiplication, since the result depends on the basis that defines the components. The basis must be fixed in advance, since otherwise all of the above operations become ambiguous. It follows that none of the above reduced representations can be given a true and meaningful geometric interpretation. Geometric analogues of HRR [5] can be constructed if one defines binding by the geometric product, a notion introduced in the 19th-century works of Grassmann [11] and Clifford [8]. The fact that a geometric analogue of HRR is intrinsically geometric may be important for various conceptual reasons; for example, the rules of geometric algebra may be regarded as a mathematical formalization of the process of understanding geometry. The use of geometric algebra distributed representations has also been inspired by the well-known fact that most people think in pictures, i.e. in two- and three-dimensional shapes, not in sequences of ones and zeroes. Mere strings of bits are not meaningful to (most) humans, no matter how technically advanced they are.

In order to grasp the main ideas behind a geometric analogue of HRR, let us consider an orthonormal basis $b_1, \dots, b_n$ in some n-dimensional Euclidean space and two vectors $x = \sum_{k=1}^{n} x_k b_k$ and $y = \sum_{k=1}^{n} y_k b_k$. The scalar $x \cdot y = y \cdot x$ is known as the inner product. The bivector $x \wedge y = -y \wedge x$ is the outer product and may be regarded as an oriented plane segment (alternative interpretations are also possible, cf. [7]). Writing 1 for the identity of the algebra, the geometric product of x and y then reads

$$xy = x \cdot y + x \wedge y = \sum_{k=1}^{n} x_k y_k \, 1 + \sum_{k<l} (x_k y_l - y_k x_l) \, b_k b_l.$$

Its noncommutativity is perhaps the most important difference between geometric and convolution algebras. Geometric products of different basis vectors,

$$b_{k_1 \dots k_j} = b_{k_1} \cdots b_{k_j}, \qquad k_1 < \cdots < k_j,$$

are called basis blades (or just blades). In n-dimensional Euclidean space there are $2^n$ different blades. This can be seen as follows. Let $\{x_1, \dots, x_n\}$ be a sequence of bits. Blades in an n-dimensional space can be written as

$$c_{x_1 \dots x_n} = b_1^{x_1} \cdots b_n^{x_n},$$

where $b_k^0 = 1$, which shows that blades are in a one-to-one relation with n-bit numbers. A general multivector is a linear combination of blades.

Multivectors produced by role-filler binding have a definite structure: a multivector of rank r contains only k-blades with k of the same parity as r, so that, for instance, a multivector of rank 4 would have $\binom{n}{0}$ scalars, $\binom{n}{2}$ bivectors and $\binom{n}{4}$ 4-blades. The number of k-blades in a multivector of rank r is given in Table 1. It becomes clear that a multivector of rank r over $\mathbb{R}^n$ is actually a vector over a $\sum_{i=0}^{\lfloor r/2 \rfloor} \binom{n}{2i + (r \bmod 2)}$-dimensional space.

Table 1: Numbers of k-blades in multivectors of various ranks in $\mathbb{R}^n$

rank   | scalars        | vectors        | bivectors      | trivectors     | 4-blades       | ... | size
1      | 0              | $\binom{n}{1}$ | 0              | 0              | 0              | ... | $\binom{n}{1}$
2      | $\binom{n}{0}$ | 0              | $\binom{n}{2}$ | 0              | 0              | ... | $\binom{n}{0} + \binom{n}{2}$
3      | 0              | $\binom{n}{1}$ | 0              | $\binom{n}{3}$ | 0              | ... | $\binom{n}{1} + \binom{n}{3}$
4      | $\binom{n}{0}$ | 0              | $\binom{n}{2}$ | 0              | $\binom{n}{4}$ | ... | $\binom{n}{0} + \binom{n}{2} + \binom{n}{4}$
...
2r     | $\binom{n}{0}$ | 0              | $\binom{n}{2}$ | 0              | $\binom{n}{4}$ | ... | $\sum_{i=0}^{r} \binom{n}{2i}$
2r + 1 | 0              | $\binom{n}{1}$ | 0              | $\binom{n}{3}$ | 0              | ... | $\sum_{i=0}^{r} \binom{n}{2i+1}$
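The one-to-one relation between blades and n-bit numbers suggests a compact implementation: store a multivector as a map from bitmasks to coefficients, multiply basis blades by XOR-ing their bitmasks, and obtain the sign by counting the transpositions needed to bring the product into canonical order. The sketch below is a standard construction, not code from the paper, and assumes a Euclidean metric (every basis vector squares to +1).

```python
def reordering_sign(a: int, b: int) -> int:
    """Sign from permuting the basis vectors of blade `a` past those of
    blade `b` into canonical (ascending) order; bit i stands for b_{i+1}."""
    a >>= 1
    swaps = 0
    while a:
        swaps += bin(a & b).count("1")
        a >>= 1
    return -1 if swaps & 1 else 1

def geometric_product(X: dict, Y: dict) -> dict:
    """Geometric product of multivectors stored as {bitmask: coefficient}.
    Shared basis vectors annihilate (b_k b_k = 1), hence the XOR."""
    out = {}
    for a, x in X.items():
        for b, y in Y.items():
            blade = a ^ b
            out[blade] = out.get(blade, 0.0) + reordering_sign(a, b) * x * y
    return out

# b1 b2 = b12, while b2 b1 = -b12: the product is noncommutative
b1, b2 = {0b001: 1.0}, {0b010: 1.0}
print(geometric_product(b1, b2), geometric_product(b2, b1))
```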
As an example, let us consider the following roles and fillers, all being normalized vectors drawn randomly from $\mathbb{R}^n$ with Gaussian distribution N(0,1):

name = $\{a_1, \dots, a_n\}$,  sex = $\{b_1, \dots, b_n\}$,  age = $\{c_1, \dots, c_n\}$,
Pat = $\{x_1, \dots, x_n\}$,  male = $\{y_1, \dots, y_n\}$,  66 = $\{z_1, \dots, z_n\}$.

PSmith, who is a 66-year-old male named Pat, is created by first multiplying roles and fillers with the help of the geometric product:

$$PSmith = name * Pat + sex * male + age * 66 = name \cdot Pat + name \wedge Pat + sex \cdot male + sex \wedge male + age \cdot 66 + age \wedge 66$$

$$= \begin{bmatrix} \sum_{i=1}^{n} (a_i x_i + b_i y_i + c_i z_i) \\ a_1 x_2 - a_2 x_1 + b_1 y_2 - b_2 y_1 + c_1 z_2 - c_2 z_1 \\ a_1 x_3 - a_3 x_1 + b_1 y_3 - b_3 y_1 + c_1 z_3 - c_3 z_1 \\ \vdots \\ a_{n-1} x_n - a_n x_{n-1} + b_{n-1} y_n - b_n y_{n-1} + c_{n-1} z_n - c_n z_{n-1} \end{bmatrix} = [d_0, d_{12}, d_{13}, \dots, d_{(n-1)n}]^T$$

$$= d_0 + d_{12} e_{12} + d_{13} e_{13} + \cdots + d_{(n-1)n} e_{(n-1)n},$$

where $e_1, \dots, e_n$ are orthonormal basis vectors and $e_{kl} = e_k e_l$ are the basis bivectors. In order to be decoded as accurately as possible, PSmith should have the same magnitude as the vectors representing atomic objects; it therefore needs to be normalized. Finally, PSmith takes the form

$$PSmith = [d_0, d_{12}, d_{13}, \dots, d_{(n-1)n}]^T, \qquad d_i \mapsto \frac{d_i}{\sqrt{\sum_j d_j^2}}, \quad i, j \in \{0, 12, 13, \dots, (n-1)n\}.$$

PSmith is now a multivector of rank 2. The decoding operation

$$name^+ PSmith = name^+ (name \cdot Pat + name \wedge Pat + sex \cdot male + sex \wedge male + age \cdot 66 + age \wedge 66)$$

will produce a multivector of rank 3, consisting of vectors and trivectors. However, the original Pat did not contain any trivector components; they all belong to the noise part, and the only interesting blades in $name^+ PSmith$ are vectors. The expected answer is a vector, so there is no point in calculating the whole multivector $name^+ PSmith$ and only then comparing it with the items stored in the clean-up memory. To be efficient, one should generate only the vector part while computing $name^+ PSmith$ and skip the noisy trivectors. Let $\langle \cdot \rangle_k$ denote the projection of a multivector onto k-blades. To decode PSmith's name we need to compute

$$\langle name^+ PSmith \rangle_1 = name^+ \, name \cdot Pat + \langle name^+ (name \wedge Pat + sex \cdot male + sex \wedge male + age \cdot 66 + age \wedge 66) \rangle_1 = Pat + \text{noise} = Pat'.$$

The resulting Pat' will still be noisy, but to a lesser degree than it would have been had the trivectors been present. Formally, we are using a map $*_{1,2}$ that transforms a multivector of rank 1 (i.e. an n-tuple) and a multivector of rank 2 (i.e. a $(1 + \binom{n}{2})$-tuple) into a multivector of rank 1, without computing the unnecessary blades. Let X be a multivector of rank 2,

$$X = \langle X \rangle_0 + \langle X \rangle_2 = x_0 + \sum_{l<m} x_{lm} e_l e_m.$$
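A minimal numerical sketch of this decoding strategy is given below, under the assumptions of the example above: roles are unit vectors, "+" is taken to be the vector inverse (so $name^+ = name$ for a unit role), and $\langle X \rangle_2$ is stored as an antisymmetric matrix. All identifiers are ours, and the filler 66 is renamed `age66`.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100

def unit(v):
    return v / np.linalg.norm(v)

# roles and fillers: normalized Gaussian vectors in R^n
name, sex, age = (unit(rng.standard_normal(n)) for _ in range(3))
Pat, male, age66 = (unit(rng.standard_normal(n)) for _ in range(3))

def bind(role, filler):
    """Geometric product of two vectors: a scalar (inner product) plus a
    bivector (outer product), the latter kept as an antisymmetric matrix."""
    s = role @ filler
    B = np.outer(role, filler) - np.outer(filler, role)
    return s, B

def superpose(*chunks):
    """Add chunks and normalize the rank-2 multivector so that it has the
    same magnitude as an atomic object (d_0 plus the upper-triangle d_lm)."""
    s = sum(c[0] for c in chunks)
    B = sum(c[1] for c in chunks)
    norm = np.sqrt(s ** 2 + np.sum(np.triu(B, 1) ** 2))
    return s / norm, B / norm

def unbind_vector_part(role, X):
    """The map *_{1,2}: only the grade-1 part of role^+ X is computed;
    the noisy trivector part is never generated."""
    s, B = X
    return s * role + role @ B

PSmith = superpose(bind(name, Pat), bind(sex, male), bind(age, age66))
Pat_prime = unbind_vector_part(name, PSmith)
# clean-up memory lookup: Pat wins by a wide margin
print({f: round(float(Pat_prime @ v), 3)
       for f, v in [("Pat", Pat), ("male", male), ("66", age66)]})
```

The identity used here is that for a vector $v$ and a rank-2 multivector $X = x_0 + B$, the grade-1 part of $vX$ equals $x_0 v + v \lrcorner B$, and with $B$ stored as an antisymmetric matrix the contraction $v \lrcorner B$ is just the matrix-vector product `role @ B`.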