https://doi.org/10.31449/inf.v43i4.2725 Informatica 43 (2019) 485–494 485
A Solution to the Problem of the Maximal Number of Symbols for
Biomolecular Computer
Jacek Waldmajer
Institute of Computer Science, University of Opole, Oleska 48, 45-052 Opole, Poland
E-mail: jwaldmajer@uni.opole.pl
Sebastian Sakowski
Faculty of Mathematics and Computer Science, University of Lodz, Banacha 22, 90-238 Lodz, Poland
E-mail: sebastian.sakowski@wmii.uni.lodz.pl
Keywords: biomolecular computer, biomolecular systems, DNA computing
Received: March 15, 2019
The authors present a solution to the problem of generating the maximum possible number of symbols
for a biomolecular computer that uses the restriction enzyme BbvI and ligase as the hardware, and transition
molecules built of double-stranded DNA as the software. The presented solution answers, in the form of an
algorithm, the open question of the maximal number of symbols for a biomolecular computer that
makes use of the restriction enzyme BbvI.
Povzetek (summary): A new method has been developed for computing the maximal number of symbols for a biomolecular computer.
1 Introduction
The beginnings of research into possibilities of applying
biomolecules to control biological systems, and also to
construct computers, are to be found in theoretical works
of the 1960s (Feynman 1961). Then, in the 1980s, Charles
Bennett (1982, Bennett and Landauer 1985) pointed to po-
tential possibilities of application of biomolecules to con-
struct energy-efﬁcient nanodevices. However, the world
had to wait to see the ﬁrst practical experiments realiz-
ing simple calculations with the use of biochemical reac-
tions until the mid-1990s, when Leonard Adleman (1994)
solved the Hamiltonian path problem in a graph, using exclusively
biomolecules for this purpose. Subsequent research
revealed the possibility of spontaneous formation
of multidimensional structures built from biomolecules,
which were made with the use of the conception of self-
assembly (Whitesides et al. 1991, Seeman 2001, Gopinath
et al. 2016). The multidimensional DNA structures made
it possible to realize fractals, e.g., ones of the Sierpiński
triangle type (Rothemund 2004), which revealed a great
potential in calculations based on self-assembly. In 2006,
Paul Rothemund (2006) made use of self-assembling DNA
molecules to obtain different multidimensional biomolec-
ular structures. Properly prepared DNA molecules also
made it possible to carry out a theoretical simulation of a
Turing machine (Rothemund 1995). In 2001, Benenson et al.
(2001) presented a working non-deterministic finite automaton
based on such DNA molecules, the restriction enzyme FokI and
DNA ligase. Subsequent research showed experimentally that
such an automaton can work without the ligase enzyme
(Benenson et al. 2003, Chen et al. 2007), and its complexity,
understood as the number of states, was extended in practical
experiments by using several restriction enzymes (Sakowski
et al. 2017). It is worth adding that laboratory experiments
were successfully carried out in which this biomolecular
system was applied to medical diagnosis and treatment
(Benenson et al. 2004) and also to simple logical inference
(Ran 2009). Other works on applications of DNA molecules took
up the challenge of increasing not only the number of states
of such an automaton (Unold et al. 2004), but also the number
of symbols available to an automaton built from DNA (Soreni
et al. 2005). Moreover, the notion of a biomolecular automaton,
informally characterized in the papers of Rothemund (1995),
Benenson et al. (2001) and Soreni et al. (2005), was presented
in a formal way (as a mathematical model called a tailor
automaton in a new theory of tailor automata) in the paper of
Waldmajer et al. (2019).
In the above-mentioned work, Soreni and co-workers
(Soreni et al. 2005) put forward a 3-state 3-symbol
biomolecular automaton which used the restriction enzyme
BbvI as well as considered the problem of determining the
maximal number of symbols for the constructed biomolec-
ular automaton. On the basis of the conducted assessment
they pointed out that it is possible to construct 40 symbols,
each of which is composed of 6 pairs of nucleotides. How-
ever, in their work, they pointed to merely 37 such symbols,
including one which was erroneously determined. Conse-
quently, they opened the following issue (p. 3937): It is
still an open question whether the maximal number of 6-
bp sequences that produce distinct 4-bp sticky ends in both
strands is 40. With reference to this open question, the
authors of the present work solve the problem stated by
Soreni et al. (2005) by: (1) indicating 40 symbols (see
Tab. 4) which constitute the solution to the open problem,
(2) proposing the idea of an algorithm that makes it possible
to generate 40 symbols for a biomolecular automaton using
the restriction enzyme BbvI, and (3) formulating two
general problems concerning the generation of symbols for
biomolecular automata which use one restriction enzyme
(of which a biomolecular automaton using the restriction
enzyme BbvI is a particular case) or more than one
restriction enzyme.
The second section presents the construction and operation
of the 3-state 3-symbol biomolecular automaton using the
restriction enzyme BbvI, as presented by Soreni and
co-workers (Soreni et al. 2005). The third section presents
the idea of an algorithm generating the maximal number of
symbols for a biomolecular automaton using the restriction
enzyme BbvI, together with a discussion of various undesired
situations which may occur during the operation of a
biomolecular automaton that makes use of one restriction
enzyme (in particular, the restriction enzyme BbvI). The last
section formulates two general problems of generating the
maximal number of symbols for a certain class of
biomolecular automata using one or more restriction
enzymes.
2 Biomolecular ﬁnite automaton
and the idea of its actions
In this section, we present the 3-state 3-symbol
biomolecular finite DNA automaton (see Fig. 1) introduced
by Soreni and co-workers (Soreni et al. 2005). The automaton
uses the restriction enzyme BbvI, the ligase enzyme and
double-stranded DNA fragments (an input molecule, a set of
transition molecules and a set of detection molecules). The
double-stranded DNA fragments are built of the adenine,
cytosine, guanine and thymine bases, marked as A, C, G and T,
respectively.
[Figure 1 graphic: state diagram with the states $s_0$, $s_1$, $s_2$ and edges labelled by the symbols a, b, c]
Figure 1: Graph representing a 3-state 3-symbol deterministic finite automaton $M_1$.
The task of the BbvI restriction enzyme is to cut the
double-stranded DNA after recognizing a speciﬁc sequence
(see Fig. 2A) in the double-stranded DNA.
The BbvI restriction enzyme will cut the double-stranded
DNA after the 8th nucleotide in the DNA strand in the 5'-3'
direction and after the 12th nucleotide in the DNA strand in
the 3'-5' direction from the recognized specific sequence
(see Fig. 2B).

(A)
5'-GCAGC-3'
3'-CGTCG-5'
(B) (cut places marked with |: 8 base pairs past the recognized sequence on the upper strand, 12 base pairs on the lower strand)
5'-GCAGCTTAAATCT|GGCTT-3'
3'-CGTCGAATTTAGACCGA|A-5'
Figure 2: (A) Specific sequence recognized by the BbvI
restriction enzyme. (B) The action of the restriction
endonuclease BbvI.
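The cutting rule can be mimicked on plain strings. The following Python sketch is only an illustration: it assumes that a molecule is represented by its upper strand written in the 5'-3' direction and that the lower strand is its exact complement. The names `bbvi_cut`, `RECOGNITION_SITE` and `COMPLEMENT` are hypothetical helpers introduced here, not part of the cited works.

```python
RECOGNITION_SITE = "GCAGC"                  # sequence recognized by BbvI (Fig. 2A)
COMPLEMENT = str.maketrans("ACGT", "TGCA")  # complementarity of nucleotides


def bbvi_cut(top):
    """Cut a double-stranded molecule given by its upper (5'-3') strand.

    Returns (left, right); each part is a pair (upper strand, lower strand),
    with the lower strand written 3'-5' as in the figures.
    """
    bottom = top.translate(COMPLEMENT)
    site = top.find(RECOGNITION_SITE)
    if site < 0:
        raise ValueError("no BbvI recognition sequence found")
    site_end = site + len(RECOGNITION_SITE)
    top_cut = site_end + 8       # after the 8th nucleotide, upper strand
    bottom_cut = site_end + 12   # after the 12th nucleotide, lower strand
    if bottom_cut > len(top):
        raise ValueError("molecule too short to be cut")
    left = (top[:top_cut], bottom[:bottom_cut])
    right = (top[top_cut:], bottom[bottom_cut:])
    return left, right


# Molecule of Fig. 2B.
left, right = bbvi_cut("GCAGCTTAAATCTGGCTT")
sticky_end = right[0][:4]   # the four unpaired nucleotides of the right part
```

On the molecule of Fig. 2B, the right-hand part carries a four-nucleotide single-stranded overhang on its upper strand, which is the sticky end used in the next steps.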
The task of the ligase enzyme is to ligate two
double-stranded DNAs having complementary sticky ends (see
Fig. 4A and 4B), where a sticky end is a single-stranded
DNA at the end of a double-stranded DNA. In this sense,
the sticky end 'TTTA' of one double-stranded DNA (see
Fig. 4A) is complementary to the sticky end 'AAAT' of the
other double-stranded DNA (see Fig. 4B). The result of
their ligation is one double-stranded DNA (see Fig. 4C).
Both the restriction enzyme BbvI and the ligase enzyme
play the key role in the action of a biomolecular automaton,
determining, respectively, the operation of cutting a
fragment of double-stranded DNA and the operation of
ligating two fragments of double-stranded DNA.
The input molecule (see Fig. 3) is a double-stranded
DNA fragment in which three basic parts can be
distinguished: the input word x consisting of the symbols
a, b and c (x = acb), the terminal symbol and the base
sequence. At both ends of the input molecule there are
additional base pairs, whose presence is dictated by the
properties of the action of the restriction enzyme.
To construct an input word of the 3-state 3-symbol
deterministic finite automaton, three symbols were used:
a, b and c (see Fig. 5). Each of these symbols is coded by
means of six base pairs. Besides these symbols, an
additional terminal symbol t was introduced, coded by the
same number of base pairs as the symbols a, b and c. The
terminal symbol is used to obtain an output molecule, which
makes it possible to determine whether the automaton has
finished acting in the required state and has accepted the
input word x.
The base sequence consists of a certain number of base
pairs, contains the specific sequence recognized by the
BbvI restriction enzyme, and makes it possible to define
the start state by determining the place where the BbvI
restriction enzyme cuts the input molecule (cf. Fig. 3 and
Fig. 2B). Let us note that the term “base sequence” did
not appear in the work of Soreni et al. (2005). This term is
introduced to determine clearly the manner of setting
the start state of a biomolecular automaton. According to
the idea contained in the work of Soreni and co-workers
         base sequence  a      c      b      t
[22 bp] 5'-GCAGCTTAAAT CTGGCT TGCGAT GAGTGA TGTCGC-3' [21 bp]
[22 bp] 3'-CGTCGAATTTA GACCGA ACGCTA CTCACT ACAGCG-5' [21 bp]
  Abp                  |----- word x -----| terminal     Abp
                                            symbol
Figure 3: Input molecule containing the input word x = acb; Abp: additional base pairs.
Table 1: Connection of the states $s_0$, $s_1$ and $s_2$ of a biomolecular automaton with the permanent cut places of the symbols
a, b, c and t of the biomolecular automaton.
state | symbol a    | symbol b    | symbol c    | symbol t
$s_0$ | 5'-CTGGCT-3' | 5'-GAGTGA-3' | 5'-TGCGAT-3' | 5'-TGTCGC-3'
      | 3'-GACCGA-5' | 3'-CTCACT-5' | 3'-ACGCTA-5' | 3'-ACAGCG-5'
$s_1$ | 5'-CTGGCT-3' | 5'-GAGTGA-3' | 5'-TGCGAT-3' | 5'-TGTCGC-3'
      | 3'-GACCGA-5' | 3'-CTCACT-5' | 3'-ACGCTA-5' | 3'-ACAGCG-5'
$s_2$ | 5'-CTGGCT-3' | 5'-GAGTGA-3' | 5'-TGCGAT-3' | 5'-TGTCGC-3'
      | 3'-GACCGA-5' | 3'-CTCACT-5' | 3'-ACGCTA-5' | 3'-ACAGCG-5'
[The cut-place markers drawn on the sequences in the original table are not reproduced.]
(A)
5'-GCAGCTT-3'
3'-CGTCGAATTTA-5'      (sticky end: TTTA)
(B)
5'-AAATCTGGCTTG-3'     (sticky end: AAAT)
3'-    GACCGAAC-5'
(C)
5'-GCAGCTTAAATCTGGCTTG-3'
3'-CGTCGAATTTAGACCGAAC-5'
Figure 4: (A) and (B) Double-stranded DNAs with
complementary sticky ends. (C) The result of a ligation
between the double-stranded DNAs with the complementary
sticky ends shown in (A) and (B).
a: 5'-CTGGCT-3'
   3'-GACCGA-5'
b: 5'-GAGTGA-3'
   3'-CTCACT-5'
c: 5'-TGCGAT-3'
   3'-ACGCTA-5'
t: 5'-TGTCGC-3'
   3'-ACAGCG-5'
Figure 5: Symbols a, b, c and the terminal symbol t.
(Soreni et al. 2005), the reading of a symbol in a certain
state of the automaton is identified with the cutting of the
double-stranded DNA by the BbvI restriction enzyme in
the area of the symbol, in a determined (permanent) place of
the DNA strand in the 5'-3' direction and in a determined
(permanent) place of the DNA strand in the 3'-5' direction.
Tab. 1 presents the connection between the states and the two
permanent cut places of the symbols.
In accordance with the input molecule presented in Fig. 3,
the first cut of the input molecule by the restriction
enzyme BbvI falls in the area of the symbol a, at the place
which corresponds to the state $s_2$. In this way, the symbol
a is read in the state $s_2$ and the DNA fragment presented
in Fig. 6 is obtained. In this sense, the state $s_2$ is the
initial state. Adding one or two pairs of nucleotides to the
base sequence sets the start state to $s_1$ or $s_0$,
respectively.
    (symbols c, b and t follow the sticky end)
5'-GGCTTGCGATGAGTGATGTCGC-3' [21 bp]
3'-    ACGCTACTCACTACAGCG-5' [21 bp]
Figure 6: Double-stranded DNA fragment obtained after
BbvI acting on the input molecule.
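The link between a sticky end and the pair (state, symbol) can be made explicit with a small lookup table. The sketch below is an illustration only. It assumes that the states $s_0$, $s_1$ and $s_2$ correspond to sticky ends starting at the first, second and third nucleotide of a symbol, respectively; this assignment is inferred here from Figs. 6, 8 and 9 rather than quoted from the source. The names `SYMBOLS` and `build_sticky_end_table` are hypothetical.

```python
# Upper (5'-3') strands of the symbols of Fig. 5.
SYMBOLS = {"a": "CTGGCT", "b": "GAGTGA", "c": "TGCGAT", "t": "TGTCGC"}


def build_sticky_end_table(symbols):
    """Map every 4-nucleotide sticky end to the pair (state, symbol).

    Assumption: state s0, s1, s2 <-> sticky end starting at offset 0, 1, 2
    inside the 6-nucleotide symbol.
    """
    table = {}
    for name, strand in symbols.items():
        for offset in range(3):
            sticky = strand[offset:offset + 4]
            if sticky in table:
                # Two different (state, symbol) pairs would be indistinguishable.
                raise ValueError(f"sticky end {sticky} is not unique")
            table[sticky] = (f"s{offset}", name)
    return table


TABLE = build_sticky_end_table(SYMBOLS)
```

With this table, the sticky end GGCT of the fragment in Fig. 6 decodes to the symbol a read in the state $s_2$, in agreement with the text. The uniqueness check reflects the requirement that the sticky ends of all symbols be distinct.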
The set of transition molecules is used to implement the
set of transitions of the 3-state 3-symbol deterministic
finite automaton. A transition from one state to another
(possibly the same) state upon reading a symbol is obtained
by ligating, with the use of the ligase enzyme, the DNA
fragment shown in Fig. 6 with one of the transition
molecules. Each transition molecule contains the specific
sequence recognized by the BbvI restriction enzyme and
additional base pairs. Exemplary transition molecules
are presented in Fig. 7: the transition molecule in Fig. 7A
enables the transition from the state $s_0$ to the state
$s_1$ after reading the symbol c; the transition molecule in
Fig. 7B enables the transition from the state $s_1$ to the
state $s_1$ after reading the symbol b; the transition
molecule in Fig. 7C enables the transition from the state
$s_2$ to the state $s_0$ after reading the symbol a.
For each state, one detection molecule is constructed, and
thus a set of detection molecules is specified (see Fig. 8). It
should be noted that the detection molecules have different
numbers of additional base pairs, which makes it possible
to determine in the laboratory the state in which the
automaton finished its action.
At the start of the computation, the BbvI enzyme, the ligase
enzyme, many copies of the transition molecules and many copies
(A)
[35 bp] 5'-GCAGCT-3'
        3'-CGTCGAACGC-5'
(B)
[39 bp] 5'-GCAGCTT-3'
        3'-CGTCGAATCAC-5'
(C)
[35 bp] 5'-GCAGCTATT-3'
        3'-CGTCGATAACCGA-5'
Figure 7: Selected transition molecules used in the
transition function: (A) $T_1$: $(s_0, c) \to s_1$, (B) $T_2$: $(s_1, b) \to s_1$,
(C) $T_3$: $(s_2, a) \to s_0$.
(A)
[54 bp] 5'-    -3'
        3'-ACAG-5'
(B)
[200 bp] 5'-    -3'
         3'-CAGC-5'
(C)
[300 bp] 5'-    -3'
         3'-AGCG-5'
Figure 8: (A) Detection molecule $D_1$ for the state $s_0$. (B)
Detection molecule $D_2$ for the state $s_1$. (C) Detection
molecule $D_3$ for the state $s_2$.
of the detection molecules are placed in a laboratory tube;
many copies of the input molecule are added last. After
these elements have been mixed in the test tube, the
biomolecular automaton starts its action. The successive
steps are: reading the symbol a in the state $s_2$ (see
Fig. 9a); using the transition molecule shown in Fig. 7C to
move from the state $s_2$ to the state $s_0$ after reading the
symbol a (see Fig. 9b); reading the symbol c in the state
$s_0$ (see Fig. 9c); using the transition molecule presented
in Fig. 7A to move from the state $s_0$ to the state $s_1$
after reading the symbol c (see Fig. 9d); reading the symbol
b in the state $s_1$ (see Fig. 9e); and using the transition
molecule presented in Fig. 7B and reading the terminal
symbol t in the state $s_1$ (see Fig. 9f-g). In the last step,
the fragment of double-stranded DNA presented in Fig. 9g is
ligated with one of the detection molecules (see Fig. 8B).
This ligation yields an output molecule (see Fig. 9h), which,
from the laboratory point of view, serves to determine the
end state of the biomolecular finite automaton.
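The whole run on the word acb can be replayed on strings. The sketch below is an illustration under simplifying assumptions: the additional base pairs (Abp) are ignored, only the three transition molecules of Fig. 7 are available, and one molecule is processed at a time instead of many copies in a tube. All function and constant names are hypothetical.

```python
SITE = "GCAGC"
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Transition molecules of Fig. 7 as (upper strand 5'-3', lower strand 3'-5').
TRANSITIONS = [("GCAGCT", "CGTCGAACGC"),        # T1: (s0, c) -> s1
               ("GCAGCTT", "CGTCGAATCAC"),      # T2: (s1, b) -> s1
               ("GCAGCTATT", "CGTCGATAACCGA")]  # T3: (s2, a) -> s0
# Sticky ends (3'-5') of the detection molecules of Fig. 8.
DETECTION = {"ACAG": "s0", "CAGC": "s1", "AGCG": "s2"}


def complement(strand):
    return strand.translate(COMPLEMENT)


def cut(top, bottom):
    """BbvI cut: keep the part to the right of the cut places (8 / 12)."""
    end = top.find(SITE) + len(SITE)
    return top[end + 8:], bottom[end + 12:]


def overhang(molecule):
    """Single-stranded end of a transition molecule (lower strand)."""
    top, bottom = molecule
    return bottom[len(top):]


def ligate(molecule, fragment):
    """Join a transition molecule with a fragment whose sticky end matches."""
    if complement(overhang(molecule)) != fragment[0][:4]:
        raise ValueError("sticky ends are not complementary")
    return molecule[0] + fragment[0], molecule[1] + fragment[1]


def run(input_top):
    """Return the detected end state, or None if the computation gets stuck."""
    fragment = cut(input_top, complement(input_top))
    while True:
        sticky = fragment[0][:4]
        for end, state in DETECTION.items():
            if complement(end) == sticky:
                return state
        for molecule in TRANSITIONS:
            if complement(overhang(molecule)) == sticky:
                fragment = cut(*ligate(molecule, fragment))
                break
        else:
            return None


# Upper strand of the input molecule of Fig. 3 (base sequence, a, c, b, t).
END_STATE = run("GCAGCTTAAATCTGGCTTGCGATGAGTGATGTCGC")
```

The intermediate fragments produced by this sketch coincide with the sequences of Fig. 9a-g, and the final sticky end GTCG is complementary to the detection molecule of Fig. 8B, i.e. the run ends in the state $s_1$.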
3 Algorithm for the problem of the
maximal number of symbols
3.1 The formal apparatus used in the
description of the algorithm
Let the set $\Sigma = \{A, C, G, T\}$ be given, together with the
function $\kappa$, a bijection of the set $\Sigma$ onto $\Sigma$ defined in
the following way: $\kappa(A) = T$, $\kappa(T) = A$, $\kappa(C) = G$ and
$\kappa(G) = C$. The set $\Sigma$ is called the set of nucleotides, the
elements of the set $\Sigma$ are called nucleotides, and the
function $\kappa$ is called complementarity of nucleotides.
We call any finite sequence of nucleotides of the set $\Sigma$
a word. The word x which is the sequence $X_1, X_2, \ldots, X_j$
of nucleotides of the set $\Sigma$ ($X_i \in \Sigma$, $0 < i \le j \in \mathbb{N}$) is
written as $x = X_1 X_2 \ldots X_j$. The number of elements of the
sequence x is called the length of the word x (denoted
symbolically: $|x|$), while the i-th nucleotide of the word x
(the i-th element of the word x) is denoted as $x(i)$. The set
of all words formed from the nucleotides of the set $\Sigma$
whose length is greater than zero is denoted as $\Sigma^+$.
Let $\Sigma^+ \ni x = X_1 X_2 \ldots X_j$ ($X_i \in \Sigma$, $0 < i \le j \in \mathbb{N}$)
and $\Sigma^+ \ni y = Y_1 Y_2 \ldots Y_j$ ($Y_i \in \Sigma$, $0 < i \le j \in \mathbb{N}$). We
call the word $X_j \ldots X_2 X_1$ the opposite word to the word x
(denoted symbolically: $x^{-1}$). We call the word
$xy = X_1 X_2 \ldots X_i Y_1 Y_2 \ldots Y_j$ the concatenation of two words
x and y such that $\Sigma^+ \ni x = X_1 X_2 \ldots X_i$ ($X_i \in \Sigma$,
$0 < i \in \mathbb{N}$) and $\Sigma^+ \ni y = Y_1 Y_2 \ldots Y_j$ ($Y_j \in \Sigma$,
$0 < j \in \mathbb{N}$). We say that the word x is included in the word y
beginning with the k-th ($1 \le k \in \mathbb{N}$) position (denoted
symbolically: $x \sqsubseteq_k y$) if $k + |x| \le |y| + 1$ and
$\exists u, v \in \Sigma^{*}\,(y = uxv \wedge |u| = k - 1)$. The word x is a sub-word
of y (denoted symbolically: $x \sqsubseteq y$) when the word x is
included in the word y beginning with a certain position k,
i.e., $x \sqsubseteq y \Leftrightarrow \exists k\,(x \sqsubseteq_k y)$. The word x is a prefix of the
word y when $x \sqsubseteq_1 y$. The word x is a suffix of the word
y when $x^{-1}$ is a prefix of the word $y^{-1}$.
The introduced notion of complementarity of nucleotides
and the introduced notation make it possible to define the
function which will be called complementarity of words.
The mapping $\bar{\kappa} : \Sigma^+ \to \Sigma^+$ defined in the following way:
$\bar{\kappa}(x) = y$, where $|y| = |x|$ and $y(i) = \kappa(x(i))$ for each
$i \in \{1, \ldots, |y|\}$ and for $x \in \Sigma^+$,
is called complementarity of words.
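The notions defined so far translate directly into string operations. The following Python sketch is an illustration only; words are modelled as ordinary strings over the letters A, C, G, T, positions are counted from 1 as in the text, and all function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def opposite(x):
    """Opposite word: the nucleotides of x in reverse order."""
    return x[::-1]


def complement_word(x):
    """Complementarity of words: complement x nucleotide by nucleotide."""
    return "".join(COMPLEMENT[n] for n in x)


def included_at(x, y, k):
    """True if x is included in y beginning with the k-th position (k >= 1)."""
    return k >= 1 and k + len(x) <= len(y) + 1 and y[k - 1:k - 1 + len(x)] == x


def is_subword(x, y):
    """True if x is included in y beginning with some position k."""
    return any(included_at(x, y, k) for k in range(1, len(y) - len(x) + 2))


def is_prefix(x, y):
    return included_at(x, y, 1)


def is_suffix(x, y):
    """x is a suffix of y when the opposite of x is a prefix of the opposite of y."""
    return is_prefix(opposite(x), opposite(y))
```

For instance, for the word CATG the opposite word and the complementary word are both GTAC, which is exactly the situation that the algorithm described below has to exclude for sticky ends.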
Let $\Sigma^+ \ni x = X_1 X_2 X_3 X_4$ ($X_i \in \Sigma$, $0 < i \le j \in \mathbb{N}$)
and $\Sigma^+ \ni y = Y_1 Y_2 Y_3 Y_4$ ($Y_i \in \Sigma$, $0 < i \le l \in \mathbb{N}$). The
words x and y are synthesizable over the length 3 when there
exists a word $u \in \Sigma^+$ of length 3 that is a suffix of
the word x and a prefix of the word y. The concatenation
of the words x and y synthesizable over the length 3 is the
word $z = [x, y]_3$, where $z = X_1 X_2 X_3 X_4 Y_4$.
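The concatenation of synthesizable words merges two words along their common part of length 3. A minimal Python sketch follows; the function names are hypothetical, and the length of the overlap is kept as a parameter only for readability. The example words are taken from Fig. 12.

```python
def synthesizable(x, y, overlap=3):
    """True if the last `overlap` nucleotides of x equal the first ones of y."""
    return len(x) >= overlap and len(y) >= overlap and x[-overlap:] == y[:overlap]


def concat_synthesizable(x, y, overlap=3):
    """The word [x, y]_3: x followed by the part of y beyond the overlap."""
    if not synthesizable(x, y, overlap):
        raise ValueError(f"{x} and {y} are not synthesizable over the length {overlap}")
    return x + y[overlap:]


# Words x = CACT and y = ACTG share the word ACT.
z = concat_synthesizable("CACT", "ACTG")
```

For two words of length 4 the result has length 5, in line with $z = X_1 X_2 X_3 X_4 Y_4$.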
3.2 Description of the algorithm
The idea of the algorithm generating the maximal number
of symbols for a biomolecular automaton using the
restriction enzyme BbvI is characterized through the four
stages distinguished in the algorithm: the initial
stage, the stage of deployment and verification, the stage
of generation and the final stage. At each of these stages
we make use only of strands of symbols in the 5'-3'
direction, since from strands of symbols in the 5'-3'
direction we can obtain, by means of the principle of
complementarity of nucleotides, strands of symbols in the
3'-5' direction.
Let the set $A_0$ of all 4-element sequences of nucleotides
be given: $A_0 = \{x : x \in \Sigma^+ \wedge |x| = 4\}$ =
{AAAA, AAAC, AAAG, ..., TTTG, TTTT}. At the
initial stage, we remove the words (4-element sequences of
nucleotides): AATT, ACGT, AGCT, ATAT, CCGG, CATG,
CTAG, CGCG, GGCC, GATC, GTAC, GCGC, TTAA,
(a) (remaining symbols: c, b, t)
5'-GGCTTGCGATGAGTGATGTCGC-3' [21 bp]
3'-    ACGCTACTCACTACAGCG-5'
(b)
[34 bp] 5'-GCAGCTATTGGCTTGCGATGAGTGATGTCGC-3' [21 bp]
        3'-CGTCGATAACCGAACGCTACTCACTACAGCG-5'
(c) (remaining symbols: b, t)
5'-TGCGATGAGTGATGTCGC-3' [21 bp]
3'-    TACTCACTACAGCG-5'
(d)
[35 bp] 5'-GCAGCTTGCGATGAGTGATGTCGC-3' [21 bp]
        3'-CGTCGAACGCTACTCACTACAGCG-5'
(e) (remaining symbol: t)
5'-AGTGATGTCGC-3' [21 bp]
3'-    TACAGCG-5'
(f)
[39 bp] 5'-GCAGCTTAGTGATGTCGC-3' [21 bp]
        3'-CGTCGAATCACTACAGCG-5'
(g)
5'-GTCGC-3' [21 bp]
3'-    G-5'
(h)
[54 bp] 5'-GTCGC-3' [21 bp]
        3'-CAGCG-5'
Figure 9: Control serving the reading of symbols of the word acb from the input molecule and obtaining an output
molecule in the biomolecular automaton using the enzyme BbvI.
TCGA, TGCA, TATA from the set $A_0$ of all 4-element
sequences of nucleotides.
The appearance of any of the indicated sixteen words
(4-element sequences of nucleotides) as a sticky end causes a
biomolecular automaton to malfunction, due to the
possibility of ligation of a transition molecule with
itself, since each of the transition molecules exists in
many copies.
Consider a transition molecule in which the sticky end CATG
is used (see Fig. 10A). Let us note that this molecule
occurs in many copies. Thus, during the action of the
biomolecular automaton, one copy of the transition molecule
$T_{NS1}$ (cf. Fig. 10A) can ligate with another copy of the
same transition molecule, forming the double-stranded
fragment of DNA presented in Fig. 10B. In consequence, the
number of copies of the molecule $T_{NS1}$ that can be used in
further computations carried out by the biomolecular
automaton is reduced.
So as to prevent the possibility of ligation of copies of the
same transition molecule, it is necessary to remove from
the set $A_0$ the words which satisfy the following condition:
(*) $x^{-1} = \bar{\kappa}(x)$, where $x \in A_0$.
In this way, we reject the sixteen words given earlier from the
set $A_0$ and, in consequence, we obtain the set:
$A_1 = \{x : x \in A_0 \wedge x^{-1} \neq \bar{\kappa}(x)\} = \{x : x \in \Sigma^+ \wedge |x| = 4 \wedge x^{-1} \neq \bar{\kappa}(x)\}$,
where the number of elements of the set $A_1$ amounts
to 240. Using the elements of the set $A_1$,
we form the maximal set $A_2$ of pairs of elements in the
following manner:
$A_2 = \{(x, y) : x, y \in A_1 \wedge x^{-1} = \bar{\kappa}(y)\}$,
where the number of elements of the set $A_2$ amounts to
240. Then, using the set $A_2$, we form the set $A_3$ of pairs
in the following way: the set $A_3$ is the set $A_2$ from which
we remove certain pairs according to the principle (P).

(A)
[35 bp] 5'-GCAGCT-3'
        3'-CGTCGACATG-5'
(B)
[35 bp] 5'-GCAGCTGTACAGCTGC-3' [35 bp]
        3'-CGTCGACATGTCGACG-5'
Figure 10: (A) Transition molecule $T_{NS1}$ using the sticky
end CATG. (B) Double-stranded fragment of DNA formed
as a result of ligation of two copies of the transition
molecule presented in (A).
The principle (P): if the pairs (x, y) and (y, x) belong
to the set $A_2$, then we remove from the set $A_2$ the pair
whose first element is lexicographically posterior when the
first elements of the pairs (x, y) and (y, x) are compared
(see Examp. 1). Tab. 2 and Tab. 3 present the 240 elements
forming the 120 pairs (x, y) of the set $A_3$, where
$x, y \in A_1$.

Table 2: Part I: 120 pairs (x,y) of four-element sequences of nucleotides.
No  x    y    | No  x    y    | No  x    y
 1  AAAA TTTT | 21  ACCC GGGT | 41  AGTC GACT
 2  AAAC GTTT | 22  ACCG CGGT | 42  AGTG CACT
 3  AAAG CTTT | 23  ACCT AGGT | 43  ATAA TTAT
 4  AAAT ATTT | 24  ACGA TCGT | 44  ATAC GTAT
 5  AACA TGTT | 25  ACGC GCGT | 45  ATAG CTAT
 6  AACC GGTT | 26  ACGG CCGT | 46  ATCA TGAT
 7  AACG CGTT | 27  ACTA TAGT | 47  ATCC GGAT
 8  AACT AGTT | 28  ACTC GAGT | 48  ATCG CGAT
 9  AAGA TCTT | 29  ACTG CAGT | 49  ATGA TCAT
10  AAGC GCTT | 30  AGAA TTCT | 50  ATGC GCAT
11  AAGG CCTT | 31  AGAC GTCT | 51  ATGG CCAT
12  AAGT ACTT | 32  AGAG CTCT | 52  ATTA TAAT
13  AATA TATT | 33  AGAT ATCT | 53  ATTC GAAT
14  AATC GATT | 34  AGCA TGCT | 54  ATTG CAAT
15  AATG CATT | 35  AGCC GGCT | 55  CAAA TTTG
16  ACAA TTGT | 36  AGCG CGCT | 56  CAAC GTTG
17  ACAC GTGT | 37  AGGA TCCT | 57  CAAG CTTG
18  ACAG CTGT | 38  AGGC GCCT | 58  CACA TGTG
19  ACAT ATGT | 39  AGGG CCCT | 59  CACC GGTG
20  ACCA TGGT | 40  AGTA TACT | 60  CACG CGTG
Example 1: Let us note that the pairs (AAAA, TTTT),
(TTTT, AAAA) $\in A_2$. The first element (AAAA) of the
first pair (AAAA, TTTT) is lexicographically prior to the
first element (TTTT) of the second pair (TTTT,
AAAA). Thus, the pair (AAAA, TTTT) belongs to the set
$A_3$, and the pair (TTTT, AAAA) does not belong to $A_3$.
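The sizes of the sets $A_0$, $A_1$, $A_2$ and $A_3$ quoted above can be checked by direct enumeration. The Python sketch below is an illustration; it assumes the lexicographic order A < C < G < T, which is the order used in Tab. 2 and Tab. 3, and the helper names are hypothetical.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def opposite(x):
    """Opposite word: x read backwards."""
    return x[::-1]


def complement_word(x):
    """Complementarity of words."""
    return x.translate(COMPLEMENT)


# A0: all 4-element sequences of nucleotides, in lexicographic order.
A0 = ["".join(p) for p in product("ACGT", repeat=4)]

# Words satisfying condition (*): they would ligate with their own copies.
REMOVED = [x for x in A0 if opposite(x) == complement_word(x)]

# A1: the remaining words.
A1 = [x for x in A0 if opposite(x) != complement_word(x)]

# A2: ordered pairs of words of A1 whose sticky ends could ligate.
A2 = [(x, y) for x in A1 for y in A1 if opposite(x) == complement_word(y)]

# A3: principle (P), keep only the pair whose first element comes first.
A3 = [(x, y) for (x, y) in A2 if x < y]
```

The enumeration yields 16 removed words, 240 words in $A_1$, 240 ordered pairs in $A_2$ and 120 pairs in $A_3$, whose first and last pairs are (AAAA, TTTT) and (TGAA, TTCA), as in Tab. 2 and Tab. 3.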
Let us consider the pair (x, y) = (AAAA, TTTT) from Tab.
2 (No 1) and two transition molecules of the biomolecular
automaton with the sticky ends AAAA and TTTT (see Fig.
11A and Fig. 11B). Ligation of these transition molecules
forms the double-stranded fragment of DNA presented in
Fig. 11C. As a consequence, the number of copies of the
molecules $T_{NS2}$ and $T_{NS3}$ that may be used in further
calculations done by the biomolecular automaton is reduced.
Therefore, in the algorithm generating the maximal number of
symbols for a biomolecular automaton using the restriction
enzyme BbvI, only one element of each of the 120
pairs of the set $A_3$ should be used. Selecting individual
elements of the successive pairs in this manner, we obtain the
family $P(A_1)$ of maximal sets $B \subseteq A_1$ such that for each
$x, y \in B$ the following condition holds:
(**) $x^{-1} \neq \bar{\kappa}(y)$.
The condition (**) prevents the formation of
transition molecules which could ligate with one another
during the action of the biomolecular automaton.
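Choosing one element from each of the 120 pairs can be sketched as follows. The code is an illustration only; it rebuilds the pairs of $A_3$ with a reverse-complement helper, which is equivalent to the condition $x^{-1} = \bar{\kappa}(y)$ because reversing and complementing commute. The names `select`, `satisfies_condition` and `NUMBER_OF_SETS` are hypothetical.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(x):
    """The only word y with opposite(x) equal to the complement of y."""
    return x[::-1].translate(COMPLEMENT)


def pairs_of_A3():
    """The 120 pairs (x, y) of the set A3, in lexicographic order of x."""
    words = ("".join(p) for p in product("ACGT", repeat=4))
    return [(x, reverse_complement(x)) for x in words if x < reverse_complement(x)]


def select(pairs, choices):
    """Build a set B by taking element 0 or 1 from each successive pair."""
    return [pair[choice] for pair, choice in zip(pairs, choices)]


def satisfies_condition(B):
    """Condition (**): no two words of B (equal or not) can ligate."""
    members = set(B)
    return all(reverse_complement(x) not in members for x in members)


PAIRS = pairs_of_A3()
B_FIRST = select(PAIRS, [0] * len(PAIRS))   # always take the first element
NUMBER_OF_SETS = 2 ** len(PAIRS)            # one binary choice per pair
```

Under this reading the family contains $2^{120}$ sets, which indicates why the later stages of the algorithm cannot rely on naive exhaustive enumeration in practice.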
In the next part of the algorithm, we select elements
of the family $P(A_1)$ as sets that serve to check the
possibility of generating 40 symbols for the biomolecular
automaton using the restriction enzyme BbvI. Thus, let the
set C be a chosen element of the family $P(A_1)$.
In the first part of the stage of deployment and
verification, we select a single assignment of the 120 words
that are the elements of the set C to three sets $G_1$, $G_2$
and $G_3$ (40 words to each set) from among the successive
possible ways of assigning the 120 words of the set C to 3
sets of 40 words each.
In the second part of the stage of deployment and
verification, we pre-check whether we are able to form 40 words
of length 6 (whether we can create 40 strands in the 5'-3'
direction). We examine this by comparing the elements of,
first, the sets $G_1$, $G_2$ and then the sets $G_2$, $G_3$ in
the following way:
1. for the sets $G_1$ and $G_2$ we check whether the number
of occurrences of each word x of length 3 as a
suffix in the words of the set $G_1$ is identical with the
Table 3: Part II: 120 pairs (x,y) of four-element sequences of nucleotides.
No  x    y    | No  x    y    | No   x    y
61  CAGA TCTG | 81  CGGA TCCG | 101  GCAC GTGC
62  CAGC GCTG | 82  CGGC GCCG | 102  GCCA TGGC
63  CAGG CCTG | 83  CGTA TACG | 103  GCCC GGGC
64  CATA TATG | 84  CGTC GACG | 104  GCGA TCGC
65  CATC GATG | 85  CTAA TTAG | 105  GCTA TAGC
66  CCAA TTGG | 86  CTAC GTAG | 106  GGAA TTCC
67  CCAC GTGG | 87  CTCA TGAG | 107  GGAC GTCC
68  CCAG CTGG | 88  CTCC GGAG | 108  GGCA TGCC
69  CCCA TGGG | 89  CTGA TCAG | 109  GGGA TCCC
70  CCCC GGGG | 90  CTGC GCAG | 110  GGTA TACC
71  CCCG CGGG | 91  CTTA TAAG | 111  GTAA TTAC
72  CCGA TCGG | 92  CTTC GAAG | 112  GTCA TGAC
73  CCGC GCGG | 93  GAAA TTTC | 113  GTGA TCAC
74  CCTA TAGG | 94  GAAC GTTC | 114  GTTA TAAC
75  CCTC GAGG | 95  GACA TGTC | 115  TAAA TTTA
76  CGAA TTCG | 96  GACC GGTC | 116  TACA TGTA
77  CGAC GTCG | 97  GAGA TCTC | 117  TAGA TCTA
78  CGAG CTCG | 98  GAGC GCTC | 118  TCAA TTGA
79  CGCA TGCG | 99  GATA TATC | 119  TCCA TGGA
80  CGCC GGCG | 100 GCAA TTGC | 120  TGAA TTCA
(A)
[35 bp] 5'-GCAGCT-3'
        3'-CGTCGAAAAA-5'
(B)
[35 bp] 5'-GCAGCT-3'
        3'-CGTCGATTTT-5'
(C)
[35 bp] 5'-GCAGCTTTTTAGCTGC-3' [35 bp]
        3'-CGTCGAAAAATCGACG-5'
Figure 11: (A) Transition molecule $T_{NS2}$ using the
sticky end AAAA. (B) Transition molecule $T_{NS3}$ using
the sticky end TTTT. (C) Double-stranded fragment of
DNA formed as a result of ligation of the two transition
molecules presented in (A) and (B).
number of occurrences of the word x as a prefix in the
words of the set $G_2$.
2. for the sets $G_2$ and $G_3$ we check whether the number
of occurrences of each word x of length 3 as a
suffix in the words of the set $G_2$ is identical with the
number of occurrences of the word x as a prefix in the
words of the set $G_3$.
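The two counting checks above amount to comparing multisets of three-letter words. A minimal Python sketch, with the hypothetical name `precheck` and with small made-up sets instead of 40-word sets, could look as follows.

```python
from collections import Counter


def suffix_counts(words):
    """How many times each word of length 3 occurs as a suffix."""
    return Counter(w[-3:] for w in words)


def prefix_counts(words):
    """How many times each word of length 3 occurs as a prefix."""
    return Counter(w[:3] for w in words)


def precheck(G1, G2, G3):
    """Pre-check of the stage of deployment and verification.

    Suffixes of G1 must match prefixes of G2 with the same multiplicities,
    and likewise for the suffixes of G2 and the prefixes of G3.
    """
    return (suffix_counts(G1) == prefix_counts(G2)
            and suffix_counts(G2) == prefix_counts(G3))


# Toy example with two words per set (the algorithm uses 40 per set).
OK = precheck(["CACT", "AAAC"], ["ACTG", "AACG"], ["CTGC", "ACGA"])
FAILS = precheck(["CACT", "AAAC"], ["ACTG", "AACG"], ["CTGC", "TTTT"])
```

Passing the pre-check is only a necessary condition: it guarantees that the words can be chained pairwise, while the undesired situations discussed later still have to be excluded.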
At the stage of generation we introduce the auxiliary set
$D = \emptyset$ and examine the possibility of forming 40 words
of length 6 (40 strands of symbols in the 5'-3' direction),
making use of the elements of the sets $G_1$, $G_2$ and $G_3$
and of concatenations of words synthesizable over the
length 3. Each word of length 6 is obtained through a double
use of the concatenation of two words synthesizable over the
length 3:
1. we select one word from each of the three sets $G_1$, $G_2$
and $G_3$ in such a way that the words $x \in G_1$, $y \in G_2$
are synthesizable over the length 3 and the words
$y \in G_2$, $z \in G_3$ are also synthesizable over the
length 3,
2. having selected the words $x \in G_1$, $y \in G_2$, $z \in G_3$
which satisfy the above-mentioned condition, we
form a word u of length 6 (a strand of a symbol in the
5'-3' direction): $u = [[x, y]_3, z]_3$ (symbol assembling,
see Fig. 12),
3. the word u obtained upon satisfying the above-
presented condition is added to the set D.
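Steps 1 to 3 can be sketched for a single triple of words as follows. This is an illustration only: the hypothetical function `assemble_symbol` handles one triple (x, y, z) of four-letter words and does not solve the harder task of using every word of $G_1$, $G_2$ and $G_3$ exactly once. The example triple is the one of Fig. 12.

```python
def assemble_symbol(x, y, z):
    """Return u = [[x, y]_3, z]_3, or None if the words cannot be chained.

    x, y, z are words of length 4; u is the resulting word of length 6.
    """
    if x[-3:] != y[:3]:        # x and y not synthesizable over the length 3
        return None
    xy = x + y[3:]             # [x, y]_3, a word of length 5
    if xy[-3:] != z[:3]:       # [x, y]_3 and z not synthesizable
        return None
    return xy + z[3:]          # [[x, y]_3, z]_3, a word of length 6


D = set()
u = assemble_symbol("CACT", "ACTG", "CTGC")
if u is not None:
    D.add(u)
```

Note that the three words x, y and z are exactly the three windows of length 4 of the assembled symbol u, taken at its first, second and third nucleotide.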
In the case where it is impossible to form 40 words (40
words u) from the elements of the sets $G_1$, $G_2$ and $G_3$ in
the way given above, we return to checking another possible
assignment of the elements of the set C to the sets
$G_1$, $G_2$ and $G_3$. In the case where all the possible
assignments of the elements of the set C to the sets $G_1$,
$G_2$ and $G_3$ have been checked, we proceed to examine the
next element of the family $P(A_1)$. In the case where all the
elements of the family
[Figure 12 graphic: one word is taken from each of the sets $G_1$, $G_2$ and $G_3$ and the three words are merged into one symbol]
Informal symbol assembling:
x = CACT
y = ACTG
z = CTGC
u = CACTGC (see Tab. 4, symbol no. 40, Strand 1)
Formal symbol assembling:
$[x, y]_3$ = $[CACT, ACTG]_3$ = CACTG
$u = [[x, y]_3, z]_3$ = $[CACTG, CTGC]_3$ = CACTGC
Figure 12: Idea of symbol assembling.
$P(A_1)$ have been checked and it is impossible to obtain
40 symbols, the algorithm communicates: “Unable to obtain
40 symbols for the biomolecular automaton using the
restriction enzyme BbvI”.
If the set $D$ has 40 words determined from the elements of the sets $G_1$, $G_2$ and $G_3$, we check whether the words of this set (the strands of the symbols in the direction 5'-3') avoid each of the four undesired situations described below, all of which arise from the appearance of the sequence recognized by the restriction enzyme BbvI.
The first undesired situation is the inclusion of a sequence recognized by the restriction enzyme BbvI inside a symbol; an example is presented in Fig. 13A. The second, analogous, undesired situation occurs if a sequence recognized by the restriction enzyme BbvI is included inside a symbol "reversed by 180°"; an example is shown in Fig. 13B.
(A)
5'-GCAGCG-3'
3'-CGTCGC-5'
(B)
5'-AGCTGC-3'
3'-TCGACG-5'
Figure 13: Undesired situations: (A) a sequence recognized by the restriction enzyme BbvI is included in the symbol; (B) a sequence recognized by the restriction enzyme BbvI is contained in the symbol "reversed by 180°".
The third undesired situation is the inclusion of a sequence recognized by the restriction enzyme BbvI in the connection of two symbols; an instance is presented in Fig. 14A. The fourth, analogous, undesired situation occurs if a sequence recognized by the restriction enzyme BbvI is included in the connection of two symbols "reversed by 180°"; an example is shown in Fig. 14B.
(A)
5'-TAGGCAGCTTAT-3'
3'-ATCCGTCGAATA-5'
(B)
5'-TCCGCTGCGGGG-3'
3'-AGGCGACGCCCC-5'
Figure 14: Undesired situations: (A) a sequence recognized by the restriction enzyme BbvI is included in the ligation of two symbols; (B) a sequence recognized by the restriction enzyme BbvI is included in the ligation of two symbols "reversed by 180°".

The appearance of any of the four undesired situations can lead to an undesired action of the biomolecular automaton. In connection with these situations, it is necessary to examine, respectively:
1. whether each word $x \in D$ satisfies the condition: $\sim(e_x \subseteq x)$,
2. whether each word $x \in D$ satisfies the condition: $\sim((\kappa(e_x))^{-1} \subseteq x)$, where $\sim((\kappa(e_x))^{-1} \subseteq x)$ means that a sequence recognized by the restriction enzyme BbvI cannot be included in the symbol "reversed by 180°" and relativized solely to the one considered strand in the direction 5'-3',
3. whether the concatenation $z = xy$ of any words $x \in D$ and $y \in D$ satisfies the condition: $\sim(e_x \subseteq z)$,
4. whether the concatenation $z = xy$ of any words $x \in D$ and $y \in D$ satisfies the condition: $\sim((\kappa(e_x))^{-1} \subseteq z)$, where $\sim((\kappa(e_x))^{-1} \subseteq z)$ means that a sequence recognized by the restriction enzyme BbvI cannot be included in the ligation of two symbols "reversed by 180°" and relativized only to the one considered strand in the direction 5'-3'.
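The four checks can be sketched in Python as follows. The sketch assumes that BbvI recognizes 5'-GCAGC-3' (the sequence visible in Fig. 13A and Fig. 14A) and that a symbol "reversed by 180°", read on the same strand, corresponds to an occurrence of the reverse complement GCTGC (as in Fig. 13B and Fig. 14B). All function names are hypothetical and are not taken from the paper.

```python
from itertools import product
from typing import Iterable

BBVI_SITE = "GCAGC"  # recognition sequence of BbvI, read 5'-3'
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(word: str) -> str:
    """Complement every nucleotide and reverse the result."""
    return word.translate(_COMPLEMENT)[::-1]


def contains_site(word: str, site: str = BBVI_SITE) -> bool:
    """True if the site or its reverse complement occurs in the 5'-3' strand."""
    return site in word or reverse_complement(site) in word


def is_admissible(words: Iterable[str], site: str = BBVI_SITE) -> bool:
    """Check situations 1-4 for a candidate set D of 5'-3' strands."""
    words = list(words)
    # Situations 1 and 2: the site inside a single symbol.
    if any(contains_site(x, site) for x in words):
        return False
    # Situations 3 and 4: the site inside the ligation z = xy of two symbols.
    return not any(contains_site(x + y, site) for x, y in product(words, repeat=2))


print(contains_site("GCAGCG"))              # True  (Fig. 13A)
print(contains_site("AGCTGC"))              # True  (Fig. 13B)
print(is_admissible(["TAGGCA", "GCTTAT"]))  # False (Fig. 14A)
print(is_admissible(["TCCGCT", "GCGGGG"]))  # False (Fig. 14B)
print(is_admissible(["TCGCTA", "CACTGC"]))  # True  (symbols 1 and 40 of Tab. 4)
```

Checking both the site and its reverse complement on a single strand mirrors the phrase "relativized solely to one considered strand": the complementary strand never has to be constructed for the test. Under the same assumptions the function can be run over the Strand 1 column of Tab. 4 as a sanity check.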
If one of the four undesired situations is detected, we return to checking another possibility of assigning the elements of the set $C$ to the sets $G_1$, $G_2$ and $G_3$. When all the possible assignments of the elements of the set $C$ to the sets $G_1$, $G_2$ and $G_3$ have been checked, we return to examining another element of the family $P(A_1)$. If all the elements of the family $P(A_1)$ have been checked and it has been found that it is impossible to obtain 40 symbols free of undesired situations, the algorithm returns the message: "Unable to obtain 40 symbols for a biomolecular automaton using the restriction enzyme BbvI". If the set $D$ has 40 words formed from the elements of the sets $G_1$, $G_2$, $G_3$ and not a single undesired situation occurs, then we move on to the last stage.
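The control flow of this backtracking can be sketched as below. The sketch is deliberately generic: the enumeration of the family $P(A_1)$ and of the assignments of $C$ to $G_1$, $G_2$, $G_3$ is not restated here, so the candidates are passed in as an already ordered iterable `assignments`, and the generating and checking stages are supplied as the hypothetical callables `generate` and `is_admissible`. Treating a candidate as successful only when exactly `target` words are obtained is an assumption.

```python
from typing import Callable, Iterable, Sequence, Set, Tuple, Union

Assignment = Tuple[Sequence[str], Sequence[str], Sequence[str]]

# Message quoted from the text; it names 40 symbols regardless of `target`.
FAILURE = ("Unable to obtain 40 symbols for a biomolecular automaton "
           "using the restriction enzyme BbvI")


def search_symbols(
    assignments: Iterable[Assignment],
    generate: Callable[[Sequence[str], Sequence[str], Sequence[str]], Set[str]],
    is_admissible: Callable[[Set[str]], bool],
    target: int = 40,
) -> Union[Set[str], str]:
    """Return the first admissible set D of `target` words, else the message."""
    for g1, g2, g3 in assignments:
        d = generate(g1, g2, g3)   # generating stage
        if len(d) != target:
            continue               # not exactly `target` words: next assignment
        if is_admissible(d):
            return d               # no undesired situation: go to the last stage
    return FAILURE                 # every candidate has been checked
```

Passing the stages in as callables keeps the loop independent of how words are assembled or which restriction enzyme is checked.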
At the last stage we determine 40 complementary words, one for each word of the set $D$, making use of the function of complementarity of words. In this way we acquire pairs of words which denote a strand of the symbol in the direction 5'-3' and a strand of the symbol in the direction 3'-5', respectively. On the basis of the presented conception of the algorithm, 40 words (strands in the direction 5'-3') were generated, denoted as Strand 1 in Tab. 4, as well as 40 words (strands in the direction 3'-5'), denoted as Strand 2 in Tab. 4. This also answers the open question asked in 2005: it is possible to generate 40 symbols for a biomolecular automaton using the restriction enzyme BbvI, in which the symbols are coded by means of 6 pairs of nucleotides.

Table 4: List of 40 symbols (Strand 1 with its complementary Strand 2) consisting of 6 bp obtained for a biomolecular automaton using restriction enzyme BbvI.

| No | Strand 1 | Strand 2 | No | Strand 1 | Strand 2 |
|----|----------|----------|----|----------|----------|
| 1 | TCGCTA | AGCGAT | 21 | GCCCGC | CGGGCG |
| 2 | CGTTCG | GCAAGC | 22 | CGCCAG | GCGGTC |
| 3 | ATTGAT | TAACTA | 23 | ATGGGT | TACCCA |
| 4 | CGAGTA | GCTCAT | 24 | GGTAGG | CCATCC |
| 5 | CAGGGG | GTCCCC | 25 | AGGTTA | TCCAAT |
| 6 | TAGATA | ATCTAT | 26 | GCTGTG | CGACAC |
| 7 | ATAGTT | TATCAA | 27 | GTGTAT | CACATA |
| 8 | GTTTTG | CAAAAC | 28 | TATTTA | ATAAAT |
| 9 | TTGTTG | AACAAC | 29 | TTACGA | AATGCT |
| 10 | TTGGTG | AACCAC | 30 | CGACTT | GCTGAA |
| 11 | GTGCCG | CACGGC | 31 | CTTCCG | GAAGGC |
| 12 | CTAATG | GATTAC | 32 | CCGTCT | GGCAGA |
| 13 | ATGCGT | TACGCA | 33 | TCTTAT | AGAATA |
| 14 | CGTGAG | GCACTC | 34 | TATGTC | ATACAG |
| 15 | GAGCAA | CTCGTT | 35 | GTCATC | CAGTAG |
| 16 | CAAGCC | GTTCGG | 36 | ATCGGT | TAGCCA |
| 17 | GCCTTT | CGGAAA | 37 | GGTCCT | CCAGGA |
| 18 | TTTCTG | AAAGAC | 38 | CCTCTC | GGAGAG |
| 19 | CTGAAT | GACTTA | 39 | CTCCAC | GAGGTG |
| 20 | AATCCC | TTAGGG | 40 | CACTGC | GTGACG |
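As a small illustration of the last stage, the following Python sketch derives Strand 2 from Strand 1 by nucleotide-wise complementation (A with T, C with G) without reversal, which is how the pairs listed in Tab. 4 are related. The name `complement` is a hypothetical stand-in for the paper's function of complementarity of words.

```python
_PAIRING = str.maketrans("ACGT", "TGCA")  # Watson-Crick pairing A-T, C-G


def complement(word: str) -> str:
    """Return the complementary strand, read in the direction 3'-5'."""
    return word.translate(_PAIRING)


strand1 = ["TCGCTA", "CGTTCG", "CACTGC"]  # symbols 1, 2 and 40 of Tab. 4
pairs = [(w, complement(w)) for w in strand1]
print(pairs)
# [('TCGCTA', 'AGCGAT'), ('CGTTCG', 'GCAAGC'), ('CACTGC', 'GTGACG')]
```

Because the second strand is written 3'-5', no reversal is applied here; a reversal would only be needed to read that strand in the direction 5'-3'.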
4 Conclusions
The considerations developed in this work aim, on the one hand, to answer the open question posed by Soreni and co-workers (Soreni et al. 2005) concerning the possibility of indicating 40 symbols for a biomolecular automaton which makes use of the restriction enzyme BbvI, in which symbols are coded by means of 6 pairs of nucleotides. On the other hand, they characterize the idea of an algorithm which makes it possible to generate 40 symbols for a biomolecular automaton using the restriction enzyme BbvI.

Let us note that the open question posed by Soreni and co-workers (Soreni et al. 2005) can be generalized in three possible ways: (1) to symbols coded with the use of a different number of pairs of nucleotides, (2) to any other restriction enzyme used in a biomolecular automaton, and (3) to the use of more than one restriction enzyme in a biomolecular automaton. Thus, in generalizing the question, one can consider: (1) the possibility of generating the maximal number of symbols (coded by $n$ pairs of nucleotides) for a biomolecular automaton using one restriction enzyme, (2) the possibility of generating the maximal number of symbols (coded by $n$ pairs of nucleotides) for a biomolecular automaton using more than one restriction enzyme, and (3) the possibility of an algorithmic approach in each of the two indicated cases.

In this way it is possible to raise two general problems, which are relativized to the number of restriction enzymes used in a biomolecular automaton and require working out relevant algorithms. Problem 1: generate the maximal number of symbols (coded by $n$ pairs of nucleotides) for a biomolecular automaton using one restriction enzyme. Problem 2: generate the maximal number of symbols (coded by $n$ pairs of nucleotides) for a biomolecular automaton using more than one restriction enzyme.

The above-posed problems require considering and defining the conditions which must be imposed on the relations between the restriction enzyme, symbols and other elements which are components of a biomolecular automaton. The starting conditions which ought to be taken into account in these relations are those included in the works of Krasiński et al. (2013) and Sakowski et al. (2017). Taking these conditions into account will make it possible to determine all the indispensable conditions needed to elaborate algorithms that solve both general problems mentioned above. Solving these general problems will make it possible to develop algorithms for generating symbols, which is important for the laboratory implementation of biomolecular automata.
References
[1] Adleman, L. (1994). Molecular computation of solutions to combinatorial problems. Science, 266, 1021-1024.
https://doi.org/10.1126/science.7973651
[2] Benenson, Y., Paz-Elizur, T., Adar, R., Keinan, E., Livneh, Z., & Shapiro, E. (2001). Programmable and autonomous computing machine made of biomolecules. Nature, 414, 430-434.
https://doi.org/10.1038/35106533
[3] Benenson, Y., Adar, R., Paz-Elizur, T., Livneh, Z., & Shapiro, E. (2003). DNA molecule provides a computing machine with both data and fuel. PNAS, 100, 2191-2196.
https://doi.org/10.1073/pnas.0535624100
[4] Benenson, Y., Gil, B., Ben-Dor, U., Adar, R., Shapiro, E. (2004). An autonomous molecular computer for logical control of gene expression. Nature, 429, 423–429.
https://doi.org/10.1038/nature02551
[5] Bennett, Ch. (1982). The Thermodynamics of computation – a Review. International Journal of Theoretical Physics, 21(12), 905-940.
https://doi.org/10.1007/BF02084158
[6] Bennett, Ch., & Landauer, R. (1985). The fundamental physical limits of computation. Scientific American, 253, 48–56.
https://doi.org/10.1038/scientificamerican0785-48
[7] Chen, P., Jing, L., Jian, Z., Lin, H., Zhizhou, Z. (2007). Differential dependence on DNA ligase of type II restriction enzymes: a practical way toward ligase-free DNA automaton. Biochem. and Bioph. Research Communications, 353, 733-737.
https://doi.org/10.1016/j.bbrc.2006.12.082
[8] Feynman, R. P. (1961). There's plenty of room at the bottom. In D. Gilbert (Ed.), Miniaturization, Reinhold, 282–296.
[9] Gopinath, A., Miyazono, E., Faraon, A., Rothemund, P. W. K. (2016). Engineering and mapping nanocavity emission via precision placement of DNA origami. Nature, 535, 401-405.
https://doi.org/10.1038/nature18287
[10] Krasiński, T., Sakowski, S., Waldmajer, J., Popławski, T. (2013). Arithmetical analysis of biomolecular finite automaton. Fundamenta Informaticae, 128, 463-474.
https://doi.org/10.3233/FI-2013-953
[11] Ran, T., Douek, Y., Milo, L., & Shapiro, E. (2012). A programmable NOR-based device for transcription profile analysis. Scientific reports, 2, 641.
https://doi.org/10.1038/srep00641
[12] Rothemund, P. W. K. (1995). DNA and restriction enzyme implementation of Turing machines. Discrete Mathematics and Theoretical Computer Science, 27, 75-120.
https://doi.org/10.1090/dimacs/027/06
[13] Rothemund, P. W., Papadakis, N., & Winfree, E. (2004). Algorithmic self-assembly of DNA Sierpinski triangles. PLoS biology, 2(12), 2041-2053.
https://doi.org/10.1371/journal.pbio.0020424
[14] Rothemund, P. W. K. (2006). Folding DNA to Create Nanoscale Shapes and Patterns. Nature, 440, 297-302.
https://doi.org/10.1038/nature04586
[15] Seeman, N. (2001). DNA Nicks and Nodes and Nanotechnology. Nano Letters, 1, 22-26.
https://doi.org/10.1021/nl000182v
[16] Sakowski, S., Krasiński, T., Sarnik, J., Blasiak, J., Waldmajer, J., Poplawski, T. (2017). A detailed experimental study of a DNA computer with two endonucleases. Zeitschrift für Naturforschung C, 72(7-8), 303-313.
https://doi.org/10.1515/znc-2016-0137
[17] Sakowski, S., Krasinski, T., Waldmajer, J., Sarnik, J., Blasiak, J., & Poplawski, T. (2017). Biomolecular computers with multiple restriction enzymes. Genetics and molecular biology, 40(4), 860-870.
https://doi.org/10.1590/1678-4685-gmb-2016-0132
[18] Soreni, M., Yogev, S., Kossoy, E., Shoham, Y., Keinan, E. (2005). Parallel biomolecular computation on surfaces with advanced finite automata. Journal of the American Chemical Society, 127, 3935-3943.
https://doi.org/10.1021/ja047168v
[19] Unold, O., Troć, M., Dobosz, T., Trusiewicz, A. (2004). Extended molecular computing model. WSEAS Transactions on Biology and Biomedicine, 1, 15-19.
[20] Waldmajer, J., Bonikowski, Z., Sakowski, S. (2019). Theory of tailor automata. Theoretical Computer Science, 785, 60-82.
https://doi.org/10.1016/j.tcs.2019.02.002
[21] Whitesides, G. M., Mathias, J. P., & Seto, C. T. (1991). Molecular self-assembly and nanochemistry: a chemical strategy for the synthesis of nanostructures. Science, 254(5036), 1312-1319.
https://doi.org/10.1126/science.1962191