https://doi.org/10.31449/inf.v43i4.2725 Informatica 43 (2019) 485–494 485 A Solution to the Problem of the Maximal Number of Symbols for Biomolecular Computer Jacek Waldmajer Institute of Computer Science, University of Opole, Oleska 48, 45-052 Opole, Poland E-mail: jwaldmajer@uni.opole.pl Sebastian Sakowski Faculty of Mathematics and Computer Science, University of Lodz, Banacha 22, 90-238 Lodz, Poland E-mail: sebastian.sakowski@wmii.uni.lodz.pl Keywords: biomolecular computer, biomolecular systems, DNA computing Received: March 15, 2019 The authors present a solution to the problem of generating the maximum possible number of symbols for a biomolecular computer using restriction enzyme BbvI and ligase as the hardware, and transition molecules built of double-stranded DNA as the software. The presented solution offers an answer to the open question, in the algorithm form, of the maximal number of symbols for a biomolecular computer that makes use of the restriction enzymeBbvI. Povzetek: Razvit je nov naˇ cin izraˇ cuna najveˇ cjega števila simbolov za biomulekularni raˇ cunalnik. 1 Introduction The beginnings of research into possibilities of applying biomolecules to control biological systems, and also to construct computers, are to be found in theoretical works of the 1960s (Feynman 1961). Then, in the 1980s, Charles Bennett (1982, Bennett and Landauer 1985) pointed to po- tential possibilities of application of biomolecules to con- struct energy-efficient nanodevices. However, the world had to wait to see the first practical experiments realiz- ing simple calculations with the use of biochemical reac- tions until the mid-1990s, when Leonard Adleman (1994) solved the problem of Hamilton’s path in graph, using ex- clusively a biomolecule for this purpose. Successive re- search revealed the possibility of spontaneous formation of multidimensional structures built from biomolecules, which were made with the use of the conception of self- assembly (Whitesides et al. 1991, Seeman 2001, Gopinath et al. 2016). The multidimensional DNA structures made it possible to realize fractals, e.g., ones of Sierpi´ nski tri- angle type (Rothemund 2004), which revealed a great po- tential in calculations based on self-assembly. In 2006, Paul Rothemund (2006) made use of self-assembling DNA molecules to obtain different multidimensional biomolec- ular structures. Properly prepared DNA molecules also made it possible to carry out a theoretical simulation of Tur- ing machine (Rothemund 1995). Prior to this, in 2001 (Be- nenson et al. 2001) a practically acting non-deterministic finite automaton based on such DNA molecules, restriction enzyme FokI and DNA ligase was presented. In succes- sive research, it was proved experimentally that such an automaton can work without the use of ligase enzyme (Be- nenson et al. 2003, Chen et al. 2007) and its complex- ities were extended in practical experiments, ones under- stood as the number of states using numerous restriction enzymes (Sakowski et al. 2017). It is worth adding that it was with success that laboratory experiments were car- ried out, in which this biomolecular system was applied to medical diagnosis and treatment (Benenson et al. 2004) and also to simple logical inference (Ran 2009). In an- other work which dealt with possibilities of applying DNA molecules, a challenge was taken up to not only increase the number of states of such an automaton (Unold et al. 2004), but also that of symbols possible for an automaton built from DNA (Soreni et al. 2005). Moreover, presented the notion of biomolecular automaton, informally charac- terized in the papers of Rothemund (1995), Benenson et al. (2001), Soreni et al (2005), was presented in a formal way (as a mathematical model called a tailor automaton in a new theory of tailor automata) in the paper Waldmajer et al. (2019). In the above-mentioned work, Soreni and co-workers (Soreni et al. 2005) put forward a 3-state 3-symbol biomolecular automaton which used the restriction enzyme BbvI as well as considered the problem of determining the maximal number of symbols for the constructed biomolec- ular automaton. On the basis of the conducted assessment they pointed out that it is possible to construct 40 symbols, each of which is composed of 6 pairs of nucleotides. How- ever, in their work, they pointed to merely 37 such symbols, including one which was erroneously determined. Conse- quently, they opened the following issue (p. 3937): It is still an open question whether the maximal number of 6- bp sequences that produce distinct 4-bp sticky ends in both 486 Informatica 43 (2019) 485–494 J. Waldmajer et al. strands is 40. It is with reference to this open question that the authors of the present work undertook and man- aged to solve the problem mentioned by Soreni et al. in their work (2005) through: (1) indicating 40 symbols (see Tab. 4) which make the solution to the open problem, (2) proposing the idea of working of an algorithm that enables to generate 40 symbols for a biomolecular automaton us- ing the restriction enzyme BbvI, and (3) formulating two general problems in the sphere of generating symbols for biomolecular automata which use one restriction enzyme (among which a biomolecular automaton using the restric- tion enzymeBbvI is a particular case) and more than one restriction enzyme. The second section presents the idea of constructing and working of a 3-state 3-symbol biomolecular automaton us- ing the restriction enzymeBbvI as presented by Soreni and co-workers in their work (Soreni et al. 2005). In the third section the conception of working of an algorithm generat- ing the maximal number of symbols for a biomolecular au- tomaton using the restriction enzymeBbvI was presented together with a discussion of various undesired situations which may occur in the course of working of a biomolecu- lar automaton that makes use of one restriction enzyme (in particular for the restriction enzymeBbvI). In the last sec- tion, there were formulated two general problems of gener- ating the maximal number of symbols for a certain class of biomolecular automata using one or more than one restric- tion enzymes. 2 Biomolecular finite automaton and the idea of its actions In this section, we make a presentation of the 3-state 3- symbol biomolecular finite DNA automaton (see Fig. 1), which was presented by Soreni and co-workers (Soreni et al. 2005). The automaton uses the restriction enzyme BbvI, ligase enzyme and DNA double-stranded fragments (input molecule, set of transition molecules and set of de- tection molecules). The double-stranded DNA fragments include the adenine, cytosine, guanine, and thymine bases marked as A, C, G and T, respectively. - s 2 ? b,c - a b s 0 ? a - c s 1 ? a,b,c Figure 1: Graph representing a 3-state 3-symbol determin- istic finite automatonM 1 . The task of the BbvI restriction enzyme is to cut the double-stranded DNA after recognizing a specific sequence (see Fig. 2A) in the double-stranded DNA. The BbvI restriction enzyme will cut the double- stranded DNA after the 8th nucleotide in the DNA strand in the 5 0 -3 0 direction and after the 12th nucleotide in the DNA strand in the 3 0 -5 0 direction from the recognized specific ? 6 . . - 8 . . . . - 12 . . - base pair (A) 5 0 - GCAGC -3 0 3 0 - CG T CG -5 0 (B) 5 0 - GCAGC T T AAA T C T GGC T T -3 0 3 0 - CG T CGAA T T T AGAC CGAA -5 0 Figure 2: (A) Specific sequence recognized by the BbvI restriction enzyme. (B) The action of restriction endonu- clease:BbvI. sequence (see Fig. 2B). The task of the ligase enzyme is to ligate the two double- stranded DNAs having complementary sticky ends (see Fig. 4A and 4B), where a sticky end is a single-stranded DNA at the end of a double-stranded DNA. In the given sense, the sticky end ‘TTTA’ of a single-stranded DNA (see Fig. 4A) is complementary to a sticky end ‘AAAT’ of the other double-stranded DNA (see Fig. 4B). The result of their ligation is one double-stranded DNA (see Fig. 4C). Both the restriction enzymeBbvI and the ligase enzyme play the key role in the action of a biomolecular automa- ton, determining, respectively: the operation of cutting of a fragment of the double-stranded DNA and the operation of ligating of two fragments of double-stranded DNAs. The input molecule (see Fig. 3) is a double-stranded DNA fragment in which it is possible to distinguish the following three basic parts: the input word x consisting of the symbols a, b and c (x = acb), the terminal sym- bol and the base sequence. At the both ends of the input molecule there occur additional base pairs and their occur- rence is determined by the properties related to the action of the restriction enzyme. To construct an input word of the 3-state 3-symbol deter- ministic finite automaton, the following three symbols:a,b andc (see Fig. 5) were used. These symbols were coded by means of six base pairs. Besides the aforementioned sym- bols, the additional terminal symbolt was introduced. This symbol is coded by means of the same number of base pairs as the symbolsa,b andc. This symbol was used to acquire an output molecule which is used to determine whether the automaton has finished acting in the required state and has accepted the input wordx. The base sequence consists of a certain number of base pairs, contains a specific sequence recognizable by the BbvI restriction enzyme, and makes it possible to define the start state by determining the cut place of the input molecule by the BbvI restriction enzyme (cf. Fig. 3 and Fig. 2B). Let us note that the term “base sequence” did not appear in work Soreni et al. (2005). Introducing this term is meant to clearly determine the manner of setting the start state of a biomolecular automaton. According to the idea contained in the work of Soreni and co-workers A Solution to the Problem of the. . . Informatica 43 (2019) 485–494 487 | {z } Abp | {z } base sequence | {z } wordx | {z } terminal symbol | {z } Abp a - c - b - t - 22 bp 21 bp 5 0 - GCAGC T T AAA T C T GGC T T GCGA T GAG T GA T G T CGC -3 0 3 0 - CG T CGAA T T T AGAC CGAACGC T AC T CAC T ACAGCG -5 0 Figure 3: Input molecule containing the input wordx =acb; Abp – Additional base pairs. Table 1: Connection of the statess 0 ,s 1 ands 2 of a biomolecular automaton with the permanent cut places of the symbols a,b,c andt of the biomolecular automaton. state symbola symbolb symbolc symbolt s 0 5 0 - C T GGC T -3 0 3 0 -GAC CGA-5 0 5 0 -GAG T GA-3 0 3 0 -C T CAC T -5 0 5 0 - T GCGA T -3 0 3 0 -ACGC T A-5 0 5 0 - T G T CGC -3 0 3 0 -ACAGCG-5 0 s 1 5 0 - C T GGC T -3 0 3 0 -GAC CGA-5 0 5 0 -GAG T GA-3 0 3 0 -C T CAC T -5 0 5 0 - T GCGA T -3 0 3 0 -ACGC T A-5 0 5 0 - T G T CGC -3 0 3 0 -ACAGCG-5 0 s 2 5 0 - C T GGC T -3 0 3 0 -GAC CGA-5 0 5 0 -GAG T GA-3 0 3 0 -C T CAC T -5 0 5 0 - T GCGA T -3 0 3 0 -ACGC T A-5 0 5 0 - T G T CGC -3 0 3 0 -ACAGCG-5 0 . . - sticky end . . . . . . . 5 0 - GCAGC T T -3 0 (A) 3 0 - CG T CGAA T T T A -5 0 . . - sticky end . . 5 0 - AAA T C T GGC T T G -3 0 (B) 3 0 - GAC CGAAC -5 0 5 0 - GCAGC T T AAA T C T GGC T T G -3 0 (C) 3 0 - CG T CGAA T T T AGAC CGAAC -5 0 Figure 4: (A) and (B) Double-stranded DNAs with the complementary sticky ends. (C) The result of a ligation be- tween the double-stranded DNAs with the complementary sticky ends included in (A) and (B). a 5 0 - C T GGC T -3 0 3 0 -GAC CGA -5 0 b 5 0 - GAG T GA -3 0 3 0 - C T CAC T -5 0 c 5 0 - T GCGA T -3 0 3 0 -ACGC T A -5 0 t 5 0 - T G T CGC -3 0 3 0 - ACAGCG -5 0 Figure 5: Symbolsa,b,c and the terminal symbolt. (Soreni et al. 2005), the reading of a symbol in a certain state of the automaton is identified with the cutting of the double-stranded DNA by the BbvI restriction enzyme in the area of a symbol, in a determined (permanent) place of the DNA strand, in the 5 0 -3 0 direction and in a determined (permanent) place of the DNA strand in the 3 0 -5 0 direction. Tab. 1 presents a connection between the states and two permanent cut places of the symbols. In accordance with the input molecule presented on Fig. 3, the first cutting input molecule with the use of the restric- tion enzymeBbvI will follow in the area of the symbola, which corresponds to the states 2 . In this way, in the state s 2 , the symbola was read and a fragment of DNA was ob- tained as presented on Fig. 6. In this sense, the states 2 is a initial state. Adding to the base sequence of one or two pairs of nucleotides can set the start state to be the follow- ing:s 1 ors 0 , respectively. c - b - t - 21 bp 5 0 -GGC T T GCGA T GAG T GA T G T CGC -3 0 3 0 - ACGC T AC T CAC T ACAGCG -5 0 Figure 6: Double-stranded DNA fragment obtained after BbvI acting on the input molecule. The set of transition molecules is used to implement a set of transitions in the 3-state 3-symbol deterministic fi- nite automaton. We obtain transition from one state to the other (the same or another state), upon reading a symbol, through ligating with the use of ligase enzyme, of a DNA fragment obtained on Fig. 6 with one of the transition molecules. Each transition molecule contains a specific se- quence recognizable by the BbvI restriction enzyme and the additional base pairs. Exemplary transition molecules are presented in Fig. 7: the transition molecule presented on Fig. 7A enables transition from the state s 0 to that of s 1 after reading the symbolc; the transition molecule pre- sented on Fig. 7B enables transition from the states 1 to that of s 1 after reading the symbol b; the transition molecule presented on Fig. 7C enables transition from the states 2 to that ofs 0 after reading the symbola. For each state, one detection molecule is constructed and thus a set of detection molecules is specified (see Fig. 8). It should be noted that the detection molecules have different numbers of additional base pairs, which makes it possible to determine laboratorily the state in which the automaton finished its action. The beginning of the work: the BbvI, ligase enzyme, many copies of the transition molecules and many copies 488 Informatica 43 (2019) 485–494 J. Waldmajer et al. (A) 35 bp 5 0 - GCAGC T -3 0 3 0 - CG T CGAACGC -5 0 (B) 39 bp 5 0 - GCAGC T T -3 0 3 0 - CG T CGAA T CAC -5 0 (C) 35 bp 5 0 - GCAGC T A T T -3 0 3 0 - CG T CGA T AAC CGA -5 0 Figure 7: Selected transition molecules used in the transi- tion function: (A)T 1 : (s 0 ,c)!s 1 , (B)T 2 : (s 1 ,b)!s 1 , (C)T 3 : (s 2 ,a)!s 0 . (A) 54 bp 5 0 - -3 0 3 0 - ACAG-5 0 (B) 200 bp 5 0 - -3 0 3 0 - CAGC-5 0 (C) 300 bp 5 0 - -3 0 3 0 - AGCG-5 0 Figure 8: (A) Detection moleculeD 1 for the states 0 . (B) Detection molecule D 2 for the state s 1 . (C) Detection moleculeD 3 for states 2 . of the detection molecules are placed in a laboratory tube; the final addition is many copies of the input molecule. Af- ter these elements have been mixed in the test tube, the biomolecular automaton starts its action. In successive steps there follows reading of the symbol a in the state s 2 (see Fig. 9a), making use of the transition molecule shown in Fig. 7C to transition from the state s 2 to that ofs 0 after reading the symbola (see Fig. 9b), reading of the symbolc in the states 0 (see Fig. 9c), using the transi- tion molecule presented in Fig. 7A to transition from the states 0 to the states 1 after reading the symbolc (see Fig. 9d) reading the symbolb in the states 1 (see Fig. 9e), using the transition molecule presented in Fig. 7B and reading the terminal symbol t in the state s 1 (see. Fig. 9f-g). In the last step there follows ligation of a fragment of double- stranded DNA presented in Fig. 9g with one of the detec- tion molecules (see Fig. 8B). As a result of ligation of these DNA fragments an output molecule is formed (see Fig. 9h), which – from the laboratory point of view – serves to de- termine the end state of a biomolecular finite automaton. 3 Algorithm for the problem of the maximal number of symbols 3.1 The formal aparatus used in the description of the algorithm Let the set = fA, C, G, Tg and the function , which is bijection of the set on , which is defined in the follow- ing way: (A) = T, (T) = A, (C) = G and (G) = C be given. The set is called a set of nucleotides, the elements of the set are called nucleotides, and the function is called complementarity of nucleotides. We call any finite sequence of nucleotides of the set as a word. The wordx which is the sequenceX 1 ;X 2 ;:::;X j of nucleotides of the set (X i 2 , 0 < i j2 N) is written as followsx = X 1 X 2 :::X j . The number of the elements of the sequencex is called the length of the word x (denoted symbolically:jxj), while thei-th nucleotide of the wordx (thei-th element of the wordx) asx(i). The set of all the words formed from the nucleotides of the set , whose length is greater than zero, is denoted as + . Let + 3x =X 1 X 2 :::X j (X i 2 , 0