Informatica 41 (2017) 305–315 305 
Accelerating XML Query Processing on Views 
Yin-Fu Huang and Yu-Hsien Cho 
National Yunlin University of Science and Technology 
123 University Road, Section 3, Touliu, Yunlin, Taiwan 640, R.O.C. 
E-mail: huangyf@yuntech.edu.tw, http://mdb.csie.yuntech.edu.tw 
 
Keywords: XPath, labeling schemes, materialized views, T-Bitmap, tag index, value index, navigation, twig query 
Received: April 13, 2016 
 
With the widespread use of the eXtensible Markup Language (XML), more and more applications store 
and query XML documents in XML database systems. Thus, how to efficiently process a query and find 
the specified patterns conforming the query from XML documents is a crucial issue. In this paper, some 
processing methods are employed on XML documents to improve document retrieval. First, a 
materialized view is built from an original document for each query. Then, on each materialized view, 
auxiliary structures such as T-Bitmap and indexes are also built to further accelerate query processing. 
Finally, four experiments are conducted to show the superiority of the proposed approach. 
Povzetek: Predstavljena je metoda za hitrejše iskanje po bazah XML dokumentov. 
1 Introduction 
Since XML (eXtensible Markup Language) was widely 
used to exchange data over the web, more and more 
applications store and query XML documents in XML 
database systems. Different from other data formats, an 
XML document is composed of elements and values with 
a nested structure, and could be modeled as a tree 
structure. XPath and XQuery are the standard XML 
query languages proposed by W3C. They can be used to 
describe patterns with specified predicates on multiple 
elements with tree structured relationships. However, 
how to efficiently process a query and find the specified 
patterns conforming the query from XML documents is a 
crucial issue. 
In the past, different methods have been proposed in 
querying XML documents. One of research directions 
was to build materialized views on XML documents. The 
goal is to reduce the number of visited nodes during tree 
traversing by searching from the root of a materialized 
view, rather than from the root of an original XML 
document tree. Another research direction was to 
construct index or access methods to query XML 
documents for facilitating query processing. In this 
paper, we integrate the methods from these two research 
directions as our motivation for accelerating XML query 
processing on views. The reason is that the performance 
of a materialized view is better than a non-materialized 
view because not only these data can be accessed without 
re-materialization, but also they can be fetched faster by 
building indexes on these data beforehand. Besides, a 
materialized view is usually used in accessing a large 
amount of data, such as data warehouse applications, in 
support of management’s decision-making process 
through OLAP queries, almost read operations In short, 
the motivation is for decision makers to accelerate XML 
query processing in a data warehouse. 
In summary, we highlight the contributions of this 
paper as follows: 
1) In this study, we build materialized views from an 
XML document for each query to reduce the search 
space of queries, and also build auxiliary structures 
such as T-Bitmap and indexes to further accelerate query 
processing. 
2) Comprehensive experiments are conducted to verify 
the superiority of the proposed approach. 
3) The space vs. time issue is explored when multiple 
materialized views are integrated together to save the 
space. 
The remainder of this paper is organized as follows. 
Section 2 presents the previous work proposed in 
querying XML documents. In Section 3, basic concepts 
such as query processing and materialized views on 
XML documents are introduced. In Section 4, we 
propose a system architecture consisting of view 
processing and query processing. In Section 5, four 
experiments are conducted to show the superiority of our 
approach. Finally, we make conclusions in Section 6. 
2 Previous work 
As mentioned in Section 1, one research direction on 
querying XML documents was to build materialized 
views on XML documents to reduce the number of 
visited nodes during tree traversing, thereby leading to 
faster query processing. Godfrey et al. [1], and Murthy 
and Banerjee [2] proposed SQL/XML syntax for query 
processing on views, whereas Halevy [3] and Jayavel et 
al. [4] proposed various query syntax such as join to 
handle views and focused on the problem of evaluating 
XML queries over XML views of relational data. 
However, users must be familiar with these various query 
syntax. Katsifodimos et al. [5] considered choosing the 
306 Informatica 41 (2017) 305–315 Y.-F. Huang et al. 
best views to materialize within a given space budget to 
improve the performance of a query. Roantree and Liu 
[6] approach is to segment a materialized view into 
fragments to minimize the effect of view changes. 
Bonifati et al. [7] presented an algebraic approach for 
propagating source updates to materialized views. Wu et 
al. [8, 9] proposed a bitmapped materialized views 
approach for optimizing XML queries. Gosain et al. [10] 
provided a survey of materialized view evolution 
methods, which aims at studying the materialized view 
evolution in relational databases and data warehouses as 
well as in a distributed setting. Gosain and Sachdeva [11] 
drew several conclusions about the status quo of 
materialized view selection and a future outlook is 
predicted on bridging the large gaps that were found in 
the existing methods. 
Another research direction was to construct index or 
access methods to query XML documents, also 
improving query processing. Some studies investigated 
constructing index methods to query XML documents 
[12-15]. Bruno et al. [12] and Jiang et al. [13] used a 
structure join method to determine element relationships 
based on the numbering scheme. This method has good 
performances for an ancestor-descendant axis, but it 
might fetch useless nodes for a parent-child axis, because 
all descendant nodes must be accessed to check if they 
are real children. Therefore, Huang and Wang developed 
an efficient query processing algorithm for retrieving 
XML documents [14]. Hsu et al. also proposed a path 
clustering method based on the concept of summary 
indexes for the processing of both structural and content 
queries on XML documents [15]. Karthiga and 
Gunasekaran [16] used tree-based association rules to 
mine the semantics from XML documents, which 
provide information on both the structure and the content 
of XML documents. The mined knowledge is used to 
provide the quick answers to queries and an approach 
called path based indexing is used to improve the speed 
of data retrieval. Alghamdi et al. [17], and Thi Le et al. 
[18] proposed approaches to optimizing twig queries by 
utilizing the semantics/constraints defined in XML 
schemas. Furthermore, Ordonez focused on the 
optimization of linear recursive queries in SQL [19]. 
Subramaniam and Haw [20] proposed an XML labeling 
scheme that helps quick determination of structural 
relationship among XML nodes and supports dynamic 
updates without relabeling nodes in case of update 
occurrences. Belgamwar et al. [21] follows an upside 
down approach which explicitly stores the values and 
only reconstructs the internal nodes, if needed. As a 
solution, they proposed a compressed internal storage 
format for native XML database systems where the inner 
structure of the gathered documents is virtualized. Ferro 
and Silvello [22] introduced a new paradigm where 
traditional approaches based on traversing trees are 
replaced by a brand new one based on basic set 
operations which directly return the desired subtree, 
avoiding to create it. Tudor [23] proposed an 
optimization model for XML data processing based on a 
heuristic algorithm to extract data from XPath views. 
3 Basic concepts 
3.1 XML documents 
XML is a markup language which was proposed by W3C 
in 1996. The main purpose of the standard language is to 
provide data descriptions and data exchanges across 
different platforms. Like other markup languages, the 
contexts of XML are declared between start and end tags; 
however, especially different from others, the tags can be 
flexibly defined by users to describe data, and 
furthermore XML is supported in different platforms and 
systems. That is why it becomes the most common 
format for data exchanges. 
An XML document is with a nested structure, and it 
could be represented as a rooted, ordered, and labeled 
tree structure. Figure 1 and Figure 2 illustrate an XML 
document and its corresponding tree representation, 
respectively. In the document, there is a unique root 
element called “root” and one of the descendant 
elements, called “Book”, has seven child element nodes; 
i.e., Title, Chapter, Para, Author with an attribute node 
“Id”, Publisher, Name, Email, and their texts. The 
symbols as shown in Figure 2 are circles, rectangles, and 
triangles; they represent elements, texts, and attributes, 
respectively. 
 
<?xml version="1.0" standalone="yes"?>
<root>
   <store>
      <Books category="Technology">
         <Book>
            <Title>How to know XML</Title>
            <Chapter>
               Introduction to XML
               <Para>Your First XML</Para>
            </Chapter>
            <Author Id="Q345">John</Author>
            <Publisher>
               <Name>XML tech</Name>
               <Email>John@hpdiy.zzn.com</Email>
            </Publisher>
         </Book>
         <Book>
              
         </Book>
      </Books>
   </store>
</root>
...
 
Figure 1: XML document. 
3.2 XPath 
XPath (XML Path Language) is an expression language 
for addressing and querying an XML document. In 
XPath expressions, each step is separated by "/" and 
contains three components: axis, node test, and 
predicate. Axis defines the relationship to be followed in 
the document tree. Node test defines what kind of nodes 
Accelerating XML Query Processing on Views Informatica 41 (2017) 305–315 307 
is required. Predicate is optional and provides the 
capability to filter nodes, according to selection criteria. 
Given an XPath example “//child::Publisher 
[child::Name='XML tech'] /child::Email” , it is to get the 
email of the publisher whose name is “XML tech”. When 
navigating the XML document, it must start from the 
root element “root”, then the descendant node 
“Publisher”. Beneath “Publisher”, we search the child 
nodes to find the node called “Email”. Besides, during 
the search, it must have a child node called “Name” 
whose text matches with the specified predicate “XML 
tech”. In general, the example above can be abbreviated 
to “//Publisher [Name ='XML tech'] /Email”. 
 
 
Figure 2: XML document tree. 
3.3 Labeling schemes 
One of the major query searches is to determine the 
relationships between nodes. In order to determine 
element relationships quickly, several different labeling 
schemes have been proposed. O'Connor and Roantree 
categorized labeling schemes into containment schemes, 
prefix schemes and prime number schemes [24]. Here, 
labeling schemes are classified into prefix-based ones 
and region-based ones (or containment schemes). 
Dewey code [25] is a prefix-based labeling scheme 
that records the position information of a node, according 
to the path from the root to the node. For example, 
Dewey-id of node “Para” is 1.1.1.1.2.2, and indicates that 
we can get node “Para” if we search alone the path (the 
first node of level 1, the first node of level 2, the first 
node of level 3, the first node of level 4, the second node 
of level 5, the second node of level 6). Besides, since 
(1.1.1.1.2) is the prefix of (1.1.1.1.2.2), the relationship 
between node “Chapter” (1.1.1.1.2) and “Para” 
(1.1.1.1.2.2) can be deduced as a parent-child one. 
However, the drawback of the prefix-based labeling 
scheme is its lengthy Dewey codes, especially when the 
levels of an XML document tree are too deep. 
The region-based labeling scheme [12] is another 
numbering scheme. The label contains three elements 
(start, end, level) where the start value and end value 
forms a region. The region of an upper-level node (i.e., 
ancestor or parent) must cover those of lower-level nodes 
(i.e. children or descendants). In other words, if node A 
covers node B, then A.start < B.start and B.end < A.end. 
Besides, the level value represents the node level in a 
document tree. With the coverage information, we can 
determine the relationships between nodes quickly. As 
for the labeling, we can label each node by traversing an 
XML document tree in a depth-first search way. 
3.4 XML document storage 
An XML documents can be stored in a few different 
forms, such as in flat files, in relational databases, and in 
native XML databases. For an XML document to be 
stored in flat files, we need to parse the files in advance 
before accessing them. Although it is the simplest form, 
the parsing time would be very lengthy when the XML 
document size is too large. Besides, it also incurs multi-
user access and concurrency control problems. For an 
XML document to be stored in relational database, since 
the XML document is a tree structure, it must use some 
middleware to translate the XML format into relational 
tables. Besides, when querying the XML document, it is 
also necessary to translate a query into an SQL 
statement, and execute join operations repeatedly among 
different relation tables, so that it exposes lower 
efficiency. Native XML databases aim to provide 
complete XML document storage and manipulation. 
Different from other database systems, native XML 
databases use an XML document as a basic unit of 
storage, and defines an XML model used to store and 
retrieve XML documents. 
3.5 Materialized views 
A view is a virtual and derived table defined by users for 
facilitating to express a complicated query. Rather than 
physically stored as parts of a database, a view definition 
is merely recorded by the database system. It is evaluated 
only when a user issues a query involving this view. 
However, a materialized view is the one which is 
physically stored in the database, in addition to its 
definition. Absolutely, the performance of a materialized 
view is better than a non-materialized view because not 
only these data can be accessed without re-
materialization, but also they can be fetched faster by 
building indexes on these data beforehand. Thus, a 
materialized view is usually used in accessing a large 
amount of data, such as a data warehouse or in business 
intelligence applications, where we need to take more 
time to query them. A data warehouse is a subject-
oriented, integrated, time-variant, and nonvolatile 
collection of data in support of management’s decision-
making process. It can be accessed by decision makers 
through OLAP queries, almost read operations. In short, 
our design is for decision makers to accelerate XML 
query processing in a data warehouse. 
In this paper, based on native XML databases, we 
use materialized views to query required data from an 
original document. Here, a materialized view can be 
defined using the “CREATE MATERIALIZED VIEW” 
308 Informatica 41 (2017) 305–315 Y.-F. Huang et al. 
function and an XPath expression. For the materialized 
views on an original document, we build auxiliary files 
and construct indexes using numbering schemes to avoid 
unnecessary sub-tree traversal, thereby improving the 
navigation efficiency of a query. 
4 System architecture 
4.1 Overview 
In order to achieve faster query processing on the views 
defined in a native XML database, we propose a system 
architecture consisting of an offline phase and an online 
phase, as shown in Figure 3. In the offline phase called 
view processing, we build view-relevant structures such 
as T-Bitmap and indexes to accelerate later query 
processing. In the online phase called query processing, 
the system can promptly respond to view-based queries, 
utilizing the T-Bitmap and indexes built beforehand. 
 
 
Figure 3: System architecture. 
4.2 View processing 
In this section, the motivation of view materialization is 
introduced first. Then, we build relevant structures such 
as T-Bitmap and indexes on materialized views to further 
accelerate query processing. 
4.2.1 View pre-processing 
Usually, an Xpath expression is used to address and 
query an XML document. However, for the query 
execution, the system always searches an XML 
document tree from the root. When a query is frequently 
executed, the system performance would be degraded 
since a large amount of unnecessary sub-tree traversal 
cannot be avoided. For the query with an Xpath 
expression as shown in Figure 4, we can define a 
materialized view beforehand, which is rooted from node 
“Books” with an attribute “category” matching with the 
specified predicate “Technology”, as shown in Figure 5. 
Then, the materialized view can be created from the 
original document, as shown in Figure 6. Thus, rather 
than traversing the original document tree always from 
the root, the system only needs to search the materialized 
view, thereby improving the navigation efficiency of the 
query. 
 
 
Figure 4: Query with an XPath expression. 
 
Figure 5: View definition. 
<?xml version="1.0" standalone="yes"?>
      <Books category="Technology">
         <Book>
            <Title>How to know XML</Title>
            <Chapter>
               Introduction to XML
               <Para>Your First XML</Para>
            </Chapter>
            <Author Id="Q345">John</Author>
            <Publisher>
               <Name>XML tech</Name>
               <Email>John@hpdiy.zzn.com</Email>
            </Publisher>
         </Book>
         <Book>
            <Title>Small World</Title>
            <Chapter>
               Q&A
               <Para>The One</Para>
            </Chapter>
            <Author Id="A854">Jimmy</Author>
            <Publisher>
               <Name>Network</Name>
               <Email>Jimmy@hpdiy.zzn.com</Email>
            </Publisher>
         </Book>
      </Books>
...
  
Figure 6: Materialized view. 
Before building T-Bitmap and indexes on a 
materialized view to further accelerate query processing, 
we must determine the relationships between nodes (i.e., 
parent-child axes and ancestor-descendant axes) in a 
materialized view using the region-based labeling 
scheme as mentioned in Section 3.3. We traverse a 
materialized view and label nodes in a depth-first search 
way. When a node is visited first, its start value is 
created; when we leave the node, the end value is 
labeled. After traversing the whole materialized view, all 
the nodes in the view are completely labeled as shown in 
Figure 7. 
 
CREATE MATERIALIZED VIEW mv AS( 
SELECT extract(sys_nc_rowinfo$, 
'/root/store/Books[@category="Technology"]') 
FROM XMLTABLE); 
XPath:/root/store/Books[@category="Technology"] 
/Book[Title="How to know XML"]/Publisher 
/Email=‘John@hpdiy.zzn.com’ 
Accelerating XML Query Processing on Views Informatica 41 (2017) 305–315 309 
Book(1,25)
Title(2,4)
How to know 
XML
(3,3)
Introduction 
to XML
(6,6)
Para(7,9)
Your First 
XML
(8,8)
Chapter(5,10)
Id(12,14)
Q345
(13,13)
John(15,15)
Author(11,16)
Publisher(17,24)
Name(18,20)
XML tech
(19,19)
John@hpdiy.zzn.com
(22,22)
Email(21,23)
Books(0,251)
…
 
Figure 7: Labeling nodes in a depth-first search way. 
4.2.2 Building T-Bitmap 
T-Bitmap is a bit string type, which is used to record 
what descendant nodes are beneath a current node. First, 
a dictionary recording the positions in T-Bitmap and the 
corresponding tags is created, as shown in Table 1. Then, 
the T-Bitmap value on each node can be calculated using 
OR operators. For an example as shown in Figure 8, the 
T-Bitmap of node “Publisher” can be calculated by 
combining the T-Bitmaps of “Name”, “Email”, and itself 
using OR operators. 
Table 1: Dictionary: positions in T-Bitmap and 
corresponding tags. 
Position 0 1 2 3 
Tag Books Book Title Chapter 
Position 4 5 6 7 
Tag Para Author Id Publisher 
Position 8 9 - - 
Tag Name Email - - 
 
        Name         0000000010
OR   Email         0000000001
-----------------------------------------------------
                            0000000011
OR  Publisher   0000000100
-----------------------------------------------------
                                 0000000111
                         (b)
Title(2,4)
(0010000000)
Chapter
(5,10) Author(11,16) Publisher(17,24)
Book(1,25)
(0111111111)
Para(7,9)
(0000100000)
Id(12,14)
(0000001000)
Name(18,20)
(0000000010)
Email(21,23)
(0000000001)
(0000000111)(0000011000)(0001100000)
OR operator
OR operator
Level2
Level3
Level4
(a)
Level1
...
Books(0,251)
(1111111111)
OR operator
 
Figure 8: Combing T-Bitmaps using OR operators. 
4.2.3 Building index 
Here, we use labeling codes to build two kinds of index 
trees; i.e., tag index trees and value index trees. To 
illustrate the tag index construction, we extend the 
storage model in Figure 7. As shown in Figure 9, we can 
see a lot of nodes with the same tag names but with the 
different labeling codes; e.g., node “Email(21, 23)” and 
“Email(46, 48)”. We can build the index tree of each tag 
using the start values in labeling codes as keys and the 
well-known B+-tree algorithm, as shown in Figure 10 
where the pointers of a leaf node in the tag index tree 
indicate the positions of corresponding nodes in the 
materialized view. For the query with an XPath: 
“Book//Email”, when processing the current node 
“Book(26, 50)”, we can use the tag index tree of “Email” 
to locate each leaf node by following the dotted path, and 
find out node “Email(46, 48)” covered by node 
“Book(26, 50)”. 
  
Figure 9: Extended storage model. 
  
Figure 10: Tag index tree. 
Besides, we can also build a value index tree 
according to the text values of nodes in the document, as 
shown in Figure 11. The construction method is the same 
as that used to build tag index trees. However, we 
generate only one value index tree for each materialized 
view, and the records of a leaf node are with the [text, 
start, pointer] format where the pointers also indicate the 
positions of corresponding nodes in the materialized 
view. 
4.3 Query processing 
In this section, the query transformation based on a 
materialized view is introduced first. Then, according to 
different axes specified in the transformed query, we 
make use of the T-Bitmap and indexes built in the view 
processing to accelerate query processing. Finally, we 
also introduce subsequent processing for a query 
specifying the particular predicate in an Xpath 
expression. 
John,15
Introduction to XML,6 Q345,13 The One,33
pA854,38
pHow to know XML,3
pIntroduction to XML,6
pJimmy@hpdiy.zzn.com,47
pJimmy,40
pJohn,15
pNetwork,44
pJohn@hpdiy.zzn.com,22
pQ345,13
pSmall World,28
pQ&A,31
pThe One,33
PYour First XML,8
pXML,19
 
Figure 11: Value index tree. 
310 Informatica 41 (2017) 305–315 Y.-F. Huang et al. 
4.3.1 Query pre-processing 
After materialized views are created, a query should be 
transformed based on its corresponding materialized 
view. For the query as shown in Figure 4, its Xpath 
expression (traversing from node root) is transformed 
into a new one (traversing from node “Books”) as shown 
in Figure 12. 
 
 
Figure 12: Query transformation based on a materialized 
view. 
Before utilizing T-Bitmap and indexes to accelerate 
query processing, we must deal with parent-child axes 
and/or ancestor-descendant axes specified in the 
transformed query. We execute navigation or indexing 
according to different axes, and recursively check nodes 
in the document tree. 
4.3.2 Navigation 
For a parent-child axis, we use a navigation way to 
search nodes in the document tree. Here, T-Bitmap can 
be used to avoid unnecessary search during the 
navigation, since it provides the information whether 
result nodes are beneath the current processing node. For 
a query “/R/A/C” as shown in Figure 13, we can use an 
AND operator to determine whether node C is beneath 
the current processing node. First, for current node R, 
Query(11010)^R(11111)=Query(11010) indicates node 
C is beneath node R. Then, for the next node A1, 
Query(01010)^A1(01100)Query(01010) indicates node 
C cannot be beneath node A1, and we do not need to 
search the sub-tree rooted at node A1. Next, for node A2, 
Query(01010)^A2(01110)=Query(01010) indicates node 
C is beneath node A2. Finally, we recursively check 
nodes in the document tree until node C is found. 
4.3.3 Indexing 
For a query “/A//B∙ ∙ ∙” specifying an ancestor-
descendant axis as shown in Figure 14, although we can 
also use T-Bitmap to search node B, the search based on 
a parent-child axis would go through a lot of unnecessary 
intermediate nodes. Therefore, we use indexes to directly 
search a descendant node, instead of using T-Bitmap. As 
mentioned in Section 4.2.3, we use the start value in the 
labeling code of the ancestor as the key to search the tag 
index tree of the descendant. Then, we check whether the 
descendant node is covered by the ancestor node; if yes, 
we fetch the descendant node and proceed to parse the 
query downward. 
4.3.4 Subsequent processing 
In this section, we investigate the processing for the 
query specifying a particular predicate. One is the query 
specifying values, and another is the twig query. 
4.3.4.1 Query specifying values 
For a query “/Books/Book[Author = ‘Jimmy’]/Publisher 
= ‘XML tech’ ”, two values are specified for filtering 
nodes in the document. As shown in Figure 15, we use 
the value index tree as mentioned in Section 4.2.3 to find 
out value nodes “Jimmy(12)”, “XML tech1(7)”, and 
“XML tech2(15)”, and then put them into their 
corresponding queues, respectively. When processing 
node “Book” in the query, we fetch node “Book1”, and 
find that node “Jimmy(12)” is not covered by node 
“Book1”; i.e., [2,9] < 12. Then, for the next node 
“Book2”, although node “Book2” covers node 
“Jimmy(12)”, node “XML tech1(7)” is not covered by 
node “Book2”. Then, we choose the next value node 
“XML tech2(15)” and do the same inspection. Finally, 
we find out node “Book2” is the result node. 
Book1
1111
Books
Book2
John
XML
tech1
Jimmy
XML
tech2
0111 0111
0010 0001 0010 0001
Author1 Author2Publisher1 Publisher2
(1,18)
(2,9) (10,17)
(3,5) (6,8) (11,13) (14,16)
4 7 12 15
Book
Jimmy
0111
0010 0001
Author Publisher
Query: /Books/Book[Author = ‘Jimmy’]/Publisher = ‘XML tech’
BookBooks PublisherAuthor
T-Bitmap
XML
tech
Jimmy
XML
tech2
XML
tech1
V1 V2
V1 queue V2 queue
Books 1111
 
Figure 15: Value node processing. 
SELECT extract(sys_nc_rowinfo$,'/Books 
/Book[Title="How to know XML"]/Publisher 
/Email=' John@hpdiy.zzn.com '') 
FROM mv; 
R
A
C
Query
11010
01010
00010
R
A3A2A1
B B C B D
11111
01100
00100 00100 00010 00100 00001
01110 01101
R A B C D
T-Bitmap
 
Figure 13: Navigation using T-Bitmap. 
A
B
.......
......
Ancestor-descendant 
relationship
intermedia nodes
 
Figure 14: Unnecessary intermediate nodes. 
Accelerating XML Query Processing on Views Informatica 41 (2017) 305–315 311 
4.3.4.2 Twig Query 
Besides, for a query “/Book[Publisher]/Author” as shown 
in Figure 16(a), we can also process it in the same way as 
done in Section 4.3.2. As shown in Figure 16(b), two 
solutions “Book, Publisher, Author1, Jim” and “Book, 
Publisher, Author2, Jimmy” exist in the document. They 
can be found by 1) merging Path1 and Path2, and 2) 
merging Path1 and Path3, as shown in Figure 16(c). 
 
Book
Publisher Author
Book
Publisher Author1 Author2
Jim Jimmy
PublisherBook Author
T-Bitmap
Book
Publisher
Book
Author1
Book
Author2
twig query
/Book[Publisher]/Author
(a)
XML document
(b) (c)
Path 1 Path 2 Path 3
 
Figure 16: Twig query processing. 
5 Experiments 
In this section, four experiments are conducted to show 
the superiority of our approach proposed in this paper. 
These experiments are written in Java (JDK1.7) and 
conducted on an Intel Pentium4 3GHz CPU with 3G 
main memory in Windows 7. In the first experiment, we 
present the comparisons among different methods. In the 
second experiment, we investigate the effect of query 
types on different methods, especially on our method. In 
the third experiment, we use synthesis documents to 
analyze our method, and try to find some characteristics. 
In the last experiment, we address the space vs. time 
issue if multiple materialized views can be integrated 
together to save the space; in other words, more than one 
query would search from a materialized view. 
5.1 Comparisons among different methods 
In this experiment, we compare the search ways in 
different methods as shown in Figure 17. The first 
method is the original search way which is always from 
the root of a document tree. The second method was 
proposed by Godfrey et al. [1], which searches from the 
root of a materialized view. The last method is ours 
which also searches from the root of a materialized view, 
but with the aid of auxiliary data structures. 
To fairly compare with the method proposed by 
Godfrey et al., an XML benchmark available on the 
XMark site is used in the experiment, which has data size 
113.794MB and 1,513,518 nodes. Also, we follow the 
similar query types and comparison ways used in the 
experiments conducted by Godfrey et al. As shown in 
Table 2, there are twelve different types of queries tested 
in the experiment. To contrast with the searching ways as 
shown in Figure 17, the columns as shown in Table 3 are 
1) query types, 2) searching time on the original tree, 3) 
view creation time, 4) searching time using Godfrey et 
al.' method, and 5) searching time using our method. The 
experimental results show that our method performs 
much better than the Godfrey et al.' method in all the 
queries, especially for long-path and twig queries Q3, 
Q7, Q8, Q9, and Q11. This is why our method uses the 
T-Bitmap and index structures to accelerate query 
processing. 
5.2 Effect of query types on different 
methods 
In this experiment, we use the same XML benchmark as 
the first experiment, but different queries as shown in 
Table 4. For the searching axis, Q1 and Q2 are based on 
the parent-child axis, Q3 and Q4 on the ancestor-
descendant axis, and the others on mixed axes. For the 
query types, Q1, Q3, Q5, Q6, and Q7 are path queries, 
whereas the others are twig queries. Furthermore, Q1, 
Q3, Q4, Q5, Q6, Q7, and Q10 are with value predicates. 
In this experiment, as shown in Table 5, our method 
is still the best one among different methods in all the 
queries. Even taking the worst case Q8 as an example, 
our method is 206 times faster than the original way, and 
88 times faster than the Godfrey et al.' method. The 
reason is, as shown in Table 6, the number of nodes 
visited for Q8 in our method is only 1/28 times of the 
original way, and is only 1/10 times of the Godfrey et al.' 
method. 
For the path queries (i.e., Q1, Q3, Q5, Q6, and Q7), 
both execution time of the Godfrey et al.' method and 
ours is less than one second. For the twig queries (i.e., 
Q2, Q8, and Q9), they cost more execution time than the 
path queries since a large amount of nodes are visited. 
However, for the similar twig queries (i.e., Q4 and Q10), 
they do not cost much execution time since only a very 
small amount of nodes are required to visit. In summary, 
the advantages of our method are 1) when dealing with 
twig queries, we only need to check T-Bitmap and skip 
an entire sub-tree if not matched, and 2) when dealing 
with the queries with value predicates, we can use the 
value index tree to achieve efficient processing. 
From begining to end
rootOriginal search
Index search
Godfrey method Our method
 
Figure 17: Search ways in different methods. 
 
312 Informatica 41 (2017) 305–315 Y.-F. Huang et al. 
5.3 Experiments on synthesis documents 
In this experiment, six synthesis documents with 
different fanout are used to analyze our method for three 
different types of queries as shown in Table 7. Q1 is 
based on the parent-child axis, Q2 is based on the 
ancestor-descendant axis, and Q3 is a twig query with 
three predicates and based on mixed axes. As shown in 
Table 8, we find that the execution time of each query 
increases as the fanout increases. Moreover, regardless of 
the complexity in Q3, it still costs almost the same time 
as Q1 and Q2 using our method; i.e., its execution time 
would not increase significantly even if it is a twig query 
with three predicates. However, for Q3 using the original 
way and/or the Godfrey et al.' method, their execution 
time increases seriously as shown in Table 9. Especially 
for the Godfrey et al.' method, the execution time for 
fanout30 is 202 times slower than that for fanout5. 
5.4 Experiments on space vs. time 
In this experiment, we explore the space vs. time issue 
when multiple materialized views are integrated together. 
In order to achieve the premise that more than one query 
can search from a materialized view, we reuse seven 
queries (i.e., Q3, Q4, Q5, Q6, Q7, Q8 and Q10) as shown 
in Table 4. Then, we find that 1) Q3, Q6, and Q7 can 
search from the materialized view built based on Q3, 2) 
Q4 and Q10 can search from the materialized view built 
based on Q4, and 3) Q5 and Q8 can search from the 
materialized view built based on Q8. Thus, after the 
integration, we have three materialized views for these 
seven queries. The data space and execution time 
between no integration and integration are shown in 
Table 10. Taking the group (Q3, Q6, Q7) as an example, 
the overall data space is 3+1+3=7(KB) and the total 
execution time is 3+2+12=17(ms) if each query has its 
own materialized view; however, after the integration, 
the data space is only 3(KB), but the total execution time 
increases to 3+5+26=34(ms). 
In order to explore the relationship between data 
space and execution time for these two strategies (i.e., 
no-integration and integration), we define two terms: 1) 
amount ratio for data space and 2) speed ratio for 
execution time as follows. 
Table 2: Twelve kinds of queries. 
Q1 /site/regions/europe/item/mailbox/mail 
Q2 /site//item/mailbox/mail 
Q3 /site//africa/item/description/parlist/listitem 
Q4 /site//person/profile/interest[@category] 
Q5 /site//person/profile[age]/interest[@category="category620"] 
Q6 /site//person/profile[contains(age,"18")]/education 
Q7 /site//open_auction[@id="open_auction5"]//date 
Q8 /site//category/description[text]/parlist/listitem 
Q9 /site//category/description[text/keyword]/parlist/listitem 
Q10 /site/*/*/item/mailbox/mail 
Q11 /site//*//africa/item/name 
Q12 /site//item/mailbox[count(mail)] 
Table 3: Comparisons among different methods. 
Time(ms) 
Query Original tree 
% View 
creation 
* View non-
indexed 
# View 
indexed 
Q1 193718 30131 64980 5752 
Q2 195146 32147 84574 20599 
Q3 310315 31137 100130 593 
Q4 185361 32890 99890 29436 
Q5 193474 31147 147344 6865 
Q6 191034 29702 65248 9106 
Q7 203447 27112 132522 279 
Q8 212511 28317 60353 232 
Q9 241417 30169 64384 916 
Q10 185357 31603 80727 19811 
Q11 199274 28417 65059 320 
Q12 209115 31731 99283 20687 
  % View creation: Views created for both methods 
  * View non-indexed: Godfrey et al.' method 
  # View indexed: Ours 
 
Accelerating XML Query Processing on Views Informatica 41 (2017) 305–315 313 
( )
( )
strategy
space strategy
amount ratio
space original
                         (1) 
( )
( )
strategy
time strategy
speed ratio
time original
                               (2) 
where space(strategy) is the overall data space used in 
the no-integration or integration strategies, time(strategy) 
is the total execution time required in the no-integration 
or integration strategies, space(original) is the data space 
of the original document, and time(original) is the 
execution time on the original document. 
Then, we can use these two terms to judge which 
strategy is better for the system performance as follows. 
integration no integration
no integration integration
amount ratio speed ratio
amount ratio speed ratio


 

 
 
(3) 
or  
( ) ( )
( ) ( )
space integration time no integration
space no integration time integration



  (4) 
For Equation (4), the former term represents that the 
data space benefits for the integration strategy can be 
Table 4: Different kinds of queries. 
Q1 
Q2 
/site/open_auctions/open_auction[@id="open_auction5"]/initial 
/site/open_auctions/open_auction[annotation/author]/bidder/date 
Q3 
Q4 
//site//open_auctions//open_auction[@id="open_auction0"]//current 
//person[@id="person0"][creditcard]//watch 
Q5 
Q6 
Q7 
Q8 
Q9 
Q10 
/site/regions//item[@id="item0"]//mail 
//open_auction[@id="open_auction0"]/bidder/date 
/site/open_auctions/open_auction[@id="open_auction0"]/../end 
/site/regions//item[//text/bold]//location 
//closed_auctions/closed_auction[//description/text]/seller 
//people/person[@id="person0"][//business]/name 
Table 5: Execution time. 
Time(ms) 
Query Original tree 
View 
creation 
View non-
indexed 
View 
indexed 
Q1 31818 14920 193 8 
Q2 68769 15731 37949 1883 
Q3 29718 11787 140 3 
Q4 30989 10056 103 3 
Q5 27976 10705 119 3 
Q6 30123 11723 48 2 
Q7 32680 12135 139 12 
Q8 4344137 222996 1853537 21111 
Q9 1560306 246697 201444 16185 
Q10 28425 10374 70 2 
Table 6: Number of nodes visited. 
Nodes 
Query Original tree 
View non-
indexed 
View 
indexed 
Q1 48005 57 5 
Q2 259604 87331 23794 
Q3 24001 64 3 
Q4 31502 24 8 
Q5 34124 50 3 
Q6 25960 8 5 
Q7 24005 64 3 
Q8 1204343 432651 43502 
Q9 736217 102736 39003 
Q10 25647 24 5 
Table 7: Three kinds of queries. 
Q1 /root/L1/R 
Q2 //root//R 
Q3 /root//L1[//R][//Q][//S] 
 
314 Informatica 41 (2017) 305–315 Y.-F. Huang et al. 
gained (i.e., data space reduced) whereas the latter term 
represents that the execution time benefits for the no-
integration strategy can be gained (i.e., execution time 
reduced). If Equation (4) with the equal weighting 
between space and time is true, the integration strategy 
should be adopted. According to the data taken from 
Table 10, we can calculate the former term and latter 
term in Equation (4), as shown in Table 11. From the 
statistical data, we find that the integration strategy is 
better for group (Q3, Q6, Q7) and group (Q4, Q10), but 
the no-integration strategy is better for group (Q5, Q8). 
Absolutely, different strategies can be adopted for 
different query groups at the same time to make the 
system performance in the best status. 
6 Conclusions 
In this paper, we employ some processing methods on 
XML documents to improve document retrieval. The 
goal is to reduce the number of visited nodes during tree 
traversing, thereby leading to faster query processing. To 
achieve this goal, we focus on the usage of database 
views. First, we build a materialized view from an 
original document for each query. Then, on each 
materialized view, we also build auxiliary structures such 
as T-Bitmap and indexes to further accelerate query 
processing. According to different axes specified in an 
Xpath expression, we have different techniques to handle 
them. Finally, through the experiments, we 1) compare 
the performances among different methods, 2) 
investigate the effect of query types on them, 3) use 
Table 8: Comparisons for different fanout in our method. 
Time(ms) 
 Q1 Q2 Q3 
Fanout5 358 326 392 
Fanout10 571 554 569 
Fanout15 1065 969 1023 
Fanout20 1514 1432 1486 
Fanout25 2164 2023 2137 
Fanout30 2701 2579 2658 
Table 9: Comparisons for different fanout in different methods. 
Q3 Time(ms) 
 
Original tree 
View non-
indexed 
View 
indexed 
Fanout5 143618 419 392 
Fanout10 278615 796 569 
Fanout15 436672 2700 1023 
Fanout20 655059 10815 1486 
Fanout25 678379 38339 2137 
Fanout30 879231 84779 2658 
Table 10: Comparisons between no integration and integration. 
No integration Space(KB) Time(ms) 
Q3 3 3 
Q4 1 3 
Q5 2 3 
Q6 1 2 
Q7 3 12 
Q8 56226 21111 
Q10 1 2 
 
Integration Space(KB) Time(ms) 
(Q3, Q6, Q7) 3 3+5+26 
(Q4, Q10) 1 3+2 
(Q5, Q8) 56226 16491+21111 
Table 11: Comparisons based on Equation (4). 
 (Q3, Q6, Q7) (Q4, Q10) (Q5, Q8) 
Former term 3/7=0.43 1/2=0.5 56226/56228=1 
Latter term 17/34=0.5 5/5=1 21114/37602=0.56 
 
Accelerating XML Query Processing on Views Informatica 41 (2017) 305–315 315 
synthesis documents to analyze our method, and 4) 
address the space vs. time issue if materialized views are 
integrated together. 
References 
[1]  Godfrey P, Gryz J, Hoppe A, et al. Query rewrites 
with views for XML in DB2. In: Ioannidis Y, Lee 
D, Ng R, eds. Proceedings of the IEEE 25th 
International Conference on Data Engineering, 
Shanghai, China, 2009. 1339-1350 
[2]  Murthy R, Banerjee S. XML schemas in Oracle 
XML DB. In: VLDB Endowment, eds. Proceedings 
of the 29th International Conference on Very Large 
Data Bases, Berlin, Germany, 2003. 1009-1018 
[3]  Halevy Y. Answering queries using views: a 
survey. Very Large Data Bases Journal, 2001, 10: 
270-294 
[4]  Jayavel S, Jerry K, Eugene S, et al. Querying XML 
views of relational data. In: VLDB Endowment, 
eds. Proceedings of the 27th International 
Conference on Very Large Data Bases, Rome, Italy, 
2001. 261-270 
[5]  Katsifodimos A, Manolescu I, Vassalos V. 
Materialized view selection for XQuery workloads. 
In: Fuxman A, eds. Proceedings of ACM SIGMOD 
International Conference on Management of Data, 
Scottsdale, Arizona, USA, 2012. 565-576 
[6]  Roantree M, Liu J. A heuristic approach to 
selecting views for materialization. Software: 
Practice and Experience, 2014, 44: 1157-1179 
[7]  Bonifati A, Goodfellow M, Manolescu I, et al. 
Algebraic incremental maintenance of XML views. 
ACM Transactions on Database Systems, 2013, 38: 
14:1-14:45 
[8]  Wu X, Theodoratos D, Wang W H, et al. 
Optimizing XML queries: bitmapped materialized 
views vs. indexes. Information Systems, 2013, 38: 
863-884 
[9]  Wu X, Theodoratos D, Kementsietsidis A. 
Configuring bitmap materialized views for 
optimizing XML queries. World Wide Web, 2015, 
18: 607-632 
[10]  Gosain A, Sabharwal S, Gupta R. Architecture 
based materialized view evolution: a review. 
Procedia Computer Science, 2015, 48:256-262 
[11]  Gosain A, Sachdeva K. A systematic review on 
materialized view selection. In: Satapathy S et al., 
eds. Proceedings of the 5th International 
Conference on Frontiers in Intelligent Computing: 
Theory and Applications. Advances in Intelligent 
Systems and Computing, Vol. 515. Springer, 
Singapore, 2017. 663-671 
[12]  Bruno N, Koudas N, Srivastava D. Holistic twig 
joins: optimal XML pattern matching. In: Franklin 
M J, eds. Proceedings of ACM SIGMOD 
International Conference on Management of Data, 
Madison, WI, USA, 2002. 310-321 
[13]  Jiang H, Wang W, Lu H, et al. Holistic twig joins 
on indexed XML documents. In: VLDB 
Endowment, eds. Proceedings of the 29th 
International Conference on Very Large Data 
Bases, Berlin, Germany, 2003. 273-284 
[14]  Huang Y F, Wang S H. An efficient XML 
processing based on combining T-Bitmap and index 
techniques. In: Biaz S, Bellaachia A, eds. 
Proceedings of the IEEE International Symposium 
on Computers and Communication, Marrakech, 
Morocco, 2008. 858-863 
[15]  Hsu W C, Liao I E, Wu S Y, et al. An efficient 
XML indexing method based on path clustering. In: 
Alhajj R S, eds. Proceedings of the 20th IASTED 
International Conference on Modeling and 
Simulation, Banff, Alberta, Canada, 2009. 339-344 
[16]  Karthiga D, Gunasekaran S. Optimization of query 
processing in XML document using TAR and path 
based indexing. International Journal of Computer 
Science and Network Security, 2013, 13: 119-127 
[17]  Alghamdi N S, Rahayu W, Pardede E. Object-based 
semantic partitioning for XML twig query 
optimization. In: Barolli L et al., eds. Proceedings 
of the IEEE 27th International Conference on 
Advanced Information Networking and 
Applications, Barcelona, Spain, 2013. 846-853 
[18]  Thi Le D X, Maghaydah M, Orgun M A, et al. 
Optimization of XML queries by using semantics in 
XML schemas and the document structure. In: Lin 
X et al., eds. Proceedings of the 14th International 
Conference on Web Information Systems 
Engineering, Nanjing, China, 2013. 343-353 
[19]  Ordonez C. Optimization of linear recursive queries 
in SQL. IEEE Transactions on Knowledge and Data 
Engineering, 2010, 22: 264-277 
[20]  Subramaniam S, Haw S C. ME labeling: a robust 
hybrid scheme for dynamic update in XML 
databases. In: Ismail M, Ramli N, eds. Proceedings 
of the IEEE 2nd International Symposium on 
Telecommunication Technologies, Langkawi, 
Malaysia, 2014. 126-131 
[21]  Belgamwar H C, Dhore S M, Rathod P U, 
Deshmukh S S, Nandanwar G S. Review on storing 
and indexing XML documents upside down. 
International Journal for Engineering Applications 
and Technology, 2015, Manthan-15 
[22]  Ferro N, Silvello G. Descendants, ancestors, 
children and parent: a set-based approach to 
efficiently address XPath primitives. Information 
Processing & Management, 2016, 52:399-429 
[23]  Tudor N L. Query optimization against XML data. 
Studies in Informatics and Control, 2016, 25:173-
180 
[24]  O’Connor M F, Roantree M. Desirable properties 
for XML update mechanisms. In: Proceedings of 
the 2010 EDBT/ICDT Workshops, Lausanne, 
Switzerland, 2010 
[25]  Lu J, Ling T W, Chan C Y, et al. From region 
encoding to extended Dewey: on efficient 
processing of XML twig pattern matching. In: 
VLDB Endowment, eds. Proceedings of the 31st 
International Conference on Very Large Data 
Bases, Trondheim, Norway, 2005. 193-204