https://doi.org/10.31449/inf.v47i3.4742    Informatica 47 (2023) 315–326

An Automatic Labeling Method for Subword-Phrase Recognition in Effective Text Classification

Yusuke Kimura 1, Takahiro Komamizu 2 and Kenji Hatano 3
1 Graduate School of Culture and Information Science, Doshisha University, Japan
2 Mathematical and Data Science Center, Nagoya University, Japan
3 Faculty of Culture and Information Science, Doshisha University, Japan
E-mail: usk@acm.org, taka-coma@acm.org, hatano@acm.org

Keywords: text classification, subword-phrase, multi-task learning

Received: March 16, 2023

Deep learning-based text classification methods perform better than traditional ones. In addition to the success of deep learning techniques, multi-task learning (MTL) has become a promising approach for text classification; for instance, an MTL approach to text classification that employs named entity recognition as an auxiliary task has shown that the auxiliary task helps to improve classification performance. Existing MTL-based text classification methods depend on auxiliary tasks that use supervised labels. Obtaining such supervision labels requires additional human and financial costs on top of those for the main text classification task. To reduce these additional costs, we propose an MTL-based text classification framework with automatic supervision label creation, which labels phrases in texts for the auxiliary recognition task. The basic idea underlying the proposed framework is to utilize phrasal expressions consisting of subwords (called subword-phrases). To the best of our knowledge, no text classification approach has been designed on top of subword-phrases, because subwords only sometimes express a coherent set of meanings. The novelty of the proposed framework lies in adding subword-phrase recognition as an auxiliary task and utilizing subword-phrases for text classification. The framework extracts subword-phrases in an unsupervised manner using a statistical approach. To construct labels for an effective subword-phrase recognition task, the extracted subword-phrases are classified based on document classes so that subword-phrases dedicated to particular classes are distinguishable. An experimental evaluation of text classification on five popular datasets showcased the effectiveness of subword-phrase recognition as an auxiliary task. A comparison of various labeling schemes also provided insights into how to label subword-phrases that are common to several document classes.

Povzetek (translated from Slovenian): Deep learning and multi-task learning are applied to text classification, with subword-phrases used for automatic labeling.

1 Introduction

Text classification is a fundamental technology that has been studied for a long time. Applications that use text classification include hate speech detection [7], categorizing daily news articles, and unfair clause detection in terms of service [15]. These text classification applications are achieved by effectively and efficiently retrieving information from large amounts of text [12, 23]. Text classification is a supervised learning task in which labels, such as categories and classes, are manually assigned to documents as classification criteria. A classifier learns the classification criteria in a feature space based on the dataset. Traditionally, text classification uses hand-crafted features such as term frequency-inverse document frequency.
In the recent literature, deep learning-based technologies have achieved significantly improved classification performance. A component that has improved text classification performance in recent years is pre-trained neural language models such as BERT, which have been trained on vast amounts of text. Pre-trained neural language models provide semantically rich features for text; therefore, even a simple multi-layer perceptron-based classifier performs excellently. After the initial success of BERT, many pre-trained models, such as RoBERTa [19] and GPT-3 [5], have been published.

The tokenizers in these pre-trained neural language models typically divide documents into subwords as the smallest unit. Subwords reduce the number of unknown words that are not in the vocabulary, thus preventing the performance of pre-trained neural language models from being degraded by unknown words. Subword-based tokenization effectively handles out-of-vocabulary (OOV) words by decomposing such words into several subwords. Concatenations of these subwords represent OOV words, whereas traditional approaches represent them as unknown tokens. Subword-based tokenization was initially employed for machine translation [29]; it has since been used in various natural language processing tasks, including text classification.

Multi-task learning (MTL) [6, 37, 39], which involves one or more auxiliary tasks alongside the primary task by sharing parameters, is a promising approach to enhance the performance of deep learning models. It has also been applied to text classification [17, 35, 36]. Learning models with auxiliary tasks positively affects the generalization performance of the main task and reduces over-fitting. Early studies on MTL-based text classification [17, 35] focused on methods to combine multiple tasks and on combining tasks across different datasets. Recent studies have combined text classification with auxiliary tasks on the same dataset, such as named entity recognition (NER) [2, 31] or label co-occurrence prediction [36].

The fact that MTL with NER and text classification improves text classification performance suggests that the recognition of clause representations, such as named entities, is suitable as an auxiliary task for MTL-based text classification. However, to realize NER as an auxiliary task for MTL-based text classification, supervised labels for NER are required in addition to those for text classification. Constructing such training datasets is costly because of the additional human cost of NER labeling.

Therefore, in this study, we seek to achieve MTL-based text classification with phrasal expression recognition, which does not require additional human cost to construct a training dataset. Phrasal expressions (or key phrases) for texts have been studied for decades [27, 38]. However, applying keyphrase extraction on top of the subword-based tokenization of popular pre-trained neural language models is not straightforward. Therefore, we define a phrasal expression based on subwords as a subword-phrase and examine its potential usability for MTL-based text classification. In contrast to phrasal expressions based on words, subword-phrases are not necessarily semantically coherent because a vocabulary of subwords is determined statistically [29]. Owing to this limited semantic coherence of subword-phrases, studies have never been conducted on their utilization for text classification.
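As a concrete illustration of the subword tokenization described above, the following minimal snippet shows how a BPE-style tokenizer decomposes words into subword pieces rather than mapping rare words to an unknown token. The use of the roberta-base tokenizer here is an assumption made purely for illustration and is not tied to the proposed method.

```python
# Illustration only: a BPE-style tokenizer splits rare or out-of-vocabulary
# words into several known subword pieces instead of emitting an <unk> token.
# The roberta-base tokenizer is assumed here solely for demonstration.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")

for word in ["classification", "pseudoperiodicity"]:
    print(word, "->", tokenizer.tokenize(word))
# A common word may stay as one (or few) pieces, while a rare word is split
# into several subwords whose concatenation reconstructs the original word.
```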
In this study, we propose a framework for MTL-based text classification with subword-phrase recognition to improve the accuracy of text classification. Our framework comprises unsupervised subword-phrase labeling and MTL-based text classification with the subword-phrase recognition task. Notably, we assume the presence of classification labels for the dataset. To implement our framework, we employ a highly primitive approach, frequency-based subword-phrase labeling, in which frequently co-occurring consecutive subwords are merged to form a subword-phrase; various implementations can be realized using this approach. We also employ the concept of byte-pair encoding [29]. We explore labeling schemes that handle subword-phrases commonly appearing among document classes so as to make the auxiliary task more effective for the text classification task.

The contributions of this study can be summarized as follows: MTL-based text classification with low-cost auxiliary task preparation, the utilization of phrasal expressions over subwords, superior performance over conventional methods, and comparable performance with state-of-the-art methods. The proposed framework comprises an unsupervised labeling module and an MTL-based classification module. Existing MTL-based text classification methods assume the presence of supervision for auxiliary tasks; however, obtaining this supervision requires further human and financial costs. In contrast, the proposed framework does not require these costs, as it utilizes unsupervised subword-phrase extraction to obtain labels for the auxiliary task.

Ours is the first study to utilize subword-phrases. As subwords are not necessarily semantically coherent, their phrasal expressions have not previously been considered for any task. Nevertheless, the co-occurrence of consecutive subwords, that is, subword-phrases, can contribute to the text classification task, since such subwords may distinguish instances of one class from those of others. In an experimental evaluation on five popular text classification datasets, the proposed framework with the subword-phrase recognition auxiliary task demonstrated improved classification performance (micro and macro F-scores) compared with the single-task method. Compared with the state-of-the-art method (BertGCN [14]), the proposed framework also demonstrated superior performance on datasets with more labels and comparable classification performance on the other datasets.

The rest of this paper is organized as follows. Section 2 introduces studies concerning MTL-based text classification. Section 3 explains the proposed framework of MTL-based text classification with the subword-phrase recognition task. Section 4 then presents the experimental evaluation, which demonstrates the effectiveness of the proposed framework compared with the single-task text classification baseline as well as other recent methods; it also discusses the effect of subword-phrases. Finally, Section 5 concludes this paper.

2 Related work

This section introduces the literature related to MTL-based text classification. MTL-based text classification methods are categorized into the following three types based on the relationships between the main and auxiliary tasks [35]: Multi-Cardinality, Multi-Domain, and Multi-Objective.
Multi-Cardinality means that the main and auxiliary tasks come from different datasets in the same domain; these tasks also differ in cardinality, meaning that they vary in their text lengths and numbers of classes, among other properties. Multi-Domain means that the main and auxiliary tasks are similar, but their domains differ. For example, Liu et al. [16] and Zhang et al. [35] examined MTL-based movie review classification with classification tasks over reviews of various products, such as books and DVDs [4]. Multi-Objective means that the main and auxiliary tasks have different objectives. For example, Liu et al. [18] combined query classification and search result ranking using an MTL approach, and Zhang et al. [35] attempted MTL-based movie review classification (IMDB [21]) with news article classification (RN [1]) and question type classification (QC [13]) as auxiliary tasks.

In addition, MTL approaches [3, 30, 33, 40] in which the main and auxiliary tasks are on the same dataset have exhibited their effectiveness. Bi et al. [3] improved the performance of news recommendation by using MTL, which combines the news recommendation task with news article classification and named entity recognition. The MTL-based medical query intent classification model proposed by Tohti et al. [30] was trained together with named entity recognition and consequently showed superior classification performance. Similarly, Yang et al. [33] and Zhao et al. [40] made comparable observations for polarity classification combined with an aspect term extraction task. For emotion prediction, Li et al. [11] dealt with the emotion-cause pair extraction task using an MTL-based approach combined with emotion clause extraction and cause clause extraction. Similarly, Qi et al. [24] proposed an MTL-based aspect sentiment classification method in which the auxiliary task was aspect term extraction; they also demonstrated its effectiveness. In addition to text classification tasks, MTL-based approaches to image classification tasks have also shown their effectiveness [9, 32].

MTL-based text classification that utilizes the relationships between labels in the same dataset has also been proposed to solve the multi-label classification problem, where a single text can be classified into multiple labels [36]. Zhang et al. [36] showed improved classification performance by designing an auxiliary task to learn the relationships between labels.

These studies have shown the effectiveness of combining multiple supervised learning tasks. However, creating supervised data is generally expensive in terms of human and financial costs; thus, lower-cost solutions for designing auxiliary tasks are desirable.

Self-supervised learning (SSL) is a training approach that learns from data without supervised labels. It first hides pieces of the data and then trains the model to estimate the hidden pieces. The masked language model (MLM) is a popular SSL task in the natural language processing domain [8]. A popular pre-trained neural language model, BERT [8], is trained with two SSL tasks: MLM and next sentence prediction. In the image processing domain, DALL-E [25] showcased the significant performance of SSL, where an area of an image was erased and DALL-E was trained to estimate the erased area.
The increasing attention paid to these models indicates the usefulness of SSL for data understanding and representation learning. In contrast to data understanding, text classification is a supervised learning task. In other words, SSL expects models to reconstruct broken pieces of data, whereas supervised learning expects models to learn dedicated criteria from supervision. Therefore, task settings in SSL are not easily imported into MTL-based text classification.

The proposed framework in this study focuses on creating datasets for auxiliary tasks with no supervision, significantly reducing human effort and financial costs. To our knowledge, no research has aimed to design auxiliary tasks for MTL-based text classification with no supervision. In addition, as subwords are not necessarily semantically coherent, subword-phrases have not been considered for any task. Therefore, this study proposes a methodology of MTL-based text classification that is novel in two aspects: (1) low-cost auxiliary task design and (2) the introduction of subword-phrases. The experimental evaluation in this study reveals promising results for both aspects.

3 Proposed framework

This section explains our framework for MTL-based text classification, which generates subword-phrase labels for the auxiliary task in an unsupervised manner.

3.1 Framework overview

Figure 1 illustrates our framework. It consists of two phases: unsupervised labeling and MTL-based text classification. The basic approach underlying the framework is that subword-phrase recognition is added as an auxiliary task for MTL-based text classification. To realize the recognition task, unsupervised subword-phrase extraction is employed to create pseudo-supervision. A text classifier based on the framework is trained using the following steps:

1. Input: the text classifier receives a training set of texts with classification labels;
2. Tokenization: each text is tokenized into subwords using a subword-based tokenizer;
3. Labeling (Phase 1): the unsupervised labeling module appends subword-phrase labels to each text in the training set for the auxiliary subword-phrase recognition task;
4. Training (Phase 2): the text classifier is trained in an MTL manner, together with the auxiliary subword-phrase recognition task based on the appended labels.

Formally, a training set is denoted as D = \{(T_i, y_i) \mid 1 \le i \le N\}, where T_i represents the sequence of subword tokens of the i-th text, y_i represents the class label corresponding to the i-th text, and N is the number of texts. In the first phase, the unsupervised subword-phrase labeling module receives D and performs subword-phrase extraction on the subword token sequences to create another training set D^{aug} = \{(T_i, Y^{aug}_i) \mid 1 \le i \le N\} for the auxiliary task, where Y^{aug}_i is the corresponding sequence of labels for the tokens in T_i. In the second phase, D and D^{aug} are passed to an MTL-based text classification module based on a pre-trained neural language model, which trains the text classification model jointly with the subword-phrase recognition model.

Figure 1: Our MTL-based text classification framework. The framework accepts text with text classification labels and trains an MTL-based text classification model. It consists of two phases: the first phase is unsupervised labeling of the input text, and the second phase is training of the MTL-based text classification model using the text classification labels and the labels from the first phase. (The figure depicts a subword-based tokenizer feeding Phase 1, unsupervised subword-phrase extraction, and Phase 2, a pre-trained neural language model shared by the main text classification task and the auxiliary subword-phrase recognition task.)
3.2 Unsupervised subword-phrase labeling

Unsupervised subword-phrase labeling provides a label sequence that corresponds to the input token sequence. This unsupervised labeling task is formalized as follows:

- Given: a sequence of subword tokens T along with a class label y, (T, y) \in D
- Generate: a sequence of labels Y^{aug} whose length is exactly the same as that of T

The labeling scheme is inspired by NER tasks that employ the inside-outside-beginning (IOB2) tagging scheme [26]. IOB2 tagging is a labeling scheme in which the first token of a phrase is tagged with B (beginning), the intermediate tokens of a phrase are tagged with I (inside), and tokens outside the phrase are tagged with O (outside). Besides these tags, semantic types are appended to distinguish types of phrases; for example, B-PERSON and I-PERSON represent the beginning and intermediate tokens, respectively, of a token sequence corresponding to a person's name.

A straightforward labeling scheme for subword-phrase labeling is to treat all phrases equally. In other words, the semantic type is set to Phrase. Formally, when an n-length sequence of tokens S = (s_1, s_2, \dots, s_n) contains a phrase that is an m-token sub-sequence P = (s_k, s_{k+1}, \dots, s_{k+m-1}) of S, where m \le n, s_k is labeled as B-Phrase, the remaining tokens from s_{k+1} to s_{k+m-1} are labeled as I-Phrase, and all other tokens s_i \in S \setminus P are labeled as O.

This approach is so straightforward that subword-phrases appearing in different document classes are treated equally. However, to provide cues to the main text classification model, subword-phrases that depend on document classes should be distinguishable. A simple classification-specific labeling scheme assigns different labels to subword-phrases appearing in different classes: when a subword-phrase P = (s_k, s_{k+1}, \dots, s_{k+m-1}) is a sequence of tokens of a text belonging to class y, s_k is labeled as B-y and the remaining tokens from s_{k+1} to s_{k+m-1} are labeled as I-y. However, subword-phrases commonly appearing in different classes cannot be handled by this scheme. To handle such common subword-phrases, we propose three labeling schemes, namely, Disregard, Common-Label, and Bit-Label. For comparison, the aforementioned straightforward labeling scheme is called All-Phrase. The Disregard scheme simply ignores common subword-phrases; in other words, they are labeled with O tags. In the Common-Label scheme, a special class label \emptyset is used as a special semantic type in the IOB2 scheme; specifically, the common subword-phrase P is labeled as B-\emptyset for s_k and I-\emptyset for the other tokens. The Bit-Label scheme is a bit-encoding-based labeling scheme that encodes the set of classes in which a common subword-phrase appears, while still inheriting the IOB2 labeling scheme. For example, suppose that the number of classes is d = 4 and a subword-phrase P = (s_k, s_{k+1}, \dots, s_{k+m-1}) appears in texts of the first and third classes; then s_k is labeled as B-1010, and the remaining tokens from s_{k+1} to s_{k+m-1} are labeled as I-1010.
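The sketch below illustrates how the four labeling schemes could assign IOB2 tags to subword tokens. It is a simplified reading of the schemes described above, not the authors' implementation: phrase matching is exact sub-sequence search, single-class phrases always receive the class-specific B-y/I-y tags, and the label name CMN is only a placeholder for the special label \emptyset.

```python
from typing import Dict, FrozenSet, List, Sequence, Tuple

def iob2_labels(tokens: Sequence[str],
                phrase_classes: Dict[Tuple[str, ...], FrozenSet[int]],
                scheme: str,
                num_classes: int) -> List[str]:
    """Return one IOB2 label per subword token under the given scheme."""
    labels = ["O"] * len(tokens)
    for phrase, classes in phrase_classes.items():
        # choose the semantic type appended to the B-/I- tags of this phrase
        if scheme == "All-Phrase":
            tag = "Phrase"                      # every phrase treated equally
        elif len(classes) == 1:
            tag = str(next(iter(classes)))      # class-specific phrase: B-y / I-y
        elif scheme == "Disregard":
            continue                            # common phrases stay labeled O
        elif scheme == "Common-Label":
            tag = "CMN"                         # placeholder for the special label
        else:  # "Bit-Label": bit vector over classes, e.g. 1010 for classes {0, 2}
            tag = "".join("1" if c in classes else "0" for c in range(num_classes))
        m = len(phrase)
        for k in range(len(tokens) - m + 1):
            if tuple(tokens[k:k + m]) == phrase:
                labels[k] = f"B-{tag}"
                labels[k + 1:k + m] = [f"I-{tag}"] * (m - 1)
    return labels

# Toy example with d = 4 classes; ("sub", "word") is common to classes 0 and 2.
tokens = ["the", "sub", "word", "model"]
phrases = {("sub", "word"): frozenset({0, 2})}
print(iob2_labels(tokens, phrases, "Bit-Label", 4))   # ['O', 'B-1010', 'I-1010', 'O']
```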
3.3 MTL-based text classification

Our framework uses a text classification model based on MTL and a pre-trained neural language model (NLM). In this model, the NLM performs token encoding, and classification modules for the main and auxiliary tasks are appended on top of the encoding. The NLM is therefore the part shared among tasks and is trained in an MTL manner. Each classification module is designed as a fully connected layer followed by a softmax non-linear layer.

For the main task (i.e., text classification), a representation h^{cls} for a given input token sequence is obtained from the NLM. It is passed to a fully connected layer followed by a softmax layer to predict the class distribution \hat{y}^{cls}. Formally, \hat{y}^{cls} for h^{cls} is calculated by the following equation:

    \hat{y}^{cls} = \mathrm{softmax}(W_{cls}^{\top} \cdot h^{cls} + b_{cls}),    (1)

where W_{cls} and b_{cls} denote the parameter matrix and bias, respectively, for the text classification task.

For the auxiliary task (i.e., subword-phrase recognition), a representation h^{spr}_j for the j-th token of a given input sequence is obtained from the NLM. It is passed to a fully connected layer followed by a softmax layer to predict the token label distribution \hat{y}^{spr}_j. Formally, \hat{y}^{spr}_j for h^{spr}_j is calculated by the following equation:

    \hat{y}^{spr}_j = \mathrm{softmax}(W_{spr}^{\top} \cdot h^{spr}_j + b_{spr}),    (2)

where W_{spr} and b_{spr} denote the parameter matrix and bias, respectively, for the subword-phrase recognition task.

Both the main and auxiliary tasks are multi-class classification tasks; therefore, using the cross-entropy loss as the loss function is straightforward. The following equation calculates the loss L_{cls} for the text classification task:

    L_{cls} = -\sum_{i=1}^{N} \sum_{c \in C} y_{i,c} \log \hat{y}^{cls}_{i,c},    (3)

where N is the number of training texts, C denotes the set of classes, y_{i,c} \in \{0, 1\} denotes the true label for the i-th text (y_{i,c} = 1 if the true label of the text is c and 0 otherwise), and \hat{y}^{cls}_{i,c} denotes the predicted probability of class c for the text.

Similarly, the following equation calculates the loss L_{spr} for the subword-phrase recognition task:

    L_{spr} = -\sum_{i=1}^{N} \sum_{j=1}^{M_i} \sum_{c \in C'} y_{i,j,c} \log \hat{y}^{spr}_{i,j,c},    (4)

where N denotes the number of training texts, M_i denotes the number of tokens in the i-th text, C' denotes the set of token label classes for the recognition task, y_{i,j,c} \in \{0, 1\} denotes the true label for the j-th token of the i-th text (y_{i,j,c} = 1 if the true label of the token is c and 0 otherwise), and \hat{y}^{spr}_{i,j,c} denotes the predicted probability of class c for that token.

To train both tasks simultaneously, feedback from both tasks is fed back to the NLM to fine-tune its parameters. Therefore, the joint loss L_{joint} over these tasks is calculated by the following equation and used for parameter optimization:

    L_{joint} = L_{cls} + L_{spr}.    (5)

We note that weighting schemes for MTL approaches, which reflect the importance of individual tasks, have been studied [22, 28]. Although incorporating such a weighting scheme into our framework is promising, the purpose of this study is to show the capability of MTL-based text classification in conjunction with subword-phrase recognition, whose labels for the auxiliary task are created in an unsupervised manner. Employing a weighting scheme in our framework is therefore left for future studies.
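To make the formulation concrete, the following PyTorch sketch implements the two task heads over a shared encoder and the joint loss of Eqs. (1)-(5). The choice of a Hugging Face RoBERTa encoder, the use of the first-token representation as h^{cls}, and the padding-label convention are assumptions for illustration; this is not the authors' released implementation.

```python
import torch.nn as nn
from transformers import AutoModel

class MTLTextClassifier(nn.Module):
    """Shared NLM encoder with a text-classification head and a token-labeling head."""
    def __init__(self, num_doc_classes, num_token_labels, model_name="roberta-base"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)   # shared across tasks
        hidden = self.encoder.config.hidden_size
        self.cls_head = nn.Linear(hidden, num_doc_classes)     # Eq. (1)
        self.spr_head = nn.Linear(hidden, num_token_labels)    # Eq. (2)

    def forward(self, input_ids, attention_mask):
        h = self.encoder(input_ids=input_ids,
                         attention_mask=attention_mask).last_hidden_state  # (B, L, H)
        logits_cls = self.cls_head(h[:, 0])    # sentence-level representation h_cls
        logits_spr = self.spr_head(h)          # one label distribution per token
        return logits_cls, logits_spr

def joint_loss(logits_cls, logits_spr, y_cls, y_spr, ignore_id=-100):
    """Cross-entropy losses of Eqs. (3) and (4), summed as in Eq. (5)."""
    ce = nn.CrossEntropyLoss(ignore_index=ignore_id)
    loss_cls = ce(logits_cls, y_cls)                               # L_cls
    loss_spr = ce(logits_spr.flatten(0, 1), y_spr.flatten())       # L_spr
    return loss_cls + loss_spr                                     # L_joint
```

Cross-entropy over logits is used here because it already includes the softmax of Eqs. (1) and (2); padded token positions can be masked out of the auxiliary loss with the ignore index.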
4 Experimental evaluation

To evaluate the proposed framework, we conducted an experimental evaluation to answer the following questions: (1) Do our MTL-based text classification methods, which create auxiliary tasks in an unsupervised manner, improve classification performance compared with single-task text classification methods? (2) Can our MTL-based text classification outperform state-of-the-art (SOTA) text classification methods? (3) Does the subword-phrase technique contribute to text classification? (4) Is there a best labeling scheme for subword-phrase recognition with respect to common subword-phrases?

The rest of this section is organized as follows: Section 4.1 introduces the implementation of the proposed framework; Section 4.2 explains the SOTA text classification method used for comparison; Section 4.3 describes the experimental settings; Section 4.4 showcases the experimental results; and Section 4.5 presents remarks on the experiments by answering the questions mentioned above.

4.1 Implementation of the proposed framework

In this experiment, we implemented a simple frequency-based subword-phrase extraction method; the labeling scheme used for the extracted subword-phrases was the classification-specific labeling scheme. The frequency-based method expects frequently co-occurring subwords to compose the regular textual expressions of each class. To control the number of subword-phrases, we utilized the byte-pair encoding (BPE) algorithm [29]. The BPE algorithm concatenates consecutive tokens if they frequently co-occur in a corpus and repeats this concatenation until the number of unique tokens reaches the expected number. The ability to control the number of subword-phrases was suitable for this experiment because the subword-phrase is newly proposed in this study; therefore, we needed to run variations of the evaluation, which were realized by creating different numbers of subword-phrases.

In general, the number of texts is skewed among classes; a particular class may contain a very large number of texts while other classes contain very few. This affects the extraction of subword-phrases; therefore, in this experiment, the extraction described above was applied to the set of texts of each class separately. Specifically, we extracted n subword-phrases for each class, where n was chosen from {10, 100, 1000, 10000} to achieve the best classification performance on the validation data.
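The following sketch shows one way to realize the frequency-based, BPE-style extraction described above: the most frequently co-occurring pair of adjacent subwords within the texts of one class is merged repeatedly until n subword-phrases have been produced. Tie-breaking, minimum frequency thresholds, and other details are illustrative assumptions rather than the paper's exact procedure.

```python
from collections import Counter

def extract_subword_phrases(token_seqs, n_phrases):
    """BPE-style merging of adjacent subwords within one document class.

    token_seqs: list of subword-token lists belonging to a single class.
    Returns up to n_phrases subword-phrases as tuples of subword tokens.
    """
    seqs = [list(seq) for seq in token_seqs]
    phrases = []
    while len(phrases) < n_phrases:
        pair_counts = Counter()
        for seq in seqs:
            pair_counts.update(zip(seq, seq[1:]))      # adjacent co-occurrences
        if not pair_counts:
            break
        (a, b), _ = pair_counts.most_common(1)[0]      # most frequent pair
        merged = f"{a} {b}"
        phrases.append(tuple(merged.split()))          # full subword sequence so far
        for seq in seqs:                               # merge occurrences in place
            i = 0
            while i < len(seq) - 1:
                if seq[i] == a and seq[i + 1] == b:
                    seq[i:i + 2] = [merged]
                i += 1
    return phrases

# Per Section 4.1, this would be run once per class, with n chosen from
# {10, 100, 1000, 10000} on the validation data.
```

Because merged tokens can themselves be merged again in later iterations, phrases longer than two subwords emerge naturally, mirroring how BPE builds its vocabulary.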
4.2 Comparison method: BertGCN

BertGCN [14] is a SOTA method for text classification that combines a pre-trained NLM with transductive learning over graph neural networks (GNNs). BertGCN follows TextGCN [34] in constructing a graph of the co-occurrence relations between texts and words and between words and words. In BertGCN, the vertex vectors are initialized using the pre-trained NLM. These vectors are updated through a graph convolutional network (GCN) to incorporate the co-occurrence relationships between texts and words. Based on the updated vectors, BertGCN performs text classification by adding a fully connected layer followed by a softmax layer. In addition, [14] reported that interpolating the output of the NLM-based classification model with that of BertGCN can improve classification performance; specifically, the final prediction is the linear combination of the predicted class distributions Z_{GCN} and Z_{NLM}, which are obtained from BertGCN and the classifier using the NLM, respectively, as in the following equation:

    Z = \lambda \cdot Z_{GCN} + (1 - \lambda) \cdot Z_{NLM},    (6)

where \lambda \in [0, 1] denotes the weight of the BertGCN prediction. This experiment used \lambda = 0.7, which [14] reported to be the optimal value. BertGCN can use any pre-trained NLM, and [14] reported that RoBERTa showed the best performance. Therefore, RoBERTa was also used in implementing the proposed framework to make the comparison as fair as possible.

4.3 Settings

Datasets: The following five popular text classification datasets were used for the evaluation: Movie Review (MR), 20 Newsgroups (20NG), R8, R52, and Ohsumed (OHS). MR is a dataset of movie reviews categorized into binary sentiment classes (i.e., positive and negative). 20NG is a dataset of news texts categorized into 20 categories. R8 is a dataset of news articles from Reuters-21578 (http://www.daviddlewis.com/resources/testcollections/reuters21578/, visited on Aug. 4, 2022) limited to eight selected classes. R52 is a dataset of news articles from Reuters-21578 limited to 52 selected categories. OHS is a dataset of medical abstracts categorized into 23 medical concepts called MeSH categories. The statistics of the datasets are shown in Table 1. As the table shows, the datasets differ in the number of classes and in the variation of the number of instances per class (the standard deviation (Std.) of the number of instances within a class). These datasets were expected to reveal the advantages and disadvantages of the proposed method.

Table 1: Statistics of the datasets: the number of instances in the train-valid-test splits, the number of classes, and the average (Avg.) and standard deviation (Std.) of the number of instances across classes.

                          MR      20NG     R8      R52     OHS
  #Train                 6,398   10,183   4,937   5,879   3,022
  #Valid                   710    1,131     548     653     335
  #Test                  3,554    7,532   2,189   2,568   4,043
  #Class                     2       20       8      52      23
  Avg. #Instances/Class  5,331      942     959     175     321
  Std. #Instances/Class      0       94   1,309     613     305

Metrics: The evaluation metric is the F-score, which is the harmonic mean of the precision and recall scores:

    Pre = \frac{TP}{TP + FP},    (7)

    Rec = \frac{TP}{TP + FN},    (8)

    F = \frac{2 \cdot Pre \cdot Rec}{Pre + Rec}.    (9)

The precision Pre is the ratio of the number of true positives (TP) to the number of instances estimated as positive (i.e., TP + FP, where FP is the number of false positives). The recall Rec is the ratio of TP to the number of positive instances in the evaluation set (i.e., TP + FN, where FN is the number of false negatives). To observe different aspects of the evaluation, both the micro and macro averages of the F-scores were used in this experiment. The micro average F_{micro} is the instance-level average of the F-score, and the macro average F_{macro} is the class-level average of the F-scores. When the numbers of instances of different classes are highly skewed (the class imbalance problem), F_{micro} is not suitable for evaluating classification performance, because the larger the number of instances of a class, the more it affects this metric; in other words, the classification performance on instances of minority classes is underestimated. In contrast, the F_{macro} metric is insensitive to this skewness, as the F-scores of different classes are treated independently and averaged.
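For completeness, the snippet below computes the micro- and macro-averaged F-scores of Eqs. (7)-(9) for a single-label multi-class setting; scikit-learn's f1_score with average='micro' or average='macro' provides the same aggregation, and the explicit version is shown only to make the two averages concrete.

```python
from collections import Counter

def micro_macro_f(y_true, y_pred, classes):
    """Micro- and macro-averaged F-scores per Eqs. (7)-(9)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1      # predicted p, but the true class was t
            fn[t] += 1

    def f_score(TP, FP, FN):
        pre = TP / (TP + FP) if TP + FP else 0.0
        rec = TP / (TP + FN) if TP + FN else 0.0
        return 2 * pre * rec / (pre + rec) if pre + rec else 0.0

    macro = sum(f_score(tp[c], fp[c], fn[c]) for c in classes) / len(classes)
    micro = f_score(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return micro, macro

print(micro_macro_f(["a", "a", "b", "c"], ["a", "b", "b", "b"], ["a", "b", "c"]))
```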
Parameters: For the base model in both the proposed method and BertGCN, we employed the RoBERTa-base model [19], available at Huggingface (https://huggingface.co/roberta-base). BertGCN with the RoBERTa model is referred to as RoBERTaGCN in this experiment. In this study, the effect of common subword-phrases was also evaluated; therefore, the proposed method had two variations: one including common subword-phrases (denoted as Proposed w/ cmn) and the other excluding them (denoted as Proposed w/o cmn). In addition, as a baseline method, we also employed a single-task text classification method based on RoBERTa. The baseline method was implemented by adding a fully connected layer and a softmax layer on top of RoBERTa, which is equivalent to Eq. (1) with the loss function shown in Eq. (3). The only difference between the proposed and baseline methods is the number of tasks on top of RoBERTa; therefore, the comparison between them was expected to reveal the effectiveness of MTL-based text classification. These models were optimized using the AdamW optimizer (the Adam optimizer [10] with decoupled weight decay regularization) [20]. Experiments were conducted with 100 epochs, a batch size of 64, and a maximum token length of 256. Only the experiment for RoBERTaGCN was conducted with a batch size of 128 and a maximum token length of 128, which yielded better results than the aforementioned hyperparameters.

4.4 Results

Table 2 shows the experimental results for F_{micro} (Table 2(a)) and F_{macro} (Table 2(b)) and showcases the following three observations. (1) The proposed method performed better than the baseline method in both metrics, except for the simple binary classification on the MR dataset. (2) The proposed method outperformed RoBERTaGCN on three of the five datasets in terms of the F_{micro} metric and on four of the five datasets in terms of the F_{macro} metric. (3) In terms of labeling schemes, the Bit-Label and Disregard approaches performed better than the other schemes in terms of the F_{macro} metric.

Table 2: Evaluation results. The best score in each column (i.e., dataset) is bold-faced. RoBERTaGCN is the SOTA text classification method, and Baseline is the single-task text classification based on the RoBERTa model. The proposed method has two variations: one, denoted as Proposed w/ cmn, includes common subword-phrases in the labeling scheme, and the other, denoted as Proposed w/o cmn, excludes them. (a) and (b) show the results for F_{micro} and F_{macro}, respectively.
(a) F_{micro}

  Model                     MR      20NG    R8      R52     OHS
  RoBERTaGCN                0.880   0.894   0.979   0.944   0.736
  Baseline (RoBERTa)        0.881   0.831   0.977   0.962   0.690
  Proposed - All-Phrase     0.888   0.838   0.979   0.967   0.705
  Proposed - Common-Label   0.860   0.850   0.978   0.967   0.704
  Proposed - Bit-Label      0.882   0.846   0.979   0.968   0.711
  Proposed - Disregard      0.866   0.851   0.979   0.969   0.711

(b) F_{macro}

  Model                     MR      20NG    R8      R52     OHS
  RoBERTaGCN                0.880   0.861   0.925   0.756   0.605
  Baseline (RoBERTa)        0.881   0.825   0.943   0.836   0.594
  Proposed - All-Phrase     0.888   0.832   0.948   0.842   0.622
  Proposed - Common-Label   0.860   0.845   0.947   0.841   0.610
  Proposed - Bit-Label      0.882   0.840   0.953   0.866   0.636
  Proposed - Disregard      0.866   0.845   0.955   0.851   0.637

The comparison between the proposed method and the baseline method in both F_{micro} and F_{macro} revealed the effectiveness of the MTL-based approach, in which the auxiliary task is constructed systematically. In addition to insights from the existing literature that MTL-based approaches using supervised auxiliary tasks are effective, this experiment showcased the effectiveness of an MTL approach in which the training data for the auxiliary task are generated in an unsupervised manner. The results show that low-cost auxiliary tasks for MTL-based text classification already demonstrate promising performance.

While the results on the MR and R8 datasets showed comparable performance between the proposed and baseline methods, these datasets are composed of smaller numbers of classes. These results suggest that the proposed method is less effective when the number of classes is small.

A notable finding is that the proposed method achieved significantly better performance than RoBERTaGCN in terms of F_{macro} on the R8, R52, and OHS datasets. Simultaneously, the proposed method was also more accurate than RoBERTaGCN in terms of F_{micro}. These facts indicate that the proposed method achieved state-of-the-art classification performance on these datasets. Recalling the statistics of these datasets from Table 1, the numbers of classes in the R8, R52, and OHS datasets are larger than those of the other datasets, and the number of instances per class is highly skewed; this indicates that the proposed method is well suited to highly skewed datasets. Although the 20NG dataset has a similar number of classes to the OHS dataset and is less skewed than the OHS dataset, the performance of the proposed method in terms of F_{micro} and F_{macro} was worse than that of RoBERTaGCN. Consequently, the proposed method performed better than the SOTA method when the dataset contained a large number of classes and was highly skewed in the number of instances across classes.
The comparison among the variations of the proposed method, in terms of the labeling schemes for subword-phrases commonly appearing in several document classes, showed that the different schemes achieved similar performance, each with pros and cons on different datasets. The All-Phrase scheme labels all phrases with the IOB2 tagging scheme regardless of document classes. Compared with the other schemes, which take document classes into account, its performance was inferior. This indicates that class-specific labeling (the Common-Label, Bit-Label, and Disregard schemes) is effective, except on the MR dataset, a binary classification dataset whose subword-phrases are hardly class-specific. Comparing the handling of common subword-phrases among the Common-Label, Bit-Label, and Disregard schemes, their classification performances were comparable, and the Disregard scheme performed relatively better.

To show the difficulty of the subword-phrase recognition task under the different labeling schemes, Table 3 displays the F-scores of the auxiliary tasks. In general, the number of classes in a sequence labeling problem is related to its difficulty; thus, the All-Phrase scheme was expected to be the easiest and the Bit-Label scheme the most difficult. As the table shows, the F-scores of the All-Phrase scheme are the highest among these schemes, confirming that it is the easiest sequence labeling problem. In contrast, the F-scores of the other schemes were lower, but still high enough to aid the generalization performance of the main text classification model.

Table 3: Evaluation results: accuracy of the auxiliary tasks.

(a) F_{micro}

  Model                     MR      20NG    R8      R52     OHS
  Proposed - All-Phrase     0.971   0.975   0.978   0.998   0.971
  Proposed - Common-Label   0.922   0.968   0.974   0.972   0.978
  Proposed - Bit-Label      0.918   0.974   0.965   0.975   0.977
  Proposed - Disregard      0.922   0.851   0.962   0.975   0.978

(b) F_{macro}

  Model                     MR      20NG    R8      R52     OHS
  Proposed - All-Phrase     0.960   0.975   0.945   0.796   0.953
  Proposed - Common-Label   0.761   0.889   0.869   0.853   0.725
  Proposed - Bit-Label      0.756   0.852   0.764   0.864   0.762
  Proposed - Disregard      0.761   0.845   0.731   0.847   0.725

4.5 Remarks

This section summarizes the findings of our experiment by answering the questions raised above and introduces the limitations of the proposed method.

(1) The proposed method outperformed the baseline method when the number of classes of a dataset was large and was comparable when the number was small. However, the datasets with few classes were also less skewed in the number of instances per class. Therefore, frequency-based subword-phrase extraction for constructing auxiliary tasks is suitable when a dataset has many classes and the number of instances per class is skewed. A promising outcome is that an auxiliary recognition task whose (pseudo) supervision is generated in an unsupervised manner is effective in MTL-based classification. This outcome opens up new possibilities for constructing auxiliary tasks for MTL-based classification methods on tasks other than text classification.

(2) The proposed method was superior to the SOTA method, RoBERTaGCN, on the R52 and OHS datasets, which contain many classes and in which the number of instances per class is skewed. A promising direction for overcoming the inferiority of the proposed method on the other datasets is to utilize RoBERTaGCN as the base model of the proposed framework.

(3) The subword-phrase recognition task as an auxiliary task improves text classification on various datasets. A promising outcome is the use of phrasal expressions over subwords, which deserves more attention in the literature.

(4) To handle subword-phrases common to several document classes, the Bit-Label scheme, which encodes the class membership of a subword-phrase in a bit sequence that can represent all combinations of appearing classes, and the Disregard scheme, which ignores common subword-phrases, were the best.
The larger the number of classes (e.g., R52), the better the classification performance of the Bit-Label scheme. Conversely, the smaller the number of classes (e.g., R8 and OHS), the better the performance of the Disregard scheme.

Consequently, when the number of classes is large and the number of instances per document class is skewed, MTL-based text classification suffers from the class imbalance problem, which is still an open problem in the general text classification domain. This study shows some promising results by using a subword-phrase recognition task whose labels are obtained in an unsupervised manner; at the same time, the classification performance still leaves much to be desired. Therefore, future studies should seek more effective auxiliary tasks to deal with the class imbalance problem.

5 Conclusion

We proposed an MTL-based text classification framework that uses auxiliary tasks with lower human and financial costs by creating auxiliary task labels in an unsupervised manner. We also sought to ascertain the potential of phrasal expressions of subwords, called subword-phrases, to better utilize subword-based pre-trained neural language models. As an implementation of our framework, we extracted subword-phrases in terms of their frequency of occurrence and labeled them in documents in three different ways. Our experimental evaluation of text classification on five popular datasets highlighted the effectiveness of subword-phrase recognition as an auxiliary task. It also showed results comparable with those of RoBERTaGCN, the state-of-the-art method.

The main conclusions of this paper are as follows: an auxiliary recognition task whose pseudo-supervision is generated in an unsupervised manner is effective in MTL-based classification and opens up the possibility of constructing auxiliary tasks for MTL-based classification methods for classification tasks other than text classification; and phrasal expressions over subwords (subword-phrases) can be helpful in text classification.

Acknowledgment

This work was partly supported by the Grants-in-Aid for Academic Promotion, Graduate School of Culture and Information Science, Doshisha University, JSPS KAKENHI Grant Numbers 19H01138, 19H04218, and 21H03555, and JST, the establishment of university fellowships towards the creation of science and technology innovation, Grant Number JPMJFS2145.

References

[1] C. Apté, F. Damerau, and S. M. Weiss. Automated Learning of Decision Rules for Text Categorization. ACM Transactions on Information Systems, 12(3):233–251, 1994.

[2] A. Benayas, R. Hashempour, D. Rumble, S. Jameel, and R. C. De Amorim. Unified Transformer Multi-Task Learning for Intent Classification With Entity Recognition. IEEE Access, 9:147306–147314, 2021.

[3] Q. Bi, J. Li, L. Shang, X. Jiang, Q. Liu, and H. Yang. MTRec: Multi-Task Learning over BERT for News Recommendation. In Findings of the Association for Computational Linguistics: ACL 2022, pages 2663–2669, May 2022.

[4] J. Blitzer, M. Dredze, and F. Pereira. Biographies, Bollywood, Boom-boxes and Blenders: Domain Adaptation for Sentiment Classification. In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, pages 440–447, 2007.

[5] T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D. M. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, and D. Amodei. Language Models are Few-Shot Learners. In Advances in Neural Information Processing Systems, 2020.
[6] R. Caruana. Multitask Learning. Machine Learning, 28(1):41–75, 1997.

[7] O. de Gibert, N. Pérez, A. G. Pablos, and M. Cuadros. Hate Speech Dataset from a White Supremacy Forum. In Proceedings of the 2nd Workshop on Abusive Language Online (ALW2), pages 11–20, 2018.

[8] J. Devlin, M. Chang, K. Lee, and K. Toutanova. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), pages 4171–4186, 2019.

[9] S. Graham, Q. D. Vu, M. Jahanifar, S. Raza, F. A. Afsar, D. R. J. Snead, and N. M. Rajpoot. One model is all you need: Multi-task learning enables simultaneous histology image segmentation and classification. Medical Image Analysis, 83:102685, 2023.

[10] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. In 3rd International Conference on Learning Representations, 2015.

[11] C. Li, J. Hu, T. Li, S. Du, and F. Teng. An effective multi-task learning model for end-to-end emotion-cause pair extraction. Applied Intelligence, 53(3):3519–3529, 2023.

[12] Q. Li, H. Peng, J. Li, C. Xia, R. Yang, L. Sun, P. S. Yu, and L. He. A Survey on Text Classification: From Traditional to Deep Learning. ACM Transactions on Intelligent Systems and Technology, 13(2):31:1–31:41, 2022.

[13] X. Li and D. Roth. Learning Question Classifiers. In COLING 2002: The 19th International Conference on Computational Linguistics, 2002.

[14] Y. Lin, Y. Meng, X. Sun, Q. Han, K. Kuang, J. Li, and F. Wu. BertGCN: Transductive Text Classification by Combining GNN and BERT. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pages 1456–1462, Online, Aug. 2021.

[15] M. Lippi, P. Palka, G. Contissa, F. Lagioia, H. Micklitz, G. Sartor, and P. Torroni. CLAUDETTE: an automated detector of potentially unfair clauses in online terms of service. Artificial Intelligence and Law, 27(2):117–139, 2019.

[16] P. Liu, X. Qiu, and X. Huang. Deep Multi-Task Learning with Shared Memory for Text Classification. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 118–127, 2016.

[17] P. Liu, X. Qiu, and X. Huang. Recurrent Neural Network for Text Classification with Multi-Task Learning. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, pages 2873–2879, 2016.

[18] X. Liu, J. Gao, X. He, L. Deng, K. Duh, and Y. Wang. Representation Learning Using Multi-Task Deep Neural Networks for Semantic Classification and Information Retrieval. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 912–921, 2015.

[19] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, and V. Stoyanov. RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR, abs/1907.11692, 2019.
[20] I. Loshchilov and F. Hutter. Decoupled Weight Decay Regularization. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net, 2019.

[21] A. L. Maas, R. E. Daly, P. T. Pham, D. Huang, A. Y. Ng, and C. Potts. Learning word vectors for sentiment analysis. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 142–150. Association for Computational Linguistics, 2011.

[22] Y. Mao, Z. Wang, W. Liu, X. Lin, and P. Xie. MetaWeighting: Learning to Weight Tasks in Multi-Task Learning. In Findings of the Association for Computational Linguistics: ACL 2022, pages 3436–3448, 2022.

[23] S. Minaee, N. Kalchbrenner, E. Cambria, N. Nikzad, M. Chenaghlu, and J. Gao. Deep Learning-based Text Classification: A Comprehensive Review. ACM Computing Surveys, 54(3):62:1–62:40, 2021.

[24] R. Qi, M. Yang, Y. Jian, Z. Li, and H. Chen. A Local context focus learning model for joint multi-task using syntactic dependency relative distance. Applied Intelligence, 53(4):4145–4161, 2023.

[25] A. Ramesh, M. Pavlov, G. Goh, S. Gray, C. Voss, A. Radford, M. Chen, and I. Sutskever. Zero-Shot Text-to-Image Generation. In Proceedings of the 38th International Conference on Machine Learning, pages 8821–8831, 2021.

[26] L. Ramshaw and M. Marcus. Text Chunking using Transformation-Based Learning. In Third Workshop on Very Large Corpora, 1995.

[27] F. Sebastiani. Machine Learning in Automated Text Categorization. ACM Computing Surveys, 34(1):1–47, March 2002.

[28] O. Sener and V. Koltun. Multi-Task Learning as Multi-Objective Optimization. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, pages 525–536, 2018.

[29] R. Sennrich, B. Haddow, and A. Birch. Neural Machine Translation of Rare Words with Subword Units. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1715–1725, Berlin, Germany, Aug. 2016. Association for Computational Linguistics.

[30] T. Tohti, M. Abdurxit, and A. Hamdulla. Medical QA Oriented Multi-Task Learning Model for Question Intent Classification and Named Entity Recognition. Information, 13(12):581, 2022.

[31] C. Wu, G. Luo, C. Guo, Y. Ren, A. Zheng, and C. Yang. An attention-based multi-task model for named entity recognition and intent analysis of Chinese online medical questions. Journal of Biomedical Informatics, 108:103511, 2020.

[32] M. Xu, K. Huang, and X. Qi. A Regional-Attentive Multi-Task Learning Framework for Breast Ultrasound Image Segmentation and Classification. IEEE Access, 11:5377–5392, 2023.

[33] H. Yang, B. Zeng, J. Yang, Y. Song, and R. Xu. A multi-task learning model for Chinese-oriented aspect polarity classification and aspect term extraction. Neurocomputing, 419:344–356, 2021.

[34] L. Yao, C. Mao, and Y. Luo. Graph Convolutional Networks for Text Classification. In Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence and Thirty-First Innovative Applications of Artificial Intelligence Conference and Ninth AAAI Symposium on Educational Advances in Artificial Intelligence, pages 7370–7377, 2019.
[35] H. Zhang, L. Xiao, Y. Wang, and Y. Jin. A Generalized Recurrent Neural Architecture for Text Classification with Multi-Task Learning. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, pages 3385–3391, 2017.

[36] X. Zhang, Q. Zhang, Z. Yan, R. Liu, and Y. Cao. Enhancing Label Correlation Feedback in Multi-Label Text Classification via Multi-Task Learning. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pages 1190–1200. Association for Computational Linguistics, 2021.

[37] Y. Zhang and Q. Yang. A Survey on Multi-Task Learning. IEEE Transactions on Knowledge and Data Engineering, 34(12):5586–5609, 2022.

[38] Y. Zhang, N. Zincir-Heywood, and E. Milios. Narrative Text Classification for Automatic Key Phrase Extraction in Web Document Corpora. In Proceedings of the 7th Annual ACM International Workshop on Web Information and Data Management, WIDM '05, pages 51–58, 2005.

[39] Z. Zhang, W. Yu, M. Yu, Z. Guo, and M. Jiang. A Survey of Multi-task Learning in Natural Language Processing: Regarding Task Relatedness and Training Methods. CoRR, abs/2204.03508, 2022.

[40] M. Zhao, J. Yang, and L. Qu. A multi-task learning model with graph convolutional networks for aspect term extraction and polarity classification. Applied Intelligence, 53(6):6585–6603, 2023.