https://doi.org/10.31449/inf.v48i6.5660 Informatica 48 (2024) 185–198

Evaluation of Shallow Convolutional Neural Network in Open-World Chart Image Classification

Filip Bajić 1,*, Marija Habijan 2 and Krešimir Nenadić 2
1 University Computing Centre, University of Zagreb, Zagreb, 10000, Croatia
2 Faculty of Electrical Engineering, Computer Science and Information Technology Osijek, Osijek, 31000, Croatia
E-mail: filip.bajic@srce.hr, marija.habijan@ferit.hr, kresimir.nenadic@srce.hr
*Corresponding author

Keywords: data visualization, machine learning, chart image, neural network, pattern recognition

Received: January 23, 2024

Data's role is pivotal in the era of internet technologies, but unstructured data poses comprehension challenges. Data visualizations like charts have emerged as crucial tools for condensing complex information. Classifying charts and applying various processing techniques are vital to interpreting visual data. Traditional chart image classification methods rely on predefined rules and have limited accuracy. The advent of support vector machines (SVMs) and convolutional neural networks (CNNs) significantly improved the accuracy of these methods. This research evaluates our previously introduced Shallow convolutional neural network (SCNN) architecture for chart image classification, comprising four convolutional layers, two max-pooling layers, and one fully-connected layer. The network achieves state-of-the-art results while requiring smaller datasets and reduced computational resources. When two networks are combined into a Siamese SCNN (SSCNN), emphasizing generalization, it achieves high accuracy with small datasets and excels in open-set classification. The evaluation process encompasses the utilization of six publicly available datasets.

Povzetek (translated from Slovenian): The research introduces a shallow convolutional neural network (SCNN) architecture for the classification of chart images.

1 Introduction

In the contemporary digital landscape characterized by advanced internet technologies, the crucial role of data and information is evident. Despite the internet's capacity to process vast amounts of data, challenges arise with accumulating unstructured data, especially when presented in tables that demand significant mental effort for comprehension. Statistical information, often in numerical formats, can be intricate to interpret, leading to difficulties distinguishing crucial from less relevant data [1].

Data visualization, widely employed in mathematics, statistics, analytics, and pattern identification scenarios, faces accessibility challenges. Search engines struggle to include results from visualizations, and blind or visually impaired individuals encounter difficulties due to a lack of awareness about accessibility guidelines. Digital documents often lack essential elements, hindering comprehensive understanding for those relying on screen readers. Ongoing research aims to enhance accessibility, focusing on improved classification and interpretation of information in visual representations [2].

Various tools and methods, particularly data visualizations like charts and diagrams, have emerged to address these challenges. These visual representations condense complex data, aiding in transmission, understanding, and decision-making. Customized charts tailored to specific data types enhance structural clarity.
The initial step in interpreting visualizations involves chart classification, followed by processing techniques like data export algorithms and optical character recognition systems [3, 4, 5].

This work comprehensively evaluates the SCNN architecture for chart image classification, utilizing six publicly available datasets throughout training and evaluation. The study explores the impact of dataset quality and quantity on the results, addressing open classification through the traditional SCNN and the Siamese architecture, yielding positive results.

2 Related research

Unlike tables, charts enhance comprehension and simplify interpretation. Various digital tools, such as Microsoft Excel, Matlab, D3, Plotly, or manual methods, can be used for chart creation. However, challenges arise when charts are stored or digitized as a single visual entity, leading to the loss of structural information. As charts find extensive applications, algorithms have been developed to retrieve and process information from stored images. Over the years, researchers have used different methods to extract information from charts, which can be categorized into four groups: methods that use custom algorithms, model-based methods, machine-learning (ML) based methods, and neural network-based methods. An extended review of research in this field is available in [3].

Custom algorithms involve analyzing and extracting features of the chart image at the pixel and graphic-symbol level and combining them with different image preprocessing techniques. The primary task of image preprocessing is to prepare the image for subsequent analysis and feature extraction. The most basic types of image preprocessing used for chart extraction and visualization include image resolution normalization, image color space normalization, and image noise reduction [2, 6, 7]. Advanced types of image preprocessing, such as edge detection, vectorization, and segmentation, enable separating textual and graphic image elements [1, 2, 8, 9, 10, 11, 12]. Edge detection distinguishes sudden changes in image brightness, while vectorization allows the conversion of a raster image into a vector image to extract graphic shapes. The segmentation process reduces the information in the image, making it possible to distinguish between essential and non-essential elements needed for chart image classification.

Model-based chart image classification relies on hand-crafted models created for each chart-type separately. The model must define the critical textual and graphic elements that a specific chart-type should contain. The chart image will not be successfully classified if some of the essential elements are omitted or if their location in the image space differs from the location defined by the model [13, 14].

Machine-learning methods such as SVMs are commonly used for regression and classification. The features created using the previously mentioned methods are often used as input values of an SVM. SVMs achieve successful results even with small datasets; however, setting a margin is challenging when classes share features [2, 12, 15, 16, 17, 18].

Since 2015, research has employed CNNs for chart image classification, and a comparison of traditional methods and CNN architectures [41] showed a visible improvement of about 20% in the results.
Researchers use various CNN architectures, the most common ones being LeNet, AlexNet, VGG, GoogLeNet, ResNet, Inception, and MobileNet. A comparison of these architectures is available in [8, 42, 36, 43], and the results show a difference of up to 5%. Nevertheless, comparing the results is challenging since the authors use different variables and datasets. Even if some authors use existing, out-of-the-box solutions, they must adjust the input parameters to suit their needs and the dataset used. The dataset is the most crucial input variable for comparing different methods. Since all the datasets used are qualitatively and quantitatively different, the reported numbers should be considered additional information rather than a reference point for achieving specific results. Moreover, a CNN can be used alone or in combination with an SVM, where the CNN performs feature extraction and the SVM acts as the classifier. Either way, it has been shown that using a CNN produces very high classification accuracy, as seen in Table 1.

Several noteworthy articles from Table 1 warrant attention. Foremost among these is "ReVision," acknowledged as the most cited scientific paper in the field, despite its 2011 publication date [16]. This seminal work remains state-of-the-art, elucidating a comprehensive process for classifying chart images. The methodology encompasses leveraging low-level image features, incorporating textual information to enhance classification outcomes, extracting data from bar and pie charts, and applying perceptually based design principles to reimagine data visualizations. The authors report a notable classification accuracy of 0.81 across ten chart-types.

Another notable contribution is utilizing a modified LeNet CNN to classify input images into 11 chart-types [23]. Assisted by a custom dataset, the average classification accuracy achieved is 0.89. The article provides intricate details about the dataset, including the average classification accuracy for each chart-type. Furthermore, it offers a comprehensive model description, experimental setup, and a comparative analysis involving the classic LeNet, a pre-trained LeNet, and the modified LeNet model.

Addressing the accessibility of data visualizations for visually impaired users, [1] introduced a fully automated system titled "Visualizing for the Non-Visual." This system adeptly classifies input images into ten chart-types, detects and categorizes graphical and textual elements, extracts shapes into vector format, and retrieves data from three chart-types. The authors achieved an average classification accuracy of 0.96 using ResNet. The article provides a detailed dataset overview and conducts a comparative analysis with other systems, including "Reverse-Engineering Visualizations" [2], "ChartSense" [10], and "ReVision" [16]. Incorporating multiple CNNs for a specific task led to attaining state-of-the-art results.

"ChartDETR" [40] represents the latest advancement in the field, employing a transformer-based multi-shape detector for localizing keypoints at the corners of regular shapes, thereby reconstructing multiple data elements within a single chart image. The proposed methodology introduces query groups in set prediction, enabling the simultaneous prediction of all data element shapes and eliminating the necessity for subsequent postprocessing.
This distinctive attribute positions "ChartDETR" as a unified framework capable of accommodating various chart-types without necessitating alterations to the network architecture, thereby adeptly detecting data elements with diverse shapes. The method undergoes evaluation on three datasets, demonstrating competitive results across three chart-types and achieving a classification accuracy of 0.98.

3 Methods

This section explains the two methodologies used, an SCNN and an SSCNN. Both models' architectures are available in our previously published article [39].

| Authors | Year | Method | Number of chart-types | Accuracy | Dataset |
|---|---|---|---|---|---|
| Redeke [11] | 2001 | Custom algorithm | 2 | 0.83 | small |
| Mishchenko and Vassilieva [13] | 2011 | Model-based | 5 | 0.92 | small |
| Mishchenko and Vassilieva [14] | 2011 | Model-based | 5 | 0.92 | small |
| Savva et al. [16] | 2011 | SVM | 10 | 0.81 | medium |
| Gao et al. [12] | 2012 | Custom algorithm, SVM | 3 | 0.97 | small |
| Karthikeyani and Nagarajan [19] | 2012 | Custom algorithm, SVM | 8 | 0.69–0.77 | small |
| Cheng et al. [20] | 2013 | Custom algorithm | 3 | 0.96 | medium |
| Liu et al. [21] | 2015 | CNN | 5 | 0.75 | medium |
| Siegel et al. [15] | 2016 | CNN | 7 | 0.84–0.86 | large |
| Poco and Heer [2] | 2017 | CNN, SVM | 10 | 0.94 | medium |
| Junior et al. [22] | 2017 | CNN | 10 | 0.70 | medium |
| Amara et al. [23] | 2017 | CNN | 11 | 0.89 | medium |
| Jung et al. [10] | 2017 | CNN | 10 | 0.91 | medium |
| Battle et al. [24] | 2018 | Custom algorithm | 24 | 0.83–0.94 | large |
| Lin et al. [25] | 2018 | CNN | 2 | 0.89–0.93 | small |
| Dai et al. [8] | 2018 | CNN | 5 | 0.99 | large |
| Choi et al. [1] | 2019 | CNN | 10 | 0.96 | medium |
| Jobin et al. [26] | 2019 | CNN | 28 | 0.88–0.93 | large |
| Liu et al. [27] | 2019 | CNN | 2 | 0.96 | large |
| Davila et al. [28] | 2019 | CNN | 7 | 0.88–0.99 | large |
| Bajić et al. [4] | 2019 | CNN | 10 | 0.81 | medium |
| Kaur and Kiesel [29] | 2020 | CNN | 14 | 0.92 | medium |
| Bajić et al. [6] | 2020 | CNN | 10 | 0.73–0.89 | medium |
| Kosemen and Birant [30] | 2020 | CNN | 1 | 0.93 | medium |
| Ishihara et al. [31] | 2020 | CNN | 1 | 0.97 | medium |
| Zhu et al. [32] | 2020 | Custom algorithm, SVM | 5 | 0.98 | large |
| Huang [33] | 2020 | CNN | 16 | 0.4–0.7 | medium |
| Davila et al. [34] | 2020 | CNN | 15 | 0.93–1.00 | large |
| Dadhich et al. [35] | 2021 | CNN | 1 | 0.85 | medium |
| Thiyam et al. [36] | 2021 | CNN | 9 | 0.87–0.98 | large |
| Bajić and Job [37] | 2021 | CNN | 7 | 1.00 | medium |
| Thiyam et al. [38] | 2021 | CNN | 25 | 0.70–0.90 | medium |
| Bajić et al. [39] | 2022 | CNN | 7 | 0.99–1.00 | medium |
| Xue et al. [40] | 2023 | CNN | 3 | 0.98 | large |

Table 1: An overview of research in the area. A variable classification performance value indicates the use of different methods or datasets. Datasets are classified into three categories: small (sum of images used is less than 1,000), medium (sum of images is between 1,000 and 10,000), and large (sum of images is greater than 10,000).

3.1 Used datasets

Data is fundamental in ML and critical for training models and ensuring their efficacy. Several datasets were used to train, validate, and test the SCNN and SSCNN, as seen in Figure 1. The "ReVision" [16] dataset, created in 2011, served as a primary dataset but was downsized due to unavailability. It is specifically used to test the SSCNN and undergoes complete processing with the image preprocessing algorithm presented in [4, 6, 37]. The ICDAR7 [28] dataset, artificially generated using Python Matplotlib, supports training, validating, and testing both the SCNN and SSCNN, and is also processed by the image preprocessing algorithm.
The ChartDS [44, 45] dataset, created with Python Plotly, includes high-resolution, labeled chart images, is used extensively for training, validating, and testing the SCNN and SSCNN, and is publicly available. The Linnaeus [46] dataset, formed from internet-searched images, serves as the H-class and is mildly preprocessed. CIFAR10 [47], another internet-collected dataset, expands the H-class and shares Linnaeus' preprocessing method. The AT&T DoF [48] dataset, designed for human face recognition, supports SSCNN training with mild preprocessing. For the open-set classification methods, a Mendeley Data repository [49] houses chart images organized similarly to their distribution during method evaluation.

Figure 1: A sample batch of images from the different datasets used for training, testing, and validation of the SCNN and SSCNN: (a) ReVision, (b) ICDAR7, (c) ChartDS, (d) Linnaeus, and (e) CIFAR10.

3.2 The architecture of the models

The demand for digital data processing has propelled computing advancements, notably in deep CNNs (DCNNs) for chart image classification. However, the need for a significant volume of images per chart-type, especially with limited datasets, may lead to overfitting and reduced accuracy. Deepening DCNNs introduces challenges like vanishing or exploding gradients, requiring more computational resources and a larger dataset for training. In response, the SCNN architecture has been developed, differing from traditional CNN models by offering reduced parameters and computational complexity to mitigate resource demands.

3.2.1 An overview of the SCNN architecture

SCNNs [39] represent a subset of neural networks characterized by fewer hidden layers than conventional CNNs. In contrast to DCNNs, SCNNs are tailored to address specific tasks with a minimal hidden layer configuration, making them advantageous in scenarios with limited computational resources or when handling relatively straightforward tasks. While the layer count is a defining characteristic, the performance of a neural network is influenced by various factors, including the number of neurons per layer, the activation function, and the optimization algorithm.

SCNNs offer interpretability benefits, providing a clearer understanding of decision-making processes and identifying crucial input elements. They may exhibit improved generalization and resilience to overfitting compared to deeper networks. The ideal SCNN configuration is not strictly defined; practical considerations dictate a minimal number of hidden layers tailored to the task's complexity. A representative SCNN model [39] includes four convolutional layers, two max-pooling layers, and a fully-connected layer. Input chart images undergo preprocessing, enabling the network to learn intricate patterns. The network's architecture should align with the task's specific requirements.
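For orientation, the layer layout described above can be sketched in a few lines of tf.keras (one of the frameworks used in Section 4.6). Only the overall structure, four 3×3 convolutions, two max-pooling layers, and a single fully-connected classifier over seven chart-types plus the H-class, follows the text and [39]; the channel widths, input resolution, and activation choices below are illustrative assumptions, not the exact published configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_scnn(input_shape=(128, 128, 1), num_classes=8):
    """Shallow CNN: four 3x3 convolutions, two max-poolings, one dense layer.

    Channel counts and input resolution are illustrative assumptions;
    num_classes = 8 covers the seven chart-types plus the H-class.
    """
    return models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(16, 3, padding="same", activation="relu"),
        layers.Conv2D(32, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(2),
        layers.Conv2D(64, 3, padding="same", activation="relu"),
        layers.Conv2D(64, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(2),
        layers.Flatten(),
        layers.Dense(num_classes, activation="softmax"),  # the single fully-connected layer
    ])

model = build_scnn()
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```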
3.2.2 An overview of the SSCNN architecture

Siamese networks, on which the SSCNN [39] is based, were initially designed for signature verification and faced limitations due to substantial computing needs. Recent advancements, like CUDA technology, have accelerated their learning time, making SSCNNs widely applicable to various tasks. Key features include positive results for unseen classes, shared weights, explainable outcomes, and effectiveness with minimal training images. Although the approach emphasizes dataset quality over quantity, challenges like increased computational demands and loss function selection must be considered during training.

SSCNNs are a specialized subtype of CNNs that use two identical (SCNN) neural networks sharing weights and updated parameters [39]. These models process different input images, producing distinct output vectors for subsequent comparison. The contrastive loss (CL) function, crucial in SSCNNs, calculates the Euclidean distance between two vectors, determining their spatial disparity. In SSCNNs, the CL gauges the similarity score (SS) between input vectors, with a lower score indicating greater similarity. This approach helps discern similarities and distinctions between input vectors of the same or different classes. Unlike the SCNN model, which provides a probability score, the CL yields an SS within the 0 to 1 range. Verification tests with identical images yield an SS of 0, confirming the model's correctness. For non-identical images of the same class, the SS approaches zero, while images from distinct classes yield a clearly non-zero SS.

4 Evaluation and results

In this section, we discuss the experimental evaluation of the SCNN model on two publicly available datasets, ChartDS and ICDAR7, encompassing various chart-types and styles. The model underwent separate training on both datasets, creating models for each dataset with increasing image counts per chart-type in distinct subsets (1, 5, 10, 20, 50, 100, 250, and 500 images per class). Post-training, the models were evaluated on both datasets using five testing data subsets (S1–S5) not used during training, and the outcomes were summarized. Various learning parameters, such as the learning rate, number of epochs, and seed value for randomizing network weights, were adjusted during training to ensure optimal model performance. The learning rate decreased dynamically, and a validation dataset assessed the model's performance. The training spanned 20 epochs, and the seed value randomized the network weights, preventing convergence to local minima. This randomization influenced the network output, ensuring replicable results with consistent seed values. The random value also played a pivotal role in selecting and partitioning images into the training and validation datasets, significantly impacting the model's final result.

The SCNN model's experimental evaluation demonstrated its effectiveness in accurately detecting and classifying various chart-types and styles. Additionally, to enhance the SCNN model's learning, an SSCNN model with similar learning parameters was introduced. The SSCNN model, trained over 100 epochs, utilized an additional dataset, AT&T DoF, containing 40 classes. This supplementary dataset improved the learning of similarities between pairs of images. The experimental evaluation of the SSCNN model differed notably from the SCNN model, providing an SS instead of probability values.
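To make the SS concrete, the standard contrastive loss over a pair of embeddings can be sketched as follows. The margin value and whatever squashing keeps the SS within the 0 to 1 range in [39] are not specified here, so both are assumptions of this sketch.

```python
import tensorflow as tf

def contrastive_loss(emb_a, emb_b, same_class, margin=1.0):
    """Contrastive loss for one Siamese pair of embedding vectors.

    same_class: 1.0 if both inputs belong to the same class, else 0.0.
    The Euclidean distance between the embeddings acts as the SS: it is
    pulled towards 0 for same-class pairs and pushed beyond the margin
    for pairs from different classes.
    """
    ss = tf.norm(emb_a - emb_b, axis=-1)       # Euclidean distance = similarity score
    pull = same_class * tf.square(ss)          # same class: minimize the distance
    push = (1.0 - same_class) * tf.square(tf.maximum(margin - ss, 0.0))
    return tf.reduce_mean(pull + push) / 2.0
```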
The SSCNN model employed N-way-K-shot learning, where N is the number of classes and K is the number of samples from each class. In this evaluation, 7-way-1-shot learning was used. The SSCNN model compared known and unknown images, calculating an SS for each pair. The dataset's quality proved crucial, with the model matching the unknown image against all training images to eliminate anomalies. The evaluation comprised 140 pairs of images for one class and 980 for all seven classes, emphasizing the significance of dataset quality over quantity for SSCNN model performance.

4.1 Monte Carlo cross-validation

Monte Carlo cross-validation (MCCV) is a method that iteratively re-splits the original dataset into training and validation sets; across iterations, the same sample may appear in the validation set multiple times. The division ratio between training and validation data varies randomly, from 70% to 80% for training and 20% to 30% for validation. Increasing the number of iterations (recommended from 10 to 1,000) reduces uncertainty and model bias. Due to the smaller dataset, MCCV was repeated 100 times in this model training.

Optimal results for ChartDS were achieved with a random value of 1529; for ICDAR7, it was 635. Figure 2 illustrates the 100-time repeated model training, showing classification accuracy on the validation dataset for datasets with 1, 20, and 500 images per chart-type. Learning was challenging with just one image per chart-type but improved with 20 images, showing oscillation in accuracy, with slightly higher average values for ChartDS. Maximum accuracy was achieved with 500 images per chart-type, regardless of the dataset. Cross-validation and MCCV significantly enhance ML model accuracy and robustness.

Figure 2: The MCCV method shown on the three datasets that participated in the training, from left to right: datasets with 1, 20, and 500 images per chart-type. The first row shows the training process on the ChartDS dataset, and the second row shows the training process on the ICDAR7 dataset. Increasing the dataset size increases the performance of the model.
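A minimal sketch of the MCCV loop described in this subsection, assuming `images` and `labels` are NumPy arrays and `train_and_score` is a hypothetical stand-in for one full training-plus-validation pass:

```python
import numpy as np

def mccv(images, labels, train_and_score, repeats=100, seed=1529):
    """Monte Carlo cross-validation: repeated random re-splits.

    The training share varies between 70% and 80% per iteration, and the
    run is repeated 100 times, as in the text; 1529 is the seed value
    reported above for ChartDS.
    """
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(repeats):
        train_frac = rng.uniform(0.7, 0.8)   # the rest (20-30%) is validation
        idx = rng.permutation(len(images))
        split = int(train_frac * len(images))
        scores.append(train_and_score(images[idx[:split]], labels[idx[:split]],
                                      images[idx[split:]], labels[idx[split:]]))
    return np.mean(scores), np.std(scores)   # the spread shrinks as repeats grow
```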
4.2 Confusion matrix

The confusion matrix visually presents the actual and predicted classes, with the diagonal indicating accurately predicted classes. Standard metrics like Accuracy, Precision, Recall, and F1-score are derived from this matrix across chart-types. Accuracy represents the percentage of correctly classified images, Precision is the proportion of true images among those classified into a class, Recall represents the proportion of correctly classified true images, and F1-score is the harmonic mean of Precision and Recall. Probability values for each class introduce statistical representations denoted as R1, R2, R3, R4, and R5, representing performance scores based on average per-class probability values.

4.3 Open-set classification

Open-set classification is a widely recognized challenge in the field of ML. It refers to the ability of a model to adapt to unexpected images that it has not been trained on. This task is relatively simple for humans: we can quickly determine whether an object belongs to a particular class just by looking at it. However, it presents a unique challenge for computers that has yet to be solved by a single method. In open-set classification, the model is expected to classify images that do not belong to any predefined class into the H-class. This prevents unknown images from being incorrectly classified as a known class. There have been several proposed solutions to this challenge, such as the use of a binary SVM classifier [50], the use of an SVM classifier with a defined threshold [51], the use of statistical knowledge about the probability of including or excluding a vector based on a threshold value [52], rejecting specific inputs during network training [53], creating a separate class that will contain all expected unknown classes or creating a separate class that will contain the largest possible number of known images [54, 55], and using Zero/One/Few-shot learning [56, 57]. Despite these methods, open-set classification remains a challenging problem.

4.4 Obtained results using SCNN

Figure 3a shows the model's performance on seven subsets belonging to the ChartDS dataset, and Figure 3b shows the model's performance on seven subsets belonging to the ICDAR7 dataset. The models were tested on both datasets. The test results are consistent with the MCCV testing on the validation dataset. When one image per chart-type is used for training the model, the classification accuracy is approximately 0. When 20 images per chart-type are used for training, it is approximately 0.85; when 500 images per chart-type are used for training, it is approximately 1. The previously published journal article always used 20 images per chart-type for testing [39]. This identical dataset is labeled S1. The research was extended to five datasets (S1–S5), i.e., 100 images per chart-type, and only minimal deviation of the results is visible. When the model was trained on one dataset and tested on another, a significant drop in classification accuracy was visible, even though the same chart-types were tested. A manual check of the achieved results shows that the model classifies most images into the H-class. Figure 3c shows the H-class classification accuracy, which for the ChartDS dataset follows Figure 3a, and Figure 3d shows the H-class classification accuracy, which for the ICDAR7 dataset follows Figure 3b. The model successfully distinguishes the seven chart-types from the H-class. When the ICDAR7 dataset is used to test the model trained on the ChartDS dataset, the model achieves significantly worse results than in Figure 3d. Although both datasets contain an H-class constructed from the CIFAR10 and Linnaeus datasets, in the case of the ICDAR7 dataset, the model fails to distinguish chart images from H-class images. The reason is that the model, in this case, has an additional class, so the charts should be recognized, but due to the low probability, they are classified into the H-class. Due to the model's low probability on the chart classes, more than 80% of the images are classified into the H-class. When the ChartDS dataset is used to test the model trained on the ICDAR7 dataset, the model achieves approximately the values shown in Figure 3b. The structure of ICDAR7 images is unified and significantly different from the H-class; therefore, the model successfully recognizes images belonging to the H-class.
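The per-class values quoted below follow from the Section 4.2 definitions. As a reference, they can be derived from a confusion matrix in a few lines of NumPy (zero-division corner cases are ignored in this sketch):

```python
import numpy as np

def per_class_metrics(cm):
    """Derive the Section 4.2 metrics from a confusion matrix.

    cm[i, j] = number of images of actual class i predicted as class j.
    """
    tp = np.diag(cm).astype(float)        # correctly predicted images per class
    precision = tp / cm.sum(axis=0)       # share of true images among the predictions
    recall = tp / cm.sum(axis=1)          # share of the actual class that was recovered
    f1 = 2 * precision * recall / (precision + recall)   # harmonic mean
    accuracy = tp.sum() / cm.sum()        # overall share of correct classifications
    return accuracy, precision, recall, f1
```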
Figure 3: SCNN classification accuracy for seven chart-types with the H-class (a,b). Classification accuracy for the H-class only (c,d). The model was trained on ChartDS (a,c). The model was trained on ICDAR7 (b,d). The evaluation results are consistent with the MCCV testing on the validation dataset. The blue line represents testing on ICDAR7, while the green line represents testing on ChartDS. A continuous line is for S1, and a dashed line is for S1–S5.

Figure 4: Confusion matrix for the model trained on the ChartDS dataset and tested on the ICDAR7 dataset (a), for the model trained and tested on ChartDS (c), for the model trained on the ICDAR7 dataset and tested on ChartDS (e), and for the model trained and tested on the ICDAR7 dataset (g). The model was trained using 500 images per class and tested using 100 images per class. Statistical analysis of realized probabilities in the confusion matrix (b,d,f,h). A: pie and donut, B: horizontal bar, C: horizontal box, D: line, E: scatter, F: vertical bar, G: vertical box, H: other, R1: actual class, R2: predicted & actual class, R3: actual class & not predicted class, R4: not actual class & predicted class, R5: not predicted & not actual class.

A detailed presentation of the classification performance values for the model trained on the ChartDS dataset and tested on the ICDAR7 dataset is shown in the confusion matrix in Figure 4a and Figure 4b. If we were to isolate the H-class, the accuracy would be 0.16, the recall would be 1.00, and the F1-score would be 0.28. Statistical analysis shows that the model achieves a very high probability, but in the case of this dataset, it fails to distinguish the charts from the H-class.

The results are above average when the model is trained on the ChartDS dataset and tested on the same dataset, as shown in Figure 4c. The model achieves a classification accuracy greater than 0.99 and uses a neural network with significantly reduced depth compared to other research in this area, as shown in Table 1. Additionally, this is the first application of an SCNN to open-set classification in chart classification. From the confusion matrix shown in Figure 4c, the H-class's accuracy is 0.99, recall 1.00, and F1-score 1.00. From the statistical analysis shown in Figure 4d, the model predicts where the prediction is expected, and the probability value is very high. In the case of incorrect predictions, the model achieves responses with high probability (R3), but the prediction probability is still lower than for true predictions (R1, R2). By setting a threshold value, it is possible to make an additional exclusion and thus achieve the maximum accuracy for the H-class.

On the other hand, a detailed representation of the classification accuracy for the model trained on the ICDAR7 dataset and tested on the ChartDS dataset is shown in the confusion matrix in Figure 4e and Figure 4f. If we were to isolate the H-class, the achieved classification accuracy would be 0.79, recall 0.99, and F1-score 0.88. By comparative analysis with the previous testing, the model recognizes three types of charts (A, D, G). Figure 4f shows the statistical analysis of the confusion matrix shown in Figure 4e. As in the previous testing, the model achieves a high probability but fails to recognize specific charts. When the model was trained on the ICDAR7 dataset and tested on the same dataset, the results were comparable to those from the research area, as shown in Figure 4g. The model achieves a classification accuracy of 0.90. The model does not distinguish the B-class from the C-class and the F-class from the G-class. Checking the E-class shows that it cannot be distinguished from any other class, which causes a total loss of classification accuracy. The results of the statistical analysis, shown in Figure 4h, are consistent with the previous tests. The model achieves a high probability for both correct and incorrect predictions. The overall classification performance would increase significantly by increasing the variety of images in the training set.
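The threshold-based exclusion mentioned above can be sketched as a post-processing step on the SCNN's softmax output; the threshold value here is an illustrative assumption, not a tuned one:

```python
import numpy as np

H_CLASS = 7  # index of the "other" class; indices 0-6 are the seven chart-types

def classify_with_rejection(probs, threshold=0.9):
    """Reroute low-confidence predictions into the H-class.

    probs: (batch, num_classes) softmax output of the SCNN. Any image
    whose top probability falls below the threshold is reassigned to
    the H-class instead of being forced into a chart-type.
    """
    preds = probs.argmax(axis=1)
    preds[probs.max(axis=1) < threshold] = H_CLASS
    return preds
```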
4.5 Obtained results using SSCNN

The same dataset used to test the SCNN model was also used to test the SSCNN model. The SSCNN model was tested on both the ChartDS and ICDAR7 datasets. Figures 5a and 5b show that the SSCNN model has an advantage over the SCNN. The model works successfully with a minimal number of images per class. When only one image per chart-type is used for training the model, it achieves a classification accuracy of 0.4 to 0.6. However, when 20 images per chart-type are used for model training, the classification accuracy drops, despite the increased number of examples. The model reaches the maximum classification accuracy when the input image is compared with one image from each class, K = 1. However, when the entire dataset is used for comparison, the progress is slower, and more examples are needed to achieve the same result. The model trained on one dataset performs better when tested with data from another dataset, but the results are still below average. Manual inspection shows that the model does not distinguish between the B-class and C-class, and the F-class and G-class.

The SSCNN can also generalize to inputs that did not participate in the training process. Figures 5c and 5d show the current H-class. The SSCNN was not trained on the H-class, and it sees the entire H-class for the first time. The H-class classification accuracy for any dataset is more than 0.8, and the maximum classification accuracy is achieved with a model trained on 50 or more images per class. The success of the SSCNN model on the H-class can be seen in Figures 6a and 6b. The left image compares two slightly preprocessed images from the Linnaeus dataset, and the right image compares the same image from the Linnaeus dataset with a scatter chart from the ChartDS dataset. The SSCNN successfully classifies the image into the H-class with an SS of 0.21. In contrast, the scatter chart SS is significantly higher at 0.60 (a value closer to 0 indicates the same class). Additionally, the SSCNN was tested on the "ReVision" dataset, using five chart-types: area chart, bar chart, line chart, radar, and Venn diagram. All images underwent the same preprocessing as the ChartDS and ICDAR7 dataset images. From each chart-type, 100 images were randomly selected for testing. The total achieved classification accuracy is 0.96. It is essential to note that the models did not see images from the "ReVision" dataset during the training process.

Figure 5: SSCNN classification accuracy for seven chart-types without the H-class (a,b). Classification accuracy for the H-class only (c,d). The model is trained on ChartDS (a,c). The model is trained on ICDAR7 (b,d). The evaluation results are consistent with the MCCV testing on the validation dataset. The blue line represents testing on ICDAR7, while the green line represents testing on ChartDS. A continuous line is for K=1, and a dashed line is for K=dataset.

Figure 6: Comparison of two H-class images (Linnaeus); the SS is 0.21 (a). Comparison of an H-class image (Linnaeus) and a scatter chart (ChartDS); the SS is 0.60 (b).
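For reference, the K = 1 comparison protocol used above reduces to a nearest-support search over similarity scores. The sketch below is a reconstruction under stated assumptions, not the paper's exact procedure: `embed` stands for the shared SCNN branch, and `support_images` holds one reference image per class.

```python
import numpy as np

def n_way_1_shot(embed, support_images, support_labels, query_image):
    """Assign the query image to the class of its most similar support image.

    embed maps an image to an embedding vector via the shared SCNN branch.
    A lower Euclidean distance (the SS) means greater similarity, so the
    prediction is the label of the support image with the smallest SS.
    """
    query_vec = embed(query_image)
    ss = np.array([np.linalg.norm(embed(img) - query_vec) for img in support_images])
    best = int(np.argmin(ss))
    return support_labels[best], float(ss[best])
```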
4.6 In-depth analysis of the networks

A comprehensive examination of the network's capabilities is imperative before assessing the dataset using the previously delineated network architecture. This scrutiny aims to ascertain whether the convolutional layers can effectively discern crucial information from less significant details within the image. The analysis employs the method of feature extraction from convolutional layers, utilizing Feature Visualization to represent learned abstract features graphically. Feature Visualization involves maximizing the activation function to visually display the features of a single neuron, channel, or layer in a neural network. While visualizing individual neurons is impractical due to their foundational nature, a computationally efficient approach involves visualizing the channel features of an individual layer or the entire neural network. Features comprise receptive field filters and activation matrices, each corresponding to a specific filter, with the activation matrix representing the degree of feature presence in the input image.

In the presented SCNN architecture, four convolutional layers with varying numbers of channels and a shared 3×3-point receptive field filter are employed. An analysis of the progression of a donut, horizontal bar, and line chart through these layers is detailed in Figure 7. The visualization reveals that the first convolutional layer learns primitive shapes like lines and edges, with subsequent layers focusing on patterns and textures. The final layer captures object elements bounded by boundaries, emphasizing the essential information for the subsequent fully-connected layer. The visualization underscores the network's proficiency in recognizing simple elements, defining chart-types, and leveraging them for classification decisions. Image preprocessing enhances the observation of critical primitive elements, while background and fill color are not emphasized.

Figure 7: Progress display of a donut (b,c,d,e), horizontal bar (f,g,h,i), and line (j,k,l,m) chart through the convolutional layers. The first row (a) shows an example of a receptive field filter with a resolution of 3×3 points. The further rows show several channels of each convolutional layer: (b,f,j) Conv2d-1, (c,g,k) Conv2d-2, (d,h,l) Conv2d-3, (e,i,m) Conv2d-4.
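Activation matrices of this kind can be read out of a trained Keras model by attaching a probe model to the convolutional outputs. A sketch, assuming a model built in the style of the Section 3.2.1 example (this is the generic Keras feature-extraction pattern, not code from [39]):

```python
import tensorflow as tf

def conv_feature_maps(model, image):
    """Return the activation matrix of every convolutional layer.

    Each returned tensor has shape (1, height, width, channels); channel c
    holds the response of filter c, i.e. the degree to which that learned
    feature is present at each position of the input image.
    """
    conv_outputs = [layer.output for layer in model.layers
                    if isinstance(layer, tf.keras.layers.Conv2D)]
    probe = tf.keras.Model(inputs=model.input, outputs=conv_outputs)
    return probe(image[None, ...])  # add a batch dimension before the forward pass
```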
Comparative analysis based on time and space complexity, defined by the sum of all weights and biases, indicates that the convolutional layers influence both factors. The time complexity includes the convolutional, max-pooling, and fully-connected layers, with the convolutional layers accounting for 90% of the computational time. Increasing the number of convolutional layers significantly escalates the space and time complexity.

| Architecture | Number of param. (×10^6) | MACs (×10^9) | Number of weight layers | Time to train (CPU) [s] | Time to train (GPU) [s] | Accuracy |
|---|---|---|---|---|---|---|
| SCNN (Bajić et al. [39]) | 0.50 | 0.33 | 5 | 4,400 | 40 | 0.99 |
| AlexNet | 61.10 | 0.77 | 8 | 6,000 | 60 | 0.98 |
| SSCNN (Bajić et al. [39]) | 1.01 | 0.67 | 10 | 9,000 | 100 | 1.00 |
| VGG16 | 138.36 | 15.61 | 16 | - | 800 | 0.95 |
| MobileNet v1 | 4.24 | 0.56 | 23 | 13,200 | 240 | 0.99 |
| Inception v3 | 27.16 | 5.75 | 48 | - | 200 | 0.99 |
| ResNet50 v2 | 25.56 | 4.14 | 50 | - | 600 | 0.30 |
| Xception | 42.8 | 16.80 | 71 | - | - | - |
| DenseNet121 | 7.98 | 2.90 | 121 | - | 700 | 1.00 |
| EffNetB0 | 5.28 | 0.42 | 237 | 26,000 | 520 | 0.99 |

Table 2: Comparison of time and spatial complexities for different architectures. The symbol "-" denotes that the network could not be trained with the available computational resources.

The comparison of the time and space complexity and the evaluation of the developed SCNN, SSCNN, and selected architectures are listed in Table 2. The process was conducted using well-established open-source frameworks, including TensorFlow, Keras, and PyTorch. All models underwent training on an AMD EPYC 7B12 central processing unit (CPU) with 13 GB of available memory, supported by an NVIDIA T4 graphics processing unit (GPU) with 16 GB of available memory. The parameters used during the training phase remained consistent across the different models to ensure the consistency and repeatability of the results obtained during the evaluation. For the purpose of comparing the achieved results with other architectures, the ChartDS dataset was used for training, consisting of a total of 500 images per chart-type, while testing utilized 100 images per chart-type.

When compared with other state-of-the-art architectures in the field, the proposed SCNN architecture excels in chart image classification through its innovative design. Comprising four convolutional layers, two max-pooling layers, and one fully-connected layer, it strategically processes visual information. This architecture demonstrates superior efficiency, achieving significantly higher classification accuracy than other methodologies in the field, as shown in Table 1. The success of the SCNN can be attributed to its ability to capture and hierarchically process the intricate features present in chart images. Furthermore, the SCNN's performance benefits from its adaptability to smaller datasets, reducing the computational resources required for training. This efficiency is crucial in real-world scenarios where large, labeled datasets may be challenging to obtain. The architecture's effectiveness is highlighted by its successful training on the publicly available datasets ChartDS and ICDAR7 and its consistently high accuracy across various chart-types. The SCNN architecture achieves optimal results by combining advanced convolutional neural network principles with a design that prioritizes efficiency, adaptability to smaller datasets, and robust feature extraction mechanisms.

5 Discussion

The evolution of the employed methodologies can be observed through the examination of related research. Presently, the predominant methods hinge on CNNs, which have demonstrated state-of-the-art performance in chart-type classification. Notably, prevalent CNN models are rooted in deep architectures, demanding substantial computational resources that may not be universally accessible. Addressing this computational constraint, we introduce the SCNN architecture tailored for chart image classification. The examination of related research also delves into the significance of image preprocessing, elucidating its profound influence on the neural network's learning outcomes. Empirical findings reveal a positive correlation between adept image preprocessing and neural network learning, even within a restricted dataset. Furthermore, an extensive dataset, denoted ChartDS, has been curated.
To facilitate comparative analyses with existing research, the SCNN is evaluated on renowned publicly available datasets, including one from "ReVision" and the ICDAR7 competition dataset. The attained state-of-the-art results underscore the SCNN's capacity for learning across diverse datasets, provided the images contain essential information exclusively.

In the context of employing the SCNN within a system dedicated to objectives such as chart description generation, chart data extraction, or the identification and extraction of scientific figures from documents, a fundamental limitation arises. The inherent deficiency lies in the system's lack of knowledge concerning entities beyond chart images, leading to its incapacity in such diverse tasks. To address this limitation, we introduced two supplementary datasets, CIFAR10 and Linnaeus, encompassing various images encountered in authentic systems. While the SCNN exhibits utility in open-set classification, superior outcomes are discerned through the SSCNN, a composite structure comprising two identical SCNNs.

Lastly, the examination of the SCNN encompassed an evaluation of its time and space complexity. Comparative analyses were conducted with contemporary networks such as ResNet, Inception, and Xception, among others, revealing a notable reduction in both parameters and computational operations. Remarkably, the SCNN demonstrated adaptability to training on machines with or without GPU support, featuring significantly diminished real-time requirements compared to alternative deep architectures.

Further scrutiny examined the SCNN's limitations and unveiled potential avenues for future research. Specifically, attention is warranted towards refining the classification of chart sub-types, such as horizontally grouped or stacked charts. Additionally, avenues for fine-tuning the SCNN include enhancing its capability to classify partially visible charts or discerning multiple charts within a single image.

6 Conclusion

Methods for classifying chart images have evolved over time. Traditional methods used before 2015 had limited accuracy and required chart images to follow predefined rules. However, with the advent of SVMs and CNNs, classification accuracy has increased significantly. In particular, "ReVision" was the first journal article to introduce multi-class classification using an SVM. The authors used various image preprocessing methods and algorithms to achieve the highest possible chart image classification accuracy. With the emergence of CNNs in the field, the methods and algorithms for preprocessing the chart image were retained, and the number of classes was significantly increased. This research presents the SCNN architecture used for chart image classification. The architecture comprises four convolutional layers, two max-pooling layers, and one fully-connected layer. It can be used independently or in combination with another network, such as in the SSCNN. The experimental evaluation shows that the SCNN architecture is efficient and achieves significantly higher classification accuracy than other research in the field. Additionally, it requires smaller datasets for training and reduces the necessary computational resources. Using the SSCNN architecture, high classification accuracy values can be achieved with small datasets. Based on the SCNN architecture, 32 models were created and trained on two publicly available datasets, ChartDS and ICDAR7.
Each of the models was tested on both datasets, allowing for the creation of a reference starting point for future research. The models were also tested on open-set classification, where they achieved high accuracy in recognizing chart images among other images. However, the SSCNN architecture and N-way-K-shot learning showed a significant advantage in open-set classification. The SCNN architecture needs an additional class, while the SSCNN architecture achieves generalization, even for images that have never participated in model training.

In the future, we plan to expand our evaluation to support more chart-types. The H-class should also be expanded to include a more diverse selection of images not belonging to any chart-type.

References

[1] J. Choi, S. Jung, D. G. Park, J. Choo, and N. Elmqvist, "Visualizing for the non-visual: Enabling the visually impaired to use visualization," Computer Graphics Forum, vol. 38, 2019, https://doi.org/10.1111/cgf.13686.

[2] J. Poco and J. Heer, "Reverse-engineering visualizations: Recovering visual encodings from chart images," Computer Graphics Forum, vol. 36, 2017, https://doi.org/10.1111/cgf.13193.

[3] F. Bajić and J. Job, "Review of chart image detection and classification," International Journal on Document Analysis and Recognition (IJDAR), pp. 1–22, 2023, https://doi.org/10.1007/s10032-022-00424-5.

[4] F. Bajić, J. Job, and K. Nenadić, "Chart classification using simplified VGG model," 2019 International Conference on Systems, Signals and Image Processing (IWSSIP), pp. 229–233, 2019, https://doi.org/10.1109/IWSSIP.2019.8787299.

[5] K. C. Shahira and A. Lijiya, "Document image classification: Towards assisting visually impaired," TENCON 2019 - 2019 IEEE Region 10 Conference (TENCON), pp. 852–857, 2019, https://doi.org/10.1109/TENCON.2019.8929594.

[6] F. Bajić, J. Job, and K. Nenadić, "Data visualization classification using simple convolutional neural network model," International Journal of Electrical and Computer Engineering, vol. 11, pp. 43–51, 2020, https://doi.org/10.32985/ijeces.11.1.5.

[7] C. Rane, S. M. Subramanya, D. S. Endluri, J. Wu, and C. L. Giles, "ChartReader: Automatic parsing of bar-plots," 2021 IEEE 22nd International Conference on Information Reuse and Integration for Data Science (IRI), pp. 318–325, 2021.

[8] W. Dai, M. Wang, Z. Niu, and J. Zhang, "Chart decoder: Generating textual and numeric information from chart images automatically," J. Vis. Lang. Comput., vol. 48, pp. 101–109, 2018, https://doi.org/10.1016/J.JVLC.2018.08.005.

[9] A. Balaji, T. Ramanathan, and V. Sonathi, "Chart-text: A fully automated chart image descriptor," ArXiv, vol. abs/1812.10636, 2018, https://doi.org/10.48550/arXiv.1812.10636.

[10] D. Jung, W. Kim, H. Song, J. Hwang, B. Lee, B. H. Kim, and J. Seo, "ChartSense: Interactive data extraction from chart images," Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, 2017, https://doi.org/10.1145/3025453.3025957.

[11] I. Redeke, "Image & graphic reader," Proceedings 2001 International Conference on Image Processing (Cat. No.01CH37205), vol. 1, pp. 806–809, 2001, https://doi.org/10.1109/ICIP.2001.959168.

[12] J. Gao, Y. Zhou, and K. E. Barner, "VIEW: Visual information extraction widget for improving chart images accessibility," 2012 19th IEEE International Conference on Image Processing, pp.
2865–2868, 2012, https://doi.org/10.1109/ICIP.2012.6467497.

[13] A. Mishchenko and N. Vassilieva, "Model-based recognition and extraction of information from chart images," J. Multim. Process. Technol., vol. 2, pp. 76–89, 2011.

[14] A. Mishchenko and N. Vassilieva, "Model-based chart image classification," in International Symposium on Visual Computing, 2011, https://doi.org/10.1007/978-3-642-24031-7_48.

[15] N. Siegel, Z. Horvitz, R. Levin, S. K. Divvala, and A. Farhadi, "FigureSeer: Parsing result-figures in research papers," in European Conference on Computer Vision, 2016, https://doi.org/10.1007/978-3-319-46478-7_41.

[16] M. Savva, N. Kong, A. Chhajta, L. Fei-Fei, M. Agrawala, and J. Heer, "ReVision: Automated classification, analysis and redesign of chart images," Proceedings of the 24th Annual ACM Symposium on User Interface Software and Technology, 2011, https://doi.org/10.1145/2047196.2047247.

[17] R. R. Nair, N. Sankaran, I. Nwogu, and V. Govindaraju, "Automated analysis of line plots in documents," 2015 13th International Conference on Document Analysis and Recognition (ICDAR), pp. 796–800, 2015, https://doi.org/10.1109/ICDAR.2015.7333871.

[18] Y. Shi, Y. Wei, T. Wu, and Q. Liu, "Statistical graph classification in intelligent mathematics problem solving system for high school student," 2017 12th International Conference on Computer Science and Education (ICCSE), pp. 645–650, 2017, https://doi.org/10.1109/ICCSE.2017.8085572.

[19] V. Karthikeyani and S. Nagarajan, "Machine learning classification algorithms to recognize chart types in portable document format (PDF) files," International Journal of Computer Applications, vol. 39, pp. 1–5, 2012, https://doi.org/10.5120/4789-6997.

[20] B. Cheng, R. J. Stanley, S. K. Antani, and G. R. Thoma, "Graphical figure classification using data fusion for integrating text and image features," 2013 12th International Conference on Document Analysis and Recognition, pp. 693–697, 2013.

[21] X. Liu, B. Tang, Z. Wang, X. Xu, S. Pu, D. Tao, and M. Song, "Chart classification by combining deep convolutional networks and deep belief networks," 2015 13th International Conference on Document Analysis and Recognition (ICDAR), pp. 801–805, 2015, https://doi.org/10.1109/ICDAR.2015.7333872.

[22] P. R. S. C. Junior, A. A. de Freitas, R. D. Akiyama, B. P. Miranda, T. Araújo, C. G. R. Santos, B. S. Meiguins, and J. M. de Morais, "Architecture proposal for data extraction of chart images using convolutional neural network," 2017 21st International Conference Information Visualisation (IV), pp. 318–323, 2017, https://doi.org/10.1109/IV.2017.37.

[23] J. Amara, P. Kaur, M. Owonibi, and B. Bouaziz, "Convolutional neural network based chart image classification," 2017.

[24] L. Battle, P. Duan, Z. Miranda, D. Mukusheva, R. Chang, and M. Stonebraker, "Beagle: Automated extraction and interpretation of visualizations from the web," Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, 2018.

[25] A. Y. Lin, J. Ford, E. Adar, and B. J. Hecht, "VizByWiki: Mining data visualizations from the web to enrich news articles," Proceedings of the 2018 World Wide Web Conference, 2018, https://doi.org/10.1145/3178876.3186135.

[26] K. V. Jobin, A. Mondal, and C. V.
Jawahar, "DocFigure: A dataset for scientific document figure classification," 2019 International Conference on Document Analysis and Recognition Workshops (ICDARW), vol. 1, pp. 74–79, 2019, https://doi.org/10.1109/ICDARW.2019.00018.

[27] X. Liu, D. Klabjan, and P. N. Bless, "Data extraction from charts via single deep neural network," ArXiv, vol. abs/1906.11906, 2019.

[28] K. Davila, B. U. Kota, S. Setlur, V. Govindaraju, C. Tensmeyer, S. Shekhar, and R. Chaudhry, "ICDAR 2019 competition on harvesting raw tables from infographics (CHART-Infographics)," 2019 International Conference on Document Analysis and Recognition (ICDAR), pp. 1594–1599, 2019, https://doi.org/10.1109/ICDAR.2019.00203.

[29] P. Kaur and D. Kiesel, "Combining image and caption analysis for classifying charts in biodiversity texts," in VISIGRAPP, 2020, https://doi.org/10.5220/0008946701570168.

[30] C. Kosemen and D. Birant, "Multi-label classification of line chart images using convolutional neural networks," SN Applied Sciences, vol. 2, pp. 1–20, 2020, https://doi.org/10.1007/S42452-020-3055-Y.

[31] T. Ishihara, K. Morita, N. C. Shirai, T. Wakabayashi, and W. Ohyama, "Chart-type classification using convolutional neural network for scholarly figures," in Asian Conference on Pattern Recognition, 2019, https://doi.org/10.1007/978-3-030-41299-9_20.

[32] J. Zhu, J. Ran, R. K.-W. Lee, K. Choo, and Z. Li, "AutoChart: A dataset for chart-to-text generation task," in Recent Advances in Natural Language Processing, 2021.

[33] S. Huang, "An image classification tool of Wikimedia Commons," 2020.

[34] K. Davila, F. Xu, S. Ahmed, D. A. Mendoza, S. Setlur, and V. Govindaraju, "ICPR 2022: Challenge on harvesting raw tables from infographics (CHART-Infographics)," 2022 26th International Conference on Pattern Recognition (ICPR), pp. 4995–5001, 2022, https://doi.org/10.1109/ICPR56361.2022.9956289.

[35] K. Dadhich, S. C. Daggubati, and J. Sreevalsan-Nair, "BarChartAnalyzer: Digitizing images of bar charts," in International Conference on Image Processing and Vision Engineering, 2021, https://doi.org/10.5220/0010408300170028.

[36] J. Thiyam, S. R. Singh, and P. K. Bora, "Challenges in chart image classification: A comparative study of different deep learning methods," Proceedings of the 21st ACM Symposium on Document Engineering, 2021, https://doi.org/10.1145/3469096.3474931.

[37] F. Bajić and J. Job, "Chart classification using Siamese CNN," Journal of Imaging, vol. 7, 2021, https://doi.org/10.3390/jimaging7110220.

[38] J. Thiyam, S. R. Singh, and P. K. Bora, "Effect of attention and triplet loss on chart classification: A study on noisy charts and confusing chart pairs," Journal of Intelligent Information Systems, vol. 60, pp. 731–758, 2022, https://doi.org/10.1007/s10844-022-00741-5.

[39] F. Bajić, O. Orel, and M. Habijan, "A multi-purpose shallow convolutional neural network for chart images," Sensors (Basel, Switzerland), vol. 22, 2022, https://doi.org/10.3390/s22207695.

[40] W. Xue, D. Chen, B. Yu, Y. Chen, S. Zhou, and W. Peng, "ChartDETR: A multi-shape detection network for visual chart recognition," ArXiv, vol. abs/2308.07743, 2023, https://doi.org/10.48550/arXiv.2308.07743.

[41] P. Chagas, R. D. Akiyama, A. S. G. Meiguins, C. G. R. Santos, F. de Oliveira Saraiva, B. S. Meiguins, and J. M.
de Morais, "Evaluation of convolutional neural network architectures for chart image classification," 2018 International Joint Conference on Neural Networks (IJCNN), pp. 1–8, 2018, https://doi.org/10.1109/IJCNN.2018.8489315.

[42] T. Araújo, P. Chagas, J. B. Alves, C. G. R. Santos, B. S. Santos, and B. S. Meiguins, "A real-world approach on the problem of chart recognition using classification, detection and perspective correction," Sensors (Basel, Switzerland), vol. 20, 2020, https://doi.org/10.3390/S20164370.

[43] A. Dhote, M. H. Javed, and D. S. Doermann, "A survey and approach to chart classification," in ICDAR Workshops, 2023, https://doi.org/10.1007/978-3-031-41498-5_5.

[44] F. Bajić, "ChartDS - ChartDataset.zip, figshare," 2022, https://doi.org/10.6084/m9.figshare.19524844.v1.

[45] F. Bajić, "ChartDS 2022 - ChartDataset.zip, Srce Dabar," 2022, https://urn.nsk.hr/urn:nbn:hr:102:276396.

[46] G. Chaladze and K. L, "Linnaeus 5 dataset for machine learning," 2017.

[47] A. Krizhevsky, "Learning multiple layers of features from tiny images," 2009.

[48] S. Z. Li and A. K. Jain, "Handbook of face recognition," 2011, https://doi.org/10.1007/978-0-85729-932-1.

[49] M. H. F. Bajić and K. Nenadić, "A synthetic dataset of different chart types for advancements in chart identification and visualization," 2024, https://doi.org/10.1016/j.dib.2024.110233.

[50] F. de Oliveira Costa, M. Eckmann, W. J. Scheirer, and A. Rocha, "Open set source camera attribution," 2012 25th SIBGRAPI Conference on Graphics, Patterns and Images, pp. 71–78, 2012, https://doi.org/10.1109/SIBGRAPI.2012.19.

[51] W. J. Scheirer, A. de Rezende Rocha, A. Sapkota, and T. E. Boult, "Toward open set recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 7, pp. 1757–1772, 2013, https://doi.org/10.1109/TPAMI.2012.256.

[52] E. M. Rudd, L. P. Jain, W. J. Scheirer, and T. E. Boult, "The extreme value machine," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, pp. 762–768, 2015, https://doi.org/10.48550/arXiv.1506.06112.

[53] T. E. Boult, S. Cruz, A. R. Dhamija, M. Günther, J. Henrydoss, and W. J. Scheirer, "Learning and the unknown: Surveying steps toward open world recognition," in AAAI Conference on Artificial Intelligence, 2019, https://doi.org/10.1609/aaai.v33i01.33019801.

[54] H. Zhang and V. M. Patel, "Sparse representation-based open set recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, pp. 1690–1696, 2017, https://doi.org/10.1109/TPAMI.2016.2613924.

[55] S. Ren, K. He, R. B. Girshick, and J. Sun, "Faster R-CNN: Towards real-time object detection with region proposal networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, pp. 1137–1149, 2015, https://doi.org/10.48550/arXiv.1506.01497.

[56] Z. Chen, Y. Fu, Y. Zhang, Y.-G. Jiang, X. Xue, and L. Sigal, "Multi-level semantic feature augmentation for one-shot learning," IEEE Transactions on Image Processing, vol. 28, pp. 4594–4605, 2018, https://doi.org/10.48550/arXiv.1804.05298.

[57] Y. Fu, T. Xiang, Y.-G. Jiang, X. Xue, L. Sigal, and S. Gong, "Recent advances in zero-shot recognition: Toward data-efficient understanding of visual content," IEEE Signal Processing Magazine, vol. 35, pp. 112–125, 2017, https://doi.org/10.1109/MSP.2017.2763441.