https://doi.org/10.31449/inf.v48i9.5886 Informatica 48 (2024) 163–176

River Ship Monitoring Based on Improved Deep-Sort Algorithm

Yan Zhai
Internship and Training Center, Zhengzhou Vocational College of Finance and Taxation, Zhengzhou, 450000, China
E-mail: rocky_zhai@163.com

Keywords: ship monitoring, object detection, target tracking, simple online real-time tracking

Received: March 11, 2024

As the economy develops rapidly, waterway transportation has gradually become an important part of the logistics industry. A model was built to address the low detection and tracking accuracy of ship objects. First, dilated convolution was introduced into YOLOv3, and a 104×104 prediction scale and L2 regularization were added to detect small objects, yielding a target detection model based on the improved YOLOv3. The improved YOLOv3 was then used as the detector of the deep simple online real-time tracking algorithm, and the D-IoU distance was introduced into the cascade matching loss to build a ship tracking model based on the improved tracking algorithm. The results confirmed that the improved YOLOv3 achieved 6345 correct detections, a detection time of 21.3 seconds, a recall rate of 93.25%, a missed alarm rate of 6.76%, and an average precision of 92.53%. The proposed object detection model performed best in detection precision, missed alarm rate, false alarm rate, and average precision, with values of 87.48%, 5.14%, 12.51%, and 94.35%, respectively. The proposed ship tracking model had the highest recall rate of 64.7% and a multi-object tracking accuracy of 61.8%. This study confirms that the proposed object detection and tracking models perform well and contribute to the intelligent development of the waterway transportation industry.

Povzetek: The river ship monitoring model uses an improved Deep-SORT algorithm, introducing dilated convolution and L2 regularization into YOLOv3, which increases the accuracy of ship detection and tracking.

1 Introduction

With the continuous deepening of economic globalization, the shipping industry has developed rapidly, but the growing number of ships also poses serious challenges to river management [1]. Ship monitoring is an important task in river management: it not only keeps river transportation operating normally but also ensures navigation safety. It also allows the behavior of ship operators to be monitored, so that navigation hazards caused by non-standard operation and unauthorized absence from the post can be avoided. Object detection and tracking are important methods for ship monitoring. Object detection is a key direction in image processing, tasked with identifying all targets of interest in an image. Target tracking refers to continuously following the position and shape of targets in a video sequence and updating their status in real time [2, 3]. Traditional video-based ship monitoring relies on manual searching and discrimination by the human eye, which is inefficient and costly because human attention is limited. As intelligent information processing technology develops, deep learning has been widely applied in object detection and tracking, improving detection accuracy while saving labor costs [4]. However, the background in river ship monitoring videos is often complex, and small objects must be detected, which affects the accuracy of object detection and tracking technologies.
The detection performance of existing algorithms still needs improvement [5]. In this context, a ship tracking model is built on an improved YOLOv3 object detection model and an improved Deep Simple Online Real-time Tracking (Deep-Sort) algorithm. There are two main innovations in this study. Firstly, dilated convolution is introduced into the backbone network of YOLOv3, and a 104×104 prediction scale and L2 regularization are added to detect small objects. Secondly, the improved YOLOv3 is used as the detector of Deep-Sort, and the D-IoU distance is introduced into the cascade matching loss. The study is organized in four parts. Firstly, current research is reviewed. Secondly, an object detection model based on the improved YOLOv3 and a ship tracking model based on the improved Deep-Sort are built. Then the application effectiveness of the proposed models is analyzed. Finally, the conclusion of the study is given.

2 Related works

Ship monitoring plays an important role in ensuring the safe operation of ships. By monitoring the machinery, equipment, and electrical systems of ships, potential safety hazards can be identified and resolved in a timely manner, ensuring the safety and reliability of ships. Tsoumpris and Theotokatos developed a method for monitoring autonomous ships using dynamic Bayesian networks and rule-based energy management strategies. They captured performance indicators while considering the reliability of the entire system and its components. The results confirmed that the proposed approach improved the ship monitoring capability of hybrid power plants [6]. Capezza et al. noted that the rapid development of data collection technology on modern ships has made ship operation data-rich. Their functional regression control charts addressed whether the observed CO2 emission profile was as expected given the covariate values [7]. Ship detection in surveillance video suffers from problems such as low detection accuracy, display delay, and computational blockage. Therefore, Zheng et al. optimized the anchor box algorithm in YOLOv5 based on the characteristics of ship targets and used t-SNE to reduce the dimensionality of the dataset and visualize it. The results confirmed that this method improved ship detection accuracy and speed [8]. Wang et al. proposed a model based on the SSD framework that detected different feature parameters, targeting feature-based ship recognition in the maritime domain. The results confirmed that the proposed model had good compatibility and performed well in efficiency and recognition accuracy, with theoretical value and application prospects [9]. Kim et al. addressed the safety and reliability issues associated with autonomous and remote control of ships. They studied the safety challenges of automatic ship operation in a mixed navigation environment and several methods to reduce safety risks, and also discussed potential practical and research directions for future ship navigation [10]. Wang et al. developed a deep learning-assisted triboelectric smart mat system for monitoring crew members, which obtained crew information without raising the privacy issues of video shooting. Comprehensive monitoring of crew and cargo was achieved, and the ability and efficiency of handling emergency situations were improved [11].
Deep-Sort is a multi-object tracking algorithm built on object detection, so the quality of the object detection algorithm affects its tracking performance. Meimetis et al. proposed a real-time multi-object tracking framework based on an improved Deep-Sort algorithm combined with the YOLO detection method to address the low accuracy of tracking multiple objects. The results confirmed that the improved Deep-Sort algorithm was effective and that the multi-object tracking framework performed well [12]. Chang et al. proposed an abnormal behavior detection model with pedestrian detection and tracking, combining YOLOv3 and Deep-Sort, to improve camera-based behavior recognition and detection. They used a network to predict abnormal behavior, which helped satisfy the needs of real-time monitoring systems. The results confirmed that the proposed method had good recognition accuracy [13]. Mathias et al. proposed an adaptive Deep-Sort and YOLOv3 detection and tracking scheme to address the difficulty of tracking and recognizing underwater objects caused by light refraction. The scheme could track and recognize underwater objects that were occluded. The results confirmed that the proposed scheme worked well in occluded object detection tasks from different perspectives [14]. Zou et al. proposed a multi-object tracking model that used an improved YOLOv3 as the detector for Deep-Sort to track livestock behavior and health status in livestock farming. The backbone of YOLOv3 was replaced by MobileNetV2 to improve detection speed. The results confirmed that the proposed model had high detection accuracy and performance [15]. Rishika et al. addressed the low accuracy of intelligent vehicle detection and counting in highway management. A Deep-Sort model based on YOLOv4 was used to detect and track vehicles in real time from video sequences, and a vision-based vehicle detection and counting system was designed. The results confirmed that the proposed method was feasible and effective [16]. Sahoo et al. proposed an optimization model that combined a region-based Convolutional Neural Network (CNN) detector with Deep-Sort to predict social distance in public places, addressing the monitoring of personal compliance with social distancing norms. The results confirmed that the proposed model had good distance detection performance [17]. The relevant literature is summarized in Table 1.

Table 1: Summary of relevant literature
Method | Performance metrics | Key findings | Limitation
Tsoumpris et al. [6] | Component reliability, engine speed | Proved the usefulness of expanding ship monitoring functions | Lack of practical application experiments
Capezza et al. [7] | Carbon dioxide emissions | Can be used for automatic tracking of modes and trends | Does not directly allow real-time feedback control
Zheng et al. [8] | Accuracy and detection speed | Can achieve more accurate target frame positioning and improve target detection accuracy | The network structure is relatively complex
Wang et al. [9] | Calculation time, processing frame rate, and recognition accuracy | An important component of an intelligent ship automatic recognition edge platform | The practical application effect has not been verified, and the stability is poor
Kim et al. [10] | Security challenges | Security challenges increase with the improvement of ship automation level | Increased complexity
Wang et al. [11] | Efficiency | Comprehensive monitoring of crew and cargo has been achieved | Increased complexity
Meimetis et al. [12] | Detection accuracy | The improved Deep-Sort and YOLO detection methods have good detection performance | Increased complexity
Chang et al. [13] | Recognition rate | Can meet the needs of real-time monitoring | Increased complexity
Mathias et al. [14] | Efficiency and accuracy | Capable of underwater tracking and recognition in complex scenarios | Increased complexity
Zou et al. [15] | Average accuracy, identity switching | Implemented adaptive learning of multi-scale object features | Identity switching reduced
Rishika et al. [16] | Accuracy | Real-time detection and tracking of vehicle video sequences has been achieved | Model calculation takes a long time
Sahoo et al. [17] | Accuracy, total loss, and training time | Effectively monitors social distance | Increased complexity
In summary, although ship detection technologies and Deep-Sort have been studied widely, the complex backgrounds and concentrated target distributions of ship images still pose significant challenges for ship detection and tracking. Therefore, this study uses modified YOLOv3 and Deep-Sort algorithms to build object detection and tracking models for effective monitoring of river ships.

3 Object detection and tracking model construction for river ships

Because of the complex and diverse environment and the unique shooting perspective, traditional object detection and tracking algorithms are no longer sufficient for ship object detection against complex backgrounds. An object detection model based on an improved YOLOv3 and a ship tracking model based on an improved Deep-Sort are built to achieve precise detection and tracking of river ships.

3.1 Construction of an improved YOLOv3-based object detection model

The YOLO algorithms use the DarkNet model as the feature extraction network for object detection tasks and obtain rich features by extracting multi-level target information. The YOLOv3 network predicts multiple bounding boxes and category probabilities simultaneously over the entire image in a single forward pass. YOLOv3 offers good detection accuracy and real-time performance, so it is used as the basis of the object detector [18]. By borrowing from residual networks, YOLOv3 forms a backbone network, DarkNet-53, with stronger feature extraction ability. DarkNet-53 reduces the consumption of computing memory while deepening the network, improves the generalization ability of the network, and accelerates the convergence of model training. Multi-scale features are introduced through the Feature Pyramid Network (FPN) to detect small objects [19]. Figure 1 shows the network structure of the FPN.

Figure 1: The network structure of FPN (feature maps C3-C5 pass through 1×1 convolutions and 2× upsampling to form the prediction layers P3-P5)

YOLOv3 forms a fixed number of predicted bounding boxes on the feature map and performs position regression on the generated boxes. The bounding box prediction is represented by formula (1):

b_x = \sigma(t_x) + c_x, \quad b_y = \sigma(t_y) + c_y, \quad b_w = p_w e^{t_w}, \quad b_h = p_h e^{t_h}   (1)

In formula (1), b_x, b_y, b_w, and b_h are the border coordinate values. t_x and t_y are the offsets of the target center from the upper-left corner of the current grid cell, and c_x and c_y are the grid-cell offsets of the cell containing the prediction box center from the upper-left corner of the feature map. p_w and p_h are the preset width and height of the anchor box, and t_w and t_h are the predicted edge lengths of the box.
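To make formula (1) concrete, the following Python sketch decodes raw network outputs (t_x, t_y, t_w, t_h) into a bounding box. It assumes the standard YOLOv3 sigmoid activation on the center offsets; the grid offsets and anchor size in the example call are illustrative only, not values from the paper.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode_box(t_x, t_y, t_w, t_h, c_x, c_y, p_w, p_h):
    """Decode raw YOLOv3 outputs into box center and size, following formula (1)."""
    b_x = sigmoid(t_x) + c_x      # center x, in grid-cell units
    b_y = sigmoid(t_y) + c_y      # center y, in grid-cell units
    b_w = p_w * np.exp(t_w)       # width scaled from the anchor prior
    b_h = p_h * np.exp(t_h)       # height scaled from the anchor prior
    return b_x, b_y, b_w, b_h

# Example: a cell at grid offset (3, 5) with a hypothetical anchor prior of 1.5 x 2.0 cells
print(decode_box(0.2, -0.1, 0.3, 0.1, 3, 5, 1.5, 2.0))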
It is necessary to calculate the confidence of each prediction box and set a threshold to avoid duplicate bounding boxes, discarding prediction boxes whose confidence falls below the threshold. The confidence is expressed by formula (2):

C_{conf} = Pr(class_i \mid object) \cdot Pr(object) \cdot IOU_{pred}^{truth} = Pr(class_i) \cdot IOU_{pred}^{truth}   (2)

In formula (2), Pr(class_i \mid object) is the conditional probability of each of the C categories within a grid cell, Pr(object) \cdot IOU_{pred}^{truth} is the confidence when a target is present in the box, and IOU_{pred}^{truth} is the intersection-over-union ratio between the predicted box area and the ground-truth box area. A non-maximum suppression algorithm is then used to evaluate the overlap of the remaining bounding boxes and to remove redundant predictions, so that only the predicted boxes with high reliability are retained as object detection boxes. The prediction loss consists of three loss functions, represented by formula (3):

loss_{YOLOv3}(object) = loss_1 + loss_2 + loss_3

loss_1 = \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} I_{ij}^{obj} \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] + \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} I_{ij}^{obj} (2 - \hat{w}_i \hat{h}_i) \left[ (w_i - \hat{w}_i)^2 + (h_i - \hat{h}_i)^2 \right]

loss_2 = - \sum_{i=0}^{S^2} \sum_{j=0}^{B} I_{ij}^{obj} \left[ \hat{C}_i \log C_i + (1 - \hat{C}_i) \log(1 - C_i) \right] - \lambda_{noobj} \sum_{i=0}^{S^2} \sum_{j=0}^{B} I_{ij}^{noobj} \left[ \hat{C}_i \log C_i + (1 - \hat{C}_i) \log(1 - C_i) \right]

loss_3 = - \sum_{i=0}^{S^2} \sum_{j=0}^{B} I_{ij}^{obj} \sum_{c \in classes} \left[ \hat{p}_i(c) \log p_i(c) + (1 - \hat{p}_i(c)) \log(1 - p_i(c)) \right]   (3)

In formula (3), loss_1 is the loss of the predicted bounding box, loss_2 is the loss of the prediction confidence, and loss_3 is the loss of the predicted categories. λ is the loss weight, and the hat symbol denotes the ground-truth value. x_i, y_i, w_i, and h_i are the four coordinates of the i-th bounding box, C_i is its confidence, and p_i(c) is its class probability. However, YOLOv3 is not very sensitive to small targets. Therefore, dilated convolution is added to DarkNet-53 to enlarge the receptive field on the image; the modified network is called DC-DarkNet-53 [20]. A CNN performs convolution operations on images and automatically extracts the feature information of targets, providing rich detailed features for subsequent object detection and tracking. As the convolutions are repeated, the feature resolution of the input decreases while the number of channels increases. Dilated convolution emerged to ensure that the output feature map still contains detailed target information [21]. It enlarges the receptive field by inserting holes into the standard convolution kernel and can improve the model's performance, especially in tasks that handle large inputs or require long-range pixel relationships. Figure 2 shows the specific structure.

Figure 2: Structure diagram of dilated convolution (3×3 kernels whose receptive fields grow to 3×3, 5×5, and 15×15 as the dilation rate increases)
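As a concrete illustration of the receptive-field growth sketched in Figure 2, the PyTorch snippet below (a sketch, not the authors' implementation; the channel count, input size, and the dilation rates matching the figure are assumptions) shows how a 3×3 kernel with dilation rates 1, 2, and 7 covers 3×3, 5×5, and 15×15 regions while keeping only nine weights, and how setting the padding equal to the dilation preserves the feature-map resolution.

import torch
import torch.nn as nn

# Effective receptive field of a single 3x3 convolution with dilation d: 3 + 2*(d - 1)
for d in (1, 2, 7):
    size = 3 + 2 * (d - 1)
    print(f"dilation={d}: effective receptive field {size}x{size}")

# A 3x3 convolution with dilation 2 covers a 5x5 region with only 9 weights;
# padding=dilation keeps the spatial resolution of the feature map unchanged.
dilated = nn.Conv2d(in_channels=256, out_channels=256, kernel_size=3, dilation=2, padding=2)
x = torch.randn(1, 256, 52, 52)
print(dilated(x).shape)   # torch.Size([1, 256, 52, 52])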
In addition, a 104×104 prediction scale is introduced to address the poor real-time detection of small and medium objects by YOLOv3. The model parameters fitted after network training are generally small, and as the number of training samples increases, a previously reasonable parameter distribution may be disrupted. L2 regularization enhances the network's resistance to interference by keeping the parameter weights small, as expressed by formula (4):

J = J_0 + \lambda \sum_{\theta} \theta^2   (4)

In formula (4), J_0 is the original loss function and λ is the regularization coefficient. When θ is solved, the loss function of linear regression is represented by formula (5):

J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2, \quad h_\theta(x) = \theta_0 + \theta_1 x_1 + ... + \theta_n x_n   (5)

Applying gradient descent to minimize the whole loss function gives formula (6):

\theta_j := \theta_j - \frac{a}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}

\theta_j^{L2} := \theta_j \left( 1 - \frac{a\lambda}{m} \right) - \frac{a}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}

Loss_{L2} = Loss + \frac{\lambda}{2N} \sum_{\theta} \theta^2   (6)

In formula (6), θ_j := ... is the original iterative update and θ_j^{L2} := ... is the iterative update after L2 regularization. (1 - aλ/m) is the penalty factor, Loss_{L2} is the improved loss function, and (λ/2N) Σ θ^2 is the L2 regularization term. Figure 3 shows the improved YOLOv3.

Figure 3: Improved YOLOv3 model (a backbone of residual blocks feeds detection layers at the 104×104, 52×52, 26×26, and 13×13 scales through 1×1 and 3×3 convolutions and upsampling)

3.2 Construction of a ship tracking model based on improved Deep-Sort

Once ship object detection is implemented, target tracking can be carried out. Target tracking refers to continuously following a target in subsequent frames after it has been specified in the first frame of a video sequence; that is, bounding boxes are used to calibrate the target and achieve target localization and scale estimation. Deep-Sort can track ship targets. The detection part of Deep-Sort uses the Faster R-CNN algorithm, which is a two-stage object detection method. Although its detection accuracy is high, its speed is slow. Therefore, the study uses the improved YOLOv3 detection algorithm designed above as the detector to modify Deep-Sort. The basic principle of Simple Online Real-time Tracking (SORT) is to build on an object detection algorithm, using Kalman filtering for prediction and the Hungarian algorithm for matching. Deep-Sort is a modified SORT that incorporates appearance information on top of the SORT algorithm. Figure 4 shows ship target tracking based on Deep-Sort.

Figure 4: Ship target tracking flowchart of Deep-Sort (detector prediction boxes are preprocessed, cascade-matched against historical trajectory information, and the matched trajectories are refined by Kalman filter prediction and update)
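The per-frame logic summarized in Figure 4 can be written as a short sketch. The snippet below is a schematic outline only: the track, detection, and matcher objects are hypothetical placeholders standing in for a Kalman filter, an appearance-feature CNN, and the cascade matcher described below, not an actual Deep-Sort API.

def track_frame(tracks, detections, matcher, max_age=30):
    """One tracking step mirroring Figure 4: predict, cascade-match, update (sketch only)."""
    # 1. Kalman filter prediction for every existing trajectory
    for track in tracks:
        track.predict()

    # 2. Cascade matching: pair preprocessed detection boxes with historical trajectories
    #    using the weighted motion / appearance / D-IoU cost described in the text
    matches, unmatched_tracks, unmatched_detections = matcher.cascade_match(tracks, detections)

    # 3. Kalman filter update for matched pairs
    for track_idx, det_idx in matches:
        tracks[track_idx].update(detections[det_idx])

    # 4. Unmatched detections start new trajectories; stale trajectories are dropped
    for det_idx in unmatched_detections:
        tracks.append(detections[det_idx].to_new_track())
    tracks[:] = [t for t in tracks if t.time_since_update <= max_age]
    return tracks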
In target tracking, cascade matching is a crucial step. Deep-Sort considers the correlation between target appearance features and motion information to pair the preprocessed detection boxes with the track bounding boxes (bbox). Deep-Sort constructs a Mahalanobis distance and a cosine distance to represent the matching cost between the preprocessed detection box and the bbox. The Mahalanobis distance correlation is calculated by formula (7):

d^{(1)}(i, j) = (d_j - b_i)^T S_i^{-1} (d_j - b_i)   (7)

In formula (7), d_j is the coordinate vector of the j-th preprocessed detection box, b_i is the target position predicted by the i-th tracker, and S_i is the covariance matrix of d_j and b_i. However, the Mahalanobis distance alone is not a precise enough matching measure and can easily lead to ID jumps. Therefore, Deep-Sort also introduces the cosine distance of the feature vectors within the matching box. A CNN is used to extract the target features within the box and outputs a 128-dimensional vector representing the target. The minimum cosine distance between the last 100 successfully associated features R_i of the i-th tracker and the feature vector of the current j-th detection result is represented by formula (8):

d^{(2)}(i, j) = \min \left\{ 1 - r_j^T r_k^{(i)} \mid r_k^{(i)} \in R_i \right\}   (8)

In formula (8), r_j is the feature vector of the j-th detection box input by the current tracker, and r_k^{(i)} is the k-th feature vector in the feature set of the i-th tracker. Weighted averaging of the Mahalanobis distance and the cosine distance gives formula (9):

c_{i,j} = \lambda d^{(1)}(i, j) + (1 - \lambda) d^{(2)}(i, j)   (9)

In formula (9), λ is the weight coefficient. Finally, the linear weighting of the two distances is used as the matching loss between the two boxes, and the Hungarian algorithm is used to match the detection boxes with the trajectory prediction boxes [22]. In object detection, the distance between two boxes is usually constructed by treating the center coordinates of the predicted box together with its width and height as a whole. To improve Deep-Sort, the distance loss function between predicted and annotated boxes used in ship detection is adopted, and the D-IoU distance is introduced into the cascade matching loss. Figure 5 is a schematic diagram of the IoU calculation and the D-IoU distance.

Figure 5: Schematic diagram of IoU calculation and D-IoU distance ((a) IoU as the ratio of the intersection area to the union area of two boxes; (b) the D-IoU distance based on the center distance d and the enclosing-box diagonal c)

D-IoU uses the intersection-over-union ratio of two rectangular boxes to represent their overlap, as expressed by formula (10):

DIoU = IoU - \frac{\rho^2(b_1, b_2)}{c^2}, \quad IoU = \frac{|B_1 \cap B_2|}{|B_1 \cup B_2|}   (10)

In formula (10), b_1 and b_2 are the center coordinates of boxes B_1 and B_2, ρ denotes the Euclidean distance, and c is the diagonal length of the minimum bounding rectangle of B_1 and B_2. The matched D-IoU loss and the final weighted loss are represented by formula (11):

d^{(3)}(i, j) = 1 - DIoU(d_j, b_i), \quad c_{i,j} = \lambda_1 d^{(1)}(i, j) + \lambda_2 d^{(2)}(i, j) + \lambda_3 d^{(3)}(i, j)   (11)

In formula (11), d_j is the predicted box of the j-th preprocessed detection and b_i is the prediction box of the i-th tracker for the target. λ_1, λ_2, and λ_3 are the weights, with λ_1 + λ_2 + λ_3 = 1.
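Formulas (10) and (11) translate directly into code. The sketch below computes the D-IoU term from corner-format boxes and combines it with precomputed Mahalanobis and cosine distances; the weight values and box coordinates in the example are illustrative only, the paper only requires that the weights sum to one.

def diou(box1, box2):
    """D-IoU between two boxes given as (x1, y1, x2, y2), following formula (10)."""
    # Intersection over union
    ix1, iy1 = max(box1[0], box2[0]), max(box1[1], box2[1])
    ix2, iy2 = min(box1[2], box2[2]), min(box1[3], box2[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    iou = inter / (area1 + area2 - inter)

    # Squared distance between box centers
    c1 = ((box1[0] + box1[2]) / 2, (box1[1] + box1[3]) / 2)
    c2 = ((box2[0] + box2[2]) / 2, (box2[1] + box2[3]) / 2)
    rho2 = (c1[0] - c2[0]) ** 2 + (c1[1] - c2[1]) ** 2

    # Squared diagonal of the minimum enclosing rectangle
    ex1, ey1 = min(box1[0], box2[0]), min(box1[1], box2[1])
    ex2, ey2 = max(box1[2], box2[2]), max(box1[3], box2[3])
    diag2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2

    return iou - rho2 / diag2

def matching_cost(d1, d2, det_box, trk_box, lambdas=(0.3, 0.3, 0.4)):
    """Weighted cost of formula (11); d1 and d2 are the Mahalanobis and cosine distances."""
    d3 = 1.0 - diou(det_box, trk_box)
    l1, l2, l3 = lambdas   # example weights only; they must sum to 1
    return l1 * d1 + l2 * d2 + l3 * d3

print(matching_cost(0.8, 0.2, (10, 10, 60, 40), (15, 12, 62, 45)))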
With the ship object detector and tracker, the complete trajectory of ship movement in the video can be obtained. The direction of ship movement can therefore be determined, and the ship flow in the river channel can be calculated over a fixed time period.

4 Effectiveness analysis of ship object detection and tracking models

An object detection model based on the improved YOLOv3 and a ship tracking model based on the improved Deep-Sort were built to monitor ships in river channels effectively. However, their practical application effects still needed further verification. The analysis was carried out in two parts. Firstly, the detection performance of the object detection model based on the improved YOLOv3 was analyzed. Then the effectiveness of the ship tracking model based on the improved Deep-Sort was verified.

4.1 Effectiveness analysis of object detection models

A MyShip dataset containing 68054 ship targets was used to verify the improvement brought by DC-DarkNet-53 to YOLOv3. The weight decay was 0.0001, the initial learning rate was 0.001, and the momentum was 0.9. The Faster R-CNN with ResNet-101 and VGG-16 backbone networks and the traditional DarkNet-53 model were used for comparison. The comparison of correct detections and detection time is presented in Figure 6. Among the four models, DC-DarkNet-53 had the highest number of correct detections, with 6345, followed by DarkNet-53 with 62759 correct detections, while VGG-16 performed the worst with 38493 correct detections. In addition, the detection time of the proposed improved YOLOv3 was 21.3 seconds, slightly higher than the 20.6 seconds of DarkNet-53 but still within an acceptable range. These results confirmed that dilated convolution improved YOLOv3, and that the improved network had high detection accuracy and efficiency.

Figure 6: Comparison results of detection accuracy and detection time for the four models (numbers of correct and incorrect detections, and detection time, for ResNet-101, VGG-16, DarkNet-53, and DC-DarkNet-53)

The recall, missed alarm rate, and Average Precision (AP) of the four models were compared to verify the detection performance of the proposed DC-DarkNet-53. From Figure 7 (a), the recall rate of the proposed model was the highest among the four models, at 93.25%, followed by DarkNet-53 at 92.21%, while VGG-16 was the worst at 56.55%. From Figure 7 (b), the missed alarm rate of the proposed model was the lowest, at 6.76%, which was 1.03 percentage points lower than DarkNet-53. From Figure 7 (c), the AP of the proposed model was the highest, at 92.53%, which was 1.32 percentage points higher than DarkNet-53. These results confirmed that the proposed DC-DarkNet-53 had good detection performance, feasibility, and effectiveness.

Figure 7: Comparison of the detection performance of the four models ((a) recall, (b) missed alarm rate, (c) AP for ResNet-101, VGG-16, DarkNet-53, and DC-DarkNet-53)
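For reference, the training configuration reported above (initial learning rate 0.001, momentum 0.9, weight decay 0.0001) corresponds to the following PyTorch optimizer setup. This is a sketch only: the stand-in model is a placeholder, and using SGD weight decay as the L2 penalty of formula (6) is an assumption about the implementation.

from torch import nn, optim

# Stand-in layer; the real model would be the DC-DarkNet-53 based improved YOLOv3
model = nn.Conv2d(3, 255, kernel_size=1)

# Hyper-parameters reported in the experiment: lr 0.001, momentum 0.9, weight decay 0.0001
# (weight decay plays the role of the L2 regularization term in formula (6))
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9, weight_decay=0.0001)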
To verify the detection performance of the object detection model based on the improved YOLOv3, the maximum number of iterations was set to 50000 and the batch size to 4, with the remaining experimental conditions unchanged. This model was compared with ResNet-101, VGG-16, DarkNet-53, and DC-DarkNet-53 in Table 2. Among the five models, the proposed model performed best in detection precision, missed alarm rate, false alarm rate, and AP, with values of 87.48%, 5.14%, 12.51%, and 94.35%, respectively. Although its training time was longer than that of DarkNet-53 and DC-DarkNet-53, it was still within an acceptable range. These results confirmed that the object detection model based on the improved YOLOv3 had good detection performance.

Table 2: Comparison of the detection performance of five models
Model | Precision/% | Missed alarm rate/% | False alarm rate/% | Training time/h | AP/%
ResNet-101 | 79.58 | 42.09 | 20.42 | - | 53.37
VGG-16 | 78.51 | 43.44 | 21.48 | - | 51.29
DarkNet-53 | 86.95 | 7.79 | 13.05 | 6 | 91.31
DC-DarkNet-53 | 87.14 | 6.77 | 12.87 | 6 | 92.50
This research | 87.48 | 5.14 | 12.51 | 8 | 94.35

DarkNet-53, DC-DarkNet-53, and the proposed model were used to detect ship images from MyShip to verify the practical application effect of the model. Figure 8 shows the final detection results of the three models. Comparing Figures 8 (b), 8 (c), and 8 (d) with the original sample, which contains 17 ships, the proposed model successfully detected all 17 ships, while the DC-DarkNet-53 model detected only 15. These results confirmed that the object detection model based on the improved YOLOv3 had good practical application effects and detection accuracy.

Figure 8: Final detection results of the three models ((a) original image, (b) DarkNet-53, (c) DC-DarkNet-53, (d) proposed model)

To investigate the computational efficiency of the proposed model, it was compared with the standard YOLOv3 and Deep-Sort algorithms using computation time and resource utilization as indicators; the results are shown in Figure 9. The computation time of the proposed model was 23.3 seconds, slightly longer than that of the standard YOLOv3 and Deep-Sort, but still within an acceptable range. The resource utilization rate was 76.65%, slightly lower than that of the standard YOLOv3. The results indicated that although the computation time of the proposed model increased, it remained acceptable, and the model performed better overall.

Figure 9: Comparison results of computation time and resource utilization for YOLOv3, Deep-Sort, and the proposed model
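The detection indicators used in this section can be computed from the true-positive, false-positive, and false-negative counts. The sketch below assumes missed alarm rate = FN/(TP+FN) and false alarm rate = FP/(TP+FP), which is consistent with the complementary precision and false-alarm percentages in Table 2; the counts in the example call are illustrative only.

def detection_metrics(tp, fp, fn):
    """Precision, recall, missed-alarm and false-alarm rates from detection counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "precision": precision,
        "recall": recall,
        "missed_alarm_rate": 1.0 - recall,      # assumed definition: FN / (TP + FN)
        "false_alarm_rate": 1.0 - precision,    # assumed definition: FP / (TP + FP)
    }

# Illustrative counts only, not the MyShip results
print(detection_metrics(tp=6345, fp=905, fn=460))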
4.2 Effectiveness analysis of ship tracking models

Experiments were conducted on the Ships in Satellite Imagery dataset to validate the improvement of the ship tracking method based on the improved Deep-Sort. Recall rate, ID conversion (identity switches), and Multi-Object Tracking Accuracy (MOTA) were used as indicators. The improved Deep-Sort was compared with the traditional Deep-Sort, Deep-Sort with a YOLOv3 detector, and Deep-Sort with the improved YOLOv3 detector, denoted as Models 1, 2, and 3, respectively. From Figure 10 (a), the recall rate of the proposed model was the highest, at 64.7%. From Figure 10 (b), the proposed model also performed well on ID conversion, with the lowest value of 804. From Figure 10 (c), the MOTA of the proposed model was 61.8%, indicating good tracking accuracy. These results confirmed that the proposed ship tracking model based on the improved Deep-Sort improved on the traditional Deep-Sort.

Figure 10: Tracking performance of the four models ((a) recall, (b) ID conversion, (c) MOTA)

Experiments were conducted to further verify the tracking performance of the ship tracking model, using MOTA, Multi-Object Tracking Precision (MOTP), total missed detections, and total false detections as indicators. The ship tracking model was compared with three algorithms: MOTDT, SORT, and Deep-Sort. Table 3 shows the comparison results. Among the four models, the MOTA and MOTP of the proposed model were the highest, at 65.4% and 80.8%, respectively, and its missed detections and false detections were the lowest, at 53449 and 7964, respectively. These results confirmed that the proposed tracking model achieved good performance and had certain feasibility and effectiveness.

Table 3: Comparison of tracking model effects
Model | MOTA/% | MOTP/% | Number of missed detections | Number of false positives
MOTDT | 47.5 | 74.7 | 85433 | 9255
SORT | 59.7 | 79.5 | 63246 | 8699
Deep-Sort | 61.3 | 79.2 | 56559 | 12853
This research | 65.4 | 80.8 | 53449 | 7964

Experiments were conducted on the USVInland dataset, which covers different weather conditions, to verify the adaptability of the proposed model under different environmental conditions; the other experimental conditions remained unchanged. The MOTA and MOTP results of the four models are shown in Figure 11. From Figure 11 (a), the MOTA of the proposed model on the USVInland dataset was 65.2%, higher than that of the comparison models. From Figure 11 (b), the MOTP of the proposed model was 80.6%, still higher than that of the other three models. The results indicated that the ship tracking model based on the improved Deep-Sort algorithm tracked well under different conditions.

Figure 11: Comparison results of the MOTA and MOTP indicators on the USVInland dataset ((a) MOTA, (b) MOTP)

MOTA, MOTP, recall rate, and ID conversion were used as evaluation indicators to validate the feasibility of the proposed model, and the MyShip dataset was selected for ablation experiments. Figure 12 shows the outcomes of the ablation experiment. From Figure 12 (a), the complete ship tracking model performed best on MOTA, MOTP, and recall, with values of 61.7%, 80.8%, and 64.6%, respectively. From Figure 12 (b), the ID conversion of the complete ship tracking model was the lowest, at 805. These results confirmed that using the improved YOLOv3 as the detector of Deep-Sort and introducing the D-IoU distance into the cascade matching loss effectively improved the ship tracking performance of the model, and that the approach was feasible and effective.

Figure 12: Results of the ablation experiment ((a) MOTA, MOTP, and recall; (b) ID conversion for Deep-Sort, the model without the improved YOLOv3, the model without D-IoU, and the complete model)
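The MOTA values reported above can be computed with the standard CLEAR-MOT definition, sketched below; the ground-truth object count in the example is illustrative only, since it is not reported in Table 3.

def mota(false_negatives, false_positives, id_switches, ground_truth_objects):
    """Multi-object tracking accuracy, standard CLEAR-MOT definition."""
    return 1.0 - (false_negatives + false_positives + id_switches) / ground_truth_objects

# Illustrative ground-truth count; FN/FP/ID-switch values follow the scale of Table 3
print(mota(false_negatives=53449, false_positives=7964, id_switches=805, ground_truth_objects=180000))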
5 Discussion

An object detection model based on the improved YOLOv3 algorithm and a ship tracking model based on the improved Deep-Sort algorithm were built to address ship monitoring in river channels. The experimental results on the MyShip dataset showed that the proposed object detection model performed well, with 6345 correct detections, a recall rate of 93.25%, a missed alarm rate of 6.76%, and an AP of 92.53%. The proposed model outperformed the Faster R-CNN with ResNet-101 and VGG-16 backbone networks and the traditional DarkNet-53 model. This is because dilated convolution enlarges the receptive field on the image and effectively improves the sensitivity of the YOLOv3 algorithm to small targets, which improves detection accuracy and efficiency. The proposed target tracking model performed well on the Ships in Satellite Imagery dataset, with a recall rate of 64.7%, an ID conversion of 804, and a MOTA of 61.8%, demonstrating good tracking accuracy. The proposed model outperformed the traditional Deep-Sort algorithm, the Deep-Sort algorithm with a YOLOv3 detector, and the Deep-Sort algorithm with the improved YOLOv3 detector. This is because introducing the D-IoU distance into the cascade matching loss allows the complete trajectory of ship motion in the video to be obtained, so that the direction of the ship can be determined and tracking accuracy improved. Comparative experiments on actual ship images demonstrated the better practical applicability of the proposed model compared with references [6] and [9]. The proposed model adopted the Deep-Sort algorithm for online real-time tracking and therefore had better real-time performance than reference [7]. The proposed target recognition model increased the computational complexity and time to a certain extent, similar to references [8], [10-14], and [16, 17]. Therefore, methods such as lightweight networks should be explored in the future to improve computational efficiency while maintaining recognition performance. Compared with reference [15], the proposed model performed better on ID conversion and better met user needs in actual target tracking scenarios. The proposed object detection model demonstrated good performance in ship monitoring and tracking. The method can also be applied to fields such as maritime rescue and road traffic monitoring, where identifying rescue targets and vehicles can improve rescue efficiency and road safety. However, ship object detection models may be sensitive to morphological and texture features, which limits their applicability in scenarios other than river ship monitoring. In addition, there may be issues such as overlapping, occlusion, and target confusion among ships in crowded scenes.
These issues may challenge the model's ability to accurately detect and track each ship target and affect its practical application. Therefore, target segmentation and recognition technology can be combined in future research to segment targets into separate parts. Meanwhile, data from different sensors can be combined to obtain more dimensions of information and improve the accuracy of the model's object detection and tracking.

6 Conclusion

As the economy develops and deep-learning-based intelligent information processing technology continues to mature, ship monitoring technology is also moving towards intelligence and automation. An object detection model based on the improved YOLOv3 and a ship tracking model based on the improved Deep-Sort were built to deal with the low accuracy of ship object detection and tracking. The results confirmed that the improved YOLOv3 had the highest number of correct detections, with 6345, followed by DarkNet-53 with 62759 correct detections, while VGG-16 performed the worst with 38493 correct detections. The improved YOLOv3 had the highest recall rate of 93.25%, the lowest missed alarm rate of 6.76%, and the highest AP of 92.53%. The proposed object detection model performed best in detection precision, missed alarm rate, false alarm rate, and AP, with values of 87.48%, 5.14%, 12.51%, and 94.35%, respectively, and successfully detected all 17 ship targets in the actual samples. The proposed ship tracking model had the highest recall rate of 64.7%, the lowest ID conversion of 804, and a multi-object tracking accuracy of 61.8%. In addition, using the improved YOLOv3 as the detector of Deep-Sort and introducing the D-IoU distance into the cascade matching loss effectively improved the ship tracking performance. In summary, the constructed models are feasible and effective. However, the data collected in this study is still limited, which may affect the practical application effectiveness of the model. Therefore, more data should be collected in future research to validate the model's practical application effectiveness.

7 Funding

The research is supported by: Provincial level, Education Reform Project of Henan Provincial Department of Education, "Research on Comprehensive Interdisciplinary Practical Training in Finance and Economics Based on 'Double Innovation' in the Context of Free Trade Zones" (No. 2019SJGLX684).

References

[1] A. Amro, V. Gkioulos, and S. Katsikas, "Communication architecture for autonomous passenger ship," Proceedings of the Institution of Mechanical Engineers, Part O: Journal of Risk and Reliability, vol. 237, no. 2, pp. 459-484, 2023. https://doi.org/10.1177/1748006X211002546
[2] R. Li, J. Wu, and L. Cao, "Ship target detection of unmanned surface vehicle base on efficientdet," Systems Science & Control Engineering, vol. 10, no. 1, pp. 264-271, 2022. https://doi.org/10.1080/21642583.2021.1990159
[3] C. Yan, and C. Wang, "Ship target detection in sar image based on selective coordinate attention," Acta Electonica Sinica, vol. 51, no. 9, pp. 2481-2491, 2023. https://doi.org/10.12263/DZXB.20211416
[4] T. Yao, R. Miao, W. Wang, Z. Li, J. Dong, Y. Gu, and X. Yan, "Synthetic damage effect assessment through evidential reasoning approach and neural fuzzy inference: Application in ship target," Chinese Journal of Aeronautics, vol. 35, no. 8, pp. 143-157, 2022. https://doi.org/10.1016/j.cja.2021.08.010
[5] Y. Liu, D. Jiang, C. Xu, Y. Sun, G. Jiang, B. Tao, X. Tong, M. Xu, G. Li, and J. Yun, "Deep learning based 3D target detection for indoor scenes," Applied Intelligence, vol. 53, no. 9, pp. 10218-10231, 2023. https://doi.org/10.1007/s10489-022-03888-4
[6] C. Tsoumpris, and G. Theotokatos, "Performance and reliability monitoring of ship hybrid power plants," Journal of ETA Maritime Science, vol. 10, no. 1, pp. 29-38, 2022. https://doi.org/10.4274/jems.2022.82621
[7] C. Capezza, F. Centofanti, A. Lepore, A. Menafoglio, B. Palumbo, and S. Vantini, "Functional regression control chart for monitoring ship CO2 emissions," Quality and Reliability Engineering International, vol. 38, no. 3, pp. 1519-1537, 2022. https://doi.org/10.1002/qre.2949
[8] J. C. Zheng, S. D. Sun, and S. J. Zhao, "Fast ship detection based on lightweight YOLOv5 network," IET Image Processing, vol. 16, no. 6, pp. 1585-1593, 2022. https://doi.org/10.1049/ipr2.12432
[9] X. Wang, J. Liu, X. Liu, Z. Liu, O. I. Khalaf, J. Ji, and O. Y. Quan, "Ship feature recognition methods for deep learning in complex marine environments," Complex & Intelligent Systems, vol. 8, no. 5, pp. 3881-3897, 2022. https://doi.org/10.1007/s40747-022-00683-z
[10] T. Kim, L. P. Perera, M. P. Sollid, B. M. Batalden, and A. K. Sydnes, "Safety challenges related to autonomous ships in mixed navigational environments," WMU Journal of Maritime Affairs, vol. 21, no. 2, pp. 141-159, 2022. https://doi.org/10.1007/s13437-022-00277-z
[11] Y. Wang, Z. Hu, J. Wang, X. Liu, Q. Shi, Y. Wang, L. Qiao, Y. Li, H. Yang, J. Liu, L. Zhou, Z. Yang, C. Lee, and M. Xu, "Deep learning-assisted triboelectric smart mats for personnel comprehensive monitoring toward maritime safety," ACS Applied Materials & Interfaces, vol. 14, no. 21, pp. 24832-24839, 2022. https://doi.org/10.1021/acsami.2c05734
[12] D. Meimetis, I. Daramouskas, I. Perikos, and I. Hatzilygeroudis, "Real-time multiple object tracking using deep learning methods," Neural Computing and Applications, vol. 35, no. 1, pp. 89-118, 2023. https://doi.org/10.1007/s00521-021-06391-y
[13] C. W. Chang, C. Y. Chang, and Y. Y. Lin, "A hybrid CNN and LSTM-based deep learning model for abnormal behavior detection," Multimedia Tools and Applications, vol. 81, no. 9, pp. 11825-11843, 2022. https://doi.org/10.1007/s11042-021-11887-9
[14] A. Mathias, D. Samiappan, and R. Kumar, "Occlusion aware underwater object tracking using hybrid adaptive deep SORT-YOLOv3 approach," Multimedia Tools and Applications, vol. 81, no. 30, pp. 44109-44121, 2022. https://doi.org/10.1007/s11042-022-13281-5
[15] X. Zou, Z. Yin, Y. Li, F. Gong, Y. Bai, Z. Zhao, W. Zhang, Y. Qian, and M. Xiao, "Novel multiple object tracking method for yellow feather broilers in a flat breeding chamber based on improved YOLOv3 and deep SORT," International Journal of Agricultural and Biological Engineering, vol. 16, no. 5, pp. 44-55, 2023. https://doi.org/10.25165/j.ijabe.20231605.7836
[16] A. L. Rishika, C. Aishwarya, A. Sahithi, and M. Premchender, "Real-time vehicle detection and tracking using YOLO-based deep sort model: A computer vision application for traffic surveillance," Turkish Journal of Computer and Mathematics Education (TURCOMAT), vol. 14, no. 1, pp. 255-264, 2023. https://doi.org/10.17762/turcomat.v14i1.13530
[17] S. K. Sahoo, G. Palai, B. R. Altahan, S. H. Ahannad, P. P. Priya, M. A. Hossain, and A. N. Z. Rashed, "An optimized deep learning approach for the prediction of social distance among individuals in public places during pandemic," New Generation Computing, vol. 41, no. 1, pp. 135-154, 2023. https://doi.org/10.1007/s00354-022-00202-1
[18] S. Pal, A. Roy, P. Shivakumara, and U. Pal, "Adapting a Swin transformer for license plate number and text detection in drone images," Artificial Intelligence and Applications, vol. 1, no. 3, pp. 145-154, 2023. https://doi.org/10.47852/bonviewAIA3202549
[19] X. Zhou, and L. Zhang, "SA-FPN: An effective feature pyramid network for crowded human detection," Applied Intelligence, vol. 52, no. 11, pp. 12556-12568, 2022. https://doi.org/10.1007/s10489-021-03121-8
[20] K. Wang, and M. Liu, "YOLOv3-MT: A YOLOv3 using multi-target tracking for vehicle visual detection," Applied Intelligence, vol. 52, no. 2, pp. 2070-2091, 2022. https://doi.org/10.1007/s10489-021-02491-3
[21] A. Chaudhuri, "Hierarchical modified fast R-CNN for object detection," Informatica, vol. 45, no. 7, pp. 67-81, 2021. https://doi.org/10.31449/inf.v45i7.3732
[22] M. Z. Alam, and A. Jamalipour, "Multi-agent DRL-based Hungarian algorithm (MADRLHA) for task offloading in multi-access edge computing internet of vehicles (IoVs)," IEEE Transactions on Wireless Communications, vol. 21, no. 9, pp. 7641-7652, 2022. https://doi.org/10.1109/TWC.2022.3160099