https://doi.org/10.31449/inf.v48i13.6063                                   Informatica 48 (2024) 155–174 155    
Retrieval and Analysis of Multimedia Data of Robot Deep Neural 
Network Based on Learning and Information Fusion 
Xian Guo, Jianing Yang*, Libao Yang
 
National Industrial Information Security Development Research Center, Beijing, China, 100040 
E-mail: jn_young90@126.com, wcbbrxeoxf474@163.com, lib_yong@163.com 
*Corresponding author
 
Keywords: big data, multimedia, teaching, data mining, information retrieval, system design 
Received: April 19, 2024
To address the slow retrieval speed and low retrieval accuracy of traditional data information retrieval systems, this research proposes a robotic deep neural network multimedia data retrieval methodology using information fusion and deep learning. By combining deep learning with information fusion algorithms, lower-level features are combined to form more abstract salient features, which are used to analyze the feature distribution characteristics of the data. This method can address the "semantic gap" issue that arises during the retrieval and analysis of multimedia data with robotic deep neural networks. At the same time, the system hardware is designed around three components (multimedia data tracking, data mining, and retrieval system warning), and the corresponding software design process is derived from it. Finally, taking the retrieval of teaching multimedia information as an example, the analysis shows that the proposed multimedia information retrieval system has fast retrieval speed and high accuracy, which can provide a useful platform for the field of education and may become an important part of media data retrieval in the future.
Povzetek (Summary): A methodology is developed for retrieving and analyzing multimedia data from robot deep neural networks, based on learning and information fusion. It addresses the "semantic gap" problem in multimedia data retrieval. The results of the analysis show that the system has a fast retrieval time and high accuracy, which is promising for use in education and in other media data retrieval applications.
 
 
1   Introduction 
As educational reform continues to deepen its impact on society, universities have begun to use multimedia information in practical teaching and management reform, and the effective analysis of the related operational methods has become a topic of wide discussion. The effective use of multimedia information retrieval technology in teaching can, to some extent, measure all aspects of students' abilities; building a practical teaching information retrieval platform is therefore an important indicator when examining a school's practical teaching and management leadership [1-3]. Integrating deep learning concepts and knowledge into multimedia data retrieval technology yields an emerging model that can, to a certain extent, break with the traditional multimedia data retrieval model. Because of the wide coverage of multimedia data technology, it offers an increasing number of learning options in people's daily lives. Multimedia information education is not only for students; it also provides educational opportunities for many other groups in society [4-5]. Multimedia information retrieval systems are designed to reflect, to a certain extent, the relevant data and parameters and to present them to the designer in the simplest and clearest format. The designers use the extracted data to analyze the hardware and software of the system and to quickly record real-time data and the changes that arise between parameters. This gives the system a warning function that other systems lack: if a student's learning progress falls behind, the relevant warning mode must be activated immediately [6]. The literature [7] discusses in detail the current standards and principles of domestic and foreign multimedia information retrieval methods, on the basis of which several main tools and some models for the design of multimedia information data systems are proposed; the literature [8] examines the security level of systems that currently use multimedia
information technologies for retrieval, and analyzes the security of multimedia information data retrieval systems in terms of security evaluation metrics; the literature [9] focuses on a mature concept of anti-virus cooperation, with the aim of creating an effective security wall for the design of systems related to multimedia information data retrieval. The contribution of the literature [10] lies in the virtual physical experiments carried out in the design of reference-related information systems; conclusions are drawn from the experimental summary, which has considerable potential for the development of multimedia information data retrieval systems [7-10].
Traditional content-based multimedia information data retrieval systems, in contrast, mainly use color, shape, texture, and other lower-level visual features. Most of the classifiers used in these systems are relatively shallow, such as the support vector machine (SVM). The main problem with these systems is their inability to deal effectively with the semantic gap [11], that is, the difference between the similarities that machines obtain from low-level visual properties and the similarities that humans obtain from high-level semantic properties. Although current techniques for multimedia information data retrieval have achieved effective results to some extent, real-time retrieval of multimedia information data remains very challenging because of this semantic gap [11]. At a higher level, content-based multimedia search belongs to the field of artificial intelligence, and the underlying question is whether a machine can recognize multimedia content as effectively as a human can. Of the currently available technologies and the studies in the literature, machine learning techniques are expected to be the most promising approach to addressing the semantic gap.
The methods of searching for multimedia information data in traditional teaching are very limited and therefore do not adequately reflect the learning situation of most students. On this basis, this paper proposes the design of a system for teaching multimedia information search as a way to achieve a deeper integration of learning and knowledge. First, the hardware of the search system is designed around three main components (data source monitoring, data mining, and timely system alerts), with a focus on the analysis of the data mining algorithms. Second, the software part of the search system is analyzed from the flow chart of the software design, and the information search algorithms are used to obtain the specific model functions. Finally, the traditional methods are compared experimentally with the teaching methods discussed in this paper. The results of several experiments demonstrate that the design of the system is essential for the development of the application.
 
Table 1: Related works
Columns: Reference, Objective, Findings, Limitations

[12]
Objective: A semi-supervised deep learning hashing (DLH) technique for quick multimedia retrieval was presented in the study. In the first component, label and visual information were used to create a relative similarity graph that better reflected the relationship between the training data, and the hash codes were then generated based on the graph. In the second step, a deep convolutional neural network (CNN) was employed to concurrently train a good multimedia representation and hash functions.
Findings: The DLH outperformed both supervised and unsupervised hashing techniques, as shown by extensive testing on three widely used datasets.
Limitations: The proposed DLH method's reliance on labeled data may limit its applicability in scenarios where acquiring labeled data was challenging or costly.

[13]
Objective: The paper proposed a Dynamic and Intelligent Traffic Signal Control System (DITLCS) that dynamically modified the traffic signal length based on real-time traffic information. The planned DITLCS had three modes of operation: Fair Mode (FM), Priority Mode (PM), and Emergency Mode (EM).
Findings: Using an open-source simulator called Simulation of Urban MObility (SUMO), a realistic simulation was conducted on the map of the Indian city of Gwalior to assess DITLCS. The outcomes of the simulation demonstrated the effectiveness of DITLCS when compared to other cutting-edge algorithms across a range of performance metrics.
Limitations: Complicated implementation, reliance on precise real-time data, possible hardware/software malfunctions, and difficulties connecting with current infrastructure.

[14]
Objective: The research suggested a dynamic TSK-type RBF-based neural-fuzzy (DTRN) system, in which the learning algorithm modified the parameters online in addition to creating and pruning the fuzzy rules online. A supervisory compensator and a DTRN controller then comprise the Supervisory Adaptive Dynamic RBF-based Neural-Fuzzy Control (SADRNC) system.
Findings: To demonstrate the usefulness of the proposed SADRNC system, it was used to control an inverted pendulum and a chaotic system. The stability of the suggested SADRNC scheme was shown analytically, and several simulations demonstrated its efficacy.
Limitations: Computational overhead may result from the complexity of online rule creation and parameter modification. Compensator design may need to be complicated in order to provide stability, which might lead to increased complexity of the system and implementation difficulties.
 
 
2   Robotic deep neural networks 
2.1 Robotic deep neural network framework 
The robot deep neural network system in this work is built with the Caffe framework and designed on the basis of the AlexNet model. AlexNet, which won first place in the ImageNet 2012 classification competition, is a deep convolutional neural network (CNN) [15].
Caffe is used here as a concrete implementation of the AlexNet model. Caffe is written in C++, which gives it fast computation and a modular design; it also has strong support from the open-source community and a sizable user base in both academia and industry. The network has eight layers: the first five are convolutional layers and the last three are fully connected layers. Its specific network structure is shown in Figure 1.
Figure 1: Caffe network structure 
 
The Caffe network structure in Figure 1 shows that the first, second, and fifth convolutional layers are each followed by a pooling layer. The softmax layer is the last layer of the overall structure and serves as the output layer of the entire architecture, as seen in Figures 2-5.
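The layer arrangement described above (five convolutional layers, with pooling after the first, second, and fifth) can be traced numerically. The following is a minimal sketch that assumes the commonly published AlexNet hyperparameters (a 227 x 227 input, kernel sizes, strides, padding, and channel counts); none of these values are stated in this paper, and the names `out_size`, `LAYERS`, and `trace_shapes` are illustrative only.

```python
def out_size(size, kernel, stride=1, pad=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * pad - kernel) // stride + 1


# (name, kernel, stride, pad, output channels); None means the layer is a
# pooling layer and keeps the channel count of its input.
# Assumed values: the commonly published AlexNet configuration.
LAYERS = [
    ("conv1", 11, 4, 0, 96),
    ("pool1", 3, 2, 0, None),
    ("conv2", 5, 1, 2, 256),
    ("pool2", 3, 2, 0, None),
    ("conv3", 3, 1, 1, 384),
    ("conv4", 3, 1, 1, 384),
    ("conv5", 3, 1, 1, 256),
    ("pool5", 3, 2, 0, None),
]


def trace_shapes(input_size=227, input_channels=3):
    """Return (name, channels, height, width) after every layer."""
    size, channels = input_size, input_channels
    shapes = []
    for name, kernel, stride, pad, out_channels in LAYERS:
        size = out_size(size, kernel, stride, pad)
        if out_channels is not None:
            channels = out_channels
        shapes.append((name, channels, size, size))
    return shapes


if __name__ == "__main__":
    for name, c, h, w in trace_shapes():
        print(f"{name}: {c} x {h} x {w}")
```

Under these assumptions the feature map entering the first fully connected layer has 256 x 6 x 6 = 9216 values; the three fully connected layers then reduce this to the final class scores.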
 
 
Figure 2: First convolutional layer network structure 
 
Figure 3: Second convolutional layer network structure 
 
Figure 4: Structure of the sixth fully-connected layer network
 
Figure 5: Structure of the seventh fully-connected layer network
2.2 Content-based multimedia data retrieval 
Content-Based Image Retrieval (CBIR), the retrieval of image data from a multimedia information system based on its content, has been one of the most prominent topics in computer vision research in the last decade. A content-based multimedia information retrieval system analyzes the visual characteristics of multimedia data and retrieves similar multimedia data from a database using a specific similarity-matching algorithm. In essence, content-based multimedia retrieval is a matching technology that combines mature results from several fields, including computer vision, multimedia data processing, multimedia data understanding, and databases [16].
In previous research, content-based multimedia data retrieval systems mainly used low-level features, either global features such as color, edge, texture, GIST, and CENTRIST features, or local features such as the bag-of-words (BoW) model built on local descriptors (SIFT, SURF). The distance measures used in traditional content-based multimedia data retrieval systems are relatively fixed and mainly include the Euclidean distance and the cosine similarity [17].
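The two fixed measures named above can be written down directly. The sketch below uses only the Python standard library and treats feature vectors as plain lists of numbers; the function names are illustrative and not taken from any retrieval system.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (norm_a * norm_b)
```

Note the opposite orientation of the two measures: a smaller Euclidean distance means more similar items, whereas a larger cosine similarity does. Cosine similarity also ignores vector length, so two vectors pointing in the same direction are treated as identical even if their magnitudes differ.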
The content-based multimedia data retrieval system based on the robotic deep neural network uses the features extracted by the network as the retrieval index. In the experiments, the AlexNet model is used, which has eight layers: five convolutional layers and three fully connected layers. The first five convolutional layers extract the relatively low-level visual features of the multimedia data, while the last three layers capture its high-level features. In the subsequent experiments, this paper uses the last layer as the feature representation of the multimedia data. The work of Ji Wan and other researchers shows that the last two layers give the most discriminative features for multimedia data retrieval. In the AlexNet model, the last layer is the softmax layer, which generalizes the logistic regression model to the multi-class classification problem. Equation 1 is its mathematical expression; it calculates the probability that a given item of multimedia data belongs to each of the categories. The model is trained on 20 categories in the experiments, so the last layer has 20 dimensions, and the 20 components sum to 1.
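$$P(y = j \mid x) = \frac{e^{z_j}}{\sum_{k=1}^{20} e^{z_k}}, \qquad j = 1, \dots, 20 \qquad (1)$$
Here $z_j$ denotes the input to the softmax layer for category $j$.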
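The softmax mapping described above, which turns the scores of the last layer into category probabilities that sum to 1, can be sketched in a few lines. This is a generic, numerically stable implementation and not code from the paper; the variable names are illustrative.

```python
import math


def softmax(scores):
    """Map raw class scores to probabilities that sum to 1."""
    # Subtracting the maximum leaves the result unchanged but avoids
    # overflow in exp() for large scores.
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # Twenty categories, as in the experiments described in the text.
    probabilities = softmax([0.1 * i for i in range(20)])
    print(len(probabilities), round(sum(probabilities), 6))
```

With 20 input scores the output is the 20-dimensional vector that the text uses as the feature representation of an item.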
In machine learning, many algorithms are based on calculating the distance between two sample points, and distance learning for multimedia data retrieval has been studied extensively. Retrieval performance depends not only on the nature of the multimedia data but also on the properties of the metric function, which to a large extent directly determines the results and the efficiency of the search. Content-based multimedia search differs from text-based multimedia search. It proceeds mainly as follows: the visual features of the query example are computed and compared for similarity with those of the multimedia data in the search library in order to determine the search results [18]. The retrieval technology based on the robotic deep neural network extracts the features of the multimedia data to form a feature vector and then represents the corresponding multimedia data by this vector. The search is conducted by comparing the feature vectors of two items of multimedia data: the smaller the distance between the vectors, the greater the similarity between the items. In other words, comparing the distance between feature vectors is treated as a valid comparison of the similarity of the multimedia data. A good feature vector and an appropriate distance learning algorithm therefore play a central role in multimedia data retrieval.
2.3 Robot deep neural network control algorithm
AlexNet is a well-known CNN architecture whose powerful feature extraction capabilities are used here for multimedia data retrieval. Within the Caffe deep learning framework, performance may be improved by adjusting hyperparameters such as the batch size and the learning rate. Tuning these parameters supports accurate feature extraction and quick retrieval, which are essential for applications involving multimedia data, and makes AlexNet a flexible solution for multimedia data retrieval tasks.
 
The robot deep neural network control equation can be expressed as
$$M(q)\ddot{q} + C(q,\dot{q})\dot{q} + G(q) + F_d\dot{q} + F_s(\dot{q}) + \tau_d = \tau \qquad (2)$$
In the above equation, $q$ represents the joint angle vector of the manipulator arm, $\dot{q}$ represents the angular velocity vector, and $\ddot{q}$ represents the angular acceleration vector; $M(q)$ represents the symmetric positive definite inertia matrix; $C(q,\dot{q})\dot{q}$ represents the centripetal and Coriolis force term; the gravity term is represented by $G(q)$; $F_d$ represents the dynamic friction coefficient matrix; $F_s(\dot{q})$ represents the static friction vector; and $\tau_d$ represents the external disturbances.
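The dynamics terms listed above (inertia, centripetal and Coriolis, gravity, dynamic friction, static friction, and external disturbance) can be made concrete for a single-link arm. The sketch below assumes the common rigid-body form in which these terms sum to the joint torque, models the link as a point mass on a massless rod with the angle measured from the downward vertical, and uses hypothetical parameter values; it is an illustration, not the model used in the paper.

```python
import math


def sign(value):
    """Return -1.0, 0.0 or 1.0 according to the sign of value."""
    if value > 0:
        return 1.0
    if value < 0:
        return -1.0
    return 0.0


def required_torque(q, dq, ddq, mass=1.0, length=0.5,
                    fd=0.1, fs=0.05, disturbance=0.0, g=9.81):
    """Joint torque needed to realize acceleration ddq for one link.

    All parameter values are hypothetical defaults chosen for illustration.
    """
    inertia = mass * length ** 2                # inertia term for a point mass
    coriolis = 0.0                              # vanishes for a single joint
    gravity = mass * g * length * math.sin(q)   # gravity torque
    dynamic_friction = fd * dq                  # viscous (dynamic) friction
    static_friction = fs * sign(dq)             # Coulomb-type friction term
    return (inertia * ddq + coriolis + gravity
            + dynamic_friction + static_friction + disturbance)


if __name__ == "__main__":
    # Torque needed to hold the link horizontally at rest.
    print(required_torque(q=math.pi / 2, dq=0.0, ddq=0.0))
```

For a multi-joint manipulator the scalars become matrices and vectors, and the Coriolis term no longer vanishes.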
Setting as a certain representation of the 
state vector, then the system specific form of the 
associated affine nonlinearity performed for the system (2) 
can be expressed as follows (3). 
                       (3) 
In this equation (3), , 
; ;the specific trajectory of the 
specific motion coordinates of the terminal operator is 
represented by the symbol , and u=τ is used as the 
representative value for the control input. 
 
The attribute array of characteristic functions 
corresponding to the nominal values of each parameter is 
represented by the symbols , , , and , 
respectively, which in turn is determined to be based in 
some way on the linear attribute characteristics that the 
robot itself has. 
     
In this operator formula, it is known that the matrix 
function used as a regression is , and the 
function  represents the value of the parameter 
vector corresponding to the physical dimension that the 
robot has. 
 
                            (4) 
In which, , ; 
Considering the influence that uncertainties in the designed system may have, Eq. (4) can be used to convert Eq. (2) into an affine nonlinear system form that carries uncertainties and unknown parameter values.
   (5) 
In which 
: . 
To demonstrate that the system (1) can track the desired output at an exponential rate even in the presence of uncertainty, the following lemmas are given.
Proof of Lemma 1: For the nominal system (2), assume that the relative degree of the system satisfies r ≤ n. It then follows from differential geometry theory that a local diffeomorphism can be derived
as , which satisfies the condition 
, , and then the formula v = 
B(x) + A(x) u obtained from the transformation is input 
into the system, thus the canonical model after successful 
transformation of system (2) can be obtained as follows. 
                           (6) 
Where ： ；
。 
Proof of Lemma 2: According to the nonlinear dynamics 
of the system's own properties, assume that 
one of them as a sufficiently smooth function as well as the 
relative norm satisfies , ε, while , 
prompting the satisfaction of the condition 
, then it is operated as follows: 
 
     
Then the system state x(t) is converging exponentially, i.e. 
           (7) 
Where the convergence rate . 
The control design for the deep neural network robot system depends mostly on the uncertainty present in the system. The bound on this uncertainty, however, is usually assessed and fixed from the designer's own knowledge and experience, which inevitably introduces a degree of subjectivity and often greatly reduces the control accuracy. For this reason, an RBF neural network is used to learn the unknown upper bound of the uncertainty, which can improve the accuracy of the control system.
A formula is set in which the weights 
are expressed by , and the estimated vector of weights is 
expressed by , the specific formula of a Gaussian 
function is . The function is 
special without missing the generality that it has, thus 
enabling the following hypothetical data information to be 
derived. 
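The RBF network mentioned above computes a weighted sum of Gaussian basis functions. The sketch below shows only that forward computation; the centers, the common width, and the weight values are hypothetical, and the adaptive law that would update the weight estimate online is not reproduced here because it cannot be recovered from the text.

```python
import math


def gaussian_basis(x, centers, width):
    """Evaluate one Gaussian basis function per center at the point x."""
    values = []
    for center in centers:
        squared_distance = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
        values.append(math.exp(-squared_distance / (2.0 * width ** 2)))
    return values


def rbf_output(x, weights, centers, width):
    """Network output: inner product of the weights and the basis vector."""
    basis = gaussian_basis(x, centers, width)
    return sum(w * b for w, b in zip(weights, basis))


if __name__ == "__main__":
    centers = [[0.0, 0.0], [1.0, 1.0]]   # hypothetical centers
    weights = [2.0, -1.0]                # hypothetical weight estimate
    print(rbf_output([0.5, 0.5], weights, centers, width=1.0))
```

In an adaptive controller of this kind, the output would stand in for the unknown upper bound of the uncertainty, with the weights adjusted so that the approximation error stays small.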
Hypothesis 1: For an arbitrarily small positive constant v, there exists an optimal weight value θ* such that the approximation error δ(x) of the neural network system satisfies the following condition.
             (8) 
Hypothesis 2: For v in equation (8), the upper bound ρ(x) 
of the uncertainty ϕ(x) satisfies 
                       (9) 
Theorem 2 can be proved: the data of the nonlinear system 
calculated according to equation (4), when it can satisfy 
several assumptions from 1 to 5, from which the output 
equation can be derived. The formula satisfies 
any initial value condition as well as relatively arbitrary 
expected values with bounds, which yield the relevant 
feedback control laws as (9a), (9b), respectively. 
 
 
 
 
Under these control laws, the control error of the system is asymptotically stable in the Lyapunov sense, and the closed-loop system remains uniformly ultimately bounded.
Proof: The Lyapunov function is chosen 
as
 . and  is any given positive number, is the 
estimate of . 
Calculating its differential, then 
 
 
 
Let, , , then 
 
 
According to the above description, can 
be obtained from 0<α<1. By the Lyapunov stability theorem, the control error of the system is stable in the Lyapunov sense, and the system state remains uniformly ultimately bounded, which proves the theorem.
3   Design of multimedia information retrieval system based on deep learning and information fusion
Traditional methods of searching multimedia information data generally involve tedious and complex learning behaviors, with vague purposes and many uncertainties, which leads directly to semi-structured problems. From a statistical point of view, it is difficult to establish a search model for traditional multimedia information education. From a cybernetic point of view, it is also difficult to monitor teaching and learning data quickly and accurately in real time [19-20]. It is therefore necessary to design a hardware system, based on deep learning and information fusion, for retrieving multimedia information data for teaching. Throughout the multimedia teaching process, the multimedia data used for deep learning and rapid integration of knowledge is designed to help students quickly understand the data related to representative learning behavior. The primary goal of data source monitoring is to monitor students' multimedia learning in real time and to collect timely information about their deep learning behavior. The most critical design elements are: the length of time learners are engaged in learning, the amount of material learners can master, real-time student-teacher interaction, positive student responses to teachers' questions, and real-time monitoring of learning progress. Different sources of multimedia information data are available, such as the status of different test results and the data points displayed in the results. Most of these data come from the storage system of the multimedia IT server terminal, where the data is automatically saved every three minutes. This keeps the errors in the collected data relatively small, which benefits the monitoring of the information sources. The essential component of the system architecture for effective multimedia information data retrieval is the selection of a specific knowledge base, which is essentially a specific set of rules. The algorithm of data
information mining based on data entropy can effectively extract a variety of highly useful information data.
When N= (Q, E, R, T) is set as a technical system for 
retrieval of multimedia information data, the formula 
can be obtained, where p is used as its 
specific coefficient, then 
            (10) 
           (11) 
Then the data mining information of object a with respect 
to E is 
                         (12) 
In the above formulas, H(E) represents the information entropy of E. The information entropy of E after the data is mined for object a is represented by
. 
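The entropy-based mining step can be illustrated with the standard Shannon entropy and information gain. Because Equations (10) to (12) did not survive extraction, the sketch assumes the textbook definitions (entropy of a label set, and the reduction in entropy after splitting on an attribute) rather than the paper's exact formulas; the function and variable names are illustrative.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a non-empty list of labels."""
    if not labels:
        raise ValueError("labels must not be empty")
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(attribute_values, labels):
    """Entropy of labels minus their entropy after splitting on an attribute."""
    if len(attribute_values) != len(labels):
        raise ValueError("inputs must have the same length")
    groups = {}
    for value, label in zip(attribute_values, labels):
        groups.setdefault(value, []).append(label)
    total = len(labels)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


if __name__ == "__main__":
    # Hypothetical records: whether a student finished a unit, split by
    # whether the student interacted with the teacher.
    interacted = ["yes", "yes", "no", "no"]
    finished = ["pass", "pass", "fail", "fail"]
    print(information_gain(interacted, finished))
```

An attribute with a high gain separates the outcomes well and is therefore a good candidate for a rule in the knowledge base.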
Given the rapid growth of data mining, new rules are continually added to the knowledge base to constrain the intelligent behavior of the system. Designing the mining of multimedia information data around data entropy not only demonstrates the method from the perspective of the database, but also allows the acquired data to be analyzed more effectively.
The system is divided into four modules: the multimedia data retrieval module, the multimedia database construction module, the model training module, and the system maintenance module [21-22]. Figure 6 shows the overall structure of the multimedia data retrieval system.
Figure 6: Structure of multimedia data retrieval system. 
 
3.1 Multimedia data retrieval module 
The features of the sample multimedia data are extracted and analyzed, and the resulting feature vector is compared one by one with the feature vectors of the multimedia data stored in the retrieval database, so as to obtain the distance between each item of multimedia data in the retrieval library and the sample multimedia
164   Informatica 48 (2024) 155–174                                                               X. Guo et al. 
information data, and then sort them from smallest to 
largest according to the relevant display requirements of 
users, and quickly display the last best results. The system 
block diagram of multimedia data retrieval is shown in 
Figure 7. 
 
Figure 7: System block diagram of multimedia data retrieval module. 
The inference process in step (2) extracts the feature
attributes of the multimedia information data in the
obtained samples. The extraction uses a deep neural
network: after the operations of each layer of the network
are performed, the feature attribute vector is finally
obtained through the source layer [23]. The feature
attribute vector obtained in this work is 20-dimensional.
Step (4) is the matching algorithm, which uses the
Euclidean distance as follows.
                          
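$$d(X, Y) = \sqrt{\sum_{i=1}^{m} (x_i - y_i)^2}$$                    (13)
where x_i and y_i are the i-th components of the two feature
vectors X and Y, and m is the dimension of the feature vector.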
In step (5), the multimedia information data in the
retrieval library are matched one by one, the results are
ranked (e.g., by distance), and the set of results with
the closest similarity is returned.
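The matching and ranking steps can be sketched as follows. This
is a minimal illustration rather than the authors'
implementation: the library is assumed to be an in-memory
dictionary, the item names are hypothetical, and
three-dimensional vectors stand in for the 20-dimensional
feature vectors.

```python
import math


def euclidean_distance(x, y):
    """Euclidean distance between two feature vectors of equal dimension."""
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def retrieve(query, library, k):
    """Return the k (distance, item_id) pairs closest to the query, nearest first."""
    scored = [(euclidean_distance(query, vec), item_id)
              for item_id, vec in library.items()]
    scored.sort()  # smallest distance first
    return scored[:k]


# Hypothetical library of feature vectors keyed by item name.
library = {
    "lecture_a": [0.0, 0.0, 1.0],
    "lecture_b": [1.0, 0.0, 0.0],
    "lecture_c": [0.0, 0.1, 0.9],
}
results = retrieve([0.0, 0.0, 1.0], library, k=2)
```

Here k plays the role of the number of results requested by the
user.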
The workflow of multimedia data retrieval is shown in 
Figure 8. 
 
Figure 8: Multimedia data retrieval workflow. 
On the user page, the user clicks the "Select Files" button
in the system interface, enters the number of results to
return (for example, the 100 most similar multimedia
items), and then clicks "Submit". Before performing a
search, the user must create a folder library of their
own; a folder library can be created directly by sending
the relevant folder. The Caffe framework server extracts
the feature vector of the submitted multimedia information
data and matches it against the corresponding folder
library. Finally, the set of most similar results is
returned.
3.2 Multimedia data retrieval library building module
The multimedia data retrieval library holds the data
against which queried multimedia information is compared
in the retrieval system. It mainly stores the feature
attribute vectors of the multimedia information data,
which are obtained through the neural network [24].
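Similarity calculation formula:
$$\mathrm{Similarity} = \frac{1}{\mathrm{Distance} + 1}$$
Distance calculation formula:
$$\mathrm{Distance} = \sqrt{\sum_{i=1}^{m} (x_i - y_i)^2}$$
(x_i and y_i are the i-th components of the two feature
vectors; m is the dimension of the feature vector)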
It follows that when the distance between two multimedia
items approaches 0, their similarity approaches 100%;
conversely, the greater the distance between two items,
the lower their similarity.
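The distance-to-similarity mapping stated above,
Similarity = 1/(Distance + 1), can be checked with a few sample
values. The function name and the sample distances are
illustrative only.

```python
def similarity(distance):
    """Map a non-negative distance to a similarity score in (0, 1]."""
    if distance < 0:
        raise ValueError("distance must be non-negative")
    return 1.0 / (distance + 1.0)


# Similarity falls as distance grows: 1.0 at distance 0, 0.5 at distance 1.
scores = [similarity(d) for d in (0.0, 1.0, 3.0)]
```

Because the mapping is strictly decreasing, ranking by ascending
distance and ranking by descending similarity give the same
order.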
From the above discussion, the factor that most influences
the result is the feature vector of the multimedia
information data, which ultimately depends on whether the
trained model is accurate. If the model is more accurate,
two similar multimedia items yield feature vectors that
differ only slightly, so the calculated distance is small
and the results are more accurate.
The multimedia data retrieval library is built in two steps:
Step 1: Extract the feature attributes of the multimedia
information data. Figure 9 shows the framework for
extracting the feature attribute vectors; the neural
network framework used is Caffe, described above.
Figure 9: Extraction of multimedia data feature vectors. 
Step 2: The last layer of the Caffe network in the robot's
deep neural network system is the softmax layer, which
outputs the probability of each category. A data table
"TABLES-I" can therefore be formed from the index i of the
largest component of each feature attribute vector. A
database built this way contains m data tables, where m is
the dimension of the feature attribute vector. During a
search, the system looks up only the table "TABLES-I" that
corresponds to the largest component of the query's
feature attribute vector and computes distances within
that subset. This avoids scanning the entire retrieval
database and improves time efficiency by a factor of about
m. Figure 10 shows the detailed architecture of the
database.
Figure 10: Feature vector deposited into the database. 
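The table-based lookup of Step 2 can be sketched as follows.
This is an illustration under stated assumptions, not the
authors' database schema: the tables are held in an in-memory
dictionary keyed by the index of the largest vector component,
and the item names and three-dimensional vectors are
hypothetical.

```python
import math
from collections import defaultdict


def argmax(vec):
    """Index of the largest component of a vector."""
    return max(range(len(vec)), key=lambda i: vec[i])


def build_tables(library):
    """Group items into tables keyed by the index of their largest component."""
    tables = defaultdict(list)
    for item_id, vec in library.items():
        tables[argmax(vec)].append((item_id, vec))
    return tables


def search(query, tables, k):
    """Rank only the items stored in the table that matches the query."""
    candidates = tables.get(argmax(query), [])
    scored = sorted((math.dist(query, vec), item_id) for item_id, vec in candidates)
    return scored[:k]


# Hypothetical softmax outputs for four items.
library = {
    "a": [0.7, 0.2, 0.1],
    "b": [0.6, 0.3, 0.1],
    "c": [0.1, 0.8, 0.1],
    "d": [0.2, 0.1, 0.7],
}
tables = build_tables(library)
hits = search([0.68, 0.22, 0.10], tables, k=2)
```

The speed-up of about m holds when items are spread evenly
across the tables. The trade-off is that an item stored in a
different table is never compared with the query, even if it is
close to it.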
3.3 Model training module 
When the user inputs a multimedia information data set,
each category must contain at least 100 items and the data
set must contain more than one category, so that the
corresponding model can be trained effectively and
quickly.
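The input constraints above can be expressed as a simple check
that runs before training starts. This is a minimal sketch; the
function name, parameter names, and messages are hypothetical,
and only the two thresholds come from the text.

```python
from collections import Counter


def validate_training_set(labels, min_per_category=100, min_categories=2):
    """Return a list of problems; an empty list means the data set is acceptable."""
    counts = Counter(labels)
    problems = []
    if len(counts) < min_categories:
        problems.append(
            f"need at least {min_categories} categories, found {len(counts)}")
    for category, n in sorted(counts.items()):
        if n < min_per_category:
            problems.append(
                f"category {category!r} has {n} items, need {min_per_category}")
    return problems


# Hypothetical label lists: one acceptable, one with an undersized category.
ok = validate_training_set(["math"] * 100 + ["physics"] * 120)
too_small = validate_training_set(["math"] * 100 + ["physics"] * 5)
```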
Several factors influence the training of the model. The
ultimate purpose of training is to produce a model for
multimedia information data [25]. The result is
essentially a binary file that stores the weight
parameters between the layers of the neural network; this
file is large, at about 227 MB.
During training, the factor with the greatest impact is
most likely the long training time. Training is lengthy
because the neural network has many parameters (about 65
million) and because the matrix operations between layers
are time-consuming (hundreds of millions of operations).
Each layer performs forward and backward matrix
multiplications, so the performance of the machine
significantly affects the training time. In this paper,
GPU parallel computing is used to speed up the training of
the model, which makes GPU performance a core component of
the system. The server designed and built in this paper
mainly uses the Tesla K20c GPU, which can reach
approximately 700 times the speed of the Quadro K2100m.
The performance comparison shows that the slope of the
Tesla K20c curve is relatively low. Figure 11 shows the
training-time curves of the two GPUs, the Quadro K2100m
and the Tesla K20c.
 
Figure 11: Comparison chart of Tesla K20c and Quadro 
K2100m training time. 
Figure 12 shows the framework for training the multimedia
information data model. It consists of four main
components: a browser, a web server, a Caffe framework
server, and a supporting database. The web server sends
the acquired multimedia information data to the Caffe
framework server. After pre-processing the data, tuning
the parameters, and generating the training and validation
data, the Caffe framework server calls the Caffe interface
to train the model and starts the training iterations.
Finally, the trained model is stored in the database,
together with the training information used to inform the
client about the progress of the training.
Figure 12: Training model architecture. 
Figure 13 shows the training steps for a multimedia
information data model. The user submits the training data
in the system interface by clicking "Send Data"; the web
server forwards the data to the Caffe framework server for
processing, and training then starts. The Caffe framework
server sends training information to the client in real
time to inform users of the training progress. After
training is complete, the Caffe framework server returns
to the client the basic information about the trained
model (the trainer, the training time, the number of
iterations, the configuration and accuracy information,
etc.).
 
Figure 13: Flow of training model operation. 
 
 
 
 
3.4 System maintenance module 
Each system contains important databases, such as the
template database and the download and search template
databases. Information on the training performed by the
various servers (i.e., the number and duration of training
iterations) needs to be retained, and an interface must be
provided so that users can select and view it.
Figure 14 shows the structure of the system's maintenance
module. It is mainly composed of three parts: first,
emptying and rebuilding the index base; second,
eliminating invalid model content; third, obtaining the
training performance curve of the training server.
Figure 14: System information maintenance structure. 
4 Analysis and outcomes of the experiment
To evaluate the educational multimedia information
retrieval system proposed and designed in this paper, the
multimedia information data of a local school, developed
on Linux/Windows CE technology, is selected as the
research object. To test the performance of the BIM
information data retrieval system, the function
hpel432_CreateChannel-Group() is used to determine the
list of related modules, the channel numbers, and other
parameters of the educational multimedia information
retrieval system.
 
 
The data port is set to the local bus, which is used to
read each sample, and the RESAMP_data interface frequency
is set to 14.8 kHz. Under the experimental environment and
parameters described above, the performance of the
educational multimedia information retrieval system is
tested.
Experiment 1: Different methods are used to test the speed
of the system, focusing on the retrieval of multimedia
information data. Figure 15 shows the comparison results
obtained by the different methods.
 
Figure 15: Comparison of experimental results. 
Figure 15 clearly shows that the proposed teaching
multimedia information retrieval system, based on deep
learning and information fusion, retrieves effective
information in a shorter period of time; the traditional
multimedia data retrieval method is slower by about half.
Experiment 2: Data are retrieved from the behavior modules
related to multimedia information retrieval, such as the
forum, teaching courses, teaching tasks, teaching
resources, user messages, and the learning chat room. The
teaching video content of the current semester is selected
for teachers and students, and experimental information is
collected on the following aspects: the dialogues
involved, students' homework completion, the number of
information resources browsed, and the video communication
between teachers and students.
The algorithm of vertical crossover is adopted: 
 
 
 
 
                 (14) 
In the above formula, A represents the extracted
multimedia information data in the field of education; P
is the data correction coefficient; W is the azimuth
parameter of the information data; N is the difference
parameter between students' scores; x is the fast search
result, and x′ is the ideal parameter value for fast
retrieval; ΔX is the difference between the required data;
A1 is the corrected number of video sessions between
students and teachers; A2 is the corrected number of
effective teachers in the course; A3 is the corrected
number of students attending the lecture.
Relevant information data are retrieved based on formula
(14) to obtain the retrieval accuracy of the model, as
shown in the comparison in Figure 16.
 
Figure 16: Precision comparison of multimedia 
information retrieval with different methods. 
Figure 16 compares the precision of the different
retrieval methods. The traditional multimedia information
retrieval method and the method based on deep learning and
information fusion are both used to search for relevant
educational information. As the number of experiments
increases, the accuracy of the traditional method remains
between 10% and 40%: its variation is small, but its
accuracy is also very low. In contrast, the accuracy of
the method designed in this paper is much higher than that
of the traditional method. As the number of experiments
increases, the accuracy of the proposed method stays at a
high level, which indicates relatively good stability. Its
fluctuations are small, and its retrieval accuracy always
remains in the range of 80%-90%.
Based on the above experimental results, it can be
concluded that retrieval of teaching multimedia
information data based on deep learning and information
fusion is more efficient than the traditional retrieval
method. In addition, the amount of multimedia information
data that the system retrieves with deep learning and
information fusion is very satisfactory. Figure 17
illustrates in more detail the advantages of the approach
proposed in this paper.
According to the comparison of the two approaches shown in
Figure 17, the amount of multimedia information obtained
through deep learning and information fusion is much
higher than the amount obtained with the traditional
approach. The method proposed in this paper therefore not
only has relatively fast retrieval speed and high
accuracy, but also has strong retrieval ability, which
indicates that it has high performance and practical
application value.
 
Figure 17: Comparison of the number of multimedia 
information data models retrieved under the two different 
methods. 
According to the comparison of the two methods in Table 2,
teachers who chose the traditional multimedia information
retrieval method accounted for half of the total, while
teachers who chose the deep learning and information
fusion method proposed in this paper accounted for 90% of
the total. Students who chose the traditional method for
e-learning made up 45% of the total, less than half, while
those who chose the deep learning and information fusion
method made up 95% of the total number of students. The
pass rate of students taught with the traditional method
is approximately 40%, while the pass rate of students
taught with the deep learning and information fusion
method is 80%. In the survey of parents on the two
methods, the multimedia information data system based on
deep learning and information fusion received up to 100%
support. Therefore, it can be concluded that the
multimedia information retrieval system based on deep
learning and information fusion is superior in
performance.
Table 2: Comparison of multimedia information data
retrieved under different methods.
Retrieval object            Traditional method   The method of this paper
Number of teachers online   10~15                15~20
Number of students online   20~45                50~75
Parental support rate/%     20                   100
 
4.1 Recall  
Recall measures a model's ability to identify all positive
instances. It is also referred to as sensitivity or the
true positive rate. A comparison between the recall of the
proposed approach and that of existing methods is shown in
Figure 18. For MRT, IRI-RAS, and DLMNN, the recall rates
were 70%, 72.50%, and 74.06%, respectively. The proposed
deep learning and information fusion (DLIF) method
achieves a recall of 76%, higher than the existing
methods.
 
Figure 18: Outcome of recall
4.2 F1-score 
As the harmonic mean of recall and precision, the F1-score
provides a balanced evaluation of a method's
effectiveness, since it rewards a balance between
precision and recall. The comparison of the proposed
method with existing methods is shown in Figure 19. The
F1-scores of MRT, IRI-RAS, and DLMNN were 80.60%, 84.98%,
and 87.44%, respectively, while the proposed deep learning
and information fusion method yields an F1-score of 89%.
These results show that the proposed approach outperforms
the existing methods.
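For reference, recall, precision, and the F1-score can be
computed from retrieval counts as in the sketch below. It uses
the standard definitions; the counts are hypothetical and are
not taken from the experiments in this paper.

```python
def precision(tp, fp):
    """Fraction of retrieved items that are relevant."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Fraction of relevant items that were retrieved."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical counts: 8 relevant items retrieved, 2 irrelevant items
# retrieved, and 2 relevant items missed.
p = precision(tp=8, fp=2)
r = recall(tp=8, fn=2)
f1 = f1_score(p, r)
```

Because the harmonic mean is pulled toward the smaller of its
two inputs, a method cannot reach a high F1-score by maximizing
only one of precision or recall.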
 
Figure 19: Outcome of F1-score 
4.3 Computation time  
Computation time measures how long a method takes to
complete a retrieval task; a lower value indicates a
faster and more efficient method. The comparison of the
proposed method with existing methods is shown in Figure
20, and Table 3 lists the recall, F1-score, and
computation time. For MRT, IRI-RAS, and DLMNN, the
computation times were 320, 210, and 175, respectively,
while the proposed deep learning and information fusion
method achieves a computation time of 150.
 
Figure 20: Outcome of computation time
Table 3: Performance values 
 
 
 
4.4 Discussion 
The drawbacks of MRT include possible inefficiency when
processing large datasets, sensitivity to noise, and
dependence on manual parameter adjustment, which limit its
robustness and scalability. IRI-RAS may encounter
difficulties such as limited adaptability to varied
datasets, reliance on predefined rules that might not
capture all subtleties, and complexity in incorporating
semantic understanding. The drawbacks of DLMNN include the
risk of overfitting caused by intricate architectures, the
need for sizable datasets for effective training, and
heavy use of computing resources, which affects
scalability and practicality. The proposed approach
bridges the semantic gap by using deep learning and
information fusion to provide quick and accurate
multimedia data retrieval. It improves system performance
by optimizing the software and hardware architecture.
5   Conclusion 
With the continuous development of computer networks,
deep learning and information fusion technology have
developed rapidly, and flexible, accurate multimedia
digital education has gradually become mainstream.
Multimedia data retrieval and analysis must satisfy four
principles: large information volume, high efficiency, low
cost, and effectiveness. To meet them, this paper puts
forward a retrieval method for multimedia data based on a
robotic deep neural network, which lets students track
data during retrieval and analysis and supports the
analysis of teaching information through the software
design of the system. In education courses, it is
necessary to record and store the teacher's teaching
methods and content in real time so that future students
can review them; the design of multimedia data retrieval
based on deep learning and information fusion lays a solid
foundation for Chinese education.
The example analysis results show that deep learning and
information fusion technology can extract the semantic
features of information from the initial multimedia data,
and that the robot deep neural network method has good
robustness. For multimedia data downloaded online, the
accuracy of the retrieval results is high. Drawbacks
include the need for substantial computing resources,
difficulties in fine-tuning hyperparameters, and
scalability problems with large datasets. Subsequent
investigations may concentrate on improving the model's
scalability, strengthening its generalization across
various datasets, and investigating new fusion methods for
better feature extraction. Furthermore, technologies such
as edge computing might improve real-time retrieval
capabilities and broaden application to a variety of
fields.
Data availability 
The data used to support the findings of this study are 
included within the article. 
Conflicts of interest 
The authors declare no conflicts of interest. 
Funding statement 
This study did not receive any funding in any form. 
 
References 
[1] Li, S., Choo, K. K. R., Tan, Z., He, X., Hu, J., &
Qin, T., IEEE Access special section editorial:
security and trusted computing for industrial internet
of things: research challenges and opportunities.
IEEE Access, vol. 8, no. 2, pp. 145033-145036,
2020.
[2] Yang, W., & Zhang, P., Research on barrier free 
design of the landscape environment of the city 
walking street based on computer multimedia: a 
security perspective. RISTI - Revista Iberica de 
Sistemas e Tecnologias de Informacao, vol. 2016,no. 
1, pp. 292-301, 2016. 
[3] Panwei, Z., & Zhenjiang, W. U., Ta-ons — new 
enquiry system of internet of things. Journal of 
Computer Applications, vol. 30, no. 8, pp. 
2202-2206, 2010. 
[4] Cai, S., Xia, J., Sun, K., & Wang, Z., Eigencrime:
an algorithm for criminal network mining based on
trusted computing. In 2013 IEEE International
Conference on Green Computing and Communications
and IEEE Internet of Things and IEEE Cyber, Physical
and Social Computing, Beijing, China, pp. 1325-1329,
2013.
[5] Maene, P., Gotzfried, J., Clercq, R. D., Muller, T., 
Freiling, F., & Verbauwhede, I., Hardware-based 
trusted computing architectures for isolation and 
attestation. IEEE Transactions on Computers, vol. 
67, no. 3, pp. 361-374, 2018. 
[6] Zhang, Y., Technology framework of the internet of 
things and its application. IEEE, vol. 6, no. 1, pp. 
4109-4112, 2011. 
[7] Ansari, N., & Sun, X., Mobile edge computing
empowers internet of things. IEICE Transactions on
Communications, vol. 101, no. 3, pp. 604-619,
2018.
[8] Zhang, P., Durresi, M., & Durresi, A., Multi-access 
edge computing aided mobility for privacy 
protection in internet of things. Computing, vol. 101, 
no. 7, pp. 729-742, 2019. 
[9] Adegbija, T., Rogacs, A., Patel, C., & Gordon-Ross, 
A., Microprocessor optimizations for the internet of 
things: a survey. IEEE Transactions on 
Computer-Aided Design of Integrated Circuits and 
Systems, vol. 1, no. 99, pp. 1-1, 2017. 
[10] Palattella, M. R., Dohler, M., Grieco, A., Rizzo, G., 
Torsner, J., & Engel, T., et al., Internet of things in 
the 5g era: enablers, architecture and business 
models. IEEE Journal on Selected Areas in 
Communications, vol. 34, no. 3, pp. 510-527, 2016. 
[11] Bertino, E., & Islam, N., Botnets and internet of 
things security. Computer, vol. 50, no. 2, pp. 76-79, 
2017. 
[12] Gao, L., Song, J., Zou, F., Zhang, D. and Shao, J., 
2015, October. Scalable multimedia retrieval by 
deep learning hashing with relative similarity 
learning. In Proceedings of the 23rd ACM 
international conference on Multimedia (pp. 
903-906). 
[13] Kumar, N., Rahman, S. S., & Dhakad, N., Fuzzy 
inference enabled deep reinforcement 
learning-based traffic light control for intelligent 
transportation system. IEEE Transactions on 
Intelligent Transportation Systems, vol. 7, no. 99, pp. 
1-10, 2020. 
[14] Hsu, C. F., Lin, C. M., & Yeh, R. G., Supervisory 
adaptive dynamic rbf-based neural-fuzzy control 
system design for unknown nonlinear systems. 
Applied Soft Computing Journal, vol. 13, no. 4, pp. 
1620-1626, 2013. 
[15] Huang, M. T., Lee, C. H., & Lin, C. M., Type-2 
fuzzy cerebellar model articulation controller-based 
learning rate adjustment for blind source separation. 
International Journal of Fuzzy Systems, vol. 16, no. 
3, pp. 411-421, 2014. 
[16] Xiao, C., Wang, L., Zhu, M., & Wang, W., A 
resource-efficient multimedia encryption scheme for 
embedded video sensing system based on unmanned 
aircraft. Journal of Network & Computer 
Applications, vol. 59, no. 1, pp. 117-125, 2016. 
[17] Yan, L., Jeong, Y. S., Shin, B. S., & Park, J. H., 
Crowdsensing multimedia data: security and privacy 
issues. IEEE MultiMedia, vol. 24, no. 4, pp. 58-66, 
2017. 
[18] Dziech, A., Leszczuk, M., & Baran, R., Ranking 
based approach for noise handling in recommender 
systems, [Communications in computer and 
information science] multimedia communications, 
services and security volume, vol. 566, no. 
10.1007/978-3-319-26404-2(Chapter 4), pp. 46-58, 
2015. 
[19] Choi, K. H., & Lee, D. H., A study on strengthening 
security awareness programs based on an rfid access 
control system for inside information leakage 
prevention. Multimedia Tools & Applications, vol. 
74, no. 20, pp. 8927-8937, 2015. 
[20] Hurrah, N. N., Parah, S. A., Loan, N. A., Sheikh, J. 
A., Elhoseny, M., & Muhammad, K., Dual 
watermarking framework for privacy protection and 
content authentication of multimedia. Future 
generation computer systems, vol. 94, no. 5, pp. 
654-673, 2019. 
[21] Ghadi, M., Laouamer, L., & Moulahi, T., Securing 
data exchange in wireless multimedia sensor 
networks: perspectives and challenges. Multimedia 
Tools and Applications, vol. 75, no. 6, pp. 
3425-3451, 2016. 
[22] Dziech, A., Leszczuk, M., & Baran, R., A 
multi-agent approach for intrusion detection in 
distributed systems, [Communications in computer 
and information science] multimedia 
communications, services and security volume, vol. 
566, no. 10.1007/978-3-319-26404-2(Chapter 6), pp. 
72-82, 2015. 
[23] Hao, H., Zhang, H., Liu, Y., & Wang, Y., 
Quantitative method for network security situation 
based on attack prediction. Security & 
Communication Networks, vol. 24, no. 1, pp. 
181-186, 2017. 
[24] Qin, L. N., The network security situation prediction 
based on artificial immune algorithm. Journal of 
Changchun Institute of Technology (Natural 
Sciences Edition), vol. 79, no. 11, pp. 7299-7318, 
2018. 
[25] Hu, J., Ma, D., Chen, L., Yan, H., & Hu, C., An 
improved prediction model for the network security 
situation. Springer, Cham, vol. 8, no. 4, pp. 292-301, 
2019. 
[26] Prasanth, T., & Gunasekaran, M., Effective big data
retrieval using deep learning modified neural networks.
Mobile Networks and Applications, vol. 24, no. 1, pp.
282-294, 2019.