https://doi.org/10.31449/inf.v48i13.6063
Informatica 48 (2024) 155–174

Retrieval and Analysis of Multimedia Data of Robot Deep Neural Network Based on Learning and Information Fusion

Xian Guo, Jianing Yang*, Libao Yang
National Industrial Information Security Development Research Center, Beijing, China, 100040
E-mail: jn_young90@126.com, wcbbrxeoxf474@163.com, lib_yong@163.com
* Corresponding author

Keywords: big data, multimedia, teaching, data mining, information retrieval, system design

Received: April 19, 2024

Traditional data retrieval systems suffer from slow retrieval speed and low retrieval accuracy. To address these problems, this research proposes a multimedia data retrieval methodology for robotic deep neural networks that uses information fusion and deep learning. By combining deep learning with information fusion algorithms, lower-level features are combined into more abstract salient features, which are then used to analyze the feature distribution of the data. This method can successfully address the "semantic gap" that arises during the retrieval and analysis of multimedia data with robotic deep neural networks. In addition, the system hardware is optimized in terms of multimedia data tracking, data mining, and retrieval system warning, and the corresponding software process is designed. Finally, taking teaching multimedia information retrieval as an example, the analysis shows that the proposed retrieval system has fast retrieval speed and high accuracy. It can provide a well-suited platform for the field of education and is expected to become an important part of media data retrieval in the future.

Povzetek (summary): A methodology has been developed for retrieving and analyzing multimedia data from robot deep neural networks, based on learning and information fusion. It resolves the "semantic gap" problem in multimedia data retrieval. The analysis results show that the system has a short retrieval time and high accuracy, which is promising for use in education and in other media data retrieval applications.

1 Introduction

As educational reform continues to deepen its impact on society, universities have begun to use multimedia information in practical teaching and management reform, and how to analyze and operate on this information effectively has become a widely discussed topic. Used effectively in teaching, multimedia information retrieval technology can, to some extent, measure many aspects of students' abilities. This makes it an important indicator when examining a school's practical teaching and management, an examination that is carried out through a practical-teaching information retrieval platform [1-3]. By integrating deep learning concepts, multimedia data retrieval can, to a certain extent, break away from the traditional retrieval model and form an emerging one. Because multimedia information technology now reaches so widely, it offers people an increasing number of options for learning in daily life. Multimedia-based education serves not only students but also provides educational opportunities for many other groups in society [4-5]. Multimedia information retrieval systems are designed to capture the relevant data and parameters and to present them to the designer in the simplest and clearest format.
Designers use the extracted data to analyze the system's hardware and software and to record, in real time, the data and the changes that arise between parameters. This gives the system a warning function that other systems lack: if a student's learning progress falls behind, the corresponding warning mode is activated immediately [6]. Reference [7] discusses in detail the current domestic and international standards and principles for multimedia information retrieval, and on that basis proposes several main tools and models for designing multimedia information systems. Reference [8] examines the security level of systems that use multimedia information technologies for retrieval and analyzes them in terms of security evaluation metrics. Reference [9] focuses on a mature concept of anti-virus cooperation, with the aim of building an effective security wall for the design of multimedia retrieval systems. The contribution of reference [10] lies in the virtual physical experiments carried out during the design of reference information systems; the conclusions drawn from these experiments offer considerable potential for the development of multimedia retrieval systems [7-10]. In contrast, traditional content-based multimedia retrieval systems mainly use low-level visual features such as color, shape, and texture.
Most classifiers used in these systems are relatively shallow, such as the support vector machine (SVM). The main problem with deployed systems of this kind is their inability to deal effectively with the semantic gap [11], that is, the difference between the similarity that a machine derives from low-level visual properties and the similarity that a human derives from high-level semantic properties. Although many multimedia retrieval techniques have been proposed and have achieved useful results, real-time retrieval of multimedia data remains very challenging because of this semantic gap [11]. At a higher level, content-based retrieval therefore belongs to the field of artificial intelligence, and the underlying question is whether a machine can recognize multimedia content as effectively as a human can. Among the technologies and literature currently available, machine learning is regarded as the most promising approach to closing the semantic gap. The methods for searching multimedia data in traditional teaching are very limited and therefore do not adequately reflect the learning situation of most students. On this basis, this paper proposes the design of a multimedia information retrieval system for teaching as a way to achieve a deeper integration of learning and knowledge.
The design proceeds in three stages. First, the hardware of the retrieval system is designed around three main components: data source monitoring, data mining, and timely system alerts, with a focus on analyzing the data mining algorithms. Second, the software of the retrieval system is analyzed from the software design flow chart, making full use of information retrieval algorithms to obtain the specific model functions. Finally, the traditional methods are compared experimentally with the teaching-oriented methods discussed in this paper. The experimental results demonstrate that the design of the application system is essential for the development of the application.

Table 1: Related works

Reference [12]
Objective: The study presented a semi-supervised deep learning hashing (DLH) technique for fast multimedia retrieval. In the first step, label and visual information are used to build a relative similarity graph that better reflects the relationships among the training data, and hash codes are generated from the graph. In the second step, a deep convolutional neural network (CNN) is used to train a good multimedia representation and the hash functions simultaneously.
Findings: Extensive testing on three widely used datasets showed that DLH outperformed both supervised and unsupervised hashing techniques.
Limitations: The reliance of DLH on labeled data may limit its applicability where labeled data is difficult or costly to acquire.

Reference [13]
Objective: The paper proposed a Dynamic and Intelligent Traffic Signal Control System (DITLCS) that dynamically modifies the traffic signal length based on real-time traffic information. DITLCS has three modes of operation: Fair Mode (FM), Priority Mode (PM), and Emergency Mode (EM).
Findings: DITLCS was assessed in a realistic simulation on the map of the Indian city of Gwalior using the open-source simulator Simulation of Urban MObility (SUMO). The simulation showed that DITLCS was effective compared with other cutting-edge algorithms across a range of performance metrics.
Limitations: Complicated implementation, reliance on precise real-time data, possible hardware or software malfunctions, and difficulties connecting with current infrastructure.

Reference [14]
Objective: The research proposed a dynamic TSK-type RBF-based neural-fuzzy (DTRN) system in which the learning algorithm adjusts the parameters online and also creates and prunes the fuzzy rules online. A supervisory compensator and the DTRN controller together form the Supervisory Adaptive Dynamic RBF-based Neural-Fuzzy Control (SADRNC) system, which was applied to the control of an inverted pendulum and a chaotic system.
Findings: The stability of the SADRNC scheme was shown analytically, and several simulations demonstrated its efficacy.
Limitations: Online rule creation and parameter modification may introduce computational overhead. The compensator design may need to be complicated in order to guarantee stability, which can increase system complexity and make implementation harder.
2 Robotic deep neural networks

2.1 Robotic deep neural network framework

The deep neural network system for the robot is built on the Caffe framework and designed around the AlexNet model. AlexNet, a deep convolutional neural network (CNN), won first place in the classification task of the ImageNet 2012 competition [15]. Caffe is used here to implement the AlexNet model. Caffe is written in C++, which gives it fast computation and relatively good modularity; it also has strong support from the open-source community and a sizable user base in both academia and industry. The network has eight layers: the first five are convolutional layers and the last three are fully connected layers. The network structure is shown in Figure 1.

Figure 1: Caffe network structure

As Figure 1 shows, the first, second, and fifth convolutional layers are each followed by a pooling layer. The softmax layer is the last layer of the architecture and serves as its output layer. Individual layers are shown in Figures 2-5.
Figure 2: First convolutional layer network structure

Figure 3: Second convolutional layer network structure

Figure 4: Structure of the sixth fully-connected layer network

Figure 5: Structure of the seventh fully-connected layer network

2.2 Content-based multimedia data retrieval

Content-Based Image Retrieval (CBIR), a multimedia information system for retrieving image data, has been one of the most prominent topics in computer vision research over the last decade. A content-based retrieval system analyzes the visual characteristics of multimedia data and uses a similarity-matching algorithm to retrieve similar items from a database. In essence it is a matching technology that combines mature results from computer vision, multimedia data processing, multimedia data understanding, databases, and other fields [16]. Earlier content-based retrieval systems mainly used low-level features: global features such as color, edge, texture, GIST, and CENTRIST features, and local features such as bag-of-words (BoW) models built on local descriptors (SIFT, SURF). The distance measures used in traditional content-based retrieval systems are relatively fixed and mainly include the Euclidean distance and cosine similarity [17].
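The two fixed measures named above can be illustrated with a minimal sketch. The function names and the three-component example vectors are hypothetical and are not taken from the paper; the point is only that Euclidean distance is sensitive to vector magnitude while cosine similarity is not.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical low-level feature vectors (for example, a coarse color histogram).
query = [0.2, 0.5, 0.3]
scaled = [0.4, 1.0, 0.6]  # same direction as query, twice the magnitude

print(euclidean_distance(query, scaled))  # non-zero: magnitudes differ
print(cosine_similarity(query, scaled))   # close to 1: directions agree
```

Which measure is preferable depends on whether the magnitude of the feature vector carries meaning for the retrieval task.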
The content-based retrieval system built on the robotic deep neural network uses the features extracted by the network as the retrieval index. The experiments use the AlexNet model, which has eight layers: five convolutional layers and three fully connected layers. The first five convolutional layers extract low-level visual features of the multimedia data, while the last three layers capture its high-level features. In the subsequent experiments, this paper uses the last layer as the feature representation of the multimedia data. The results of Ji Wan et al. show that the last two layers are the most discriminative layers for retrieval. In AlexNet the last layer is the softmax layer, which generalizes the logistic regression model to multi-class problems. Equation (1) gives its mathematical expression, which computes the probability that an item belongs to each category. Twenty categories were used to train the model in the experiments, so the last layer has 20 dimensions, and its 20 components sum to 1.

$$P(y = j \mid x) = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}}, \qquad j = 1, \dots, K \tag{1}$$

Here $z_j$ is the input to the softmax layer for category $j$ and $K = 20$ is the number of categories.

Many machine learning algorithms are based on computing the distance between two sample points, and distance learning has been studied extensively in the context of multimedia retrieval.
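The softmax output described above can be sketched in a few lines. This is a generic softmax, not the Caffe layer itself; the input scores are hypothetical, and subtracting the maximum score is a standard numerical-stability step that is assumed here rather than taken from the paper.

```python
import math


def softmax(scores):
    """Map raw class scores to probabilities that are positive and sum to 1."""
    shift = max(scores)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


NUM_CLASSES = 20  # the paper trains on 20 categories

# Hypothetical raw scores from the last fully connected layer.
scores = [0.1 * i for i in range(NUM_CLASSES)]
probs = softmax(scores)

print(len(probs))                # 20-dimensional feature vector
print(probs.index(max(probs)))   # index of the most probable category
```

The resulting 20-dimensional probability vector is what the paper uses as the feature representation of an item.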
Retrieval performance depends not only on the nature of the multimedia data but also on the properties of the metric function, which to a large extent directly determines both the results and the efficiency of a search. Content-based retrieval differs from text-based retrieval: it computes the visual features of the query example and checks their similarity to the items in the search library in order to determine the results [18]. The retrieval technology based on the robotic deep neural network extracts the features of each item into a feature vector and then represents the item by that vector. A search compares the feature vectors of two items and treats the minimum distance as the maximum similarity. In other words, comparing the distance between feature vectors is taken as a valid comparison of the similarity between multimedia items. A good feature vector and an appropriate distance learning algorithm therefore play a central role in multimedia retrieval.
2.3 Robot deep neural network control algorithm

AlexNet is a well-known CNN architecture whose powerful feature extraction capabilities are used here for multimedia data retrieval. Building on the efficiency of the Caffe deep learning framework, performance can be improved by adjusting hyperparameters such as batch size and learning rate. Fine-tuning these parameters lets AlexNet extract features from multimedia data accurately while keeping retrieval times short, which makes it a flexible solution for multimedia retrieval tasks.

The robot deep neural network control equation can be expressed as

(2)

In equation (2), q is the angle vector of the manipulator arm, and its first and second time derivatives are the angular velocity vector and the angular acceleration vector. The remaining terms are the symmetric positive definite inertia matrix, the centripetal and Coriolis force term, the gravity term, the dynamic friction coefficient matrix, the static friction vector, and the external disturbance. With a suitably defined state vector, system (2) can be written in the affine nonlinear form (3).

(3)

In equation (3), the trajectory of the end-effector's motion coordinates appears as a separate variable, and u = τ is the control input.
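The terms listed for Equations (2) and (3) match the standard rigid-manipulator model, so one plausible form can be written down. The sketch below is not the authors' equation: every symbol other than q, u, and τ (namely M, C, G, F_d, F_s, τ_d, x, f, g, h, y) is an assumed, conventional name chosen for illustration.

```latex
% Assumed notation: M inertia matrix, C centripetal/Coriolis matrix, G gravity,
% F_d dynamic friction coefficients, F_s static friction, \tau_d disturbance.
\begin{align}
  M(q)\,\ddot{q} + C(q,\dot{q})\,\dot{q} + G(q)
    + F_d\,\dot{q} + F_s\,\operatorname{sgn}(\dot{q}) + \tau_d &= \tau \\[4pt]
  % Affine form with the assumed state x = [q^T, \dot{q}^T]^T and input u = \tau:
  \dot{x} = f(x) + g(x)\,u, \qquad y &= h(x) \\[4pt]
  f(x) = \begin{bmatrix} \dot{q} \\
      -M^{-1}(q)\bigl( C(q,\dot{q})\,\dot{q} + G(q) + F_d\,\dot{q}
      + F_s\,\operatorname{sgn}(\dot{q}) + \tau_d \bigr) \end{bmatrix},
  \qquad
  g(x) &= \begin{bmatrix} 0 \\ M^{-1}(q) \end{bmatrix}
\end{align}
```

The second and third lines follow from the first by solving for the angular acceleration, which is possible because the inertia matrix is positive definite and therefore invertible.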
The nominal value of each parameter matrix is denoted by its own symbol. These nominal values follow from the linear-in-parameters property of the robot dynamics: the dynamics can be written as the product of a known regression matrix and a parameter vector that collects the robot's physical parameters, as in equation (4).

(4)

Because uncertainties in the system may affect its behavior, equation (4) is used to rewrite equation (2) as an affine nonlinear system with uncertainties and unknown parameters:

(5)

To show that system (1) can follow the desired output at an exponential rate even in the presence of uncertainty, the following lemmas are given.

Proof of Lemma 1: For the nominal system (2), assume that the relative degree r of the system satisfies r ≤ n. From differential geometry theory, a local diffeomorphism can then be constructed. Applying the input transformation v = B(x) + A(x)u yields the canonical form of system (2) as follows.
(6)

Proof of Lemma 2: Given the nonlinear dynamics of the system, assume that a sufficiently smooth function and its norm satisfy the stated bounds with constant ε. Then the system state x(t) converges exponentially, i.e.

(7)

where the remaining constant is the convergence rate.

Control of the robot's deep neural network system is usually designed around a bound on the system's uncertainty. This bound is typically set from the designer's own knowledge and experience, which inevitably introduces subjectivity and often greatly reduces control accuracy. To avoid this, an RBF neural network is used to learn the unknown upper bound of the uncertainty, which improves the accuracy of the control system. The network is defined by a weight vector, an estimate of that weight vector, and a Gaussian basis function. Without loss of generality, the following hypotheses are made.

Hypothesis 1: For an arbitrarily small positive constant v, there exists an optimal weight vector θ* such that the approximation error δ(x) of the neural network satisfies the following bound.
(8)

Hypothesis 2: For v in equation (8), the upper bound ρ(x) of the uncertainty ϕ(x) satisfies

(9)

Theorem 2: Suppose the nonlinear system given by equation (4) satisfies assumptions 1 to 5. Then, for any initial condition and any bounded desired output, the feedback control laws (9a) and (9b) ensure that the control error of the system converges in the sense of Lyapunov and that the closed-loop system remains uniformly ultimately bounded.

Proof: A Lyapunov function is chosen in which one design parameter is any given positive number and one term is a parameter estimate. Differentiating this function and applying the control laws, with 0 < α < 1, Lyapunov stability theory shows that the control error of the system is stable in the sense of Lyapunov and that the state remains ultimately bounded, which proves the theorem.

3 Design of multimedia information retrieval system based on deep learning and information fusion

Traditional methods of searching multimedia data generally involve tedious and complex learning behaviors. Their purpose is vague and they carry many uncertainties, which leads directly to semi-structured problems later on. From a statistical point of view, it is difficult to establish a search model for traditional multimedia education. From a cybernetic point of view, it is also difficult to monitor teaching and learning data quickly and accurately in real time [19-20].
It is therefore necessary to design a hardware system, based on deep learning and information fusion, for retrieving multimedia teaching data. Throughout the multimedia teaching process, the multimedia data used for deep learning and knowledge integration is intended to help students quickly understand data that represents learning behavior. The primary goal of data source monitoring is to track students' multimedia learning in real time and to collect timely information about their deep learning behavior. The most critical design elements are: the length of time learners are engaged in learning, the amount of material learners can master, real-time student-teacher interaction, students' responses to teachers' questions, and real-time monitoring of learning progress. Several sources of multimedia data are available, such as the status of different test results and the data points displayed in those results. Most of this data comes from the storage system of the multimedia server terminal, where it is saved automatically every three minutes. This keeps errors in the collected data relatively small, which benefits the monitoring of information sources. The essential component of the architecture for effective multimedia data retrieval is the choice of knowledge base, which is essentially a specific set of rules.
A data mining algorithm based on information entropy can effectively extract a wide variety of useful data. Let N = (Q, E, R, T) denote the multimedia data retrieval system, and let p denote the coefficient used in the following formulas:

(10)

(11)

The data mining information of object a with respect to E is then

(12)

In the formulas above, H(E) is the information entropy of E, and the remaining entropy term is the information entropy of E after the data has been mined for object a. Given the rapid expansion of data mining, new rules are added to the knowledge base to constrain the intelligent behavior of the system. Mining multimedia data on the basis of entropy not only demonstrates the method from the database perspective but also allows the acquired data to be analyzed more effectively. The system is divided into four modules: the multimedia data retrieval module, the multimedia database building module, the model training module, and the system maintenance module [21-22]. The overall structure of the multimedia data retrieval system is shown in Figure 6.

Figure 6: Structure of multimedia data retrieval system.
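The entropy-based mining measure described for Equations (10)-(12) reads like the classical information gain: the entropy of E minus the entropy of E once it is split by attribute a. The sketch below assumes that interpretation and uses Shannon entropy in bits; the function names and the small student-record example are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy, in bits, of a list of discrete labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(attribute_values, labels):
    """Entropy of the labels minus their entropy after splitting on an attribute."""
    groups = defaultdict(list)
    for value, label in zip(attribute_values, labels):
        groups[value].append(label)
    total = len(labels)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Hypothetical learning records: does an attribute explain the test outcome?
outcome = ["pass", "pass", "fail", "fail"]
watched_video = ["yes", "yes", "no", "no"]     # perfectly informative
seat_row = ["front", "back", "front", "back"]  # uninformative

print(information_gain(watched_video, outcome))
print(information_gain(seat_row, outcome))
```

Under this reading, attributes with a higher gain are the ones worth turning into rules in the knowledge base.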
3.1 Multimedia data retrieval module

The features of the sample (query) item are extracted and compared one by one with the feature vectors of the items in the retrieval library, which gives the distance between each library item and the sample. The distances are sorted from smallest to largest, and the best results are displayed according to the number the user requested. The system block diagram of multimedia data retrieval is shown in Figure 7.

Figure 7: System block diagram of multimedia data retrieval module.

Step (2) extracts the features of the query sample using the deep neural network. After each layer of the network has been evaluated, the feature vector is obtained from the final layer [23]. The feature vector used in this work is 20-dimensional. Step (4) is the matching algorithm, which uses the Euclidean distance:

$$d(x, y) = \sqrt{\sum_{i=1}^{m} (x_i - y_i)^2} \tag{13}$$

where x and y are two feature vectors and m is their dimension.

Step (5) matches the query against each item in the retrieval library, ranks the results by distance, and returns the set of results that are most similar. The workflow of multimedia data retrieval is shown in Figure 8.

Figure 8: Multimedia data retrieval workflow.
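Steps (4) and (5) above amount to a nearest-neighbor search: compute the Euclidean distance from the query to every library item, sort, and keep the closest results. A minimal sketch follows, assuming the library is held in memory as a dictionary; the item names, the three-component vectors, and the `retrieve` function are hypothetical.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def retrieve(query_vector, library, top_k):
    """Return the top_k (item_id, distance) pairs, nearest first."""
    scored = [
        (item_id, euclidean_distance(query_vector, vector))
        for item_id, vector in library.items()
    ]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:top_k]


# Hypothetical retrieval library of pre-computed feature vectors.
library = {
    "lecture_01": [0.9, 0.1, 0.0],
    "lecture_02": [0.1, 0.8, 0.1],
    "lecture_03": [0.7, 0.3, 0.0],
}
query = [0.85, 0.15, 0.0]

results = retrieve(query, library, top_k=2)
for item_id, distance in results:
    print(item_id, round(distance, 3))
```

This linear scan touches every item in the library, which is the cost that an index over the library is meant to reduce.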
On the user page, the user clicks the "Select Files" button, enters the number of results to return (for example, the 100 most similar items), and clicks "Submit". Before searching, the user must create a folder library of their own; this can be done directly by sending the relevant folder. The Caffe server extracts the feature vector of the submitted multimedia data and matches it against the corresponding folder library. Finally, the set of most similar results is returned.

3.2 Multimedia data retrieval library building module

The retrieval library is what query items are compared against in the retrieval system. It stores the feature vectors that the neural network extracts from the multimedia data [24].

Similarity calculation formula: Similarity = 1/(distance + 1)

Distance calculation formula: $\text{Distance} = \sqrt{\sum_{i=1}^{m} (x_i - y_i)^2}$ (m is the dimension of the feature vector)

When the distance between two items approaches 0, their similarity approaches 100%; the greater the distance, the lower the similarity. The factor that most influences these formulas is the feature vector of the multimedia data, which ultimately depends on whether the trained model is accurate.
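The similarity formula above maps a non-negative distance into the interval (0, 1]. A short worked sketch makes the behavior concrete; the function name and the sample distances are hypothetical.

```python
def similarity(distance):
    """Similarity = 1 / (distance + 1), as given in the text."""
    if distance < 0:
        raise ValueError("distance must be non-negative")
    return 1.0 / (distance + 1.0)


# A distance of 0 gives 100% similarity; larger distances give lower scores.
for d in (0.0, 1.0, 3.0):
    print(d, similarity(d))
```

Because the mapping is strictly decreasing, ranking by similarity (largest first) gives the same order as ranking by distance (smallest first).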
If the model is more accurate, two similar items are mapped to feature vectors that differ only slightly, so the calculated distance is small and the retrieval results are more accurate.

The multimedia data retrieval library is built in two steps:

Step 1: Extract the feature attributes of the multimedia information data. Figure 9 shows the framework for extracting the feature attribute vectors; the neural network framework used is Caffe, described above.

Figure 9: Extraction of multimedia data feature vectors.

Step 2: The last layer of the robot's deep neural network in Caffe is the soft-max layer, which outputs the probability of each category. A data table "TABLES-I" is formed for each index i of the largest component of the feature attribute vector, so the database contains m tables, where m is the dimension of the feature attribute vector. When a search is carried out, the system finds the index of the largest component of the sample's feature attribute vector and searches only the corresponding table "TABLES-I". This avoids scanning the entire database and improves time efficiency by a factor of about m. Figure 10 shows the detailed architecture of the database.

Figure 10: Feature vector deposited into the database.
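The table-per-index scheme of Step 2, combined with the similarity formula of this section, can be sketched as follows. This is a minimal sketch under stated assumptions rather than the system's actual storage code: an in-memory dictionary stands in for the "TABLES-I" database tables, and the names `argmax_index`, `build_tables`, `similarity`, and `search` are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Item = Tuple[str, Sequence[float]]


def argmax_index(vec: Sequence[float]) -> int:
    """Index i of the largest component of a feature vector."""
    return max(range(len(vec)), key=lambda i: vec[i])


def build_tables(library: Dict[str, Sequence[float]]) -> Dict[int, List[Item]]:
    """Place each item in the table selected by its largest component."""
    tables: Dict[int, List[Item]] = defaultdict(list)
    for item_id, vec in library.items():
        tables[argmax_index(vec)].append((item_id, vec))
    return tables


def similarity(distance: float) -> float:
    """Similarity = 1 / (Distance + 1); equals 1.0 when the distance is 0."""
    return 1.0 / (distance + 1.0)


def search(
    query: Sequence[float], tables: Dict[int, List[Item]], k: int
) -> List[Tuple[str, float]]:
    """Scan only the table that matches the query's largest component."""
    bucket = tables.get(argmax_index(query), [])
    scored = [(item_id, similarity(math.dist(query, vec))) for item_id, vec in bucket]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # most similar first
    return scored[:k]


# Hypothetical soft-max outputs with m = 3 categories.
library = {
    "video_a": [0.7, 0.2, 0.1],
    "video_b": [0.6, 0.3, 0.1],
    "video_c": [0.1, 0.8, 0.1],
}
tables = build_tables(library)
hits = search([0.7, 0.2, 0.1], tables, k=2)
```

In this sketch the speed-up of about m holds only if items are spread roughly evenly over the m tables, and an item whose largest component differs from that of the query is never returned, even if its distance to the query is small.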
3.3 Model training module

When the user inputs a multimedia information data set, each category must contain at least 100 items and there must be more than one category, so that the corresponding model can be trained effectively and quickly. The ultimate purpose of training is to produce a model of the multimedia information data [25]. The final result is essentially a binary file whose main function is to store the weight parameters between the layers of the neural network; this file is large, at about 227 MB.

The factor with the greatest impact on training is the long training time. Training is lengthy mainly because the neural network has many parameters (about 65 million) and the matrix operations between layers are time-consuming (hundreds of millions of operations). Each layer performs forward and backward matrix multiplications, so the performance of the machine significantly affects the training time. In this paper, we use GPU parallel computing to speed up the training of the model, which is why GPU performance is a core component of this system. The server designed and built in this paper mainly uses the Tesla K20c GPU, which can reach approximately 700 times the speed of the Quadro K2100m.

Figure 11: Comparison chart of Tesla K20c and Quadro K2100m training time.

The performance curves show that the slope of the Tesla K20c curve is relatively low.
Figure 11 shows how the training time curves of the two GPUs, Quadro K2100m and Tesla K20c, vary.

Figure 12 shows a schematic diagram of the framework for training the multimedia information data model. The framework consists of four main components: a browser, a web server, a Caffe framework server, and a supporting database. The web server sends the acquired multimedia information data to the Caffe framework server. After pre-processing the data, tuning the parameters, and generating the training and validation data, the Caffe framework server calls the Caffe framework to train the model and starts the training iterations. Finally, the trained model is stored in the database, together with the training information that is used to inform the client of the progress of training.

Figure 12: Training model architecture.

Figure 13 shows the training steps for a multimedia information data model. The user sends the training data from the system interface by clicking Send Data, the web server forwards the data to the Caffe framework server for processing, and training then starts. The Caffe framework server sends training information to the client in real time and keeps the user informed of training progress.
After training is complete, the Caffe framework server returns to the client all basic information about the trained model (the trainer, the training time, the number of iterations, the configuration and accuracy information, etc.).

Figure 13: Flow of training model operation.

3.4 System maintenance module

Every system contains important databases, such as template databases and the download and search template databases. Information on the performance of the training conducted by the servers (i.e., the number and duration of training iterations) must be retained, and an interface must be provided so that users can select and view it. Figure 14 shows the structure of the system's maintenance module. The maintenance structure is mainly composed of three parts: first, emptying and rebuilding the index base; second, removing invalid models; third, obtaining the training performance curve of the training server.

Figure 14: System information maintenance structure.

4 Analysis and outcomes of the experiment

To evaluate the educational multimedia information data retrieval system proposed and designed in this paper, the multimedia information data of a local school, developed on Linux/Windows CE technology, is selected as the research object. When testing the performance of the BIM information data retrieval system, the function hpel432_CreateChannel-Group () is mainly used to determine the list of modules, the channel numbers, and other parameters of the educational multimedia information data retrieval system.
The data port is set to the local bus, which is used to read each sample, and the RESAMP_data interface frequency is set to 14.8 kHz. The performance of the educational multimedia information data retrieval system is tested in the experimental environment and with the parameters described above.

Experiment 1: Different methods are used to test the speed of the system, focusing on the retrieval of multimedia information data. Figure 15 compares the results obtained by the different methods.

Figure 15: Comparison of experimental results.

Figure 15 clearly shows that the teaching multimedia information retrieval system based on deep learning and information fusion retrieves effective information in a shorter time, while the traditional multimedia data retrieval method is slower than the proposed design by about half.

Experiment 2: Data are retrieved from the behavior modules related to multimedia information retrieval, such as the forum, teaching courses, teaching tasks, teaching resources, user messages, and the learning chat room. The teaching video content of this semester is selected for teachers and students, and experimental information is collected on the following aspects: the dialogues involved, the completion of students' homework, the number of information resources browsed, and the video communication between teachers and students. The vertical crossover algorithm is adopted: (14)

In the formula above, A represents the content of the extracted multimedia information data in the field of education.
P represents the data correction coefficient; W represents the azimuth parameter value of the data; N represents the difference parameter between students' scores; x represents the result of fast retrieval, and x' is the ideal parameter value for fast retrieval. ΔX represents the difference between the required data. A1 is the corrected number of video sessions between students and teachers; A2 is the corrected number of effective teachers in the course; A3 is the corrected number of students attending the lecture. Retrieval data are collected on the basis of formula (14) to obtain the retrieval accuracy of the model, as shown in the comparison in Figure 16.

Figure 16: Precision comparison of multimedia information retrieval with different methods.

Figure 16 compares the retrieval methods: the traditional multimedia information method and the method based on deep learning and information fusion are both used to search for relevant educational information. As the number of experiments increases, the accuracy of the traditional method remains between 10% and 40%. Although its variation is small, its accuracy is very low. In contrast, the accuracy of the retrieval method designed in this paper is much higher than that of the traditional method. As the number of experiments increases, the accuracy of the proposed method stays at a high level, which indicates relatively good stability.
Even when it fluctuates, the range of fluctuation is relatively small, and its retrieval accuracy stays in the range of 80%-90%. Based on these experimental results, it can be concluded that the teaching multimedia information retrieval design based on deep learning and information fusion is more efficient than the traditional retrieval methods.

In addition, the amount of multimedia information data obtained by the retrieval system based on deep learning and information fusion is very satisfactory. Figure 17 illustrates this advantage of the proposed system in more detail. According to the comparison of the two approaches in Figure 17, the amount of multimedia information obtained through deep learning and information fusion is much higher than the amount obtained with the traditional approach. The proposed system therefore not only offers relatively fast retrieval and high accuracy but also has strong retrieval capability, which indicates that it has high performance and practical application value.

Figure 17: Comparison of the number of multimedia information data models retrieved under the two different methods.
According to the comparison of the two methods in Table 2, the teachers who chose the traditional multimedia information data retrieval method accounted for half of the total, while the teachers who chose the method based on deep learning and information fusion proposed in this paper accounted for 90% of the total. The students who chose the traditional method for e-learning made up 45% of the total, less than half, while those who chose the method based on deep learning and information fusion made up 95% of the total. The pass rate of students educated with the traditional multimedia information data method is approximately 40%, while the pass rate of students educated with the method based on deep learning and information fusion is 80%. In the survey of parental support for the two methods, the multimedia information data system based on deep learning and information fusion received up to 100% support. Therefore, it can be concluded that the multimedia information retrieval system based on deep learning and information fusion is superior in performance.

Table 2: Comparison of multimedia information data retrieved under different methods.

Retrieval object | Traditional method | The method of this paper
Number of teachers online | 10~15 | 15~20
Number of students online | 20~45 | 50~75
Parental support rate/% | 20 | 100

4.1 Recall

Recall measures a model's ability to identify all positive instances. It is also called sensitivity or the true positive rate. Figure 18 compares the recall of the suggested approach with that of the current methods. For MRT, IRI-RAS, and DLMNN, the recall rates were 70%, 72.50%, and 74.06%, respectively.
Compared to the other approaches, the proposed deep learning and information fusion (DLIF) methodology has a 76% recall rate. According to the results, the suggested approach works better than the current ones.

Figure 18: Outcome of recall

4.2 F1-score

As the harmonic mean of precision and recall, the F1-score balances the two and provides a single measure of a method's effectiveness. Figure 19 compares the suggested method with the existing ones. The F1-score values for MRT, IRI-RAS, and DLMNN were 80.60%, 84.98%, and 87.44%, respectively, while the suggested deep learning and information fusion method yields an F1-score of 89%. The outcomes demonstrate how well the suggested approach works in comparison to the existing methods.

Figure 19: Outcome of F1-score

4.3 Computation time

Computation time is the time a method needs to complete retrieval; a lower value indicates a faster method. Figure 20 compares the suggested method with the existing ones, and Table 3 lists the recall, F1-score, and computation time values. For MRT, IRI-RAS, and DLMNN, the computation times were 320, 210, and 175, respectively. Compared to the existing approaches, the suggested deep learning and information fusion method achieves a computation time of 150.
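For reference, the recall and F1-score used in Sections 4.1 and 4.2 follow the standard definitions, which can be sketched in Python as below. The counts of true positives, false positives, and false negatives are hypothetical values chosen for illustration and are not taken from the experiments in this paper; the function names are likewise assumptions.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of retrieved items that are relevant."""
    return tp / (tp + fp) if (tp + fp) > 0 else 0.0


def recall(tp: int, fn: int) -> float:
    """Fraction of relevant items that are retrieved (true positive rate)."""
    return tp / (tp + fn) if (tp + fn) > 0 else 0.0


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall."""
    p = precision(tp, fp)
    r = recall(tp, fn)
    return 2 * p * r / (p + r) if (p + r) > 0 else 0.0


# Hypothetical counts: 8 relevant items retrieved, 2 irrelevant items
# retrieved, and 2 relevant items missed.
tp, fp, fn = 8, 2, 2
print(f"precision = {precision(tp, fp):.2f}")
print(f"recall    = {recall(tp, fn):.2f}")
print(f"F1-score  = {f1_score(tp, fp, fn):.2f}")
```

Because the F1-score is a harmonic mean, it is pulled toward the smaller of precision and recall, so a method cannot obtain a high F1-score by maximizing only one of the two.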
Figure 20: Outcome of computation time

Table 3: Performance values

4.4 Discussion

The drawbacks of MRT include possible inefficiency when processing big datasets, sensitivity to noise, and dependence on manual parameter adjustment, which limits its resilience and scalability. IRI-RAS may encounter difficulties such as limited flexibility across a variety of datasets, reliance on predefined rules that might not capture all subtleties, and complexity in incorporating semantic understanding. The drawbacks of DLMNN include the possibility of overfitting caused by intricate designs, the need for sizable datasets for efficient training, and intensive use of computing resources, which affects scalability and usefulness. The suggested approach bridges the semantic gap by using deep learning and information fusion to provide quick and accurate multimedia data retrieval. It improves system performance by optimizing the software and hardware architecture.

5 Conclusion

With the continuous development of computer networks, deep learning and information fusion technology have developed rapidly, and multimedia digital education with strong flexibility and high accuracy has gradually become the mainstream. Multimedia data retrieval and analysis must satisfy four principles: large information capacity, high efficiency, low cost, and effectiveness. To meet them, this paper puts forward a multimedia data retrieval method based on robotic deep neural networks, which lets students track data during retrieval and analysis and analyzes teaching information through the software design of the system.
In education, it is necessary to record and store the teacher's teaching method and content in real time so that students can review them later, and the design of multimedia data retrieval based on deep learning and information fusion lays a solid foundation for Chinese education. The example analysis results show that deep learning and information fusion technology can extract the semantic features of information from the initial multimedia data, and that the robot deep neural network method has good robustness. For multimedia data downloaded online, the accuracy of the retrieval results is high. Some drawbacks include the need for substantial computing resources, difficulties in fine-tuning hyperparameters, and scalability problems with big datasets. Subsequent investigations may concentrate on improving the model's scalability, strengthening its generalization across various datasets, and investigating new fusion methodologies for better feature extraction. Furthermore, technology such as edge computing might improve real-time retrieval capabilities and broaden application in a variety of fields.

Data availability

The data used to support the findings of this study are included within the article.

Conflicts of interest

The authors declare no conflicts of interest.

Funding statement

This study did not receive any funding in any form.

References

[1] Li, S., Choo, K. K. R., Tan, Z., He, X., Hu, J., & Qin, T., IEEE Access special section editorial: security and trusted computing for industrial internet of things: research challenges and opportunities. IEEE Access, vol. 8, no. 2, pp. 145033-145036, 2020.
[2] Yang, W., & Zhang, P., Research on barrier free design of the landscape environment of the city walking street based on computer multimedia: a security perspective. RISTI - Revista Iberica de Sistemas e Tecnologias de Informacao, vol. 2016, no. 1, pp. 292-301, 2016.
[3] Panwei, Z., & Zhenjiang, W. U., Ta-ons - new enquiry system of internet of things. Journal of Computer Applications, vol. 30, no. 8, pp. 2202-2206, 2010.
[4] Cai, S., Xia, J., Sun, K., & Wang, Z., Eigencrime: an algorithm for criminal network mining based on trusted computing. 2013 IEEE International Conference on Green Computing and Communications (GreenCom) and IEEE Internet of Things (iThings) and IEEE Cyber, Physical and Social Computing (CPSCom), Beijing, China, pp. 1325-1329, 2013.
[5] Maene, P., Gotzfried, J., Clercq, R. D., Muller, T., Freiling, F., & Verbauwhede, I., Hardware-based trusted computing architectures for isolation and attestation. IEEE Transactions on Computers, vol. 67, no. 3, pp. 361-374, 2018.
[6] Zhang, Y., Technology framework of the internet of things and its application. IEEE, vol. 6, no. 1, pp. 4109-4112, 2011.
[7] Ansari, N., & Sun, X., Mobile edge computing empowers internet of things. IEICE Transactions on Communications, vol. 101, no. 3, pp. 604-619, 2018.
[8] Zhang, P., Durresi, M., & Durresi, A., Multi-access edge computing aided mobility for privacy protection in internet of things. Computing, vol. 101, no. 7, pp. 729-742, 2019.
[9] Adegbija, T., Rogacs, A., Patel, C., & Gordon-Ross, A., Microprocessor optimizations for the internet of things: a survey. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 1, no. 99, pp. 1-1, 2017.
[10] Palattella, M. R., Dohler, M., Grieco, A., Rizzo, G., Torsner, J., & Engel, T., et al., Internet of things in the 5g era: enablers, architecture and business models. IEEE Journal on Selected Areas in Communications, vol. 34, no. 3, pp. 510-527, 2016.
[11] Bertino, E., & Islam, N., Botnets and internet of things security. Computer, vol. 50, no. 2, pp. 76-79, 2017.
[12] Gao, L., Song, J., Zou, F., Zhang, D., & Shao, J., Scalable multimedia retrieval by deep learning hashing with relative similarity learning. In Proceedings of the 23rd ACM International Conference on Multimedia, pp. 903-906, 2015.
[13] Kumar, N., Rahman, S. S., & Dhakad, N., Fuzzy inference enabled deep reinforcement learning-based traffic light control for intelligent transportation system. IEEE Transactions on Intelligent Transportation Systems, vol. 7, no. 99, pp. 1-10, 2020.
[14] Hsu, C. F., Lin, C. M., & Yeh, R. G., Supervisory adaptive dynamic rbf-based neural-fuzzy control system design for unknown nonlinear systems. Applied Soft Computing Journal, vol. 13, no. 4, pp. 1620-1626, 2013.
[15] Huang, M. T., Lee, C. H., & Lin, C. M., Type-2 fuzzy cerebellar model articulation controller-based learning rate adjustment for blind source separation. International Journal of Fuzzy Systems, vol. 16, no. 3, pp. 411-421, 2014.
[16] Xiao, C., Wang, L., Zhu, M., & Wang, W., A resource-efficient multimedia encryption scheme for embedded video sensing system based on unmanned aircraft. Journal of Network & Computer Applications, vol. 59, no. 1, pp. 117-125, 2016.
[17] Yan, L., Jeong, Y. S., Shin, B. S., & Park, J. H., Crowdsensing multimedia data: security and privacy issues. IEEE MultiMedia, vol. 24, no. 4, pp. 58-66, 2017.
[18] Dziech, A., Leszczuk, M., & Baran, R., Ranking based approach for noise handling in recommender systems. Multimedia Communications, Services and Security, Communications in Computer and Information Science, vol. 566, no. 10.1007/978-3-319-26404-2 (Chapter 4), pp. 46-58, 2015.
[19] Choi, K. H., & Lee, D. H., A study on strengthening security awareness programs based on an rfid access control system for inside information leakage prevention. Multimedia Tools & Applications, vol. 74, no. 20, pp. 8927-8937, 2015.
[20] Hurrah, N. N., Parah, S. A., Loan, N. A., Sheikh, J. A., Elhoseny, M., & Muhammad, K., Dual watermarking framework for privacy protection and content authentication of multimedia. Future Generation Computer Systems, vol. 94, no. 5, pp. 654-673, 2019.
[21] Ghadi, M., Laouamer, L., & Moulahi, T., Securing data exchange in wireless multimedia sensor networks: perspectives and challenges. Multimedia Tools and Applications, vol. 75, no. 6, pp. 3425-3451, 2016.
[22] Dziech, A., Leszczuk, M., & Baran, R., A multi-agent approach for intrusion detection in distributed systems. Multimedia Communications, Services and Security, Communications in Computer and Information Science, vol. 566, no. 10.1007/978-3-319-26404-2 (Chapter 6), pp. 72-82, 2015.
[23] Hao, H., Zhang, H., Liu, Y., & Wang, Y., Quantitative method for network security situation based on attack prediction. Security & Communication Networks, vol. 24, no. 1, pp. 181-186, 2017.
[24] Qin, L. N., The network security situation prediction based on artificial immune algorithm. Journal of Changchun Institute of Technology (Natural Sciences Edition), vol. 79, no. 11, pp. 7299-7318, 2018.
[25] Hu, J., Ma, D., Chen, L., Yan, H., & Hu, C., An improved prediction model for the network security situation. Springer, Cham, vol. 8, no. 4, pp. 292-301, 2019.
[26] Prasanth, T., & Gunasekaran, M., Effective big data retrieval using deep learning modified neural networks. Mobile Networks and Applications, vol. 24, no. 1, pp. 282-294, 2019.