https://doi.org/10.31449/inf.v46i6.3864 Informatica 46 (2022) 1–7 1

Multimodal Machine Learning for Major League Baseball Playoff Prediction

Aliaa Saad Yaseen 1*, Ali Fadhil Marhoon 2, and Sarmad Asaad Saleem 1
E-mail: aliaa.yaseen@uobasrah.edu.iq, ali.marhoon@uobasrah.edu.iq, sarmad.saleem@uobasrah.edu.iq
1 College of Computer Science and Information Technology, University of Basrah, Iraq
2 College of Engineering, University of Basrah, Iraq

Keywords: machine learning, prediction, sport, baseball

Received: December 11, 2021

The introduction of sabermetrics has changed the way Major League Baseball (MLB) teams value their players. Since then, new baseball statistics have been devised to make various predictions for MLB teams. This domain contains an immense amount of data on baseball players, teams, and scores. Using supervised machine learning algorithms, we examine how accurately we can predict which teams will make the playoffs in 2019. For this research, we gathered data from the last 20 years. The features we use are Runs, Batting Average, Home Runs, Strikeouts, Innings Pitched, Earned Runs, and Earned Run Average. We use a Logistic Regression model and a Support Vector Classifier (SVC) as the two machine learning algorithms. In our tests, the trained models reached a precision of 77%: of the teams predicted to make the playoffs, 77% actually did. Recall was 59%. As a result, our overall projection was only 60% accurate: it correctly predicted 6 of the 10 teams that made the 2019 playoffs. We believe we can improve on these findings by using other machine learning algorithms or by including more features that increase the overall accuracy of our training model.
Summary: Various machine learning algorithms were applied to Major League Baseball statistics to try to predict which teams would reach the playoffs.

1 Introduction

In 2003 Michael Lewis published a New York Times bestseller, Moneyball: The Art of Winning an Unfair Game, which was made into a movie in 2011 featuring Brad Pitt and Jonah Hill [14]. The book is about Billy Beane, general manager of the Oakland Athletics, who used sabermetrics [9] to identify undervalued players. Because of budget constraints, he worked out a more strategic way to measure players. Under Billy Beane, the Oakland A's won 20 consecutive games in 2002, becoming the first American League team in over 100 years to win that many games in a row. After this record-breaking season, the Boston Red Sox offered him $12.5 million to become their general manager (GM), which he declined. Billy Beane's approach has changed the way baseball teams value their players [2].

Statistical analysis has been used in baseball since its early days. General managers and baseball scouts used to measure players with traditional batting and pitching statistics such as batting average (BA), at bats, home runs, and strikeouts. The popularity of sabermetrics has brought better measurements to light, for batters and pitchers alike. We use sabermetrics to help us collect each team's statistics from 1998 to 2018. Although there is data we could gather from 50+ years of baseball, we evaluate only the last 20 years [11]. Making predictions in baseball is relatively tough, and it can never be perfect. "Years ago, sabermetrician Tom Tango researched the amount of talent and luck that go into team winning percentages and found that chance explains one-third of the difference between two teams' record." [16] This makes it hard to predict how many wins a team will achieve in a season.

* Corresponding author
With each team playing 162 games, there are many factors that can affect predictions on any given day. Baseball has a large following, from casual fans to diehards. As in football, there is fantasy baseball, in which you build a team and accumulate points based on your players' up-to-date stats [4]. Fans also place bets on which teams will win. "Teams at all levels are improving their ability to evaluate players, make decisions on personnel and game plan by taking a fresh look at data. The field continues to grow and change, with the integration of video analysis, defensive statistics and health analytics the advancement of baseball through data manipulation has no end in sight." [11] Whether the reader is a baseball fan or is just getting into the analytics side of the sport, this paper offers an interesting look at how statistics contribute to the world of baseball.

2 Related works

Many researchers have published work on predicting Major League Baseball outcomes. Tolbert and Trafalis [23] used data mining to predict Major League Baseball championship winners. They developed classifiers based on Support Vector Machines (SVMs) to predict winners at multiple levels. 42 teams were labeled as World Series champions and 42 teams as World Series losers. The classifier with the highest accuracy, 77.1%, was the SVM with a Gaussian RBF kernel. Jones et al. [24] assessed the accuracy of Sports Illustrated (SI) pre-season forecasts for four major North American sport championships over the last 30 years. While results varied across leagues, SI was generally more successful at predicting divisional winners than conference and league champions.
The study had some limitations: 30 years of data is a longitudinal source but provides a relatively small number of data points, and some of it was missing because of league lockouts, missing pages in the magazine, or years in which SI made no prediction. Finally, the authors were unable to determine how SI predictions are made. Soto Valero [25] employed sabermetric statistics to assess the predictive capabilities of four data mining methods (classification and regression based) for predicting outcomes (win or loss) in MLB regular season games. The approach uses only past data when making a prediction, drawing on ten years of publicly available data. The results indicate that the classification scheme forecasts game outcomes better than the regression scheme. Among the four data mining methods, SVMs gave the best results, with a mean prediction accuracy of about 60% for each team. Chang [26] used a Markov process and a runner advancement model to estimate the expected runs in an MLB match based on the batting lineup and the pitcher. The data source was the 70 MLB matches in 2018 with the most batter-versus-pitcher matchup statistics. The theoretical analysis reconstructs the moment of betting, when outcomes were still unknown. The total returns over the 70 matches under the NIP and NIP–NBD prediction probability models were 23.89 and 22.69, respectively, corresponding to returns on investment (ROI) of 34.13% and 32.41%. Yu and Hung [27] combined a genetic algorithm (GA) with a Support Vector Machine (SVM) for prediction; the combination helps avoid over-fitting and local minima during classification. They collected all teams' batting, pitching, and fielding records for the period 1995-2016 to build the classification model.
Finally, they compared the performance of GA-SVM with that of a traditional SVM and C4.5. GA-SVM achieved the highest accuracy of the three, 92.34%. The results of the previous research discussed above are summarized in Table 1.

3 Methodology

For our project we gather each MLB team's end-of-year batting and pitching statistics and try to predict whether the team will make the playoffs. We build our model on data for the last 20 years up to 2018 and test it against the actual outcome for 2019. We believe that, given enough data, almost anything can be calculated or predicted. Through this research we aim to see how a large amount of information can be turned into a model that predicts future or recent outcomes. We combine this concept (known as Predictive Modeling) with machine learning techniques to enhance and "streamline" our models [14]. By incorporating machine learning, we hope to create an algorithm that can predict the future performance of baseball teams for a given year. This research is aimed at those who see the benefit of combining machine learning concepts with real-world data (here, baseball data), including sports fans, data analysts, marketers, and others. It was motivated by the team's interest in combining baseball data (from a sport that is widely watched and globally played) with knowledge of machine learning techniques, namely supervised learning. The team's overall goal is to create a working algorithm that can predict which baseball teams will make the 2019 playoffs. We hope to create a model that achieves more than 50% accuracy in prediction.
The research is organized around two supervised machine learning techniques: Logistic Regression and a Support Vector Classifier. We train and test these models on the baseball data. To improve upon them, we then optimize them using a Grid Search algorithm, in the hope that it improves the accuracy of our previous models.

Table 1: Summary of the related works results.
Ref. No.   Year   Data                         Technique             Accuracy
[23]       2016   42 teams                     Data mining, SVM      77.1%
[24]       2021   30 years                     Sports Illustrated    unable to determine
[25]       2016   10 previous years            Data mining, SVM      60%
[26]       2021   70 MLB matches, 2018         Markov process        34.13% (ROI)
[27]       2017   Team records, 1995-2016      GA-SVM                92.34%

The idea of using baseball data to create predictive models is not new. Various groups have proposed and implemented their own versions of such models. Some have used other machine learning methods, such as Random Forests and Gradient Boosting [12]. However, the algorithm these groups use most often is Logistic Regression. This is because in baseball (as in any other sport) a team either wins or loses, which makes Logistic Regression a natural model to use [5]. We decided to use a Logistic Regression model in our research for this reason. We also decided to include a Support Vector Classifier (SVC): because our research uses labeled training data (supervised learning), we want to find the separation between our classes and create a hyperplane. Support vector machines can be applied to both classification and regression; here we use the classifier. Since we are drawing conclusions from a collection of observed values, we use classification models.
Given one or more inputs, a classification model tries to predict the value of one or more outcomes. The models we test are Logistic Regression and a Support Vector Classifier (SVC). We focus on these two models because our question, whether the Atlanta Braves or any other team will make the playoffs or the World Series, is itself a classification problem: a team either makes the playoffs (1) or does not (0). Because there are only two possible outcomes, we use Logistic Regression. We can also use an SVC, which fits a hyperplane that divides the data; predictions are made based on where a point sits relative to the hyperplane.

To optimize our hyperparameters, we use Grid Search (a function in the scikit-learn library). Grid searching is the process of scanning a set of candidate values to find the optimal parameters for a given model [13]. We use it to cross-validate our models and refit them with our training and testing datasets. The algorithm builds a model for each possible parameter combination: it iterates through every parameter and stores a model for each combination. Before passing our models to the Grid Search function, we define the hyperparameters for both models. For the logistic regression model, we start with the regularization penalty. We specify "L1" regularization to improve generalization performance on new, unseen data. We set the regularization hyperparameter space to "[0.33, 0.67, 1.0]", i.e. three candidate values running from ~0.33 to 1.0. We set the random state to "60" for the random number generator.
For the "solver" parameter, we use 'newton-cg', 'lbfgs', 'liblinear', 'sag', and 'saga', as they work better for larger datasets and can handle L2 regularization (see Fig. 1). We set the parameters for the support vector classifier in a similar way. We use the same regularization values as for the logistic regression model ("[0.33, 0.67, 1.0]"), enable the "shrinking" heuristic, set "probability" to 'True' so that the model produces probability estimates, try both 'scale' and 'auto' for the "gamma" parameter, and set the random state to "60" (see Fig. 2). Once these parameters are set, we pass them to the Grid Search function. We also use a scaler (MinMaxScaler) to rescale each team's statistics to the range 0 to 1, where a value near "1" indicates that the team did really well in a given year and a value near "0" indicates that it did average or not so well.

For this project we use several Python libraries to analyze the gathered baseball data: "Pandas" to store our data as data frames, and "NumPy" in combination with "scikit-learn" to develop our regression and SVC models. To gather data, we used baseball-reference.com (Fig. 3), a comprehensive source for current and historical baseball players, teams, scores, and leaders. We took data over the last 20 years, from 1998 to 2018, of influential batting and pitching stat averages per team.

Figure 1: Hyperparameters for Logistic Regression prior to Grid-Search.
Figure 2: Hyperparameters for Support Vector Classifier (SVC) prior to Grid-Search.
Figure 3: Raw Baseball dataset sorted based on year [Baseball-reference.com].
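To make the grid search step concrete, the sketch below enumerates the parameter combinations that the hyperparameter settings described for the two models (Figs. 1 and 2) would produce. It is a minimal illustration using only the Python standard library, not the authors' code: the dictionary layout and the helper name `expand_grid` are assumptions, and the filter at the end relies on scikit-learn's documented behavior that only the 'liblinear' and 'saga' solvers accept an L1 penalty.

```python
from itertools import product

# Candidate values as described in the text; the dict layout is an assumption.
LOGREG_GRID = {
    "penalty": ["l1"],
    "C": [0.33, 0.67, 1.0],
    "solver": ["newton-cg", "lbfgs", "liblinear", "sag", "saga"],
}
SVC_GRID = {
    "C": [0.33, 0.67, 1.0],
    "shrinking": [True],
    "probability": [True],
    "gamma": ["scale", "auto"],
}


def expand_grid(grid):
    """Return one dict per combination of the candidate values in `grid`."""
    names = list(grid)
    value_lists = [grid[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


logreg_candidates = expand_grid(LOGREG_GRID)
svc_candidates = expand_grid(SVC_GRID)

# In scikit-learn, only 'liblinear' and 'saga' support penalty='l1', so the
# other combinations would fail to fit if passed to a grid search unchanged.
L1_SOLVERS = {"liblinear", "saga"}
l1_compatible = [
    c for c in logreg_candidates
    if c["penalty"] != "l1" or c["solver"] in L1_SOLVERS
]

print(len(logreg_candidates), "logistic regression combinations")
print(len(l1_compatible), "of them are valid with an L1 penalty")
print(len(svc_candidates), "SVC combinations")
```

A grid search fits and cross-validates one model per combination, so the number of fits grows with the product of the list lengths. That is why the candidate lists are kept short here.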
The data we collect are as follows:

Hitters/Batters:
R: Runs
BA: Batting Average
HR: Home Runs
SO: Strikeouts

Pitchers:
SO: Strikeouts
IP: Innings Pitched
HR: Home Runs
ER: Earned Runs
ERA: Earned Run Average

These stats from baseball-reference.com are imported into two separate Excel files, one for pitching and one for batting. For the teams that made the playoffs, we created a separate Excel file covering 1998 to 2019 (referenced from Wikipedia). See Figs. 4, 5, and 6. After gathering the data, we combine the pitching and batting datasets into one large dataset. At this stage we also sort our playoff dataset. In our Excel file for playoffs and World Series, we separate the teams that made the playoffs from the teams that did not between 1998 and 2018, which removes the non-playoff teams from the playoff dataset. We further organize the data by the team that won the World Series in each year. In total, for each year every team is labeled "1" if it made the playoffs and/or the World Series, and "0" otherwise. We create two arrays (playoffs and world_series) that store the teams that made the playoffs and the World Series. In baseball, a single game usually consists of nine innings, but in some situations a game can run to more or fewer than nine. This difference can skew our dataset. To combat this, we standardize our data so that each data entry is adjusted for the difference in number of innings/games. We also scale our data to account for each new season (year), since in any given season a team may do extremely well, average, or worse.
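The data preparation described above (combining batting and pitching, labeling playoff teams, and scaling per season) can be sketched as follows. This is an illustrative outline in plain Python rather than the authors' Pandas code. The team codes, the numbers, and the helper names `merge_and_label` and `min_max_by_season` are all hypothetical, and scaling within each season is an assumption about how the per-year adjustment is applied.

```python
def merge_and_label(batting, pitching, playoff_teams):
    """Join batting and pitching rows on (year, team) and add a 0/1 label."""
    rows = []
    for (year, team), bat in batting.items():
        row = {"year": year, "team": team}
        # Prefix the columns because SO and HR occur in both tables.
        row.update({f"bat_{name}": value for name, value in bat.items()})
        row.update(
            {f"pit_{name}": value for name, value in pitching[(year, team)].items()}
        )
        row["playoffs"] = 1 if team in playoff_teams.get(year, set()) else 0
        rows.append(row)
    return rows


def min_max_by_season(rows, features):
    """Rescale each feature to the range [0, 1] within each season."""
    scaled = [dict(row) for row in rows]
    for year in {row["year"] for row in scaled}:
        season = [row for row in scaled if row["year"] == year]
        for feature in features:
            low = min(row[feature] for row in season)
            span = max(row[feature] for row in season) - low
            for row in season:
                row[feature] = (row[feature] - low) / span if span else 0.0
    return scaled


# Hypothetical three-team season; these numbers are placeholders, not real data.
batting = {
    (2018, "AAA"): {"R": 800, "HR": 200},
    (2018, "BBB"): {"R": 700, "HR": 150},
    (2018, "CCC"): {"R": 600, "HR": 250},
}
pitching = {
    (2018, "AAA"): {"ERA": 3.5, "HR": 160},
    (2018, "BBB"): {"ERA": 4.0, "HR": 180},
    (2018, "CCC"): {"ERA": 4.5, "HR": 200},
}
playoff_teams = {2018: {"AAA"}}

rows = merge_and_label(batting, pitching, playoff_teams)
scaled = min_max_by_season(rows, ["bat_R", "pit_ERA"])
for row in scaled:
    print(row["team"], row["bat_R"], row["pit_ERA"], row["playoffs"])
```

Scaling within a season means a value of 1 marks the best team in that column for that year, which keeps seasons with different run-scoring environments comparable.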
By scaling each team's statistics toward "1" (the team did above average) or "0" (the team did average to below average), based on the maximum and minimum of each feature column, we can gauge each team's performance. We create a new data frame for the scaled data before testing. Once the new data frame is made, we split our data into two sets, training and testing, with 80% for training and 20% for testing. A simplified diagram of our method is provided in Fig. 7.

Figure 4: Batting Stats for each MLB team from 1998 to 2018, organized into an Excel spreadsheet.
Figure 5: Pitching Stats for each MLB team from 1998 to 2018, organized into an Excel spreadsheet.
Figure 6: The playoffs data created using information gathered from Wikipedia [12].
Figure 7: Methodology for obtaining the testing and training datasets. [Diagram nodes: Playoffs Array; World Series Array; Batting Data (Fig. 4); Pitching Data (Fig. 5); Playoffs Data (Fig. 6); Training Dataset; Testing Dataset]

4 Experimental results

To evaluate how successful our models were, we compare the results using several metrics. These include Precision and Recall: if both are high, we can conclude that our output is accurate, whereas a low Precision indicates many false positives and a low Recall indicates many false negatives. We also use the F1-Score, which is the harmonic mean of Precision and Recall; the closer it is to "1", the more accurate the test. We also generate a ROC AUC (Area Under the Receiver Operating Characteristic Curve) score, which helps us understand how well our algorithm trades off true positives against false positives, based on how close the score is to "1" for each model. For this project we run a total of four tests to reach our conclusion.
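The metrics described above can be written out in a few lines. The following is a minimal sketch in plain Python, given only to make the definitions concrete; the label and score lists are made-up placeholders, and the function names are not from the paper. The ROC AUC is computed from its pairwise definition: the fraction of (positive, negative) pairs in which the positive example receives the higher score.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, harmonic_mean(precision, recall)


def harmonic_mean(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def roc_auc(y_true, scores):
    """Share of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Placeholder labels and scores, not the paper's data.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.2, 0.7, 0.3, 0.1, 0.4, 0.05]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(precision, recall, f1, auc)

# F1 implied by a precision of 0.77 and a recall of 0.59.
print(round(harmonic_mean(0.77, 0.59), 3))
```

Because F1 is a harmonic rather than arithmetic mean, it sits closer to the lower of the two values, so a model cannot obtain a high F1 by excelling at only one of precision or recall.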
We run two tests using the Logistic Regression (Fig. 8) and SVC (Fig. 9) models on the scaled and standardized dataset, and two additional tests in which each model is run through the Grid Search algorithm (Figs. 10, 11). When testing our datasets, we found no difference in ROC AUC score between the SVC and the Logistic Regression models. The same held when both models were run through the Grid Search algorithm. All four tests gave a ROC AUC score of 0.76, meaning that the model ranks a randomly chosen playoff team above a randomly chosen non-playoff team about 76% of the time. Figs. 8 to 11 show the results obtained for each model; the first two are the models tested without Grid Search. For each test, ~77% of the teams predicted to make the playoffs actually did so (Precision), and the same reading applies to the teams predicted not to make it. Of the teams that actually made the playoffs, ~59% were correctly identified (Recall). From these results, we can create a ROC curve (Fig. 12). From the graph, we can see that the SVC and Logistic Regression models have very similar curves, and both lie close to the true-positive-rate axis. This suggests that our test model was, for the most part, accurate. From this we then ran our predictions (using the SVC model) to see what probability each team has of reaching the playoffs (see Fig. 13). Fig. 14 gives a better idea of which teams the algorithm thinks will make the playoffs (based on Fig. 13).

Figure 12: ROC AUC Curve.
Figure 13: Probability for each Team.
Figure 14: Playoff Probability by Team.
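One way to turn per-team probabilities such as those in Fig. 13 into a list of projected playoff teams is to rank the teams and keep the top k. The sketch below assumes that rule; the paper does not spell out the exact selection procedure. The team codes, the probabilities, and the helper names `top_k_teams` and `hit_rate` are hypothetical.

```python
def top_k_teams(probabilities, k=10):
    """Return the k teams with the highest predicted playoff probability."""
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    return [team for team, _ in ranked[:k]]


def hit_rate(predicted, actual):
    """Fraction of the actual playoff teams that appear in the prediction."""
    return len(set(predicted) & set(actual)) / len(actual)


# Placeholder probabilities for five made-up teams, not the paper's values.
probabilities = {"T01": 0.81, "T02": 0.12, "T03": 0.45, "T04": 0.33, "T05": 0.27}
actual_playoff_teams = {"T01", "T04", "T05"}

projected = top_k_teams(probabilities, k=3)
print(projected)
print(hit_rate(projected, actual_playoff_teams))
```

With ten playoff places and ten projected teams, a hit rate of 0.6 corresponds to six correctly projected teams.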
In Jordan Bean's research, he used 48 years of data, from 1969 to 2017, to make MLB playoff predictions for 2018 [7]. Bean used more than 10 pitching and batting stats, whereas our model used only 8 in total. He trained five classification models: Logistic Regression, Random Forest, KNeighbors Classifier, Support Vector Classifier (SVC), and an XGBoost Classifier. Among his tuned grid search models, the best results came from the XGBoost Classifier, which had a precision of 1.0, meaning that every team it predicted made the playoffs, and a recall of 0.80, meaning that it correctly identified 8 of the 10 teams that made the playoffs. Of all his models, the one that performed best was the SVC model. Using the SVC model, he created a data frame and plotted the playoff probabilities by team (Fig. 15). Bean's results correctly predicted 9 of the 10 teams that made the playoffs in 2018. The only mistake was predicting that Washington would make the playoffs and Colorado would not.

5 Conclusion

The original goal of this project was to predict which teams would make the playoffs. From our testing we can see that teams with a probability higher than ~0.3 were selected by our algorithm. When we convert Fig. 13 into a graph (Fig. 14), we can see the top 10 predicted teams (those to the left of the "Predictor Divider"). From here, we can see that the algorithm predicted only ~60% of the teams correctly (teams in blue did not make the playoffs). This is comparable to the result obtained by Soto Valero [25], whose technique is the closest to ours. The other works [23, 24, 26, 27] obtained better results than ours because they used additional optimization techniques, which in turn increase latency.
We find it interesting that all four of our tests, regardless of which model we used or whether we optimized it, returned exactly the same output. This may be caused by very low variation in our dataset. We hypothesized that at least 20 years' worth of data would give us results close to Jordan Bean's, given the number of games played per season and the many factors that can change a single game. Comparing both results, it appears that using more data would have given us a more accurate prediction.

Figure 8: Standard Logistic Regression Model.
Figure 9: Standard SVC Model.
Figure 10: Grid Search Logistic Regression Model.
Figure 11: Grid Search SVC Model.
Figure 15: Jordan Bean's Playoff Probability by Team.

References
[1] "2019 MLB Team Statistics," 16 March 2020. [Online]. Available: https://www.baseball-reference.com/leagues/MLB/2019.shtml. [Accessed 17 March 2020].
[2] Adams, Mark. "The Man Behind Moneyball: The Billy Beane Story: Domo." Connecting Your Data, Systems & People, Domo, 24 Feb. 2015, www.domo.com/blog/the-man-behind-moneyball-the-billy-beane-story/.
[3] "A Guide to Sabermetric Research," [Online]. Available: https://sabr.org/sabermetrics.
[4] Blackburn, Ghoji. "What Is Fantasy Baseball? How Do I Play It?" Fake Teams, Fake Teams, 16 Mar. 2017, www.faketeams.com/2017/3/16/14942064/what-is-fantasy-baseball.
[5] D. Prasetio and D. Harlili, "Predicting football match results with logistic regression," 2016 International Conference On Advanced Informatics: Concepts, Theory And Application (ICAICTA), George Town, 2016, pp. 1-5. https://doi.org/10.1109/icaicta.2016.7803111
[6] J. Bean, "Modeling MLB's 2018 Playoff Teams," 9 October 2018. [Online]. Available: https://towardsdatascience.com/modeling-mlbs-2018-playoff-teams-b3c67481edb2. [Accessed 17 March 2020].
[7] J. Bean, "Modeling MLB's 2018 Playoff Teams," 9 October 2018. [Online].
Available: https://towardsdatascience.com/modeling-mlbs-2018-playoff-teams-b3c67481edb2. [Accessed 17 March 2020].
[8] J. Dutcher, "Book Review: Moneyball: The Art of Winning an Unfair Game," 28 March 2014. [Online]. Available: https://datascience.berkeley.edu/moneyball-book-review/. https://doi.org/10.5860/choice.41-4733
[9] J. Silverman, "How Sabermetrics Works," 21 January 2009. [Online]. Available: https://entertainment.howstuffworks.com/sabermetrics.htm.
[10] K. Fuchs, "Machine Learning: Classification Models," 28 March 2017. [Online]. Available: https://medium.com/fuzz/machine-learning-classification-models-3040f71e2529. [Accessed 17 March 2020].
[11] Lashbrook, Lynn. "Why Baseball Analytics Matters and How You Can Make It into a Career." SportsManagementWorldwide, 20 Jan. 2017, www.sportsmanagementworldwide.com/content/why-baseball-analytics-matters-and-how-you-can-make-it-career.
[12] "List of Major League Baseball Postseason Teams." Wikipedia, Wikimedia Foundation, 1 Nov. 2019, en.wikipedia.org/wiki/List_of_Major_League_Baseball_postseason_teams.
[13] Lutins, Evan. "Grid Searching in Machine Learning: Quick Explanation and Python Implementation." Medium, Medium, 5 Sept. 2017, medium.com/@elutins/grid-searching-in-machine-learning-quick-explanation-and-python-implementation-550552200596.
[14] "Major League Baseball Team Win Totals." Baseball, Baseball-Reference, www.baseball-reference.com/leagues/MLB/.
[15] Micahmelling@gmail.com. "Using Machine Learning to Predict Baseball Hall of Famers." Baseball Data Science, 27 Sept. 2017, www.baseballdatascience.com/using-machine-learning-to-predict-baseball-hall-of-famers/.
[16] "Moneyball." Moneyball (2011), IMDb.com, 23 Sept. 2011, www.imdb.com/title/tt1210166/.
[17] N. Paine, "The Imperfect Pursuit of a Perfect Baseball Forecast," 27 March 2014. [Online]. Available: https://fivethirtyeight.com/features/the-imperfect-pursuit-of-a-perfect-baseball-forecast/.
[18] Pharr, Roger D.
"Predicting MLB Game Outcomes with Machine Learning." Medium, Towards Data Science, 3 Aug. 2019, towardsdatascience.com/predicting-mlb-game-outcomes-with-machine-learning-594eac9484e9.
[19] Raschka, Sebastian. "Predictive Modeling, Supervised Machine Learning, and Pattern Classification." Dr. Sebastian Raschka, 25 Aug. 2014, sebastianraschka.com/Articles/2014_intro_supervised_learning.html
[20] R. Ribeiro, "Houston Astros Strive for Balance Between Quantitative and Qualitative Data Analytics," 3 July 2014. [Online]. Available: https://biztechmagazine.com/article/2014/07/houston-astros-strive-balance-between-quantitative-and-qualitative-data-analytics.
[21] S. Banerjee, "Linear Regression: Moneyball - Part 1," 15 April 2018. [Online]. Available: https://towardsdatascience.com/linear-regression-moneyball-part-1-b93b3b9f5b53.
[22] S. Banerjee, "towardsdatascience," 1 June 2018. [Online]. Available: https://towardsdatascience.com/linear-regression-moneyball-part-2-175a9dc72e89.
[23] Brandon Tolbert, Theodore Trafalis, "Predicting Major League Baseball Championship Winners through Data Mining", 2016. Athens Journal of Sports, Volume 3, Issue 4, Pages 239-252. https://doi.org/10.30958/ajspo.3.4.1
[24] Jones, J.; Johnston, K.; Farah, L.; Baker, J. "Predicting Seasonal Performance in Professional Sport: A 30-Year Analysis of Sports Illustrated Predictions". Sports 2021, 9, 163. https://doi.org/10.3390/sports9120163.
[25] Soto Valero, C., "Predicting Win-Loss outcomes in MLB Regular season games – A comparative study using data mining methods", 2016. International Journal of Computer Science in Sport, Volume 15, Issue 2. https://doi.org/10.1515/ijcss-2016-0007
[26] Chia-Hao Chang, "Construction of a Predictive Model for MLB Matches", 2021. Forecasting 2021, 3, 102–111. https://doi.org/10.3390/forecast3010007
[27] Ting-Chun Yu and Jui-Chung Hung, "Forecasting MLB Playoff Teams Using GA-SVM", 2017. IEEE-ICASI 2017. https://doi.org/10.1109/icasi.2017.7988450.
[28] Al, Noor M. Al-Moosawi M., and Raidah Salim Khudeyer. "ResNet-34/DR: A Residual Convolutional Neural Network for the Diagnosis of Diabetic Retinopathy." Informatica 45.7 (2021). https://doi.org/10.31449/inf.v45i7.3774.
[29] Raheem, Sabreen Fawzi, and Maytham Alabbas. "Dynamic Artificial Bee Colony Algorithm with Hybrid Initialization Method." Informatica 45.6 (2021). https://doi.org/10.31449/inf.v45i6.3652.
[30] Saddam, Saba Abdual Wahid. "Wind Sounds Classification Using Different Audio Feature Extraction Techniques." Informatica 45.7 (2022). https://doi.org/10.31449/inf.v45i7.3739.
[31] Ampomah, Ernest Kwame, et al. "Stock Market Prediction with Gaussian Naïve Bayes Machine Learning Algorithm." Informatica 45.2 (2021). https://doi.org/10.31449/inf.v45i2.3407.