https://doi.org/10.31449/inf.v46i6.3827 Informatica 46 (2022) 21–31 21 
Geo-Spatial Disease Clustering for Public Health Decision Making 
Atta-ur-Rahman¹, Munir Ahmed², Gohar Zaman³, Tahir Iqbal⁴, Muhammed Aftab Alam Khan⁵, Mehwash Farooqui⁵, Mohammed Imran Basheer Ahmed⁵, Mohammed Salih Ahmed⁵, Majd Nabeel¹ and Abdullah Omar¹
E-mail: aaurrahman@iau.edu.sa
¹ Department of Computer Science (CS), College of Computer Science and Information Technology (CCSIT), Imam Abdulrahman Bin Faisal University (IAU), P.O. Box 1982, Dammam, 31441, Saudi Arabia
² Barani Institute of Information Technology (BIIT), PMAS Arid Agriculture University, Rawalpindi, 46000, Pakistan
³ Faculty of Computer Science and Information Technology, Universiti Tun Hussein Onn Malaysia (UTHM), Batu Pahat, 86400, Malaysia
⁴ Department of Business Administration, College of Business Administration (CBA), Imam Abdulrahman Bin Faisal University, P.O. Box 1982, Dammam 31441, Saudi Arabia
⁵ Department of Computer Engineering (CE), College of Computer Science and Information Technology (CCSIT), Imam Abdulrahman Bin Faisal University (IAU), P.O. Box 1982, Dammam, 31441, Saudi Arabia
Keywords: geo-spatial mapping, public healthcare, decision making, clustering  
Received: November 15, 2021 
An explosion of interest has been observed in disease mapping with the developments in advanced spatial statistics, data visualization and geographic information system (GIS) technologies. This technique, known as “Geo-Spatial Disease Clustering,” is mainly used for visualization and for predicting future disease expansion. Its importance has been widely recognized since the COVID-19 pandemic outbreak. Governments, medical institutes and other medical practices gather large amounts of data from surveys and other sources, in the form of notes, databases, spreadsheets and text data files. Most of this information is feedback from different groups, such as age group, gender, provider (doctors) and region. Incorporating such heterogeneous data is a challenging task. A variety of techniques and algorithms have been proposed in the literature, but their effectiveness varies with the type, volume, format and structure of the data and with the disease of interest, and most techniques are confined to a specific data type. To overcome this issue, this research proposes a data visualization technique combined with data warehousing and GIS for disease mapping. It includes data cleansing, data fusion, data dimensioning, analysis, visualization and prediction. The motivation behind this research is to create awareness about disease for the guidance of patients, healthcare providers and government bodies. In this way, we can extract information that describes the association of disease with age, gender and location. Moreover, temporal analysis supports earlier prediction and identification of disease, so that the necessary preventive arrangements can be made.
Summary: The role of data visualization and of solutions for geospatial disease clustering tasks, i.e., identifying and predicting the spread of disease, is analyzed.
1 Introduction
Data mining and visualization techniques have been used extensively by many organizations, especially in healthcare. They reveal interesting patterns and useful health information on which many critical decisions can be based. Visualizing these patterns on a map is very helpful for healthcare researchers and stakeholders to observe and predict disease spread. The importance and application of such studies have been strongly evident since the COVID-19 pandemic outbreak, and several such studies have appeared since late 2019 [1-2]. They have greatly helped healthcare professionals and government bodies to observe and control disease spread, and almost every affected country has similar systems and mobile phone applications for this purpose. Map-based analyses are more interactive and usable for the end user. Analyzing healthcare data along different dimensions, such as time, gender, age group and location, makes it more useful for decision making. Moreover, different stakeholders may be interested in different perspectives on the same data. For example, diabetes could be analyzed and visualized along the time, gender or state dimension to see how the disease varies on a monthly, quarterly or yearly basis [3-5]. This research aims to provide diverse healthcare data analysis and visualization through maps, using a data warehouse and Tableau (www.tableau.com), to extract unique patterns and interesting information that could lead to efficient healthcare analysis, successful solutions and predictions for the future of healthcare in the USA.
The rest of the paper is organized as follows. Section 2 reviews the literature, focusing on existing and related work in the recent past. Section 3 covers the proposed approach in detail. Section 4 is dedicated to results and discussion, while Section 5 concludes the paper.
2 Review of literature 
The current era is about the application of information and communication technologies (ICT) in various fields, and healthcare is one of the most studied and investigated fields in this regard [6-8]. This section reviews related work on the visualization of disease-related healthcare data. Data visualization is an important factor in any field, and analyzing the visualized data is also very important for decision making. For each work, a brief description is given, followed by its contributions and conclusions.
2.1 Spatial and Temporal Dynamics  
Aedes aegypti, the yellow fever mosquito, and related species have been widely studied in terms of their growth rate. Dengue has emerged as a very fast-spreading mosquito-borne viral disease and is now the most prevalent human arbovirus infection, with roughly half of the world’s population living in affected countries, especially in tropical zones. Current dengue prevention approaches focus on reducing mosquito vector populations and human-vector interactions [3-5].
In [3], the authors investigated the geospatial relationship for dengue virus disease and found interesting patterns related to its growth in the near future and its possible causes. The study showed that petrol pumps, workshops, rice paddies, marshes/swamps and deciduous forests played a significant role in dengue vector growth. In [4], the authors presented a very useful mapping of Aedes mosquito breeding habitats in urban and peri-urban areas using a fuzzy analytical hierarchical process based on climatic and physical parameters. The dataset comprised satellite imagery and geospatial data collected on the ground. In [5], the authors presented a near real-time geospatial data analysis and visualization of dengue fever, its possible growth and the areas likely to be infected in the near future, so that the authorities can take preventive measures well in advance. The results were classified by gender, age group, population density and region. It was concluded that the 5-24 years age group was more vulnerable (almost 67%); in terms of profession, students, workers and laborers were more vulnerable (almost 88%); in general, males were more vulnerable than females (by nearly 10%); and, based on proximity, public places, markets and religious places were more prone to infection. In [9], a study was made of the urban areas of the Brazilian oceanic island of Fernando de Noronha (located between 3°45′S and 3°57′S and between 32°19′W and 32°41′W, 545 km northeast of Recife, the capital of the state of Pernambuco), shown in Figure 1. The monitoring system (SMCP-Aedes) describes the temporal and spatial distribution of the dengue vector in the island’s urban areas, based on a 103-trap network for sampling Aedes eggs and using spatial statistics and GIS analysis tools. The research was implemented as a joint effort between the staff, local health managers and the scientific team. This characterization of the island’s infestation by the dengue vector provides basic information for analyzing the relationships between dengue cases and the spatial distribution of the vector, and for developing integrated vector control strategies [9].
The kernel density estimation (KDE) maps in Figure 2 present the smoothed egg density for the same period: the maps generated with a common legend (left) allow comparison between different months, while the KDE map with an adjusted legend (right) for the same month highlights the areas with high egg populations.
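A KDE map turns egg counts at discrete trap locations into a continuous density surface. The sketch below is a minimal illustration, assuming a Gaussian kernel, planar (x, y) trap coordinates in metres and a hand-picked bandwidth; the trap data are invented, and the kernel and bandwidth actually used in [9] are not stated here.

```python
import math

def gaussian_kde_2d(traps, x, y, bandwidth):
    """Egg-count-weighted 2-D Gaussian kernel density at point (x, y).

    traps: iterable of (tx, ty, egg_count) tuples in planar coordinates.
    bandwidth: kernel standard deviation, in the same units as the coordinates.
    Returns eggs per unit area; the surface integrates to the total egg count.
    """
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2)
    density = 0.0
    for tx, ty, eggs in traps:
        d2 = (x - tx) ** 2 + (y - ty) ** 2
        density += eggs * norm * math.exp(-d2 / (2.0 * bandwidth ** 2))
    return density

def density_grid(traps, xs, ys, bandwidth):
    """Evaluate the KDE surface on a grid (rows follow ys, columns follow xs)."""
    return [[gaussian_kde_2d(traps, x, y, bandwidth) for x in xs] for y in ys]

# Hypothetical traps: (x metres, y metres, eggs counted in one month)
traps = [(0, 0, 120), (100, 0, 30), (0, 100, 5)]
grid = density_grid(traps, xs=[0, 50, 100], ys=[0, 50, 100], bandwidth=50)
```

The bandwidth controls how far each trap's count is spread: a larger value gives a smoother map, a smaller one keeps hotspots tight around individual traps.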
2.2 Geographical Information System   
According to [10], dengue, chikungunya and Zika used to be confined to certain areas, such as Latin America and other tropical regions, but owing to migration and tourism these diseases have also become endemic in places where they did not previously exist. National and international institutes are working collectively on the academic study of epidemiological information on infectious diseases at local and national scales. Geographical information systems (GIS) have been developed for epidemiological maps and used for dengue, but not for other emerging arboviral diseases, nor in Central America. During the study period (2015), Honduras reported 19,289 dengue cases and 85,386 chikungunya cases. The median number of cases reported per week was 726 (range 291 to 1,789) for dengue and 1,460 (range 387 to 4,175) for chikungunya. Dengue cases peaked in the 25th epidemiological week, whereas chikungunya peaked in the 27th week. Case rates for dengue and chikungunya in Honduras in 2015 were projected by department, and these maps were generated with Kosmo GIS. Furthermore, national GIS-based maps of the distribution of chikungunya and dengue were generated by department and by municipality. Microsoft Access® was used to design and develop the spatial databases used to calculate incidence rates by municipality, department and disease for the GIS software. The open-source GIS client Kosmo Desktop 3.0RC1® was used. To generate digital maps of yearly case rates by department and municipality, three shapefiles were joined to a data table (database) through a spatial link operation. The map is shown in Figure 3.

Figure 1: The study site Fernando de Noronha [9].

Figure 2: Spatial distribution of egg density [9].
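The yearly case rates mapped in [10] are incidence rates, i.e., cases scaled by population. The sketch below computes rates per 100,000 inhabitants and attaches them to map units through a shared key, which is the kind of table join performed when a data table is linked to a shapefile. It is a minimal illustration only: the department codes, case counts and populations are hypothetical, and the per-100,000 scale is an assumed convention.

```python
def incidence_per_100k(cases, population):
    """Cases per 100,000 inhabitants; population must be positive."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiply first so whole-number inputs divide exactly where possible.
    return cases * 100_000 / population

def join_rates(map_units, case_table, pop_table):
    """Attach an incidence rate to every map unit by its key.

    map_units: list of unit keys (e.g. hypothetical department codes from a shapefile).
    case_table / pop_table: dicts keyed by the same unit key.
    Units with no case record get a rate of 0.0; units without a
    population record are reported as None (no data on the map).
    """
    joined = {}
    for unit in map_units:
        pop = pop_table.get(unit)
        if pop is None:
            joined[unit] = None
        else:
            joined[unit] = incidence_per_100k(case_table.get(unit, 0), pop)
    return joined

rates = join_rates(
    ["D01", "D02", "D03"],
    case_table={"D01": 250, "D02": 40},
    pop_table={"D01": 500_000, "D02": 80_000},
)
```

Using rates rather than raw counts keeps densely populated departments from dominating the map simply because they have more inhabitants.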
2.3 Global maps 
According to [11], nearly one third of the human population is exposed to the risk of dengue (Figure 4). Every year, roughly 50 to 100 million cases of dengue fever are recorded, along with about 500,000 cases of severe dengue, such as dengue haemorrhagic fever, requiring hospitalization, and 20,000 to 25,000 deaths, many of them children. A global dengue risk map was generated from a global database of disease occurrence, using the predicted distributions of the two main vector species, Aedes aegypti and Aedes albopictus, together with human population density. Three occurrence datasets (dengue fever, severe dengue such as dengue haemorrhagic fever (DHF), and all dengue) were sub-sampled to build about 100 bootstrap models each, and the outputs were combined into a single global risk map for each type of dengue. The predictor variables include land surface temperature (thermal data layers for both day and night), human population density and rainfall variation. The global dengue risk map shows risk in South America, Asia, India, Central America and parts of coastal South America, and generally in some areas of Africa. High human population density is the key factor in all the dengue risk maps produced. The current risk of dengue in Europe is considered low, but uncertain enough to warrant monitoring in the zones of greatest predicted environmental suitability, particularly in counties of northern Greece, parts of Austria, Croatia, Slovenia, Bosnia and Herzegovina, southeastern France, Serbia, Albania, Montenegro, Germany, Italy and Switzerland, and in smaller areas elsewhere. This map was generated using data from all dengue databases and included the modelled distributions of the two vector species, Aedes aegypti and Ae. albopictus, as predictor data layers (which were not selected in all models). The map represents the probability of suitability, based on the similarity of each pixel to the clusters included in the model. Grey areas are dissimilar to any of the occurrence or absence clusters in the data, so no predictions are made for them.
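Combining bootstrap models into one risk map can be pictured as averaging each pixel’s predicted probability over the models. The sketch below is a minimal illustration under stated assumptions: each model output is a list of per-pixel probabilities, None marks pixels where a model makes no prediction (the grey areas), and simple averaging is assumed as the combination rule; it is not necessarily the exact procedure of [11], and the numbers are invented.

```python
import random
from statistics import mean

def bootstrap_samples(records, n_models, seed=0):
    """Draw n_models bootstrap samples (same size as records, with replacement)."""
    rng = random.Random(seed)
    return [rng.choices(records, k=len(records)) for _ in range(n_models)]

def ensemble_map(model_outputs):
    """Average per-pixel probabilities across models.

    model_outputs: list of equal-length lists; each entry is a probability
    in [0, 1] or None where that model makes no prediction.
    A pixel with no prediction from any model stays None (a grey area).
    """
    combined = []
    for pixel_values in zip(*model_outputs):
        valid = [p for p in pixel_values if p is not None]
        combined.append(mean(valid) if valid else None)
    return combined

# Three hypothetical model outputs over a 3-pixel map
risk = ensemble_map([[0.2, 0.8, None], [0.4, 0.6, None], [0.3, None, None]])
```

Each bootstrap sample would be used to fit one model; `ensemble_map` then merges the fitted models’ pixel predictions into the single map.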
Approaches similar to those mentioned above have been applied to other regions of the world and to various diseases. For example, in [12-13], the authors proposed a map-based data analysis and visualization technique for Alzheimer's disease. In [14], the authors studied the neural correlates of the psychedelic state induced by psilocybin, as determined by functional magnetic resonance imaging (fMRI) studies. In [15], the authors studied the geospatial relationship of the chikungunya epidemic in South India. Similarly, in [16], the aim was small-area disease estimation and visualization using the R language.
 
Figure 3: GIS-based map (Geographic Distribution) [10].  
 
Figure 4: Global Risk map (for dengue) [11].
2.4 COVID-19 disease clustering 
Thousands of studies have been conducted around the globe since the COVID-19 pandemic emerged [17]. These studies can be categorized into various types, for example, disease prediction based on symptoms such as cough sounds, chest X-rays and other historical patient data using a variety of techniques [18-21]. Similarly, various apps and systems were developed to observe and forecast the spread of the disease on temporal and spatial bases [22].
3 Proposed approach 
This section describes the proposed approach in detail. The data were obtained from a healthcare IT company. Data preparation and data pre-processing (warehouse) are the strategies used to generate the results and data representations. A data warehouse has been used to extract information from the system. Because the system is complex, its ER diagram is given in Figure 5. The data for the year 2016 were selected for the warehouse in this experiment. This system will help create awareness about disease for the guidance of patients and healthcare stakeholders. Through it, we can extract data that describe the source of a disease in terms of age, gender and state for an arbitrary disease.
3.1 Data warehouse   
Data warehousing technology has had an enormous impact on the business world, helping to turn data into information for significant competitive advantage. Data warehousing in the medical field has traditionally been administrative in nature, focusing on patient management and billing; the organizational aspects of hospitals improved by data warehouse techniques are not much different from those of contemporary enterprises.
Technology, however, has changed rapidly, and more difficult areas of medical data management can now be handled. Information technology has supported the retrieval and analysis of historical medical data, particularly in universities and hospitals. In the healthcare system, information about the patient’s age, gender and location is recorded at the time of service. Figure 5 describes the database schema. The characteristics of the warehouse built for data visualization and analysis are listed in Table 1, together with a description of each.
3.2 Data warehouse integration process 
All the available data must be stored in a way that consolidates it into a functional information base for the company. This process is known as ETL (Extract, Transform, Load) [23-25], and its steps are given below.
3.2.1 Extraction  
This step refers to extracting data from its source and making it available for further use. All the required data are fetched without negatively affecting the performance of the source systems, for example their response time or locking. Cleaning is performed at this first stage of the ETL process, in which data quality is ensured through unification of the data. The unification rules convert identifiers such as gender, categories, phone numbers and zip codes into a standard format, and validation is applied to address fields to convert them into the proper format.
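The unification rules described above can be illustrated with a few normalizing functions. This is a minimal sketch, assuming US-style 10-digit phone numbers, 5-digit ZIP codes (optionally ZIP+4) supplied as strings, and free-text gender values; the function names and accepted spellings are hypothetical and are not the rules used in this work.

```python
import re

# Hypothetical accepted spellings mapped to the single-character codes of Table 3
GENDER_MAP = {"m": "M", "male": "M", "f": "F", "female": "F"}

def normalize_gender(value):
    """Map free-text gender to the single-character codes M/F (None if unknown)."""
    return GENDER_MAP.get(str(value).strip().lower())

def normalize_phone(value):
    """Keep digits only; accept 10 digits, or 11 with a leading US '1'."""
    digits = re.sub(r"\D", "", str(value))
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    if len(digits) != 10:
        return None
    return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"

def normalize_zip(value):
    """Return the 5-digit ZIP code, dropping any +4 suffix; None if invalid."""
    match = re.fullmatch(r"\s*(\d{5})(?:-?\d{4})?\s*", str(value))
    return match.group(1) if match else None
```

Returning None for values that cannot be unified lets the later pre-processing step treat them as missing rather than silently loading malformed data.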
 
Figure 5: Proposed Database Schema. 
Table 1: Warehouse Characteristics.
| Name | Description |
|---|---|
| Subject oriented | The data/information refer to a specific subject, for example, when an organization wants to analyze data for the marketing department only. Devoting the data warehouse to a particular subject is the key factor of subject-oriented warehousing. |
| Integration | Data are fetched from different sources and stored coherently by converting differing sets of data into a standard format. Organizations can then easily resolve problems such as discrepancies among units of measurement and produce better output. |
| Nonvolatile | Data remain unchanged by new developments: once data are stored in the database, they should not be modified. This can be ensured by data comparison. |
| Time variant | Data are stored in the system for specific time periods and can be updated at different time intervals. With a large volume of data spread over a long time interval, analysts can identify different patterns and business associations. |
 
3.2.2 Transformation  
In this step, several rules were applied to convert the source data into the same dimensions so that they share the same units of measurement. This ETL step also includes joining data from different sources, creating aggregates and surrogate keys, validating the data and generating new keys.
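Surrogate keys replace source-system identifiers with warehouse-owned integers, so that records arriving from different sources can be joined consistently. The sketch below is a minimal illustration, assuming each source row carries a (source system, natural key) pair such as a clinic’s medical record number; the class name and the field names `source`, `mrn`, `gender` and `state` are hypothetical.

```python
class SurrogateKeyGenerator:
    """Assigns stable integer surrogate keys to (source, natural_key) pairs."""

    def __init__(self, start=1):
        self._next = start
        self._lookup = {}

    def key_for(self, source, natural_key):
        """Return the existing key for this pair, or allocate the next integer."""
        pair = (source, natural_key)
        if pair not in self._lookup:
            self._lookup[pair] = self._next
            self._next += 1
        return self._lookup[pair]

def conform_patients(rows, keygen):
    """Rewrite source rows into dimension rows keyed by a surrogate patient_id."""
    return [
        {
            "patient_id": keygen.key_for(r["source"], r["mrn"]),
            "gender": r["gender"],
            "state": r["state"],
        }
        for r in rows
    ]
```

Keying on the (source, natural key) pair keeps identical record numbers from two different clinics from colliding; matching the same real patient across sources would require an additional record-linkage step that this sketch does not attempt.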
3.2.3 Loading  
In this phase, all the constraints and then the indexes are first disabled. Data loading then starts, and both constraints and indexes are re-enabled once loading is complete. This step normally targets loading into a database.
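The disable, load and re-enable sequence can be sketched with Python’s built-in `sqlite3` module. SQLite cannot disable an index in place, so this sketch, as an assumption, drops and recreates the index, and it switches off foreign-key enforcement with `PRAGMA foreign_keys` (which only takes effect outside a transaction). The table and index names are hypothetical; a production warehouse would use its own DBMS’s commands for disabling constraints and indexes.

```python
import sqlite3

def create_schema(conn):
    """Hypothetical minimal schema: one dimension, one fact table, one index."""
    conn.executescript("""
        CREATE TABLE dim_patient (patient_id INTEGER PRIMARY KEY, gender TEXT);
        CREATE TABLE fact_financials (
            patient_id      INTEGER REFERENCES dim_patient(patient_id),
            location_id     INTEGER,
            patient_account REAL
        );
        CREATE INDEX idx_fact_patient ON fact_financials (patient_id);
    """)

def bulk_load(conn, rows):
    """Load fact rows with FK checks off and the index dropped, then restore both."""
    conn.execute("PRAGMA foreign_keys = OFF")      # ignored inside an open transaction
    conn.execute("DROP INDEX IF EXISTS idx_fact_patient")
    with conn:                                     # one transaction for the bulk insert
        conn.executemany(
            "INSERT INTO fact_financials (patient_id, location_id, patient_account) "
            "VALUES (?, ?, ?)",
            rows,
        )
    conn.execute("CREATE INDEX idx_fact_patient ON fact_financials (patient_id)")
    conn.execute("PRAGMA foreign_keys = ON")
    # Report any rows whose foreign keys were violated while checks were off.
    return conn.execute("PRAGMA foreign_key_check").fetchall()

conn = sqlite3.connect(":memory:")
create_schema(conn)
conn.execute("INSERT INTO dim_patient VALUES (1, 'F')")
conn.commit()
problems = bulk_load(conn, [(1, 10, 120.0), (2, 11, 80.0)])  # patient 2 has no dimension row
```

Because enforcement is off during the load, the final `foreign_key_check` is what surfaces the orphaned row; skipping it would let inconsistent facts into the warehouse unnoticed.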
3.2.4 Design of dimensional model 
The design of the dimensional model needs to meet industry standards: it must capture all the business needs and cover information that can be easily accessed. The components of the model are given in Table 2.
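A reduced version of the Table 2 star schema can be sketched in SQLite to show how the fact table’s foreign keys point to dimension tables and how facts are aggregated along dimensions. Only two of the dimensions (Dim_Patient, Dim_Location) and two of the FACT_FINANCIALS foreign keys are shown; the column types, the treatment of Patient Account as a text identifier, and the sample rows are assumptions made for illustration.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Dim_Patient  (patient_id  INTEGER PRIMARY KEY, gender TEXT);
CREATE TABLE Dim_Location (location_id INTEGER PRIMARY KEY, state  TEXT);
CREATE TABLE FACT_FINANCIALS (
    patient_id      INTEGER REFERENCES Dim_Patient(patient_id),
    location_id     INTEGER REFERENCES Dim_Location(location_id),
    patient_account TEXT    -- measure listed in Table 2 (type assumed)
);
"""

def accounts_by_state_and_gender(conn):
    """Count distinct patient accounts per (state, gender) via a star join."""
    return conn.execute("""
        SELECT l.state, p.gender, COUNT(DISTINCT f.patient_account)
        FROM FACT_FINANCIALS f
        JOIN Dim_Patient  p ON p.patient_id  = f.patient_id
        JOIN Dim_Location l ON l.location_id = f.location_id
        GROUP BY l.state, p.gender
        ORDER BY l.state, p.gender
    """).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO Dim_Patient VALUES (?, ?)", [(1, "F"), (2, "M")])
conn.executemany("INSERT INTO Dim_Location VALUES (?, ?)", [(10, "KS"), (11, "NY")])
conn.executemany(
    "INSERT INTO FACT_FINANCIALS VALUES (?, ?, ?)",
    [(1, 10, "A-100"), (1, 10, "A-100"), (2, 10, "A-200"), (2, 11, "A-201")],
)
summary = accounts_by_state_and_gender(conn)
```

The same join pattern extends to the other dimensions in Table 2 (for example a date dimension for quarter or month filters) by adding one JOIN per dimension.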
3.2.5 Data preprocessing   
Data pre-processing is an important step, mainly used to identify missing values, false data and repeated information in the dataset. We use the data that describe the disease, as given in Table 3. The location (state) value is extracted from the place of service where the doctor’s office is located. A single character is used for gender. The data are divided into four quarters and can also be viewed month-wise.
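The pre-processing steps listed above (dropping incomplete or invalid records, removing repeated information and deriving the quarter from the month) can be sketched as follows. The record layout is hypothetical and only mirrors the Table 3 fields plus a diagnosis code; months are assumed to be integers from 1 to 12.

```python
def quarter_of(month):
    """Map a month number 1-12 to its calendar quarter 1-4."""
    if not 1 <= month <= 12:
        raise ValueError(f"invalid month: {month}")
    return (month - 1) // 3 + 1

def preprocess(records):
    """Drop records with missing fields or an invalid gender code, remove
    duplicates (judged on the required fields), and add a 'quarter' field."""
    required = ("location", "gender", "month", "dx_code")
    seen = set()
    cleaned = []
    for rec in records:
        if any(rec.get(f) in (None, "") for f in required):
            continue                      # missing value
        if rec["gender"] not in ("M", "F"):
            continue                      # false data for the Char [M/F] field
        key = tuple(rec[f] for f in required)
        if key in seen:
            continue                      # repeated information
        seen.add(key)
        cleaned.append({**rec, "quarter": quarter_of(rec["month"])})
    return cleaned
```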
3.3 Tools 
In this research, two tools are used: SQL for the data warehouse and Tableau for data representation and visualization. Tableau is a leading visualization tool and is used by several well-reputed companies. It offers many visualization types and a highly interactive interface, as well as functionality for advanced filtering and aggregation.
Figure 6 shows the Tableau home screen for the proposed system. The left column contains the navigation controls, the middle column gives access to recently opened workbooks and sample workbooks, and the third and last column lists the pattern discovery options.
In Figure 7, the Tableau reader for the proposed warehouse presents the patient diagnosis code analysis. The main screen shows the outcome on the map, while the right-side column contains the controls for the attribute values that can be changed to view the data on the map. For example, the controls include the year (by date of service), gender, age range, and the quarter and month of the date of service. The controls are explained in Figure 8 (a-e).

Table 2: Components of Dimensional Model.
| Name | Description | Value |
|---|---|---|
| Dimension | Dimensions are the major component of the design, comprising individual, non-overlapping keys. The main purposes of dimensions are filtering, grouping and labeling the dataset. Dimension tables can contain textual descriptions. The next column lists the dimensions. | Dim_Bill_Date, Dim_Entry_Date, Dim_Practice, Dim_Claim, Dim_Payments, Dim_Charge_Code, Dim_Location, Dim_Providers, Dim_Charges, Dim_Patient and Dim_Submission |
| Fact Table | Fact table data contain measures or dependent attributes. Here the fact table provides statistics for financial data broken down by patient, claims, charges, locations and other dimensions. A fact table usually contains historical data from the operational system; it mainly holds foreign key values referencing the dimensions and numeric measure values on which aggregation can be performed. The attributes of the proposed fact table, named FACT_FINANCIALS, are given in the next column. | Foreign key columns: practice_id, provider_id, patient_id, submission_id, claim_id, aging_id, payments_id, charges_id, bill_date_id, doe_id, entry_date_id, location_id. Measure: Patient Account. |

Table 3: Data Description.
| Name | Description | Value (Type) |
|---|---|---|
| Location | The location (state) value is extracted from the place of service where the doctor’s office is located. | String (varchar) |
| Gender | The gender of the patient is stored in the database as male/female; the dataset used for analysis contains the values M/F. | Char [M/F] |
| Month | Data are loaded into the data warehouse month-wise. | Number |
| Quarters | Data are divided into the four quarters of a year. | Number |
 
Figure 6: Tableau Home Screen. 
 
Figure 7: Main Screen. 
 
Figure 8 (a): Year selection. 
 
Figure 8  (b): Year selection. 
 
Figure 8 (c): Year selection. 
 
Figure 8 (d): Months of selected year. 
4 Results and discussion 
In this section, results derived from the healthcare IT company’s dataset for the year 2016 are presented. The data are taken from the designed data warehouse and cover patients with a 2016 date of service in all US states. The following subsections present the analyses and visualizations performed on different dimensions of the data warehouse.
4.1 Location based 
Location has been extracted from the state in the patient’s address, i.e., where the patient lives, rather than where the service is taken or where the service provider (doctor/clinic/hospital) is located. This is important because disease mapping/clustering is performed with respect to the patient’s location. The results in Figure 9 show that patients (male and female) from Kansas (KS) have been reported for diagnostic (Dx) code M79.1, which corresponds to myalgia, i.e., muscle pain. The time dimension is not restricted: the analysis is based on all the provided data across all quarters, with an age range of 10 to 40 years.
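The location-based view in Figure 9 is essentially a filtered count of diagnosis codes per state. The sketch below shows that aggregation under the assumption of flat records with hypothetical fields `state`, `age` and `dx_code`; in the proposed system Tableau performs the equivalent filtering and counting interactively.

```python
from collections import Counter, defaultdict

def top_dx_by_state(records, min_age=10, max_age=40):
    """Return {state: (dx_code, count)} for the most frequent Dx code per state,
    counting only patients whose age lies in [min_age, max_age].
    Ties go to the code encountered first."""
    counts = defaultdict(Counter)
    for r in records:
        if min_age <= r["age"] <= max_age:
            counts[r["state"]][r["dx_code"]] += 1
    return {state: c.most_common(1)[0] for state, c in counts.items()}
```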
4.2 Temporal  
Time is one of the most important factors in the US medical health sector. It is used to analyze various aspects of revenue generation for insurance and billing companies, and the first three months are especially important for such analyses [26]. The results in Figure 10 (a & b) are from two states, Iowa (IA) and New Hampshire (NH), respectively. The visual statistics illustrate that the disease rate is much lower in these states, based on the dataset. In IA, patients were reported for Dx code I73.9, which corresponds to peripheral vascular disease (PVD). Similarly, in NH, patients were reported for Dx code G47.33, which corresponds to obstructive sleep apnea, during the given time range.
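A time-based view can be sketched as counts per calendar quarter for one state and one Dx code, together with the quarter-over-quarter change that would reveal rising or falling trends. The record fields (`state`, `dx_code`, `service_date`) are hypothetical, and the sample records below are invented for illustration.

```python
from datetime import date

def quarterly_counts(records, state, dx_code, year):
    """Count records per calendar quarter (1-4) for one state, Dx code and year."""
    counts = {q: 0 for q in range(1, 5)}
    for r in records:
        d = r["service_date"]
        if r["state"] == state and r["dx_code"] == dx_code and d.year == year:
            counts[(d.month - 1) // 3 + 1] += 1
    return counts

def quarter_over_quarter_change(counts):
    """Relative change between consecutive quarters; None when the previous quarter is 0."""
    changes = {}
    for q in range(2, 5):
        prev = counts[q - 1]
        changes[q] = None if prev == 0 else (counts[q] - prev) / prev
    return changes

# Invented records for illustration
records = [
    {"state": "IA", "dx_code": "I73.9", "service_date": date(2016, 1, 5)},
    {"state": "IA", "dx_code": "I73.9", "service_date": date(2016, 2, 10)},
    {"state": "IA", "dx_code": "I73.9", "service_date": date(2016, 4, 3)},
    {"state": "IA", "dx_code": "I10", "service_date": date(2016, 4, 3)},
    {"state": "IA", "dx_code": "I73.9", "service_date": date(2015, 4, 3)},
]
counts = quarterly_counts(records, "IA", "I73.9", 2016)
changes = quarter_over_quarter_change(counts)
```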
Figure 8 (e): Data in table.

Figure 9: Location based analysis.

Table 4: Comparison.
| Parameter | [5] | [9] | [10] | Proposed |
|---|---|---|---|---|
| Disease | Dengue fever | Dengue larval growth | Dengue and chikungunya | All types of diseases with a registered US diagnostic code (Dx code) |
| Region (fixed or multiple) | Fixed: Muang city, Phitsanulok Province, Thailand | Fixed: Fernando de Noronha, Brazilian oceanic island | Multiple: departments of Honduras | Multiple: all states of the US |
| Gender based analysis | Yes | No | No | Yes |
| Time based analysis | No | Yes | Yes | Yes |
| Age based analysis | Yes | No | No | Yes |
| Location based analysis | Yes | No | Yes | Yes |

4.3 Age based
The results below are for patients above the age of 65. In the USA, patients above 65 are largely covered by government insurance (Medicare and Medicaid) for the treatment of various diseases, which makes this age group of particular interest. The results in Figure 11 (a & b) are for California (CA) and New York (NY), respectively, for patients older than 65. Figure 11a shows that during 2016, 10,686 patients were reported for the top Dx code B35.1, which corresponds to tinea unguium, the most common fungal infection of the nails. This suggests that patients above 65 years in California are prone to this disease. Similarly, according to Figure 11b, 4,979 patients were reported for the top Dx code I10, which corresponds to hypertension, suggesting that patients above 65 years in New York are prone to this disease [27-30].
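Filtering on “age more than 65” requires the patient’s age on the date of service rather than the current age. The sketch below is a minimal illustration, assuming each record has hypothetical `birth_date` and `service_date` fields as `datetime.date` values; it returns the most frequent Dx code in a state among patients older than the threshold. The sample records are invented.

```python
from collections import Counter
from datetime import date

def age_on(birth_date, on_date):
    """Completed years of age on a given date."""
    before_birthday = (on_date.month, on_date.day) < (birth_date.month, birth_date.day)
    return on_date.year - birth_date.year - before_birthday

def top_dx_over_age(records, state, min_age_exclusive=65):
    """Most frequent (dx_code, count) in a state for patients older than the threshold,
    or None when no record qualifies."""
    counts = Counter(
        r["dx_code"]
        for r in records
        if r["state"] == state
        and age_on(r["birth_date"], r["service_date"]) > min_age_exclusive
    )
    return counts.most_common(1)[0] if counts else None

records = [
    {"state": "CA", "dx_code": "B35.1", "birth_date": date(1940, 3, 1), "service_date": date(2016, 5, 2)},
    {"state": "CA", "dx_code": "B35.1", "birth_date": date(1944, 8, 9), "service_date": date(2016, 9, 1)},
    {"state": "CA", "dx_code": "I10", "birth_date": date(1945, 1, 1), "service_date": date(2016, 2, 2)},
    {"state": "CA", "dx_code": "B35.1", "birth_date": date(1980, 1, 1), "service_date": date(2016, 2, 2)},
]
```

Computing age per service date matters for patients who turn 66 during the year: the same person can fall outside the filter for early visits and inside it for later ones.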
4.4 Monthly analysis 
There are two ways to submit insurance claims, i.e., electronic and paper, and the US government prefers that doctors use the electronic way for faster processing. Figure 12 shows that in Utah (UT), the top Dx code for the month of July was R41.844, which corresponds to frontal lobe and executive function deficit [31-40].
4.5 Comparison 
In this section, the proposed scheme is compared qualitatively with similar techniques in the literature. The techniques were selected based on their data, map, mapping and visualization type. Table 4 shows the comparison. From the comparison, it is apparent that the proposed scheme has two major advantages over the other schemes. The first is that it covers the complete range of diseases, not just one type [41-46]. The second is that the data can be analyzed and visualized along many dimensions, such as gender, location, time and age group. Moreover, it covers all US states, while the schemes in [5] and [9] are for a specific city or zone. The scheme in [10], although it provides analysis for several departments of Honduras, works only for dengue and chikungunya and only for time- and location-based analyses; analysis by age group and gender is mentioned as future work.
The analysis above can help healthcare stakeholders and government bodies make better decisions. In the current COVID-19 pandemic, the proposed scheme can be beneficial in a variety of ways: the spread can be visualized and observed from spatial and temporal perspectives. Further benefits include the following:
• Patients may be guided, based on their disease and the map history, to appropriate specialist doctors.
• Based on the analysis, awareness campaigns for a particular disease can be targeted at the locations of affected patients.
• Based on temporal analysis, the disease growth rate can be monitored and remedies such as vaccinations can be initiated.
Based on gender, age group, state, etc., alerts may be sent to the public so that people can take precautionary measures before they enter an at-risk age group.

Figure 10 (a): Time based analysis (IA State).

Figure 10 (b): Time based analysis (NH State).

Figure 11 (a): Age based analysis (CA State).

Figure 11 (b): Age based analysis (NY State).

Figure 12: Month based analysis.
5 Conclusion 
In this paper, geospatial disease clustering has been proposed and designed, focusing on data visualization in the healthcare domain. Map visualization is done using a data warehouse and the Tableau tool. The data were collected and prepared from a healthcare IT company. Datasets of different patients, states, genders, locations, doctors, clinics and hospitals for services in 2016 were transformed into a data warehouse consisting of different tables that contain all the information related to disease diagnostic codes (Dx codes). The data were transformed into the desired shape for visualization and analysis. A system was developed that reads service data from the dataset according to the desired dimensions (gender, age, month and quarter). The results display associations between different factors that can be used for decision making by medical authorities. Many dimensions show disease trends, and further analyses of these trends can be conducted in future to support decision making. Data from other sources may also be gathered, and data mining techniques may be investigated for further analysis. For example, to incorporate COVID-19 analysis, the appropriate datasets, databases and other forms of data can be incorporated into the designed data warehouse, and the analyses can be obtained on the fly.
References  
[1] Khalid Farooq, Rai; Ashiq, Murtaza; Siddique, 
Nadeem; Rehman, Shafiq Ur; Adil, Hafiz 
Muhammad; and Ajmal Khan, Muhammad, "A 
Bibliometric Review of Highly Cited and Hot Papers 
on Coronavirus and COVID 19" (2021). Library 
Philosophy and Practice (e-journal). 5238. 
https://digitalcommons.unl.edu/libphilprac/5238.  
[2] Shueb, S., Gul, S., Nisa, N.T., Shabir, T., Ur 
Rehman, S. and Hussain, A. (2021), "Measuring the 
funding landscape of COVID-19 research", Library 
Hi Tech, Vol. ahead-of-print No. ahead-of-print.  
https://doi.org/10.1108/LHT-04-2021-0136.  
[3] Sarfraz, M.S., Tripathi, N.K., Tipdecho, T., 
Thongbu, T., Kerdthong, P. & Souris, M. (2012) 
Analyzing the spatio-temporal relationship between 
dengue vector larval density and land-use using 
factor analysis and spatial ring mapping. BMC 
Public Health 2012, 12:853 
[4] Sarfraz, M.S., Tripathi, N.K., Faruque, F.S., Bajwa, 
U.I., Kitamoto, A. & Souris, M. (2014) Mapping 
urban and peri-urban breeding habitats of Aedes 
mosquitoes using a fuzzy analytical hierarchical 
process based on climatic and physical parameters. 
Geospatial Health 8(3), 2014, pp. S685-S697. 
[5] Sarfraz, M.S., Tripathi, N.K. & Kitamoto, A. (2014) 
Near real-time Characterization of urban 
environments: a holistic approach for monitoring 
dengue fever risk areas, International Journal of 
Digital Earth, 7:11, 916-934. 
[6] A. Rahman, M.H. Salam, S. Jamil (2013) Virtual 
Clinic: A Telemedicine Proposal for Remote Areas 
of Pakistan, Conference: 3rd World Congress on 
Information and Communication Technologies, 
Vietnam. 
[7] A. Rahman, A. Bakry, K. Sultan, M.A.A. Khan, M. 
Farooqui, D. Musleh (2018) Clinical Decision 
Support System in Virtual Clinic, Journal of 
Computational and Theoretical Nanoscience 
15(6):1795-1804. 
[8] A. Rahman, J. Alhiyafi (2018) Health Level Seven 
Generic Web Interface, Journal of Computational 
and Theoretical Nanoscience 15(4),  
DOI: 10.1166/jctn.2018.7302. 
[9] Regis, L. N., Acioli, R. V., Silveira Jr, J. C., de Melo-
Santos, M. A. V., da Cunha, M. C. S., Souza, F., ... 
& Monteiro, A. M. V. (2014). Characterization of the 
spatial and temporal dynamics of the dengue vector 
population established in urban areas of Fernando de 
Noronha, a Brazilian island. Acta Tropica, 137, 80-87.
[10] Zambrano, L. I., Sierra, M., Lara, B., Rodríguez-
Núñez, I., Medina, M. T., Lozada-Riascos, C. O., & 
Rodríguez-Morales, A. J. (2017). Estimating and 
mapping the incidence of dengue and chikungunya 
in Honduras during 2015 using Geographic 
Information Systems (GIS). Journal of Infection and
Public Health, 10(4), 446-456.
[11] Rogers, D. J., Suk, J. E., & Semenza, J. C. (2014). 
Using global maps to predict the risk of dengue in 
Europe. Acta Tropica, 129, 1-14.
[12] Sanz-Arigita, E. J., Schoonheim, M. M., 
Damoiseaux, J. S., Rombouts, S. A., Maris, E., 
Barkhof, F., ... & Stam, C. J. (2010). Loss of ‘small-world’
networks in Alzheimer's disease: graph
analysis of fMRI resting-state functional
connectivity. PLoS ONE, 5(11), e13788.
[13] Keihaninejad, S., Ryan, N. S., Malone, I. B., Modat, 
M., Cash, D., Ridgway, G. R., ... & Ourselin, S. 
(2012). The importance of group-wise registration in 
tract based spatial statistics study of 
neurodegeneration: a simulation study in 
Alzheimer's disease. PLoS ONE, 7(11), e45996.
[14] Carhart-Harris, R. L., Erritzoe, D., Williams, T., 
Stone, J. M., Reed, L. J., Colasanti, A., ... & Hobden, 
P. (2012). Neural correlates of the psychedelic state 
as determined by fMRI studies with 
psilocybin. Proceedings of the National Academy of 
Sciences, 109(6), 2138-2143.  
[15] Talawar, A. S., & Pujar, H. S. (2010). An outbreak 
of chikungunya epidemic in South India-
Karnataka. International Journal of Research and 
Reviews in Applied Sciences, 5(3), 229-34. 
[16] Moraga, P. (2018) Small Area Disease Risk 
Estimation and Visualization Using R. The R Journal 
Vol. 10(1), pp. 495-506, July 2018.
[17] R. A. Naqvi, M. F. Mushtaq, N. A. Mian, M. A. 
Khan, A. Rahman et al., “Coronavirus: a ‘mild’
virus turned deadly infection,” Computers, Materials
& Continua, vol. 67, no.2, pp. 2631–2646, 2021. 
[18] M. I. B. Ahmed, A. Rahman, M. Farooqui, F. 
Alamoudi, R. Baageel, A. Alqarni, “Early 
Identification of COVID-19 Using Dynamic Fuzzy 
Rule Based System,” Mathematical Modelling of 
Engineering Problems, vol. 8, no. 5, pp. 805-812, 
2021. 
[19] K. S. Alqudaihi, N. Aslam, I. U. Khan, A. M. 
Almuhaideb, S. J. Alsunaidi et al., “Cough sound 
detection and diagnosis using artificial intelligence 
techniques: challenges and opportunities,” IEEE 
Access, vol. 9, pp. 102327-102344, 2021. 
[20] R. Zagrouba, M. A. Khan, A. Rahman, M. A. 
Saleem, M. F. Mushtaq et al., “Modelling and 
simulation of covid-19 outbreak prediction using 
supervised machine learning,” Computers, Materials 
& Continua, vol. 66, no.3, pp. 2397–2407, 2021. 
[21] A. Rahman, K. Sultan, I. Naseer, R. Majeed, D. 
Musleh et al., “Supervised Machine Learning-based
Prediction of COVID-19,” Computers, Materials & 
Continua, vol. 69, no.1, pp. 21-34, 2021.  
[22] N. Min-Allah, B. A. Alahmed, E. M. Albreek, L. S. 
Alghamdi, D. A. Alawad et al., “A survey of 
COVID-19 contact-tracing apps,” Computers in 
Biology and Medicine, vol. 137, p. 104787, 2021. 
[23] A. Rahman, F.A. Alhaidari, “Querying RDF Data”, 
Journal of Theoretical and Applied Information 
Technology 26(22):7599-7614, 2018. 
[24] A. Rahman, F.A. Alhaidari, “The Digital Library and 
the Archiving System for Educational Institutes”, 
Pakistan Journal of Information Management and 
Libraries (PJIM&L), vol. 20 (1), pp. 94-117, 2019. 
[25] M. Ahmad, M.A. Qadir, A. Rahman et al., 
“Enhanced query processing over semantic cache for 
cloud based relational databases.” J Ambient Intell 
Human Comput (2020). 
https://doi.org/10.1007/s12652-020-01943-x. 
www.appliedmedicalsystems.com 
[26] A. Rahman et al. (2019) A Comprehensive Study of
Mobile Computing in Telemedicine: Second 
International Conference, ICAICR 2018, CCIS, pp. 
413-425, Shimla, India. 
[27] N. Aldhafferi, A. Alqahtani, A. Rahman, M. Azam 
(2018) Constraint Based Rule Mining in Patient 
Claim Data. Journal of Computational and 
Theoretical Nanoscience 15(3):1064-1071. 
[28] A. Rahman, Kiran Sultan, Dhiaa Musleh, Nahier 
Aldhafferi, Abdullah Alqahtani, and Maqsood 
Mahmud, “Robust and Fragile Medical Image 
Watermarking: A Joint Venture of Coding and 
Chaos Theories,” Journal of Healthcare Engineering, 
vol. 2018, Article ID 8137436, 11 pages, 2018.  
[29] A. Rahman, M. Mahmud, K. Sultan, N. Aldhafferi, 
D. Musleh (2018) Medical Image Watermarking for 
Fragility and Robustness: A Chaos, Error Correcting 
Codes and Redundant Residue Number System 
Based Approach. Journal of Medical Imaging and 
Health Informatics 8(1):1192-1200. 
[30] A. Rahman, Kiran Sultan, Nahier Aldhafferi, 
Abdullah Alqahtani, and Maqsood Mahmud (2018) 
“Reversible and Fragile Watermarking for Medical 
Images,” Computational and Mathematical Methods 
in Medicine, vol. 2018, Article ID 3461382, 7 pages. 
https://doi.org/10.1155/2018/3461382. 
[31] A. Rahman, A. Bakry, K. Sultan, M.A.A. Khan, M. 
Farooqui, D. Musleh, “Clinical Decision Support 
System in Virtual Clinic”, Journal of Computational 
and Theoretical Nanoscience, 15(6):1795-1804, 
2018. 
[32] M.T. Naseem, I.M. Qureshi, A. Rahman, M.Z. 
Muzaffar, “Robust and fragile watermarking for 
medical images using redundant residue number 
system and chaos,” Neural Network World, vol. 30, 
no. 3, pp. 177-192, 2020. 
[33] A. Rahman, S. Dash, & A.K. Luhach, “Dynamic 
MODCOD and power allocation in DVB-S2: a 
hybrid intelligent approach.” Telecommun Syst, vol. 
76, pp. 49–61, 2021. https://doi.org/10.1007/s11235-
020-00700-x. 
[34] A. Rahman, “GRBF-NN based ambient aware 
realtime adaptive communication in DVB-S2.” J 
Ambient Intell Human Comput (2020). 
https://doi.org/10.1007/s12652-020-02174-w. 
[35] I.A. Najm, J.M. Dahr, A.K. Hamoud, A.S. Alasady 
et al., “OLAP Mining with Educational Data Mart to 
Predict Students’ Performance,” Informatica 46 
(2022): 11–19. 
[36] A. Rahman, S. Abbas, M. Gollapalli, R. Ahmed, S. 
Aftab et al., “Rainfall Prediction System Using 
Machine Learning Fusion for Smart Cities,” Sensors, 
vol. 22, no. 9, pp. 1-15, 2022.  
https://doi.org/10.3390/s22093504. 
[37] N. M. Ibrahim, D. G. I. Gabr, A. Rahman, S. Dash, 
A. Nayyar, “A deep learning approach to intelligent 
fruit identification and family classification,” 
Multimedia Tools and Applications, 2022.  
https://doi.org/10.1007/s11042-022-12942-9. 
[38] T. M. Ghazal, H. AlHamadi, M.U. Nasir, A. 
Rahman, M. Gollapalli, M. Zubair, M.A. Khan, C.Y. 
Yeun, "Supervised Machine Learning Empowered 
Multifactorial Genetic Inheritance Disorder 
Prediction", Computational Intelligence and 
Neuroscience, vol. 2022, Article ID 1051388, 10 
pages, 2022. https://doi.org/10.1155/2022/1051388. 
[39] M. Gollapalli, A. Rahman, D. Musleh, N. Ibrahim et
al., “A Neuro-Fuzzy Approach to Road Traffic 
Congestion Prediction,” Computers, Materials and 
Continua, vol. 72, no. 3, pp. 295-310, 2022. 
[40] A. Rahman, A. Alqahtani, N. Aldhafferi, M.U. Nasir, 
M.F. Khan, M.A. Khan, and A. Mosavi. 2022. 
"Histopathologic Oral Cancer Prediction Using Oral 
Squamous Cell Carcinoma Biopsy Empowered with 
Transfer Learning" Sensors 22, no. 10: 3833. 
https://doi.org/10.3390/s22103833. 
[41] A. Rahman, S. Abbas, M. Gollapalli, R. Ahmed, S. 
Aftab, M. Ahmad, M.A. Khan, and A. Mosavi. 2022. 
"Rainfall Prediction System Using Machine 
Learning Fusion for Smart Cities" Sensors 22, no. 9: 
3504. https://doi.org/10.3390/s22093504. 
[42] G. Zaman, H. Mahdin, K. Hussain, A. Rahman, J. 
Abawajy and S. A. Mostafa, “An Ontological 
Framework for Information Extraction from Diverse 
Scientific Sources,” IEEE Access, vol. 9, pp. 42111-
42124, 2021. doi: 10.1109/ACCESS.2021.3063181. 
[43] N.A. Sajid, M. Ahmad, M.T. Afzal, A. Rahman, 
“Exploiting Papers’ Reference’s Section for Multi-
Label Computer Science Research Papers’ 
Classification,” Journal of Information & 
Knowledge Management, vol. 20 (2), pp. 1-21, 2021. 
[44] A. Rahman, S. Dash, A.K. Luhach, N. Chilamkurti, 
S. Baek, Y. Nam, “A Neuro-Fuzzy Approach for 
User Behavior Classification and Prediction”, 
Journal of Cloud Computing, 8(17), 2019. 
[45] A. Rahman, F.A. Alhaidari, D. Musleh, M. Mahmud, 
M.A. Khan, “Synchronization of Virtual Databases: 
A Case of Smartphone Contacts”, J. Comput. Theor. 
Nanosci., vol. 16 (3), pp. 1740-1757, 2019. 
 
  