https://doi.org/10.31449/inf.v46i1.2934 Informatica 46 (2022) 77–86

Personalized Health Framework for Visually Impaired

Megha Rathi, Shruti Sahu, Ankit Goel and Pramit Gupta
E-mail: megha.rathi@jiit.ac.in, shrutisahu1196@gmail.com, goelankit1995@gmail.com, pramitgupta22@gmail.com
Department of Computer Science & IT, Jaypee Institute of Information Technology, Noida, India

Keywords: android application, computer vision, deep learning, object recognition, region-based convolutional neural networks, disease prediction, voice assistant

Received: August 23, 2019

Vision is one of the most essential human senses. The life of a visually impaired person can be transformed from a dependent individual into a productive and functional member of society with the help of modern assistive technologies that use the concepts of deep learning and computer vision, the science that aims to mimic and automate human vision to provide a similar, if not better, capability to a computer. However, the different solutions and technologies available today have limited outreach, and end users cannot fully realize their benefits. This research work discusses an easily operable and affordable android application designed to aid the visually impaired in healthcare management. It also aims to resolve the challenges faced due to visual impairment in daily life, using the concepts of computer vision and deep learning. Broadly, the application consists of the following modules: object recognition in immediate surroundings using region-based convolutional neural networks, disease prediction with the help of symptoms, monitoring of health issues, and a voice assistant for in-app interaction and navigation.

Povzetek: An android application has been developed to assist people with visual impairment.

1 Introduction

In 2014, the World Health Organization estimated 285 million people to be visually impaired worldwide, out of which 39 million are blind and 246 million have low vision [1]. About 90% of this population lives in low-income settings. Visually impaired people face several difficulties in their daily lives. From reading to navigation, be it in familiar or unfamiliar environments, every task is a new challenge [2]. Computer vision is the science that aims to mimic and automate human vision and provide a similar, if not better, capability to a machine or computer [3]. With the combination of machine learning and computer vision, various technologies have been developed to substitute for visual impairment in some manner or the other and enable people to live more independently. However, the problem of a lack of accessible and affordable solutions to ease the routine of the visually impaired still persists. Expensive wearables, usually powered by artificial intelligence, are the current advancements in computer vision. Aiming towards affordability, a simpler approach is adopted here: the Android platform is used to develop an application accessible on hand-held mobile devices, which is essential in developing an aid for the visually impaired [4]. The objective of the proposed application is to provide an affordable solution that assists significantly in facilitating an independent and healthy lifestyle for the visually impaired. Its salient features include easy operability, i.e., a voice assistant for in-app interaction and navigation, management of the daily health routine, object recognition in images captured via the device camera, and diagnosis of common diseases via symptoms spoken by the user. The modern digital era has revolutionized data storage.
Huge volumes of data are available today that can be beneficially utilized for processing and automation [8]. Automation of disease prediction, an essential feature of health management, is incorporated in the android application, along with health monitoring measures like body mass index (BMI), body fat percentage (BFP), basal metabolic rate (BMR), calorie intake, and daily steps.

Before the breakthrough of deep learning in the 2000s, conventional computer vision techniques like example-based learning, discriminatively trained part-based models, and selective search were used for object recognition on benchmarks such as PASCAL [9]. Various algorithms shared a similar structure with early deep learning based algorithms (for instance, R-CNN): identification and proposal of relevant regions, labeling each proposed region, and cumulating outcomes from all regions to produce an output for the image [5]. Figure 1 illustrates the classification error of the top 5 models of the object recognition and image classification task of the ImageNet Large Scale Visual Recognition Competition (ILSVRC) between 2010 and 2016 [6]. It can be observed that deep learning based models fare much better than others, even surpassing human error in 2015. A deep learning architecture based on Region-based Convolutional Networks (R-CNN, elaborated under 'Methodology') is used to detect and identify objects from the images captured by the android device [7].

Figure 1: Classification error of top 5 models of ILSVRC.
2 Background study

Significant research has been carried out in the domain of developing android-based apps for visually impaired users. In recent research, authors have presented a novel technique for visually impaired user assistance using a guidance mode activity [10]. The proposed system gathers data from sensors held or worn by a visually impaired user; these sensors collect data and redirect it to a server for processing. The server contains a processor built using advanced artificial intelligence techniques. The basic functionality of the server is to send the data extracted from the sensors to an agent device, which further presents it for viewing on an agent interface. Finally, an agent can assist the low-vision user in real-time navigation through audio signals or other feedback. In another research, a system named "SmartVision" was developed for the navigational activity of blind users. With recent developments in advanced computational technologies, one can create models that assist blind users in their daily routine tasks. The main emphasis of the proposed model is to assist users with no vision so that they can easily navigate unfamiliar indoor or outdoor environments [11]. A user-friendly interface was developed for the "SmartVision" app, and the proposed model utilizes the concepts of computer vision and machine learning to achieve this objective. In yet another novel research work, an android healthcare app known as "mHealth" was developed to assist blind users in their health tasks. Android-based technologies are gaining popularity in healthcare these days and could serve as a boon for visually impaired patients [12]. Smartphones allow extracting medical data from health sensors and then using the extracted data for further health analysis of the patient. Visually impaired people tend to have worse health conditions than sighted people because of the poor accessibility of medical data; if a compatible app is present, they can regularly monitor their health status and take preventive action accordingly. The "mHealth" device has proven to be a boon, as it captures the patient's health condition via sensors, i.e., it can monitor blood pressure, diabetes level, etc., and further suggest medical action accordingly. A smartphone is the simplest way to access health sensors, and android developers can create IoT-based android applications to make medical sensors fully accessible on mobile devices. A study conducted in another significant paper developed a novel technique for door detection in environments unfamiliar to users with no vision [13]. Earlier algorithms were designed to detect door-like objects in known environments only, where particular characteristics of the door are fed as input. In this study, the authors presented a novel image-based door detection technique whose main objective is to find doors based on consistent characteristics like edges and corners. A geometric door model is created for detecting doors by merging edge and corner features. This algorithm is also able to distinguish doors from other door-like objects, e.g., it can distinguish a bookshelf or cabinet from a door. The proposed technique is validated under different unknown situations over multiple ranges of door shapes, colors, textures, illumination conditions, and views. Another contribution presented recent advances in advanced homecare technologies for blind people.
Assistive systems are required for blind users to provide information and allow them to safely move, complete their daily routine and health tasks, and explore unknown environments [14]. To achieve this objective, various new IoT-based and other computational technologies have been experimented with to provide solutions to the basic major problems of a blind user. For blind people, accessibility and self-determination are elementary requirements, so it is desirable to provide them the skills and insight to lead a happy, normal life like other people. For the blind, self-determination and accessibility mean that they can seek employment, get a good education, maintain normal health routines, and have a social life. So, this study focuses on providing details about various smart homecare solutions for blind users. Integrated mobile-based healthcare frameworks are used these days by medical professionals for healthcare tasks [15]. The usage of smartphones is expanding day by day, and in the future they may incorporate every single clinical task. An easy-to-use interface makes healthcare apps usable even by illiterate persons. Effective utilization of advanced computational techniques, along with proper verification and validation, is substantially required to ensure a good standard of quality, security, and privacy for these mobile-based healthcare applications. With the execution of all such quality standards in mobile-based healthcare tools, the main emphasis is on providing correct, relevant, and appropriate information to the user for achieving the desired healthcare outcomes. Another paper puts extra emphasis on enhancing the assistive technologies used by visually impaired people [16]. Assistive technology is used by many researchers to assist blind users, but academicians are not paying attention to deriving new applications by amalgamating assistive technologies with computational intelligence to create hybrid applications for the blind user. Socio-psychological factors are the main obstacle to the adoption of assistive technology; finding and pointing out those factors can enhance its adoption. Visually impaired patients rely heavily on these technologies, so this research focuses on finding the socio-psychological factors that impact assistive technology. Another significant contribution is provided in a study in which the authors generate focused molecule libraries for drug discovery using recurrent neural networks [17]. Recurrent neural networks can be very effective generative models for molecular structures: the features of the generated molecules correlate positively with the features of the molecules used to train the model. The proposed model is fine-tuned with a small set of molecules active against the target, and it outlines a method for producing a large set of candidate molecules for drug discovery. Another study presents a survey of wearable obstacle-avoidance electronic travel aids for the blind [18]. Numerous wearable devices have been developed to assist blind people in navigating known and unfamiliar environments. Broadly, these navigational devices are classified into three categories: electronic travel aids, electronic orientation aids, and position locator devices.
This research work presents a survey of portable navigational devices for visually impaired patients, which can help researchers gain insight into present assistive technologies for blind people and guide further amendments to these technologies. Recent developments in computer vision include expensive wearables like Aira, MyEye, eSight, and BrainPort. Inspired by Google Glass, Aira is a pair of spectacles fitted with a camera that transfers the current field of view to a visually abled person, i.e., it provides visual interpreting services [19]. Unlike Aira, artificial intelligence is used in MyEye, launched by OrCam, which interprets visual data from a small camera into an audio earpiece. eSight, designed by Conrad [20] for the partially blind, uses a high-resolution camera to enlarge images and project them on an OLED screen in front of the wearer's eyes.

Object recognition and image processing are major functionalities of computer vision systems. Deep neural networks, an arrangement following deep learning, are networks with multiple hidden layers. However, as the depth of the network grows and it begins to converge, accuracy saturates and then degrades rapidly. A residual learning framework, ResNet, to ease the training of substantially deeper neural networks was presented by He et al. [21]. Consider a block of neural network layers with input x and output y. Instead of learning a direct mapping y = F(x), ResNet learns a residual function F(x) and maps y = F(x) + x. The logic is that, in the case of an identity mapping y = x, it is simpler to drive F(x) to 0 than to fit F(x) = x. Hence, layers are framed as learning residual functions.
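To make the residual mapping concrete, below is a minimal sketch of a ResNet-style residual block in Keras, the framework the implementation described later relies on; the filter count and two-convolution layout are illustrative assumptions rather than the exact configuration of the paper's backbone.

```python
# A minimal sketch of a ResNet-style residual block (TensorFlow 2.x Keras).
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters, stride=1):
    """Compute y = F(x) + x, where F is two 3x3 convolutions."""
    shortcut = x
    # F(x): two convolutions with batch normalization.
    y = layers.Conv2D(filters, 3, strides=stride, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)
    # Project the shortcut if the shape changes (stride or channel count).
    if stride != 1 or x.shape[-1] != filters:
        shortcut = layers.Conv2D(filters, 1, strides=stride)(x)
        shortcut = layers.BatchNormalization()(shortcut)
    # The residual connection: add the identity, then apply the activation.
    y = layers.Add()([y, shortcut])
    return layers.ReLU()(y)
```

Because each block only has to learn the residual F(x), gradients keep flowing through the identity shortcut, which is what makes substantially deeper networks trainable.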
The authors of [23] proposed a combination of convolutional neural networks with region proposals, called Region-based Convolutional Networks (R-CNN). The model takes an image as input, proposes bounding boxes or region proposals using selective search, and checks whether each proposal is an object or not. A Support Vector Machine (SVM) is used for the classification of region proposals, and the bounding boxes are tightened using linear regression. The architecture achieved significantly better results than earlier CNN-based architectures on datasets like ImageNet. Proposed by Girshick [22], Fast R-CNN is a simplified version of R-CNN. It clubbed all the components of R-CNN into a single network by adding a softmax layer and a linear regression layer parallel to the output layer of the CNN. Softmax acts as the classifier in place of the SVM, and linear regression tightens the bounding boxes. In the same year, Faster R-CNN was proposed by Ren et al. [24] to accelerate the region proposal process. The model trained a single CNN to implement both region proposals and classification by adding a fully convolutional network, named the Region Proposal Network (RPN), on top of the CNN. Anchor boxes of some common aspect ratios are generated to fit objects in an image; the RPN slides a window over the CNN feature map and outputs a bounding box per anchor along with the probability that it contains an object. These boxes are then passed to Fast R-CNN for classification. A feature pyramid is a fundamental component in recognition systems for detecting objects at different scales, which is troublesome for tiny objects. The work in [25] used the framework of convolutional neural networks to construct feature pyramids. This architecture, called a Feature Pyramid Network (FPN), comprises a bottom-up (downscale) pathway and a top-down (upscale) pathway. The bottom-up pathway uses ResNet for construction. Upsampling in the top-down pathway is done by convolutional filters. FPN shows significant improvement as a feature extractor in several applications. In another recent study on object detection, deep convolutional neural networks (CNNs) trained as N-way classifiers are considered, and a hierarchically modified Fast R-CNN (HMod Fast R-CNN) is implemented for upgrading the overall computational power of R-CNN [26]. In yet another novel work in the domain of disease prediction, authors have implemented machine learning techniques for the prediction of dengue [27]. From the results it is concluded that the LogitBoost ensemble technique is the most accurate one, with an accuracy of 92%. In the work [28], an analysis of diabetes data is performed to compare various machine learning techniques. In another significant study in the domain of disease prediction, the authors forecast the causal-effect association between Chronic Obstructive Pulmonary Disease and cardiovascular diseases [29]. Table 1 compares the approaches of the R-CNN family discussed above.

Table 1: Comparison of existing approaches for object recognition.
- R-CNN (year of conception: 2014). Input: image. Output: bounding boxes and labels for each object in the image. Components: CNN (feature extractor), SVM (classifier), linear regressor (tightens bounding boxes). Pooling: max pooling. Region proposal: selective search.
- Fast R-CNN (2015). Input: image with region proposals. Output: object classification of each region with more constricted bounding boxes. Components: single CNN (feature extractor) with a softmax layer (classifier) and a linear regression layer (outputs bounding boxes) in parallel. Pooling: RoI (Region of Interest) pooling. Region proposal: selective search.
- Faster R-CNN (2016). Input: image (region proposals not needed). Output: classifications and bounding box coordinates of objects in the image. Components: CNN (feature extractor), RPN (outputs a bounding box per anchor and an objectness probability), Fast R-CNN (classifier, tightens bounding boxes). Pooling: --. Region proposal: Region Proposal Network.

3 System architecture

Visual impairment may be the result of an injury, disease, or some other condition. The visually impaired face problems in self-navigation outside known environments. They need to memorize details about their home environment, and furniture and other large obstacles must remain in one location to prevent injury. Individuals with low vision may find browsing websites problematic due to small fonts; they might also need to enlarge a screen significantly. Operating gadgets like mobile phones and tablets is also a challenge. The life of a visually impaired person can be transformed from a dependent individual into a productive and functional member of society who can read and write and use mobile phones, computers, and other gadgets efficiently with the help of modern assistive technologies. Today, different solutions and technologies are present which have the potential to bring substantial change and improvement to the lives of people with visual impairment, especially the aging population. However, they have limited outreach, and end users cannot fully realize their benefits. Users must acquire essential awareness and know-how of these technologies, as well as have the resources for obtaining them. Moreover, amid the abundant health management applications available today, applications catering to the visually impaired section of society are absent. So, the need of the hour is to design affordable and accessible solutions to ease and significantly improve the daily routine and health management of the visually impaired. The main objective is to develop an application that would assist significantly in facilitating an independent and healthy lifestyle for the visually impaired.

• Object Recognition - A real-time object recognition module via the device camera is proposed. Objects will be identified and communicated to the user when they are in the camera's field of vision. Applying the concepts of computer vision, a deep learning-based architecture (Feature Pyramid Network with Region Proposal Network) is used to identify and classify objects from the incoming stream of images.
• Disease Prediction - The user will be able to tell their symptoms in case he/she feels unwell, and the same will be checked against the data of various diseases. A graph database is implemented on Neo4j. The voice input of the user is queried and matched against the database to return one or multiple symptoms.
• Monitoring of Daily Health Routine - The daily health routine is monitored by calculating steps taken, calorie intake, BMI, body fat percentage, and basal metabolic rate. Calorie intake for various food items is available to plan the diet accordingly.
• Voice Assistant - For in-app interaction and navigation, a voice assistant is available. The Google Speech-to-Text and Text-to-Speech APIs are used to develop a speech-to-text module with natural language processing to process the user's input and a text-to-speech module to communicate with the user.

Figure 2 illustrates the flow of control of the android application. Initially, the user can select from the three modules - Object Recognition, Disease Prediction, and Health Monitor. Object Recognition is followed by capturing an image, which is sent to the local server hosting the deep learning model; the identified objects are returned to the application and then to the user in voice format. Disease Prediction is followed by taking voice input for symptoms, which is converted into text, sent to the local server for prediction of possible diseases, and returned to the user. The Health Monitor comprises keeping fitness records and the nutrition values of the food intake of the user.

Figure 2: Control flow of the proposed android application.

4 Methodology

The main objective of the proposed health-based framework is to develop an android-based app for visually impaired patients. The main modules of the proposed healthcare management tool are: 1) object detection, 2) disease prediction, 3) real-time monitoring of health issues, and 4) a voice assistant for in-app communication and navigation. All these modules, along with the dataset description, are discussed in detail in this section.

4.1 Dataset description

A disease-symptom dataset was gathered from "WebMD" [30] and "MedicineNet" [31]. WebMD, founded in 1996, is one of the most-visited healthcare websites and contains data about various diseases, their corresponding symptoms, drug information, etc. MedicineNet is another site owned and operated by the WebMD consumer network. US board-certified physicians and healthcare professionals maintain up-to-date information regarding diseases, symptoms, drugs, and remedies for the general public in easily understandable language. A graph database was created from the crawled data; the graph contains 149 disease nodes, 404 symptom nodes, and 2126 "may cause" relationships, i.e., unweighted directed edges from symptoms to the corresponding diseases.
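To illustrate how such a graph can be assembled, the sketch below loads crawled disease-symptom pairs into Neo4j through the py2neo driver used in Section 4.2; the connection credentials, CSV file name, and column names are hypothetical.

```python
# A minimal loading sketch, assuming a local Neo4j instance and a
# hypothetical crawled file "disease_symptoms.csv" with columns
# "disease" and "symptom".
import csv
from py2neo import Graph

graph = Graph("bolt://localhost:7687", auth=("neo4j", "password"))

# MERGE is idempotent: re-running the load does not duplicate the
# disease/symptom nodes or the MAY_CAUSE edges.
load_query = """
MERGE (s:Symptom {name: $symptom})
MERGE (d:Disease {name: $disease})
MERGE (s)-[:MAY_CAUSE]->(d)
"""

with open("disease_symptoms.csv", newline="") as f:
    for row in csv.DictReader(f):
        graph.run(load_query, symptom=row["symptom"], disease=row["disease"])
```

The (:Symptom)-[:MAY_CAUSE]->(:Disease) pattern mirrors the schema described in Section 4.2.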
Also, the model was trained for 33 epochs on the Microsoft Common Objects in Context (COCO) dataset [32], which contains 330K images, 1.5 million object instances, and 91 object categories.

4.2 Disease prediction

The user will be able to tell their symptoms in case he/she feels unwell, and the same will be checked against the data of various diseases. A graph database is implemented in Neo4j. Neo4j is a graph database management system that contains nodes with directed edges, or relationships, between them. These nodes and edges can have labels and any number of attributes, and labels can be used to narrow searches. Querying is done using the Cypher Query Language. Neo4j is scalable, supports replication, and also satisfies the ACID (Atomicity, Consistency, Isolation, and Durability) properties. Atomicity means the database considers all transaction operations as one whole unit or atom. Consistency guarantees that a transaction never leaves the database in a half-finished state. Isolation keeps transactions separated from each other until they are finished. Durability guarantees that the database keeps track of pending changes in such a way that the server can recover from an abnormal termination.

A graph database on Neo4j was implemented on the dataset to eliminate sparsity. The graph contains 149 disease nodes, 404 symptom nodes, and 2126 "(:Symptom)-[:MAY_CAUSE]->(:Disease)" relationships, i.e., unweighted directed edges from symptoms to the corresponding diseases. Weighted relationships between symptoms and diseases could not be obtained due to the unavailability of proprietary datasets. The Neo4j database was linked with Python using the py2neo driver and queried using the Cypher Query Language (CQL). Symptoms spoken by the user were matched with the corresponding diseases, which were returned to the user. The Python code was hosted on a Flask server to send information back and forth between the android application and the database.
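As a rough illustration of this round trip, the following sketch exposes the symptom-to-disease query as a Flask endpoint; the route name, JSON shape, and ranking query are assumptions, not the paper's exact interface.

```python
# Hypothetical Flask endpoint: the app POSTs recognized symptom text,
# the server matches symptoms and returns candidate diseases.
from flask import Flask, jsonify, request
from py2neo import Graph

app = Flask(__name__)
graph = Graph("bolt://localhost:7687", auth=("neo4j", "password"))

@app.route("/predict", methods=["POST"])
def predict():
    symptoms = request.get_json()["symptoms"]  # e.g. ["dizziness", "fall"]
    # Rank diseases by how many of the given symptoms may cause them.
    query = """
    MATCH (s:Symptom)-[:MAY_CAUSE]->(d:Disease)
    WHERE s.name IN $symptoms
    RETURN d.name AS disease, count(s) AS matched
    ORDER BY matched DESC LIMIT 5
    """
    rows = graph.run(query, symptoms=symptoms).data()
    return jsonify(rows)

if __name__ == "__main__":
    app.run(port=5000)
```

The android client would then POST, e.g., {"symptoms": ["dizziness", "fall"]} and read the candidate diseases from the JSON response.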
4.3 Object recognition

The proposed model consists of a 51-layer Residual Network (ResNet-51) as the backbone of the Feature Pyramid Network (FPN). All 4 convolutional blocks of the Residual Network are used as the base for the FPN, which has multiple prediction and upsampling layers and lateral connections between the bottom-up pyramid and the top-down pyramid. A 1x1 convolution layer is applied before adding each layer; a 3x3 convolution layer is then applied, and the result is used as a feature map by the upper layers. The pyramid of convolutional activation maps generated by the FPN is passed to the Region Proposal Network (RPN), eliminating the bottleneck of hardcoded algorithms like EdgeBoxes for obtaining region proposals. The RPN works as a first pass on the image and makes the binary decision of whether a region contains an object or not; it also outputs the confidence it has in each proposal. The proposed regions are sent to the RoI Align layer, which maps the proposed regions in the image to the convolutional feature maps of the FPN. The RoI Align layer, unlike RoI Pooling layers, does not quantize the input space, i.e., it uses an a/16 operation instead of a floor [a/16] operation, which fixes the location misalignment issue. The feature maps of the candidate regions are then used by the classification, bounding box, and mask-predicting heads. A TimeDistributed Keras layer is used to pass every feature map of the FPN to the heads. All the heads share the first 2 fully connected layers, which are implemented as convolutional layers; the first one has a dropout of 0.5 for regularization. The second convolutional layer has average pooling across the depth of the activation map, making it function as a fully connected layer.

Classification head: contains a fully connected layer that outputs logits for the proposal belonging to every class and the background class, so the number of outputs equals the number of classes + 1 (background class). A softmax layer converts these unnormalized log probabilities into the probability of the input belonging to each class or the background class.

Bounding box regressor head: outputs bounding box deltas over the bounding box proposed by the RPN. It consists of fully connected layers.

Mask-predicting head: the mask branch generates a mask of dimension 24 x 24 for each class for each proposed region, so the total output is of size (number of classes + 1) * number of proposals. As the model tries to learn a mask for each class, masks are generated for every class. A mask is simply a 24 x 24 binary grid which signifies the presence or absence of an object instance at each pixel. It is implemented as 4 convolutional layers with batch normalization and ReLU activation, followed by a fractionally strided convolutional layer that upsamples the input. That is fed to another convolutional layer with sigmoid activation to squash the output to the range 0 to 1. The loss is calculated using the mask associated with the ground truth class.

Figure 3: Architecture of object recognition model.
Figure 4: Relationships of the "hypertensive disease" disease node and the "pain chest" symptom node.
Figure 5: Neo4j graph database containing diseases, symptoms and their relationships.
Figure 6: Feature pyramid network.

4.4 Monitoring of daily health routine

The daily health routine is monitored by calculating steps taken, calorie intake, BMI, body fat percentage, and basal metabolic rate; calorie intake for various food items is available to plan the diet accordingly. The Body Mass Index (BMI) is calculated using the height and weight of a person. It is defined as the body mass divided by the square of the body height and is universally expressed in units of kg/m2, resulting from mass in kilograms and height in meters. The value of BMI is used to categorize individuals as underweight, normal weight, or overweight. The body fat percentage (BFP) of an individual is the total mass of body fat divided by the total body mass, multiplied by 100; body fat includes essential body fat and storage body fat. The body fat percentage is a measure of fitness level, since it is the only body measurement that directly calculates a person's relative body composition without regard to height or weight. Metabolism comprises the processes that the body needs to function, and the basal metabolic rate (BMR) is the amount of energy per unit of time that a person needs to keep the body functioning at rest.

4.5 Voice assistant

For in-app interaction and navigation, a voice assistant is available. The voice command input by the user in the English language is converted into a text expression using the Google Speech-to-Text API. WordNet is a large lexical database of the English language; the Natural Language Toolkit (NLTK) is used to implement part-of-speech tagging on the expression. The similarities of the expression with the WordNet database are compared, and the resulting command is executed, or the results of the operation are conveyed to the user using the Google Text-to-Speech API.
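A minimal sketch of this matching step is given below, using NLTK part-of-speech tagging and WordNet similarity; the in-app command vocabulary and the choice of similarity measure (Wu-Palmer) are assumptions, since the paper does not specify them.

```python
# A minimal sketch, assuming NLTK with the "punkt",
# "averaged_perceptron_tagger", and "wordnet" data downloaded.
import nltk
from nltk.corpus import wordnet as wn

COMMANDS = ["recognize", "predict", "monitor"]  # hypothetical in-app actions

def best_command(utterance):
    """Return the known command most similar to the spoken utterance."""
    tokens = nltk.word_tokenize(utterance.lower())
    tagged = nltk.pos_tag(tokens)
    # Keep verbs and nouns, which usually carry the command's intent.
    content = [w for w, t in tagged if t.startswith(("VB", "NN"))]
    best, best_score = None, 0.0
    for word in content:
        for syn in wn.synsets(word):
            for cmd in COMMANDS:
                for cmd_syn in wn.synsets(cmd):
                    score = syn.wup_similarity(cmd_syn) or 0.0
                    if score > best_score:
                        best, best_score = cmd, score
    return best

print(best_command("identify the objects around me"))  # e.g. "recognize"
```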
5 Results and findings

The object recognition model was trained for 33 epochs, with 1000 steps per epoch and 50 validation steps, on the Microsoft Common Objects in Context dataset, which contains 330K images, 1.5 million object instances, and 91 object categories. Mean image subtraction was done to center the data. The system used for training was extended with an NVIDIA 1050 Ti OC graphics card for higher processing capability.

Object recognition model hyperparameters:
• Number of epochs trained = 33
• Steps per epoch = 1000
• Validation steps = 50
• Ratio of positive to negative RoIs for training = 0.33
• Minimum probability value to accept a detected instance = 0.7
• Non-maximum suppression threshold = 0.3
• Optimizer: stochastic gradient descent with momentum
  o Learning rate = 0.002
  o Learning momentum = 0.9
  o Weight decay regularization strength = 0.0001

Mean average precision (mAP) is the standard single-number performance measure for comparing search algorithms. The mAP score for the proposed model (ResNet-51-FPN) was 51.8%. The generation of feature maps in the proposed model using Feature Pyramid Networks is evaluated against existing models.
• Multi-task Network Cascades (MNC) [Dai, J., et al. 2015] consist of three networks, respectively differentiating instances, estimating masks, and categorizing objects. These networks form a cascaded structure and are designed to share their convolutional features.
• Fully Convolutional Instance-aware Semantic Segmentation (FCIS) [Li, Y., et al. 2016] proposed semantic segmentation and instance mask proposal. It detects and segments the object instances jointly and simultaneously.

The use of Feature Pyramid Networks to generate feature maps in the model fared better than existing models like MNC and FCIS+OHEM, but worse than FCIS+++OHEM due to the smaller number of layers in the Residual Network backbone of the architecture.

Figure 7: Results of object recognition model.
Figure 8: Performance comparison of proposed model with other object recognition models using mean average precision (mAP) score.

5.1 Comparative analysis between ANN and CNN

In our experiments, an ANN of 4 hidden layers with 20 neurons each and batch normalization gives 93% accurate results. Adding the dropout technique to this model reduces its efficiency, as dropout works for networks that can afford to lose neurons. On the other hand, a CNN even without overfitting countermeasures achieves an accuracy of 98%, and with batch normalization and dropout applied it gives an accuracy of approximately 99.3%. CNNs have repetitive blocks of neurons that are applied across space. At training time, the weight gradients learned over various image patches are averaged, which exploits the spatial or temporal invariance in object recognition. For object recognition, the use of the Feature Pyramid Network (FPN) provides valid proposals for most of the objects in the image, while the model without FPN fails to provide valid proposals and misses a majority of the objects. The model trained for 40 epochs accurately captures even the small ground truth objects. That is why FPN is used to generate feature maps for our recognition model. The test accuracy and test losses are visualized in the figures below.
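To make the compared configurations concrete before the figures, here is a minimal Keras sketch of the two architectures; the input shape, class count, and the CNN's exact convolutional layout are assumptions, as the paper reports only the ANN's 4x20 hidden layers and the use of batch normalization and dropout.

```python
# A minimal sketch of the two compared architectures (TensorFlow 2.x Keras).
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 10           # assumed
INPUT_SHAPE = (28, 28, 1)  # assumed image size

# ANN: 4 hidden layers of 20 neurons each, with batch normalization.
ann = models.Sequential([layers.Flatten(input_shape=INPUT_SHAPE)])
for _ in range(4):
    ann.add(layers.Dense(20))
    ann.add(layers.BatchNormalization())
    ann.add(layers.Activation("relu"))
ann.add(layers.Dense(NUM_CLASSES, activation="softmax"))

# CNN: convolutional blocks with batch normalization and dropout.
cnn = models.Sequential([
    layers.Conv2D(32, 3, padding="same", input_shape=INPUT_SHAPE),
    layers.BatchNormalization(),
    layers.Activation("relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, padding="same"),
    layers.BatchNormalization(),
    layers.Activation("relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dropout(0.5),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

for model in (ann, cnn):
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
```

Training both models on the same data and plotting the accuracy and loss histories reproduces the kind of accuracy and cross-entropy curves shown in the figures that follow.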
Figure 9: ANN - accuracy and cross-entropy loss.
Figure 11: ANN + batch normalization - accuracy and cross-entropy loss.
Figure 12: ANN + batch normalization + dropout - accuracy and cross-entropy loss.
Figure 13: CNN - accuracy and cross-entropy loss.
Figure 14: CNN + batch normalization - accuracy and cross-entropy loss.
Figure 15: CNN + batch normalization + dropout - accuracy and cross-entropy loss.

5.2 Android interface

An Android application was developed for the visually impaired with the following modules: object recognition, disease prediction, real-time health monitoring, and voice assistant. Figure 10 (a)-(f) presents screenshots of the developed application interface.

Figure 10: Voice-automated android application - (a) real-time object recognition, (b) disease prediction when the symptom is dizziness, (c) disease prediction when the symptoms are dizziness and fall, (d) number of steps taken daily, (e) calculation of BMI, (f) calculation of …

6 Conclusion

The proposed application can open a new door to the possibilities of creating affordable and accessible solutions for improving the lifestyle and healthcare management of the visually impaired. The life of a visually impaired person can be transformed from a dependent individual into a productive and functional member of society who is able to use mobile phones and other gadgets efficiently, detect objects around them, and track their health with the help of such assistive technologies. A similar approach can also be extended to people suffering from other types of impairment. The android application was successfully implemented with the following modules: object recognition, disease prediction, health monitor, and voice assistant. For object recognition, the use of Feature Pyramid Networks to generate feature maps in the model fared better than existing models like MNC and FCIS+OHEM, but worse than FCIS+++OHEM due to the smaller number of layers in the Residual Network backbone. The graph database implementation of the disease-symptom dataset gave the required results. However, graph algorithms could not be efficiently applied due to the lack of weighted edges, i.e., the probability that a symptom will cause a disease. This was due to the unavailability of appropriate non-proprietary datasets for general diseases.

7 Future research directions

The authors aim to improve the performance of the object recognition model by implementing aggregated residual transformations, i.e., ResNeXt instead of ResNet, for the backbone of the Feature Pyramid Network. This model can also be extended for real-time depth estimation to calculate the distance of the objects recognized. They also intend to enhance the disease prediction model by obtaining a more suitable weighted disease-symptom dataset, applying various graph algorithms, and adding drug suggestions. Lastly, they propose to upgrade the solution for healthcare management for the visually impaired from a voice-automated android application to a voice-automated hand-held device and add features like automatically alerting selected contacts in case of emergency.

References

[1] Jonas et al., "Visual Impairment and Blindness Due to Macular Diseases Globally: A Systematic Review and Meta-Analysis," American Journal of Ophthalmology, vol. 158, no. 4, pp. 808-815, 2014, doi: 10.1016/j.ajo.2014.06.012.
[2] A. Gordois et al., "An estimation of the worldwide economic and health burden of visual impairment," Glob. Public Health, vol. 7, no. 5, pp. 465-481, 2012, doi: 10.1080/17441692.2011.634815.
[3] G. R. Bradski, "Computer vision face tracking for use in a perceptual user interface," 1998.
[4] R. Velázquez, "Wearable Assistive Devices for the Blind," in Wearable and Autonomous Biomedical Devices and Systems for Smart Environment, Springer, 2010, pp. 331-349.
[5] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks," Advances in Neural Information Processing Systems, vol. 25, 2012. [Online]. Available: https://proceedings.neurips.cc/paper/2012/hash/c399862d3b9d6b76c8436e924a68c45b-Abstract.html.
[6] O. Russakovsky et al., "ImageNet Large Scale Visual Recognition Challenge," International Journal of Computer Vision, vol. 115, no. 3, pp. 211-252, Apr. 2015, doi: 10.1007/s11263-015-0816-y.
[7] C. Szegedy, A. Toshev, and D. Erhan, "Deep Neural Networks for Object Detection," Neural Information Processing Systems, 2013. [Online]. Available: https://proceedings.neurips.cc/paper/2013/hash/f7cade80b7cc92b991cf4d2806d6bd78-Abstract.html (accessed Mar. 11, 2022).
[8] S.A.S, "Intelligent Heart Disease Prediction System Using Data Mining Techniques," International Journal of Healthcare & Biomedical Research, vol. 1, no. 3, Apr. 2013, pp. 94-101. [Online]. Available: https://ijhbr.com/pdf/94-101.pdf.
[9] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan, "Object Detection with Discriminatively Trained Part-Based Models," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 9, pp. 1627-1645, Sept. 2010, doi: 10.1109/TPAMI.2009.167.
[10] S. Kanuganti, Y. Chang, and L. Bock, U.S. Patent No. 9,836,996. Washington, DC: U.S. Patent and Trademark Office, 2017.
[11] H. Fernandes, P. Costa, V. Filipe, L. Hadjileontiadis, and J. Barroso, "Stereo vision in blind navigation assistance," World Automation Congress, 2010, pp. 1-6.
[12] L. R. Milne, C. L. Bennett, and R. E. Ladner, "The accessibility of mobile health sensors for blind users," in International Technology and Persons with Disabilities Conference Scientific/Research Proceedings (CSUN 2014), Dec. 2014, pp. 166-175.
[13] Y. Tian, X. Yang, and A. Arditi, "Computer Vision-Based Door Detection for Accessibility of Unfamiliar Environments to Blind Persons," Lecture Notes in Computer Science, pp. 263-270, 2010, doi: 10.1007/978-3-642-14100-3_39.
[14] B. Ando, C. O. Lombardo, and V. Marletta, "Smart homecare technologies for the visually impaired: recent advances," Smart Homecare Technology and TeleHealth, p. 9, Dec. 2014, doi: 10.2147/shtt.s56167.
[15] Y. Ren, R. Werner, N. Pazzi, and A. Boukerche, "Monitoring patients via a secure and mobile healthcare system," IEEE Wireless Communications, vol. 17, no. 1, pp. 59-65, Feb. 2010, doi: 10.1109/MWC.2010.5416351.
[16] N. Sachdeva and R. Suomi, "Assistive technology for totally blind - barriers to adoption," IRIS: Selected Papers of the Information Systems Research Seminar, 2013;47.
[17] M. H. S. Segler, T. Kogej, C. Tyrchan, and M. P. Waller, "Generating Focused Molecule Libraries for Drug Discovery with Recurrent Neural Networks," ACS Central Science, vol. 4, no. 1, pp. 120-131, Dec. 2017, doi: 10.1021/acscentsci.7b00512.
[18] D. Dakopoulos and N. G. Bourbakis, "Wearable Obstacle Avoidance Electronic Travel Aids for Blind: A Survey," IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), vol. 40, no. 1, pp. 25-35, Jan. 2010, doi: 10.1109/TSMCC.2009.2021255.
[19] Y. Wexler and A. Shashua, U.S. Patent No. 9,025,016. Washington, DC: U.S. Patent and Trademark Office, 2015.
[20] C. W. Lewis, D. R. Mathers, R. G. Hilkes, R. J. Munger, and R. P. Colbeck,
U.S. Patent No. 8,135,227. Washington, DC: U.S. Patent and Trademark Office, 2012.
[21] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770-778.
[22] R. Girshick, "Fast R-CNN," in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1440-1448.
[23] R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Region-Based Convolutional Networks for Accurate Object Detection and Segmentation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 1, pp. 142-158, Jan. 2016, doi: 10.1109/TPAMI.2015.2437384.
[24] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks," Neural Information Processing Systems, 2015.
[25] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, "Feature Pyramid Networks for Object Detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 2117-2125.
[26] A. Chaudhuri, "Hierarchical Modified Fast R-CNN for Object Detection," Informatica, vol. 45, no. 7, Dec. 2021, doi: 10.31449/inf.v45i7.3732.
[27] N. Iqbal and M. Islam, "Machine learning for dengue outbreak prediction: A performance evaluation of different prominent classifiers," Informatica, vol. 43, no. 3, Sep. 2019, doi: 10.31449/inf.v43i3.1548.
[28] A. A. Abaker and F. A. Saeed, "A Comparative Analysis of Machine Learning Algorithms to Build a Predictive Model for Detecting Diabetes Complications," Informatica, vol. 45, no. 1, Mar. 2021, doi: 10.31449/inf.v45i1.3111.
[29] D. Panda, S. R. Dash, R. Ray, and S. Parida, "Predicting the Causal Effect Relationship Between COPD and Cardio Vascular Diseases," Informatica, vol. 44, no. 4, Dec. 2020, doi: 10.31449/inf.v44i4.3088.
[30] WebMD, LLC, WebMD, 2010.
[31] W. C. Shiel Jr., MedicineNet.com, 2009.
[32] T.-Y. Lin et al., "Microsoft COCO: Common Objects in Context," Computer Vision - ECCV 2014, pp. 740-755, 2014, doi: 10.1007/978-3-319-10602-1_48.