Elektrotehniški vestnik 86(1-2): 68-74, 2019
Original scientific paper

Non-invasive Blood-Glucose Estimation Using Smartphone PPG Signals and Subspace KNN Classifier

Yuwei Zhang1, Yuan Zhang1,†, Sarah Ali Siddiqui1, Anton Kos2
1 Shandong Provincial Key Laboratory of Network Based Intelligent Computing, University of Jinan, Jinan, China
2 Faculty of Electrical Engineering, University of Ljubljana, Ljubljana, Slovenia
† E-mail: yzhang@ujn.edu.cn

Received 14 December 2018
Accepted 21 January 2019

Abstract. Non-invasive healthcare monitoring systems based on machine learning and wearable sensors hold the future of smart health. The ubiquitous use of smartphones makes them an excellent choice for developing cost-effective and portable smart health monitoring systems. A non-invasive blood-glucose estimation system is proposed that uses a smartphone camera for data acquisition and generates its output with a machine-learning algorithm. The system (1) acquires PPG signals using a smartphone, (2) classifies the signals as valid or invalid, and (3) estimates the blood-glucose level from the valid signals by applying a subspace KNN classifier. The system requires no re-calibration or individually dependent calibration. Its overall training accuracy is 86.2% and the accuracy of the invalid single-period classification is 98.2%.

Keywords: Non-invasive blood-glucose estimation, Smartphone, Photoplethysmography (PPG), Healthcare based on machine learning, Subspace KNN classifier

Slovenian title: Neinvazivna metoda ocenjevanja ravni krvnega sladkorja s pomočjo PPG na pametnem telefonu in podprostorske KNN (A non-invasive method for estimating the blood-sugar level using smartphone PPG and subspace KNN)

Slovenian abstract (translated). The future of healthcare systems lies in non-invasive health-monitoring systems based on wearable sensors and machine learning. The ubiquitous use of smartphones is an excellent opportunity for designing and developing portable and cost-effective smart health-monitoring systems. We propose a non-invasive method for estimating the blood-sugar level that uses a smartphone camera for data acquisition and produces results based on machine learning. The paper focuses on (1) acquiring photoplethysmographic (PPG) signals with a smartphone, (2) classifying the signals as valid or invalid, and (3) estimating the blood-sugar level with a subspace KNN classifier. The proposed system requires no re-calibration and no individually dependent calibration. Its overall accuracy reaches 86.2%, and the accuracy of classifying the validity of signal periods reaches 98.2%.

1 Introduction

Diabetes is one of the most widespread chronic diseases and is caused by an imbalance in the glucose concentration in the body. This instability can lead to serious problems, e.g. cardiovascular diseases, kidney failure, blindness, etc. [1]. Diabetes is expected to account for up to 11.6% of the total healthcare expenditure by 2030 [2]. Hyperglycemia is a condition in which the glucose level is higher than 180 mg/dl, whereas hypoglycemia is a condition caused by a very low glucose level, i.e. lower than 70 mg/dl [3], [4]. There is no cure for diabetes so far, but regularly monitoring the glucose level helps to keep it under control [5], [6], [7]. Self-monitoring is one of the most feasible and useful ways to control diabetes.

As explained in our recent work [8], glucose monitoring can be grouped into invasive, minimally invasive and non-invasive. Conventional glucose monitoring requires a blood sample obtained by pricking the patient's fingertip with a needle or lancet, which makes frequent monitoring inconvenient, painful, uncomfortable and costly for the users [9], [10]. Non-invasive glucose monitoring is the focus of current and future research since it is pain-free, risk-free, convenient and comfortable for users [11].
Artificial intelligence and expert systems are used to make such monitoring systems accurate and efficient [12], [13]. Photoplethysmographic (PPG) signals are widely used physiological signals for monitoring basic vitals [14]. Light is shone on a part of the body; some of it is absorbed by the body and the rest is reflected back. The amount of the reflected light varies with the amount of blood flowing through that body part and can be used to acquire the PPG signals [15]. Conventionally, the PPG signals are obtained by using wearable pulse oximeters on different parts of the human body, e.g. fingertip, ear, wrist, etc. [16].

Wearable sensors supported by smartphones can be used to track the basic vitals: the sensors acquire the data from the body and the smartphones process the data [17], [18]. Instead of using these two components separately, we combine them by using a smartphone camera for data acquisition. The smartphone camera, with its LED switched on, captures a video of the fingertip from which the PPG signals are extracted.

As mentioned earlier, machine learning is extensively used in the healthcare domain for analyzing clinical data, estimating basic human vitals, managing diseases, etc. [19], [20]. In our system, the blood-glucose levels are estimated by applying a machine-learning algorithm to the PPG signals acquired with a smartphone camera. To the best of our knowledge, the existing works do not use machine-learning approaches to classify valid and invalid PPG signals, whereas in our work we first separate the invalid data from the valid signals and then classify the valid signals into two blood-glucose groups. To acquire the PPG signals, the existing systems rely on dedicated hardware, e.g. a finger clip and laser light, whereas our system uses only a smartphone and needs no individual calibration.
With this convenient and non-invasive system, a user can learn whether his/her blood-glucose level falls outside the normal group and can then obtain an exact value using an invasive method. Our system focuses on improved accuracy with the smallest possible number of features. Its main contributions are:
• A cost-effective, non-invasive blood-glucose estimation system of high accuracy and robustness is designed and developed.
• The machine-learning classification algorithms used differentiate between the valid and invalid PPG signals.
• The machine-learning classification algorithms used classify the valid signals into two blood-glucose groups.
• The effectiveness of the system is shown by comprehensive experimental validation.

The rest of the paper is organized as follows. Section 2 describes the proposed system. The results obtained using the proposed system are explained in Section 3. Section 4 provides a discussion. Section 5 concludes the paper.

2 Description of the Proposed System

To estimate the blood-glucose groups non-invasively, the system uses smartphone PPG signals combined with machine-learning algorithms, such as Bagged Trees, RUS Boosted Trees, Subspace KNN and Decision Trees. A flowchart of the proposed system is shown in Fig. 1. The blood-glucose range from 70 to 130 mg/dl is divided into the two groups shown in Table 1. The system classifies the PPG data from the smartphone into three groups, i.e. a group of invalid data (G0) and two groups of valid data (G1 and G2).

Table 1. Blood-glucose groups
Groups  Blood Glucose Ranges (mg/dl)
G1      70-100
G2      101-130

Figure 1. Flowchart of the proposed system.

2.1 Signal Acquisition

A smartphone camera with a frame rate of 28 fps (sampling rate 28 Hz) is used to record a 30-40 second long video of the left-hand index finger. During the recording, the fingertip covers both the flash and the camera. The reflection-mode PPG signals are acquired from the recorded videos (Fig. 3) [21].
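The grouping in Table 1 amounts to a simple range lookup on the glucometer reading. The Python sketch below is illustrative only: the function name `glucose_group`, the assumption of integer readings in mg/dl, and the `None` result for readings outside 70-130 mg/dl are assumptions, not part of the described system. G0 is not covered because it marks invalid signals rather than a glucose range.

```python
from typing import Optional

# Ranges copied from Table 1 (mg/dl, inclusive bounds).
GROUP_RANGES = {"G1": (70, 100), "G2": (101, 130)}


def glucose_group(reading_mg_dl: int) -> Optional[str]:
    """Return the Table 1 group label for an integer glucometer reading.

    Hypothetical helper: readings outside 70-130 mg/dl return None,
    since the study does not define a group for them.
    """
    for label, (low, high) in GROUP_RANGES.items():
        if low <= reading_mg_dl <= high:
            return label
    return None


if __name__ == "__main__":
    for reading in (85, 118, 150):
        print(reading, glucose_group(reading))
```

Keeping the ranges in one dictionary means the group boundaries are stated once and can be compared directly against Table 1.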
While the video is recorded, the glucose level is measured with a standard glucometer and recorded for labeling (two groups). The video is then transferred to a computer for processing in MATLAB. The red, green and blue channels are extracted from the individual frames of the video. The threshold is set using Eq. 1 of our previous work, in which we designed a PPG-based algorithm for heart-rate estimation [21]. The pixels with an intensity greater than the defined threshold are summed for each frame using Eq. 2 [21]. The PPG signal is obtained by plotting the calculated sum of each frame.

Figure 2. Raw smartphone PPG signal (x-axis: time in seconds).

Fourteen volunteers were asked to keep their hands steady during the video recording, and about 850 samples were collected for each subject. The subjects were aged 20-33 years and their glucose levels ranged from 70 to 130 mg/dl. This glucose-level range was selected after thorough experimentation with a lab-built PPG signal acquisition device, for which the best classification results were obtained in the range of 70-130 mg/dl.

2.2 Signal Preprocessing

The reflection-mode PPG signals are extracted from the recorded videos of each subject using the procedure described in Section 2.1. Because of the reflection mode, the acquired raw signals (Fig. 2) are inverted (Fig. 3). The PPG signals are prone to noise and motion artifacts, so the inverted signals are de-noised using a Butterworth filter that removes the frequency components higher than 12 Hz, as shown in Fig. 4. After de-noising, the baseline wander is removed to bring the signal back to its normal base (x-axis). The resulting signal is shown in Fig. 5. The signal is then segmented into single periods, i.e. complete PPG cycles, using the iterative sliding-window (ISW) algorithm proposed in our paper [22].
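The preprocessing chain of Section 2.2 (inversion, low-pass filtering, baseline-wander removal) can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the MATLAB processing used in the paper: only the 28 Hz sampling rate and the 12 Hz cutoff come from the text. The second-order filter, the single forward filtering pass, the inversion about the mean and the moving-average baseline estimate with a roughly one-second window are all assumptions, and every function name is hypothetical.

```python
import math

FS_HZ = 28.0      # camera frame rate, from Section 2.1
CUTOFF_HZ = 12.0  # low-pass cutoff, from Section 2.2


def invert(signal):
    """Flip the signal about its mean (assumed form of the inversion)."""
    mean = sum(signal) / len(signal)
    return [2.0 * mean - x for x in signal]


def butter2_lowpass(cutoff_hz, fs_hz):
    """Second-order Butterworth low-pass biquad via the bilinear transform.

    The order is an assumption; the paper does not state it.
    """
    w0 = 2.0 * math.pi * cutoff_hz / fs_hz
    alpha = math.sin(w0) / math.sqrt(2.0)  # sin(w0) / (2Q) with Q = 1/sqrt(2)
    cos_w0 = math.cos(w0)
    a0 = 1.0 + alpha
    b = [(1.0 - cos_w0) / (2.0 * a0), (1.0 - cos_w0) / a0, (1.0 - cos_w0) / (2.0 * a0)]
    a = [1.0, -2.0 * cos_w0 / a0, (1.0 - alpha) / a0]
    return b, a


def apply_biquad(b, a, x):
    """Run one forward pass of the biquad over x.

    The filter state starts at the first sample, so a constant input
    produces no start-up transient.
    """
    x1 = x2 = y1 = y2 = x[0]
    y = []
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


def remove_baseline(signal, window):
    """Subtract a centred moving average (assumed baseline estimate)."""
    half = window // 2
    out = []
    for i, x in enumerate(signal):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        out.append(x - sum(signal[lo:hi]) / (hi - lo))
    return out


def preprocess(raw, fs_hz=FS_HZ, cutoff_hz=CUTOFF_HZ):
    """Invert, low-pass filter and remove the baseline of a raw PPG signal."""
    b, a = butter2_lowpass(cutoff_hz, fs_hz)
    filtered = apply_biquad(b, a, invert(raw))
    return remove_baseline(filtered, window=int(fs_hz) | 1)  # odd, about 1 s


if __name__ == "__main__":
    # Synthetic example: a 1.2 Hz pulse riding on a slow upward drift.
    raw = [100.0 + 0.05 * n + math.sin(2.0 * math.pi * 1.2 * n / FS_HZ)
           for n in range(280)]
    clean = preprocess(raw)
    print(len(clean), min(clean), max(clean))
```

A single forward pass introduces a small phase delay; a zero-phase (forward-backward) pass would avoid it, but the paper does not say which variant was used.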
2.3 Feature Extraction

The time-domain waveform (PPG), its first derivative (VPG) and its second derivative (APG) are used for feature extraction. The 439 single periods obtained with the segmentation in Section 2.2 are labeled as G0 (invalid) or G1-G2 (valid), as described in Table 1. Examples of valid and invalid single periods are shown in Figs. 6 and 7, respectively.

Figure 3. Inverted smartphone PPG signal.

Figure 4. Filtered smartphone PPG signal.

Figure 5. PPG signal after baseline wander removal (x-axis: time in seconds).

Table 2. Extracted features from the smartphone PPG signals
Feature                       Definition
width_period                  The time taken for one period
highest_peak_value            The maximum amplitude of the signal
time_highest_peak             The time at which the amplitude is maximum
peaks_first_seg               The number of peaks from the start to the peak
dis_peak                      The amplitude of the diastolic peak
time_distolicpeak             The time at which the diastolic peak occurs
notch                         The amplitude of the notch
time_notch                    The time at which the notch occurs
range_values                  The range of the amplitude values in a single period
slope_rise                    The rising rate of the single period from the start to the peak
slope_fall                    The falling rate of the single period from the peak to the end
timediff_start_peak           The total time taken from the start to the peak
timediff_peak_notch           The total time taken from the peak to the notch
timediff_notch_distolicpeak   The total time taken from the notch to the diastolic peak
timediff_distolicpeak_end     The total time taken from the diastolic peak to the end
mean_value_single             The mean amplitude value of the single period
number_values                 The length of the single period
Standard_deviation            The standard deviation of the amplitudes
mean_start_max                The mean amplitude from the start to the peak
mean_max_notch                The mean amplitude from the peak to the notch
mean_notch_distolicpeak       The mean amplitude from the notch to the diastolic peak
mean_distolicpeak_end         The mean amplitude from the diastolic peak to the end
meanslope_sp                  The mean slope from the start to the peak
meanslope_pn                  The mean slope from the peak to the notch
meanslope_nd                  The mean slope from the notch to the diastolic peak
meanslope_de                  The mean slope from the diastolic peak to the end
max_deriv1                    The maximum amplitude of the first derivative
time_max_deriv1               The time at which the amplitude of the first derivative is maximum
lowpeak_deriv1                The amplitude of the second highest peak of the first derivative
lowpeak_deriv1_time           The time at which the second highest peak of the first derivative occurs
diff_d1peaks_value            The difference in amplitude between the maximum and the second highest peak of the first derivative
diff_d1peaks_time             The time taken from the maximum to the second highest peak of the first derivative
valley1_derv1_value           The amplitude of the first valley of the first derivative
valley2_derv1_value           The amplitude of the second valley of the first derivative
valley1_derv1_time            The time at which the first valley of the first derivative occurs
valley2_derv1_time            The time at which the second valley of the first derivative occurs
diff_valleytime_derv1         The time taken from the first to the second valley of the first derivative
max_deriv2                    The maximum amplitude of the second derivative
time_max_deriv2               The time at which the amplitude of the second derivative is maximum
lowpeak_deriv2                The amplitude of the second highest peak of the second derivative
lowpeak_deriv2_time           The time at which the second highest peak of the second derivative occurs
diff_d2peaks_value            The difference in amplitude between the maximum and the second highest peak of the second derivative
diff_d2peaks_time             The time taken from the maximum to the second highest peak of the second derivative
valley1_derv2_value           The amplitude of the first valley of the second derivative
valley2_derv2_value           The amplitude of the second valley of the second derivative
valley1_derv2_time            The time at which the first valley of the second derivative occurs
valley2_derv2_time            The time at which the second valley of the second derivative occurs
diff_valleytime_derv2         The time taken from the first to the second valley of the second derivative

[Figure panel: Valid single period]

26 features are then extracted from the time-domain waveform shape of each period, 11
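A few of the time-domain and first-derivative features listed in Table 2 can be computed from one segmented period as sketched below in Python. This is an illustrative sketch, not the authors' implementation: the function name `basic_features` is hypothetical, and measuring the period width from the first to the last sample, the forward-difference derivative and the population standard deviation are assumptions. Features that depend on locating the notch and the diastolic peak are left out because the text does not describe how they are detected.

```python
import statistics


def basic_features(period, fs_hz=28.0):
    """Compute a subset of the Table 2 features for one PPG period.

    `period` is a sequence of at least two amplitude samples taken at
    `fs_hz` samples per second.
    """
    n = len(period)
    peak_idx = max(range(n), key=lambda i: period[i])  # first maximum
    peak = period[peak_idx]
    width = (n - 1) / fs_hz   # assumed: time from first to last sample
    t_peak = peak_idx / fs_hz

    # First derivative (VPG), approximated by forward differences.
    vpg = [(period[i + 1] - period[i]) * fs_hz for i in range(n - 1)]
    d1_idx = max(range(len(vpg)), key=lambda i: vpg[i])

    rise = (peak - period[0]) / t_peak if t_peak > 0 else 0.0
    fall = (period[-1] - peak) / (width - t_peak) if width > t_peak else 0.0

    return {
        "width_period": width,
        "highest_peak_value": peak,
        "time_highest_peak": t_peak,
        "range_values": peak - min(period),
        "slope_rise": rise,
        "slope_fall": fall,
        "timediff_start_peak": t_peak,
        "mean_value_single": sum(period) / n,
        "number_values": n,
        "Standard_deviation": statistics.pstdev(period),
        "max_deriv1": vpg[d1_idx],
        "time_max_deriv1": d1_idx / fs_hz,
    }


if __name__ == "__main__":
    toy_period = [0.0, 1.0, 3.0, 2.0, 1.0, 0.0]
    for name, value in basic_features(toy_period, fs_hz=1.0).items():
        print(name, value)
```

The dictionary keys reuse the feature names of Table 2, so the result can be compared row by row with the table.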