Radiol Oncol 2022; 56(4): 440-452. doi: 10.2478/raon-2022-0037

research article

Detection and localization of hyperfunctioning parathyroid glands on [18F]fluorocholine PET/CT using deep learning – model performance and comparison to human experts

Leon Jarabek1, Jan Jamsek2, Anka Cuderman2, Sebastijan Rep2,3, Marko Hocevar4,5, Tomaz Kocjan5,6, Mojca Jensterle5,6, Ziga Spiclin7, Ziga Macek Lezaic8, Filip Cvetko5, Luka Lezaic2,5

1 Department of Radiology, General Hospital Novo Mesto, Slovenia
2 Department for Nuclear Medicine, University Medical Centre Ljubljana, Slovenia
3 Faculty of Health Sciences, University of Ljubljana, Ljubljana, Slovenia
4 Department of Surgical Oncology, Institute of Oncology, Ljubljana
5 Faculty of Medicine, University of Ljubljana, Ljubljana, Slovenia
6 Department for Endocrinology, Diabetes and Metabolic Diseases, University Medical Centre Ljubljana, Slovenia
7 Faculty of Electrical Engineering, University of Ljubljana, Slovenia
8 Rožna dolina, c. VI/8, Ljubljana, Slovenia

Received 21 April 2022
Accepted 22 August 2022

Correspondence to: Assist. Prof. Luka Ležaić, M.D., Ph.D., Department for Nuclear Medicine, University Medical Centre Ljubljana, Slovenia. E-mail: luka.lezaic@kclj.si

Disclosure: No potential conflicts of interest were disclosed.

This is an open access article distributed under the terms of the CC-BY license (https://creativecommons.org/licenses/by/4.0/).

Background. In the setting of primary hyperparathyroidism (PHPT), [18F]fluorocholine PET/CT (FCH-PET) has excellent diagnostic performance, with experienced practitioners achieving 97.7% accuracy in localising hyperfunctioning parathyroid tissue (HPTT). Due to the relative triviality of the task for human readers, we explored the performance of deep learning (DL) methods for HPTT detection and localisation on FCH-PET images in the setting of PHPT.

Patients and methods. We used a dataset of 93 subjects with PHPT imaged using FCH-PET, of which 74 subjects had visible HPTT while 19 controls had no visible HPTT on FCH-PET. A conventional Resnet10 as well as a novel mPETResnet10 DL model were trained and tested to detect (present, not present) and localise (upper left, lower left, upper right or lower right) HPTT. Our mPETResnet10 architecture also contained a region-of-interest masking algorithm that we evaluated qualitatively in order to try to explain the model's decision process.

Results. The models detected the presence of HPTT with an accuracy of 83% and determined the quadrant of HPTT with an accuracy of 74%. The DL methods performed statistically worse (p < 0.001) in both tasks compared to human readers, who localise HPTT with an accuracy of 97.7%. The produced region-of-interest mask, while not showing a consistent added value in the qualitative evaluation of the model's decision process, correctly identified the foreground PET signal.

Conclusions. Our experiment is the first reported use of DL analysis of FCH-PET in PHPT. We have shown that it is possible to utilize DL methods with FCH-PET to detect and localize HPTT. Given our small dataset of 93 subjects, the results are nevertheless promising for further research.
Key words: primary hyperparathyroidism, deep learning, nuclear medicine, fluorocholine, PET/CT

Introduction

Primary hyperparathyroidism (PHPT) is the third most common endocrine disorder, with a reported prevalence ranging from 1 to 21 per 1,000 in the general population.1 PHPT is the result of hyperfunctioning parathyroid tissue (HPTT), which becomes insensitive to the inhibitory effect of hypercalcemia. Histologically, HPTT can be either an adenoma (in approximately 80% of cases), multiple adenomas, hyperplasia or, rarely, a carcinoma (in approximately 1% of cases).2 The treatment of PHPT typically requires surgical removal of HPTT. Modern, minimally invasive surgical techniques require precise preoperative localization of HPTT. For this task, [18F]fluorocholine PET/CT (FCH-PET) is one of the most promising imaging modalities, with reported sensitivities of 94–100% and specificities of 88–100%.3-13 The performance of FCH-PET has repeatedly been shown to be superior to other HPTT localization methods, while at the same time incurring lower radiation exposure than other nuclear medicine modalities.14

Deep learning (DL) techniques with convolutional neural networks (CNN) have proven useful in various computer vision tasks, such as super-resolution, image synthesis, denoising, classification, segmentation and object detection.15-22 In medical imaging, CNNs have shown promising performance, even exceeding experts in some specific cases, such as grading diabetic retinopathy from fundus images, detecting skin cancer from photographs and detecting abnormalities on chest X-ray images.23-25 Research on CNNs in nuclear medicine has shown their potential in reducing the PET radiation dose, improving image quality, lesion detection and segmentation, as well as prediction of prognosis.21-36

Given the excellent human performance in analysing FCH-PET for the presence and localisation of HPTT, an interesting opportunity to challenge DL techniques presents itself. An automated analysis pipeline of FCH-PET that would classify HPTT presence and location would allow for efficient surgical planning and could serve to double-check the experts' reports. Such analysis would also allow for a more accurate and objective comparison of potential follow-up studies; these are not often required, but are unavoidable in cases of persistent or recurrent hyperparathyroidism. Furthermore, if the model could visualise the pathological uptake in the study, it would provide more visual feedback to the surgeon in axial images, allowing better visualisation of HPTT and faster interpretation of the interplay of surrounding anatomical structures. Our aim was to explore the performance of DL analysis of FCH-PET in the setting of PHPT, since the use of DL for FCH-PET analysis in PHPT has not yet been thoroughly investigated. To this end, we developed a classification model which classifies whether HPTT is present in the study and where it is located. We also attempted to model, in a novel unsupervised manner, the regions of interest fed to the model. Furthermore, we aimed to provide a preliminary comparison of the diagnostic accuracy of the DL models to human experts to determine clinical applicability, as the model should be as accurate as an expert in evaluating FCH-PET studies to be clinically applicable.
Patients and methods

This was a retrospective analysis of prospective clinical trial data (NCT03203668) performed at the University Medical Centre Ljubljana and the Institute of Oncology Ljubljana. The clinical trial was approved by the Medical Ethics Committee of the Republic of Slovenia (approval number 77/11/12). The trial only included patients with biochemically confirmed primary hyperparathyroidism; hypercalcemic patients had elevated or inappropriately normal parathormone (PTH) levels, whereas normocalcemic patients had inappropriately elevated PTH levels. All included patients were older than 18 years and had no clinical history of oncological, inflammatory, or infectious disease of the head and neck. No pregnant women were included in the trial. The retrospective use of the data was approved by the Medical Ethics Committee of the Republic of Slovenia (approval number 0120-582/2021/4) and patient consent was waived due to the retrospective nature of the analysis.

The study only included images of patients with biochemically confirmed PHPT at the time of FCH-PET imaging. Since the trial did not include healthy controls, data of patients meeting the following criteria were chosen as "controls": no visible HPTT on FCH-PET at the time of imaging; no history of surgery in the thyroid region; and biochemical normocalcemia at 6 months' follow-up.

Dataset description and PET-CT image acquisition

We used the data of 79 participants (22 male, 57 female) with visible HPTT lesions on FCH-PET (referred to below as patients) and 19 participants (7 male, 12 female) without visible HPTT lesions on FCH-PET (referred to below as controls). The average age (± SD) of patients was 58.7 ± 12.7 years and the average age of controls was 60.1 ± 11.8 years. The patient and control groups were comparable in terms of age (p = 0.659) as well as male-to-female ratio (p = 0.852), as determined by the Student t-test and the normalised Chi-square test, respectively.37,38

FCH-PET imaging was performed at the Department for Nuclear Medicine of the University Medical Centre Ljubljana. The acquisition details were the same as in Cuderman et al.3 The patients fasted 6 hours prior to the examination, were well hydrated and were injected with 100 MBq of [18F]fluorocholine (FCH). Acquisition was performed on a Siemens Biograph mCT® PET/CT (Siemens Healthineers AG, München, DE) 5 minutes and 60 minutes after the FCH application. The imaging region extended from the angle of the mandible to the aortic arch. The imaging consisted of a low-dose CT (120 kVp, 25 mAs, CARE Dose 4D, FBP reconstruction), followed by PET imaging (one bed position of 4 minutes). PET images were reconstructed using Siemens HD PET software with the iterative TrueX + TOF OSEM method (2 iterations, 21 subsets) with a 400 × 400 matrix, zoom 1 and a Gaussian filter with FWHM of 4 mm. To train and evaluate the DL models, we used only images acquired 60 minutes after FCH application, where the balance of image quality and target-to-background ratio is typically highest.

All patients with HPTT present on FCH-PET were surgically treated at the Institute of Oncology Ljubljana. Ground truth HPTT presence and location for training the CNNs was based on the postsurgical histopathological results. Furthermore, our dataset included formatted information from the FCH-PET reports as used by Cuderman et al.3, which we used to compare the performance of the DL models with human experts.
These reports were used to guide the subsequent surgical removal of the HPTT.

For simplicity, we only used patients who had single-gland disease and had HPTT in a typical anatomic location of the parathyroid glands. HPTT was thus in one of 4 possible locations: upper left (UL, 21 patients), lower left (LL, 27 patients), upper right (UR, 5 patients) and lower right (LR, 26 patients). Since the UR location in our dataset contained only 5 patients, it was removed from the final analysis due to under-representation. For the final model development and evaluation, we used 19 controls and 74 patients, among them 21 with UL HPTT, 27 with LL HPTT and 26 with LR HPTT.

Image pre-processing

We used the same pre-processing pipeline for all analyzed images. First, we resampled the CT image using bivariate spline interpolation from the scipy library to match the PET image matrix of 200 × 200 × 56.39 3D interpolation was not needed, as the CT was reconstructed at the same slices as the PET. Both images were concatenated to produce a 200 × 200 × 56 × 2 matrix representing the PET/CT. Next, we cropped the desired region of interest containing the hyperfunctioning parathyroid tissue to a matrix of size 64 × 64 × 32. For all patients, the region was cropped at the same PET/CT coordinates, which were chosen empirically such that it contained the HPTT in all studies. Cropping lowers the memory requirements for running the deep learning models.

The labels for an image were represented by a one-hot encoded vector of length 4, representing the locations UL, LL, LR and a dummy variable representing "healthy" controls.
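For concreteness, the following is a minimal sketch of this pre-processing and label encoding under the array shapes stated above. It is an illustration only, not the authors' code: the crop origin and the helper names (resample_ct_to_pet, preprocess, encode_label) are hypothetical stand-ins.

```python
import numpy as np
from scipy.interpolate import RectBivariateSpline

def resample_ct_to_pet(ct: np.ndarray, out_hw=(200, 200)) -> np.ndarray:
    """Resample each CT slice in-plane to the PET matrix with a bivariate spline.
    No interpolation across slices: CT was reconstructed at the same slices as PET."""
    h, w, n_slices = ct.shape
    ys, xs = np.arange(h), np.arange(w)
    new_y = np.linspace(0, h - 1, out_hw[0])
    new_x = np.linspace(0, w - 1, out_hw[1])
    out = np.empty((*out_hw, n_slices), dtype=np.float32)
    for k in range(n_slices):
        out[:, :, k] = RectBivariateSpline(ys, xs, ct[:, :, k])(new_y, new_x)
    return out

# Hypothetical, empirically chosen crop corner; the paper only states that a fixed
# 64 x 64 x 32 region containing the HPTT in all studies was used.
CROP_ORIGIN = (70, 70, 12)

def preprocess(pet: np.ndarray, ct: np.ndarray) -> np.ndarray:
    """PET (200 x 200 x 56) + resampled CT -> cropped 64 x 64 x 32 x 2 PET/CT volume."""
    volume = np.stack([pet, resample_ct_to_pet(ct)], axis=-1)  # 200 x 200 x 56 x 2
    y, x, z = CROP_ORIGIN
    return volume[y:y + 64, x:x + 64, z:z + 32, :]

# One-hot label of length 4: three HPTT locations plus a dummy "healthy" class.
CLASSES = ("UL", "LL", "LR", "healthy")

def encode_label(cls: str) -> np.ndarray:
    label = np.zeros(len(CLASSES), dtype=np.float32)
    label[CLASSES.index(cls)] = 1.0
    return label
```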
Modelling

For modelling, we defined 2 tasks: (i) classifying whether HPTT is present in the image or not (CPr, classification of presence) and (ii) classifying in which quadrant the HPTT is present in the image (CLoc, classification of location). CPr is a simple binary classification task where p(HPTT) = 1 – p(healthy). CLoc is a multi-class classification task where each output of the model is analogous to the probability of HPTT being present at one of the three considered locations UL, LL and LR.

With normalized PET-CT images represented by a matrix of shape 200 × 200 × 56 × 2 as input, the output of the model was a vector of length 4, activated by the SoftMax activation function, corresponding to p(UL), p(LL), p(LR) and p(healthy) (Figure 1).

FIGURE 1. mPETResnet10 architecture. First, PET-CT images are fed into a UNet with a single-channel output and tanh+1 activation function. This output is the PET mask. The mask is multiplied elementwise with the PET image to produce a masked PET image. The masked PET is concatenated with the original CT, and the masked PET-CT is fed into the ResNet10 classifier. Gray boxes represent deep-learning models, coloured boxes represent data, and circles represent the operations of tanh+1, elementwise multiplication (mul) and concatenation (concat).

The model was therefore trained for both CPr and CLoc simultaneously. Furthermore, the dataset was well balanced, containing a similar number of cases for each of the 4 classes, which ensured stable training using cross entropy as the loss function.40 For training, a batch size of 5 was used with a stochastic gradient descent optimizer with momentum of 0.9 and weight decay of 0.005. The initial learning rate was determined by a grid search in log space, and learning-rate decay on plateau scheduling was used. An identical procedure was used for all models. All models were trained from scratch.

For both the CPr and CLoc classification tasks, we performed baseline experiments using the 3D version of the Resnet10 (RN10) architecture and using our novel architecture as described below.41,42 Our choice of Resnet10 was based on extensive experiments which included other state-of-the-art and larger architectures, namely 3D versions of Densenet12143, wideResNet10144, PreActResnet10145, Resnet10141 and Resnet50. For all architectures except our novel architecture, implementations from Hara et al. were used.42

We provide a comprehensive comparison between the performance of RN10 and the proposed architecture "masked-PET Resnet10" (mRN10), as well as a comparison of mRN10 to the experts' performance.

Masked-PET Resnet10

We developed a novel architecture designed to mask PET signals from unimportant (i.e., physiological uptake) regions with high signal (e.g., muscle tissue, salivary glands) before entering the RN10 classifier. This is important as the FCH-PET images are heteroscedastic, with some regions (such as muscle) having high variance between subjects and other regions (such as air) having low variance. To mitigate this, and to improve the conditioning of the data and therefore the stability of the classifier,46 we decided to allow the model itself to optimize a differentiable masking of these potentially problematic regions. We named the proposed architecture "masked-PET Resnet10" (mRN10).

The mRN10 consisted of 2 parts. First, a Unet architecture was used to mask the PET-CT.47 Next, Resnet10 was used to classify the masked PET-CT. We decided on the Unet architecture since it is commonly used in segmentation tasks21 and we deemed the task of masking to be similar to segmentation of the region of interest. Masking was achieved by first activating the per-voxel output of the Unet with the activation function f(x) = tanh(x) + 1. These output values lie in the interval (0, 2), such that regions where the Unet output was negative were closer to 0, while regions where the Unet output was positive were closer to 2. This matrix, representing the mask, was then multiplied elementwise by the PET matrix to produce a masked PET image. The architecture of mRN10 is depicted in Figure 1.

Regions in the PET image where the Unet output was negative were multiplied by values close to 0 and were therefore effectively "masked" from the PET image. This masked PET was then concatenated with the CT, and the masked PET-CT was used as input for the Resnet10 classifier. The entire mRN10 was trained end-to-end; the masking was therefore optimized for the lowest loss in the classification task of the downstream Resnet10 classifier.

The models were written in Python 3.8.0 using the PyTorch 1.10 framework and trained on a single GTX 1080Ti graphics card (Nvidia Corporation, Santa Clara, US).48,49 The code is freely available online at: https://github.com/ljarabek/AI_FCH
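The masking mechanism can be summarised in a short forward pass. The sketch below follows the description above but is an illustration only: the UNet and ResNet10 modules are stand-ins (any 3D implementations with the indicated channel counts would do), and the authors' actual implementation is available at the repository linked above.

```python
import torch
import torch.nn as nn

class MaskedPETResNet10(nn.Module):
    """mRN10: UNet-derived mask applied to the PET channel, then ResNet10 classification."""

    def __init__(self, unet: nn.Module, resnet10: nn.Module):
        super().__init__()
        self.unet = unet          # 2-channel PET/CT in -> 1-channel mask logits out
        self.resnet10 = resnet10  # 2-channel masked PET/CT in -> 4 class logits out

    def forward(self, petct: torch.Tensor) -> torch.Tensor:
        # petct: (batch, 2, 64, 64, 32); channel 0 is PET, channel 1 is CT
        pet, ct = petct[:, 0:1], petct[:, 1:2]
        mask = torch.tanh(self.unet(petct)) + 1.0  # per-voxel weights in (0, 2)
        masked_pet = mask * pet                    # negative UNet outputs ~zero the PET
        x = torch.cat([masked_pet, ct], dim=1)     # recombine with the unmasked CT
        return self.resnet10(x)                    # logits for UL, LL, LR, healthy
```

Because the mask enters the computation only through this elementwise product, gradients from the classification loss flow back into the UNet, which is what allows the masking to be learned without any segmentation labels.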
Training and evaluation

For training, we used 12-fold cross-validation with the data split into a test set of 10 random subjects, with the remaining subjects being randomly split into a training set (90% of the remaining subjects) and a validation set (10% of the remaining subjects). The data was normalised using z-score normalization upon splitting, such that the mean and standard deviation were computed only on the training set. Sets were sampled such that each set contained at least 1 subject from each class (UL, LL, LR and control). For testing, the model with the lowest validation loss was used. The confusion matrix for CPr evaluation was computed by summing the confusion matrices for the test set across the 12 data splits, providing 120 total samples. The confusion matrix for CLoc was obtained by summing the 3 confusion matrices for the evaluated locations UL, LL and LR across the best-performing 12 data splits, providing 360 "samples". The area under the receiver operating characteristic curve (AUCROC) was computed analogously.

We used the epiR package for R to determine the diagnostic performance metrics and the McNemar test from the DTComPair package to determine statistically significant (p < 0.05) differences.50-53 Only binary diagnostic performance metrics were used for evaluation, even though CLoc is theoretically a multi-class classification task. In this way, the results are comparable to studies evaluating the performance of FCH-PET, since these also mostly used binary classification metrics.3-13
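To make the stated optimizer settings and fold-wise aggregation concrete, the following is a minimal sketch under the hyperparameters above. The fold and epoch loops and the data loading are elided, the submodules are deliberately tiny stand-ins, and MaskedPETResNet10 reuses the hypothetical sketch above.

```python
import numpy as np
import torch

def zscore_from_train(train_images: np.ndarray):
    """Return a normalizer whose statistics come from the training set only."""
    mu, sigma = train_images.mean(), train_images.std()
    return lambda x: (x - mu) / sigma  # applied unchanged to validation and test data

# Stand-in submodules so the sketch runs; the real 3D UNet and ResNet10 are far larger.
unet = torch.nn.Conv3d(2, 1, kernel_size=3, padding=1)
resnet10 = torch.nn.Sequential(torch.nn.AdaptiveAvgPool3d(1),
                               torch.nn.Flatten(), torch.nn.Linear(2, 4))
model = MaskedPETResNet10(unet, resnet10)

# SGD with momentum 0.9 and weight decay 0.005; batch size 5 is set in the data loader.
# The initial learning rate comes from a grid search in log space (0.0136 was optimal),
# with learning-rate decay on plateau of the validation loss.
optimizer = torch.optim.SGD(model.parameters(), lr=0.0136,
                            momentum=0.9, weight_decay=0.005)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer)
loss_fn = torch.nn.CrossEntropyLoss()  # cross entropy over the 4 classes

# Per-fold test confusion matrices are summed across the 12 splits, e.g. for CPr:
# total_cm = np.sum(per_fold_cms, axis=0)   # 12 folds x 10 subjects -> 120 samples
```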
Results

We determined that the best-performing models for both RN10 and mRN10 were trained using an initial learning rate of 0.013. The confusion matrices for RN10 and mRN10 are presented in Tables 1A and 1B, while the diagnostic performances for both tasks using the RN10 and mRN10 models are presented in Table 2. Both models had comparable performance in the CPr task. The mRN10 had a significantly higher accuracy for the CLoc task than the RN10 and was therefore used for the comparison with human performance.

TABLE 1. Confusion matrices for CPr (A) and CLoc (B) for both the RN10 and mRN10 models. Note that the confusion matrices for CLoc have more samples (360 in total), as they were computed by summing the confusion matrices for each of the three included locations (UL, LL, LR)

(A) CPr task

                                  RN10                                       mRN10
                                  HPTT present   HPTT not present   Sum      HPTT present   HPTT not present   Sum
Model output: HPTT present        79             8                  87       90             11                 101
Model output: HPTT not present    20             13                 33       9              10                 19
Sum                               99             21                 120      99             21                 120

(B) CLoc task

                                  RN10                                       mRN10
                                  HPTT at GTLoc  HPTT not at GTLoc  Sum      HPTT at GTLoc  HPTT not at GTLoc  Sum
Predicted GTLoc                   35             51                 86       53             50                 103
Not predicted GTLoc               61             213                274      43             214                257
Sum                               96             264                360      96             264                360

CPr = classification of presence; CLoc = classification of location; GTLoc = ground truth location based on postsurgical histopathological reports; HPTT = hyperfunctioning parathyroid tissue; mRN10 = novel masked-PET Resnet10 model; RN10 = baseline Resnet10 model

TABLE 2. Diagnostic performance metrics of RN10 and mRN10, with p-values determined by the McNemar test comparing the two models on each task (except AUCROC)

                                     CPr RN10               CPr mRN10              p       CLoc RN10              CLoc mRN10             p
Sensitivity [95% CI]                 0.800 [0.719; 0.877]   0.909 [0.852; 0.965]   0.028   0.365 [0.268; 0.460]   0.552 [0.453; 0.652]   0.018
Specificity [95% CI]                 0.619 [0.411; 0.827]   0.476 [0.263; 0.690]   0.257   0.807 [0.759; 0.854]   0.811 [0.763; 0.858]   0.910
Positive predictive value [95% CI]   0.908 [0.847; 0.969]   0.891 [0.830; 0.951]   0.507   0.407 [0.303; 0.511]   0.515 [0.418; 0.611]   0.089
Negative predictive value [95% CI]   0.394 [0.227; 0.560]   0.526 [0.302; 0.751]   0.205   0.777 [0.728; 0.827]   0.833 [0.787; 0.878]   0.021
Accuracy [95% CI]                    0.767 [0.681; 0.839]   0.833 [0.756; 0.895]   0.050   0.689 [0.638; 0.736]   0.742 [0.693; 0.786]   0.031
AUCROC                               0.815                  0.849                  /       0.702                  0.770                  /

AUCROC = area under the receiver operating characteristic curve; CPr = classification of presence; CLoc = classification of location; mRN10 = novel masked-PET Resnet10 model; RN10 = baseline Resnet10 model

We performed a comprehensive comparison with human expert evaluation only for the CLoc task. Healthy controls had, by definition, no HPTT visible on FCH-PET (as reported by human experts), so the comparison could not be made for the CPr task, as human performance for CPr was 100%. A comparison of performance metrics for the CLoc task between the mRN10 model and human performance (based on the same subset of 83 patients used for the DL model development) is shown in Table 3.

TABLE 3. Comparison of mRN10 and human performance for the CLoc task. p-values were determined using the McNemar test

                                     CLoc mRN10             CLoc human             p
Sensitivity [95% CI]                 0.552 [0.453; 0.652]   0.917 [0.857; 0.958]   < 0.001
Specificity [95% CI]                 0.811 [0.763; 0.858]   0.997 [0.986; 0.999]   < 0.001
Positive predictive value [95% CI]   0.515 [0.418; 0.611]   0.992 [0.945; 0.999]   < 0.001
Negative predictive value [95% CI]   0.833 [0.787; 0.878]   0.972 [0.952; 0.984]   < 0.001
Accuracy [95% CI]                    0.742 [0.693; 0.786]   0.977 [0.960; 0.988]   < 0.001

CLoc = classification of location; mRN10 = novel masked-PET Resnet10 model
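As a sanity check, the binary metrics in Tables 2 and 3 follow directly from the summed confusion-matrix counts in Table 1; for example, for mRN10 on the CPr task:

```python
# mRN10 CPr counts from Table 1A: TP = 90, FP = 11, FN = 9, TN = 10
tp, fp, fn, tn = 90, 11, 9, 10

sensitivity = tp / (tp + fn)                 # 90/99   = 0.909
specificity = tn / (tn + fp)                 # 10/21   = 0.476
ppv = tp / (tp + fp)                         # 90/101  = 0.891
npv = tn / (tn + fn)                         # 10/19   = 0.526
accuracy = (tp + tn) / (tp + fp + fn + tn)   # 100/120 = 0.833

print(f"{sensitivity:.3f} {specificity:.3f} {ppv:.3f} {npv:.3f} {accuracy:.3f}")
```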
Studies with different architectures

Studies across multiple models were performed to justify the use of RN10 as the base architecture. The performance of the tested models, together with the number of trainable parameters and the optimal initial learning rate, is reported in Table 4. The mean CPr AUCROC and 95% confidence intervals were computed as population statistics of 50 models obtained from 5 runs of 10-fold cross-validation at the optimal learning rate. The highest performance among the models tested was achieved with RN10 and mRN10.

TABLE 4. Performance of several models on the CPr task

Model             # Trainable parameters (millions)   Optimal initial learning rate   Mean CPr AUCROC [95% CI]
mRN10             33.5                                0.0136                          0.850 [0.734; 0.998]
RN10              14.3                                0.0136                          0.812 [0.716; 0.994]
Resnet50          46.2                                2.15 × 10⁻³                     0.754 [0.624; 0.980]
Resnet101         85.2                                1.47 × 10⁻⁴                     0.527 [0.410; 0.639]
Densenet101       112.9                               0.316                           0.703 [0.606; 0.905]
PreActResnet101   85.2                                1.47 × 10⁻⁴                     0.739 [0.486; 0.998]
WideResnet101     85.2                                2.15 × 10⁻³                     0.752 [0.653; 0.966]

AUCROC = area under the receiver operating characteristic curve; CPr = classification of presence; mRN10 = novel masked-PET Resnet10 model; RN10 = baseline Resnet10 model

PET masking qualitative results

Qualitative results were evaluated across all subjects using an iteration of the model trained from a single data split. The qualitative results did not change in a significant manner with repeated training. In the qualitative analysis of the PET masking results, the region-of-interest mask correctly identified the foreground, but we found that in all but 3 subjects (1 with LL HPTT and 2 with LR HPTT) the mask completely obscured (masked) the original location of the HPTT on the masked PET. In the 3 subjects with HPTT visible in its original location on the masked PET, the mask still partially obscured the HPTT, as seen in Figure 3, rows (D), (F) and (G).

Figure 2 shows a typical example of mRN10 masking, where the HPTT was masked and cannot be distinguished in the masked PET image. The network nevertheless correctly classified the subject in Figure 2 as having lower right HPTT. The region of air outside the patient is masked to approximately 25% of the original PET signal, with the mask having a value of approximately 0.25. The high signal from the salivary glands is masked in all cases, whereas the signal from the thyroid gland is only partially masked in all cases, as seen in Figure 3.

FIGURE 2. Example of novel masked-PET Resnet10 model (mRN10) masking of the PET signal in a subject with a parathyroid adenoma in the region of the lower right parathyroid gland (black arrow in row (C)). Each row represents a different slice through the pre-processed [18F]fluorocholine PET/CT (FCH-PET) images ((A) mandibular region, (B) upper neck region, (C) lower neck region containing the parathyroid adenoma). The first column shows a pre-processed PET/CT image (64 × 64 × 32 matrix), where colours toward the "warm" (red) part of the spectrum indicate higher PET signal and colours toward the "cool" (blue) part of the spectrum indicate lower PET signal. The second column shows the mask, where regions coloured toward the red part of the spectrum have higher weights (non-masked) and regions toward the yellow part of the spectrum have lower weights (masked). The third column represents the final masked PET/CT images computed by multiplying the mask with the original PET/CT. The image was correctly classified as containing the adenoma in the lower right region.

FIGURE 3. Examples of masking of hyperfunctioning parathyroid tissue (HPTT), indicated by an arrow in column (i). The images are shown in the same format as in Figure 2. Rows (D), (F) and (G) represent the only 3 cases where the HPTT was not completely masked.

Discussion

The aim of the study was to evaluate the potential of DL models in classifying HPTT presence and location in FCH-PET studies in the setting of PHPT. For our experiments to be representative of the results of such a model in practice, we used data from a representative cohort of subjects with PHPT. Classification of FCH-PET studies was performed using multiple common DL models, and we found that the simplest among the models tested, RN10, achieved the highest performance. Furthermore, we improved the model's performance by modifying the architecture to include a region-of-interest masking step, which produced a region-of-interest mask that successfully identified the foreground of the PET. The mRN10 achieved superior performance to models of similar size. Overall, given the size of our dataset and the achieved performance, we found the use of deep learning to be highly promising for the potential evaluation of FCH-PET in PHPT.

Dataset and patient characteristics

Both our patients and controls had demographic characteristics representative of patients with PHPT, with the male-to-female ratio in the literature being 1:3 to 1:4 and the peak incidence at 62 ± 13 years.54-57 The models were therefore more likely to have learned the correct features for classifying HPTT presence and were trained on a relatively representative dataset such as would be encountered in real-life application. The representation per quadrant of HPTT in our cohort was also congruent with the numbers reported in the literature. Marzouki et al. provide 95% confidence intervals of the HPTT ratio per site as follows: lower left 32–51%, lower right 25–42%, upper left 10–23% and upper right 4–15%.58-60

Unfortunately, the dataset was imbalanced with respect to patients vs. "controls". However, obtaining negative FCH-PET studies is difficult due to the high positivity rate of finding HPTT on FCH-PET, since only patients with biochemically confirmed PHPT are imaged.
Such patients are highly likely to have visible HPTT, as reported in studies exploring the effectiveness of FCH-PET.3-13 Since healthy subjects are generally not referred for FCH-PET imaging, the best possible attempt was made to select criteria for choosing "controls" among patients with a negative visual assessment of FCH-PET. Our controls therefore had negative imaging findings, and their biochemical criteria for PHPT resolved at the 6-month follow-up without surgical treatment.

For the ground truth location, histopathological results were used as opposed to expert visual assessment of FCH-PET, in order to simulate the real-world use of the models in guiding surgical removal of HPTT.

Deep-learning model architecture

We chose the 3D Resnet10 as our baseline model since multiple research groups have shown that it provides promising results in classification tasks on both medical and non-medical images and it is the basis of modern architectures.41,61-63 Resnet10 also achieved the highest performance among the models tested. The other tested models with more parameters performed worse, as they appeared overparameterized and likely learned aberrant features, thus overfitting to the training data. Few studies explore this phenomenon in detail, but a similar phenomenon was noted in the results of a recent study by Bailly et al.64 on the effects of dataset size, dataset complexity, and model complexity on performance.

The main motivation behind the design of mRN10 and the implementation of masking is the way experts interpret FCH-PET. Experienced nuclear medicine physicians know that HPTT usually appears around the thyroid region, and we wanted to allow the model to learn to mask regions deemed unimportant for classification. Furthermore, these unimportant regions (e.g., muscle) commonly produce a high-intensity PET signal that might affect the classifier. Using end-to-end training with only a cross-entropy classification loss, we allowed the network to learn to mask these unimportant regions in an unsupervised manner by carefully tailoring the architecture. Given how experts interpret FCH-PET, mRN10 was an attempt to integrate expert knowledge into the model to improve the Resnet10 classifier.

The Unet was chosen as the masking architecture as we deem masking to be a task comparable to segmentation. For the activation function, we used tanh (hyperbolic tangent), since it was shown to be more stable in backpropagation than the sigmoid function.65 Since our initial goal was to mask unimportant parts of the image, and tanh is a function bound between –1 and 1, we used tanh + 1, such that regions where the Unet output was very negative were close to 0 and subsequently masked when multiplied by the PET signal intensity. The use of batch normalisation layers in the downstream Resnet10 in mRN10 ensures stable training even when the input is the masked PET, which is not explicitly normalized a priori.
The masking Unet was trained end-to-end along with Resnet10 in the mRN10 architecture for optimal performance on the classification task. This was an attempt to explain the classification decision of the classifier by allowing it to optimize the masking of unimportant parts of the image, as well as to increase performance by improving the conditioning of the input data to the classifier.46

Classification results

One of the goals of the study was to compare the model's performance to that of nuclear medicine experts. The task of detecting and localizing HPTT on FCH-PET is relatively "trivial" for human experts, with reported accuracies of up to 98%.3-13 We therefore expected a small dataset to be sufficient for training a model to similar performance. However, the results differed from our expectations, as the achieved performance was significantly below that of humans for both of our tasks. It is most likely that increasing the dataset to several hundred subjects would close the performance gap.

Given the size of our dataset, our results are comparable to other published studies on other medical imaging tasks. A study with a similarly sized dataset (85 subjects) on the classification of cardiac sarcoidosis by Togo et al. achieved a sensitivity and specificity of 84% and 87%.66 In line with established best practice, Lu et al. explored the diagnosis of Alzheimer disease from PET and MRI images using a multimodal approach on a dataset of 397 subjects and achieved 93% accuracy in detecting Alzheimer disease; Ma et al. used a DL method to classify thyroid diseases from SPECT with a dataset of more than 2000 subjects and achieved accuracies of up to 100% for some tasks.67,68 Because the aforementioned tasks are different and generally of different difficulty compared to ours, these comparisons and potential conclusions are hypothetical, but they give us a rough estimate of the number of subjects needed to substantially improve the performance of our model.

We believe that by increasing the size of our dataset to several hundred patients, performance metrics similar to human performance could most likely be achieved. One supporting data point for this assumption is that the upper bound of the 95% CI of the AUC in the population statistics of the 50 model iterations used in the experiments was 0.998: given the right data split, the model could perfectly classify the test set.

PET mask discussion

Qualitatively, we observed interesting properties of the mask created using the UNet, with examples depicted in Figures 2 and 3. In Figure 2, row (A), we can see that the physiological signal from the salivary glands was masked, while the weak signal of the paravertebral musculature was amplified.
In row (B), the physiological signal from the red marrow in the vertebral body was masked and the signal from the neck musculature on the left was enhanced. In row (C), the physiological signals from the thyroid gland and the paravertebral musculature were masked, contradicting the findings in row (A). The model likely learns to amplify the weak signal from musculature with low FCH uptake and to suppress the strong signal from the salivary glands and certain muscle groups with high uptake.

The physiologically high PET activity in the salivary glands and the thyroid was correctly masked, likely because there is usually high PET activity in these regions. The masking of the thyroid region is especially problematic, since the signal from HPTT can be masked along with the thyroid. This resulted in the HPTT being masked in all but 3 cases, as shown in Figure 3. Still, this did not always result in a false classification of the HPTT location. The parathyroid adenoma in row (C) is crucial to the task for experts, and yet it was masked in this case by the network. Even though the model masked the adenoma, the mRN10 output in this case was still correct (lower right adenoma location). It is likely that the UNet learns to encode the information about the adenoma into the mask that is passed to the Resnet10.

Regions near the skin, and the skin itself, were always enhanced. We assumed that this was an important signal to the model, as the skin-air interface exhibits high contrast on PET and CT and acts as a rough anatomical landmark. It is also much higher in contrast than the soft-tissue interfaces of the structures in the parathyroid region and produces stronger gradients in training. The region outside the patient (air) was not masked to 0 but to approximately 25% of the signal (the mask value was 0.25); since this region is irrelevant to the classification, it likely does not produce a gradient in training, so the Unet output for this region remains closer to its initialization state.

We find the obtained masks to be interpretable in terms of optimizing the downstream Resnet10, yet they did not enhance the HPTT signal on the masked PET as might be expected. Highly active PET regions were always masked (thyroid, salivary glands). Regions which produced high PET activity only in some subjects (musculature) were masked only if they produced high PET activity (Figure 2, row (C)); if not, these regions were enhanced (Figure 2, row (A)), introducing noise into the masked PET. This further makes the masked PET uninterpretable, as the intensity of the introduced noise is higher than the masked signal from the parathyroid adenoma, which can itself be masked. However, in terms of optimizing the Resnet10 classification performance, these findings make sense, since the mechanism acts to adaptively scale the inputs to stabilize the Resnet10 classifier.

While the proposed mRN10 model, using a Unet and a Resnet sequentially for region-of-interest identification and classification, somewhat resembles state-of-the-art region-proposal algorithms, we have not found such a model presented in the existing literature. Firstly, it is unlikely that such an architecture would achieve superior performance on other tasks, as Resnet is a good classifier on its own when trained on a large enough database.20,41 Secondly, the masking results we achieved did not appear to consistently add value to FCH-PET interpretation when explored by humans; however, according to our results, the mask can be clearly interpreted in terms of optimizing downstream Resnet10 performance.
Namely, we found the mRN10 to be superior in performance to the RN10 in the CLoc task. This is probably due to the improved conditioning of the masked input to the Resnet10 in mRN10, leading to increased stability, which in turn increases the performance of the trained model.46

Limitations of the study

In the model selection, we found that the model with the lowest number of parameters performed best. This is one limitation of our study, since experiments with even simpler models were not carried out. Another potential performance improvement could come from transfer learning, but we have not found suitable pretrained models for FCH-PET images.

Our PET masking was an attempt to make the model more interpretable. The most notable similar mechanisms in the literature are attention mechanisms.69 The main problem with most attention mechanisms is that they rely on weighing image features, which are obtained by embedding a small image patch into a vector. Because of this, the spatial resolution of the attention map is limited by the size of the image patch, which is commonly 16 × 16 in visual transformers.70 By analogy, if we used 16 × 16 × 16 patches for our theoretical attention, the feature map of our entire image would have spatial dimensions of 4 × 4 × 2, which is too coarse for a sufficiently detailed interpretation. Another method of explaining the model output is class activation mapping (CAM), which also relies on feature embeddings before the fully connected layers and therefore entails a loss of spatial resolution;71 in the case of the RN10, the CAM resolution would be 4 × 4 × 2. Gradient-based attribution methods, which do provide pixel-level (or, in our case, voxel-level) input attribution to the model output, have received criticism due to their inconsistency and poor theoretical foundations.72

Conclusions

We provide extensive experiments in the deep learning analysis of FCH-PET using the standard classification model RN10 and a novel architecture tailored to the task. As deep learning for FCH-PET analysis in PHPT has, to our knowledge, not yet been described in the literature, our experiments provide a baseline for future work. Even though performance inferior to human experts was achieved, the results seem very promising considering the small dataset, with an achieved accuracy of 83% for detecting HPTT and 74% for localizing the quadrant of HPTT.

References

1. Fraser WD. Hyperparathyroidism. Lancet 2009; 374: 145-58. doi: 10.1016/s0140-6736(09)60507-9
2. Grimelius L, Akerström G, Johansson H, Bergström R. Anatomy and histopathology of human parathyroid glands. Pathol Annu 1981; 16(Pt 2): 1-24. PMID: 7036057
3. Cuderman A, Senica K, Rep S, Hocevar M, Kocjan T, Sever, et al. 18F-Fluorocholine PET/CT in primary hyperparathyroidism: superior diagnostic performance to conventional scintigraphic imaging for localization of hyperfunctioning parathyroid glands. J Nucl Med 2019; 61: 577-83. doi: 10.2967/jnumed.119.229914
4. Lezaic L, Rep S, Sever MJ, Kocjan T, Hocevar M, Fettich J. 18F-Fluorocholine PET/CT for localization of hyperfunctioning parathyroid tissue in primary hyperparathyroidism: a pilot study. Eur J Nucl Med Mol Imaging 2014; 41: 2083-9. doi: 10.1007/s00259-014-2837-0
5. Graves CE, Hope TA, Kim J, Pampaloni MH, Kluijfhout W, Seib CD, et al. Superior sensitivity of 18F-fluorocholine: PET localization in primary hyperparathyroidism. Surgery 2022; 171: 47-54. doi: 10.1016/j.surg.2021.05.056
6. Michaud L, Balogova S, Burgess A, Ohnona J, Huchet V, Kerrou K, et al. A pilot comparison of 18F-fluorocholine PET/CT, ultrasonography and 123I/99mTc-sestaMIBI dual-phase dual-isotope scintigraphy in the preoperative localization of hyperfunctioning parathyroid glands in primary or secondary hyperparathyroidism. Medicine 2015; 94: e1701. doi: 10.1097/md.0000000000001701
7. Kluijfhout WP, Vorselaars WM, van den Berk SA, Vriens MR, Borel Rinkes IH, Valk GD, et al. Fluorine-18 fluorocholine PET-CT localizes hyperparathyroidism in patients with inconclusive conventional imaging. Nucl Med Commun 2016; 37: 1246-52. doi: 10.1097/mnm.0000000000000595
8. Kluijfhout WP, Pasternak JD, Drake FT, Beninato T, Gosnell JE, Shen WT, et al. Use of PET tracers for parathyroid localization: a systematic review and meta-analysis. Langenbecks Arch Surg 2016; 401: 925-35. doi: 10.1007/s00423-016-1425-0
9. Thanseer N, Bhadada SK, Sood A, Mittal BR, Behera A, Gorla AKR, et al. Comparative effectiveness of ultrasonography, 99mTc-sestamibi, and 18F-fluorocholine PET/CT in detecting parathyroid adenomas in patients with primary hyperparathyroidism. Clin Nucl Med 2017; 42: e491-7. doi: 10.1097/rlu.0000000000001845
10. Whitman J, Allen IE, Bergsland EK, Suh I, Hope TA. Assessment and comparison of 18F-Fluorocholine PET and 99mTc-sestamibi scans in identifying parathyroid adenomas: a metaanalysis. J Nucl Med 2021; 62: 1285-91. doi: 10.2967/jnumed.120.257303
11. Beheshti M, Hehenwarter L, Paymani Z, Rendl G, Imamovic L, Rettenbacher R, et al. 18F-Fluorocholine PET/CT in the assessment of primary hyperparathyroidism compared with 99mTc-MIBI or 99mTc-tetrofosmin SPECT/CT: a prospective dual-centre study in 100 patients. Eur J Nucl Med Mol Imaging 2018; 45: 1762-71. doi: 10.1007/s00259-018-3980-9
12. Broos WAM, Wondergem M, Knol RJJ, Van der Zant FM. Parathyroid imaging with 18F-fluorocholine PET/CT as a first-line imaging modality in primary hyperparathyroidism: a retrospective cohort study. EJNMMI Res 2019; 9: 72. doi: 10.1186/s13550-019-0544-3
13. Hope TA, Graves CE, Calais J, Ehman EC, Johnson GB, Thompson D, et al. Accuracy of 18F-fluorocholine PET for the detection of parathyroid adenomas: prospective single-center study. J Nucl Med 2021; 62: 1511-6. doi: 10.2967/jnumed.120.256735
14. Rep S, Hocevar M, Vaupotic J, Zdesar U, Zaletel K, Lezaic L. 18F-choline PET/CT for parathyroid scintigraphy: significantly lower radiation exposure of patients in comparison to conventional nuclear medicine imaging approaches. J Radiol Prot 2018; 38: 343-56. doi: 10.1088/1361-6498/aaa86f
15. Li Y, Sixou B, Peyrin F. A review of the deep learning methods for medical images super resolution problems. IRBM 2021; 42: 120-33. doi: 10.1016/j.irbm.2020.08.004
16. Yang W, Zhang X, Tian Y, Wang W, Xue J-H, Liao Q. Deep learning for single image super-resolution: a brief review. IEEE Trans Multimedia 2019; 21: 3106-21. doi: 10.1109/tmm.2019.2919431
17. Wang L, Chen W, Yang W, Bi F, Yu FR. A state-of-the-art review on image synthesis with generative adversarial networks. IEEE Access 2020; 8: 63514-37. doi: 10.1109/access.2020.2982224
18. Liu B, Liu J. Overview of image denoising based on deep learning. J Phys Conf Ser 2019; 1176: 022010. doi: 10.1088/1742-6596/1176/2/022010
19. Chartrand G, Cheng PM, Vorontsov E, Drozdzal M, Turcotte S, Pal CJ, et al. Deep learning: a primer for radiologists. RadioGraphics 2017; 37: 2113-31. doi: 10.1148/rg.2017170077
20. Al-Saffar AAM, Tao H, Talab MA. Review of deep convolution neural network in image classification. In: 2017 International Conference on Radar, Antenna, Microwave, Electronics, and Telecommunication. IEEE 2017. p. 26-31. doi: 10.1109/icramet.2017.8253139
21. Minaee S, Boykov Y, Porikli F, Plaza A, Kehtarnavaz N, Terzopoulos D. Image segmentation using deep learning: a survey. [Internet]. arXiv: 2001.05566; 2020. Available from: https://doi.org/10.48550/arXiv.2001.05566
22. Jiao L, Zhang F, Liu F, Yang S, Li L, Feng Z, et al. A survey of deep learning-based object detection. [Internet]. arXiv; 2019. Available from: http://arxiv.org/abs/1907.09408
23. Sahlsten J, Jaskari J, Kivinen J, Turunen L, Jaanio E, Hietala K, et al. Deep learning fundus image analysis for diabetic retinopathy and macular edema grading. Sci Rep 2019; 9: 10750. doi: 10.1038/s41598-019-47181-w
24. Esteva A, Kuprel B, Novoa RA, Ko J, Swetter SM, Blau HM, et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature 2017; 542: 115-8. doi: 10.1038/nature21056
25. Irvin J, Rajpurkar P, Ko M, Yu Y, Ciurea-Ilcus S, Chute C, et al. CheXpert: a large chest radiograph dataset with uncertainty labels and expert comparison. Proc Conf AAAI Artif Intell 2019; 33: 590-7. doi: 10.1609/aaai.v33i01.3301590
26. Nie D, Cao X, Gao Y, Wang L, Shen D. Estimating CT image from MRI data using 3D fully convolutional networks. Deep Learn Data Label Med Appl 2016; 2016: 170-8. doi: 10.1007/978-3-319-46976-8_18
27. Torrado-Carvajal A, Vera-Olmos J, Izquierdo-Garcia D, Catalano OA, Morales MA, Margolin J, et al. Dixon-VIBE Deep Learning (DIVIDE) pseudo-CT synthesis for pelvis PET/MR attenuation correction. J Nucl Med 2019; 60: 429-35. doi: 10.2967/jnumed.118.209288
28. Guo R, Hu X, Song H, Xu P, Xu H, Rominger A, et al. Weakly supervised deep learning for determining the prognostic value of 18F-FDG PET/CT in extranodal natural killer/T cell lymphoma, nasal type. Eur J Nucl Med Mol Imaging 2021; 48: 3151-61. doi: 10.1007/s00259-021-05232-3
29. Hwang D, Kang SK, Kim KY, Seo S, Paeng JC, Lee DS, et al. Generation of PET attenuation map for whole-body time-of-flight 18F-FDG PET/MRI using a deep neural network trained with simultaneously reconstructed activity and attenuation maps. J Nucl Med 2019; 60: 1183-9. doi: 10.2967/jnumed.118.219493
30. Liu F, Jang H, Kijowski R, Bradshaw T, McMillan AB. Deep learning MR imaging-based attenuation correction for PET/MR imaging. Radiology 2018; 286: 676-84. doi: 10.1148/radiol.2017170700
31. Leynes AP, Yang J, Wiesinger F, Kaushik SS, Shanbhag DD, Seo Y, et al. Zero-Echo-Time and Dixon Deep Pseudo-CT (ZeDD CT): direct generation of pseudo-CT images for pelvic PET/MRI attenuation correction using deep convolutional neural networks with multiparametric MRI. J Nucl Med 2018; 59: 852-8. doi: 10.2967/jnumed.117.198051
32. Blanc-Durand P, Van Der Gucht A, Schaefer N, Itti E, Prior JO. Automatic lesion detection and segmentation of 18F-FET PET in gliomas: a full 3D U-Net convolutional neural network study. PLoS One 2018; 13: e0195798. doi: 10.1371/journal.pone.0195798
33. Zhao X, Li L, Lu W, Tan S. Tumor co-segmentation in PET/CT using multi-modality fully convolutional neural network. Phys Med Biol 2018; 64: 015011. doi: 10.1088/1361-6560/aaf44b
34. Zhong Z, Kim Y, Plichta K, Allen BG, Zhou L, Buatti J, et al. Simultaneous cosegmentation of tumors in PET-CT images using deep fully convolutional networks. Med Phys 2019; 46(2): 619-33. doi: 10.1002/mp.13331
35. Schwyzer M, Ferraro DA, Muehlematter UJ, Curioni-Fontecedro A, Huellner MW, von Schulthess GK, et al. Automated detection of lung cancer at ultralow dose PET/CT by deep neural networks – initial results. Lung Cancer 2018; 126: 170-3. doi: 10.1016/j.lungcan.2018.11.001
36. Hatt M, Laurent B, Ouahabi A, Fayad H, Tan S, Li L, et al. The first MICCAI challenge on PET tumor segmentation. Med Image Anal 2018; 44: 177-95. doi: 10.1016/j.media.2017.12.007
37. Student. The probable error of a mean. Biometrika 1908; 6: 1. doi: 10.2307/2331554
38. Pearson K. X. On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. Lond Edinb Dublin Philos Mag J Sci 1900; 50: 157-75. doi: 10.1080/14786440009463897
39. Jones E, Oliphant T, Peterson P, et al. SciPy: open source scientific tools for Python. 2001.
40. Good IJ. Rational decisions. J R Stat Soc Ser B 1952; 14: 107-14. doi: 10.1111/j.2517-6161.1952.tb00104.x
41. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2016. doi: 10.1109/cvpr.2016.90
42. Hara K, Kataoka H, Satoh Y. Learning spatio-temporal features with 3D residual networks for action recognition. 2017 IEEE International Conference on Computer Vision Workshops (ICCVW) 2017. doi: 10.1109/iccvw.2017.373
43. Huang G, Liu Z, Van Der Maaten L, Weinberger KQ. Densely connected convolutional networks. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017. doi: 10.1109/cvpr.2017.243
44. Zagoruyko S, Komodakis N. Wide residual networks. Proceedings of the British Machine Vision Conference 2016; 2016. doi: 10.5244/c.30.87
45. He K, Zhang X, Ren S, Sun J. Identity mappings in deep residual networks. Computer Vision – ECCV 2016. 2016: 630-45. doi: 10.1007/978-3-319-46493-0_38
46. Full stack deep learning. Lecture 1: DL fundamentals [Internet]. Fullstackdeeplearning.com. [cited 2022 Aug 28]. Available from: https://fullstackdeeplearning.com/spring2021/lecture-1/
47. Ronneberger O, Fischer P, Brox T. U-Net: convolutional networks for biomedical image segmentation. In: Navab N, Hornegger J, Wells W, Frangi A, editors. Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015. Lecture Notes in Computer Science 2015; 9351: 234-41. Cham: Springer. doi: 10.1007/978-3-319-24574-4_28
48. Van Rossum G, Drake FL. Python tutorial. Technical Report CS-R9526. Centrum voor Wiskunde en Informatica; 1995.
49. Paszke A, Gross S, Chintala S, Chanan G, Yang E, DeVito Z, et al. Automatic differentiation in PyTorch. 31st Conf Neural Inf Process Syst 2017.
50. Stevenson M, Sergeant E, Nunes T, Heuer C, Marshall J, Sanchez J, et al. epiR: tools for the analysis of epidemiological data. v1.0-15. 2020. [cited 2022 Mar 15]. Available from: https://CRAN.R-project.org/package=epiR
51. R Development Core Team. R: a language and environment for statistical computing. Vienna: R Foundation for Statistical Computing. Available from: http://www.R-project.org
52. McNemar Q. Note on the sampling error of the difference between correlated proportions or percentages. Psychometrika 1947; 12: 153-7. doi: 10.1007/bf02295996
53. Stock C, Hielscher T. DTComPair: comparison of binary diagnostic tests in a paired study design. R package version 1.0.3. [Internet]. 2014. Available from: http://cran.r-project.org/package=DTComPair
54. Rao SD. Epidemiology of parathyroid disorders. Best Pract Res Clin Endocrinol Metab 2018; 32: 773-80. doi: 10.1016/j.beem.2018.12.003
55. Somnay YR, Craven M, McCoy KL, Carty SE, Wang TS, Greenberg CC, et al. Improving diagnostic recognition of primary hyperparathyroidism with machine learning. Surgery 2017; 161: 1113-21. doi: 10.1016/j.surg.2016.09.044
56. Press DM, Siperstein AE, Berber E, Shin JJ, Metzger R, Monteiro R, et al. The prevalence of undiagnosed and unrecognized primary hyperparathyroidism: a population-based analysis from the electronic medical record. Surgery 2013; 154: 1232-8. doi: 10.1016/j.surg.2013.06.051
57. Bilezikian JP, Marcus R, Levine MA, Marcocci C, Silverberg SJ, Potts JT, editors. Parathyroids: basic and clinical concepts. 3rd edition. Elsevier, Academic Press; 2014.
58. Marzouki HZ, Chavannes M, Tamilia M, Hier MP, Black MJ, Levental M, et al. Location of parathyroid adenomas: 7-year experience. J Otolaryngol Head Neck Surg 2010; 39: 551-4. PMID: 20828518
59. Filser B, Uslar V, Weyhe D, Tabriz N. Predictors of adenoma size and location in primary hyperparathyroidism. Langenbeck's Arch Surg 2021; 406: 1607. doi: 10.1007/s00423-021-02179-9
60. Shah VN, Bhadada SK, Bhansali A, Behera A, Mittal BR. Changes in clinical & biochemical presentations of primary hyperparathyroidism in India over a period of 20 years. Indian J Med Res 2014; 139: 694-9. PMID: 25027078
61. Xie S, Girshick R, Dollár P, Tu Z, He K. Aggregated residual transformations for deep neural networks. Proc 30th IEEE Conf Comput Vis Pattern Recognition (CVPR) 2017. doi: 10.1109/cvpr.2017.634
62. Gao S, Cheng MM, Zhao K, Zhang XY, Yang MH, Torr PHS. Res2Net: a new multi-scale backbone architecture. IEEE Trans Pattern Anal Mach Intell 2019. doi: 10.1109/TPAMI.2019.2938758
63. Chen S, Tan X, Wang B, Hu X. Reverse attention for salient object detection. Computer Vision – ECCV 2018 2018; 236-52. doi: 10.1007/978-3-030-01240-3_15
64. Bailly A, Blanc C, Francis É, Guillotin T, Jamal F, Wakim B, et al. Effects of dataset size and interactions on the prediction performance of logistic regression and deep learning models. Comput Methods Programs Biomed 2022; 213: 106504. doi: 10.1016/j.cmpb.2021.106504
65. Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput 1997; 9: 1735-80. doi: 10.1162/neco.1997.9.8.1735
66. Togo R, Hirata K, Manabe O, Ohira H, Tsujino I, Magota K, et al. Cardiac sarcoidosis classification with deep convolutional neural network-based features using polar maps. Comput Biol Med 2019; 104: 81-6. doi: 10.1016/j.compbiomed.2018.11.008
67. Lu D, Popuri K, Ding GW, Balachandar R, Beg MF. Multiscale deep neural network based analysis of FDG-PET images for the early diagnosis of Alzheimer's disease. Med Image Anal 2018; 46: 26-34. doi: 10.1016/j.media.2018.02.002
68. Ma L, Ma C, Liu Y, Wang X. Thyroid diagnosis from SPECT images using convolutional neural network with optimization. Comput Intell Neurosci 2019; 2019: 6212759. doi: 10.1155/2019/6212759
69. Niu Z, Zhong G, Yu H. A review on the attention mechanism of deep learning. Neurocomputing 2021; 452: 48-62. doi: 10.1016/j.neucom.2021.03.091
70. Liu Y, Zhang Y, Wang Y, Hou F, Yuan J, Tian J, et al. A survey of visual transformers. arXiv [csCV] [Internet]. 2021 [cited 2022 Aug 28]. Available from: http://arxiv.org/abs/2111.06091
71. Zhou B, Khosla A, Lapedriza A, Oliva A, Torralba A. Learning deep features for discriminative localization. arXiv [csCV] [Internet]. 2015 [cited 2022 Aug 28]. Available from: http://arxiv.org/abs/1512.04150
72. Ancona M, Ceolini E, Öztireli C, Gross M. Towards better understanding of gradient-based attribution methods for deep neural networks. arXiv [csLG] [Internet]. 2017 [cited 2022 Aug 28]. Available from: http://arxiv.org/abs/1711.06104