Advances in Methodology and Statistics / Metodološki zvezki, Vol. 18, No. 1, 2021, 31–37
https://doi.org/10.51936/atez3037

A note on generalisations of the concordance index for survival data

Nataša Kejžar ∗, Janez Stare
University of Ljubljana, Faculty of Medicine, Ljubljana, Slovenia

Abstract
The concordance index (c-index) was adapted to survival data by Harrell et al. (1982). In its basic form, the index depends on censoring; however, the issue can be effectively dealt with. More importantly, Harrell's c-index cannot be used with time-varying effects and/or time-dependent covariates, and several generalisations were proposed. We look at some of them, explore their differences, point to a basic difference between these generalisations, and strongly favour one type of generalisation.

Keywords: survival analysis, models with time-dependent covariates/time-varying effects, c-index

1. Introduction
The concordance parameter defines the probability of selecting a concordant pair from the population. Without considering time-to-event data, this parameter is also known under the name of probability index (Acion et al., 2006), and it is identical to the area under the curve (AUC) measure (Hanley & McNeil, 1982). Its estimator (see Lehmann, 1951) is consistent and unbiased with minimum variance, and a scaled version of this estimator is used as a test statistic for the non-parametric Mann-Whitney U test (Lehmann, 1951).
One of the commonly used estimators of the concordance parameter for survival data is the c-index. There have been some suggestions (Antolini et al., 2005; Gerds et al., 2013; Kremers, 2007) on how to generalise the c-index since Harrell et al. (1982) first adapted it for use with event history data. The concordance parameter can also be estimated using the estimate of the area under the ROC curve, for which multiple generalisations have been proposed (Chambless & Diao, 2006; Heagerty & Zheng, 2005). The common goal of such generalisations is to adapt the measure to time-varying effects and time-dependent covariates.
None of these suggestions (Antolini et al., 2005; Chambless & Diao, 2006; Gerds et al., 2013; Heagerty & Zheng, 2005; Kremers, 2007) defines what is meant by generalisation, possibly because the idea is so obvious that it does not have to be explicitly formulated. Still, a clear definition helps to distinguish generalisations from modifications. In this brief note we first define what we believe should be understood as a generalisation of the c-index, and then discuss some proposals in light of this definition, compare them, and strongly argue for one type of generalisation.

∗ Corresponding author
Email addresses: natasa.kejzar@mf.uni-lj.si (Nataša Kejžar), janez.stare@mf.uni-lj.si (Janez Stare)
ORCID iDs: 0000-0003-3069-9863 (Nataša Kejžar), 0000-0002-2564-8781 (Janez Stare)

The c-index for survival data (Harrell et al., 1982) is defined as follows:
$$c = \frac{\#\,\text{concordant pairs}}{\#\,\text{comparable pairs}}. \tag{1.1}$$
A pair is concordant if the predicted survival times for the pair are in the same order as the observed survival times.
If there is no censoring in the data, all pairs are comparable. Censoring makes some comparisons impossible, and by not using them (the original option) bias is introduced. We know that censoring up to the largest event time can effectively be dealt with using the procedure presented by Uno et al. (2011). For a discussion of censoring after the largest event time, see Kejžar et al. (2016).
In this note, we limit ourselves to the case of no censoring to keep the equations simple.
Time-dependent covariates are commonly present in studies of survival, and time-varying effects are often found during the analysis. Harrell's c-index was defined for constant effects and covariates, meaning that the predictions are made at time 0. If covariates and/or covariate effects change in time, predictions have to change. This means that the original c-index cannot be used.
There have been quite a few proposals to include time dependency, but before discussing (some of) them, we first introduce some notation.
The variables of interest are the true survival time $T_i$ and the predicted survival time $T^*_i$, and we denote their observed values by $t_i$ and $t^*_i$. $T^*$ is usually a function of predictor variables $X$.
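As an illustration of Equation (1.1) in the uncensored setting considered here, the following is a minimal Python sketch rather than a reference implementation. The function name `c_index_uncensored` is hypothetical, and the handling of ties is an assumption not taken from the text: pairs with tied observed times are treated as not comparable, and pairs with tied predictions count as not concordant.

```python
from itertools import combinations


def c_index_uncensored(observed, predicted):
    """Proportion of comparable pairs whose predicted survival times
    are in the same order as the observed survival times (no censoring)."""
    concordant = 0
    comparable = 0
    for (t_i, p_i), (t_j, p_j) in combinations(zip(observed, predicted), 2):
        if t_i == t_j:
            # Assumption: tied observed times are not comparable.
            continue
        comparable += 1
        # Assumption: tied predictions are not counted as concordant.
        if p_i != p_j and (t_i < t_j) == (p_i < p_j):
            concordant += 1
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Illustrative data (made up): 5 of the 6 pairs are concordant,
# only the pair with observed times (5.0, 3.0) is ordered wrongly.
observed = [2.0, 5.0, 3.0, 8.0]
predicted = [1.5, 4.0, 4.5, 9.0]
print(c_index_uncensored(observed, predicted))
```

The double loop over pairs is quadratic in the sample size, which is acceptable for a sketch but not for large data sets.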
The concordance parameter can be expressed as $C = P(T^*_i$ […]

[…] $h(t_i \mid X_j(t))$ […]

where $h(t)$ represents the predicted hazard at time $t$, and $R_t$ denotes the risk set at $t$. A close look reveals that $r_{i,\text{model}}$ is computed as the maximal rank minus the number of all concordant pairs at $t_i$. Inserting the rank expressions into the equation for $\hat{R}_E$, we get
$$\hat{R}_E = -1 + 2 \cdot \frac{\sum_{t_i} \left[ \sum_{j=1}^{n} I(t_i \,[\ldots]\, h(t_i \mid X_j(t)) \right]}{\sum_{t_i} \left( |R_{t_i}| - 1 \right)}.$$
The denominator of the second term represents the number of all comparable pairs for each $t_i$. The numerator is twice the number of concordant pairs for each $t_i$, so the second term equals twice the time-dependent c-index, giving $\hat{R}_E = 2\hat{C}_3 - 1$. The time-dependent c-index, in this case, is of the form
$$\hat{C}_3 = \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} I(t_i \,[\ldots]\, h(t_i \mid X_i(t))]}{\sum_{i=1}^{n} \sum_{j=1}^{n} I(t_i \,[\ldots]}$$
[…] $S(t \mid X_j)$ (Antolini et al., 2005). However, with time-dependent covariates and/or time-varying effects, that no longer holds. The survival function $S(t)$, as well as the cumulative distribution function $1 - S(t)$, are cumulative measures ($S(t) = P(T > t)$), whereas the hazard function is an instantaneous measure of risk. If the hazard changes, one detects that immediately, and its relative change is larger than that in the survival function. The survival function also accumulates the whole history, which makes relative changes between two time points smaller with time.

Acknowledgments
This work was supported by the Slovenian Research Agency (Methodology for data analysis in medical sciences, P3–0154).

References
Acion, L., Peterson, J. J., Temple, S., & Arndt, S. (2006). Probabilistic index: An intuitive non-parametric approach to measuring the size of treatment effects. Statistics in Medicine, 25(4), 591–602. https://doi.org/10.1002/sim.2256
Antolini, L., Boracchi, P., & Biganzoli, E. (2005). A time-dependent discrimination index for survival data. Statistics in Medicine, 24(24), 3927–3944. https://doi.org/10.1002/sim.2427
Chambless, L. E., & Diao, G. (2006). Estimation of time-dependent area under the ROC curve for long-term risk prediction.
Statistics in Medicine, 25(20), 3474–3486. https://doi.org/10.1002/sim.2299
Gerds, T. A., Kattan, M. W., Schumacher, M., & Yu, C. (2013). Estimating a time-dependent concordance index for survival prediction models with covariate dependent censoring. Statistics in Medicine, 32(13), 2173–2184. https://doi.org/10.1002/sim.5681
Hanley, J. A., & McNeil, B. J. (1982). The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology, 143(1), 29–36. https://doi.org/10.1148/radiology.143.1.7063747
Harrell, F. E., Jr., Califf, R. M., Pryor, D. B., Lee, K. L., & Rosati, R. A. (1982). Evaluating the yield of medical tests. JAMA: The Journal of the American Medical Association, 247(18), 2543–2546. https://doi.org/10.1001/jama.1982.03320430047030
Heagerty, P. J. (2012). Risksetroc: Riskset ROC curve estimation from censored survival data (Version 1.0.4) [Computer software]. The Comprehensive R Archive Network. https://cran.r-project.org/package=risksetROC
Heagerty, P. J., & Zheng, Y. (2005). Survival model predictive accuracy and ROC curves. Biometrics, 61(1), 92–105. https://doi.org/10.1111/j.0006-341X.2005.030814.x
Kejžar, N., Maucort-Boulch, D., & Stare, J. (2016). A note on bias of measures of explained variation for survival data. Statistics in Medicine, 35(6), 877–882. https://doi.org/10.1002/sim.6749
Kremers, W. K. (2007). Concordance for survival time data: Fixed and time-dependent covariates and possible ties in predictor and tim. Mayo Foundation. https://www.mayo.edu/research/documents/biostat-80pdf/doc-10027891
Lehmann, E. L. (1951). Consistency and unbiasedness of certain nonparametric tests. The Annals of Mathematical Statistics, 22(2), 165–179. https://doi.org/10.1214/aoms/1177729639
Stare, J., Pohar Perme, M., & Henderson, R. (2011). A measure of explained variation for event history data. Biometrics, 67(3), 750–759. https://doi.org/10.1111/j.1541-0420.2010.01526.x
Therneau, T. M. (2021). Survival: Survival analysis (Version 3.2-13) [Computer software].
The Comprehensive R Archive Network. https://cran.r-project.org/package=survival
Uno, H., Cai, T., Pencina, M. J., D’Agostino, R. B., & Wei, L. J. (2011). On the C-statistics for evaluating overall adequacy of risk prediction procedures with censored survival data. Statistics in Medicine, 30(10), 1105–1117. https://doi.org/10.1002/sim.4154