Metodološki zvezki, Vol. 16, No. 2, 2019, 57–69

The Bad Mathematics of the Bad Luck Theory

Mariia Beliaeva 1

Abstract

The mathematics of the Bad Luck theory of carcinogenesis by Tomasetti and Vogelstein generated a great deal of controversy among cancer specialists but did not attract the attention of mathematicians. As a result, the gross mathematical mistakes at the foundation of the theory were never properly criticized and went unnoticed. The sensational quantitative estimates of the role of bad luck in cancer occurrence, though erroneous, have therefore spread widely among researchers and the general public and gained undeserved popularity. The present paper reviews the actual mathematical mistakes of the Bad Luck theory.

1 Introduction

Almost five years have passed since Tomasetti and Vogelstein (hereafter T&V) attempted to explain the variation of cancer risk by the number of stem cell divisions [22]. Their idea, referred to as the Bad Luck theory and further elaborated in [25, 26, 27], generated a great deal of controversy. The discussion is still alive [17, 31, 35, 41]; for more bibliography see [19] and [37].

T&V argue that intrinsic stochastic effects play the main role in tumor initiation, and specifically that the majority of cancers (22 of the 31 considered) “appear to be mainly due to stochastic effects associated with DNA replication of the tissues’ stem cells” [22]. In their opinion, it follows that primary prevention measures (vaccination, lifestyle changes, environmental control) are not likely to be as effective as secondary prevention, that is, early detection and treatment. The authors believe that their “results could have important public health implication” and conclude that “secondary prevention should be the major focus” [22]. Had this result been obtained correctly, it would be truly significant, since it would make it possible to optimize the direction and funding of research. Unfortunately for the authors, and fortunately for all of us, they were wrong.
T&V claimed that their theory was based on a mathematical model, one of the many mathematical models of cancer which “are popping up everywhere in cancer research” [15]. Gross mathematical mistakes, however, make the Bad Luck model erroneous and its conclusions unfounded. Surprisingly, these mistakes were neither detected at the peer-review stage nor noticed after publication, perhaps because professional mathematicians rarely read medical papers. On the other hand, the statement that a conclusion is based on a mathematical model usually has a magical effect on the general public. Yet there are a great many ways for a mathematical model to be incorrect: wrong assumptions, erroneous formulas, senseless criteria, or arithmetic mistakes. Thus each model has to pass a thorough examination before one can decide to trust it. Nevertheless, some cancer specialists [1, 6, 14, 21] simply believe that the Bad Luck theory is right and use it as a basis for further research. The purpose of this paper is to put an end to that common misconception by demonstrating the crucial mathematical mistakes in the theory.

1 RHYTHM Engineering, Sofia, Bulgaria; maria.beljaeva29@gmail.com

The rest of the paper is organised as follows. The three main mathematical mistakes of the Bad Luck theory are described in Sections 2–4. Section 5 draws the reader’s attention to the unjustified use of one fashionable method in the theory. In Section 6 we explain what is wrong with some existing arguments that the theory is fallacious. Section 7 contains the conclusions.

2 The First Mistake: Correlation between Logarithms Treated as Correlation between their Arguments

T&V tried to calculate the coefficient $R$ of correlation between the human lifetime number $\mathit{lscd}$ of stem cell divisions within different tissues and the lifetime cancer risk $r$ for those tissues [22, 23].
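The distinction named in this section's title can be checked numerically. The sketch below is an illustration only: it uses hypothetical synthetic data (an exact power law y = x^0.1 over five orders of magnitude), not the data of [22], and the helper name `pearson` is introduced here for convenience. It computes the Pearson correlation coefficient once for the raw values and once for their base-10 logarithms.

```python
import math


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data (NOT from the paper): x spans 1 .. 10^4, y = x^0.1.
x = [10.0 ** k for k in range(5)]
y = [v ** 0.1 for v in x]

# Correlation between the variables themselves.
r_raw = pearson(x, y)

# Correlation between their logarithms.
r_log = pearson([math.log10(v) for v in x], [math.log10(v) for v in y])

print(f"correlation of raw values:  {r_raw:.3f}")
print(f"correlation of logarithms: {r_log:.3f}")
```

For this made-up data set the correlation between the logarithms is essentially 1, because the logarithms are exactly linearly related, while the correlation between the raw values is noticeably lower. The two coefficients measure different things, so one cannot be substituted for the other.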
Having obtained $R = 0.804$, T&V argue that since the coefficient of determination $R^2 \approx 2/3$, approximately 2/3 of the differences in cancer risk can be explained by the number of random stem cell divisions, i.e., by bad luck. T&V concluded that “The stochastic effects of DNA replication appear to be the major contributor to cancer in humans” [22].

In fact, what T&V calculated was not the correlation coefficient between $\mathit{lscd}$ and $r$ themselves but that between their logarithms. These two coefficients are not equal. To see this at once, one only has to google "logarithms change correlation" and find questions like “I don’t understand why researchers sometimes use logged versions of their variables and why correlation seems so much higher if you do so” [42] or “Before logs the correlation is 0.49 and after logs it is 0.9. I thought the logs only change the scale. How is this possible?” [43]. I failed to find in the literature any mathematical explanation of why logarithms change correlation. The proof below fills the gap.

2.1 A Piece of Theory

The well-known formula for the variance $D_y$ of a non-random function $y = \varphi(x_1, x_2, \ldots, x_n)$ is [12, 39] D y n X i=1 @’ @x i 2 m D i +2 X i