ERK'2022, Portorož, 359-362 359
Computational analysis of rhythmic data using RDA
Arthur Vestu
1
, Lily-Jade Roldao
1
, Miha Moˇ skon
2
1
ESIGELEC Graduate School of Engineering, Rouen, France
2
Faculty of Computer and Information Science, University of Ljubljana, Ljubljana, Slovenia
E-mail: miha.moskon@fri.uni-lj.si
Abstract
Rhythmic processes can be found in different contexts that
range from biological to socio-technical systems. Several
computational methods have been introduced to study such
processes. However, these have mostly been adapted to
work well with specific data and need to be manually
adapted for a wider usage. We describe a software frame-
work dedicated to a comprehensive analysis of rhythmic
datasets. It integrates different state-of-the-art methods
dedicated to the identification and characterisation of rhy-
thmic processes. It allows its users to straightforwardly
apply different methods to a selected dataset, and to iden-
tify the method yielding the results with the largest rele-
vance in a given context. We demonstrate the application
of the proposed framework on two examples. Firstly, we
report the results of a benchmarking experiment, which
also indicate the classification performance of each of
the implemented methods (i.e., classifying measurements
among rhythmic and non-rhythmic groups). Secondly,
we present the application of the proposed framework on
the assessment of rhythmic trends in traffic data reflect-
ing circadian rhythmicity. We believe that the proposed
package will find a vast scope of applications in different
scientific domains.
1 Introduction
Rhythmic processes, i.e., processes that reflect periodic
response, are pervasive in our environment. For exam-
ple, circadian clocks present biological clocks that dis-
play daily (approximately 24-hour period) oscillations and
regulate up to half of all genes in an organism [1]. Since
a disruption of these rhythms can lead to the develop-
ment of different diseases, their analysis has gained sig-
nificant importance in recent years [2]. In this context
researchers are combining different experimental tech-
niques with computational approaches [3]. Several com-
putational methods for the identification and characteri-
sation of rhythmic trends have been proposed in recent
years [2]. These methods have been mostly adapted to
work well with specific biological data, i.e. transcrip-
tomic circadian time-series datasets. However, detection
and analysis of rhythmic patterns have become an impor-
tant aspect also in fields of research outside biology and
medicine, such as urban planning. On the other hand,
the majority of the state-of-the-art methods for rhythmic-
ity detection and analysis cannot be straightforwardly ap-
plied to such datasets.
In this work we describe a computational framework
for domain-agnostic analysis of rhythmic datasets. The
framework combines different methods for rhythmicity
detection and analysis. It accepts input data in standard-
ised formats, and is able to produce publication-ready fig-
ures and results in a tabular format for straightforward
analysis. Moreover, the framework implements a set of
functionalities which can be used for benchmarking ex-
periments. We demonstrate the proposed framework and
the methods it incorporates on two case studies. Firstly,
we describe a benchmarking experiment of synthetically
generated data. Secondly, we describe the application of
the framework on the real data presenting the average car-
travel paces throughout the day on a selected road seg-
ment in the city of Ljubljana. We use the obtained results
to discuss the pros and cons of each of the selected meth-
ods.
2 Methods
2.1 Identification and characterisation of rhythmic
data
Several computational approaches to identify and charac-
terise rhythmic data have been introduced in recent years
[2]. In this section we describe some of the most popu-
lar methods, which have also been incorporated into the
proposed computational framework.
2.1.1 Lomb-Scargle
Lomb-Scargle periodogram (LS) is one of the first algo-
rithms devoted to the identification of rhythmicity in data.
It presents a parametric model that identifies oscillations
by comparing the data to sinusoidal curves [4]. LS can
handle data quality issues like replicates, missing values
or uneven sampling.
2.1.2 Cosinor
Cosinor is a trigonometric regression model similarly to
LS. Cosinor is a parametric model which allow us to pre-
cisely identify oscillatory datasets as well as amplitudes
and acrophases of oscillations. Cosinor can produce rel-
evant results even when data quality is poor (irregular in-
tervals, unbalanced data full of outliers) [5].
360
2.1.3 ARSER
ARSER is similar to Cosinor as it is also a parametric
method using harmonic regression [6]. ARSER addition-
ally uses autoregressive spectral estimation to estimate
the period of the data. Nevertheless, it is sensitive to data
quality issues which can lead to inaccuracies. Moreover,
it does not work when there are replicates of data.
2.1.4 JTK CYCLE
JTK CYCLE is a non-parametric method that detects os-
cillations by comparing the ranks of the measured val-
ues to a set of specified symmetric reference curves [7].
JTK CYCLE works well with replicates and missing val-
ues. However, uneven sampling can lead to inaccurate
acrophase estimations.
2.1.5 RAIN
RAIN (Rhythmicity Analysis Incorporating Nonparamet-
ric methods) is an upgraded version of JTK CYCLE us-
ing asymmetric waveforms [8]. RAIN examines the in-
creasing and decreasing portions of the curve separately.
It is more tolerant to data quality issues than JTK CYCLE.
2.1.6 meta2d
Metacycle presents a platform that can be used to merge
the results of different methods. It applies the function
meta2d [9], which is based on Fisher’s combined proba-
bility test to combine the results of ARSER, JTK CYCLE
and/or LS.
2.2 Benchmarking of methods for rhythmicity anal-
ysis
One of the problems of benchmarking of methods on real
data is that the ground truth, e.g., does a specific dataset
reflect oscillatory response or not, is usually not known.
This problem can be solved with the application of syn-
thetic data. Synthetic data presenting rhythmic as well
as arrhythmic time-series can be created with samples
from signals with user-defined parameters (e.g., periods
and amplitudes), and with the addition of Gaussian noise
to recreate the natural as well as technical variance [3].
However, there are certain guidelines that should be fol-
lowed when collecting/generating data for experiments
involving rhythmicity analysis [3]. For example, the data
should be collected/generated for at least two periods to
reduce the sensitivity of computational methods to out-
liers (number of false negatives).
2.3 A computational framework for domain-agnostic
analysis of rhythmic data
We present a Python package RDA (Rhythmic Data Anal-
ysis) with the implementation of different functions that
allow the user to perform rhythmic data analysis using
LS, ARSER, JTK CYCLE, Cosinor, RAIN, and meta2d.
The package can accept the data in generalised input for-
mats suitable for an arbitrary scientific domain, and can
produce different types of visualisation of the obtained
results, namely p-value distributions, Venn diagrams, and
classification performance measures (when the target la-
bels are known). The RDA implementation, documenta-
tion and examples are available athttps://github.
com/VESTUArthur/RDA.
3 Results
3.1 Case study 1: benchmarking of methods on syn-
thetic data
We generated the synthetic data using the functionalities
of the CosinorPy package [10] with different noise lev-
els relative to the oscillation amplitudes (0.3, 0.6, 0.9)
and different number of cosinor components (1, 2 or 3).
The data were generated in a 48 hour interval with a 2
hour sampling resolution. We did not use replicates be-
cause the ARSER does not work with replicated data.
For each configuration we generated 5,000 rhythmic and
5,000 non-rhythmic time-series data, and each configura-
tion was repeated 6 times. The period of rhythmic data
was set to 24 hours. We tested these datasets using differ-
ent methods and we evaluated their performance through
various metrics, such as Matthew’s correlation coefficient
(MCC) [11], area under curve (AUC), precision, recall,
f1-score and accuracy (see Figure 1). Figure 2 presents
the results of MCC assessment for all the experiments.
method method
method method
method method
f1
precision recall
accuracy
auc
mcc
Figure 1: Evaluation of methods on synthetic data generated us-
ing a single-component cosinor model and a noise levels of 0.3.
The experiment was repeated 6 times and error bars represent
standard error of each metric for each model. Abbreviations
and symbols: ARS – ARSER, JTK – JTK CYLCE, LS – Lomb-
Scargle, meta2d – Metacycle, Cosinor – zero-amplitude test us-
ing a multi-component cosinor, Cosinor1 – zero-amplitude us-
ing a single-component cosinor, Cosinor1(q) – model signifi-
cance test using a single-component cosinor, RAIN – Rhyth-
micity Analysis Incorporating Nonparametric methods.
361
mcc
mcc
mcc
mcc
mcc
mcc
mcc
mcc
mcc
1 component, noise = 0.3 1 component, noise = 0.6 1 component, noise = 0.9
2 components, noise = 0.3 2 components, noise = 0.6 2 components, noise = 0.9
3 components, noise = 0.3 3 components, noise = 0.6 3 components, noise = 0.9
Figure 2: Matthew’s Correlation Coefficient (MCC) for each method evaluated on data generated with different number of harmonic
component and noise levels. Each experiment was repeated 6 times and error bars represent standard error of each metric for each
model. Abbreviations and symbols: ARS – ARSER, JTK – JTK CYLCE, LS – Lomb-Scargle, meta2d – Metacycle, Cosinor –
zero-amplitude test using a multi-component cosinor, Cosinor1 – zero-amplitude using a single-component cosinor, Cosinor1(q) –
model significance test using a single-component cosinor, RAIN – Rhythmicity Analysis Incorporating Nonparametric methods.
3.2 Case study 2: application of the framework on
traffic data
In our second case study we applied the proposed frame-
work to the evaluation of rhythmic trends in traffic data.
We performed the analysis of data obtained on different
road segments using Google Directions API [12] as de-
scribed in [13]. Briefly, travel times on a segment were
obtained using a 10 minutes sampling resolution. Route
lengths and travel times were then converted to average
paces for each route and for each time sampled. In the
analysis we employed the same methods as in the first
case study. We tested the capability of each method to
assess the rhythmicity parameters that can be used for in-
terpretation, namely locations of each peak and MESOR
(midline estimating statistic of rhythm) values. Each met-
hod returns different parameters, which can be interpreted
in a similar way (see Table 1).
We had problems using RAIN due to the large amount
of data which caused the memory exceeded error. More-
over, to test the data on ARSER it was necessary to av-
erage the replicates on each timepoint, and fill missing
values. For each method, we plotted the peaks obtained.
Nevertheless, ARSER peak was not plotted since it does
not yield the peak occurrence. Figure 3 indicates the
consistence between the observed data and the assessed
peaks.
We can see that Cosinor yields the most accurate lo-
cations of peaks. Moreover, a multi-component cosinor
model is the only method, that is able to assess the loca-
Table 1: Interpretation of rhythmicity parameters as returned by
the implementation of each method on the selected datset. Ab-
breviations and symbols: ARS – ARSER, JTK – JTK CYLCE,
LS – Lomb-Scargle, Cosinor – multi-component cosinor, Cosi-
nor1 – single-component cosinor, N/A – not available, t(peak)
– occurrence of peak(s), height(peak) – height of peak(s),
MESOR – midline estimating statistic of rhythm.
Method t(peak) height(peak) MESOR
ARS N/A amplitude mean
JTK LAG AMP mean
LS PeakSPD PhaseShiftHeight N/A
Cosinor peak heights mean
Cosinor1 acrophase[h] amplitude mean
tions of multiple peaks per one rhythmicity period. Cosi-
nor also yields the most user-friendly output with exact
locations of peaks and their heights.
4 Conclusions
In this paper we proposed a framework for domain-agnostic
analysis of rhythmic data. We benchmarked the methods
incorporated within the proposed framework using syn-
thetic data. Moreover, we demonstrated the applicability
of the framework on the traffic data reflecting circadian
trends with two distinct peaks per period.
Each of the methods applied in our analysis exhib-
362
 Figure 3: The consistence between the observed data and as-
sessed peaks for each of the selected methods. Black dots rep-
resent the locations of assessed peaks. The analysis was per-
formed on route 0 – a more detailed description of the data and
the location of this route is available at [13].
ited certain advantages in comparison to other methods.
For example, Cosinor can be used to accurately evalu-
ate the rhythmicity parameters, but is in some cases less
successful in classification than other (non-parametric)
methods (see Figure 2). On the other hand, RAIN and
JTK CYCLE were specifically developed for hypothe-
sis testing but yield inconsistent or biased estimations of
rhythmicity parameters [8]. We also tested recently de-
veloped PyBOAT method [14], which can be used to ac-
curately assess rhythmicity parameters, but failed on both
of our case studies. The period obtained with this method
was between 400 and 1000 minutes while the true pe-
riod of the data was 1440 minutes (i.e., 24 hours) in both
cases.
When performing the analyses one must identify its
goals and select the most suitable method accordingly.
Our recommendation is that different methods are used
together to take advantages of each method separately
and thus obtain the results with higher significance. The
proposed framework allows the researcher to do this stra-
ightforwardly.
Our future work will be focused to more detailed ben-
chmarking of the applied methods as well as to the appli-
cation of the proposed framework on other types of rhyth-
mic data. Moreover, our work has been mostly focused to
the identification and characterisation of individual time-
series data. In the near future we will extend the frame-
work and its benchmarking with comparative analyses of
rhythms from different fields of science. Additionally, we
will extend the framework with implementations of addi-
tional state-of-the-art methods.
Acknowledgements
This work has been partially supported by the scientific research
program P2-0359, and by the basic research projects J5-1798,
both financed by the Slovenian Research Agency. A V and LJR
were supported by the Erasmus+ exchange programme of the
European Union. The funding bodies had no role in the design
of the study and collection, analysis, and interpretation of data
nor in writing the manuscript.
References
[1] R. Zhang, N. Lahens, H. Ballance, E. Hughes, and J. Ho-
genesch, “A circadian gene expression atlas in mam-
mals: implications for biology and medicine.,” Proceed-
ings of the National Academy of Sciences, vol. 111, no. 45,
pp. 16219–16224, 2014.
[2] M. Wenwen, J. Zhiwen, C. Yang, C. Li, S. Aziz, and
J. Yuchao, “Genome-wide circadian rhythm detection
methods: systematic evaluations and practical guide-
lines.,” Briefings in Bioinformatics, vol. 22, no. bbaa135,
2021.
[3] M. Hughes, K. Abruzzi, R. Allada, R. Anafi, A. Arpat,
G. Asher, P. Baldi, C. de Bekker, D. Bell-Pedersen,
J. Blau, S. Brown, M. Ceriani, Z. Chen, J. Chiu, J. Cox,
A. Crowell, J. DeBruyne, D. Dijk, L. DiTacchio, and
J. Hogenesch, “Guidelines for genome-scale analysis
of biological rhythms,” Journal of biological rhythms,
pp. 380–393, 2017.
[4] J. T. VanderPlas, “Understanding the lomb–scargle peri-
odogram,” The Astrophysical Journal Supplement Series,
vol. 236, no. 1, p. 16, 2018.
[5] G. Cornelissen, “Cosinor-based rhythmometry,” Theoreti-
cal Biology and Medical Modelling, vol. 11, no. 1, pp. 1–
24, 2014.
[6] R. Yang and Z. Su, “Analyzing circadian expression data
by harmonic regression based on autoregressive spectral
estimation,” Bioinformatics, vol. 26, no. 12, pp. i168–
i174, 2010.
[7] M. E. Hughes, J. B. Hogenesch, and K. Kornacker,
“JTK CYCLE: an efficient nonparametric algorithm for
detecting rhythmic components in genome-scale data
sets,” Journal of biological rhythms, vol. 25, no. 5,
pp. 372–380, 2010.
[8] P. F. Thaben and P. O. Westermark, “Detecting rhythms
in time series with rain,” Journal of biological rhythms,
vol. 29, no. 6, pp. 391–400, 2014.
[9] G. Wu, R. C. Anafi, M. E. Hughes, K. Kornacker, and
J. B. Hogenesch, “Metacycle: an integrated r package to
evaluate periodicity in large scale data.,” Bioinformatics,
vol. 32(21), no. 3351–3353, 2016.
[10] M. Moˇ skon, “CosinorPy: a Python package for cosinor-
based rhythmometry,” BMC bioinformatics, vol. 21, no. 1,
pp. 1–12, 2020.
[11] D. Chicco and G. Jurman, “The advantages of the
matthews correlation coefficient (mcc) over f1 score and
accuracy in binary classification evaluation,” BMC ge-
nomics, vol. 21, no. 1, pp. 1–13, 2020.
[12] Google Inc., “The Directions API overview.”
https://developers.google.com/maps/
documentation/directions/overview, 2022.
[13]
ˇ
Spela Verovˇ sek, M. Juvanˇ ciˇ c, S. Petrovˇ ciˇ c, T. Zupanˇ ciˇ c,
M. Svetina, M. Janeˇ z,
ˇ
Ziga Puˇ snik, N. Velikajne, and
M. Moˇ skon, “An integrative approach to neighbourhood
sustainability assessments using publicly available traf-
fic data,” Computers, Environment and Urban Systems,
vol. 95, p. 101805, 2022.
[14] C. Schmal, G. M¨ onke, and A. E. Granada, “Analysis of
complex circadian time series data using wavelets,” in Cir-
cadian Regulation, pp. 35–54, Springer, 2022.