https://doi.org/10.31449/inf.v48i1.3366 Informatica 48 (2024) 131–140
Generating Lyrics using Constrained Random Walks on a Word Network
Žiga Babnik, Jasmina Pegan, Domen Kos and Lovro Šubelj
Faculty of Computer and Information Science, Večna pot 113, 1000 Ljubljana, Slovenia
E-mail: zb1996@student.uni-lj.si, jp2634@student.uni-lj.si, dk6314@student.uni-lj.si, lovro.subelj@fri.uni-lj.si
Student paper
Keywords: lyrics, lyrics generation, Markov models, networks, network analysis, poetry generation, semantics
Received: November 16, 2020
In this paper we present an approach for automatic lyrics generation. From the American National Corpus
of written texts we build a Word Network, which encodes word sequences. Lyrics are then generated by
performing a constrained random walk over the Word Network. The constraints include the structure of
the generated sentence, the rhythm of the lines of the stanza and the rhymes of the stanza itself. Lyrics are
generated using each constraint individually and also using all three constraints at the same time. We tested
the single-constraint strategies using a toy example, while the results of the joint strategy were subject to
human review. While the given properties of the toy example were kept in the results, replicating the toy
example perfectly proved a difficult task. The results of the questionnaire showed that the lack of a deeper
meaning and strange capitalization were the main reasons that our results did not appear as though they
were written by a human.
Povzetek (summary): Automatic generation of song lyrics is based on constrained random walks over
the Word Network, which is built from the American National Corpus. The lyrics are generated
by taking into account sentence structure, rhythm and rhymes.
1 Introduction
Natural Language Processing (NLP) is becoming a very
popular research field, with many researchers working on
it. Many methods for speech recognition, language
understanding and text generation are being developed. In this
paper we concentrate on the NLP subtask of text
generation. More specifically, we address the problem of
generating lyrics that resemble real lyrics in some way.
Since the development of deep neural networks, most
state-of-the-art approaches to text generation are based
on extracting features with deep learning. Our approach
is instead to use NLP tools and methods to extract the
features manually and pack them into one or more networks,
all containing some information about real lyrics. The main
idea is to use a big data set of existing songs, and possibly
other texts, and construct the needed networks out of them.
With a constrained random walk through these networks we
then generate lyrics for new songs. The nodes in the main
network, which we call the Word Network, are words and the
edges are relations between them. We focus on building
several strategies, where each strategy ensures that one
property of real texts is satisfied, such as rhythm, rhymes
or sentence structure, all of which play a major role in
lyrics. By combining these individual strategies we want to
create a system that generates lyrics which mimic real
lyrics in many different aspects.
In section 2 we review papers relevant to our research.
First we present an overview of the field in section 2.1,
after which we take a closer look at the three most relevant
papers. In section 2.2 we present a paper [9] that introduces
the PoeTryMe poetry generation system, in section 2.3 we
present a paper [8] that introduces the Tra-la-Lyrics song
lyrics generation system and in section 2.4 we present a
paper [6] that introduces a Markov-constraint-based system
for lyrics generation.
In section 3 we present the data used to build the necessary
networks and generate new lyrics, as well as the Word
Network, which is the central data structure of our system.
In section 4 we present our general approach to implementing
a lyrics generation system. First, in section 4.1 we present
how the lyrics structure is generated, which is then used by
the different text generation methods presented in
section 4.2. Finally, in section 4.3 we present how
the generated text is reorganized using the structural
information to produce the final generated lyrics.
In section 5 we present the results of the different
generation strategies. First, in section 5.1 we present an
example of generated lyrics for each of the developed
strategies. In section 5.2 we evaluate the results using a toy
example, and in section 5.3 we evaluate them using public
review. In section 6 we present the main results
of our paper and propose our interpretation of them.
Finally, in section 7 we summarize the paper.
2 Related work
2.1 A Survey on intelligent poetry
generation
The authors of the paper A Survey on Intelligent Poetry
Generation: Languages, Features, Techniques, Reutilisation
and Evaluation [7] give an overview of the area of
intelligent poetry generation. In the paper they discuss
many topics, mainly surrounding different types of poetry,
the structure of poems and how to recognise them. They
also discuss the most commonly formulated features and how
to design a generator which takes into account the features
based on the language of the poem. Also important are the
so-called Content features, which depend on the
grammatical correctness and meaningfulness of the text, and
how to achieve them.
In the second part of the paper they discuss artificial
intelligence techniques for poem generation. One of the
interesting approaches is to use genetic algorithms, where the
population is represented by initial drafts and in each
iteration the most promising texts are kept. New poems are
generated using mutation and crossover operations and
are evaluated by some fitness function. Another approach
is to present generation as a constraint optimization problem
where the constraints are the number of lines, syllables
per line, number of rhymes, etc. The algorithm should
generate a poem such that it optimizes those constraints.
Standard machine learning methods were also used. A Support
Vector Machine (SVM) [11] model was trained on a
poetry corpus and used to predict the next word or
syllable. Language models were also used to generate poetry
texts, represented by Markov models and by
Deep Neural Networks (DNNs), including Recurrent
Neural Networks (RNNs) [10].
In the last part they discuss the evaluation of such
texts, where most of the reliable evaluation is still performed
by humans. They discuss some metrics, which are mostly
used to classify the poem type by measuring the
occurrence of different properties in a generated poem.
2.2 PoeTryMe
In the paper PoeTryMe: a versatile platform for poetry
generation [9] an automated poetry generation system for
Portuguese poetry is presented. It uses a set of seed words
to describe the general context of the goal lyrics, and a
poem template for structure and rhythm. PoeTryMe
supports syllable-based rhythm with no regard to stress
patterns. The grammar and the word relations, represented by
relational triples (node_1, relation_type, node_2), can also be
user-defined.
The paper categorizes poetry generation techniques into
four categories: template-based, where a sentence is
generated in accordance with a template; generate-and-test, where
n sentences are generated and the best is chosen;
evolutionary, where n poems are generated, then the best few
are selected and crossed repeatedly; and case-based
reasoning, which adapts existing songs. The
implementation of PoeTryMe uses three different
strategies to generate lines: basic, which is categorized as
template-based, generate-and-test and an evolutionary
approach. The system is modular: it consists of a sentence
generator, grammar processor, relations manager,
contextualizer, syllables utility, sentiment processor and a
generation strategy [8], already described.
Three generated poems are presented as the results. The
authors confirm that following multiple properties of poetry
such as meaningfulness, grammatical correctness and
poeticness at the same time is hard. PoeTryMe generates
grammatically correct sentences which are somehow related to
the given keywords and at the same time conform to the given
structure. Only the evolutionary approach produces rhymes with
high probability.
2.3 Tra-la-Lyrics 2.0
In the paper Tra-la-Lyrics 2.0: Automatic Generation of
Song Lyrics on a Semantic Domain [8] a system for
automatic generation of lyrics is presented. Tra-la-Lyrics
2.0 generates text with rhymes on a semantic domain with
a given rhythm, based on input music. Its predecessor
Tra-la-Lyrics generated rhymed, rhythmic text based on
stressed syllables with no regard to semantics. The 2.0
version integrates the previous approach with PoeTryMe to
achieve generation of meaningful lyrics on a given topic
with rhythm and rhymes.
Tra-la-Lyrics has two rhyming strategies:
Rhythm+Rhymes (RR) and Generative Grammar (GG).
The RR strategy prefers rhymes at specific parts of the
song. In addition to that, GG sets morphological
constraints. As lyrics are often repetitive, both strategies also
include a repetition parameter.
The implementation of Tra-la-Lyrics 2.0 was derived
from PoeTryMe by changing the algorithm to accept a song
as an input and by creating a new generation strategy which
also considers the rhythm.
Results are again presented in the form of generated
lyrics. The results of Tra-la-Lyrics and Tra-la-Lyrics 2.0
are evaluated empirically and numerically on a number of
points such as rhythm, rhymes, semantics and
meaningfulness. On average Tra-la-Lyrics 2.0 outperforms its
predecessor, but although it shows improvement in
meaningfulness, it is still far from perfect.
2.4 Markov constraints for generating lyrics
In the paper [6] the authors used Markov models to
generate lyrics in the style of existing authors. Since Markov
chains are not suited to satisfying non-local properties of
poems, such as structural constraints, the authors developed
a more advanced framework. Using so-called Constrained
Markov Processes (CMP) they generated texts that were
consistent with the corpus. The idea is to represent the
problem as a constraint satisfaction problem. A Markov
probabilistic model is then built in two steps. They presented
two different types of constraints. The first one replaces the
transition probability in the standard Markov model. It is called
a Markov constraint and, besides the transition probability, also
holds a constraint variable. The other type is called a Control
constraint, which needs to be satisfied in some specific state.
The Markov constraints on each transition are then set so
that they satisfy the Control constraints. Using these techniques
they were able to keep structural properties of the poems
such as rhyme and rhythm.
They also demonstrated the methods and evaluated them.
The evaluation was again performed manually, by 12
volunteers.
3 Data
Our approach is based on a constrained random walk over
the so-called Word Network. We have to ensure that the
Word Network is large enough for us to
perform the constrained random walks on it, so it needs to be
constructed from a large data set. We chose the Open American
National Corpus data set [3], which contains over 6000 texts from
different domains, totaling around 11 million words.
Since our approach tries to generate text that mimics
lyrics in some property, we also need a data set that
includes lyrics, from which we can extract these
properties. We chose the Song Lyrics data set [5], available
on the Kaggle platform. The data set includes lyrics from
49 different artists such as Adele, The Beatles, Bob Marley
and many others, collected from free online lyrics hosting
websites using a Python script. For each artist a single text
file is available that contains lyrics from several songs of
the artist. Since the data structures built from this data set
are specific to certain sub-tasks of our approach, we will
introduce them later on.
3.0.1 Word network
The Word Network is a directed network that represents the
dependencies between single words in the texts. The nodes
in the network represent individual words, while the links
show whether two words appear in the texts one after another.
To build such a network we first tokenize each sentence of
the texts. We then construct a list of all word tuples, such
that the first word in the tuple is directly followed by the
second word in the tuple in the texts. To build the network
we then iterate over all such tuples, adding individual words
as nodes in the network, where each word node gets the
following attributes: the Part-of-Speech tag (POS tag) of
the word and a list of all possible phonemes of the word.
After adding both words from the tuple into the network
we do the following: if a link already exists between
the words in the network, we increase the weight of the link
by one; if a link does not exist, we simply
add it with weight equal to one.
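The construction described above can be sketched in a few lines of Python. This is only a minimal illustration under our own assumptions: the function name build_word_network is hypothetical, sentences are assumed to be already tokenized, and the POS-tag and phoneme attributes of the nodes are left out.

```python
from collections import defaultdict


def build_word_network(sentences):
    """Build a weighted directed network of consecutive words.

    `sentences` is a list of token lists (tokenization is assumed
    to have happened already). Node attributes such as POS tags
    and phonemes are omitted in this sketch.
    """
    # links[u][v] counts how often word v directly follows word u.
    links = defaultdict(lambda: defaultdict(int))
    nodes = set()
    for tokens in sentences:
        nodes.update(tokens)
        # zip pairs every word with its immediate successor.
        for first, second in zip(tokens, tokens[1:]):
            # A missing link starts at 0, so the first sighting
            # yields weight 1 and later sightings add 1 each.
            links[first][second] += 1
    return nodes, links


sentences = [["the", "cat", "sat"], ["the", "cat", "ran"]]
nodes, links = build_word_network(sentences)
```

In this toy input the link from "the" to "cat" is seen twice and therefore gets weight 2, while the links leaving "cat" each get weight 1.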
Table 1 presents some basic statistics of the Word Network,
while Figure 1 presents the indegree and outdegree
distributions of the network.
Statistic                            Result
Number of nodes (n)                  60115
Number of links (m)                  2357451
Average degree (k)                   39
Density (ρ)                          0.00065
Number of nodes in LCC               60111
Average clustering coefficient (C)   0.467
Table 1: Basic statistics of the Word Network
Figure 1: Indegree and outdegree distributions
From the statistics we see that the Word Network is quite
dense, which is a desirable property for our approach. To
calculate the number of nodes in the largest connected
component (LCC), we first turned the Word Network into an
undirected network. We see that almost all of the words are
within the largest connected component. Finally, we see that the
degree distributions roughly follow a power-law distribution.
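As a quick consistency check, the average degree and density in Table 1 can be recomputed from the node and link counts. The formulas used here, k = m / n and ρ = m / (n (n − 1)) for a directed network without self-loops, are our assumption; the text does not state which convention was used, but these reproduce the reported values.

```python
# Values taken from Table 1.
n = 60115        # number of nodes
m = 2357451      # number of links
lcc = 60111      # nodes in the largest connected component

# Assumed conventions for a directed network without self-loops.
avg_degree = m / n                 # about 39.2
density = m / (n * (n - 1))        # about 0.00065
lcc_share = lcc / n                # share of nodes inside the LCC

print(f"k = {avg_degree:.1f}, rho = {density:.5f}, LCC share = {lcc_share:.5f}")
```

Under these conventions only four of the 60115 words fall outside the largest connected component.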
4 Methods
Our approach consists of three stages. In the first stage the
general structure of the lyrics is generated. Here we obtain
how different stanzas, such as the chorus and the
verse, follow each other and how many lines are
contained in each of them. This information is fed to the
second stage, which generates lines for each stanza in the lyrics
structure. The third stage then collects the lines and stacks
them according to the lyrics structure, while also adding
details such as capitalization and commas. Figure 2 shows
visually how our approach is structured at the highest level.
4.1 Generating lyrics structure
The approach used to generate the lyrics structure uses a
simple network called the Structure Network. The Structure
Network is a directed network which contains the four
most basic blocks of lyrics: intro, verse, chorus and bridge.
Figure 3 shows how these nodes are connected and the
weights of each connection. The Structure Network was
handcrafted and represents only a rough approximation of
how a song can be structured.
Figure 2: Visualization of approach pipeline
To generate the lyrics structure, a random walk starting
from the intro node was performed. The walk was stopped
once more than five steps had been performed and the currently
observed node was not verse, since we did not want our
lyrics to end with a verse.
Figure 3: Visualization of the Structure Network
After the lyrics structure was generated, we also
generated the number of lines each part should contain. This was
done simply by randomly selecting a number in a given
interval. The interval was defined as [3, 6] for verse, chorus
and bridge, while for intro it was defined as [2, 4].
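The structure walk and the line-count selection can be sketched as follows. The link weights in STRUCTURE are hypothetical placeholders, since the values from Figure 3 are not reproduced in the text; the stopping rule and the intervals follow the description above. Drawing one line count per unique part is also our assumption.

```python
import random

# Hypothetical link weights; the real ones are given in Figure 3.
STRUCTURE = {
    "intro": {"verse": 1.0},
    "verse": {"chorus": 0.7, "verse": 0.2, "bridge": 0.1},
    "chorus": {"verse": 0.6, "bridge": 0.3, "chorus": 0.1},
    "bridge": {"chorus": 0.6, "verse": 0.4},
}
# Closed intervals for the number of lines per part, from the text.
LINE_INTERVALS = {"intro": (2, 4), "verse": (3, 6),
                  "chorus": (3, 6), "bridge": (3, 6)}


def generate_structure(rng=random):
    """Random walk from intro; stop after more than five steps,
    but never on a verse."""
    node, parts, steps = "intro", ["intro"], 0
    while True:
        successors = STRUCTURE[node]
        node = rng.choices(list(successors),
                           weights=list(successors.values()))[0]
        parts.append(node)
        steps += 1
        if steps > 5 and node != "verse":
            return parts


def line_counts(parts, rng=random):
    """Draw one line count per unique part (an assumption)."""
    return {part: rng.randint(*LINE_INTERVALS[part])
            for part in dict.fromkeys(parts)}


parts = generate_structure()
counts = line_counts(parts)
```

Because every verse has a non-verse successor in this sketch, the walk always terminates eventually.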
4.2 Generating text with certain properties
In the following section we present approaches for
generating texts with certain properties. These properties include
proper sentence structure, rhymes and rhythm. We
introduce strategies for generating lyrics that take only one of
these properties into consideration and also a joint strategy
which takes all three into consideration.
4.2.1 Generating text with proper sentence structure
We propose a model that takes into account the sentence
structure present in individual lines of lyrics. Alongside the
Word Network this strategy also uses the Part-of-speech tag
network or POS-tag network.
The POS-tag network is a directed network that contains
information about the sequences of line structures we can
observe in lyrics. First, each line in the lyrics is
represented as a sequence of Part-of-Speech (POS) tags that tell
us the structure of the line. The POS tag sequences of lines
are then added to the network as nodes. We create a link
between POS tag sequences X and Y if, in the
lyrics, POS tag sequence Y comes directly after POS tag
sequence X. By performing a random walk over such a
network we guarantee not only the proper structure of each
individual line, but also a proper ordering of lines.
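A minimal sketch of this construction is given below. It assumes the lyrics have already been POS-tagged line by line, and it counts how often one line structure follows another; the text does not say whether the links of the POS-tag network are weighted, so the counts are our assumption, as is the function name.

```python
from collections import defaultdict


def build_pos_tag_network(tagged_songs):
    """Link POS-tag sequences of consecutive lines.

    `tagged_songs` is a list of songs, each song a list of lines,
    each line a list of POS tags (tagging is assumed done already).
    """
    links = defaultdict(lambda: defaultdict(int))
    for song in tagged_songs:
        # Tuples are hashable, so a whole line structure can be a node.
        sequences = [tuple(line) for line in song]
        for current, following in zip(sequences, sequences[1:]):
            links[current][following] += 1
    return links


songs = [
    [["DT", "NN"], ["PRP", "VBP"], ["DT", "NN"]],
    [["DT", "NN"], ["PRP", "VBP"]],
]
pos_network = build_pos_tag_network(songs)
```

Each node is an entire line structure, so a walk over this network yields one POS-tag sequence per line.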
To generate text for a given part, we first generate a
sequence of line structures using the POS-tag network. This
is done using a simple random walk over the network, where
the number of steps equals the length of the given part. The
walk generates all the needed constraints for this strategy.
Once the constraints, in the form of the POS-tag structure of
individual lines, have been generated, we perform a
constrained random walk on the Word Network. For each line
a separate constrained random walk is performed. The first
word of a line is chosen randomly among all words with
the proper POS tag. For each successor we search among
all neighbors of the current word that again have the proper
POS tag. If such a neighbor does not exist, we start the walk
for the current line from the beginning.
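The constrained walk for a single line can be sketched as below. Several details are our assumptions rather than statements from the text: successors are drawn in proportion to link weights, each word has a single POS tag stored in the hypothetical mapping pos_of, and a max_restarts cap is added so the sketch always terminates.

```python
import random


def generate_line(pos_sequence, links, pos_of, rng=random, max_restarts=1000):
    """Constrained random walk producing one line of words.

    links[u][v] is the weight of the link u -> v, pos_of[w] is the
    POS tag of word w. Returns None if no line is found.
    """
    starts = [w for w, tag in pos_of.items() if tag == pos_sequence[0]]
    if not starts:
        return None
    for _ in range(max_restarts):
        line = [rng.choice(starts)]
        for tag in pos_sequence[1:]:
            # Keep only the neighbors carrying the required POS tag.
            candidates = {w: weight
                          for w, weight in links.get(line[-1], {}).items()
                          if pos_of.get(w) == tag}
            if not candidates:
                break  # dead end: restart this line from the beginning
            line.append(rng.choices(list(candidates),
                                    weights=list(candidates.values()))[0])
        else:
            return line  # every position was filled
    return None


links = {"the": {"cat": 2, "runs": 1},
         "cat": {"runs": 1, "the": 1},
         "runs": {"the": 1}}
pos_of = {"the": "DT", "cat": "NN", "runs": "VBZ"}
line = generate_line(["DT", "NN", "VBZ"], links, pos_of)
```

On this toy network the only admissible line for the structure DT NN VBZ is "the cat runs".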
4.2.2 Generating text based on rhyme scheme
We first defined three types of rhymes. Since the Word
Network is constructed from general texts and not lyrics, we do
not expect to find many neighbouring words that
correspond to a perfect rhyme. That is why we allow three types
of rhymes. The first one is a perfect rhyme, which is defined
as a rhyme where the stressed vowels and any succeeding
consonants are identical, e.g. believe and conceive [4]. The
second rhyme is called assonance or a vowel rhyme. It is a
rhyme in which the same vowel sounds are used with
different consonants in the stressed syllables of the rhyming
words, e.g. penitent and reticence [1]. The third rhyme is
called consonant rhyme and is the repetition of consonants
or consonant patterns, especially at the ends of words, e.g.
bell and ball [2].
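The three rhyme types can be told apart with a short phoneme comparison. The sketch below assumes ARPAbet-style phoneme lists in which vowels end in a stress digit (1 for primary stress); this representation, the function names and the hand-written transcriptions are our assumptions, and the consonant-rhyme check is simplified to the final consonant cluster.

```python
def is_vowel(phoneme):
    # In ARPAbet-style notation only vowels carry a stress digit.
    return phoneme[-1].isdigit()


def rhyme_tail(phonemes):
    """Phonemes from the last primary-stressed vowel to the end."""
    vowels = [i for i, p in enumerate(phonemes) if is_vowel(p)]
    if not vowels:
        return list(phonemes)
    stressed = [i for i in vowels if phonemes[i].endswith("1")]
    start = stressed[-1] if stressed else vowels[-1]
    return list(phonemes[start:])


def final_consonants(phonemes):
    """Consonants after the last vowel of the word."""
    tail = []
    for p in reversed(phonemes):
        if is_vowel(p):
            break
        tail.append(p)
    return tail[::-1]


def rhyme_type(a, b):
    """Return 'perfect', 'assonance', 'consonant' or None."""
    tail_a, tail_b = rhyme_tail(a), rhyme_tail(b)
    if tail_a == tail_b:
        return "perfect"
    if (is_vowel(tail_a[0]) and is_vowel(tail_b[0])
            and tail_a[0][:-1] == tail_b[0][:-1]):
        return "assonance"  # same stressed vowel, different consonants
    cons_a, cons_b = final_consonants(a), final_consonants(b)
    if cons_a and cons_a == cons_b:
        return "consonant"  # same word-final consonants
    return None


believe = ["B", "IH0", "L", "IY1", "V"]
conceive = ["K", "AH0", "N", "S", "IY1", "V"]
bell, ball = ["B", "EH1", "L"], ["B", "AO1", "L"]
lake, fate = ["L", "EY1", "K"], ["F", "EY1", "T"]
```

With these transcriptions, believe and conceive come out as a perfect rhyme and bell and ball as a consonant rhyme; lake and fate are our own illustrative assonance pair.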
Our strategy generates words in such an order that they
follow the rhyme scheme we chose. After defining the
number of words in a line and the rhyme scheme, e.g. ”ABBA”,
we generate our lyrics by randomly choosing a starting node
in the Word Network. As in a random walk, we choose each
successor by taking into account the weights of the edges.
When we reach the last word of a line, we choose it
only from the successors that do not violate the rhyme
scheme. In this step we tried choosing the successor both uniformly
at random and by weighting each rhyme. Most stop
words have high weights in the Word Network, they are
usually short, and by our definitions most of them rhyme. That
is why they were chosen in most cases when weighting was
used. The strategy generated more natural rhymes when the
rhyming successors were chosen uniformly at random.
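The choice of the line-final word can be sketched as follows. The function pick_line_end and its arguments are hypothetical: successors maps candidate words to link weights, line_ends remembers the first line-final word chosen for each letter of the rhyme scheme, and rhymes is any predicate implementing the rhyme definitions above. The toy predicate used here only compares last letters and is a stand-in for a phoneme-based test.

```python
import random


def pick_line_end(successors, letter, line_ends, rhymes,
                  rng=random, weighted=False):
    """Pick a line-final word that respects the rhyme scheme.

    Returns None if no successor rhymes with the earlier line
    that carries the same scheme letter.
    """
    anchor = line_ends.get(letter)
    # A letter seen for the first time imposes no constraint yet.
    candidates = [w for w in successors
                  if anchor is None or rhymes(anchor, w)]
    if not candidates:
        return None
    if weighted:
        word = rng.choices(candidates,
                           weights=[successors[w] for w in candidates])[0]
    else:
        word = rng.choice(candidates)
    line_ends.setdefault(letter, word)
    return word


def same_last_letter(a, b):
    # Toy stand-in for a real rhyme test.
    return a[-1] == b[-1]


successors = {"the": 50, "me": 3, "risk": 1}
line_ends = {"A": "be"}
word = pick_line_end(successors, "A", line_ends, same_last_letter)
```

With weighted=True the heavy stop word "the" would be picked about 94% of the time in this toy case (50 out of 53), which mirrors the dominance of stop words noted above; the uniform choice gives "me" an equal chance.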
4.2.3 Generating text with rhythm
We propose a model that generates lyrics based on a given
rhythm. The model uses the Word Network and an
additional network storing rhythm data. The Rhythm network is
a weighted directed network that represents rhythms found
in the lyrics. It consists of three nodes: −1 represents the start
or end of a line, 0 stands for an unstressed syllable and 1
indicates a stressed syllable. The weights of the edges were
obtained by calculating the normalized number of transitions
between the corresponding syllables. We can use this
network to generate a random rhythm by starting at node −1
and choosing a random neighbor, taking into account the
corresponding edge weights. When we reach −1 again, the
generated line of rhythm is complete. The generated line
of rhythm is then corrected to include more repetition,
which a rhythm needs in order to feel natural. The first n
syllable stresses are chosen as the baseline rhythm and are then
propagated throughout the rhythm line with 70%
probability. The number of syllables n is chosen randomly
between 2 and 4.
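Both steps, the walk over the Rhythm network and the repetition correction, can be sketched as below. The transition weights in RHYTHM are hypothetical, since the text derives them from lyric data without listing them. How the baseline is propagated is also our reading: each later position is overwritten by the corresponding baseline stress with probability 0.7.

```python
import random

# Hypothetical transition weights between line boundary (-1),
# unstressed (0) and stressed (1) syllables.
RHYTHM = {
    -1: {0: 0.6, 1: 0.4},
    0: {0: 0.2, 1: 0.6, -1: 0.2},
    1: {0: 0.6, 1: 0.2, -1: 0.2},
}


def random_rhythm(rng=random):
    """Walk from -1 until -1 is reached again; return the stresses."""
    line, node = [], -1
    while True:
        successors = RHYTHM[node]
        node = rng.choices(list(successors),
                           weights=list(successors.values()))[0]
        if node == -1:
            return line
        line.append(node)


def add_repetition(line, rng=random, keep_probability=0.7):
    """Propagate the first n stresses through the rest of the line."""
    n = rng.randint(2, 4)
    baseline = line[:n]
    result = list(line)
    # Positions before n form the baseline and are left untouched.
    for i in range(n, len(line)):
        if rng.random() < keep_probability:
            result[i] = baseline[i % n]
    return result


rhythm = add_repetition(random_rhythm())
```

Since node −1 has no link to itself in this sketch, every generated line contains at least one syllable.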
The rhythm-based model has two variants: one is given
a rhythm and the other generates the rhythm for each verse.
Each variant has two sub-variants, one uses random walk
strategy and the other uses POS tag strategy .
First, the rhythm is acquired in the form of a string where 0
stands for an unstressed syllable and 1 for a stressed
syllable. If the rhythm is given, it is expanded to match the
line length. Otherwise, a random line rhythm is generated
from the Rhythm network for each verse of generated text.
We start with a random node from the Word Network or the
POS-tag network, depending on the variant, and expand the
line with successors that match the required rhythm.
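A minimal sketch of the two operations described above: expanding a given rhythm to the line length and filtering successors by their stress pattern. The helper names and the stress_of mapping (word to stress string) are hypothetical; in practice the stress patterns would come from the phoneme attributes of the Word Network nodes.

```python
from itertools import cycle, islice


def expand_rhythm(pattern, syllables):
    """Repeat a rhythm string until it covers `syllables` syllables."""
    return "".join(islice(cycle(pattern), syllables))


def fitting_successors(successors, stress_of, remaining):
    """Keep successors whose stress pattern matches the start of
    the rhythm that is still to be filled."""
    return [w for w in successors
            if w in stress_of and remaining.startswith(stress_of[w])]


line_rhythm = expand_rhythm("011", 8)
stress_of = {"about": "01", "table": "10", "the": "0"}
fits = fitting_successors(["about", "table", "the"], stress_of, "0110")
```

Here "table" is rejected because its stressed first syllable does not fit a rhythm that starts with an unstressed one.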
4.2.4 Generating text with multiple constraints
Our last strategy combined all of the above constraints:
structure, rhythm and rhyme. Each line was generated such that
it took into account the structure, the rhythm and the rhyme.
Although the network is relatively large, it sometimes
happened that none of the successors satisfied all the
constraints. In that case we performed a random jump and
started generating the current line again. This was repeated
until the generated text satisfied all the constraints.
4.3 Constructing lyrics from generated text
In the final phase of our approach we combine the
generated text with the structure of the lyrics to produce proper
lyrics. First we reorder the text according to the lyrics
structure, so that the parts properly follow each other. Each
part also gets an annotation in the form of [<part name>].
The ordered and annotated text is capitalized using a simple
POS-tag heuristic, where we simply check the tag of each
word. If the tag of the word is NN or NNS, or if the word is
the first in the line, we capitalize it. Commas are also added to
each line, as well as a line separator between each part.
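The capitalization heuristic and the part annotation can be sketched as follows. The function names are hypothetical, and ending the last line of a part with a period instead of a comma is inferred from the printed examples rather than stated in the text.

```python
def format_line(tagged_words, last=False):
    """Capitalize nouns and the first word; add end punctuation."""
    words = []
    for i, (word, tag) in enumerate(tagged_words):
        if i == 0 or tag in ("NN", "NNS"):
            # Uppercase only the first letter, keep the rest as is.
            word = word[:1].upper() + word[1:]
        words.append(word)
    # Assumption from the examples: the last line ends with a period.
    return " ".join(words) + ("." if last else ",")


def format_part(name, lines):
    """Annotate a part with [<part name>] and format its lines."""
    formatted = [format_line(line, last=(i == len(lines) - 1))
                 for i, line in enumerate(lines)]
    return "\n".join([f"[{name}]"] + formatted)


part = format_part("intro", [
    [("the", "DT"), ("cat", "NN"), ("sat", "VBD")],
    [("dogs", "NNS"), ("run", "VBP")],
])
print(part)
```

Capitalizing every word tagged NN or NNS is what produces the unusual mid-line capitals visible in the generated lyrics.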
The text generators described previously generate text
for each unique part only once, meaning that if the lyrics
contain, for instance, more than one chorus, all would contain
the same text. To avoid exact repetition, five to ten words
in each part were chosen at random to be replaced. The
whole text of the part, along with the list of words to replace,
was then sent back to the generator. The generator then
replaced the words according to the same constraints under
which the text was originally generated.
5 Results
In the following section we present the results of each
strategy described in section 4.2. For each strategy we present
lyrics generated by that strategy.
In order to evaluate how well the strategies work, we
devised two different evaluation approaches. The first
approach, using a toy example, was applied to all strategies
where only one property of lyrics was being considered.
The second approach, using public review, was applied to the
strategy which considered all three properties.
5.1 Lyrics generated by strategies
In the following section we present lyrics generated by each
of the strategies developed.
5.1.1 Lyrics generated by the sentence structure
strategy
The lyrics generated based on the sentence structure
strategy are presented below.
[intro]
Benjamin upward Half as described a Variety,
Carriages looser Sugar as,
Nov Angstroms o Ja Et Banditry,
Newsgroups has bacall Whitney.
[verse]
Okay like newsweek explain what you awoke,
Handhold Inspire Confidence Nudge between Gore Camp s play it s,
Think I this Observation shows helplessly he co their Interfaces,
Mine Everybody harsher Children,
Professor tour Hitler S to have,
Concludes it s my E.
[chorus]
Tina would have their Security Capital Sneeze Bob Bookies won T,
Hottest Tech Approach for nettlesome Human,
Versa Whereas outfit Blame for directional the Apparatus,
Lorge Jr Call Attention Let s h,
Craigie the Membrane regular blazes the Funding Mix.
[bridge]
Postage in Yakovlev for Discounting,
Sloth less stringent E ether Inhibitory,
Boris a Tampax described i attend academic Literature Briefly back mr Morris I think they.
[verse]
Method like newsweek explain what you awoke,
Handhold Inspire Confidence Ricky between Gore Camp s play it s,
Think I this Detainee chides liter he co their Interfaces,
Mine Encyclopaedia Hanover Campsites,
Professor tour Hitler S to have,
Concludes it arsenal my E.
[chorus]
Tina could have their Security oilseed Danforth Bob Bookies won T,
Freshener tech Airport for nettlesome Human,
Teacher whereas Outfit Blame for directional Argentina Apparatus,
Strove Jr Call Attention Let s h,
Craigie the Neville Sari blazes the Funding Mix.
While the generated sentences might follow the correct
POS tag structure, it is clear that this constraint is not strong
enough to enable the generation of lyrics that would to some
extent resemble real lyrics. Generating meaningful lyrics
is not one of our main goals; rather, we want individual lines
and smaller building blocks to resemble real parts of
lyrics. This generation strategy is not informative enough
to enable the generation of such lyrics.
5.1.2 Lyrics generated by the strategy based on the
rhyme scheme
In the example below we can observe the lyrics generated
by our strategy with a predefined rhyme scheme, which is
”ABA” for intro, verse and chorus, while ”ABABAB” was
used for the bridge.
[intro]
Shrug Democrats have and when in,
Voided the strengthening their biggest Risk,
Tetrads were controlled Trials since when.
[verse]
Plumage of ethical Story,
Stoneburner and the to,
Somehow forced the Study.
[bridge]
Randy s no Consensus that federal Reserve,
Lighting Shows which the Elk they be,
Summits to any other Poet who was,
Coelho but if you find Anybody who,
Dystrophy Pages for personal favorite Newspaper but,
Retried unpopular Gingrich in faces some Corroboration.
[chorus]
Lousiness impregnable the Incorrect,
Kolb and other racial,
Berri for Performance Checked.
[bridge]
Randy s no Consensus that federal Reserve,
Lighting Shows which the Elk they be,
Summits to all other Poet who preserve,
Coelho if you find Anybody than me,
Dystrophy Pages for personal favorite Newspaper Reserve,
Retried unpopular Gingrich in faces some be.
We can observe that some rhymes are more natural, like
”be-me” and ”preserve-reserve”, while others satisfy
the definition but are not so natural to read, e.g. ”in-when”.
5.1.3 Lyrics generated by the rhythm-based strategy
We tested our rhythm-based strategy. First we look at the
variant of the algorithm which accepts a rhythm as an input.
When given the rhythm ”011” of Prešeren's Povodni mož,
the random walk sub-version returns the following
example.
[intro]
Duked out on the pre Birth Control of a Round Golf,
To find a Campaign seems and a Union Rules in,
The Lung lymphocytes a Tractor it is so says,
The good Seafood in S a new Car are not the.
[verse]
Kasparov was Things on the Times you about the,
Subject Matter when just Delights in e big Hot,
And more Years make are Fans will find both good or wade.
[chorus]
Embodiment of Error R a relaxed Dole,
Campaign had to turn out the Sum up to be those,
Of south will its Roots to its Entries and but and,
A M Rosenthal who was true but impressed by,
The Bush s alleged new Account for the six Lines,
Though a new Environment Act at the Field flat.
[bridge]
Nationals leaving its Building overlooks this,
Has the brooklyn Heights or a large Part of Spread his,
It a Ratio of the new Products that we,
Punished the Schedule slipped across Species female,
In the black and it a Policewoman who did,
Seduce Frank in one Year Alumni and Truck on.
[verse]
Southerner does Things on the Times you intend the,
Subject about when that delights in Take big Love,
And more Years make are Goals will find both good or wade.
[chorus]
Embodiment of Error R to whether Dole,
Campaign had to turn out the Sum up may be those,
Of Range will its Roots to Time Limit and but and,
A M Rosenthal who was true but impressed by,
The Bush his alleged new Account for the six Sets,
Though find new Environment Act at the Field flat.
The text can definitely be read in the given rhythm, but
some words are accented in an unusual way. Some words
even change their meaning by being accented differently,
e.g. ”subjécts” (verb) vs. ”súbjects” (noun). The POS-tag
sub-version returns similar results.
We also tested the version of the algorithm that generates
the rhythm from the Rhythm network. We limited the
number of syllables per line to no less than 3 and no more than
12. The random walk sub-version returns the following
example.
[intro]
Peaking they the Engagement,
Such the regional Train in,
Same Fighters that the Cycle,
Genes the Terms the Reporting.
[verse]
Bard on R S back up quick Rise,
Up an of the Mars is deemed key,
Sounds and seek to such as with a.
[chorus]
Bags of Store for a Print Ad the Need for the far more than for,
Non u s next such as shown were the right Heart Rate Hike to the,
Drug Use thanks the Half of slate S Trees that was you help be blurred.
[verse]
Oath on R S back up quick Rise,
Up an Ed the Chain were deemed key,
Al and Seek to such as with a.
[chorus]
Bags of Store must a Sand Ad we need for the far more than for,
Non u S of such as shown were cool right Heart Rate Hike to the,
Drug Use thanks the black of slate S Trees that was you help be blurred.
[bridge]
Au Buisson de mi usa and was low in Kosovo,
And Critics were white House from the State Bar is not a Detour,
To win and leaf of the Floor to see a four Years the Excess,
Hair were sized to what this Test Strips of the n four the Upswing,
Is the most to the Inn Chain had not flinch the Bruce Townsend had.
This nicely emulates the fact that song lyrics are not fixed
in structure. But although the line lengths are more relaxed,
the rhythm does not flow as we would expect. There are
no obvious differences between these results and those of
the POS-tag sub-version.
5.1.4 Lyrics generated by the combined strategy
Our last strategy combined all previous constraints:
sentence structure, rhyme and rhythm. Lyrics generated by this
strategy are presented below.
[intro]
Does the Coast a Veto ugh the rayed Moth Mouse Rat,
Hora Asthma have little of it demand Side,
If beck a Rosewood the Kiryat to upgrade my,
Exam on Police then has an Iron Mask.
[verse]
Thumbs Share the Extent other sift the Koran Cliff,
In their Favor Growth even sulk private all Non,
Depressed Stock and Health all this was each Survey consists,
Response Part of where the Pursuit getting the bronzed,
Effects of the Stock Linda this nazis Tees in,
Its fred O and we have a low Signal Fig Wasps.
[chorus]
Its Thank the Bloomberg Years of a Story one Wife,
And knew that if all Lawyer now a Decade long,
And glass the perfect Weather and his Approach Try,
His Rome Site the Role of no Terms meant a Boat was,
Both Rest Till de la vita with Risk of the Iles,
Despite Market to the Success of his Bork was.
[bridge]
It and Symbols bose boris worm Culture through Feed,
A Means of wednesday Gen we obtain a,
Rigdon over in Cris the Midline us is sir,
The real Time of the Gene are shown gown within Reach.
[verse]
Thumbs Share the Extent other fro the Koran Cliff,
In their Janeway Growth even sulk private all Non,
Shuffled Stock and Health all this was each Survey consists,
Response Part of where the Panzer getting those run,
Effects of the Stock Linda this nazis Tees in,
Its fred O and we have a low Proscribes Fig Wasps.
[chorus]
Them thank the bloomberg Years of these Story one Wife,
And knew that if all Lawyer now the Decade long,
And glass the perfect Weather and his Approach Enough,
Me qualms soo the Role of no Terms meant a Boat was,
Both Rest Till de lymph vita with Risk of the Prescribed,
Despite Market to the Success of psi Ip was.
It was quite difficult to satisfy all the constraints, and
thus some of the lines are a bit strange. Even before joining
the strategies it was hard to find a perfect rhyme on the
given Word Network. The additional constraints eliminated
even more of the possible candidates, so the rhymes
are mostly composed of stop words and common words.
5.2 Evaluation using a toy example
In the following section we present the results of evaluating
the three single-constraint strategies using a toy example.
As the example we chose the following short stanza.
The itsy bitsy spider crawled up the water spout.
Down came the rain, and washed the spider out.
Out came the sun, and dried up all the rain,
and the itsy bitsy spider went up the spout again.
First, the Toy-Word Network was built from the toy example.
Then the POS-tag sentence structure, the rhyme scheme and
the rhythmic scheme were extracted from each line of the
toy example. The extracted properties were then used as
constraints in each individual strategy to perform the
constrained random walk over the Toy-Word Network.
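The construction described above can be illustrated with a minimal sketch. It assumes that a node is a lowercased word, that an edge links two consecutive words within a line, and that the edge weight counts how often the pair occurs; the names `build_word_network` and `random_walk` are hypothetical and are not taken from our implementation.

```python
import random
import re
from collections import defaultdict

TOY_STANZA = [
    "The itsy bitsy spider crawled up the water spout.",
    "Down came the rain, and washed the spider out.",
    "Out came the sun, and dried up all the rain,",
    "and the itsy bitsy spider went up the spout again.",
]

def build_word_network(lines):
    """Directed word network: edges[u][v] = number of times v follows u."""
    edges = defaultdict(lambda: defaultdict(int))
    for line in lines:
        words = re.findall(r"[a-z]+", line.lower())
        for u, v in zip(words, words[1:]):
            edges[u][v] += 1
    return edges

def random_walk(edges, start, length, rng):
    """Weighted random walk; stops early at a word without successors."""
    walk = [start]
    while len(walk) < length and edges.get(walk[-1]):
        successors = edges[walk[-1]]
        nxt = rng.choices(list(successors), weights=list(successors.values()))[0]
        walk.append(nxt)
    return walk

network = build_word_network(TOY_STANZA)
print(" ".join(random_walk(network, "the", 8, random.Random(0))))
```

This walk is unconstrained; each strategy then restricts which successors may be chosen at every step.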
5.2.1 Results of the sentence structure strategy
The stanza generated using the sentence structure strategy
is presented below.
The Itsy Bitsy Spider crawled up the Water Spout,
Down came the Rain and washed the Spider out,
Out came the Rain and dried up all the Sun,
And the Itsy Bitsy Spider went up the Spout again.
The resulting stanza is very similar to the original, indicating
that the sentence structure extracted from a toy example
represents quite a strong constraint. There is only small
variation between runs: most runs produce a stanza that
differs from the original toy example in only a few words,
and occasionally the result matches the toy example perfectly.
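To illustrate how strong this constraint is, the following sketch enumerates every walk on a toy network that reproduces the tag sequence of the third line of the stanza. The tags in `POS` are hand-assigned and simplified to one tag per word for this illustration only; a real tagger may label some words differently, and the helper names are hypothetical.

```python
import re
from collections import defaultdict

TOY_STANZA = [
    "The itsy bitsy spider crawled up the water spout.",
    "Down came the rain, and washed the spider out.",
    "Out came the sun, and dried up all the rain,",
    "and the itsy bitsy spider went up the spout again.",
]

# Assumption: hand-assigned, simplified tags (one tag per word).
POS = {
    "the": "DET", "all": "DET", "and": "CONJ",
    "itsy": "ADJ", "bitsy": "ADJ",
    "spider": "NOUN", "water": "NOUN", "spout": "NOUN",
    "rain": "NOUN", "sun": "NOUN",
    "crawled": "VERB", "came": "VERB", "washed": "VERB",
    "dried": "VERB", "went": "VERB",
    "up": "ADV", "down": "ADV", "out": "ADV", "again": "ADV",
}

def build_successors(lines):
    """Map each word to the set of words that follow it within a line."""
    succ = defaultdict(set)
    for line in lines:
        words = re.findall(r"[a-z]+", line.lower())
        for u, v in zip(words, words[1:]):
            succ[u].add(v)
    return succ

def matching_walks(succ, tags):
    """Return all walks whose i-th word carries the i-th tag."""
    def extend(walk):
        if len(walk) == len(tags):
            yield tuple(walk)
            return
        wanted = tags[len(walk)]
        for nxt in sorted(succ.get(walk[-1], ())):
            if POS[nxt] == wanted:
                yield from extend(walk + [nxt])
    starts = sorted(w for w, t in POS.items() if t == tags[0])
    return [walk for s in starts for walk in extend([s])]

succ = build_successors(TOY_STANZA)
line3 = re.findall(r"[a-z]+", TOY_STANZA[2].lower())
pattern = [POS[w] for w in line3]
walks = matching_walks(succ, pattern)
print(len(walks))
```

Under these assumed tags only 20 walks match the ten-tag pattern. They include the original line and the variant with "rain" and "sun" swapped, as in the generated stanza above.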
5.2.2 Results of the strategy based on rhyme scheme
The stanza generated using the extracted rhyming scheme
is presented below.
Washed the Itsy Bitsy Spider went up all the Rain,
Itsy Bitsy Spider crawled up the Rain and Spout again,
Down came the Water Spout again dried up the Itsy,
Rain and washed the Sun and dried up the Bitsy.
Although the generated stanza does not make sense
semantically, its rhyme scheme is the same as in the original
song, "AABB". The first rhyme is even composed of the same
words as in the original. Multiple experiments were
performed, and in most cases both rhymes were the same,
"Itsy-Bitsy". This is expected, since the strategy also uses
the weights on the edges, but we can also observe that the
strategy reproduces the rhyme scheme of the original stanza.
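The "AABB" check can be sketched as follows. The rhyme key used here is a crude spelling-based stand-in (the last vowel group of the word plus the letters after it) and is an assumption made for illustration; it is not the rhyme definition used by the strategy, and the function names are hypothetical.

```python
import re

def rhyme_key(word):
    """Crude spelling-based key: last vowel group plus what follows it."""
    match = re.search(r"[aeiouy]+[^aeiouy]*$", word.lower())
    return match.group(0) if match else word.lower()

def rhyme_scheme(lines):
    """Label each line by the rhyme key of its last word, e.g. 'AABB'."""
    labels, scheme = {}, []
    for line in lines:
        last_word = re.findall(r"[A-Za-z]+", line)[-1]
        key = rhyme_key(last_word)
        if key not in labels:
            labels[key] = chr(ord("A") + len(labels))
        scheme.append(labels[key])
    return "".join(scheme)

original = [
    "The itsy bitsy spider crawled up the water spout.",
    "Down came the rain, and washed the spider out.",
    "Out came the sun, and dried up all the rain,",
    "and the itsy bitsy spider went up the spout again.",
]
generated = [
    "Washed the Itsy Bitsy Spider went up all the Rain,",
    "Itsy Bitsy Spider crawled up the Rain and Spout again,",
    "Down came the Water Spout again dried up the Itsy,",
    "Rain and washed the Sun and dried up the Bitsy.",
]
print(rhyme_scheme(original), rhyme_scheme(generated))
```

Both stanzas receive the same label sequence under this key, although a spelling-based key would fail on many English rhymes that a pronunciation-based one handles.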
5.2.3 Results of the rhythm-based strategy
The stanza generated using the extracted rhythm scheme is
presented below.
And the Sun and the Itsy Bitsy Spider crawled up,
The Itsy Bitsy Spider out came the,
Sun and the Itsy Bitsy Spider out,
Came the Sun and the Water Spout again came the Spout again.
The text is hard to read in that rhythm, and one word is even
stressed incorrectly (Bitsý instead of Bítsy). The
unreadability is expected, as the algorithm uses all possible
pronunciations of each word and their combinations. The main
problem is that the algorithm chooses many monosyllabic
words, which can be pronounced either stressed or unstressed
and thus fulfil the rhythm pattern in any combination.
In natural speech the text as a whole would be
pronounced differently, depending on the stresses of nearby
words.
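The effect of monosyllabic words can be seen in a small sketch. It assumes that a rhythm is a string of '1' (stressed) and '0' (unstressed) syllables and that every word comes with a set of admissible stress patterns; the `STRESS` table is hypothetical and hand-written for this illustration, not taken from a pronunciation dictionary.

```python
def fits_rhythm(words, target, stress_options):
    """True if some choice of per-word stress patterns concatenates to target."""
    def fits(i, pos):
        if i == len(words):
            return pos == len(target)
        return any(
            target.startswith(option, pos) and fits(i + 1, pos + len(option))
            for option in stress_options[words[i]]
        )
    return fits(0, 0)

# Assumption: hypothetical stress patterns, '1' = stressed, '0' = unstressed.
STRESS = {
    "bitsy": {"10"},
    "spider": {"10"},
    # Monosyllabic words are allowed to be either stressed or unstressed.
    "the": {"0", "1"},
    "sun": {"0", "1"},
    "and": {"0", "1"},
}

print(fits_rhythm(["the", "sun", "and", "the"], "0101", STRESS))
print(fits_rhythm(["the", "sun", "and", "the"], "1010", STRESS))
print(fits_rhythm(["bitsy", "spider"], "0101", STRESS))
```

A sequence of monosyllabic words fits every pattern of the right length, whereas the two-syllable words fit only one, so the rhythm constraint prunes very few candidates when short words dominate.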
5.3 Evaluation using public review
In the following section we present the results of evaluating
the strategy that takes into account all three properties
of lyrics: proper sentence structure, rhythm and rhyme. In
order to evaluate this strategy we put together a short
questionnaire. The questionnaire included three sections, each
dedicated to its own generated line or stanza. Each section
included two questions. The first was a linear scale question
in which participants rated how much they agreed with the
statement that the given line or stanza was written by a
human. The second asked participants to briefly explain
their choice in the first question. For the first question,
participants submitted a number between 1 and 5, where
1 meant strongly disagree, 2 meant disagree, 3 meant neither
disagree nor agree, 4 meant agree and 5 meant strongly agree.
In total, 29 people participated in the questionnaire. The
results for each section are presented below.
5.3.1 Results for the given line
In the first section, the participants were given the following
generated line.
Love Affair issue down from July
Since the strategy using all three constraints does not
have any contextual information that it could use when
generating different lines of a stanza, we expected that
people would consider a single line from the lyrics more
likely to be written by a human than a whole stanza. This
is why we included this question.
Figure 4 shows the results of the first question for the
given line. We can observe that most of the participants
disagreed that the generated line was written by a human, while
a small part were undecided or thought it possible
that the line was written by a human.
Figure 4: Results of the linear scale question for the given
line
When asked to explain their decisions, most participants
said that they did not believe the line was written by a
human since the sentence has no semantic meaning,
while some others said that the capitalization of the word
affair was the main reason for their decision.
5.3.2 Results for the first stanza
In the second section the participants were given the fol-
lowing generated stanza.
You miss Hip for her from,
This Game as a Fast will,
See the most of the Dow.
We included two such stanzas so that we could compare
the two and possibly pin down the reason
why one would appear more like it was written by a human.
Figure 5 shows the results of the first question for the first
given stanza. With a longer text, the disagreement that the
text was written by a human was stronger. Most people either
strongly disagreed or disagreed that the stanza was written
by a human, with only a handful being undecided or agreeing
that the stanza might have been written by a human.
Figure 5: Results of the linear scale question for the first
given stanza
The reasoning was similar to before: most of the text
does not make any sense. Many participants were also confused
by upper case letters in the middle of the sentence. Some
pointed out that the stanza did not have proper rhymes.
5.3.3 Results for the second stanza
In the third section, the participants were given the follow-
ing generated stanza.
Cleaned up that her,
Mouth while Tag Team,
Though that my Time,
Fig Leaf you see.
Figure 6 shows the results of the first question for the
second given stanza. The results are quite similar to those
for the first given stanza, with even more people
strongly disagreeing that the given stanza was written by a
human.
The reasoning for the choices is again that there is no
semantic meaning in the lines of the stanza. Some of the
participants also pointed out that the rhymes do not look
natural, while a minority pointed out that it looks more
natural than the previous stanza.
Figure 6: Results of the linear scale question for the second
given stanza
5.3.4 Comparison of the two generated stanzas
In the last section we asked people if the first generated
stanza replicated real lyrics better than the second stanza.
Figure 7 shows results for the given statement.
Figure 7: Results of the linear scale question about the com-
parison of the two generated stanzas
The results clearly show that most people agreed that the
first stanza replicated real lyrics better. Comparing
the answers to the second question for the two given stanzas,
however, poses a problem. On the one hand there is a clear
consensus that the first stanza looks more real, while on the
other hand the answers for both indicate the same problems:
lack of meaning, unusual rhymes and capitalization. We could
not identify the exact reason why one stanza might
appear more real than the other. What is clear is that
individual lines from the text do appear more natural than both
stanzas.
6 Discussion
Looking at the results obtained from the toy example, it is
clear that replicating one property of the original example
does not give enough information to properly reconstruct
the toy example. Using only rhymes as a constraint
produces a result that can be very different from
the original, as we constrain only the end words of each line.
Using the sentence structure produced the
best results, probably because of the small variability in the
POS-tags of the words in the toy example: as a result,
not many constrained random walks on the built
Toy-Word Network produce the extracted sentence
structure. Overall, we are satisfied with the results obtained
from the toy example, since the constrained properties are
replicated perfectly, which is the main goal of these strategies.
The results of the questionnaire are clear-cut: our
combined strategy does not produce lines or stanzas that
appear as though they were written by a human. This
result does not surprise us much, as the main reason many
people named is simply a lack of meaning in the lyrics, a
property we did not incorporate into our system. A bit of a
surprise is that people sometimes named the lack of proper
rhymes as the main reason for their decisions. Most people
probably expect rhymes to be clear-cut, even though
many types of rhymes exist. Another reason that people
kept mentioning is the capitalization of words, which could
be easily fixed by using a more complex deep model for
capitalization. Since this was not the focus of our
research, we do not see it as a big problem.
From the results it is clear that convincing people that a
single line was written by a human is easier than convincing
them of a whole stanza. The reason is probably simple: a
line is shorter, so making it appear proper and cohesive is
much easier than doing the same with a whole stanza, for
which we would need long-term word contexts.
The answers show that one of the two given stanzas appeared
more like a real stanza, while the comments and ratings
of both seemed to differ very little. Our argument
for this is that, for humans, poems and lyrics represent
much more than text that simply follows some number of
properties; they have a deeper meaning that differs for each
individual.
7 Conclusion
We proposed several approaches for generating structured
lyrics, each of which imitates some property of real lyrics.
The approaches that imitate only one property were evaluated
using a toy example, while the combined approach was
evaluated using a questionnaire to determine how human-like
the generated lyrics were.
Using the sentence structure strategy we performed constrained
random walks on the Word Network. To create the
needed constraints, a random walk on the POS-tag network
was performed, creating a sequence of sentence structures. The
results showed that while the generated lyrics did follow
the proper sentence structure, they did not resemble actual
lyrics.
The rhythm-based strategy with a given rhythm generates
texts that follow the rhythm well. There are some words
that sound unusual when stressed that way, but overall the
resulting text is quite flowing and readable. The results of
the version which generates its own rhythm are more confusing
to read, as it is not clear what rhythm is used. Although
the same rhythm is used for multiple lines and is
self-similar within a line, the actual rhythm is not obvious
to the reader.
The last strategy generated lyrics according to a predefined
rhyme scheme. Since all our lyrics were generated
using the same Word Network, and since it was not constructed
from poems, we did not expect many perfect rhymes.
The results confirmed this belief. We also tried to learn the
rhymes from lyrics. Since the lyrics are written in modern
styles, e.g. hip hop and rap, which do not have a specific rhyme
scheme as, for example, ballads do, this did not work well.
Finally, we combined the three aspects of lyrics generation.
Problems arose when we tried to take into
account all of the strategies, as sometimes no successors
that fit all the constraints could be found, so we had to
dismiss some of the already generated text and retry from
some other node. Public evaluation showed that the main
reasons why our generated lyrics did not seem natural were a
lack of deeper meaning, the capitalization of words and rhymes
that are not always straightforward.
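The dead-end handling described in the previous paragraph can be sketched as a randomised walk with backtracking. This is a simplified stand-in: it discards only the last word and tries another successor, whereas our procedure may discard more text and restart from another node. The function `constrained_walk`, the `allowed` predicate and the tiny graph are hypothetical and serve only as illustration.

```python
import random

def constrained_walk(successors, start, length, allowed, rng):
    """Randomised depth-first walk that backs up at dead ends.

    allowed(position, word) combines all constraints for one position.
    Returns a list of words, or None if no walk satisfies the constraints.
    """
    walk = [start]

    def extend():
        if len(walk) == length:
            return True
        candidates = [w for w in successors.get(walk[-1], ())
                      if allowed(len(walk), w)]
        rng.shuffle(candidates)
        for word in candidates:
            walk.append(word)
            if extend():
                return True
            walk.pop()  # dismiss the word and try another successor
        return False

    return walk if extend() else None

# Assumption: a tiny hand-made graph; "again" is a dead end.
SUCCESSORS = {
    "the": ["sun", "rain", "spout"],
    "sun": ["and"],
    "rain": ["and"],
    "spout": ["again"],
    "and": ["dried"],
    "again": [],
}

def allowed(position, word):
    # Hypothetical combined constraint: the fourth word must be "dried".
    return position != 3 or word == "dried"

print(constrained_walk(SUCCESSORS, "the", 4, allowed, random.Random(0)))
```

Whenever the walk enters "spout" it reaches "again", finds no admissible successor, and backs up, so every returned walk passes through "sun" or "rain".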
Automatic lyrics generation is a hard problem and there
is definitely more work to be done before our methods produce
valuable results. We found that while our algorithms
are able to achieve proper sentence structure, rhyme
and rhythm, the resulting lyrics did not fully replicate real
lyrics. One improvement that could tackle this
issue would be to build an improved data set from which we
would build the Word Network. Another would be to
create constraints around the meaning of lyrics so that we
would impose not only structural rules on the generated
lyrics but also some form of meaning that could be picked
up by a human reader.
References
[1] Assonance rhyme definitions. https://www.dictionary.com/browse/assonance. Accessed: 06-06-2020.
[2] Consonant rhyme definitions. https://www.thefreedictionary.com/consonant+rhyme. Accessed: 06-06-2020.
[3] Open ANC – Open American National Corpus. http://www.anc.org/data/oanc/. Accessed: 06-06-2020.
[4] Perfect rhyme definition. https://www.collinsdictionary.com/dictionary/english/perfect-rhyme. Accessed: 06-06-2020.
[5] Song Lyrics – Kaggle. https://www.kaggle.com/paultimothymooney/poetry. Accessed: 08-05-2020.
[6] Barbieri, G. and Pachet, F., Roy, P. and Degli Esposti, M. Markov Constraints for Generating Lyrics with Style. In ECAI (2021), vol. 242, pp. 115–120. https://doi.org/10.3233/978-1-61499-098-7-115
[7] Oliveira, H. G. A survey on intelligent poetry generation: Languages, features, techniques, reutilisation and evaluation. In Proceedings of the 10th International Conference on Natural Language Generation (2017), pp. 11–20. https://doi.org/10.18653/v1/W17-3502
[8] Oliveira, H. G. Tra-la-lyrics 2.0: Automatic generation of song lyrics on a semantic domain. In Journal of Artificial General Intelligence 6, 1 (2015), 87–110. https://doi.org/10.1515/jagi-2015-0005
[9] Oliveira, H. G. PoeTryMe: a versatile platform for poetry generation. In Computational Creativity, Concept Invention, and General Intelligence 1, (2021), 21.
[10] Rumelhart, D. E. and Hinton, G. E. and Williams, R. J. Learning internal representations by error propagation. Tech. rep., California Univ San Diego La Jolla Inst for Cognitive Science, 1985. http://dx.doi.org/10.1016/B978-1-4832-1446-7.50035-2
[11] Cortes, C. and Vapnik, V. Support-vector networks. In Machine learning 20, 3 (1995), 273–297. https://doi.org/10.1007/BF00994018