Advances in Business-Related Scientific Research Journal (ABSRJ), Volume 3 (2012), Number 2, ISSN 1855-931X

SEGMENTING PREFERENCES FOR INVESTMENT BONDS USING LATENT VARIABLE MIXTURE MODELS

Liberato Camilleri
Department of Statistics and Operations Research, University of Malta, Malta
liberato.camilleri@um.edu.mt

Helena Francalanza*
Department of Mathematics, Junior College, Malta
helena.francalanza@um.edu.mt

Abstract

Market segmentation is a key component of conjoint analysis which addresses consumer preference heterogeneity. Members of a segment are assumed to be homogeneous in their views and preferences when evaluating an item but distinctly different from members of other segments. Latent class methodology is one of several conjoint segmentation procedures that overcome the limitations of aggregate analysis and a-priori segmentation. The main benefit of latent class models is that market segment membership and the regression parameters of each derived segment are estimated simultaneously. The latent class model presented in this paper uses mixtures of conditional multivariate normal distributions to analyze rating data, where the likelihood is maximized using the EM algorithm. The application focuses on customer preferences for investment bonds described by four attributes: coupon rate, redemption term, issue price and credit rating. A number of demographic variables are used to generate segments that are accessible and actionable.

Key words: Latent class models, EM algorithm, Market Segmentation, Conjoint Analysis

Topic Group: Marketing and Consumer behaviour

INTRODUCTION

Market segmentation has become a dominant concept in marketing practice. Besides being one of the major ways of operationalizing the marketing concept, segmentation provides guidelines for a firm's marketing strategy and resource allocation to increase expected profitability (Wind 1978). Understanding the diversity of preferences and sensitivities of customers in the market is one of the greatest challenges of market research. Market segmentation describes the division of a market into homogeneous clusters, whose members respond differently to promotion, advertising, communication and other marketing variables. These clusters are created to group customers with similar needs, tastes and preferences, so that products or services can be optimally designed and targeted. Market segmentation was first described by Smith (1956), who recognized that segments are derived directly from the diversity of customer wants.

The market environment is not static and market segments change composition over time; it is in the interest of every market researcher to identify these changes. With more direct access to customers via databases, the market environment presents new challenges and opportunities for market segmentation. New developments in information technology provide marketers with much richer information on consumer behaviour. The rapid growth of new technologies in information, product development, production and distribution enables a company to make more efficient use of marketing resources, focussing on the best segments for its products. The ability of a firm to differentiate its products relative to competing firms is essential for its survival. This survival depends on finding and addressing a niche rather than trying to be all things to all consumers.
Consequently, marketers are focussing on smaller segments with micro-marketing and direct marketing approaches. On the other hand, the increasing globalisation of most product markets is leading many multi-product manufacturers to look at global markets that cut across continents.

Six criteria have frequently been put forward as essential for effective and profitable marketing strategies. The identifiability criterion is the extent to which marketers can identify differences between distinct groups of customers in the market and classify each customer into one or more segments. The substantiality criterion refers to the size issue: if an identified segment is large enough to ensure profitability, it warrants separate market targeting. In micro markets and mass customisation, smaller segments become profitable due to lower marginal marketing costs, whereas in direct marketing the criterion of substantiality can be applied to each individual customer. The accessibility criterion is the degree to which marketers are able to reach the targeted segment with a distinct marketing mix strategy. Once segments are identified and products are designed to suit their tastes, the marketer must be able to identify members of the segments so that marketing efforts can be directed to them. In other words, the message must reach the right market segment through the right promotional strategies, media sources and distribution efforts. The responsiveness criterion is the degree to which segments respond uniquely to the marketing effort targeted at them; responsiveness is crucial for the effectiveness of any market segmentation strategy. Once the market is segmented, stability is necessary, meaning that the segments do not change their composition or behaviour during the period required to identify their members and implement the segmented market strategy. A segment is unlikely to be viable if its existence is the result of a short-term phenomenon. The actionability criterion refers to the extent to which the identified market segments provide direction for marketing efforts. Segments are actionable if their identification provides guidance for decisions on the effective specification of marketing strategies towards segment targets.

CONJOINT SEGMENTATION METHODS

In most traditional a-priori segmentation approaches the type and number of segments were determined in advance by the researcher, and consumers were very often assigned to segments on the basis of demographic and socio-economic variables. Segmentation subsequently shifted to post-hoc predictive approaches, whose developments allow consumers to be grouped according to how they respond to product features when making choice decisions. Segmentation methods differ in three aspects: the type of partitioning assumed, the algorithms and estimation procedures used, and the criterion being optimized. Some of these conjoint segmentation methods are summarized in the subsequent section.

A review of segmentation methods

A conjoint segmentation procedure proposed by Green and DeSarbo (1979) is componential segmentation, in which consumer descriptive variables are used. Consumer profiles are first generated on the basis of such characteristics. Respondents matching these profiles are chosen from a sample frame and asked to complete a conjoint task.
From these evaluations, the componential segmentation model estimates both the main effects of the design variables and the interactions between the design variables of the product and subject profiles. Estimation is carried out by minimizing the error sum of squares and the segmentation scheme is non-overlapping.

In the traditional two-stage conjoint segmentation approach, estimation and clustering are conducted consecutively. Individual-level parameter estimates are first obtained using least squares regression. At the second stage, subjects are clustered into segments on the basis of the similarity of the estimated parameters through hierarchical or non-hierarchical non-overlapping clustering procedures. One limitation of this two-stage approach is that it ignores the errors that bias the individual-level estimates. A second problem is that the use of fractional factorial designs often leaves few degrees of freedom for estimation at the individual level; this makes the parameter estimates unreliable as they become more sensitive to measurement error. A third problem arises when the predictors are collinear: near linear dependencies make it more difficult to sort out the impact of each predictor on the response, and parameter estimates tend to be unreliable. This in turn may cause misclassification of individuals and negatively affects the goodness-of-fit and the power of the significance tests. A fourth limitation is that least squares regression and clustering procedures optimize different criteria.

Green and Srinivasan (1978) proposed an alternative two-stage procedure. In the first step, consumers are clustered on the basis of their preference ratings, whereas in the second step separate conjoint models are estimated across the subjects in each of the identified segments. So rather than clustering consumers on the basis of similar parameter estimates at the individual level, this method applies regression to the responses in each cluster to obtain more reliable parameter estimates. This procedure in effect increases the number of observations available for estimating the parameters and thus reduces the estimation error.

Hagerty (1985) proposed a method based on a weighting scheme which represents a factor-type partitioning of the sample. This weighting scheme provides an optimal overlapping partitioning obtained by a Q-factor analysis of the between-subject correlation matrix of preferences. A possible problem with this method is the interpretation of the factor solution in terms of segments: the number of extracted factors need not be an adequate indicator of the number of segments. Another problem is that the factor solutions are not unique, given their rotational indeterminacy. Procedures proposed to identify segments on the basis of the factor solution very often result in a loss of predictive accuracy.

In response to the limitations of a-priori and two-stage procedures, several integrated conjoint segmentation methods were proposed in which the parameters within the segments are estimated at the same time that the segments are identified. Kamakura (1988) suggested a hierarchical clusterwise regression procedure that allows for prediction within segments. At the first stage of the algorithm a regression equation is estimated for each subject using ordinary least squares, yielding regression parameter estimates of several independent variables for each subject.
In the second stage a weighting scheme is devised that groups subjects to maximise the accuracy with which preferences are predicted from product profiles. The fusion of the two subjects that yields the minimum increase in the total residual sum of squares of the regression across all clusters is retained and the two subjects are combined. The agglomerative process is similar to that of Ward's method: in each successive stage, segments that provide the smallest possible increase in the pooled within-segment error variance are linked together. A predictive accuracy index is computed at each aggregation level and provides an intuitive criterion for deciding how many segments to retain. There are two disadvantages to this agglomerative hierarchical method. First, the clustering process depends in its initial stages on parameter estimates at the individual level, creating the danger of misclassification at an early stage due to unreliable estimates; this misclassification may extend to higher levels of the hierarchical clustering process. Second, the number of parameters at the individual level may exceed the number of responses, in which case they cannot be estimated. Models that are over-parameterised at the individual level yield unstable individual parameter estimates due to the lack of degrees of freedom. Statistical tests for the significance of parameter estimates and for homogeneity within the segments cannot be used because the asymptotic properties do not apply when the number of estimated parameters is close to the number of observations.

Ogawa (1987) presented an approach for rank-order preferences that performs simultaneous segmentation and estimation of conjoint models using a hierarchical, non-overlapping clustering method. His formulation employs a stochastic logit framework. To avoid problems with the uniqueness of parameter estimates, the author proposed a ridge regression-like procedure to estimate parameters at the individual level using multinomial logit models. An information criterion is also proposed to aggregate consumers hierarchically. This agglomerative method starts with single-subject clusters, and segments are combined iteratively so as to give the minimum reduction of the aggregate log-likelihood.

Several non-hierarchical procedures based on optimisation criteria are descriptive clustering methods that do not distinguish between dependent and independent variables. Spath (1979, 1982) proposed a clusterwise linear regression procedure to find homogeneous groups in terms of the relationship between dependent and independent variables and simultaneously estimate the corresponding regression functions within the clusters such that the sum of the error sums of squares over all clusters is minimized. Spath's method handles only one observation per individual. Wedel and Kistemaker (1989) proposed a generalization of clusterwise regression that extends Spath's method to handle more than one observation per individual and estimates parameters and segments simultaneously. Their procedure uses the exchange algorithm of Banfield and Bassil (1977) to maximize the likelihood and yields non-overlapping, non-hierarchical segments. DeSarbo et al. (1989) proposed an overlapping clusterwise regression procedure that uses a simulated annealing algorithm for optimisation. This methodology can accommodate more general clusterwise linear regression formulations.
It allows for multiple dependent variables, replicated observations per respondent, overlapping and non-overlapping clusters, and constraints on cluster membership. Computationally, simulated annealing was devised as a general optimisation methodology for finding the global optimum of a function that may have several local optima. The technique is based on a controlled random search that samples the objective function in a feasible region of the parameter space. The simulated annealing procedure starts from a random initial partition of the sample and iteratively takes steps in a random direction in the parameter space. If the new value of the objective function improves the criterion, the new solution is accepted; otherwise the new solution is rejected with a probability proportional to the decrease in the criterion value. The merit of this procedure is that it is less prone to convergence to local optima.

Wedel and Steenkamp (1989) proposed a fuzzy clusterwise regression algorithm that differs from other fuzzy procedures in that clusters are defined from regressions of the dependent variable on a set of explanatory variables. As in other fuzzy algorithms, partitioning of the data is carried out by minimizing a residual sum of squares criterion, which represents the sum of the distances of subjects from the regression equations in all clusters. The clustering algorithm iterates between two steps: computing the regression parameters within each cluster and calculating the fuzzy membership of subjects in clusters. Wedel and Steenkamp (1991) generalized this fuzzy clusterwise procedure to allow for a simultaneous grouping of both consumers and brands, making it possible to identify market segments and market structures at the same time. There are two potential problems with this approach: first, users must subjectively specify a fuzzy weight parameter that influences the degree of separation of the clusters; second, the statistical properties of the estimators are not established.

The advent of latent class models is arguably the most significant development in market segmentation. The works of Wedel and DeSarbo (1995) and DeSarbo et al. (1992) brought major changes in market segmentation applications. The major merit of these models is that they allow for simultaneous estimation and segmentation and enable correct statistical inference. In an extensive review, Vriens, Wedel and Wilms (1996) conducted a Monte Carlo comparison of several traditional and integrated conjoint segmentation methods and found that latent class segmentation models performed best in terms of parameter recovery, segment membership recovery and predictive accuracy.

THEORETICAL FRAMEWORK OF THE LATENT CLASS MODEL

One of the criteria for effective market segmentation is to identify differences between distinct groups of customers in the market and to be able to classify each customer into a segment. The general principle of latent class models is that each segment defines a different probability structure for the response variable. For the segmentation procedure, a latent class model with K segments is proposed:
$$H(\mathbf{y}_n; \boldsymbol{\pi}, \mathbf{X}, \boldsymbol{\beta}, \boldsymbol{\Sigma}) = \sum_{k=1}^{K} \pi_k\, f_{nk}(\mathbf{y}_n \mid \mathbf{X}, \boldsymbol{\beta}_k, \boldsymbol{\Sigma}_k)$$

where
$n = 1, \ldots, N$ indexes respondents;
$k = 1, \ldots, K$ indexes the derived segments;
$\pi_k$ is the proportion of respondents in segment $k$ and $\boldsymbol{\pi} = (\pi_1, \ldots, \pi_K)$;
$\mathbf{y}_n$ is the vector of response ratings elicited by consumer $n$;
$\mathbf{X}$ is the data matrix;
$\boldsymbol{\beta}_k$ is the vector of parameter estimates for segment $k$ and $\boldsymbol{\beta} = (\boldsymbol{\beta}_1, \ldots, \boldsymbol{\beta}_K)'$;
$\boldsymbol{\Sigma}_k$ is the covariance matrix estimated for segment $k$ and $\boldsymbol{\Sigma} = (\boldsymbol{\Sigma}_1, \ldots, \boldsymbol{\Sigma}_K)'$.

It is assumed that $\sum_{k=1}^{K} \pi_k = 1$ and that each $f_{nk}$ is a conditional multivariate normal density:

$$f_{nk}(\mathbf{y}_n \mid \mathbf{X}, \boldsymbol{\beta}_k, \boldsymbol{\Sigma}_k) = (2\pi)^{-\frac{J}{2}}\, |\boldsymbol{\Sigma}_k|^{-\frac{1}{2}} \exp\!\left[-\tfrac{1}{2}(\mathbf{y}_n - \mathbf{X}\boldsymbol{\beta}_k)'\,\boldsymbol{\Sigma}_k^{-1}(\mathbf{y}_n - \mathbf{X}\boldsymbol{\beta}_k)\right]$$

The log-likelihood expression for $N$ independent respondents is given by:

$$\ln L(\boldsymbol{\pi}, \mathbf{X}, \boldsymbol{\beta}, \boldsymbol{\Sigma}) = \ln \prod_{n=1}^{N} H(\mathbf{y}_n; \boldsymbol{\pi}, \mathbf{X}, \boldsymbol{\beta}, \boldsymbol{\Sigma}) = \sum_{n=1}^{N} \ln \sum_{k=1}^{K} \pi_k\, f_{nk}(\mathbf{y}_n \mid \mathbf{X}, \boldsymbol{\beta}_k, \boldsymbol{\Sigma}_k)$$

The derivatives of the expected log-likelihood function $E[\ln L(\boldsymbol{\pi}, \mathbf{X}, \boldsymbol{\beta}, \boldsymbol{\Sigma})]$ with respect to the parameters are not straightforward. An effective procedure to fit a latent class model with K segments is to maximize the expected complete log-likelihood function using the iterative EM algorithm proposed by Dempster et al. (1977). The idea behind the EM algorithm is to augment the observed data by introducing unobserved data $\lambda_{nk}$, a 0-1 indicator of whether respondent $n$ is in segment $k$. Given the matrix $\boldsymbol{\Lambda} = (\lambda_{nk})$, the complete log-likelihood function is given by:

$$\ln L(\boldsymbol{\pi}, \mathbf{X}, \boldsymbol{\beta}, \boldsymbol{\Sigma} \mid \boldsymbol{\Lambda}) = \sum_{n=1}^{N} \sum_{k=1}^{K} \lambda_{nk} \ln f_{nk}(\mathbf{y}_n \mid \mathbf{X}, \boldsymbol{\beta}_k, \boldsymbol{\Sigma}_k) + \sum_{n=1}^{N} \sum_{k=1}^{K} \lambda_{nk} \ln \pi_k$$

$\ln L(\boldsymbol{\pi}, \mathbf{X}, \boldsymbol{\beta}, \boldsymbol{\Sigma} \mid \boldsymbol{\Lambda})$ has a simpler form than $\ln L(\boldsymbol{\pi}, \mathbf{X}, \boldsymbol{\beta}, \boldsymbol{\Sigma})$ and its derivatives are manageable. Each iteration is composed of two steps, an E-step and an M-step. In the E-step, the expected log-likelihood function is calculated with respect to the conditional distribution of the unobserved data matrix $\boldsymbol{\Lambda} = (\lambda_{nk})$ given the data and the provisional parameter estimates $\hat{\pi}_k$, $\hat{\boldsymbol{\beta}}_k$ and $\hat{\boldsymbol{\Sigma}}_k$. This is carried out by replacing $E(\lambda_{nk})$ by the posterior probabilities $\hat{p}_{nk}$:

$$E\!\left[\ln L(\boldsymbol{\pi}, \mathbf{X}, \boldsymbol{\beta}, \boldsymbol{\Sigma} \mid \boldsymbol{\Lambda})\right] = \sum_{n=1}^{N} \sum_{k=1}^{K} \hat{p}_{nk} \ln f_{nk}(\mathbf{y}_n \mid \mathbf{X}, \boldsymbol{\beta}_k, \boldsymbol{\Sigma}_k) + \sum_{n=1}^{N} \sum_{k=1}^{K} \hat{p}_{nk} \ln \pi_k$$

where

$$\hat{p}_{nk} = E(\lambda_{nk}) = \frac{\hat{\pi}_k\, f_{nk}(\mathbf{y}_n \mid \mathbf{X}, \hat{\boldsymbol{\beta}}_k, \hat{\boldsymbol{\Sigma}}_k)}{\sum_{k=1}^{K} \hat{\pi}_k\, f_{nk}(\mathbf{y}_n \mid \mathbf{X}, \hat{\boldsymbol{\beta}}_k, \hat{\boldsymbol{\Sigma}}_k)} \qquad \text{and} \qquad \sum_{k=1}^{K} \hat{p}_{nk} = 1$$

In the M-step the two terms of $E[\ln L(\boldsymbol{\pi}, \mathbf{X}, \boldsymbol{\beta}, \boldsymbol{\Sigma} \mid \boldsymbol{\Lambda})]$ are maximized separately with respect to the parameters $\pi_k$ and $\boldsymbol{\beta}_k$. Maximizing the first term of the expected log-likelihood function with respect to $\boldsymbol{\beta}_k$ leads to independently solving each of the K expressions

$$\sum_{n=1}^{N} \hat{p}_{nk}\, \frac{\partial}{\partial \boldsymbol{\beta}_k} \ln f_{nk}(\mathbf{y}_n \mid \mathbf{X}, \boldsymbol{\beta}_k, \boldsymbol{\Sigma}_k) = \mathbf{0} \qquad \text{for } k = 1, 2, \ldots, K$$

Maximizing the second term of the expected log-likelihood function with respect to $\pi_k$, subject to the constraint $\sum_{k=1}^{K} \pi_k = 1$, yields

$$\hat{\pi}_k = \frac{1}{N} \sum_{n=1}^{N} \hat{p}_{nk} \qquad \text{for } k = 1, 2, \ldots, K$$

METHODOLOGY AND IMPLEMENTATION

The EM algorithm for fitting latent class models is implemented as a set of GLIM macros. This is equivalent to the iterative fitting of a weighted generalized linear model with posterior probabilities recalculated at each iteration. The iterative procedure is initiated by setting random values to the posterior probabilities $\hat{p}_{nk}$. The algorithm then alternately updates the parameters $\hat{\pi}_k$, $\hat{\boldsymbol{\beta}}_k$ and $\hat{\boldsymbol{\Sigma}}_k$ and the probabilities $\hat{p}_{nk}$ until the process converges. The assignment of individuals to segments is done probabilistically by Bayes' Theorem; a numerical sketch of this EM iteration is given below.
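The following Python sketch illustrates the E- and M-steps derived above. It is only an illustration and not the authors' GLIM implementation: for brevity it assumes a common design matrix for all respondents and a spherical within-segment covariance $\Sigma_k = \sigma_k^2 I_J$ rather than the unstructured covariance of the model, and all function and variable names are hypothetical.

```python
import numpy as np
from scipy.special import logsumexp

def fit_latent_class(Y, X, K, n_iter=200, seed=0):
    """EM for a K-segment mixture of normal regressions (spherical covariance).

    Y : (N, J) matrix of ratings, one row of J profile ratings per respondent.
    X : (J, p) design matrix of profile attributes, common to all respondents.
    Returns mixing proportions, segment coefficients, variances, posterior
    membership probabilities and the maximised log-likelihood.
    """
    rng = np.random.default_rng(seed)
    N, J = Y.shape
    p = X.shape[1]
    post = rng.dirichlet(np.ones(K), size=N)   # random initial posteriors p_nk
    pinvX = np.linalg.pinv(X)                  # (X'X)^{-1} X', reused each iteration

    beta = np.zeros((K, p))
    sigma2 = np.ones(K)
    loglik = -np.inf
    for _ in range(n_iter):
        # ---- M-step: proportions, weighted least-squares coefficients, variances
        pi = post.mean(axis=0)                 # pi_k = (1/N) sum_n p_nk
        for k in range(K):
            w = post[:, k]
            ybar = (w[:, None] * Y).sum(axis=0) / w.sum()   # weighted mean rating vector
            beta[k] = pinvX @ ybar                          # segment-level part-worths
            resid = Y - X @ beta[k]                         # (N, J) residuals
            sigma2[k] = (w * (resid ** 2).sum(axis=1)).sum() / (J * w.sum())

        # ---- E-step: posterior segment-membership probabilities (Bayes' theorem)
        logjoint = np.column_stack([
            np.log(pi[k])
            - 0.5 * J * np.log(2 * np.pi * sigma2[k])
            - 0.5 * ((Y - X @ beta[k]) ** 2).sum(axis=1) / sigma2[k]
            for k in range(K)])
        loglik = logsumexp(logjoint, axis=1).sum()
        post = np.exp(logjoint - logsumexp(logjoint, axis=1, keepdims=True))

    return pi, beta, sigma2, post, loglik
```

In practice such a function would be run from several random starting posteriors and the solution with the largest returned log-likelihood retained, mirroring the multiple random starts used later in the application.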
Each individual is then assigned to the segment for which the posterior probability $\hat{p}_{nk}$ is highest.

A problem associated with the application of the EM algorithm to latent class models is its convergence to local maxima. It is caused by the likelihood being multimodal, so that the algorithm is sensitive to the starting values used. A procedure that widens the search for the global maximum is to perturb the posterior probabilities at each iteration by adding to each probability a pseudo-random real value multiplied by a scalar. The pseudo-random values are generated from a uniform distribution on the range [0,1] and the scalar is initially set to 0.1. The modified posterior probabilities are rescaled so that they sum to 1 across the segments. The scalar is reduced systematically after a number of iterations so that the iterative procedure eventually converges.

APPLICATION

Bond investment strategies vary between policyholders. Some policyholders may give more relevance to the coupon rate and the duration of the investment, whereas others may give more importance to the issue price and the credit rating of the issuer. Indeed, the choices policyholders make depend upon several factors, which include their knowledge, their time frames, their investment goals and the amount of risk that they are willing to take. To illustrate the methodology, a conjoint study of 300 policyholders was conducted to investigate their preferences for investment bonds. Four product attributes, namely issue price, duration of investment, coupon rate and credit rating of the issuing company, were identified as key determinant attributes. The study crossed three issue prices (97, 100 and 103), two investment periods (10 and 20 years), three coupon rates (3%, 4% and 5%) and two levels of credit rating (A and B). A complete factorial design was utilized, which included 36 combinations of attribute levels.

For data collection a full-profile approach was used in which each profile had a unique attribute combination for an investment bond. Such a design guarantees orthogonality (independence) of the attributes, which results in efficient estimation of the parameters. To reduce the information load on respondents, the cards were divided into two blocks and each respondent was handed a set of 18 cards, with random assignment to block. Preference ratings were measured on a seven-point scale where 1 corresponds to 'worst' and 7 corresponds to 'best'. A rating scale was chosen over a ranking scale because it better expresses the intensity of a preference. Data were collected by person-to-person interviews, as this ensured a higher response rate.

The linear predictor which relates the expected worth of an investment bond to its product attributes includes all main effects and all pairwise interactions. To make the derived market segments more accessible and actionable, two subject characteristics were also recorded: gender and knowledge about investment bonds. The sample of 300 participants comprised equal numbers of males and females and equal numbers of participants with good and limited knowledge about investment bonds. Most respondents with good knowledge were employees in the financial sector.
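As an aside, the complete factorial design described above (3 issue prices x 2 durations x 3 coupon rates x 2 credit ratings = 36 profiles), its division into two blocks of 18 cards, and a design matrix with all main effects and pairwise interactions can be generated along the following lines. This is only a sketch: the attribute names, the blocking rule and the use of patsy for the formula expansion are illustrative assumptions, not a description of how the original study coded its design.

```python
import itertools
import random

import pandas as pd
from patsy import dmatrix

# Attribute levels used in the study.
levels = {
    "issue_price":   [97, 100, 103],
    "duration_yrs":  [10, 20],
    "coupon_rate":   ["3%", "4%", "5%"],
    "credit_rating": ["A", "B"],
}

# Complete factorial design: 3 x 2 x 3 x 2 = 36 bond profiles (cards).
profiles = [dict(zip(levels, combo)) for combo in itertools.product(*levels.values())]
assert len(profiles) == 36

# Split the 36 cards into two blocks of 18 to reduce the load on respondents;
# each respondent receives one block at random (the paper does not report the
# exact blocking rule, so a random split is used here purely for illustration).
random.seed(1)
cards = profiles.copy()
random.shuffle(cards)
block_1, block_2 = cards[:18], cards[18:]

# Design matrix with all main effects and all pairwise interactions, matching
# the linear predictor described above (R-style formula expansion via patsy).
X = dmatrix(
    "(C(issue_price) + C(duration_yrs) + C(coupon_rate) + C(credit_rating))**2",
    pd.DataFrame(profiles),
    return_type="dataframe",
)
```

The resulting matrix X plays the role of the design matrix $\mathbf{X}$ in the latent class model above.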
Latent class models assume that the observed data arise from several unknown homogeneous segments mixed in unknown proportions. The first statistical objective is to determine the true number of segments. To address this issue, three criteria were used to identify the correct number of homogeneous groups of respondents in the heterogeneous population. Two of these information criteria are based on the penalized log-likelihood

$$C = -2\log L_{\Psi} + dc$$

where $d$ is the number of estimated parameters and $c$ is a penalty constant; the product $dc$ measures the complexity of the model. For the Akaike information criterion (AIC), $c = 2$; for the Bayesian information criterion (BIC), $c = \ln(N)$, where $N$ is the sample size. The third criterion includes an additional entropy term based on the posterior probabilities $\hat{p}_{nk}$:

$$EN(\hat{p}) = -\sum_{k=1}^{K} \sum_{n=1}^{N} \hat{p}_{nk} \log \hat{p}_{nk}$$

This criterion, which is an approximation of the Integrated Classification Likelihood (ICL), assesses the degree of separation between the segments; it is more appropriate for large cluster sizes and attempts to overcome the shortcomings of AIC and BIC:

$$ICL = -2\log L_{\Psi} + 2\,EN(\hat{p}) + d\log(N)$$

The latent class model was fitted three times, varying the number of segments from one to three. To overcome the problem of convergence to local optima, ten different random starting values were considered for each model fit and the solution with the largest log-likelihood (smallest $-2\log L_{\Psi}$) was retained. The entropy and the number of estimated parameters were also recorded for each solution to determine the optimal number of segments.

FINDINGS

Table 1 shows that BIC and ICL favour a two-segment solution whereas AIC favours a three-segment solution. Many authors have observed that AIC tends to overestimate the correct number of segments. Since AIC does not penalize complex models as heavily as the other two criteria, a two-segment solution was selected. After assigning each respondent to the segment with the highest posterior probability, respondents were categorized by their gender and knowledge about investment bonds.

Table 1: Determination of the number of segments using AIC, BIC and ICL

Number of segments   $-2\log L_{\Psi}$   Number of parameters   Entropy   AIC      BIC      ICL
1                    2342.2              19                     106.92    2361.2   2450.6   2664.4
2                    1262.7              38                     42.29     1338.7   1479.4   1564.0
3                    1184.9              57                     40.76     1298.9   1510.0   1591.5

Table 2 shows that segment 1 contains a higher proportion of male and female participants with good knowledge about investment bonds, whereas segment 2 comprises a higher percentage of respondents with limited knowledge. The segments do not discriminate much between the gender groups.

Table 2: Number of respondents assigned to segments by gender and knowledge

Segment   Knowledge about investment bonds   Male   Female   Total
1         Good                               48     46       94
1         Limited                            17     15       32
1         Total                              65     61       126
2         Good                               25     31       56
2         Limited                            60     58       118
2         Total                              85     89       174
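For reference, the information criteria defined earlier and the hard assignment used to construct Table 2 can be computed from a fitted solution as follows. The helper names are illustrative; when the function is fed the figures reported in the two-segment row of Table 1 ($-2\log L_{\Psi} = 1262.7$, $d = 38$, $EN = 42.29$, $N = 300$) it reproduces AIC = 1338.7, BIC of about 1479.4 and ICL of about 1564.0.

```python
import numpy as np

def information_criteria(minus_2_loglik, d, N, entropy):
    """AIC, BIC and ICL as defined above (entropy is the term EN(p))."""
    aic = minus_2_loglik + 2 * d
    bic = minus_2_loglik + np.log(N) * d
    icl = minus_2_loglik + 2 * entropy + np.log(N) * d
    return aic, bic, icl

def entropy_term(post):
    """EN(p) = -sum_k sum_n p_nk log p_nk, computed from the posterior matrix."""
    p = np.clip(post, 1e-12, 1.0)      # guard against log(0)
    return -np.sum(p * np.log(p))

# Check against the two-segment row of Table 1 (N = 300 respondents).
print(information_criteria(1262.7, 38, 300, 42.29))
# -> (1338.7, approx. 1479.4, approx. 1564.0)

# Hard assignment used for Table 2: each respondent joins the segment with the
# highest posterior probability, e.g. segment = post.argmax(axis=1).
```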
Figure 1 demonstrates that respondents in segment 1 discriminate between investment bonds with different coupon rates and different issue prices. Respondents in segment 1 value bonds with low issue prices more than bonds with high issue prices, and they value high-coupon bonds more than low-coupon bonds. By contrast, respondents in segment 2 discriminate between different coupon rates but hardly differentiate between different issue prices.

Figure 1: Predicted rating scores by cluster allocation, coupon rate and issue price

Figure 2 demonstrates that respondents in both segments value bonds with a high coupon rate more than bonds with a lower coupon rate. Respondents in both segments value bonds with an 'A' credit rating more than bonds with a 'B' credit rating; however, the difference in the expected worth of these two types of investment bonds is more conspicuous for respondents in segment 1. Respondents in segment 2 give more priority to the coupon rate than to the credit rating of the issuing company.

Figure 2: Predicted rating scores by cluster allocation, coupon rate and credit rating

Figure 3 shows that respondents in segment 1 value investment bonds issued for a short term more than long-term bonds. Conversely, respondents in segment 2 prefer to invest their money in bonds that are issued for longer durations.

Figure 3: Predicted rating scores by cluster allocation, coupon rate and duration

The main finding of the application is that there are two groups of investors. Investors who have good knowledge of financial investments tend to give priority to all product attributes, whereas investors with limited knowledge tend to give more priority to the coupon rate and the income it generates than to the issue price of the bond and the credit rating of the issuing company. Moreover, investors with good knowledge of financial investments tend to invest their money for shorter terms than their counterparts with limited knowledge.

IMPLICATIONS FOR THEORY AND PRACTICE

Segmentation has proved to be a very useful concept for managers, and modelling consumer heterogeneity is the central focus of many statistical marketing applications. Models that approximate market heterogeneity by a number of unobserved segments have great managerial appeal in many applications. Moreover, managers seem comfortable with the idea of market segments based on the assumption that consumers can be grouped into relatively homogeneous segments, and the models appear to do a good job of identifying useful groups. Although segment-level models are very compelling from a managerial standpoint, they can over-simplify the market scenario and may have limited predictive validity. One of the major concerns raised by market researchers is whether segment-level models enable marketers to customize their products or services to very small segments, particularly in micro-marketing, direct marketing and mass customisation. Segment-level models may not be sufficiently accurate in estimating responses to marketing variables at the consumer level. The rapid growth of new technologies is enabling marketers to customize their products or services to very small segments where each consumer may represent a segment and the responses to marketing variables are estimated at the individual level. In other words, a set of idiosyncratic parameters is estimated for each subject, where the posterior distribution of individual-level parameters can be estimated using Bayesian methods.
Bayesian estimation methods have gained popularity recently; their main advantage lies in obtaining posterior distributions of individual-level parameters based on the parameters of the prior distribution. The majority of market research applications assume a discrete distribution of heterogeneity. The popularity of this approach is partly due to the fact that the marginal likelihood is easily evaluated as a sum over a discrete number of mass points. Heckman and Singer (1984) emphasize that any distribution can be approximated to a high degree of accuracy by a discrete distribution with a sufficient number of mass points. However, other authors argue that consumer heterogeneity is better described by a continuous rather than a discrete distribution. Allenby and Rossi (1999) pointed out that the underlying assumption of finite mixture models, namely a limited number of segments that are perfectly homogeneous within, is overly restrictive. The authors argue that segmenting the market into a small number of homogeneous clusters leads to an artificial partition of the continuous distribution, because a limited number of mass points may inadequately capture the full extent of heterogeneity in the data. Lenk, DeSarbo, Green and Young (1996) argue that a discrete latent class or mixture approach to heterogeneity ignores the inherent differences across consumers and may result in a loss of predictive performance. In micro- or direct-marketing applications a continuous approximation of consumer heterogeneity may be more appropriate because targeting individual customers is more essential than identifying segments. Allenby and Lenk (1994) and Allenby and Ginter (1995) suggest that consumer preferences, tastes and responses to marketing variables are distributed over the consumer population according to a continuous distribution rather than a discrete distribution across homogeneous segments. Both discrete and continuous representations have advantages and disadvantages, and there is no evidence that one representation outperforms the other; it remains an empirical question under which conditions one representation is more appropriate than the other. Arora, Allenby and Ginter (1998) and Lenk and DeSarbo (2000) have developed segmentation models that reach a compromise between the two philosophies and account for both discrete segments and within-segment heterogeneity. Their methodology combines the discrete and continuous heterogeneity approaches. Both discrete and continuous heterogeneity models can in principle be estimated with either maximum likelihood or Bayesian methods.

LIMITATIONS AND FUTURE RESEARCH

Our discussion focussed mainly on latent class models that address heterogeneity through a discrete distribution. These segment-level models assume that subjects within each segment respond similarly to a marketing mix but differently from subjects in other segments. The main advantage of these models over traditional clustering techniques lies in simultaneous estimation and segmentation. The algorithm was devised to implement latent class models for normally-distributed rating responses, where the parameters are estimated using a maximum likelihood approach.
The limitation of the algorithm is that the normality assumption may not always be adequate, and inappropriate distributional assumptions may lead to deficiencies in the performance of the analysis, particularly when the distribution of rating scores is skewed. An alternative approach is to modify the algorithm by combining the Proportional Odds model with the EM algorithm. The Proportional Odds model is more appropriate because it accommodates skewed ordinal categorical responses when the normality assumption is not satisfied. Moreover, when the data set comprises response categories that have a natural ordering, it is more sensible to work with cumulative link models, since they make better use of the ordering.

Consider an ordinal scale from 1 to $R$ and let $\Upsilon_j$ represent the $R$-category response for the $j$th item. Then $P(\Upsilon_j \le r)$ is defined as the cumulative probability for the $j$th item, for $r = 1, 2, \ldots, R$. The cumulative probabilities reflect the ordering, since

$$P(\Upsilon_j \le 1) \le P(\Upsilon_j \le 2) \le \cdots \le P(\Upsilon_j \le R) = 1$$

Assume that $\mathbf{X}$ is the design matrix containing the values of the explanatory variables, whose $j$th row $\mathbf{x}_j$ contains the values of the explanatory variables for the $j$th item. Let $\boldsymbol{\beta} = (\beta_1, \ldots, \beta_p)$ be the vector of parameters for the explanatory variables and $\boldsymbol{\alpha} = (\alpha_1, \ldots, \alpha_{R-1})$ be a vector of threshold parameters such that $\alpha_1 \le \alpha_2 \le \cdots \le \alpha_{R-1}$, with $\alpha_0 = -\infty$ and $\alpha_R = \infty$. The proportional odds model, which is an appropriate model for analyzing ordinal categorical responses, is given by:

$$P(\Upsilon_j \le r) = F(\alpha_r + \eta_j) \qquad \text{for } r = 1, 2, \ldots, R-1$$

where $\eta_j = \mathbf{x}_j'\boldsymbol{\beta}$ and $F$ is a cumulative distribution function. The link function $F^{-1}$ is a strictly monotonic function mapping $[0,1]$ onto the real line. The cumulative link model links the cumulative probabilities $P(\Upsilon_j \le r)$ to the real line using the link function $F^{-1}$:

$$F^{-1}\!\left[P(\Upsilon_j \le r)\right] = \alpha_r + \eta_j$$

$F(\cdot)$ can be the logistic, normal or extreme value distribution, leading to the logit, probit and complementary log-log link functions respectively. The model suggested by McCullagh (1980) for predicting the probabilities $\mu_j = P(\Upsilon_j = r)$ is given by:

$$P(\Upsilon_j = r) = P(\Upsilon_j \le r) - P(\Upsilon_j \le r-1) = F(\alpha_r + \eta_j) - F(\alpha_{r-1} + \eta_j)$$

For the segmentation procedure a latent class model with $K$ segments is considered:

$$P(\Upsilon_j = r \mid \boldsymbol{\alpha}, \boldsymbol{\beta}, \boldsymbol{\pi}) = \sum_{k=1}^{K} \pi_k\, P(\Upsilon_j = r \mid \boldsymbol{\alpha}_k, \boldsymbol{\beta}_k)$$

where

$$P(\Upsilon_j = r \mid \boldsymbol{\alpha}, \boldsymbol{\beta}) = F(\alpha_r + \mathbf{x}_j'\boldsymbol{\beta}) - F(\alpha_{r-1} + \mathbf{x}_j'\boldsymbol{\beta})$$

and $\pi_k$ is the proportion of respondents in the $k$th segment. The likelihood function is maximized using the EM algorithm, which is equivalent to the iterative fitting of a weighted GLM with posterior probabilities recalculated at each iteration. Moreover, the parameters within the segments are estimated at the same time that segment membership is identified.
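To make the proposed extension concrete, the sketch below computes the category probabilities $F(\alpha_r + \mathbf{x}_j'\boldsymbol{\beta}) - F(\alpha_{r-1} + \mathbf{x}_j'\boldsymbol{\beta})$ under a cumulative logit (proportional odds) specification and mixes them over K latent segments. The threshold and coefficient values are purely illustrative and the function names are hypothetical; fitting such a model would still require embedding these probabilities in the EM iteration described above.

```python
import numpy as np

def category_probs(alpha, eta):
    """P(Y = r) = F(alpha_r + eta) - F(alpha_{r-1} + eta) under a cumulative
    logit (proportional odds) model, with alpha_0 = -inf and alpha_R = +inf."""
    F = lambda z: 1.0 / (1.0 + np.exp(-z))      # logistic cdf
    cut = np.concatenate(([-np.inf], alpha, [np.inf]))
    cum = F(cut + eta)                          # cumulative probabilities
    return np.diff(cum)                         # length-R vector of P(Y = r)

def mixture_probs(alphas, betas, pis, x):
    """Latent class version: sum_k pi_k P(Y = r | alpha_k, beta_k) for item x."""
    return sum(pi * category_probs(a, x @ b) for pi, a, b in zip(pis, alphas, betas))

# Illustrative values (hypothetical): a 7-point scale needs 6 thresholds.
alpha = np.array([-2.5, -1.5, -0.5, 0.5, 1.5, 2.5])
beta = np.array([0.8, -0.3])                    # two attribute part-worths
x = np.array([1.0, 0.0])                        # coded attribute levels of one profile
print(category_probs(alpha, x @ beta))           # the seven probabilities sum to 1
```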
REFERENCES

Allenby, G. M. & Ginter, J. L. (1995). Using Extremes to Design Products and Segment Markets, Journal of Marketing Research, 32, 392-405.
Allenby, G. M. & Lenk, P. J. (1994). Modelling Household Purchase Behaviour with Logistic Normal Regression, Journal of the American Statistical Association, 89, 1218-1231.
Allenby, G. M. & Rossi, P. E. (1999). Marketing Models of Heterogeneity, Journal of Econometrics, 89, 57-78.
Arora, N., Allenby, G. M. & Ginter, J. L. (1998). A Hierarchical Bayes Model of Primary and Secondary Demand, Marketing Science, 17, 29-44.
Bagozzi, R. P. (1994). Advanced Methods of Marketing Research, Blackwell Publishers.
Banfield, C. F. & Bassil, L. C. (1977). Transfer Algorithm for Nonhierarchical Classification, Applied Statistics, 26, 206-210.
Dempster, A. P., Laird, N. M. & Rubin, D. B. (1977). Maximum Likelihood from Incomplete Data via the EM Algorithm, Journal of the Royal Statistical Society, Series B, 39, 1-38.
DeSarbo, W. S., Oliver, R. L. & Rangaswamy, A. (1989). A Simulated Annealing Methodology for Clusterwise Linear Regression, Psychometrika, 54, 707-736.
DeSarbo, W. S., Wedel, M., Vriens, M. & Ramaswamy, V. (1992). Latent Class Metric Conjoint Analysis, Marketing Letters, 3, 273-288.
Green, P. E. & DeSarbo, W. S. (1979). Componential Segmentation in the Analysis of Consumer Trade-offs, Journal of Marketing, 43, 83-91.
Green, P. E. & Srinivasan, V. (1978). Conjoint Analysis in Consumer Research: Issues and Outlook, Journal of Consumer Research, 5, 103-123.
Hagerty, M. R. (1985). Improving the Predictive Power of Conjoint Analysis: The Use of Factor Analysis and Cluster Analysis, Journal of Marketing Research, 22, 168-184.
Heckman, J. J. & Singer, B. (1984). A Method for Minimizing the Impact of Distributional Assumptions in Econometric Models for Duration Data, Econometrica, 52, 271-320.
Kamakura, W. A. (1988). A Least Squares Procedure for Benefit Segmentation with Conjoint Experiments, Journal of Marketing Research, 25, 157-167.
Lenk, P. J. & DeSarbo, W. S. (2000). Bayesian Inference for Finite Mixture of Generalized Linear Models with Random Effects, Psychometrika, 65, 93-119.
Lenk, P. J., DeSarbo, W. S., Green, P. E. & Young, M. R. (1996). Hierarchical Bayes Conjoint Analysis: Recovery of Partworth Heterogeneity from Reduced Experimental Designs, Marketing Science, 15, 152-173.
McCullagh, P. (1980). Regression Models for Ordinal Data, Journal of the Royal Statistical Society, Series B, 42, 109-142.
Ogawa, K. (1987). An Approach to Simultaneous Estimation and Segmentation in Conjoint Analysis, Marketing Science, 6, 66-81.
Rossi, P. E., Allenby, G. M. & McCulloch, R. (2006). Bayesian Statistics and Marketing, John Wiley and Sons.
Smith, W. (1956). Product Differentiation and Market Segmentation as Alternative Marketing Strategies, Journal of Marketing, 21, 3-8.
Spath, H. (1979). Clusterwise Linear Regression, Computing, 22, 367-373.
Spath, H. (1982). An Algorithm for Clusterwise Linear Regression, Computing, 29, 175-181.
Vriens, M., Oppewal, H. & Wedel, M. (1998). Ratings-based versus Choice-based Latent Class Conjoint Models: An Empirical Comparison, Journal of the Market Research Society, 40, 237-248.
Wedel, M. & DeSarbo, W. S. (1995). A Mixture Likelihood Approach for Generalized Linear Models, Journal of Classification, 12, 1-35.
Wedel, M. & Kamakura, W. A. (2000). Market Segmentation: Conceptual and Methodological Foundations, Kluwer Academic Publishers.
Wedel, M. & Kistemaker, C. (1989). Consumer Benefit Segmentation Using Clusterwise Linear Regression, International Journal of Research in Marketing, 6, 45-49.
Wedel, M. & Steenkamp, J. B. (1989). A Fuzzy Clusterwise Regression Approach to Benefit Segmentation, International Journal of Research in Marketing, 6, 241-258.
Wedel, M. & Steenkamp, J. B. (1991). A Clusterwise Regression Method for Simultaneous Fuzzy Market Structuring and Benefit Segmentation, Journal of Marketing Research, 28, 385-396.
Wind, Y. (1978). Issues and Advances in Segmentation Research, Journal of Marketing Research, 15, 317-337.