Samo UHAN, Mitja HAFNER FINK*
CONTEXT EFFECTS IN SOCIAL SURVEYS: BETWEEN INSTRUMENT AND RESPONDENT
* Dr. Samo Uhan, Assistant Professor, and Dr. Mitja Hafner Fink, Assistant Professor, Faculty of Social Sciences, University of Ljubljana.
Abstract. The intention of this article is to examine the nature of cognitive representations that respondents create and utilise when making evaluative judgements in surveys. The following two hypothetical statements are the starting point of our analysis: (1) the order of items affects both the cognitive representation of the underlying dimension and the factor structure of the answers; (2) the different levels of cognitive sophistication of respondents affect their recognition of the concepts measured and the nature of their answers. A multidimensional scale of the concept of 'negative nationalism' was analysed. The concept was measured on two dimensions: xenophobia and protectionism. The research results partly confirm the hypothetical statements: the level of a respondent's education (as an indicator of their level of cognitive sophistication) influences their recognition of the concept measured, while other classical context characteristics (such as item order) do not. We can only confirm the possible effect of item order on the nature of their response, which results in the two-dimensionality or unidimensionality of the concept measured.
Keywords: survey context, measurement, scales, cognitive structures, item-order effects
Introduction
Social survey researchers try to reduce or control, as far as possible, (systematic) context effects on the replies of their respondents. To this end, researchers must determine whether respondents, as a result of a particular context effect, are able to identify a measured concept 'hidden' within the questions asked. In this article, we will discuss two aspects of context effects: a) so-called 'local context effects' (the effects of the measurement instrument); and b) so-called 'global context effects', which include the motivational and cognitive basis of attitudes (see Uhan, 1998). We observe the local context as a question-order effect and the global context as the effect of respondents' cognitive sophistication (level of education) (Krosnick, 1992; Johnson et al., 1999).
When measuring a concept (such as nationalism, in/tolerance, social distance, prejudice) within a social survey, researchers often use statements with which respondents express their agreement or disagreement. These usually take one of two forms: a) a balanced list of positive and negative items that cover the two poles of the same dimension (e.g. tolerant - intolerant) (bipolarity); and b) a list of items covering two or more (sub)dimensions of the same concept, which are not necessarily antagonistic (multidimensionality) (cf. Hafner-Fink and Uhan, 2013). In this article, we focus on the second form.
Context effects are not usually identified directly in respondents' replies, but are rather inferred on the basis of hypotheses. Context effects can be defined as those changes in responses to survey questions that are a consequence of the characteristics of the questionnaire or of the circumstances in which the survey takes place. If these effects were not present, the replies would be different, namely unaffected by context. The key to understanding context effects is the mental representation, or the model of information processing that the respondent references when forming his/her response. The replies of respondents can thus be influenced by the formal characteristics of the instrument used; for instance, the question order, the type of scale, etc. (see Schuman & Presser, 1981; Tourangeau & Rasinski, 1988; Smith, 1988; Sudman, Bradburn & Schwarz, 1996).
On the other hand, some authors have considered other possible effects based on the specific circumstances in which the survey interview takes place - such as the personal, cultural and social context of those involved - which have an explicit or implicit influence (see Zaller & Feldman, 1992; Tourangeau, Rips & Rasinski, 2000; Hair, 2005). In this respect, the most frequent question researchers ask is whether context effects should be treated as a 'temporary disturbance', or whether they represent a serious, systemic fault that may diminish the significance of the survey results or findings.
Although the methodological literature cites many instances of research into context effects, most are experimental studies constructed in order to demonstrate context effects. This does not diminish their validity, but it does raise the question of how often and in what way context effects appear in non-experimental circumstances. Researchers often point to the well-known study by Schuman and Presser (1981), who describe the influence of context in researching standpoints in 'normal' non-experimental circumstances. On the basis of their analysis of results in the DAS (Detroit Area Study), they establish that the likelihood of the occurrence of context effects is scarcely any greater than chance. Smith (1988) reaches a similar conclusion following his analysis of the replies in the GSS (General Social Survey). Smith establishes that a random rotation of questions leads to context effects in only four percent of cases. However, these findings, which indicate a low or coincidental effect of context on responses, can be misleading. Tourangeau, Rips and Rasinski (2000) demonstrate that context effects are more frequent than one might expect. They contend that researchers often overlook the fact that the appearance of context effects depends on the conceptual links among questions. This means that research covering heterogeneous, conceptually unrelated questions often conceals context effects.
In this paper we examine whether context effects appear in the 'natural' environment of a public opinion survey. In our research we will check for the traditional effect of survey context on survey responses, i.e. the influence of the order in which questions are asked. At the same time, we will also test the influence of context in the case of related questions or statements, based on the hypothesis that a respondent's cognitive sophistication influences their ability to identify what is being measured (socio-cultural context). In this situation, we expect the socio-cultural context (the ability of respondents to identify the concept measured) to override the expected effects of the survey context.
The Problem
In order to operationalise concepts, the standard design of social surveys includes two basic types of questions that can be referred to as objective and subjective. As a rule, respondents have fewer problems framing responses to objective questions than to subjective questions, since the latter demand the creative processing of information rather than the recognition and linking of facts. This awareness has led to a debate as to whether it is theoretically appropriate to view the underlying variable as a unidimensional continuum. Bipolar survey items have long been held to be appropriate instruments for the operationalisation of theoretical models. This is because it is relatively simple for them to prompt the respondent to form a cognitive representation of the object of research that is close to the relevant concept expected by the researchers. The application of a bipolar scale, however, raises a number of methodological questions. In the face of 'faulty' results, cognitive psychologists have drawn attention to the fact that the actual instrument, namely the bipolar scale of statements, can itself be the object of the cognitive representations that the respondents form of the research and the context in which it is carried out.
If cognitive representations play a crucial role in models of information processing, this would also imply the hypothesis that these representations have a direct influence on the formation of responses to survey questions and, consequently, that they reduce the validity of the results. The experimental findings relating to this methodological problem (presented below) emphasise the importance of taking account of both the socio-cultural context and the context in which the survey is conducted (the characteristics of the instrument).
Traditionally, social surveys address respondents' attitudes as a latent variable that can be adequately measured on a bipolar response scale. These models assume that the cognitive structures underlying the responses to a bipolar survey scale are both unidimensional and continuous. The term 'unidimensional' refers to the notion that there is a single dimension of variability for the class of stimuli being judged. The term 'continuous' refers to the fact that there are no breaks in the dimension: that is, that there is a seamless gradient from one end of the latent variable to the other. A cognitive structure that has the properties of unidimensionality and continuity should produce judgments that are reciprocally antagonistic: that is, as one moves away from one pole of the response continuum, one will inevitably move towards the opposite pole (Ostrom et al., 1992: 298). If, for example, one were measuring the 'subjective' dimension of '(in)tolerance', then one end of the bipolar scale would consist of a statement that expressed extreme intolerance, while at the other end would be a highly tolerant statement.
Early on, researchers noted that the concept of a unidimensional latent variable was unsuitable for evaluating complex social phenomena. For instance, researchers encountered difficulties when attempting to assess subjective standpoints as to whether the death penalty is morally justifiable.
The measuring of such a standpoint proved possible only when the respondents combined evaluations for different aspects of the death penalty (ethical, legal, cultural, historical, etc.). According to Ostrom (Ostrom et al., 1992), respondents face a similar task every time they evaluate the properties of objects that differ from each other but are not mutually exclusive (when measuring a complex, subjective standpoint). As an example of an ineffective scale, Ostrom cites the traditional American presidential question that assesses the conservatism or liberalism of the candidate. The scale includes two poles on what appears to be the same continuum - liberal and conservative - which the survey participants are expected to perceive as mutually exclusive, although much research shows the opposite.
A newer approach in social psychology (above all cognitive theory) reconceptualises the traditional psycho-physiological models for presenting opinion, or, as Ostrom (Ostrom et al., 1992: 298) notes, 'The attitude construct, traditionally viewed as a bipolar continuum, can be reconceptualised in terms of two discrete categories that are separately stored in a semantic network.' The essence of this approach is the desire to identify the discrete cognitive representations triggered by the survey questions. The respondent has two kinds of cognitive representations available: those that relate to the object of evaluation and those that relate to the instrument of measurement (for instance, the above-mentioned scale of conservatism or liberalism of the presidential candidate). Prior to giving a response, the survey participant's task is to find the highest level of correspondence between the two cognitive representations, the object and the instrument of measurement.
Continuing with our example of evaluating the presidential candidate on the conservative-liberal scale, it is reasonable to assume that the extreme poles of this scale trigger separate cognitive structures or representations (a prototypical conservative or liberal president). The task of the respondents is to decide which of the prototypical representations (or both together) best describes the candidate. In spite of the apparently unidimensional bipolar questioning, the likelihood that, in giving their answers, the respondents will make use of separate cognitive categories that can 'support' both poles of the scale (simultaneously conservative and liberal) has prompted theorists to surmise that the latent cognitive structures are dualistic and discrete (Ostrom et al., 1992: 298). In discrete categories, the two poles are linked by content but are nevertheless independent of each other; each pole is not necessarily a negation or inverse property of the category at the other pole. Dualistic categories are linked but not necessarily antagonistic. In the conservative-liberal example, the respondent can express his or her attitude as non-conservative, but not at the same time as liberal. In other words, we are no longer talking about one dimension on a bipolar conservative-liberal continuum, but about two separate dimensions - liberal and conservative.
Based on the assumption of duality, Ostrom designed an experiment to reject the hypothesis that the latent variables were merely unidimensional and continuous (Ostrom et al., 1992). The starting point for Ostrom's experiment was the 'Donald case', which is known in socio-psychological research and which describes an average day in the life of a fictitious character known as Donald. Donald has both the positive and negative features of an average person.
In the original experiment, the researchers tested a number of hypotheses with a pre-prepared evaluation test on the unidimensional continuum of 'friendly-unfriendly (malicious)'. To this end, Ostrom made use of a special procedure that involved a short presentation of Donald, which incorporated a 'disturbing story'. The purpose of this story was to negate the recency effect. In carrying out his experiment, Ostrom used three different sequences of questions: each sequence incorporated the same twelve statements, which served to prompt the respondents to express their impressions of the target person. Six statements fitted the latent dimension 'friendly-unfriendly (malicious)', while the other six were construct-irrelevant. Ostrom used an eleven-step scale of agreement presented on the two pages of the questionnaire, each of which carried six statements.
The sequences in Ostrom's experiment differed in terms of the location of the relevant and irrelevant statements. In the random ordering sequence, the twelve statements were ordered at random with three relevant and three irrelevant statements on each page. In the 3-6-3 sequence, the first three statements on the first page were relevant to one of the poles of the scale, while the last three statements were relevant to the opposite pole; the middle six statements in this sequence were irrelevant. The 6-6 sequence placed all the relevant statements on one page.
According to Ostrom's findings, duality (two-dimensionality) can be expected when the measurement instrument enables the direct identification of the concept due to the ordering of the questions in both block sequences. In the case of the random sequence of items, bipolarity (unidimensionality) would be expected. The findings of other researchers suggest that when the instrument enables the direct identification of the concept, the opposite effect (a one-dimensional structure) can be expected (Hafner-Fink & Uhan, 2013).
It seems that the latter indicates the 'global context' effect (the content of the concept and the cognitive sophistication of respondents), which may override the question-order effect (ibid.: 850-851).
The Research Model
These findings formed the starting point of our research. Apart from the question-order effect ('local context'), we were interested in the ability of respondents to identify what was being measured ('global context'). The test was performed in the autumn of 2003, within the framework of the Slovenian Public Opinion Survey (SJM 2003/3 and SJM 2003/4; see Tos et al., 2004a, 2004b). Of the 2002 adult residents of Slovenia surveyed, 1777 returned questionnaires that answered all the relevant questions. The respondents were divided into three sub-groups, each of which responded to a different version of the questionnaire.[1] With the help of a five-step (Likert-like) scale, the participants expressed agreement or disagreement with a group of twelve statements, six of which were designed to express two dimensions of negative nationalism (protectionism and xenophobia), while the remaining six were construct-irrelevant. In each version of the questionnaire, the position of the construct-relevant items was changed in line with Ostrom's sequences, so that, in the first version, the first three statements expressed xenophobia and the last three protectionism, while the six intermediate statements were construct-irrelevant (i.e. using a 3-6-3 sequence). In the second version, all six construct-relevant statements appeared in one block that comprised the concept 'negative nationalism', while the six irrelevant statements appeared separately on the other page of the questionnaire (i.e. a 6-6 sequence). In the third version, the twelve statements appeared in random order, with three relevant and three irrelevant statements on each side of the questionnaire, irrespective of the concept.
[1] The respondents completed a survey sheet which was attached to the basic SJM questionnaire.
Analysis and Results
We supplemented Ostrom's thesis on the influence of the measuring instrument (bipolarity and item order) with the thesis on the influence of the 'content' of the concept measured. We thus carried out a test in which all the statements were unidirectional (which means that we excluded the possible influence of bipolarity) and we researched only the influence of the item order on the 'formation' of the two-dimensionality (or unidimensionality) of the concept of 'negative nationalism'. A Likert scale was used to measure the level of negative nationalism, which was hypothetically divided into two dimensions: xenophobia and protectionism. Here, each dimension was represented by three statements that were combined in the questionnaire with six irrelevant statements.[2] The scale was constructed as follows:
a. the dimension of 'protectionism' was represented by the following three statements:
- Slovenia should limit the import of foreign products to protect its economy. (x1)
- Foreigners should not be allowed to buy land in Slovenia. (x2)
- Slovenian television stations should give precedence to Slovenian films and programmes. (x3)
b. the dimension of 'xenophobia' was represented by the following three statements:
- The crime rate is increasing because of the number of immigrants. (x4)
- Non-Slovenes should not be allowed to hold public posts. (x5)
- Mixing people of different ethnic and cultural backgrounds brings only problems. (x6)
[2] The six irrelevant statements covered concepts (dimensions) of authoritarianism and traditionalism that are theoretically (conceptually) related to the relevant dimension (concept) of nationalism. Because of this affinity of the relevant and irrelevant concepts, the 'task' of 'identification' of the relevant dimension was not trivial for respondents. The following six irrelevant items were included in the questionnaire:
- Homosexuals should not be allowed to express their sexual orientation in public. (x7)
- Old customs are being destroyed by modern times. (x8)
- I am always prepared to support new things. (x9)
- Human life is determined by destiny. (x10)
- In general, a child will benefit if he/she accepts his/her parents' way of thinking. (x11)
- A community that tolerates large differences in beliefs cannot survive in the long run. (x12)
To test both models (unidimensionality and two-dimensionality) for each of the three conditions of ordering (3-6-3; 6-6; and random), we used confirmatory factor analysis (CFA) on the basis of a correlation matrix of the six relevant statements shown. We first carried out an exploratory factor analysis (the principal axis method), which in this case yielded similar results to those arrived at by Ostrom and his colleagues: in both block sequences there is a clear two-dimensional solution (protectionism, xenophobia), while in the case of the random ordering of statements, a unidimensional solution makes more sense.[3] Using confirmatory factor analysis for all three conditions of ordering the relevant statements, we tested two measurement model variants: a) a two-factor model (a thesis of the two-dimensionality of the concept of negative nationalism) - protectionism (ξ1) and xenophobia (ξ2); and b) a single-factor model, which presupposes the unidimensionality of the concept of negative nationalism (ξ). Each of the two models was thus tested under each of the three conditions of ordering the relevant statements, giving six models in all (Diagram 1). We compared the results of the testing of all six models to the same degrees of freedom (df).
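The two measurement model variants described above can be written out explicitly. The following is only a sketch in conventional CFA notation: the loadings $\lambda_i$, the measurement errors $\delta_i$ and the factor covariance $\phi_{21}$ are symbols assumed here for illustration and are not given in the text.

```latex
% Two-factor model: protectionism (\xi_1) and xenophobia (\xi_2)
\begin{align*}
x_i &= \lambda_i \xi_1 + \delta_i, \quad i = 1, 2, 3 \\
x_i &= \lambda_i \xi_2 + \delta_i, \quad i = 4, 5, 6 \\
\operatorname{Cov}(\xi_1, \xi_2) &= \phi_{21}
\end{align*}
% Single-factor model: negative nationalism (\xi)
\begin{align*}
x_i &= \lambda_i \xi + \delta_i, \quad i = 1, \dots, 6
\end{align*}
```

Read this way, and assuming the factor variances are fixed to one, the single-factor variant corresponds to the special case of the two-factor variant in which the two factors are perfectly correlated.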
To evaluate the models' fit we applied two statistics: the ratio χ²/df and the root mean square error of approximation (RMSEA).[4]
[3] In the instances when irrelevant items were also included in the model, the results were structurally the same: a) in both block sequences, the two expected factors of nationalism (xenophobia and protectionism) and the two factors of irrelevant items were formed; b) in the random sequence, two factors were formed - one factor of relevant items (nationalism) and one factor of irrelevant items. There are only a few deviations: a) one irrelevant item (about homosexuals) 'joined' the factor of xenophobia in both block sequences; and b) one irrelevant item (about homosexuals) 'exchanged' positions with one relevant item.
[4] There are different views in the literature as to the threshold value for each statistic. For the χ²/df ratio, 'different researchers have recommended using ratios as low as 2 or as high as 5 to indicate a reasonable fit' (Marsh and Hocevar, 1985: 567). There is also disagreement regarding the RMSEA value: some authors are more conservative and put the value of 0.05 as the upper boundary, while others are more liberal and put the value of 0.08 as the upper boundary for a good model fit (e.g. Macintosh, 1998: 87; Li and Wehr, 2007: 376).
Diagram 1: GENERAL HYPOTHETICAL MODELS FOR THE SCALE OF 'NEGATIVE NATIONALISM'
The Question-Order Effect
Only when allowing for correlations of measurement errors in all three conditions of ordering were we able to obtain a good fit between the model and the data in the case of both the single-factor and the two-factor models. However, when comparing the results of testing for all six models to the same degrees of freedom, differences in model fit appear (Tables 1a, 1b and 1c). These differences and the specific results for each model do indicate item-order effects.
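As a rough illustration of the two fit statistics named above, the sketch below computes the chi-square/df ratio and one common form of the RMSEA point estimate. The function names, the N - 1 denominator (some software divides by N instead) and the input values are assumptions made for illustration; they are not taken from the analysis or from Tables 1a, 1b and 1c.

```python
import math


def chi2_df_ratio(chi2: float, df: int) -> float:
    """Return the chi-square statistic divided by its degrees of freedom."""
    if df <= 0:
        raise ValueError("df must be positive")
    return chi2 / df


def rmsea(chi2: float, df: int, n: int) -> float:
    """Point estimate of RMSEA: sqrt(max(chi2 - df, 0) / (df * (n - 1))).

    Assumption: the (n - 1) form is used; some programs divide by n instead.
    """
    if df <= 0 or n <= 1:
        raise ValueError("df must be positive and n must exceed 1")
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


# Hypothetical values for illustration only, not results from the survey.
example_ratio = chi2_df_ratio(24.0, 8)
example_rmsea = rmsea(24.0, 8, 601)
print(f"chi2/df = {example_ratio:.2f}, RMSEA = {example_rmsea:.3f}")
```

With these made-up inputs the ratio is 3.0 and the RMSEA is about 0.058, a value that would fall between the conservative (0.05) and the more liberal (0.08) upper boundaries cited for a good model fit.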
In the case of the two block sequences, the fit between the single-factor model and the data is significantly worse than the fit between the two-factor model and the data. Moreover, the single-factor model fits the data better under the random ordering of statements than under either block sequence, while the opposite applies to the two-factor model: random ordering means a worse fit between the model and the data. So, for the block sequences, a two-factor solution makes the most sense, and for the random ordering, a single-factor solution. Similarly, given the correlations among the factors (