RSC, Number 6, Issue 2, May 2014, pp. 153-183.

Informational value of data about data in surveys: example of two web surveys

Marija Paladin
University of Ljubljana
marija.paladin@gmail.com

Abstract: Data about data in survey research are a potentially rich source of additional research information. The theoretical part of the paper discusses definitions, categorizations, usefulness and dilemmas connected to data about data. The empirical part presents some metadata, auxiliary data and paradata gathered in two web surveys conducted on www.1ka.si. We analyzed potential differences between pre-reminder and post-reminder respondents, including both partially and fully completed questionnaires. We also analyzed the time respondents spent answering the full questionnaire and each page of it; this analysis covers only fully completed questionnaires. Besides differences by type (size) of organization, we were also interested in whether pre- or post-reminder participation in the web survey, and the time needed to answer the full questionnaire or each page, depend on some control variables (age, work experience, education, gender).

Keywords: paradata, data about data, reminder, online questionnaire, web survey

Introduction

A researcher has to be aware that, when conducting a (social) survey or other research, she or he can reach far beyond the survey product data or, put another way, beyond just the answers to the questions asked. Byproduct data are also a potentially useful tool. By this we mean data that are produced in different aspects or phases of a survey and that are not the researcher's primary concern, but could be useful in further analysis. The expression most often used for these data is paradata, although we prefer the somewhat broader expression 'data about data'. A detailed categorization of the different types of 'data about data' follows later in the text, with particular attention to paradata, metadata and auxiliary data.
Paradata are part of Couper's triple categorization of data, metadata and paradata (Kreuter, Couper & Lyberg, 2010: 286) and can be regarded as a byproduct of the (field, internet, phone) data collection process. Researchers use computer-assisted methods to collect survey data and data about processes, which allow statistical evaluation, monitoring and management of the survey process (Kreuter, Couper & Lyberg, 2010: 282). The use of paradata focuses on managing current challenges such as declining response rates, increasing risk of non-response bias and measurement error, and escalating costs of survey data collection. The collection of survey paradata is not new, but the range and detail of paradata being collected have increased due to the computerization of the survey process (Nicolaas, 2011: 4). The possibilities for using paradata for research purposes are wide and not yet well explored. We need to know better which paradata can be useful (and in connection with what), so that they are collected and treated intentionally and not just as a byproduct. This paper takes some steps in that direction.

This paper is divided into two parts: a theoretical part, in which we discuss data produced in the survey process (data about data), and an empirical part, which is based on two surveys conducted with an online questionnaire on www.1ka.si. There were two main goals. The first was to identify whether there are differences between respondents who answered the questionnaire before and after the reminder, so that we could learn something about pre- and post-reminder respondents and address potential survey units more appropriately in the invitation (to get more responses).
The second was to find out whether there are statistically significant differences between respondents in the time they needed to answer the questionnaire completely.

Definition

Perhaps because there is currently no consensus on a standard definition of paradata (Nicolaas, 2011), there are quite a few more or less similar operational definitions. For example: paradata are data collected about the survey process and captured during computer-assisted data collection, either with the interviewer's assistance or automatically. They include call records, interviewer observations, time stamps, keystroke data, travel and expense information, and other data (Kreuter, Couper & Lyberg, 2010: 282). Others say that paradata are process data, that is, all the data collected during the response process except the response itself, and that they exist in both interviewer-administered surveys and computer-assisted self-administered surveys (Horwitz et al., 2012). Couper was the first author to introduce the term "paradata" to the field of survey methodology, in the sense of automatically generated process data. Now the term covers all types of data about the process of collecting survey data, such as interviewer call records, length of interview, interviewer characteristics and interviewer observations (Nicolaas, 2011: 3). Paradata are data captured throughout the entire survey process as a result of collecting the product data; they may or may not be used and intentionally collected (Frost & Duffey, 2010: 14).

Categorization

Four primary categories of survey data, proposed by Frost Hubbard and Ben Duffey (2010: 15-16) and quite similar to those of Gerry Nicolaas (2011: 3), are product data, paradata, metadata and auxiliary data. Product data are the answers to survey questions, that is, survey questionnaire data (not paradata).
Paradata (process data) include keystroke files, contact attempt data, interviewer hours worked and miles traveled, etc. Metadata are static descriptions of a data file or data system, for example variable and value labels, a description of the survey purpose, or response rates. Auxiliary data are preexisting data used to support the survey process or the analysis of the substantive data; they give information on sampling frames, census area characteristics, administrative data, etc.

For interviewer-administered surveys, paradata can include response times, respondent utterances (pauses, hedges, stutters), respondent expressions and interviewer observations. In computer-assisted self-administered surveys, such as Internet surveys, we can collect information about the location of break-offs, changed answers, error messages, mouse clicks and response times (Horwitz et al., 2012: 1). Paradata items that could be collected in computer-assisted personal interview surveys (Nicolaas, 2011: 7-12) are: interviewer characteristics, call record data, interviewer observations about the area and dwelling, doorstep interaction, audit trails, audio recordings, and other paradata items (data items which describe the process of asking and answering questions).

As we see, some paradata are collected automatically (for example by software) and some with the interviewer's assistance (for example their observations). Different types of data collection and different types of data (time needed to complete a questionnaire vs. an observation on willingness to participate in research) may produce data of different quality. Despite that, as Casas-Cordero, Kreuter, Wang and Babey observed, the literature on the quality of, for example, neighbourhood observational data collected by interviewers is only now emerging.
Given the moderate to low Cohen's kappa statistics (used to score agreement between observers in categorical rating tasks) that they obtained in their research on neighborhood observations, one could argue that much of what was observed may be related less to the real characteristics of the areas than to the characteristics of the interviewers (Casas-Cordero et al., 2013: 228, 236-240).

The National Health Interview Survey (NHIS) uses the following groups of paradata: response paradata, measures of time, measures of contactability, measures of cooperation, mode measures, and survey-level information (U.S. Department of Health and Human Services, 2012: 6-16).

Another categorization divides paradata into macro and micro paradata. According to Scheuren, macro paradata, or summary process measures, are common and widely used. Examples are overall process summaries such as coverage rates and item and unit nonresponse rates. Micro paradata, on the other hand, are process details known for each case, such as the language in which each interview was conducted in a multilingual environment, how many times the household was called before interviewing, or whether there was a refusal at the beginning. Micro paradata are less familiar. This could be due to a lack of interest among researchers in the possible added value of paradata, and to possible additional expenses on the client's side. Micro paradata do not concern the overall (aggregated) survey process but describe the survey process for individual records (Scheuren: 1-2). We can understand macro paradata as metadata, and micro paradata as paradata, in the categorization mentioned previously.

Besides interviewer-administered surveys and laboratory-based research, paradata can also be collected in web surveys. It is possible to conduct large-scale self-administered surveys while collecting paradata (Heerwegh a: 2).
Dirk Heerwegh, discussing mostly audit trails, categorizes paradata into two subgroups: server side paradata and client side paradata. Almost all web surveys gather some server side paradata, which are collected automatically without additional effort or consent on the part of the client (respondent). For a researcher interested in more detailed information on respondent behavior, for example at the level of specific survey questions, the author introduces the term client side paradata. As the author distinguishes them, client side paradata are not collected at the level of the server but at the level of the respondent's computer, by an incorporated script (the researcher detects respondent behavior such as clicking radio buttons, drop-down boxes and hyperlinks), and are sent when the respondent submits the web page (Heerwegh a: 2-5). At this point one could argue that paradata collected automatically are not client side paradata, but some examples speak against that. For example, the time needed to complete one page or the whole questionnaire is, in some survey software tools, collected automatically when the respondent submits each page along with his or her answers. Although submitting is needed, its main purpose is to complete one block of questions and send the answers. Submitting in this case does not mean that the answers to survey questions are also paradata. Nevertheless, we have to bear in mind that the author is discussing the audit trail or audit log, which is important for his categorization.

Frauke Kreuter (2010: 4) offers an interesting categorization when speaking about which paradata are available throughout the survey process. For this purpose she divides paradata into: key strokes (for example response times), vocal characteristics (for example pitch of the interviewer's voice, disfluencies), and contact data and interviewer observations (for example day and time).
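The page-level timing described above (time recorded whenever a respondent submits a page) could be derived from submit timestamps roughly as follows. This is a minimal sketch under assumptions: the function name, the log layout and the sample timestamps are hypothetical and do not describe how www.1ka.si or any other survey tool actually stores its paradata.

```python
from datetime import datetime


def page_durations(entered_at, submitted_at):
    """Return (seconds per page, total seconds) for one respondent.

    entered_at   -- time the first page was shown
    submitted_at -- submit times, one per page, in page order
    """
    durations = []
    previous = entered_at
    for stamp in submitted_at:
        # Time on a page = its submit time minus the previous submit time.
        durations.append((stamp - previous).total_seconds())
        previous = stamp
    return durations, sum(durations)


# Hypothetical log for one respondent (not data from the two surveys).
start = datetime(2013, 1, 7, 9, 0, 0)
submits = [
    datetime(2013, 1, 7, 9, 3, 30),
    datetime(2013, 1, 7, 9, 5, 0),
    datetime(2013, 1, 7, 9, 9, 15),
]
per_page, total = page_durations(start, submits)
print(per_page, total)
```

A difference of timestamps like this cannot separate active answering from pauses, which is one reason very long completion times may need separate treatment.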
This categorization is especially interesting because it underlines one part of the interviewer's nonverbal communication (vocalic cues) as an important part of the survey process. Vocalic cues have proved quite important in many cases of face-to-face communication and also in the process of persuasion, which in another form takes place when convincing a potential respondent to participate in a survey.

If we consider the different categorizations carefully, we can conclude that the categories overlap to some degree. Taking the categorization into product data, paradata (process data), metadata and auxiliary data, this paper places emphasis on all categories except product data, although the primary interest lies in paradata.

Usefulness of Paradata

The possibilities for using paradata in the spirit of statistical process control are wide. The first uses of paradata focused on the exploration of measurement error in surveys. Paradata are also widely used to explore non-response in surveys, to manage data collection, and for paradata-driven responsive design (Couper & Kreuter, 2013: 271).

Kreuter, Couper and Lyberg distinguish between the post-survey use of paradata and paradata used in monitoring and managing ongoing surveys. Post-survey use of paradata (Kreuter, Couper & Lyberg, 2010: 283-285) means post-survey assessment or post-survey correction of errors common in the survey process. Paradata in monitoring and managing ongoing surveys (Kreuter, Couper & Lyberg, 2010: 286-288) means that measures of the process are taken along the way, so that error sources can be located and interventions can be targeted during the collection process.

Paradata can be used to gain reliable and replicable findings about survey methods and practice in order to minimize survey error (Nicolaas, 2011: 4).
They can also be used as an alternative measure in the analysis of survey data quality. Some argue that response rates alone are not the most suitable measure of the quality of survey data, and it is therefore proposed to complement them with other paradata. The advantages of this approach are that it involves more data, uses complete data, reports data at the survey level, encourages the development of paradata, and differentiates cases in the process of (para)data collection. The approach is also promising because it enables comparison of respondents and nonrespondents on some variables. A further extension is to compare respondents using paradata and data on crucial variables, that is, to examine correlations between paradata and survey variables, for example by comparing early and late responders (Wagner 2009). Paradata can also be used to identify potential problems with the survey instrument, to understand the process the respondent uses to complete the survey, to assess the quality of the instrument design, and to evaluate how well the instrument is working and whether modifications need to be made prior to production (Horwitz et al., 2012).

Reasons for gathering and analyzing paradata also arise from practice (Horwitz et al., 2012). Among them are the identification of problematic screens or questions, testing the usefulness of the help option, identifying drop-out points, etc.

Dirk Heerwegh (a: 6-15) also sees a few possible uses of client side paradata, which can be grouped under the following goals: calibrating progress indicators, testing the effects of response formats, testing the effects of question, and identifying attitude strength.

Couper and Kreuter (2013) conducted an exploratory study using paradata to explore item-level response times, based on a computer-assisted survey from cycle 6 of the National Survey of Family Growth (2002-2003).
They found that automatically derived indicators of item characteristics vary systematically with response time, and that interviewers also appear to contribute independently to completion times (although it has to be stressed that the measured demographic characteristics and experience of interviewers explain only a small part of the variability) (Couper & Kreuter, 2013: 293-284).

SOME DILEMMAS ABOUT PARADATA

Paradata capture can be viewed as collecting information about the process of completing a survey. No behavior outside the survey is captured, so it can be argued that no consent beyond the consent to participate in the survey is needed, although the question of whether and how to inform respondents about the capture of paradata remains. On the other hand, respondents are usually not aware that such additional information is being collected; if they were aware of it, they might change their behavior or decide not to participate in the survey. The question is how to provide information about the collection of paradata and their linkage to survey data while at the same time maintaining respondent cooperation with the survey (Couper & Singer, 2013: 58-59).

Social surveys mostly rely on the voluntary cooperation of respondents and on the protection, both actual and perceived, of their personal information and identities. Some authors argue that various paradata include information that could disclose respondents' identities, for example address details, interviewer remarks and audio recordings. Consequently, paradata databases cannot be released without thorough processing and the removal of information that could be used to identify respondents. But this process is problematic and time-consuming (Nicolaas, 2011: 16).

Researchers must protect respondents from potential harm and assure their autonomy in deciding whether to participate in the research or not. This means assuring and obtaining respondents' informed consent.
To some authors this means assuring that respondents are treated as autonomous individuals with the right to make informed, voluntary decisions about participation. This is connected with ethical and practical questions arising from the growing use of paradata (the data collected by computerized systems during data collection) in surveys, especially those conducted online (Couper & Singer, 2013: 57).

Couper and Singer conducted a web study of how information about disclosure risk might affect survey participation. The results were as follows. Of the respondents who received a note describing a hypothetical survey and were then asked whether they would be willing to participate and, if so, whether they would permit use of their paradata, 63.4% agreed to do the survey and consented to paradata use. The same consent was given by 59.2% of respondents who received a note describing a hypothetical survey that they had already completed. Afterwards they were asked whether they would be willing to permit use of their paradata, and 68.9% of them agreed. The differences between groups are statistically significant. Mentioning paradata resulted in lower willingness to participate in the survey. The reasons respondents gave for refusing the use of their paradata were concerns about aspects of paradata, with mention of the tracking of browsing behavior, and general privacy-related concerns. Many responses suggested confusion over the extent of paradata capture, and additional explanation did not make things easier or more understandable (Couper & Singer, 2013: 63-65).

As the authors note, the experiment did not succeed in adequately informing respondents about the methodology around paradata and eliciting their consent. On the other hand, respondents are probably not aware that paradata are unavoidably collected in the process of responding to a survey, so the real question is whether respondents would consent to their use or not (Couper & Singer, 2013: 65-66).
Heerwegh (a: 18) also raises ethical concerns connected with collecting and analyzing paradata, because respondents may regard it as a tool to invade their privacy. This can be understood to mean that client side paradata should be collected only if that is the only way of answering a research question, and if it does not mean an invasion of privacy.

The question is also whether the use of paradata collected in web surveys reaches the level that requires explicit mention to respondents. For many this raises ethical and legal dilemmas. Recent EU online privacy legislation and US regulations are moving in the direction of requiring informed consent for the collection of any data other than the responses to the survey (Couper & Singer, 2013: 66).

The new Law on Electronic Communications, implemented in Slovenia in 2013 (https://www.ip-rs.si/novice ...), brought new rules on the use of cookies and similar technologies for storing information, or accessing information stored, on a user's computer or mobile device. The new legislation does not prohibit the use of cookies, but tightens the rules on the conditions under which cookies and similar technologies may be used. The stress is on the requirement that users are informed and offered a choice of whether or not to allow websites to use cookies. The new legislation is primarily aimed at better protection of users' online privacy. As can be seen from the Information Commissioner's guidelines on the use of cookies (https://www.ip-rs.si/fileadmin ...), cookies that may originate in non-commercial research are listed neither among the exceptions permitted for use without prior consent nor among the cookies that may not be used without the user's prior consent. However, it should be noted that the range of uses of cookies is very dynamic, so the guidance of the Information Commissioner will continue to be updated regularly.
EMPIRICAL PART: PARADATA IN TWO WEB SURVEYS

The empirical part of this paper is based on two surveys conducted with an online questionnaire. The questionnaire was sent to small (up to 49 employees) and big (250 or more employees) Croatian organizations, with instructions that it should reach the person responsible for HRM and for the recruitment and employment of new staff in the organization. The target population (and, in the case of small organizations, the sample) was determined from the existing database in the register of the Croatian Chamber of Commerce. The theme of the questionnaire (identical in both surveys) was the impact of nonverbal factors on the persuasiveness of individuals in a business context.

For the purpose of this paper we are especially interested in the paradata and some auxiliary data that were accessible on www.1ka.si and were gathered along with the survey data. There were two main goals. The first was to identify whether there are differences between respondents who answered the questionnaire before and after the reminder (there was only one reminder, which was sent to all units regardless of previous participation). Fully and partly completed questionnaires were included. The second was to find out whether there were differences between respondents in the time needed to answer the questionnaire completely (only fully completed questionnaires were included).

Table 1: Samples.

| Type of org. | % HRM manager (referent positions excluded) | Age (mean) | Mean age of employees (mean) | Work experience in years (mean) | Duration of education in years (mean) | Number of employees (mean) | Gender |
|---|---|---|---|---|---|---|---|
| small (up to 49 empl.) | 69% | 40.9 | 37.6 | 20.4 | 16.0 | 64.1* | M 42 (41%), F 60 (59%) |
| big (from 250 empl.) | 69% | 39.0 | 39.6 | 14.9 | 16.6 | 439.4 | M 34 (39%), F 54 (61%) |

* The number exceeds 49; this could be due to variability in the data, which are corrected in the databases only at yearly intervals, so some organizations included in the sample exceed 49 employees. Also, not all respondents answered the question about the organization.

DATA ABOUT DATA

Table 2: Sample frame.

| Type of organization | Population | Included in survey (sent invitation) | Planned response | Realized response (status 5 and 6)* |
|---|---|---|---|---|
| small (up to 49 employees) | 75,917 | 1,333 | approx. 10%, i.e. 130 | 7.88%, i.e. 105 |
| big (from 250 employees) | 449 | 414 | approx. 10%, i.e. 40 | 21.50%, i.e. 89 |

* Partially or fully completed questionnaire.

Table 3: Basic data about questionnaire and survey.

| Basic data about questionnaire and survey | Big organization | Small organization |
|---|---|---|
| Number of questions | 17 | 17 |
| Variables | 91 | 91 |
| Items | 214 | 277 |
| Partially or fully completed questionnaires | 89 | 105 |
| Language | Croatian (Hrvatski) | Croatian (Hrvatski) |
| Estimated time for completion of questionnaire | 15 min 0 s | 15 min 0 s |
| Real time respondent spent on questionnaire (partially or fully completed) | 11 min 54 s | 12 min 43 s |
| Date of first item | 6.1.2013 | 7.1.2013 |
| Date of last item | 28.1.2013 | 1.2.2013 |
| Completed the survey (6) | 69 | 89 |
| Partially completed (5) | 20 | 16 |
| Total adequate (5+6) | 89 | 105 |
| Total inadequate | 125 | 172 |
| Total units | 214 | 277 |

REMINDER AS STIMULUS TO PARTICIPATE

Table 4: Participation in survey before and after reminder.

| Type of organization | Invited to participate in survey | Participated | Participated before reminder | Participated after reminder |
|---|---|---|---|---|
| small (up to 49 employees) | 1,333 | 105 | 32 (30.5%) | 73 (69.5%) |
| big (from 250 employees) | 414 | 89 | 13 (14.6%) | 76 (85.4%) |

Included: partially and fully completed questionnaires.

It is slightly surprising that, as Table 4 shows, in small organizations, where time pressure is perhaps a more important factor than in big organizations, the proportion of questionnaires answered before the reminder was larger (30.5%) than in big organizations (14.6%). In both cases, though, most questionnaires were answered after the reminder.

Table 5: Participation in survey before and after reminder - by gender.

| Type of organization | Participation | Male | % (of male) | Female | % (of female) |
|---|---|---|---|---|---|
| small (up to 49 employees) | before reminder | 14 | 33.3% | 16 | 26.7% |
| small (up to 49 employees) | after reminder | 28 | 66.7% | 44 | 73.3% |
| big (from 250 employees) | before reminder | 4 | 11.8% | 9 | 16.7% |
| big (from 250 employees) | after reminder | 30 | 88.2% | 45 | 83.3% |

Table 5 shows that in small organizations male respondents were slightly more willing than female respondents to participate in the survey before the reminder. The opposite was the case in big organizations. In general, in both types of organization respondents of both genders were more willing to participate in the survey after the reminder.

We further calculated the percentage of before- and after-reminder respondents by position in the organization and by field of education. Due to space limitations we do not display all data in tables. The analysis showed that in big organizations the percentage of before-reminder participants is largest among respondents in the owner position and smallest among respondents in the director of organization position. In small organizations the percentage of before-reminder participants is largest among respondents in the head of unit position and smallest among respondents in the director of district position. We have to note that the number of units in some of the groups is very small.

The analysis also showed that in big organizations the percentage of before-reminder participants is largest among respondents with an education in the natural sciences and smallest among respondents with an education in a technical field. In small organizations the percentage of before-reminder participants is again largest among respondents with an education in the natural sciences (the only group in which more than 50% of respondents participated before the reminder) and smallest among respondents in the technical field and the category other. As noted previously, the number of units in some of the groups is very small.
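The response rates in Table 2 and the before/after-reminder shares in Table 4 follow directly from the reported counts. The short sketch below recomputes them as a consistency check; the helper name `percent` and the dictionary layout are illustrative choices, while the counts are those reported in the tables.

```python
def percent(part, whole, digits=1):
    """Share of `part` in `whole`, in percent, rounded to `digits` decimals."""
    return round(100 * part / whole, digits)


# Counts as reported in Tables 2 and 4.
surveys = {
    "small": {"invited": 1333, "responded": 105, "before": 32, "after": 73},
    "big": {"invited": 414, "responded": 89, "before": 13, "after": 76},
}

for name, s in surveys.items():
    # Before- and after-reminder respondents must add up to all respondents.
    assert s["before"] + s["after"] == s["responded"]
    print(
        name,
        "response rate:", percent(s["responded"], s["invited"], 2),
        "before reminder:", percent(s["before"], s["responded"]),
        "after reminder:", percent(s["after"], s["responded"]),
    )
```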
We also computed Pearson coefficients for duration of education, age and work experience, separately by size of organization, to identify statistically significant relationships. They showed that in our two surveys the decision to participate before receiving the reminder does not depend significantly on the chosen demographic characteristics, with the exception of age among respondents in small organizations (in big organizations no statistically significant relationships appeared).

TIME NEEDED TO ANSWER QUESTIONNAIRE

In this section we analyzed only fully completed questionnaires (partials are excluded). We also excluded questionnaires that took more than 45 minutes to complete (more than 3 times the estimated time). On that criterion 7 questionnaires from big organizations and 10 from small organizations were excluded.

The questionnaire used in both online surveys is composed of the following sets of questions, divided into 6 pages. Page 1 contains demographic questions about the respondent, a set of seven statements about the respondent's nonverbal communication (5-point Likert-type scales) and a set of eleven statements about the factors of persuasion (5-point Likert-type scales). Page 2 contains a set of 16 statements about factors of movement and touch (5-point Likert-type scales). Pages 3 and 4 contain a set of 30 statements about factors of appearance and decoration (5-point Likert-type scales). Page 5 contains a set of 10 statements covering the vocalic factors (5-point Likert-type scales) and a set of 3 statements about time factors (5-point Likert-type scales). Page 6 contains questions about the company.

From what we can see in Table 6, the coefficients of skewness and kurtosis indicate a non-normal distribution of the time needed to answer the full questionnaire and each page of it.
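The exclusion rule and the shape statistics mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the completion times are invented, and the moment-based formulas for skewness and excess kurtosis are one common definition (statistical packages often apply small-sample corrections, and the paper does not say which variant was used).

```python
from statistics import mean, median

ESTIMATED_MIN = 15               # estimated completion time in minutes
CUTOFF_MIN = 3 * ESTIMATED_MIN   # 45 minutes, the exclusion threshold


def keep_plausible(times_min, cutoff=CUTOFF_MIN):
    """Drop questionnaires that took longer than the cutoff."""
    return [t for t in times_min if t <= cutoff]


def central_moment(xs, k):
    """k-th central moment (population form, dividing by n)."""
    m = mean(xs)
    return sum((x - m) ** k for x in xs) / len(xs)


def skewness(xs):
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5


def excess_kurtosis(xs):
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2 - 3


# Hypothetical completion times in minutes (not data from the two surveys).
times = [8, 10, 11, 12, 13, 14, 16, 20, 31, 52, 120]
kept = keep_plausible(times)

print(len(times) - len(kept), "questionnaires excluded")
print("mean:", mean(kept), "median:", median(kept))
print("skewness:", round(skewness(kept), 2))
print("excess kurtosis:", round(excess_kurtosis(kept), 2))
```

In right-skewed data of this kind the mean lies above the median, which is the pattern that motivates reporting both.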
Because of this non-normality, we also have to take medians into account. In some cases they are very close to the mean (2nd page about factors of movement and touch in both types of organization, 3rd page about factors of appearance and decoration in small organizations, 4th page about factors of appearance and decoration in both types, 5th page about vocalic and time factors in big organizations), and in other cases quite different from it (full questionnaire, 1st and 6th page in both types, 3rd page about factors of appearance and decoration in big organizations, 5th page about vocalic and time factors in small organizations). Nevertheless, in the further analysis (t-test, Pearson coefficients) we treat the mean as the appropriate measure of central tendency.

The mean time needed to answer the full questionnaire (calculated as the average difference between starting and ending time; pauses are not recorded and thus not taken into account) exceeds the estimated time only slightly in both types of organization, while the median is in both cases slightly below the estimated time. As Table 6 also shows, the mean time needed to answer most pages exceeds the estimated time in both types of organization, regardless of page topic (discussed above). The exceptions are the 2nd page (factors of movement and touch) for both types, and the 3rd and 4th pages (factors of appearance and decoration) and the 5th page (vocalic and time factors) in small organizations. One could argue that, since answering each page took longer than estimated, respondents needed slightly more time to comprehend the topic of the questions, which was based on nonverbal communication cues. These topics (especially, for example, appearance and touch) can be perceived as sensitive by many.
Given the non-normal distribution of the time needed to answer each page, the median time per page is also useful to compare with the estimated time. The median is slightly lower than the mean on almost all pages for both types of organization (the exception is the 1st page for both types). This argues against the comprehension difficulties assumed above.

We further calculated t-tests to identify possible statistically significant differences between the two types of organization. They showed a statistically significant difference only in the average time needed to complete the 5th page, covering vocalic and time factors. On this page respondents in big organizations needed more time than respondents in small organizations. On all other pages, and for the time needed to complete the full questionnaire, the observed differences were not statistically significant (due to space limitations the tables are not included in the paper). One could argue that respondents from both types of organization made a similar effort to answer the questionnaire, regardless of time pressure, which is presumed to be higher in small organizations.

We were also interested in whether the time needed to answer the full questionnaire or each page was statistically significantly connected with some control variables (age, work experience, education), and whether there are statistically significant differences in time by gender of respondents. First we wanted to know whether there are statistically significant differences between the two types of organization on the chosen control variables. The t-tests showed no statistically significant differences between respondents from big and small organizations in age, years of work experience or years of education (due to space limitations the tables are not included in the paper).
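The two kinds of statistic used in this section, a t statistic for comparing two groups and a Pearson coefficient for relating time to a control variable, can be sketched as below. Several things here are assumptions: the paper does not state which t-test variant was used (Welch's unequal-variance form is shown), the function names are illustrative, and the sample values are invented. The sketch stops at the test statistic; obtaining a p-value additionally requires the t distribution from a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom."""
    va = variance(a) / len(a)   # squared standard error of mean(a)
    vb = variance(b) / len(b)   # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical page times in seconds for two groups (not survey data).
big_org = [95, 120, 130, 150, 170]
small_org = [80, 90, 110, 115, 125]
t, df = welch_t(big_org, small_org)
print("t:", round(t, 2), "df:", round(df, 1))

# Hypothetical ages paired with the big-organization page times.
ages = [28, 35, 41, 47, 55]
print("r:", round(pearson_r(ages, big_org), 2))
```

A positive t here means the first group took longer on average, and a positive r means that time increases with the control variable.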
The chi-square test also showed no statistically significant difference between the two types of organization in respondents' gender (due to space limitations the tables are not included in the paper). Respondents from both types of organization therefore seem quite comparable on the chosen control variables. Next, we calculated Pearson correlation coefficients between the time needed to answer the full questionnaire or each page and the control variables mentioned above, separately for both types of organization (due to space limitations the tables are not shown). Some statistically significant correlations do exist, and they differ by type (size) of organization. The exceptions are the 1st and 6th pages, where no significant correlations were found in either type of organization. On the one hand, the time needed to answer the full questionnaire in big organizations is statistically significantly correlated with two of the three control variables (positively with age and with work experience). On the other hand, no statistically significant correlations exist in small organizations. If we consider each page separately, we see that in total more statistically significant correlations exist in big organizations (6 in big organizations, or 8 when the two for the full questionnaire are counted, and just 2 in small organizations). Interestingly, the time needed to answer the 2nd page, on factors of movement and touch, is in big organizations statistically significantly correlated with all three control variables (positively with age and work experience, negatively with education), whereas in small organizations only one statistically significant correlation exists (positive, with work experience). The other statistically significant correlations are as follows. In small organizations: time needed to answer the 3rd page, on factors of appearance and decoration, with work experience (positive).
In big organizations: time needed to answer the 4th page, on factors of appearance and decoration, with education (negative), and time needed to answer the 5th page, covering vocalic and time factors, with age (positive) and work experience (positive). We also calculated t-tests to identify possible statistically significant differences by respondents' gender in the time needed to answer the full questionnaire or each page, within each type of organization. They showed two statistically significant differences in small organizations and one in big organizations. In big organizations male respondents needed more time than female respondents to answer the 1st page, on the respondent's demographics, the respondent's non-verbal communication and the factors of persuasion. In small organizations male respondents needed more time than female respondents to answer the 2nd page, on factors of movement and touch, while female respondents needed more time than male respondents to answer the 5th page, covering vocalic and time factors (due to space limitations the tables are not included in the paper).

CONCLUSION

In the theoretical part of the paper we first discussed definitions of metadata, auxiliary data and paradata, that is, data about data. The main focus was on data about data in web surveys, where the audit trail is nowadays an interesting area. Besides the potentially fruitful role of gathering and analyzing paradata in (web) surveys, some concerns also arise. There are open ethical dilemmas tied to collecting and analyzing paradata, because paradata may be regarded as a tool for invading respondents' privacy.
But in the opinion of some authors (and also ours), the real question is whether the use of paradata collected in web surveys really threatens confidentiality, or respondents' privacy in other ways, to a degree that requires explicit mention. On the other hand, with the implementation of the new Law on Electronic Communications in Slovenia in 2013, researchers have to be aware of the legal and other connotations that may also attach to paradata. The new law introduced new rules on the use of cookies and similar technologies for storing information, or accessing information stored, on a user's computer or mobile device. In the empirical part of the paper we presented some metadata, auxiliary data and paradata gathered in two web surveys conducted on www.1ka.si. Some results are worth underlining, beginning with the analysis of potential differences between pre- and post-reminder respondents, which included partially and fully completed questionnaires. In small organizations, for example, where time pressure is perhaps a more important factor than in big organizations, the proportion of questionnaires answered before the reminder was larger than in big organizations, although in both cases most questionnaires were answered after the reminder. It also seems that in our two surveys the decision to participate before receiving a reminder depends in some cases on the chosen demographic characteristics. For example, in small organizations male respondents were slightly more willing than female respondents to participate before the reminder; the opposite was the case in big organizations. In both big and small organizations the share of before-reminder participants is largest among respondents educated in the natural sciences and smallest among respondents educated in a technical field.
In small organizations, before-reminder respondents are older. In the part concerning the time respondents spent answering the full questionnaire or each page, we analyzed only fully completed questionnaires (partials are excluded). We also excluded questionnaires that took more than 45 minutes to complete (more than three times the estimated time). The mean time needed to answer the full questionnaire exceeds the estimated time only slightly in both types of organization, whereas the median is in both cases slightly below the estimated time. The mean time needed exceeds the estimated time on most pages in both types of organization (regardless of the page topic). We could argue that, because answering each page took longer than estimated, respondents needed slightly more time than assumed to comprehend the topic of the questions, which was based on nonverbal communication cues. These topics (especially, for example, appearance and touch) can be perceived as sensitive by many. On the other hand, the median time needed to answer each page, compared with the estimated time (it is slightly lower on almost all pages for both types of organization), argues against the assumed difficulties in question comprehension mentioned above. The t-tests for differences between the two types of organization showed that on just one page (the 5th) respondents in big organizations needed more time to complete the page than respondents in small organizations. It could be argued that respondents from both types of organization put similar effort into answering the questionnaire regardless of time pressure, which is presumed to be higher in small organizations.
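The data preparation described above (keep only fully completed questionnaires, drop those that took more than 45 minutes, then compare mean and median with the estimated time) can be sketched as follows. This is a minimal illustration, not the procedure actually run on the 1ka.si export: the record layout (`minutes`, `completed`) and all sample values are hypothetical and invented for the example; only the 15-minute estimate and the 45-minute cutoff come from the paper.

```python
from statistics import mean, median

ESTIMATED_MINUTES = 15.0                 # estimated completion time stated in the paper
CUTOFF_MINUTES = 3 * ESTIMATED_MINUTES   # 45 minutes, the paper's exclusion threshold


def completion_times(records):
    """Durations (minutes) of fully completed questionnaires within the cutoff."""
    return [
        r["minutes"]
        for r in records
        if r["completed"] and r["minutes"] <= CUTOFF_MINUTES
    ]


def summarize(times):
    """Mean and median, and whether each exceeds the estimated time."""
    m, md = mean(times), median(times)
    return {
        "n": len(times),
        "mean": m,
        "median": md,
        "mean_exceeds_estimate": m > ESTIMATED_MINUTES,
        "median_exceeds_estimate": md > ESTIMATED_MINUTES,
    }


# Hypothetical records, invented for illustration only.
records = [
    {"minutes": 9.5, "completed": True},
    {"minutes": 12.0, "completed": True},
    {"minutes": 13.5, "completed": True},
    {"minutes": 14.0, "completed": True},
    {"minutes": 31.0, "completed": True},   # slow, but within the cutoff
    {"minutes": 52.0, "completed": True},   # excluded: more than 45 minutes
    {"minutes": 4.0, "completed": False},   # excluded: partial questionnaire
]

summary = summarize(completion_times(records))
print(summary)
```

With right-skewed durations such as these, a single slow respondent pulls the mean above the estimate while the median stays below it, which is the pattern reported for the full questionnaire.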
We were also interested in whether the time needed to answer the full questionnaire or each page was statistically significantly associated with some control variables (age, work experience, education), and whether there were statistically significant differences in time by respondents' gender. First, we wanted to know whether the two types of organization differed statistically significantly on the chosen control variables. The analysis showed that respondents from both types of organization seemed quite comparable on the chosen control variables. The Pearson correlation coefficients, calculated for both types of organization between the time needed to answer the full questionnaire or each page and the control variables mentioned above, show that some statistically significant correlations do exist. It is interesting that, on the one hand, the time needed to answer the full questionnaire in big organizations is statistically significantly and positively correlated with two of the three control variables (age and work experience), while on the other hand no statistically significant correlations exist in small organizations. If we consider each page separately, in total more statistically significant correlations exist in big organizations (6 in big and just 2 in small organizations). The t-tests for differences by respondents' gender in the time needed to answer the full questionnaire or each page, within each type of organization, showed some significant differences. In big organizations male respondents needed more time than female respondents to answer the 1st page, which comprised (besides questions on the respondent's demographics and the factors of persuasion) some questions on self-evaluation of the respondent's non-verbal communication.
In small organizations male respondents needed more time than female respondents to answer the 2nd page, on factors of movement and touch. Previous research on nonverbal communication has shown that in many cases men have more trouble than women accepting touch as an appropriate way of communicating. In the two surveys presented we worked with a limited number of paradata, which proved potentially informative and useful, although there are some limitations, noted below. This is in line with other authors, who indicate that paradata seem to be a very useful tool for evaluating several aspects of a survey, the survey data and the survey process. As some authors say, more methodological research is required to identify the key paradata items to be collected and the best ways to use them (Nicolaas, 2011: 4). That is why the use of paradata still needs development. Many also argue that little is known about the quality of paradata and, consequently, the usefulness of the data (Nicolaas, 2011: 17). There are also dilemmas about problematic aspects of gathering and using paradata, connected with the confidentiality of the respondent and also the interviewer. In my opinion it has to be carefully identified where paradata raise real confidentiality and other ethical dilemmas and where too much emphasis is simply being given to an unreal threat. There are some limitations that we have to highlight. The samples in both surveys are rather small for drawing definite conclusions in some cases. Although there was just one reminder, it was sent in two waves (half of the respondents in the first wave, the other half in the second), so there could be some differences between the first and second wave, which were not analyzed (even though every respondent was sent just one reminder).
The database of contacts for the units invited to the survey may not be completely up to date (differences in number of employees, actual existence of the organization given the economic crisis, availability of e-mail addresses, which not all have, etc.), owing to the one-year interval for updating information and the organizations' freedom to decide which information, such as e-mail, to provide. It would be very useful to have more detailed paradata on each question of the questionnaire. This could give more in-depth information about the questions that caused the most difficulty (e.g. most time needed to answer, most corrections of the initial answer, etc.). But that calls for the additional considerations mentioned in connection with the new Law on Electronic Communications, adopted in Slovenia in 2013. The theoretical base is drawn mainly from foreign sources, so the question is whether the assumptions of the foreign literature can be directly tested in the environment chosen for the online surveys discussed (the study is limited to Croatian organizations). On the other hand, this is also one of the main contributions of this paper, given the lack of research and papers on data about data in survey research in non-English-speaking environments (in our case Croatian). Although Couper (1998) originally coined the term paradata as a general notion for by-product process data, a notion that has stuck to data about data (not just paradata) despite the development of the area, we could argue that with further research on all types of data about data the set of data now considered by-products will decrease. Why? With further, much-needed research it will become clearer which concrete paradata are useful and appropriate, and in connection with what (which product data or survey variables). That means we need to know better which paradata can be useful and where, in terms of content, so that they will be collected and treated intentionally, not just as a by-product.
For example, which paradata could be useful for examining whether the observations on sensitive issues really reflect the respondent's actual opinion, or are merely the result of answering a quickly and superficially read question without a clearly articulated view on the subject of the question? Once the possible usefulness of concrete paradata is known, the set of 'by-product paradata' will be smaller. It will contain only those paradata whose specific use value we do not know. Another important group of paradata will be those eventually used to respond to the respondent's behavior while answering the survey, for example paradata used to encourage the respondent to continue to the end of the questionnaire. To sum up, we find it necessary to conduct additional research on concrete applications of concrete paradata in connection with survey variables or primary survey product data. A firmer link between paradata and content analysis is essential for testing the degree to which paradata are useful. Additional research is also needed to identify the possible usefulness of paradata for activating some kind of motivational response to the respondent's behavior during participation in a survey, in light of the problem of dropouts and decreasing response rates.

References

Casas-Cordero, C., F. Kreuter, Y. Wang & S. Babey. (2013). 'Assessing the measurement error properties of interviewer observations of neighborhood characteristics.' Journal of the Royal Statistical Society 176 (1): 227-249.
Couper, Mick P. and Frauke Kreuter. (2013). 'Using paradata to explore item level response times in surveys.' Journal of the Royal Statistical Society 176 (1): 271-286.
Couper, Mick P. and Eleanor Singer. (2013). 'Informed Consent for Web Paradata Use.' Survey Research Methods 7 (1): 57-67.
Horwitz, Rachel, Jennifer Guarino Tancreto, Mary Frances Zelenak & Mary Davis. (2012). Use of Paradata to Assess the Quality and Functionality of the American Community Survey Internet Instrument. U.S. Census Bureau: Washington.
Scheuren, Fritz. Macro and Micro Paradata for Survey Assessment. Available at: http://www.unece.org/fileadmin/DAM/stats/documents/2000/11/metis/crp.10.e.pdf (December 2013).
Frost Hubbard, Ben Duffey. (2010). Paradata, Metadata, Auxiliary Data and other Data about Data: Studying Survey Costs and Errors in the New Millennium. Matt Jans U.S. Census Bureau: Michigan.
Heerwegh, Dirk and Geert Loosveldt. (2002). 'An evaluation of the effect of response formats on data quality in Web surveys.' Social Science Computer Review 20 (4): 471-484.
Heerwegh, Dirk. (a) Uses of Client Side Paradata in Web Surveys. Available at: https://perswww.kuleuven.be/~u0034437/public/Files/Heerwegh%20Uses%20of%20Client%20Side%20Paradata%20in%20Web%20Surveys.pdf (December 2013).
Kdaj lahko uporabimo piškotke? Smernice Informacijskega pooblascenca. 2013. Available at: https://www.ip-rs.si/fileadmin/user_upload/Pdf/smernice/Smernice_o_uporabi_piskotkov.pdf (December 2013).
Kreuter, Frauke, Mick Couper and Lars Lyberg. (2010). The use of paradata to monitor and manage survey data collection. Section on Survey Research Methods - JSM: Maryland.
Nicolaas, Gerry. (2011). Survey Paradata: A review. National Centre for Research Methods: Southampton.
O'Reilly, Jim. Paradata and Blaise: A Review of Recent Applications and Research. Available at: http://ibuc2009.blaiseusers.org/papers/7d.pdf (December 2013).
Pirc Musar, Nataša. (2013). Informacijski pooblaščenec izdal smernice glede uporabe piškotkov. Available at: https://www.ip-rs.si/novice/detajl/informacijski-pooblascenec-izdal-smernice-glede-uporabe-piskotkov/?cHash=5842ad1118a2ae1915f350cc1aa98c22 (December 2013).
U.S. Department of Health and Human Services. (2012).
2011 National Health Interview Survey (NHIS) Paradata File: Public Use Data Release. Hyattsville: Maryland.
Wagner, James. (2009). Paradata and Alternative Measures of Survey Data Quality. University of Michigan: Michigan.

FIGURES AND TABLES

Table 6 Time needed to answer full questionnaire or each page (in minutes, by type of organization).

| Statistic | Full questionnaire, Big | Full questionnaire, Small | 1st page, Big | 1st page, Small | 2nd page, Big | 2nd page, Small | 3rd page, Big | 3rd page, Small | 4th page, Big | 4th page, Small | 5th page, Big | 5th page, Small | 6th page, Big | 6th page, Small |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| N | 62 | 79 | 62 | 79 | 62 | 79 | 62 | 79 | 62 | 79 | 62 | 79 | 62 | 79 |
| Estimated time (min) | 15,00 | 15,00 | 2,80 | 2,80 | 3,00 | 3,00 | 3,00 | 3,00 | 1,80 | 1,80 | 2,10 | 2,10 | 0,61 | 0,61 |
| Mean time needed | 15,1274 | 15,7466 | 4,6207 | 4,3608 | 2,7065 | 2,8152 | 3,0696 | 2,6835 | 1,8167 | 1,4620 | 1,5347 | 2,5443 | 1,3793 | 1,8808 |
| Std. Error of Mean | ,87565 | ,76659 | ,42282 | ,21315 | ,29536 | ,14712 | ,49509 | ,16522 | ,21264 | ,11368 | ,08956 | ,43613 | ,15210 | ,31882 |
| Median | 13,4083 | 14,5833 | 3,5333 | 3,8000 | 2,2250 | 2,5833 | 2,5167 | 2,3000 | 1,1750 | 1,2500 | 1,4000 | 1,5333 | ,9083 | ,9167 |
| Mode | 12,93a | 16,37 | 2,65a | 2,20a | 2,23 | 2,40a | 1,58 | 1,62 | ,85 | 1,48 | 1,13a | 1,32a | ,63 | ,67 |
| Std. Deviation | 6,89488 | 6,81363 | 3,32929 | 1,89452 | 2,32567 | 1,30766 | 3,89835 | 1,46850 | 1,67430 | 1,01040 | ,70519 | 3,87642 | 1,19764 | 2,83374 |
| Variance | 47,539 | 46,426 | 11,084 | 3,589 | 5,409 | 1,710 | 15,197 | 2,156 | 2,803 | 1,021 | ,497 | 15,027 | 1,434 | 8,030 |
| Skewness | 1,923 | 1,152 | 2,880 | 1,262 | 5,531 | 1,668 | 6,808 | 1,499 | 2,585 | 2,718 | 2,385 | 4,438 | 1,704 | 2,806 |
| Std. Error of Skewness | ,304 | ,271 | ,304 | ,271 | ,304 | ,271 | ,304 | ,271 | ,304 | ,271 | ,304 | ,271 | ,304 | ,271 |
| Kurtosis | 4,551 | 1,379 | 9,573 | 1,911 | 36,746 | 5,317 | 50,397 | 2,091 | 7,108 | 9,279 | 7,847 | 21,123 | 2,190 | 7,013 |
| Std. Error of Kurtosis | ,599 | ,535 | ,599 | ,535 | ,599 | ,535 | ,599 | ,535 | ,599 | ,535 | ,599 | ,535 | ,599 | ,535 |
| Minimum | 6,13 | 5,65 | 1,32 | 1,68 | ,82 | ,70 | 1,12 | ,15 | ,43 | ,32 | ,47 | ,07 | ,30 | ,05 |
| Maximum | 42,55 | 39,13 | 18,97 | 11,35 | 18,62 | 8,97 | 31,82 | 7,73 | 9,15 | 6,12 | 4,45 | 25,37 | 5,25 | 12,70 |

a. Multiple modes exist. The smallest value is shown.
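As a rough cross-check of Table 6, the standard error of the mean and a two-sample t statistic can be recomputed from the reported summary statistics alone. The sketch below uses the full-questionnaire columns, with decimal commas converted to decimal points. The paper does not state which t-test variant was used, so Welch's unequal-variance form is an assumption here, and no p-value is computed because that would need the t distribution, which the Python standard library does not provide.

```python
from math import sqrt

# Full-questionnaire summary statistics read from Table 6 (minutes).
big = {"n": 62, "mean": 15.1274, "sd": 6.89488}
small = {"n": 79, "mean": 15.7466, "sd": 6.81363}


def standard_error(group):
    """Standard error of the mean: SD / sqrt(N)."""
    return group["sd"] / sqrt(group["n"])


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Assumption: unequal variances; the paper does not name the variant it used.
    """
    va = a["sd"] ** 2 / a["n"]
    vb = b["sd"] ** 2 / b["n"]
    t = (a["mean"] - b["mean"]) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (a["n"] - 1) + vb ** 2 / (b["n"] - 1))
    return t, df


se_big = standard_error(big)
se_small = standard_error(small)
t_stat, df = welch_t(big, small)

print(f"SE big = {se_big:.5f}, SE small = {se_small:.5f}")
print(f"Welch t = {t_stat:.3f}, df = {df:.1f}")
```

The recomputed standard errors should agree with the "Std. Error of Mean" row to the printed precision, and a t statistic well inside the range of about ±2 is in line with the reported absence of a significant difference between organization types for the full questionnaire.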