Drop out Factors in Data Literacy and Research Data Management Survey: Experiences from Lithuania and Finland

The purpose of this paper is to develop an understanding of the factors that lead respondents to drop out of an already started survey on research data management. As a case study, we took a questionnaire survey on data management conducted at Vilnius University and the University of Oulu in 2017. The data for the analysis were collected using the questionnaire that was used in multinational research on Data Literacy and Research Data Management, performed by a group of researchers in more than ten countries and initiated by Serap Kurbanoglu and Joumana Boustany. This paper describes the analysis of 1 185 survey samples from both universities, of which 515 were unfinished and 670 finished. For the analysis of the data, we used the Framework for Web Survey Participation created by Andy Peytchev (2009). The collected data were analyzed using IBM SPSS Statistics ver. 19 with descriptive and inferential statistical tests. The most significant factors in deciding not to finish the survey were the length of the survey, the scientific field, experience, age and the topic of the survey. No statistically significant difference by gender or job position was measured between those who finished the survey and those who did not. The design of the survey was also an important factor in not finishing it.


Introduction
Many higher education institutions (HEIs) have recognized the need to develop Research Data Management (RDM) services and are currently engaged in this activity (in the UK, over 40 universities have been involved in developing RDM services within Managing Research Data (MRD) programs). The increasingly collaborative nature of research is a pressing argument for RDM services. Research data exchange across different platforms stems from the demand for effective storage of data and for accessing and sharing it securely across multi-institutional research teams.
A significant move toward research data management and sharing has been made at Vilnius University by creating a national, institutional repository for data archiving and by preparing legal documents and recommendations for researchers on data management and sharing. Despite these efforts, very few researchers have begun to archive their data or are eager to prepare a data management and sharing plan when there are no requirements from research funding organizations or scholarly journals. The Lithuanian research council does not strictly require scholars to prepare a data management and sharing plan. As a result, the majority of research data are not archived and cannot be accessed and re-used. Creating a data management and sharing plan would benefit both researchers and the universities themselves. It would ensure that data can be found and understood when there is a need to use them; it avoids unnecessary duplication, for example recollecting or reworking data; the data underlying publications are maintained, allowing for the validation of results; data sharing leads to more collaboration and advances in research; research becomes more visible and has greater impact; and researchers can cite the data of other scholars, who, in turn, get credited for their studies (Jones, Pryor, Whyte 2013).
In Finland, most universities have published their data management guidelines. These include the University of Oulu (University of Oulu 2018). Its research data policy highlights the benefits of sharing research data and points out the actions taken at the University of Oulu. Furthermore, the Ministry of Education and Culture promotes [...]
After implementing the survey, the researchers found out that almost half of the respondents had started but did not finish the questionnaire. This raised questions about the possible reasons for not finishing the questionnaire.
The purpose of this paper is to develop an understanding of the factors that led respondents to drop out of the already started survey on research data management.
This paper describes a study that combines the survey data from both countries and presents an analysis of 1 185 survey samples, of which 515 are unfinished and 670 finished. For the analysis of the data, we used the Framework for Web Survey Participation created by Andy Peytchev (2009). The collected data were analyzed using IBM SPSS Statistics ver. 19 with descriptive and inferential statistical tests.

Research Data Management as an Important Factor in Research
René Schneider indicates that data is generally seen as a subset of information in itself, since content is generally seen as knowledge communicated (Schneider 2018, p. 140). Also, good data management is a key conduit in ensuring the authenticity, integrity, longevity and utility of datasets and in making data easier to use and reuse, which may translate to more collaboration for researchers (Strasser 2015). Sonja Špiranec and Denis Kos note that "openness is one of the key premises of contemporary research, which is reflected in many of its facets and processes, like access or diffusion, collaboration, evaluation etc. Although research and scientific knowledge are open by their very nature and intention, just the last decade brought this intrinsic notion to the fore, mainly owing to the availability of new digital technologies and collaborative tools" (Špiranec & Kos 2018, 148). For open science to bring real benefits, it is not enough to develop infrastructure; the basic issue is the willingness of researchers to share and publish their research data and to build upon an open data culture (Špiranec, Kos 2018).
Researchers and academics are important actors and research performers; their research data are first of all at their own disposal, and these actors should be motivated and have the necessary competence to handle their valuable research data, so that the data can be processed, shared and accessed properly. Competences in research data literacy are necessary: identifying, scoping, planning, storing, evaluating, managing and providing one's data is imperative (Schneider 2018).

A Framework for the Web Survey
Surveys are a popular method of collecting data in the social sciences. Online surveying instruments are an effective way to gather information on a variety of aspects in all fields when the aim is to investigate the respondents' opinions on an issue; they enable a prompt research process as well as efficient processing of the findings and results. The average response rate in academic studies was found to be 55.6% with a standard deviation of 19.7 (Baruch 1999).
Researchers are mostly interested in complete answers and tend to treat only those as data. However, there is another kind of data that could be interesting to analyze: the respondents who dropped out of the questionnaire at the very start or in the middle of the survey. Various factors affect the response rates of a web survey, such as content, the presentation of questions, contact delivery modes, the design of invitations, and the use of pre-notifications, reminders and incentives (Fan, Yan 2010). The dropout rate in general-invitation web surveys has to be taken into consideration. The proportion of dropouts may in some cases reach even 80% (O'Neil, Penrod, Bornstein 2003). On average, the dropout rate could vary from 25.3% to 44.3% (Bosnjak, Tuten 2001).
For classifying the response and nonresponse patterns in web surveys, we chose the Framework for Web Survey Participation created by Andy Peytchev (2009). The framework has three sets of factors: a) the respondents' characteristics (environment, socio-demographics, survey predispositions, topic involvement, cognitive ability); b) the survey design (selection and recruiting, topic, sponsor, incentive structure); c) the page and question characteristics (question content, question type(s), number of questions, real-time validation). These three sets of factors have an impact on the actions taken with a survey. The respondents' characteristics and the survey design affect the decision to start and to continue a survey, as well as whether and how the questions are answered. Page and question characteristics affect the decision to continue answering the questions and how they are answered.
We intend to use the framework to identify the causal mechanisms for specific outcomes and common causal mechanisms.
For naming types of participants, we chose to use response and nonresponse patterns in web surveys created by Michael Bosnjak and Tracy L. Tuten (Bosnjak, Tuten 2001).
Having in mind the structure of the survey, we decided to use three of these processing types in an analysis of the results: 1) Complete Responders (answer all questions); 2) Unit nonrespondents (did not participate in the survey), who are either (a) individuals that could have been technically hindered from participating or (b) individuals that have purposefully withdrawn after the welcome screen was displayed but before viewing any questions; 3) Answering Drop-Outs (provide answers to questions displayed but quit before completing the survey).
In our view, using all seven types would require creating a very specific questionnaire. Following the advice of the authors, the questionnaire should have a screen-by-screen design and a non-restricted question design, and each page of the questionnaire should be downloaded from the server. This helps to collect much more data on the respondents' behavior, but it also makes the questionnaire less comfortable for the respondents. In our view, it is more convenient to have a comfortable questionnaire that still allows the necessary data on the respondents' behavior to be collected.
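The three participant types retained above can be sketched as a small classification routine. This is only an illustration under stated assumptions: the record layout (`answers`, `reached_last_page`) is hypothetical and does not reflect the actual survey export, and any pattern outside the three types used in this paper is simply labeled "other".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SurveyRecord:
    # Hypothetical record layout, for illustration only.
    answers: List[Optional[str]]  # one entry per question, None if unanswered
    reached_last_page: bool       # whether the respondent got to the final page


def classify(record: SurveyRecord) -> str:
    """Assign one of the three participant types used in the analysis."""
    answered = [a for a in record.answers if a is not None]
    if not answered:
        # Opened the survey (or only the welcome screen) without answering.
        return "unit nonrespondent"
    if len(answered) == len(record.answers):
        return "complete responder"
    # An answering drop-out answered every question displayed and then quit,
    # so the answers form an unbroken run from the first question.
    unbroken = all(a is not None for a in record.answers[: len(answered)])
    if unbroken and not record.reached_last_page:
        return "answering drop-out"
    # Remaining patterns of the full typology are not modeled here.
    return "other"


print(classify(SurveyRecord(["yes", "no", None, None], reached_last_page=False)))
```

Keeping an explicit "other" label avoids silently forcing skipped-item patterns into one of the three categories.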

Respondents' Characteristics
Reasons for dropout could be analyzed by such characteristics of the respondents as environment, socio-demographics, survey predispositions, topic involvement and cognitive ability factors.
Education and socioeconomic status have an impact on the response rate. Those of higher status are more prone to finishing a survey (Vincent 1964).
The correlation between the survey topic and the participant's level of interest has a strong impact on finishing the survey (Armstrong, Overton 1977). It is related to the importance of motivation and the ability to process messages fully (Chaiken 1980; Petty, Cacioppo 1984).
The responses from speedy respondents (those who manage to finish the survey much faster than average) should be analyzed with care. Their answers may show primacy effects, where the first response option is selected automatically without regard to what is asked (Malhotra 2008; Zhang, Conrad 2014).
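One simple way to single out speedy respondents for closer inspection is to compare each completion time with the median. The sketch below is an assumption-laden illustration: the cutoff of half the median and the sample durations are invented for the example and are not taken from the cited studies or from the survey data.

```python
from statistics import median
from typing import List


def flag_speeders(durations_sec: List[float], fraction: float = 0.5) -> List[int]:
    """Return the indices of respondents faster than `fraction` of the median time."""
    cutoff = fraction * median(durations_sec)
    return [i for i, d in enumerate(durations_sec) if d < cutoff]


# Invented completion times in seconds, for illustration only.
durations = [540, 610, 120, 480, 900, 95, 560]
print(flag_speeders(durations))  # indices of the two fastest respondents: [2, 5]
```

The median is used rather than the mean so that a few very slow respondents do not inflate the cutoff.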
An important positive factor for a higher response rate is the personalization of the invitation (Joinson, Woodley, Reips 2007; Porter, Whitcomb 2003). This has to be done in an elaborate way: using personalized greetings alone has no significant impact on the responses. The message should also explain why the recipient's response is important, in more detail than a brief introduction would (Ding, Poquet, Williams, Nikam, Cox 2018).
A survey has to be mobile friendly. From 15% to 30% of the respondents participate in web surveys using mobile devices such as a tablet or phone (De Bruijne, Wijnant 2014; Lugtig, Toepoel 2016). Nowadays, this proportion is even higher. Some research showed a higher dropout rate in web surveys when a mobile device was used (Bosnjak, Poggio, Becker, Funke, Wachenfeld, Fischer 2013; Wells, Bailey, Link 2014). This has to be kept in mind when deciding how long a survey should be. It was found that mobile device users take on average 30% longer to finish surveys than desktop users (Schlosser, Mays 2018).

Survey Design
Survey design, such as the selection and recruitment of respondents, the survey topic, sponsor and incentive structure, could also be an important factor for not finishing the survey.
Reminders are an important factor in preventing the decline of response rates (Keusch 2015). They do not necessarily have to be wordy (Klofstad, Boulianne, Basson 2008), but they should be humorous. Receiving a humorous email reminder could increase the response rate up to 24% (Rath, Williams, Villanti, Green, Mowery, Vallone 2017).
Incentives have a positive impact on the response rate, but the incentive does not necessarily have to be a lottery prize or something similar in value. Tuten, Bosnjak, and Bandilla (1999) found that altruistic motives (contribution to scientific research) are a stronger stimulus to finish the survey than the promise of a cash prize. The same result was found in another study by O'Neil and Penrod (2001).
Katja Lozar Manfreda et al. (2008) found that the response rate is closely related to who the sponsors are. Surveys supported by government institutions have a higher response rate than those supported by commercial institutions.
The promised survey length has to be precise. A subjective experience during the survey that finishing will take longer than promised increases the dropout rate (Galesic 2006).
Another important factor that influences response rates is the topic. A topic of high interest to the surveyees helps to motivate them to finish the survey (Dillman, Smyth, Christian 2009); other studies also found the topic to be the most important factor for response rates (Edwards, Roberts, Clarke, DiGuiseppi, Pratap, Wentz, Kwan 2002; Manfreda, Berzelak, Vehovar, Bosnjak, Haas 2008).

Page and Question Characteristics
Page and question characteristics, such as question content, question type, the number of questions and real-time validation have an impact on the response rate.
Surveyees do not like surveys with questions that are wordy, poorly designed or flawed, that contain complex clauses, or that are open-ended or have a long list of response options, because it takes longer to finish the survey (Yan, Tourangeau 2008; Couper, Kreuter 2013; Lenzner, Kaczmirek, Lenzner 2010; Liu, Wronski 2018). Questions arranged in tables or in graphically complex layouts (Frick, Bächtiger, Reips 2001), inappropriate visual design (Heerwegh, Loosveldt 2002) and some questions in blocks (Galesic 2006) have a negative impact on the response rate.
Asking for sociodemographic information at the start of the survey motivates respondents to finish the survey up to 7% more often than when it is positioned at the end of the questionnaire (Frick, Bächtiger & Reips 1999).
One of the most popular ways to organize a survey is the page-by-page version. One negative aspect of this type of survey is the progress indicator, which decreases participation rates and tends to raise dropout rates, especially when the survey is long (up to 20 minutes) (Matzat, Snijders, van der Horst 2009).
Difficult questions should be placed at the end of the survey, especially when surveyees can scroll over the questions. When surveyees find difficult questions at the start of the survey, they conclude that finishing the survey will require too much effort, which could lead to a lower response rate (Ganassali 2008).
Interestingly, surveys starting with open-ended questions tend to have a lower response rate than those starting with multiple-choice questions (Liu, Wronski 2018).
Response time could be a measure for evaluating the quality of survey responses. Ting Yan et al. (2015) and Mick P. Couper and Frauke Kreuter (2013) found that answers alone do not provide enough information about their quality. It is important to measure how long the respondent takes to answer the questions. Those who manage to finish much faster than average could be treated as skimmers and their responses as less valid. Respondents also need to understand the meaning of the question and determine what the question refers to. Response options are very important, as the respondents have to decide what the difference is between answer options such as "somewhat satisfied" and "not very satisfied." It is very important to phrase the questions so that respondents do not misunderstand their intent (Fan, Yan 2010; Yan, Ryan, Becker, Smith 2015).
The form of the survey could be one of the reasons for dropping out as well. Katja Lozar Manfreda et al. (2008) analyzed 45 studies examining differences in the response rate between web surveys and other survey modes. They found that the response rate in web surveys is on average approximately 11% lower than that of other survey modes (Manfreda, Berzelak, Vehovar, Bosnjak, Haas 2008).

Data Collection and Participants
The study was conducted in the form of a survey using Computer-Assisted Web Interviewing (CAWI), with data gathered via the online survey tool limesurvey.com. Here, we present the basic data of the surveys that we decided to analyze for dropout factors.
In Lithuania, data were collected between February and March 2017. Respondents to the survey were researchers, doctoral students and emeriti from Vilnius University. By the end of the survey period, data had been gathered from 457 respondents from Lithuania, of whom 255 had not finished and 202 had finished their surveys.
In Finland, the survey was conducted by researchers from the University of Oulu, but the data were also collected by approaching several other universities and research institutions. The survey (in English, Finnish and Swedish) was distributed with the help of personnel at higher education institutions and research organisations who were involved in the development of the national DMP tool [13]. These persons were contacted via email and asked to distribute the survey in their organizations. The data were collected in June and July 2017. In Finland, the data were collected from 728 respondents, of whom 260 had not finished and 468 had finished their surveys.
The data from both countries were combined for analysis, giving a total of 1 185 answers. The data were collected in accordance with confidentiality procedures. In Tables 1 and 2, we present the data of those who finished the first (demographic) part.
Both genders are well represented in the survey. In the Lithuanian case, 44.6% of the respondents are male, 53.6% are female, and 1.8% did not want to disclose or did not answer. In the Finnish case, 38.6% of the respondents are male, 57.6% are female, 3.1% did not want to disclose or did not answer, and 0.8% selected other.
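The completion counts reported in this section translate directly into dropout rates per country. The short sketch below only re-computes those shares from the published counts; the variable and function names are illustrative.

```python
# Counts of unfinished and finished surveys as reported in this section.
counts = {
    "Lithuania": {"unfinished": 255, "finished": 202},
    "Finland": {"unfinished": 260, "finished": 468},
}


def dropout_rate(unfinished: int, finished: int) -> float:
    """Share of started surveys that were not finished."""
    return unfinished / (unfinished + finished)


rates = {country: dropout_rate(**c) for country, c in counts.items()}
total_unfinished = sum(c["unfinished"] for c in counts.values())
total_finished = sum(c["finished"] for c in counts.values())
overall = dropout_rate(total_unfinished, total_finished)

for country, rate in rates.items():
    print(f"{country}: {rate:.1%}")  # Lithuania: 55.8%, Finland: 35.7%
print(f"Combined: {overall:.1%} of {total_unfinished + total_finished}")  # 43.5% of 1185
```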

Measures
The data for the analysis were collected using a survey that was used in multinational research on Data Literacy and Research Data Management, performed by a group of researchers in more than ten countries and initiated by Serap Kurbanoglu and Joumana Boustany. The survey instrument consists of 24 questions arranged into two groups: awareness of data management issues and demographic information. Mixed-type questions were used, with response options on a five-point Likert scale and similar formats, ranging over strongly agree/strongly disagree, almost always/never, yes/no/uncertain, etc. The purpose of the survey was to find out the current levels of awareness and the gaps in knowledge of data literacy and data management in the university community: academic staff and doctoral students.
By analyzing the data from the two surveys, we seek to develop an understanding of the factors that characterize the respondents who decided to drop out of already started surveys. Typically, one analyzes those who responded to the survey. Nevertheless, there is another important group of researchers who had shown interest in the survey: they began answering it, but after some time decided to drop out. This analysis will help to see the other side of data management and will aid in finding out the factors that influenced some of the respondents not to finish the survey.

Analytical Approach
For classifying response and nonresponse patterns, we chose the Framework for Web Survey Participation created by Andy Peytchev (2009). We analyzed the data using three sets of factors: a) respondents' characteristics (environment, socio-demographics, survey predispositions, topic involvement); b) survey design (topic, sponsor, incentive structure); c) page and question characteristics (question content, question type(s), number of questions, real-time validation).
For naming the types of participants, we chose to use the response and nonresponse patterns in web surveys created by Michael Bosnjak and Tracy L. Tuten (Bosnjak & Tuten 2001). We decided to use three of these processing types in an analysis of the results: 1) Complete Responders (answered all questions); 2) Unit nonresponders (did not participate in the survey); of these,

General Results
More than one quarter (27.35%) of the respondents from Lithuania and 15.52% from Finland opened the introduction only. We excluded these results from further analysis with SPSS. Only those who answered the demographic questions were retained for further analysis. In the Lithuanian case this amounts to 332 answers and in the Finnish case to 615 answers; in total, the further analysis covered 947 answers.
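The exclusion shares and the retained sample sizes can be cross-checked against the numbers of started surveys reported in the data collection section (457 in Lithuania, 728 in Finland). The snippet below is only such a consistency check; the variable names are illustrative.

```python
# Started surveys per country and answers retained after excluding
# respondents who opened the introduction only (figures from the text).
started = {"Lithuania": 457, "Finland": 728}
retained = {"Lithuania": 332, "Finland": 615}

excluded = {c: started[c] - retained[c] for c in started}
shares = {c: excluded[c] / started[c] for c in started}

for country in started:
    # Lithuania: 125 excluded (27.35%); Finland: 113 excluded (15.52%)
    print(f"{country}: {excluded[country]} excluded ({shares[country]:.2%})")
print("Retained in total:", sum(retained.values()))  # 947
```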
The time spent answering all questions showed that the questionnaire was more difficult than average (see Table 4).
Surveymonkey.com took a random sample of 100 thousand surveys that were from 1 to 30 questions in length (Chudoba 2018) and analyzed the average time spent completing a survey with a given number of questions. Their findings showed that a 24-question questionnaire should take 9 minutes to answer on average. There is a difference between the Lithuanian and Finnish results. In the Lithuanian case, more than half of the respondents took longer than 10 minutes to answer the questions.
The Finnish case is the opposite: only one-fifth of the respondents spent longer than 10 minutes on the survey. The topic of the questionnaire itself seems to be one of the reasons for dropping out of the survey. In the Lithuanian case, 45.5% of all respondents did not start the second part of the questionnaire at all. In the Finnish case the share is even higher, 66.2%.

Correlation between Actions with the Questionnaire and Demographic Factors
A Kruskal-Wallis test showed that in the Lithuanian case, actions with the questionnaire differed statistically significantly by discipline (p = 0.050) and by time spent answering the questions (p < 0.001). An inspection of group descriptive statistics suggests that representatives of the sciences were more eager to finish the questionnaire than representatives of either the social sciences or the humanities. Among those who started the second part but did not finish the questionnaire, respondents from the humanities clearly lead. An inspection of the descriptive statistics of time spent answering the questionnaire clearly showed that almost all who spent longer than 5 minutes also finished the questionnaire.
In the Finnish case, actions with the questionnaire differed statistically significantly by age (p = 0.036) and by time spent answering the questions (p < 0.001). An inspection of group descriptive statistics suggests that young respondents (the age group from 26 to 35) are more eager to finish the questionnaire. They were also the group that most often finished only the first part, or started the second part but did not finish the questionnaire. An inspection of the descriptive statistics of time spent answering the questionnaire clearly showed that almost all who spent longer than 5 minutes also finished the questionnaire.
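For readers unfamiliar with the Kruskal-Wallis test used above, the following is a minimal pure-Python sketch of its H statistic with tie correction. It is not the SPSS procedure used in the study, and the example data are invented: the outcome is coded on an assumed ordinal scale (1 = finished only the first part, 2 = started the second part, 3 = finished the questionnaire) for three hypothetical discipline groups.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def kruskal_wallis_h(groups: Sequence[Sequence[float]]) -> Tuple[float, int]:
    """Return the tie-corrected Kruskal-Wallis H statistic and its degrees of freedom."""
    pooled = sorted(v for g in groups for v in g)
    n = len(pooled)

    # Tied observations share the average of the ranks they occupy.
    positions: Dict[float, List[int]] = defaultdict(list)
    for rank, value in enumerate(pooled, start=1):
        positions[value].append(rank)
    avg_rank = {v: sum(r) / len(r) for v, r in positions.items()}

    # H = 12 / (n (n + 1)) * sum(R_j^2 / n_j) - 3 (n + 1)
    rank_term = sum(sum(avg_rank[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12.0 / (n * (n + 1)) * rank_term - 3 * (n + 1)

    # Tie correction 1 - sum(t^3 - t) / (n^3 - n); it is zero (and H undefined)
    # only if every observation has the same value.
    ties = sum(len(r) ** 3 - len(r) for r in positions.values())
    correction = 1 - ties / (n ** 3 - n)
    return h / correction, len(groups) - 1


# Invented ordinal outcomes for three hypothetical discipline groups.
sciences = [3, 3, 3, 2, 3, 1]
social_sciences = [3, 2, 1, 1, 3, 2]
humanities = [2, 2, 1, 2, 1, 3]
h, df = kruskal_wallis_h([sciences, social_sciences, humanities])
print(f"H = {h:.3f}, df = {df}")
```

Under the null hypothesis H is approximately chi-square distributed with df degrees of freedom, so with three groups it would be compared with the critical value 5.991 at the 0.05 level.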
Chi-square test results in the Finnish case showed a strong association between the time of involvement in research and time spent on the survey. Descriptive statistics showed that those who had been involved in research for 5 to 15 years were more eager to finish the questionnaire. In both the Finnish and the Lithuanian case, it is evident that those who spent longer than 5 minutes also finished the questionnaire.
χ²(2) = 292.597, p = 0.000; χ²(3) = 575.265, p = 0.000
between 5 and 10 minutes: χ²(2) = 79.191, p = 0.000; χ²(3) = 109.900, p = 0.000
between 10 and 15 minutes: χ²(2) = 40.809, p = 0.000; χ²(3) = 45.920, p = 0.000
longer than 15 minutes: χ²(2) = 25.531, p = 0.000
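The chi-square statistics reported above come from SPSS. As a hedged illustration of how a chi-square test of independence is computed from a contingency table, the sketch below uses an invented table (rows are time bands, columns are whether the questionnaire was finished); the counts are not the study's data and the function name is illustrative.

```python
from typing import Sequence, Tuple


def chi_square_independence(table: Sequence[Sequence[float]]) -> Tuple[float, int]:
    """Return the Pearson chi-square statistic and degrees of freedom for a table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Invented counts: rows = time spent (under 5 min, 5 min or more),
# columns = (did not finish, finished).
example = [
    [80, 20],
    [10, 90],
]
stat, df = chi_square_independence(example)
print(f"chi-square({df}) = {stat:.3f}")
```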

Discussion
An analysis of the unfinished responses is one of the methods for understanding another side of the surveyed community's attitude toward the survey topic. Those who decided not to finish are not interested (or at least less interested) in the topic of the survey. This can clearly be seen from the number of participants who finished the first (demographic) part but dropped out soon after starting the second part (questions on research data management). Commercial institutions did not support this survey, so sponsorship was not a factor for dropping out. The topic is not very familiar to scholars at either Vilnius University or the University of Oulu, so not being familiar with the topic could be one of the reasons for dropping out. We could support this claim with the data on the stage at which most of the respondents decided to drop out: after finishing the demographic part and before starting the second part, where the questions on research data management were listed. Only a minority of the respondents who had not finished the survey even began the second part.
The length of the survey could be a reason for dropping out, too. The questionnaire was more than 10 thousand characters long, and it required a great effort to finish it. As the analysis of time spent finishing the survey showed, the respondents spent more time finishing the questionnaire than the average time usually spent on surveys of this length. This shows that the questions were either very wordy or difficult to understand (the topic being unfamiliar), and it leads to the conclusion that the survey required a high cognitive effort to finish. Also, the questionnaire's design (questions in wordy matrix tables) makes the survey less friendly for smartphones.
There were respondents who managed to finish the questionnaire much faster than average. This leads to the conclusion that they just skimmed over the survey and that more critical attention should be paid to these respondents.
The response rate of web surveys is approximately 11% lower than that of other survey modes. We should consider implementing the same survey using the paper-and-pencil method.
Many of the questions used Likert-scale answers. An inability to decide how to interpret the difference between the Likert scale's options could be a reason for dropping out as well.
In our case, there is another factor that was not discussed but has a high impact on the success of a survey: what is done after the survey is delivered to its potential respondents. As discussed in the theoretical part, a reminder letter has to be written soon after the main invitation is sent. The letter itself also has to be humorous and as personal as possible.
The authors think that there is a need to analyze dropout data from more than two countries to prove the validity of the factors that have an impact on not finishing the survey. This could be valuable information for the improvement of this questionnaire and of questionnaires in general. These insights could also benefit future studies on any related (complex and new) topic, as some adjustments could be made before starting the survey and predictions could be made in advance about what the weakest points might be.
There are some limitations to our study. First of all, we used static questionnaire data and did not manipulate the order of the questions or the length of the survey, so the conclusions on survey length and the number of questions are not fully proved. On the other hand, we think that we have shown, using other indirect methods, that this questionnaire was too long.
What could be done to keep the respondents from dropping out of a survey on data management? In research of this kind, where the topic is relatively new to the surveyees (as the theoretical and contextual background showed it was), one has to give more attention to capturing the respondent's interest from the start to the end of the survey. The survey itself has to be more enjoyable. The authors of these kinds of surveys have to infuse their surveys with additional content so as to keep the respondents from dropping out. The survey design also has to be suitable for smartphone users.

Conclusions
The most significant impact on deciding not to finish the survey came from the topic of the survey and the length of the survey (the survey being too long). Among the potential demographic factors behind not finishing the survey, we found the most important to be the scientific field, experience and age. Respondents from the humanities were the least eager to finish the survey. Researchers at the start of their careers started and finished the survey more often than others. No statistically significant difference by gender or job position was measured between those who finished the survey and those who did not.
The creators of the questionnaire on research data management should consider shortening the survey, arranging the questions in a less complex form, giving more attention to the terminology used in the survey, and preparing different versions of the questionnaire for those who are familiar with some concepts of research data management and for those who are new to the topic. There should also be different survey versions designed for separate research fields.

Table 2. Experience in research