EnetCollect – European Network for Combining Language Learning with Crowdsourcing Techniques (COST Action CA16105): a review of the project’s vision, organization, progress, and achievements

Lionel NICOLAS
Eurac Research, Institute for Applied Linguistics, Bolzano

Verena LYDING
Eurac Research, Institute for Applied Linguistics, Bolzano

Nicolas, L., Lyding, V.: EnetCollect – European Network for Combining Language Learning with Crowdsourcing Techniques (COST Action CA16105): a review of the project’s vision, organization, progress, and achievements. Slovenščina 2.0, 10(2): 202–225.
1.03 Drugi znanstveni članki / Other Scientific Articles
DOI: https://doi.org/10.4312/slo2.0.2022.2.202-225
https://creativecommons.org/licenses/by-sa/4.0/

This article reviews the European Network for Combining Language Learning with Crowdsourcing Techniques (enetCollect), an extensive network project created to foster research and innovation (R&I) on the combination of crowdsourcing and language learning. Accordingly, we explain how it began, introduce its overall logic and organization, and discuss its achievements in terms of both (1) creating a new R&I community through a concluded large network project, and (2) fostering R&I on a high-potential and mostly unexplored subject. We also discuss the challenges involved and the lessons learned, both in orchestrating and leading a new R&I community and in the efforts of enetCollect members, as they explored the many facets of such a versatile enterprise.
Keywords: enetCollect, COST Action, Crowdsourcing, Language Learning

1 Introduction

EnetCollect, the European Network for Combining Language Learning with Crowdsourcing Techniques, was a network project running as a COST Action funded by the COST (European Cooperation in Science and Technology) Association through the Horizon 2020 Framework Programme of the European Union. As explained on the COST website,¹

“A COST Action is an interdisciplinary research network that brings researchers and innovators together to investigate a topic of their choice for four years. COST Actions are typically made up of researchers from academia, SMEs, public institutions and other relevant organizations or interested parties.”

EnetCollect started in March 2017 and ended in September 2021, after an extension of six months was granted to partly remediate the challenges posed by the COVID-19 pandemic. Over its 4.5 years of existence, enetCollect involved more than 200 stakeholders from 41 different countries. It served as a starting point or catalyst for more than 50 scientific publications, as well as several project proposal submissions and related funded research projects.

EnetCollect sought to create a new Research and Innovation (R&I) trend combining the well-established domain of language learning with recent and successful crowdsourcing approaches in order to leverage, in the medium to long term and for all languages, the crowdsourcing potential of an ever-growing number of language learners and teachers. Such potential would fuel an innovation breakthrough for producing two types of cost-intensive materials: language learning materials, such as lesson or exercise content, and language-related datasets, such as Natural Language Processing (NLP) resources.
¹ https://www.cost.eu/cost-actions/what-are-cost-actions/

Slovenščina 2.0, 2022 (2) | Articles

In Section 2, we describe the premise and existing work that enetCollect built upon and, in Section 3, detail its objectives and organization in working groups. In Section 4, we discuss the achievements reached with respect to its objectives, as well as the challenges we faced and the lessons we learned. Finally, Section 5 concludes the article and introduces D4Collect,² a DARIAH Working Group created as a follow-up to enetCollect.

2 Background and related work

In terms of international groups of stakeholders exploring the combination of crowdsourcing and language learning, enetCollect was a first-of-its-kind project. Indeed, unlike many other network projects that grew out of smaller initiatives (e.g. specialized workshops or task forces), most of the stakeholders had yet to work on the subject, and only a few had interacted with one another before participating in enetCollect. The majority thus spontaneously joined because of enetCollect’s appeal to their respective interests, be it in terms of language learning, crowdsourcing or the creation of language-related datasets. Likewise, they started exploring the subject almost from scratch, as very few past works were directly relevant to it.

Indeed, before 2017, only a few initiatives had combined language learning and crowdsourcing. Notable among them is the well-known online language learning platform Duolingo (von Ahn, 2013), which originally followed an enetCollect-compatible logic to crowdsource translations and, to this day, offers some loyal users the possibility of participating in the creation of language learning material.³ Moreover, only a limited number of research efforts have combined language learning and crowdsourcing to produce part-of-speech corpora (Sangati et al., 2015) or syntactic knowledge (Hladká et al., 2014).
² https://www.dariah.eu/activities/working-groups/combining-language-learning-with-crowdsourcing-techniques-d4collect/
³ As do other platforms, like Memrise (https://www.memrise.com/)

Nonetheless, the individual states of the art for crowdsourcing, language-related dataset creation and language learning are extensive. While their characteristics and similarities with enetCollect’s objectives fall far beyond the scope of this article, we can point out that they include, with respect to crowdsourcing and NLP dataset creation,⁴ numerous efforts implementing approaches such as Wisdom-of-the-Crowd⁵ (WoC), Human-based Computation⁶ (HC) or Games-With-A-Purpose⁷ (GWAP), and, with respect to language learning, an even larger number of efforts related to, among others, the various Computer-assisted Language Learning (CALL) communities.⁸

3 Objectives and organizational structure

3.1 Objectives

As previously mentioned, the overall objective of enetCollect was to kickstart a new R&I trend on the combination of language learning and crowdsourcing in order to trigger an innovation breakthrough for the production of both language learning material and language-related datasets, such as NLP resources. Integrating crowdsourcing approaches into the workflow for creating language learning material promises to facilitate the production of even more diversified language learning materials and language-related datasets at reduced cost by outsourcing part of the cost-intensive manual work to crowds of teachers and learners (see Sections 4.2 and 4.3 for more details and references).
This would contribute to addressing the two challenges of (a) fostering the language skills of all citizens in a globalizing world regardless of their diverse social, educational, and linguistic backgrounds, and (b) creating extensive language-related datasets for all languages taught, not only those that receive the most financial and research support. In order to foster this new R&I trend, enetCollect tackled several network- and research-oriented subgoals.

⁴ Relevant references can be found by searching for the term “crowdsourcing” in the ACL Anthology (https://aclanthology.org/)
⁵ https://en.wikipedia.org/wiki/Wisdom_of_the_crowd
⁶ https://en.wikipedia.org/wiki/Human-based_computation
⁷ https://en.wikipedia.org/wiki/Human-based_computation_game
⁸ E.g. the European Association of Computer Assisted Language Learning (http://www.eurocall-languages.org/)

Network-oriented goals

(1) To bring together relevant stakeholders from different domains (language learning, crowdsourcing, language-related domains, especially NLP, and computer science in a wider sense) interested in exploring the combination of language learning with crowdsourcing techniques to reach their respective objectives.
(2) To establish and consolidate communication channels and dissemination procedures.
(3) To foster complementary and follow-up project building and funding acquisition.

Research-oriented goals

(4) To create a shared understanding and theoretical framework to approach the combination of language learning and crowdsourcing by revising the state of the art, analyzing directly and indirectly related approaches, and establishing a shared terminology.
(5) To research use cases, work on prototypes combining language learning and crowdsourcing, and gather evaluation data.
3.2 Network structure

EnetCollect was organized around five distinct yet interconnected working groups whose efforts directly tackled the two aforementioned research-oriented goals:

• WG1, R&I on explicit crowdsourcing for language learning material production,
• WG2, R&I on implicit crowdsourcing for language learning material production,
• WG3, user-oriented design strategies for a competitive solution,
• WG4, technology-oriented specifications for a flexible and robust solution,
• WG5, application-oriented specifications for an ethical, legal and profitable solution.

Working Groups (WGs) 1 and 2 were created to tackle the core objectives of enetCollect, namely researching how crowdsourcing techniques could be applied to language learning. A practical distinction was made between works focused on explicit crowdsourcing (WG1) and works focused on implicit crowdsourcing (WG2). Explicit crowdsourcing refers to activities where the crowd is aware of its participation in a crowdsourcing effort and participates intentionally. In contrast, in implicit crowdsourcing activities the crowd is not necessarily aware of its contribution to a crowdsourcing effort, or the act of contributing is not the primary motivation for its participation. This distinction was made pragmatically, as we expected WG1 activities to be mostly targeted at crowds of teachers and WG2 activities at crowds of learners. Accordingly, we expected WG1 members to be less interested in WG2-related activities, and vice versa, and wanted to ensure an effective use of the participants’ time and effort.

Unlike the first two WGs, Working Group 3 (WG3) was focused on language learning only and aimed at reviewing and exploring user-oriented design strategies for online language learning applications, with the ultimate intent of fostering know-how with regard to attracting and retaining a crowd of teachers and learners.
Finally, Working Groups 4 and 5 (WG4 and WG5) were focused on the technical aspects (WG4) and the ethical, legal, or business-related aspects (WG5) of applications for language learning and crowdsourcing. They were established to account for and study the transversal challenges met by the efforts undertaken in WG1, WG2 and WG3.

Besides the five working groups, three additional coordination groups, called the Outreach coordination, Dissemination coordination and Exploitation coordination, were created to better address WG-transversal needs and ensure, whenever possible and relevant, homogeneous approaches in doing so. These coordination groups were thus designed to better monitor and support the efforts tackling the three aforementioned network-oriented goals.

4 Achievements, failures and lessons learned

In this section, we discuss the extent to which enetCollect succeeded in pursuing the five goals.

4.1 Network-oriented objectives

4.1.1 Bringing together relevant stakeholders from different domains

EnetCollect was originally designed to involve stakeholders fitting four profiles: (1) content-creation experts, ranging from teachers to researchers; (2) content-usage experts, primarily teachers, who would provide end-user perspectives for the creation of crowdsourced material; (3) crowdsourcing experts, mostly researchers, concerned with crowdsourcing strategies and methods; and (4) Content Management System (CMS) developers, especially Learning Management System (LMS) developers, who would provide expert knowledge to study the technical conditions needed to devise an adequate online environment. As the participants often matched more than one target profile, we are unable to provide precise statistics regarding the composition of the enetCollect network.
Nonetheless, we can attest that all four targeted groups were represented, with university stakeholders (researchers and language teachers) making up the greatest part. In contrast, content-creation experts, CMS developers and, in general, non-academic and commercial stakeholders took much more effort to engage (even though some did participate, especially through meetings). This can be explained by enetCollect’s research-oriented nature (like most COST Actions) and by its funding scheme, which covers networking activities but not human resources. EnetCollect’s topic fits into the agendas of researchers, especially young ones, rather than those of language teachers, textbook creators or online providers, who usually follow output-oriented, well-defined and established procedures with little room for exploration, even more so when the cost of human resources is not covered.

Overall, enetCollect brought together an interdisciplinary consortium of more than 120 Management Committee (MC) members, 200 associated members registered on the intranet (including MC members) and more than 275 people signed up to the main mailing list (including associated members). As shown in Figure 1, the growth in intranet and mailing list registration was constant until the beginning of the COVID-19 pandemic, while the number of MC members almost reached the maximum possible after one year. As the network grew in a rather fast and organic fashion, no transversal need was identified, and the Outreach coordination group that was originally appointed to tackle any such need quickly became inactive.

Figure 1: EnetCollect member statistics over the lifetime of the Action.
In terms of direct in-person interactions and collaborations, the Action funded 54 scientific exchanges between pairs of members and organized two hackathon-like events that allowed members (74 participants overall) to collaborate intensively and start new shared efforts over the span of a few days, as well as nine meetings allowing members to present and discuss their results (519 participants). Numerous collaborative efforts took place online, especially after the COVID-19 pandemic had started and in-person interactions were discouraged.

We would have considered this goal entirely fulfilled had we managed to involve more of the less-represented profiles noted earlier. Nonetheless, our achievements were satisfying, especially regarding the relatively high number of interactions and participations, which is comparable with that of some small and medium-sized well-established research communities we know of and participate in.

4.1.2 Establishing and consolidating communication channels and dissemination procedures

In terms of communication, we set up a website, an intranet, three social media accounts (Twitter, Facebook and ResearchGate), a video channel (Videolectures), a Zotero repository for scientific publications, 19 different mailing lists, as well as branding materials (logo, flyers, a Microsoft PowerPoint template, etc.). To disseminate enetCollect’s achievements outside the network, the Action funded ten participations in scientific events (mostly conferences).

The website’s primary use was to share enetCollect’s objectives and achievements with a wider audience. Before the COVID-19 pandemic, the website averaged 3,000 visits per month. The use of the intranet was minimal, but it allowed us to make documents available and to obtain basic yet practical information about the members at registration.
The social media accounts, video channel, and publication repository allowed for better internal dissemination among members. In contrast, the numerous mailing lists allowed for better targeting of the relevant set of members for each communication. The Dissemination coordination group greatly facilitated the organization of the dissemination efforts in a systematic fashion. Overall, the main communication channels showed a steady increase in use, which then dropped suddenly with the onset of the pandemic.

While not all communication efforts were fruitful (e.g. some mailing lists remained inactive), we consider dissemination an achieved goal. As a positive side effect, it also allowed the members contributing to it to gain practical experience and skills, which will certainly be of interest to future network initiatives.

4.1.3 Strategies for related project building and funding acquisition

Three levels of funding acquisition were actively considered and fostered by sharing information on the enetCollect website and mailing lists, motivating members at meetings and via email, and offering information and specialized sessions at enetCollect events.

The first level consisted of smaller project funding that could accompany enetCollect as soon as possible and contribute to achieving its objectives throughout the lifetime of the Action and thereafter. For this line of initiatives, we identified several options: national COST-related funding (as found in Switzerland, Turkey, etc.), PhD scholarships associated with enetCollect member institutions, Marie Curie Individual Fellowship grants and small-scale national funding schemes. EnetCollect members were successful in acquiring some national COST-related funding, PhD scholarships and small-scale national funding, while only one Marie Curie Individual Fellowship was obtained.
The second level of funding corresponded to funding to be acquired midway through the Action period in order to follow up on specific aspects of enetCollect after its end. For this line of initiatives, the Erasmus+ Key Action 2 scheme, European-funded joint projects across two countries and medium-scale national funding schemes were identified as relevant. Related efforts led to a few project applications, which were unfortunately unsuccessful. This was mainly due to the pandemic-related cancellation of meetings and of intensive network interaction during the second half of the Action, which led enetCollect consortia to discontinue the preparation of new proposals and/or the improvement of rejected ones.

The third level of funding was sought towards the second half of the Action to further develop enetCollect with the objective of creating a long-term stable research and application context. This funding effort was expected to be piloted by a consortium of enetCollect leaders. For this line of initiatives, we considered the Horizon 2020/Europe research and innovation program and ICT training networks. As with the second level of funding, the work on preparing such a large-scale proposal was discontinued when the COVID-19 pandemic hit, as the consortium of enetCollect leaders focused on keeping their respective parts of enetCollect as active as possible.

For all of the above reasons, we consider this objective mostly unfulfilled. As a lesson learned, we believe our efforts should have been more narrowly targeted at only a few well-identified funding schemes accessible to most members. In that respect, we believe the Marie Curie Individual Fellowships and the Erasmus+ Key Action 2 schemes to be the most relevant.

4.2 Research-oriented objectives

4.2.1 Transversal challenges faced

Research-wise, we observed three major transversal challenges.
Regarding the first challenge, network project schemes such as COST Actions do not typically cover human resource costs but primarily rely on stakeholders’ willingness to invest time in the short term for a medium- to long-term return on investment in the form of scientific publications and/or funded projects. At the same time, because these schemes are open by nature, they rely on meetings and scientific exchanges to define milestones and make progress. Not covering human resource costs naturally limits the participation of non-publicly funded stakeholders while favoring that of publicly funded ones. Moreover, neither the required time investment nor the need for scientific exchanges and meetings can be met if the participating stakeholders are unable to allocate time or meet in person, which is what happened during the chaotic COVID-19 period. Our experience tells us that network project schemes should cover some minimal human resource costs, especially for the leaders of the project, and should factor in the possibility of being put on hold if extraordinary circumstances require it.

With respect to the second challenge, and as recorded in the Zotero repository, enetCollect members published more than fifty scientific journal, workshop and conference articles, thus creating the groundwork for a previously largely unexplored topic. Nonetheless, enetCollect’s interdisciplinary nature (linguistics, lexicography, language studies, language pedagogy, Computer-Assisted Language Learning, NLP, etc.) made the publication of scientific articles challenging, for various reasons connected to the emerging and interdisciplinary research subject it tackled. Its publications indirectly relate to several research areas without having their own venue and audience outside of the project itself.
Therefore, publishing works that are related to, but do not fully match, the expectations of a research community has proven challenging on various occasions. First, reviewers naturally have specialized knowledge and expectations on only part of the interdisciplinary subject discussed (i.e. the language learning or the computational/crowdsourcing side). As a result, scientific publications need to be tailored to the specific interests of the targeted research community, thus forcing authors to prioritize some research aspects over others. Second, and unexpectedly, a notable number of reviewers also had inadequate expectations regarding aspects they knew little about (e.g. NLP reviewers with respect to CALL-related evaluation procedures), thus compelling authors to spend time addressing their concerns. Third, the research outputs of enetCollect are rather exploratory and were often considered too vague or preliminary. Fourth, since very few related works exist, reference values for evaluation are often missing, thus potentially undermining the credibility of the work. For all these reasons, the need to establish new publication venues for this emerging field seems inevitable.

Regarding the third challenge, stakeholders participating in such networks usually have little time for them, and the networks’ innovative topics rarely align with the stakeholders’ short-term interests or fall fully within their expertise. As such, most efforts can only be conducted by a group of participants and, in order to further enhance achievements, proper strategies to foster such collaboration are needed. Our experience with enetCollect allowed us to identify one very suitable strategy, which fostered a large number of the collaborations that led to the scientific achievements discussed in the upcoming sections.
This strategy is the organization of hackathon-like events where some members first answer an open call for topics they would like to tackle collaboratively, and are then asked to lead a task force. The topics are later disclosed to the remaining members, who can then ask to participate in one or more of the task forces. The candidate participants of the topics that received enough attention are then invited to the hackathon-like event to kickstart the task forces by working intensively over the span of a few days and performing the groundwork needed for their collaborations to develop after the end of the event itself.

4.3 Creating a shared understanding and a theoretical framework for approaching the combination of language learning and crowdsourcing

With respect to this theory-oriented goal, a literature review conducted in 2017/2018 by WG1 members revealed that there were very few examples of past crowdsourcing efforts in the field of language learning at that time. WG1 members also gathered the opinions of relevant stakeholders in three ways. Firstly, they conducted a short survey among themselves to identify the aspects of language learning with the most potential for crowdsourcing. Secondly, they circulated a survey among teachers to assess their familiarity with crowdsourcing methods and to find possible use cases in teaching practice (Arhar Holdt et al., 2020). Finally, they circulated another survey among learners to determine their familiarity with crowdsourcing and their attitudes towards materials (potentially) produced in this way (Hatipoglu et al., 2020; Miloshevska et al., 2021). Understanding stakeholders’ perspectives was crucial in setting up a suitable theoretical framework and in identifying the areas with the most potential for crowdsourcing.
WG1 members also explored specific subjects, including how to develop an open dictionary for the contemporary Serbian language using crowdsourcing techniques (Lazić Konjik and Milenković, 2021), how to develop pedagogically appropriate language corpora through crowdsourcing and gamification (Zviel-Girshin et al., 2021), how to crowdsource linguistic knowledge regarding Dutch blends, neologisms and language variation (Dekker and Schoonheim, 2018), and how to crowdsource second language learning material, with a focus on vocabulary lists, in order to reduce dependency on costly expert manpower (Alfter et al., 2020).

The efforts of WG2 members mostly focused on learners as the most relevant crowd to perform implicit crowdsourcing, and can for the most part be related to an overarching paradigm that pairs up a type of exercise with a specific type of language-related dataset, which can be used to generate exercise content (Nicolas et al., 2020, 2021). More specifically, in order to understand how such a paradigm could be implemented, these efforts studied its context: some efforts studied the exercises compatible with the paradigm (i.e. those whose content could be automatically generated from specific language-related datasets; Lyding et al., 2022), while other efforts studied the types of language-related datasets most commonly crowdsourced,⁹ or aimed at mapping the existing language learning platforms where such a paradigm could be integrated. Other specific efforts researched how to adequately apply this paradigm to crowdsource semantic relations between English or Romanian words (Lyding et al., 2019; Nicolas et al., 2021; Rodosthenous et al., 2019, 2020), or defined new workflows to include teachers and crowdsource linguistic knowledge about English verb-particle constructions (Grace Araneta et al., 2020), as well as to involve crowds of learners’ relatives to crowdsource Alsatian lexical knowledge (Millour et al., 2019). While they did not specifically target a crowd of learners or teachers, others studied how to crowdsource recordings of Italian dialects (Sangati et al., 2018) and complex associations among words by means of a board game workflow (Smrz, 2019).

⁹ The results of these efforts have yet to be published.

WG3 objectives were largely pursued together with WG1, WG2 and WG5 objectives. Some related efforts made it possible to map and study a rather large number of existing language learning solutions (Bączkowska, 2021; Bodorík and Bédi, 2018; Grygo and Gajek, 2018). Other work allowed us to better understand teachers’ and learners’ perception of crowdsourcing as a concept (Arhar Holdt et al., 2020; Hatipoglu et al., 2020), while other studies presented enetCollect from a language learning perspective (Gajek, 2020; Lyding et al., 2018). In a number of publications (Cornillie, 2018; Gajek, 2018; Murray and Giralt, 2018), WG3 members also discussed important design choices when creating a language learning application, especially one that makes use of crowdsourcing, and in others (Cucchiarini and Strik, 2018; Ostanina-Olszewska, 2019; Pereira et al., 2018) they discussed recent language learning technologies.

WG4 efforts, which aimed at defining technology-oriented specifications, allowed us to draw two conclusions. Firstly, the developments made in the context of enetCollect were still too heterogeneous and prototypical to define any transversal technical solution they could share and rely on. Indeed, even though many of the approaches undertaken shared some common needs (e.g. aggregation methods to cross-check the linguistic inputs that were crowdsourced), it was too early to establish technical solutions encoding sophisticated and standardized methods. Secondly, no open-source solution was readily available to implement a language learning platform and, regarding the closest solutions that could have been adapted (i.e. Learning Management Systems, also known as LMS), the related communities at the time had little interest in language learning and were more focused on other subjects (e.g. mathematics or physics) posing fewer subject-specific technical challenges. Indeed, because of its nature, language learning requires specific technologies, such as automatic speech recognition. This explains the absence of readily available solutions for language learning and the difficulty in involving LMS-oriented stakeholders.

Finally, WG5 aimed at devising application-oriented specifications for an ethical, legal and profitable solution. As with WG4, the efforts pursued in the context of enetCollect were still too recent to define any transversal specifications on these aspects. Nonetheless, several WG5 members managed to tackle relevant issues in a prospective fashion and discussed aspects such as the ownership of the data, the need for private or open-source code, third-party dependencies or privacy (Chua et al., 2018; Chua and Rayner, 2018); how to balance a collaboration between teachers and academics (Chua and Rayner, 2019); how to implement gamification strategies in an ethical fashion (Murray and Giralt, 2018); as well as legal issues with respect to European regulations of online learning platforms such as LMS, Massive Open Online Courses (MOOCs) or Open Educational Resources (OERs) (Zdravkova, 2018, 2019).
Finally, a framework to address ethical issues affecting three groups of stakeholders (collaborative content creators, prospective users, and the institutions intending to implement the approach for educational purposes) was proposed (Zdravkova, 2020). As no direct collaboration with business-oriented stakeholders could be established, the question of defining business guidelines was not explored.

Overall, with respect to this objective, the versatility of the relevant aspects discussed is far greater than we originally expected, with little overlap in terms of main focus. Such versatility allows us to draw two further conclusions. Firstly, a dedicated R&I community is needed to adequately take on the topic. Secondly, the lack of convergence in terms of main focus (most of the aforementioned publications could hardly cite one another in their state of the art as directly comparable work) suggests that the paths followed still have many interesting results to yield, while others are still waiting to be explored. In other words, we believe that the research on the combination of crowdsourcing and language learning has progressed and gained notable results, but is still in its early stages.

4.4 To research use cases and work on prototypes

With respect to this output-oriented objective, most achievements were obtained in the context of WG1 and WG2, often in collaboration with WG3 and WG4.

In the context of WG1, experiments were performed to crowdsource linguistic knowledge regarding Dutch blends, neologisms and language variation (Dekker and Schoonheim, 2018), and to crowdsource vocabulary lists to be used as L2 learning material (Alfter et al., 2020).
Other efforts fostered the development of a mobile application for the gamified improvement of two automatically compiled dictionaries for Slovene (Arhar Holdt et al., 2021; Arhar Holdt and Čibej, 2020; Čibej and Arhar Holdt, 2019), and the development of LARA, a learning and reading assistant with explicit crowdsourcing abilities aimed at teachers (Akhlaghi et al., 2019a, 2019b, 2020; Bédi, Bernharðsson, et al., 2020; Bédi, Butterweck, et al., 2020; Bédi et al., 2019; Butterweck et al., 2019; Chua and Rayner, 2019; Habibi, 2019).

In the context of WG2, the V-trel vocabulary trainer with implicit crowdsourcing abilities is geared toward learners and teaches them, through a Telegram bot (https://telegram.org/), English and Romanian semantic relations between words, while crowdsourcing their linguistic judgements (Lyding et al., 2019; Nicolas et al., 2021; Rodosthenous et al., 2019, 2020). Two other prototypes were also implemented, again through Telegram bots: a new learning and teaching workflow to generate exercises and crowdsource linguistic knowledge about English verb-particle constructions (Grace Araneta et al., 2020), and a crowdsourcing mechanism to obtain recordings of Italian dialects (Sangati et al., 2018). The prototype described in Millour et al. (2019), however, relied on a role-playing game framework to crowdsource Alsatian lexical knowledge from learners and their relatives. The prototype discussed in Smrz (2019) fully reimplemented a popular board game in order to crowdsource complex associations among words.

Overall, the research on use cases and the work on prototypes have been more limited than we originally hoped, as attested by the small number of outputs that crowdsource linguistic knowledge other than lexical knowledge. We identified two main reasons for this.
Firstly, the minimal involvement of non-academic stakeholders had noticeable implications for devising and testing prototypes. Indeed, enetCollect somewhat lacked direct evaluation and feedback from those stakeholders who use and create language learning solutions daily. It also prevented enetCollect from accessing existing exercise content and from involving the large crowds of students and online learners needed to test the devised prototypes more extensively. Secondly, the efforts to tackle this goal were mostly planned for the second half of the action, while the efforts planned for the first half would for the most part focus on building the network and researching the theoretical framework. As such, the bulk of efforts with regard to research on use cases and work on prototypes only began some months before the COVID-19 pandemic started, which obviously limited many of these efforts and completely halted others.

5 Conclusions and future steps

While the achievements of enetCollect were rated by the COST agency as “excellent” and “very good” in their mid-way and final formal evaluations, our overall assessment is more modest.

Regarding the network-oriented goals, we believe that enetCollect mostly fulfilled its role. Indeed, given the rather large number of stakeholders that participated and collaborated, we believe it is fair to say that a new research community was created. We also believe that enetCollect could have achieved even greater results had it been supported by the COST agency with more readily available tools, procedures or guidelines to tackle various transversal aspects, such as dissemination or funding acquisition.
Regarding research-oriented objectives, the high potential of the language learning and crowdsourcing combination was more widely acknowledged than we had originally imagined, as attested by the large participation of an international audience of stakeholders, who deemed enetCollect worth their time. In terms of outputs, we believe the number of publications achieved and the prototypes devised to be fair considering the innovative nature of enetCollect and the disruption caused by COVID-19. Nonetheless, for the reasons discussed in Sections 4.4 and 4.5, we believe that the research on this topic is still in its early stages.

As a follow-up to enetCollect, we have established the DARIAH Working Group Combining Language Learning with Crowdsourcing Techniques (D4COLLECT), which will serve as a flexible and dynamic bottom-up institutional framework for knowledge exchange, research coordination and capacity building. Following in enetCollect’s footsteps, D4COLLECT aims to bring together language teachers and experts in linguistics, computational linguistics, educational sciences, software engineering and the digital humanities to explore digital workflows, tools, and solutions for deploying implicit and explicit crowdsourcing methods in the creation of language-learning materials and the collection of language datasets. Our first efforts will target the organization of hackathon-like events (see Section 4.2.1). D4COLLECT will also serve as a practical context to promote the submission of project proposals that, if funded, would help speed up and better shape the efforts of the Working Group’s members.

Acknowledgements

While this article was authored by two people, it reviews and describes the efforts and achievements of many enetCollect members, including many whose names can be found in the references below and whom we would like to thank.
We are particularly thankful to the members of the Core Group who led enetCollect with us for over 4.5 years. Finally, we would like to thank Greta H. Franzini and Egon W. Stemle for their support in polishing this article.

References

Akhlaghi, E., Bédi, B., Bektaş, F., Berthelsen, H., Butterweck, M., Chua, C., …, & Strik, H. (2020). Constructing Multimodal Language Learner Texts Using LARA: Experiences with Nine Languages. Proceedings of LREC 2020, 12th Language Resources and Evaluation Conference (pp. 323–331).
Akhlaghi, E., Bédi, B., Butterweck, M., Chua, C., Gerlach, J., Habibi, H., …, & Zuckermann, G. (2019a). Overview of LARA: A Learning and Reading Assistant. Proceedings of SLaTE 2019: 8th ISCA Workshop on Speech and Language Technology in Education (pp. 99–103). doi: 10.21437/SLaTE.2019-19
Akhlaghi, E., Bédi, B., Butterweck, M., Chua, C., Gerlach, J., Habibi, H., …, & Zuckermann, G. (2019b). Demonstration of LARA: A Learning and Reading Assistant. Proceedings of SLaTE of the 8th ISCA Workshop on Speech and Language Technology in Education (pp. 37–38). doi: 10.21437/SLaTE.2019-19
Alfter, D., Lindström Tiedemann, T., & Volodina, E. (2020). Expert judgments versus crowdsourcing in ordering multi-word expressions. Paper presented at Swedish Language Technology Conference, Gothenburg, Sweden. Retrieved from https://gubox.app.box.com/v/SLTC-2020-paper-16
Arhar Holdt, Š., & Čibej, J. (2020). Rezultati projekta Slovar sopomenk sodobne slovenščine: Od skupnosti za skupnost. Proceedings of the Language Technologies and Digital Humanities Conference (pp. 3–9). Retrieved from http://nl.ijs.si/jtdh20/pdf/JT-DH_2020_Arhar-Holdt-et-al_Rezultati-projekta_Slovar-sopomenk-sodobne-slovenscine.pdf
Arhar Holdt, Š., Logar, N., Pori, E., & Kosem, I. (2021). Game of Words: Play the Game, Clean the Database. Proceedings of the 14th Congress of the European Association for Lexicography, EURALEX 2021 (pp. 41–49).
Arhar Holdt, Š., Zviel-Girshin, R., Gajek, E., Durán-Muñoz, I., Bago, P., Fort, K., …, & Zanasi, L. (2020). Language Teachers and Crowdsourcing: Insights from a Cross-European Survey. Journal of the Institute of Croatian Language and Linguistics // Časopis Instituta Za Hrvatski Jezik i Jezikoslovlje, 46(1), 1–28. doi: 10.31724/rihjj.46.1.1
Bączkowska, A. (2021). An overview of popular website platforms and mobile apps for language learning. Forum Filologiczne Ateneum (pp. 9–35).
Bédi, B., Bernharðsson, H., Chua, C., Björg Guðmarsdóttir, B., Habibi, H., & Rayner, M. (2020, August 20). Constructing an interactive Old Norse text with LARA. Proceedings of EUROCALL 2020, European Association for Computer Assisted Language Learning, Copenhagen.
Bédi, B., Butterweck, M., Chua, C., Gerlach, J., Björg Guðmarsdóttir, B., Habibi, H., …, & Vigfússon, S. (2020). LARA: An extensible open source platform for learning languages by reading. Proceedings of EUROCALL 2020, European Association for Computer Assisted Language Learning, Copenhagen.
Bédi, B., Chua, C., Habibi, H., Martinez-Lopez, R., & Rayner, M. (2019). Using LARA for language learning: A pilot study for Icelandic. CALL and Complexity – Short Papers from EUROCALL 2019, Louvain-la-Neuve, Belgium. Retrieved from https://books.google.fr/books?id=EHnCDwAAQBAJ&lpg=PA33&lr&hl=fr&pg=PA33#v=onepage&q&f=false
Bodorík, M., & Bédi, B. (2018). In Search of the State of Language Learning Online in Europe. Proceedings of the EnetCollect WG3 & WG5 Meeting 2018, Leiden, The Netherlands. Retrieved from http://ceur-ws.org/Vol-2390/PaperA1.pdf
Butterweck, M., Chua, C., Habibi, H., Rayner, M., & Zuckermann, G. (2019). Easy Construction of Multimedia Online Language Textbooks And Linguistics Papers with LARA. Proceedings of the 12th Annual International Conference of Education, Research and Innovation (pp. 7302–7310). doi: 10.21125/iceri.2019.1737
Chua, C., Habibi, H., Rayner, M., & Tsourakis, N. (2018). Decentralising Power: How we are Trying to Keep CALLector Ethical. Proceedings of the EnetCollect WG3 & WG5 Meeting 2018, Leiden, The Netherlands. Retrieved from http://ceur-ws.org/Vol-2390/PaperC3.pdf
Chua, C., & Rayner, M. (2018). What do the Founders of Online Communities Owe to their Users? Proceedings of the EnetCollect WG3 & WG5 Meeting 2018, Leiden, The Netherlands. Retrieved from http://ceur-ws.org/Vol-2390/PaperD1.pdf
Chua, C., & Rayner, M. (2019). Vegetarians Vampires: Why the CALL Technology Provider Doesn’t Have to Suck the Teacher’s Blood. Proceedings of the 12th Annual International Conference of Education, Research and Innovation (pp. 7860–7869). doi: 10.21125/iceri.2019.1863
Čibej, J., & Arhar Holdt, Š. (2019). Repel the Syntruders! A Crowdsourcing Cleanup of the Thesaurus of Modern Slovene. Proceedings of the ELex 2019 Conference: Electronic lexicography in the 21st century, Sintra, Portugal. Retrieved from https://elex.link/elex2019/wp-content/uploads/2019/09/eLex_2019_19.pdf
Cucchiarini, C., & Strik, H. (2018, October 24). Crowdsourcing for Research on Automatic Speech Recognition-enabled CALL. Proceedings of the EnetCollect WG3 & WG5 Meeting 2018, Leiden, The Netherlands. Retrieved from http://ceur-ws.org/Vol-2390/PaperB2.pdf
Dekker, P., & Schoonheim, T. (2018, October 24). Crowdsourcing for Dutch using PYBOSSA: Case studies on Blends, Neologisms and Language Variation. Proceedings of the EnetCollect WG3 & WG5 Meeting 2018, Leiden, The Netherlands. Retrieved from http://ceur-ws.org/Vol-2390/PaperB3.pdf
Gajek, E. (2020). Crowdsourcing in language learning as a continuation of CALL in varied technological, social, and ethical contexts. Proceedings of EUROCALL 2020, European Association for Computer Assisted Language Learning (pp. 75–80). doi: 10.14705/rpnet.2020.48.1168
Grace Araneta, M., Eryigit, G., König, A., Lee, J.-U., Luís, A., Lyding, V., …, & Sangati, F. (2020). Substituto – A Synchronous Educational Language Game for Simultaneous Teaching and Crowdsourcing. Proceedings of the 9th Workshop on Natural Language Processing for Computer Assisted Language Learning (NLP4CALL 2020) (pp. 1–9). doi: 10.3384/ecp201759
Grygo, A., & Gajek, E. (2018). Risks of Using Duolingo by Polish Learners at Primary Level. Proceedings of the EnetCollect WG3 & WG5 Meeting 2018, Leiden, The Netherlands. Retrieved from http://ceur-ws.org/Vol-2390/PaperA4.pdf
Habibi, H. (2019). LARA Portal: A Tool for Teachers to Develop Interactive Text Content, an Environment for Students to improve Reading Skill. Proceedings of the 12th Annual International Conference of Education, Research and Innovation (pp. 8221–8229). doi: 10.21125/iceri.2019.1954
Hatipoglu, C., Gajek, E., Miloshevska, L., & Delibegovic Dzanic, N. (2020). Crowdsourcing for widening participation and learning opportunities: A view from language learners’ window. Proceedings of EUROCALL 2020, European Association for Computer Assisted Language Learning (pp. 81–87). doi: 10.14705/rpnet.2020.48.1169
Hladká, B., Hana, J., & Lukšová, I. (2014). Crowdsourcing in language classes can help natural language processing. Proceedings of the AAAI Conference on Human Computation and Crowdsourcing, Vol. 2 (pp. 71–72).
Lazić Konjik, I., & Milenković, A. (2021). The Development of the Open Dictionary of Contemporary Serbian Language Using Crowdsourcing Techniques. Proceedings of the 14th Congress of the European Association for Lexicography, EURALEX 2021 (pp. 479–484).
Lyding, V., Nicolas, L., Bédi, B., & Fort, K. (2018). Introducing the European NETwork for COmbining Language LEarning and Crowdsourcing Techniques (enetCollect). Future-Proof CALL: Language Learning as Exploration and Encounters – Short Papers (pp. 176–181). doi: 10.14705/rpnet.2018.26.833
Lyding, V., Nicolas, L., & König, A. (2022, June). About the Applicability of Combining Implicit Crowdsourcing and Language Learning for the Collection of NLP Datasets. Proceedings of the 2nd Workshop on Novel Incentives in Data Collection from People: models, implementations, challenges and results within LREC 2022 (pp. 46–57).
Lyding, V., Rodosthenous, C. T., Sangati, F., ul Hassan, U., Nicolas, L., König, A., Horbacauskiene, J., & Katinskaia, A. (2019). V-trel: Vocabulary Trainer for Tracing Word Relations – An Implicit Crowdsourcing Approach. Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2019, Varna, Bulgaria. Retrieved from http://lml.bas.bg/ranlp2019/proceedings-ranlp-2019.pdf
Millour, A., Araneta, M. G., Lazić Konjik, I., Raffone, A., Pilatte, Y.-A., & Fort, K. (2019). Katana and Grand Guru: A Game of the Lost Words (DEMO). Proceedings of the Ninth Language & Technology Conference, Human Language Technologies as a Challenge for Computer Science and Linguistics (LTC’19), Poznan, Poland.
Miloshevska, L., Delibegović Džanić, N., Hatipoğlu, Ç., & Gajek, E. (2021). Crowdsourcing for language learning in Turkey, Bosnia and Herzegovina, Republic of North Macedonia and Poland. Journal of Narrative and Language Studies, 9, 106–121.
Murray, L., & Giralt, M. (2018, October 24). Motivational, Ethical and Gamification Issues in Crowdsourcing. Proceedings of the EnetCollect WG3 & WG5 Meeting 2018, Leiden, The Netherlands. Retrieved from http://ceur-ws.org/Vol-2390/PaperC1.pdf
Nicolas, L., Aparaschivei, L., Lyding, V., Rodosthenous, C., Sangati, F., König, A., & Forascu, C. (2021). An Experiment on Implicitly Crowdsourcing Expert Knowledge about Romanian Synonyms from L1 Language Learners. Proceedings of 10th Workshop on Natural Language Processing for Computer Assisted Language Learning (NLP4CALL 2021) (pp. 1–14). Retrieved from https://ep.liu.se/konferensartikel.aspx?series=ecp&issue=177&Article_No=1
Nicolas, L., Lyding, V., Borg, C., Forascu, C., Fort, K., Zdravkova, K., Kosem, I., …, & HaCohen-Kerner, Y. (2020). Creating Expert Knowledge by Relying on Language Learners: A Generic Approach for Mass-Producing Language Resources by Combining Implicit Crowdsourcing and Language Learning. Proceedings of LREC 2020, 12th Language Resources and Evaluation Conference (pp. 268–278). Retrieved from https://www.aclweb.org/anthology/2020.lrec-1.34
Nicolas, L., Ostanina-Olszewska, J., Arhar Holdt, Š., Čibej, J., Borg, C., Lyding, V., & Barreiro, A. (2021). Introducing an implicit crowdsourcing opportunity to teachers. CALL for Background (pp. 115–136). Peter Lang. Retrieved from https://www.peterlang.com/view/9783631849200/html/ch14.xhtml
Ostanina-Olszewska, J. (2019). Modern technology in language learning and teaching. Linguodidactica, 22, 153–164. doi: 10.15290/lingdid.2018.22.10
Pereira, M. J., Fialho, P., Coheur, L., & Ribeiro, R. (2018). Chatbots’ Greetings to Human-Computer Communication. Proceedings of the EnetCollect WG3 & WG5 Meeting 2018, Leiden, The Netherlands. Retrieved from http://ceur-ws.org/Vol-2390/PaperD2.pdf
Rodosthenous, C., Lyding, V., König, A., Horbacauskiene, J., Katinskaia, A., ul Hassan, U., Isaak, N., Sangati, F., & Nicolas, L. (2019). Designing a Prototype Architecture for Crowdsourcing Language Resources. Poster Session of the 2nd Conference on Language, Data and Knowledge 2019. Retrieved from http://ceur-ws.org/Vol-2402/paper4.pdf
Rodosthenous, C., Lyding, V., Sangati, F., König, A., ul Hassan, U., Nicolas, L., Horbacauskiene, J., Katinskaia, A., & Aparaschivei, L. (2020). Using Crowdsourced Exercises for Vocabulary Training to Expand ConceptNet. Proceedings of LREC 2020, 12th Language Resources and Evaluation Conference (pp. 307–316).
Sangati, F., Abramova, E., & Monti, J. (2018). DialettiBot: A Telegram Bot for Crowdsourcing Recordings of Italian Dialects. Proceedings of the Fifth Italian Conference on Computational Linguistics. CLIC-it 2018, Torino.
Sangati, F., Merlo, S., & Moretti, G. (2015). School-tagging: interactive language exercises in classrooms. In LTLT@ SLaTE (pp. 16–19).
Smrz, P. (2019). Crowdsourcing Complex Associations among Words by Means of A Game. Proceedings of CSTY 2019, 5th International Conference on Computer Science and Information Technology (p. 9). Retrieved from https://aircconline.com/csit/abstract/v9n14/csit91407.html
Von Ahn, L. (2013). Duolingo: learn a language for free while helping to translate the web. In Proceedings of the 2013 international conference on Intelligent user interfaces (pp. 1–2).
Zdravkova, K. (2018). Privacy of Crowdsourcing Educational Platforms in the Light of New EU Regulation. Proceedings of the EnetCollect WG3 & WG5 Meeting 2018, Leiden, The Netherlands. Retrieved from http://ceur-ws.org/Vol-2390/PaperC2.pdf
Zdravkova, K. (2019). Compliance of MOOCs and OERs with the new privacy and security EU regulations. Proceedings of the 5th International Conference on Higher Education Advances (HEAd’19), Vol. 1 (pp. 159–167). Retrieved from http://www.headconf.org/wp-content/uploads/pdfs/9063.pdf
Zdravkova, K. (2020). Ethical issues of crowdsourcing in education. Journal of Responsible Technology, 2–3, 100004. doi: 10.1016/j.jrt.2020.100004
Zviel-Girshin, R., Zingano Kuhn, T., R. Luis, A., Koppel, K., Šandrih Todorović, B., Arhar Holdt, Š., Tiberius, C., & Kosem, I. (2021). Developing pedagogically appropriate language corpora through crowdsourcing and gamification. Proceedings of EUROCALL 2021 (pp. 312–317).
doi: 10.14705/rpnet.2021.54.1352

EnetCollect – European Network for Combining Language Learning with Crowdsourcing Techniques (COST Action CA16105): a review of the project’s vision, organization, progress, and achievements

This article reviews the European Network for Combining Language Learning with Crowdsourcing Techniques (enetCollect), an extensive project for fostering research and innovation (R&I) on the combination of crowdsourcing and language learning. We describe how the project began, present its overall design and organization, and discuss its achievements in terms of (1) creating a new R&I community through a concluded large network project and (2) fostering R&I in a largely unexplored, high-potential area.

We also discuss the related challenges and the lessons learned in shaping and leading a new R&I community, as well as the challenges we observed in the work of enetCollect members as they explored the many facets of such a diverse project.

Keywords: enetCollect, COST Action, crowdsourcing, language learning