Natural language processing for under-resourced languages: Developing a Welsh natural language toolkit
References
ap Dyfrig, R. (2013). Hanes y We Gymraeg. http://www.tiki-toki.com/timeline/entry/84932/Hanes-y-We-Gymraeg/ Online publication.
Baker, 2002, EMILLE, a 67-million word corpus of Indic languages: data collection, mark-up and harmonisation, 819
Berger, K.C., Hernaiz, A.G., Baroni, P., Hicks, D., Kruse, E., Quochi, V., Russo, I., Salonen, T., Sarhimaa, A. and Soria, C. (2018). The DLDP digital language survival kit. The Digital Language Diversity Project, www.dldp.eu.
Binding, 2018, A study of semantic integration across archaeological data and reports in different languages, J. Inf. Sci., 45, 364, 10.1177/0165551518789874
Bontcheva, 2013, Twitie: an open-source information extraction pipeline for microblog text, 83
Bontcheva, 2003, GATE: a Unicode-based infrastructure supporting multilingual information extraction
Carter, 2013, Microblogging language identification: overcoming the limitations of short, unedited and idiomatic text, Lang. Resour. Eval., 47, 195, 10.1007/s10579-012-9195-y
Cavnar, 1994, N-gram-based text categorization, 161
Ceberio, K., Gurrutxaga, A., Soria, C., Russo, I. and Quochi, V. (2018). How to use the digital language vitality scale. The Digital Language Diversity Project, www.dldp.eu.
Cunningham, 2002, GATE, a General architecture for text engineering, Comput. Hum., 36, 223, 10.1023/A:1014348124664
Cunningham, 2002, GATE: a framework and graphical development environment for robust NLP tools and applications, 168
Derczynski, 2013, Microblog-genre noise and impact on semantic annotation accuracy, 21
Donnelly, K. (2018). Eurfa. http://eurfa.org.uk. Online publication.
Donnelly, 2011, Using constraint grammar in the Bangor Autoglosser to disambiguate multilingual spoken text
2018, European Parliament Committee on Culture and Education
Evas, J. (2013). Y Gymraeg yn Yr Oes Ddigidol – the Welsh language in the digital age. META-NET White Paper Series. Available online at http://www.meta-net.eu/whitepapers.
Ezeani, 2019, Leveraging pre-trained embeddings for Welsh taggers, 270
Hardy, 2006, The Amitiés system: data-driven techniques for automated dialogue, Speech Commun., 48, 354, 10.1016/j.specom.2005.07.006
Hepple, 2000, Independence and commitment: assumptions for rapid training and execution of rule-based POS taggers
Hicks, D., Baroni, P., Berger, K.C., Hernaiz, A.G., Kruse, E., Quochi, V., Russo, I., Salonen, T., Sarhimaa, A. and Soria, C. (2018). The DLDP road map. The Digital Language Diversity Project, www.dldp.eu.
Jones, 2010, Cilfachau electronig: geni'r Gymraeg ar-lein, 1989-1996, Cyfrwng, 7, 21
Jones, 2017, 'Porn shock for dons' (and other stories from Welsh pre-web history), 256
Jones, D.B., Robertson, P. and Taborda, A. (2015a). Corpus of Welsh language tweets. http://techiaith.org/corpora/twitter/?lang=en Online publication.
Jones, D.B., Robertson, P. and Prys, G. (2015b). Welsh language Lemmatizer API service. http://techiaith.cymru/api/lemmatizer/?lang=en Online publication.
Krauwer, 2003, The basic language resource kit (BLARK) as the first milestone for the language resources roadmap
Liddy, 2003, Natural language processing, 2126
Maynard, 2003, NE recognition without training data on a language you don't speak, 15, 33
Maynard, 2002, Architectural elements of language engineering robustness, Nat. Lang. Eng., 8, 257, 10.1017/S1351324902002930
McMonagle, 2018, What can hashtags tell us about minority languages on Twitter?: a comparison of #cymraeg, #frysk, and #gaeilge, J. Multiling. Multicult. Dev., 40, 32, 10.1080/01434632.2018.1465429
Moseley, 2010
Nadeau, 2007, A survey of named entity recognition and classification, Lingvisticae Investig., 30, 3, 10.1075/li.30.1.03nad
Neale, 2018, Leveraging lexical resources and constraint grammar for rule-based part-of-speech tagging in Welsh, 3946
Nic Giolla Mhichíl, 2018, Twitter and the Irish language, #Gaeilge – agents and activities: exploring a data set with micro-implementers in social media, J. Multiling. Multicult. Dev., 39, 868, 10.1080/01434632.2018.1450414
Piao, 2018, Towards a Welsh semantic annotation system, 980
Pretorius, 2017, Introduction to the special issue, Lang. Resour. Eval., 51, 891, 10.1007/s10579-017-9405-8
Prys, 2006, The BLARK matrix and its relation to the language resources situation for the Celtic languages, 31
Prys, 2008, The ultimate Welsh language survival kit: an overview of ten years of language technology work at Canolfan Bedwyr, Mercat. Media Forum, 10, 4
Prys, 2016, National language technologies portals for LRLs: a case study, 10930
Prys, 2018, Gathering data for speech technology in the Welsh language: a case study
Rivera Pastor, 2017
Soria, 2014, The language resource strategic agenda: the FLaReNet synthesis of community recommendations, Lang. Resour. Eval., 48, 753, 10.1007/s10579-014-9279-y
StatsWales (2021). Welsh speakers by local authority, gender and detailed age groups, 2011 Census. https://statswales.gov.wales/Catalogue/Welsh-Language/WelshSpeakers-by-LocalAuthority-Gender-DetailedAgeGroups-2011Census Online publication.
Steinberger, 2010, Challenges and methods for multilingual text mining, 19
Thorne, 1993
Vlachidis, 2012, A pilot investigation of information extraction in the semantic annotation of archaeological reports, Int. J. Metadata Semant. Ontol., 7, 222, 10.1504/IJMSO.2012.050183
Vlachidis, 2016, A knowledge-based approach to Information Extraction for semantic interoperability in the archaeology domain, J. Assoc. Inf. Sci. Technol., 67, 1138, 10.1002/asi.23485
Welsh Government (2017). Cymraeg 2050: Welsh language strategy. http://gov.wales/topics/welshlanguage/welsh-language-strategy-and-policies/cymraeg-2050-welsh-language-strategy/?lang=en.
Welsh Government (2018). Welsh language technology action plan. https://gov.wales/topics/welshlanguage/welsh-language-strategy-and-policies/welsh-language-policies-upto-2017/wl-technology-and-digital-media/?lang=en.
Welsh Government (2019). Welsh language results: annual population survey, 2001-2018. https://gov.wales/sites/default/files/statistics-and-research/2019-05/welsh-language-results-annual-population-survey-2001-to-2018.pdf.
Witt, 2009, Multilingual language resources and interoperability, Lang. Resour. Eval., 43, 1, 10.1007/s10579-009-9088-x