Coding an L2 phonological corpus: from perceptual assessment to non-native speech models – an illustration with French nasal vowels.
Detey, S. (2012). Coding an L2 phonological corpus: from perceptual assessment to non-native speech models – an illustration with French nasal vowels. In Tono, Y., Kawaguchi, Y. & Minegishi, M. (eds.), Developmental and Crosslinguistic Perspectives in Learner Corpus Research. Amsterdam/Philadelphia: John Benjamins, 229-250.
This chapter aims to show that: 1) a coding system such as the schwa coding in the Phonologie du Français Contemporain project can be a useful procedure to exploit an L2 phonological corpus for research-oriented purposes; 2) such a coding procedure can turn an L2 database into a rated L2 database which can be used in the field of applied L2 speech sciences; 3) such an L2 coding issue highlights the socio- and psycho-linguistic links between L2 speech assessment and speech models. First, we remind the reader that transcription of oral data is a crucial stage in corpus processing, since it can affect the results of subsequent analyses. Then we illustrate data coding in L1 with examples from the PFC project. The transposition of such a coding system to L2 French phonology sheds some light on the evaluation process at stake in L2 production coding activity. This points to the complexity of defining L2 speech models. The impact of L2 speech characteristics on speech recognition systems and the importance of corpus assessment for such systems are highlighted. Finally, we illustrate the coding-rating activity and its methodological challenges with the French nasal vowel system in the Interphonologie du Français Contemporain framework.
Translation and corpus design
SYNAPS - A Journal of Professional Communication, 26, 2011, 14-23.
In this article I discuss the role of translated texts in different types of corpora. I first consider the role of translations in corpus-based monolingual linguistics, arguing that while translated texts are often excluded from corpora on the basis of a more or less implicit assumption that they “corrupt” the reference norm for a language, this assumption does not seem to be justified on theoretical grounds. For the same reason, translated texts should also be included in bi- and multi-lingual comparable corpora. The incorporation of subcorpora of parallel texts within comparable corpora can also offer practical advantages for contrastive studies. Finally, I provide an overview of the different types of corpora which can be used in translation studies research, and discuss the role of (sub)corpora of translations within these corpora.
A Game-based Corpus for Analysing the Interplay between Game Context and Player Experience
N. Shaker, S. Asteriadis, G. Yannakakis, K. Karpouzis, "A Game-based Corpus for Analysing the Interplay between Game Context and Player Experience", EmoGames workshop, International Conference on Affective Computing and Intelligent Interaction (ACII2011), October 9, Memphis, USA.
Recognizing players' affective state while playing video games has been the focus of many recent research studies. In this paper we describe the process followed to build a corpus based on game events and recorded video sessions from human players while playing Super Mario Bros. We present different types of information that have been extracted from game context, player preferences and perception of the game, as well as user features, automatically extracted from video recordings. We run a number of initial experiments to analyse players' behaviour while playing video games as a case study for using the corpus.
De la transcription de corpus à l’analyse interphonologique : enjeux méthodologiques en FLE
Racine, I., Zay, F., Detey, S. & Kawaguchi, Y. (2011). De la transcription de corpus à l’analyse interphonologique : enjeux méthodologiques en FLE. In Col, G. & Osu, S. N. (eds), Transcrire, écrire, Formaliser (1). Rennes: PUR. Travaux Linguistiques du CerLiCO 24. 13-30.
Résumé
Dans ce travail, nous présentons les enjeux méthodologiques liés à la transcription auxquels nous sommes confrontés dans le projet « Interphonologie du français contemporain », dont l’objectif est de constituer une base de données orales de FLE issues d’apprenants de multiples L1 en ciblant spécifiquement le niveau phonético-phonologique. L’examen des premières données du projet, issues d’apprenants hispanophones et japonophones, souligne l’importance de confronter données et théorie dans une étape préliminaire afin de déterminer le mode et les modalités précises de transcription. Ce travail préalable peut certes paraître coûteux au premier abord, mais constitue une étape essentielle afin d’assurer la qualité des analyses qui seront ensuite effectuées sur le corpus.
Abstract
In this study, we present the methodological challenges of data transcription that we face in the project “InterPhonology of Contemporary French”. This project aims to build a large multitask phonological corpus of French as a foreign language, with data collected from speakers of various L1s using a single methodological protocol. The screening of our first data, collected from Spanish and Japanese learners, raises several questions about the type and modalities of the transcription procedure which should be adopted in the project. Such a methodological pre-analysis is an essential prerequisite to any sound corpus analysis, since the quality of the transcription can have a non-trivial impact on it.
The CorDis Corpus: Mark-up and Related Issues
by Anna Marchi
Co-authored with Letizia Cirillo and Marco Venuti.
Proceedings from Corpus Linguistics Conference Series 2007.
CETA in the Context of the Coruña Corpus
2009. Literary and Linguistic Computing, doi: 10.1093/llc/fqp038 (co-authored with Begoña Crespo)
Writing Science, Compiling Science. The Coruña Corpus of English Scientific Writing.
2008. In Lorenzo Modia, María Jesús (ed). Proceedings from the 31st AEDEAN Conference (531-544). A Coruña: Universidade da Coruña. ISBN: 978-84-9749-278-2 [Co-authored with Javier Parapar-López]
Presenting the Coruña Corpus: A Collection of Samples for the Historical Study of English Scientific Writing
2007. In Pérez Guerra, Javier et al. (eds) ‘Of Varying Language and Opposing Creed’: New Insights into Late Modern English (341-357). Bern: Peter Lang. ISBN: 978-3-03910-788-9 [Co-authored with Begoña Crespo-García]
The Coruña Corpus Tool.
2007. Revista del Procesamiento de Lenguaje Natural, 39: 289-290 [Co-authored with Javier Parapar-López].
Selecting query terms to build a specialised corpus from a restricted-access database.
Gabrielatos, C.
2007
ICAME Journal, 31, 5-43.
This paper proposes an accessible measure of the relevance of additional terms to a given query, describes and comments on the steps leading to its development, and discusses its utility. The measure, termed relative query term relevance (RQTR), draws on techniques used in information retrieval, and can be combined with a technique used in creating corpora from the World Wide Web, namely keyword analysis. It is independent of reference corpora, and does not require knowledge of the number of (relevant) documents in the database. Although it does not make use of user/expert judgements of document relevance, it does allow for subjective decisions. However, subjective decisions are triangulated against two objective indicators: keyness and, mainly, RQTR.
Frequency List Wizard 1.0.0 (Perl script / source code)
Frequency List Wizard is a command-line program that does various useful things with... frequency lists. It's free software, written in Perl and licensed under the GPL v3.
Frequency List Wizard 1.0.0 (Windows Executable)
Frequency List Wizard is a command-line program that does various useful things with... frequency lists. It's free software, written in Perl and licensed under the GPL v3.
From Tombstones to Corpora: TSML for Research on Language, Culture, Identity and Gender Differences.
Co-authored with Leonhard Voltmer and Yoann Goudin, in: PACLIC21, 21st Pacific Asia Conference on Language, Information and Computation. Nov 1-3 2007, Seoul.
Tombstone inscriptions represent a genre which yields insights into cultures and languages. Applying the idea of linguistic corpora to tombstones, we propose to create tombstone corpora as a sustainable resource for the study of languages and cultures. For the annotation of tombstone corpora, we propose TSML, the Tombstone-Markup-Language, developed during the annotation of tombstones from Taiwan and, in addition, some from China, Indonesia and Europe. We develop and discuss our conceptual framework for the annotation of tombstones with its cultural, linguistic and psychological perspectives. We outline possible research strategies which can be followed with TSML-annotated corpora, ranging from the analysis of word meanings to models of identity and the comparison of cultures with respect to the patterns of reference systems they provide.
