Generative Oscillation - A Cognitive Model for the Emergence of Language
Research Material for a discontinued PhD
DRAFT COPY ONLY
NOT READY FOR PRINT PUBLICATION
The GO model proposes a co-generative view of the emergence of language. Most conventional linguistic models conceive of language as a representational system of symbols which refer to events, either mental or external to the organism. This representational function is said to motivate the linguistic system and (depending upon the linguistic model) largely control its form. The GO (Generative Oscillation) model proposed here recognizes the representational role of language. However, it notes that as the mental linguistic system itself becomes efficiently organized, it creates an internal logic and drive of its own. To some extent this internally motivated linguistic system is conceived to override the external motivation to represent another reality. Since the internal linguistic system is dynamic and generative, it may give rise to linguistic output which seems strange in an inter-human communicative context (or even within the reflective mind of the creator). Thus, while the external communicative context can become a constraint on unmotivated non-representational "internal language", it might not eliminate it. The Generative Oscillation model proposes that actual language production is an oscillating compromise between the representational function of language and the mental "language bot" itself (i.e. an internal self-organizing system), which generates language strings just because that is what language bots do. As far as I know, the Generative Oscillation model, or anything like it, had not been suggested before in linguistics at the time of writing. Some conventional linguists may find it a bit "off the wall".
The Influence of Typological Features on Stylometric Text Classification
draft only
This work aims to establish whether the features of morphological typology that a particular language exhibits affect parameters such as accuracy and precision in stylometric measurements conducted for the purpose of text classification. This work provides insights for such fields as plagiarism detection, authorship attribution, automatic essay scoring, and sentiment classification.
Applying ISO-Space to Healthcare Facility Design Evaluation Reports
This paper describes preliminary work on the spatial annotation of textual reports about healthcare facility design to support the long-term goal of linking report content to a three-dimensional building model. Emerging semantic annotation standards enable formal description of multiple types of discourse information. In this instance, we investigate the application of a spatial semantic annotation standard at the building-interior level, where most prior applications have been at inter-city or street level. Working with a small corpus of design evaluation documents, we have begun to apply the ISO-Space specification to annotate spatial information in healthcare facility design evaluation reports. These reports present an opportunity to explore semantic annotation of spatial language in a novel situation. We describe our application scenario, report on the sorts of spatial language found in design evaluation reports, discuss issues arising when applying ISO-Space to building-level entities, and propose possible extensions to ISO-Space to address the issues encountered.
TIMEN: An Open Temporal Expression Normalization Resource
Co-authored with Hector Llorens, Robert Gaizauskas and Estela Saquete
Temporal expressions are words or phrases that describe a point, duration or recurrence in time. Automatically annotating these expressions is a research goal of increasing interest. Recognising them can be achieved with supervised machine learning, but interpreting them accurately (normalisation) is a complex task requiring human knowledge. In this paper, we present TIMEN, a community-driven tool for temporal expression normalisation. TIMEN is derived from current best approaches and is an independent tool, enabling easy integration in existing systems. We argue that temporal expression normalisation can only be effectively performed with a large knowledge base and set of rules. Our solution is a framework and system with which to capture this knowledge for different languages. Using both existing and newly-annotated data, we present results showing competitive performance and invite the IE community to contribute to the resource in order to solve the temporal expression normalisation problem.
Kleinschmidt, D.F., Fine, A.B., and Jaeger, T.F. 2012. A belief-updating model of adaptation and cue combination in syntactic comprehension. The 34th Annual Meeting of the Cognitive Science Society (CogSci12). Sapporo, Japan. July, 2012.
Feel free to cite. For page numbers, pls see the CogSci Proceedings webpage.
We develop and evaluate a preliminary belief-updating model which links intermediate-term (i.e., over several days) syntactic adaptation to the joint statistics of syntactic structures and lexical cues to those structures. This model shows how subjects differentially depend on different cues to syntactic structure following changes in the reliability of those cues, as shown by Fine and Jaeger (2011). By relating syntactic adaptation and cue combination to rational inference under uncertainty, this work links learning and adaptation in sentence processing with adaptation in speech perception and non-linguistic domains.
Keywords: sentence processing, adaptation, Bayesian modeling
Type-token and Hapax-token Relation: A Combinatorial Model
by Jiří Milička
Published in Glottotheory 2/1 2009
Contains an exact formula for computing the type-token relation curve from the frequency distribution of the types of a text (or from its rank-frequency distribution). The formula is generalized to compute not only the number of types, but also the number of types of a certain frequency.
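To make the idea concrete, here is a minimal sketch of one standard way to derive a type-token curve from a frequency distribution. It assumes that a prefix of n tokens behaves like a random sample drawn without replacement from the whole text (a hypergeometric model). That assumption, and the function names, are illustrative only; the paper's own formula may differ in form.

```python
from math import comb


def expected_types(freqs, n):
    """Expected number of distinct types among n tokens sampled
    without replacement from a text whose types occur freqs[i] times.
    (Hypergeometric assumption; not necessarily the paper's formula.)"""
    total_tokens = sum(freqs)
    if not 0 <= n <= total_tokens:
        raise ValueError("n must lie between 0 and the text length")
    all_samples = comb(total_tokens, n)
    # A type is missing from the sample when all n tokens come from
    # the other (total_tokens - f) tokens.
    return sum(1 - comb(total_tokens - f, n) / all_samples for f in freqs)


def expected_types_with_freq(freqs, n, m):
    """Expected number of types occurring exactly m times in the sample
    (m = 1 gives the expected number of hapax legomena)."""
    total_tokens = sum(freqs)
    if not 0 <= n <= total_tokens:
        raise ValueError("n must lie between 0 and the text length")
    if m < 0 or m > n:
        return 0.0
    all_samples = comb(total_tokens, n)
    return sum(
        comb(f, m) * comb(total_tokens - f, n - m) / all_samples
        for f in freqs
    )


if __name__ == "__main__":
    freqs = [2, 1, 1]  # toy text: one type seen twice, two hapaxes
    for n in range(1, sum(freqs) + 1):
        print(n, expected_types(freqs, n), expected_types_with_freq(freqs, n, 1))
```

Evaluating both functions for every n from 1 to the text length gives the type-token and hapax-token curves under this assumption.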
Detectando la mentira en lenguaje escrito
Co-authored with Rafael Valencia-García & Pascual Cantos. Published in Procesamiento de Lenguaje Natural.
Deception in language has been studied from the perspective of several disciplines, the most recent being opinion mining. Within this framework, the present study explores cues to deception in written Spanish, which have not yet been fully investigated. For our purposes, we have developed a framework based on a Support Vector Machine (SVM) classifier in order to detect deception in an ad hoc opinion corpus. We have used the psycholinguistic categories defined in LIWC (Pennebaker, Francis and Booth, 2001), through its four broad dimensions, to train this classifier. The findings reveal that truthful and deceptive texts in Spanish are indeed separable, with the first two dimensions, linguistic and psychological processes, being the most relevant for our aim.
Sentiment Analysis amidst Ambiguities in YouTube Comments on Yoruba Language (Nollywood) Movies
In Proceedings of the 21st international World Wide Web Conference (WWW2012), April 16 - April 20, 2012, Lyon, France
Nollywood is the second largest movie industry in the world in terms of annual movie production. A dominant share of the movies are in the Yoruba language, spoken by over 20 million people across the globe. The number of Yoruba language movies uploaded to YouTube, and of their corresponding comments, is growing exponentially. However, YouTube comments made by native speakers on Yoruba movies combine English, Yoruba, and other commonly used "pidgin" Yoruba words. Since Yoruba is still a resource-constrained language, existing sentiment or subjectivity analysis algorithms perform poorly on YouTube comments made on Yoruba language movies, because of the ambiguities this constrained language introduces. In this work, we present an automatic sentiment analysis algorithm for YouTube comments on Yoruba language movies. The algorithm uses the SentiWordNet thesaurus and a lexicon of commonly used Yoruba language sentiment words and phrases. In terms of precision and recall, the algorithm outperforms a state-of-the-art sentiment analysis technique by up to 20%.
La Variation Prosodique Dialectale en Français. Données et Hypothèses
by Nicolas Obin
Mathieu Avanzi, Nicolas Obin, Guri Bordal, Alice Bardiaux
Journées d'Etude de la Parole, Grenoble, France
In this paper, we compare the prosody of 6 varieties of French spoken in France (Paris and Lyon), Belgium (Tournai and Liège) and Switzerland (Geneva and Neuchâtel). The aim is to see whether the 6 varieties under consideration can be discriminated on the basis of exclusively prosodic criteria. Recordings of the same text, read by 4 speakers for each variety, are transcribed, aligned and coded for the study of accentuation, phrasing and rhythm. A hypothesis-guided (top-down) unsupervised classification method yields results consistent with an a priori classification of the varieties on a scale of dialectal distance, whereas an emergent (bottom-up) unsupervised classification method gives rise to ...
A la Recherche des Temps Perdus : Variations sur le Rythme en Français
by Nicolas Obin
Nicolas Obin, Mathieu Avanzi, Guri Bordal, Alice Bardiaux
Journées d'Etude de la Parole, Grenoble, France
In this paper, we examine the relevance of acoustic rhythm measures for capturing dialectal variation in French (standard, dialectal and contact varieties). First, we point out the limits of conventional rhythm measures (such as %V, ∆C or PVI). Second, we introduce acoustic rhythm measures based on the description of suprasegmental features and associated with the concepts of metre (regularity of accentual phrases) and tempo (speech rate measures). The proposed measures lead to a classification of the French varieties that is consistent with the expected classification.
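For readers unfamiliar with the conventional measures named above, the following sketch computes them as they are commonly defined in the rhythm literature: %V as the share of vocalic duration, ∆C as the standard deviation of consonantal interval durations, and the raw and normalised Pairwise Variability Index over successive intervals. The function names, the use of the population standard deviation, and the toy durations are assumptions made for illustration; this is not the authors' code, and it does not implement the new measures the paper proposes.

```python
from statistics import pstdev


def percent_v(vocalic, consonantal):
    """%V: percentage of total duration taken up by vocalic intervals."""
    total = sum(vocalic) + sum(consonantal)
    return 100.0 * sum(vocalic) / total


def delta_c(consonantal):
    """Delta-C: standard deviation of consonantal interval durations
    (population standard deviation is an assumption here)."""
    return pstdev(consonantal)


def raw_pvi(durations):
    """rPVI: mean absolute difference between successive intervals."""
    pairs = list(zip(durations, durations[1:]))
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


def normalised_pvi(durations):
    """nPVI: like rPVI, but each difference is divided by the mean of
    the pair, then scaled by 100."""
    pairs = list(zip(durations, durations[1:]))
    return 100.0 * sum(abs(a - b) / ((a + b) / 2) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical interval durations in seconds.
    vowels = [0.08, 0.12, 0.10, 0.15]
    consonants = [0.06, 0.09, 0.07, 0.11]
    print(percent_v(vowels, consonants))
    print(delta_c(consonants))
    print(raw_pvi(vowels), normalised_pvi(vowels))
```

All four measures depend only on segmental interval durations, which is the property the abstract contrasts with suprasegmental measures of metre and tempo.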
Lexical surprisal as a general predictor of reading time
by Stefan Frank
Fernandez Monsalve, I., Frank, S.L., & Vigliocco, G. (2012). Lexical surprisal as a general predictor of reading time. Proceedings of the 13th conference of the European chapter of the Association for Computational Linguistics.
Probabilistic accounts of language processing can be psychologically tested by comparing word-reading times (RT) to the conditional word probabilities estimated by language models. Using surprisal as a linking function, a significant correlation between unlexicalized surprisal and RT has been reported (e.g., Demberg and Keller, 2008), but success using lexicalized models has been limited. In this study, phrase structure grammars and recurrent neural networks estimated both lexicalized and unlexicalized surprisal for words of independent sentences from narrative sources. These same sentences were used as stimuli in a self-paced reading experiment to obtain RTs. The results show that lexicalized surprisal according to both models is a significant predictor of RT, outperforming its unlexicalized counterparts.
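As an illustration of the linking function, surprisal is the negative log probability of a word given its preceding context. The sketch below computes lexicalized surprisal in bits with a toy bigram model using add-one smoothing. The bigram model, the smoothing, and the example sentences are assumptions chosen for brevity; the study itself used phrase structure grammars and recurrent neural networks.

```python
import math
from collections import Counter


def surprisal_bits(probability):
    """Surprisal of an event with the given probability, in bits."""
    return -math.log2(probability)


def bigram_surprisals(train_tokens, test_tokens):
    """Return (word, surprisal) pairs for every word in test_tokens
    after the first, using an add-one smoothed bigram model estimated
    from train_tokens (a stand-in for the models used in the study)."""
    vocabulary = set(train_tokens) | set(test_tokens)
    # Count each token as a context only where a next word follows it.
    context_counts = Counter(train_tokens[:-1])
    bigram_counts = Counter(zip(train_tokens, train_tokens[1:]))
    results = []
    for previous, word in zip(test_tokens, test_tokens[1:]):
        probability = (bigram_counts[(previous, word)] + 1) / (
            context_counts[previous] + len(vocabulary)
        )
        results.append((word, surprisal_bits(probability)))
    return results


if __name__ == "__main__":
    train = "the dog barks the dog sleeps".split()
    for word, bits in bigram_surprisals(train, "the dog barks".split()):
        print(word, round(bits, 3))
```

In a reading-time analysis, per-word values like these would be entered as a predictor of RT alongside the usual covariates.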
