10 pages
With technological development and the emergence of new forms of communication such as chat and SMS, a new form of writing that deviates strongly from the standard language has appeared. This paper focuses specifically on electronic writing of the Tunisian dialect in Latin letters. We describe the methodology used to build a dialectal corpus, present the characteristics of this new form of writing and quantify its peculiarities.
This paper is set within the context of Electronic Amiyya (EA), the dialects written on Facebook by young people in the Arab world. In recent years, electronic writing has begun to replace print in many circumstances. Like the printing press, this new technology has also influenced the written colloquial forms of languages, since young people all over the world are devising new conventions for writing their spoken vernaculars electronically. This development is occurring in many of the world's languages, and Arabic is just one example. Chinese youngsters, for instance, deviate from the standard Chinese writing system by mixing features of spoken and written language so that oral dialects can be written on the Internet. In Japan, young people use colloquial language online, for example through eccentric spellings that reproduce actual pronunciation in the typed message. This pattern of electronic writing is particularly salient in diglossic languages such as Arabic, Persian and several languages of the Indian subcontinent, including Tamil, Sinhala, Telugu and Bengali. In such cases, the invention of a new writing system for the spoken language might endanger the formal written language. In this research, I describe the development of new systems for writing colloquial Arabic dialects, studying the consonantal system of Electronic Amiyya in several countries of the Arab world. This is the first empirical study of its kind. Its purpose is to examine how this system differs among users in different countries and to investigate any emerging standardisation.
Social and technological changes over the past several decades have led to widespread writing of "spoken" Arabic dialects. However, there is little quantitative research on this phenomenon and most existing research is limited to Egypt and Morocco. In addition, little is known about the characteristics of these newly written vernaculars, even though encoding an unwritten language in writing is not merely a technical assignment of sound to letter. Rather, it is a complex process that must balance practical considerations with ideological stances, such as autonomy from the standard language (Mühleisen 2005). The spread of vernacular into writing and the accompanying tension over its form constitutes the process of vernacularization. This dissertation documents and analyzes this vernacularization as it is occurring in Tunisia, examining how Tunisians writing in dɛrja collectively position themselves in relation to Standard Arabic, French, and other Arabic vernaculars. Using a 32-million-word online corpus and an innovative method for quantifying language choice, I found that the proportion of Tunisian Arabic on the online forum studied increased from 19.7% in 2010 to 69.9% in 2021.
New forms of electronic communication are increasingly driving language change. As in many of the world's languages, this change is visible in Arabic, giving rise to a variety that may be labelled ‘Latinized Arabic’. The paper examines Latinized Arabic and discusses the impact of electronic communication on the use of the Arabic script among Saudi students. It also investigates the linguistic and sociolinguistic issues raised by Latinized Arabic, based on an analysis of the language students use when writing in electronic communication. The data were collected from the electronic conversations of student volunteers. The findings indicate that, when chatting electronically, the students use a new variety of Arabic written in the English (Latin) script. A further finding is that Latinized Arabic has its own orthographic system. In addition, the students code-mix and code-switch with English. The paper concludes that further research is needed to investigate and analyse the features of Latinized Arabic at all linguistic levels. Such research should examine a large body of data from mixed-sex groups and, where possible, from female writers as well.
2020
This article describes the process of building the first morpho-syntactically annotated Tunisian Arabish Corpus (TArC). Arabish, also known as Arabizi, is a spontaneous encoding of Arabic dialects in Latin characters and “arithmographs” (numerals used as letters). Arabic-speaking social media users developed this code system to make writing easier in the informal settings of Computer-Mediated Communication (CMC) and text messaging. Arabish differs for each Arabic dialect, and each Arabish code system is under-resourced, as are most Arabic dialects. In the last few years, NLP research on Arabic dialects has grown considerably. With this in mind, TArC is intended as a resource for different types of analysis, computational and linguistic, as well as for training NLP tools. In this article we describe preliminary work on the semi-automatic construction of TArC and some of the first analyses we developed o...
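To make “arithmographs” concrete, here is a minimal Python sketch that swaps digits for the Arabic letters they commonly stand for. The table (2 for hamza, 3 for ʿayn, 5 for kha, 7 for ḥa, 9 for qaf) reflects widespread Tunisian Arabish usage but is an assumption, not TArC's annotation scheme, and the function name `decode_arithmographs` is hypothetical.

```python
# Hypothetical arithmograph table (an assumption, not TArC's scheme): digits that
# commonly stand for Arabic letters with no obvious Latin counterpart.
ARITHMOGRAPHS = {
    "2": "ء",  # hamza (glottal stop)
    "3": "ع",  # ʿayn
    "5": "خ",  # kha
    "7": "ح",  # ḥa
    "9": "ق",  # qaf
}


def decode_arithmographs(token: str) -> str:
    """Replace digits used as letters; leave all-digit tokens (real numbers) unchanged."""
    if token.isdigit():
        return token  # e.g. a year such as "2020" is a number, not a word
    return "".join(ARITHMOGRAPHS.get(ch, ch) for ch in token)


for token in ["3aslema", "9alb", "2020"]:
    print(token, "->", decode_arithmographs(token))
```

The all-digit check is only a heuristic for telling numbers from words; a mixed token like "3aslema" keeps its Latin letters, which a full transliterator would then convert as well.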
This paper describes the creation of a novel resource: a parallel Arabizi-Arabic script corpus of SMS/chat data. The language used in social media differs from other written genres in many ways: its vocabulary is informal, with intentional deviations from standard orthography such as repeated letters for emphasis; typos and nonstandard abbreviations are common; and nonlinguistic content, such as laughter, sound representations and emoticons, is written out. This situation is exacerbated in Arabic social media for two reasons. First, Arabic dialects, which are commonly used in social media, differ considerably from Modern Standard Arabic phonologically, morphologically and lexically, and, most importantly, they lack standard orthographies. Second, Arabic speakers on social media, as well as in discussion forums, SMS messaging and online chat, often use a non-standard romanization called Arabizi. For natural language processing of social media Arabic, transliterating the Arabizi of various dialects into Arabic script is a necessary step, since many existing state-of-the-art resources for Arabic dialect processing expect Arabic-script input. The corpus described in this paper is expected to support Arabic NLP by providing this resource.
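Transliterating Arabizi into Arabic script can be sketched as greedy longest-match rewriting, so that digraphs such as "ch" or "kh" take priority over their single letters. The rule table and the `transliterate` function below are hypothetical and deliberately tiny: single short vowels are dropped and only a few long-vowel spellings are mapped, which is an assumption and not the method of the corpus described above. Real systems have to resolve ambiguity from context, typically by learning from parallel data like this corpus.

```python
# Hypothetical rule table for illustration only; real Arabizi usage varies by
# dialect and writer. Single vowels (a, e, i, o, u) are deliberately absent,
# so they are dropped; only a few long-vowel spellings are mapped.
RULES = {
    # digraphs, tried before single letters
    "ch": "ش", "sh": "ش", "kh": "خ", "gh": "غ", "th": "ث", "dh": "ذ",
    "aa": "ا", "ii": "ي", "ou": "و",
    # arithmographs (numerals used as letters)
    "2": "ء", "3": "ع", "5": "خ", "7": "ح", "9": "ق",
    # single consonants
    "b": "ب", "t": "ت", "j": "ج", "d": "د", "r": "ر", "z": "ز", "s": "س",
    "f": "ف", "q": "ق", "k": "ك", "l": "ل", "m": "م", "n": "ن", "h": "ه",
    "w": "و", "y": "ي",
}
MAX_KEY = max(len(key) for key in RULES)


def transliterate(word: str) -> str:
    """Greedy longest-match Arabizi-to-Arabic-script conversion (naive sketch)."""
    word = word.lower()
    out = []
    i = 0
    while i < len(word):
        # Try the longest possible chunk first, then shorter ones.
        for size in range(MAX_KEY, 0, -1):
            chunk = word[i:i + size]
            if chunk in RULES:
                out.append(RULES[chunk])
                i += len(chunk)
                break
        else:
            i += 1  # no rule matches: skip the character (e.g. a short vowel)
    return "".join(out)


print(transliterate("khobz"))
```

For example, `transliterate("khobz")` yields خبز, because "kh" matches as one unit and the short "o" is dropped. Dropping single vowels also loses long vowels that writers spell with one letter, which is exactly the kind of ambiguity a parallel corpus helps resolve.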
BRISMES 2012 Graduate Conference Papers, 2012
The role of the internet in the popular protests of 2011 cannot be overestimated. Most importantly, the internet allowed online activists to escape censorship and communicate with thousands, if not millions, of people in real time. What is interesting about this form of communication is the choice of language, particularly in Egypt: for centuries, Classical Arabic (CA) or Modern Standard Arabic (MSA) has been the accepted written form, yet the language used online leans towards colloquial Arabic, which until now has been accepted only as a spoken form. The relationship between the written and spoken forms of Arabic in Egypt has been detailed by Haeri (2003), but the use of spoken Arabic in online writing has yet to be explored. This paper examines the relationship between the form of the language used in online writing and the messages being conveyed. It suggests that, away from the censorship of state media and the press, writers are free to use dialectal forms of the language for a freer, more direct approach to their readers, which has communicated their message more effectively than CA or MSA would have.
Because of the many varieties of Arabic, there can never be 'one' authoritative corpus of the language. To achieve the best results for language-learning resources and natural language processing, corpora of both the standard language and the spoken varieties are needed. To this end, the Tunisian Arabic Corpus (TAC), a project led by Karen McNeil and Miled Faiza, seeks to build a four-million-word corpus of Tunisian Spoken Arabic. Creating Arabic corpora, and dialectal corpora in particular, poses many challenges, including sources, balance and parsing. The corpus currently consists of only about 881,000 words, and the issues of balance and parsing have not been fully solved. Nonetheless, the corpus has proved useful to Arabic students and researchers, and it also offers a model for others who wish to create dialectal Arabic corpora.
Studies on Arabic Dialectology and Sociolinguistics, 2019
This volume contains over fifty articles on various fields of modern Arabic dialectology. All the articles are revised and enhanced versions of papers read at the 12th Conference of the Association Internationale de Dialectologie Arabe (AIDA), held in Marseille in June 2017. Since its first conference in Paris in 1993, AIDA members have gathered every two years in a different country. The collected AIDA proceedings offer an updated overview of the development of the field. Over the past few decades, the study of Arabic dialects has become an important branch of research covering a wide range of subjects, from phonological analysis, morphosyntax and semantics to pragmatics, sociolinguistics, folk linguistics, studies of literacy and writing, and cultural and artistic practices. As many articles in this volume illustrate, the study of Arabic dialects explores different aspects of the languages and cultures of the contemporary Arab world. A remarkable feature is the growing and steady participation of young scholars from around the globe.
SMS (Short Message Service) messages have become a common means of communication throughout the world. Romanized Persian (RP) is an emerging code used by Iranian texters. This study presents the main findings of a linguistic analysis of RP SMS messages written by Iranian cell phone users. A corpus of 719 RP SMS messages was collected manually from informants, transcribed and analyzed linguistically. The analysis focused on linguistic features that occur frequently in the messages, and examples of each feature are described in detail. The findings reveal several main characteristics: punctuation, omissions, consonant writing, graphical means and symbols, contractions, letter repetitions, loanwords and letter-number homophones. The implications of the study concern how Iranian cell phone users use the language of SMS in their messages, as well as the common linguistic features of RP SMS messages. The paper concludes by recommending the analytical approach taken in this study as a basis for future research on SMS communication in Iran and in global contexts.
Romanised Jordanian Arabic is a newly emerging code of electronic communication, used extensively by first-generation e-message senders, which might be described as a hybrid lingua franca or even a pidgin. This study is based on 1,098 e-mail messages sent by 257 undergraduate students, on 1,400 chat turns exchanged between nicknamed senders, and on an eight-page (A4) conversation among seven participants, all of whom have a working knowledge of English. It reveals that 37% of the notation representing consonants is used systematically, while the rest varies: a single Arabic character can correspond to up to six symbols, mainly Roman letters and Arabic numerals, chosen on a pictorial or phonetic basis. Vowels, by contrast, are less systematic, with different sounds assigned the same vowel character. Since all the messages appear to have been exchanged between university students and/or graduates, code-switching is pervasive: 60% of the messages involve switching from English into Romanised Jordanian Arabic. Most switches involve nouns (61.84%), a finding that moderately supports previous sociolinguistic research. Where a switch is clausal, code-switching becomes ‘code-mixing’, whose function is at best rhetorical. When switching is intra-sentential, the grammars of English and Arabic align, but with noticeable word-order reversal.

The Levantine Review, 2012
HAL (Le Centre pour la Communication Scientifique Directe), 2022
Canadian Journal of Learning and Technology / La revue canadienne de l’apprentissage et de la technologie, 2015
International Journal of Linguistics, Literature and Translation
Written Language & Literacy, 2008
Lingua Cultura, 2018
ACM Transactions on Asian and Low-Resource Language Information Processing