Acoustic Properties Of Dutch Steady-State Vowels: Contextual Effects And A Comparison With Previous Studies
Co-authored with Jan-Willem van Leussen and Paola Escudero, published in Proceedings of the ICPhS XVII, 1194-1197, 2011.
Recent vowel corpora show that there are often
clear acoustic differences between vowels produced
in different phonetic contexts. We expand on a
recent corpus of Northern Standard Dutch (NSD)
vowels by including a variety of consonantal
contexts. Our results show that there are very clear
contextual effects on the spectral and temporal
properties of NSD vowels. The most striking effect
is the apparent 'fronting' of vowels in alveolar
contexts, which has not previously been reported for
Dutch. Classification with a supervised learning
algorithm reveals some substantial differences
between our acoustic measurements and
Using a Web Page Design Activity to Promote Active Learning of Course Content In An Undergraduate Anatomy and Physiology Course
Published in Perspectives on Issues in Higher Education
Purpose: The purpose of this study was to evaluate student perceptions of a web page design activity as a learning activity in an undergraduate anatomy and physiology course. Methods: One class of undergraduate majors in Speech-Language Pathology and Audiology who took part in CSD 1720: Anatomy and Physiology of the Speech Mechanism on St. John’s University’s Staten Island Campus was solicited to voluntarily and anonymously complete an online survey regarding their perceptions of the utility of building a web page to learn course content. Results: Nine (34.6%) of the possible 26 students enrolled in the course completed the survey. Most respondents were freshmen or sophomores. No respondent indicated that he or she was technologically incompetent. Complete survey results were communicated in tabular format. Conclusions: Overall, student respondents to the survey indicated positive perceptions regarding the utility of building a web page to enhance recall and understanding of course concepts in an undergraduate anatomy and physiology course. Recommendations for future research include the continued survey of future students to make a better informed judgment regarding the use of this activity to improve student learning of course content.
Student Perceptions of Learning Speech Science Concepts in a Hybrid Environment
Published in PSHA Journal
The purpose of this descriptive, mixed methods case study was to explore undergraduate speech-language pathology students’ perceptions of a hybrid delivery format and their own learning of course content in a speech science course. An anonymous, online survey was administered to elicit students’ perceptions of the course format and their learning due to this format. Survey data were made up of both quantitative data and qualitative data. Quantitative data were analysed through descriptive statistics and qualitative data were subjected to thematic analysis. Results indicated that laboratory activities enhanced comprehension of speech science concepts as well as how speech-language pathologists use speech science concepts clinically. Use of asynchronous online discussions enhanced students’ comprehension of course concepts. Overall, the course structure allowed more time for students to access course information and increased students’ independent learning skills. Findings are discussed and directions for future research are recommended.
Student Preferences for Learning Speech Acoustics using Active Learning Methods
Co-authored with Dr. Jo Ann Bamdas, Florida Atlantic University. Published in PSHA Journal
The current study explored student preferences regarding course activities in an undergraduate speech science course and students’ familiarity and comfort level with acoustical concepts after participation in the course. Thirteen of 20 students enrolled in a hybrid undergraduate speech science course completed a survey aimed at eliciting student preferences for course learning activities. A paper-based survey was administered to discern students’ comfort and familiarity with acoustic principles after completing the course. Learning activities included the use of electronic lectures in lieu of face-to-face lectures, online group discussions, and completion of an electronic portfolio consisting of laboratory results and interpretations. Results suggested that the students preferred the use of laboratory activities over more traditional lectures and hypothetical in-class quizzes. Also, students reported feeling more comfortable with basic acoustic principles after completion of the course. Results are discussed as well as the significance of the research and future research directions.
Speech and language therapy for aphasia following stroke (2012 update)
by Marian Brady
Brady MC, Kelly H, Enderby P. Speech and language therapy for aphasia following stroke. Cochrane Database of Systematic Reviews. (submitted)
Speech and language therapy versus placebo or no intervention for speech problems in Parkinson's disease
by Marian Brady
Clare P Herd, Claire L Tomlinson, Katherine HO Deane, Marian C Brady, Christina H Smith, Catherine Sackley, Carl E Clarke
Background
Parkinson's disease patients commonly suffer from speech and vocal problems including dysarthric speech, reduced loudness and loss of articulation. These symptoms increase in frequency and intensity with progression of the disease. Speech and language therapy (SLT) aims to improve the intelligibility of speech with behavioural treatment techniques or instrumental aids.
Objectives
To compare the efficacy of speech and language therapy versus placebo or no intervention for speech and voice problems in patients with Parkinson's disease.
Search methods
Relevant trials were identified by electronic searches of numerous literature databases including MEDLINE, EMBASE, and CINAHL, as well as handsearching of relevant conference abstracts and examination of reference lists in identified studies and other reviews. The literature search included trials published prior to 11th April 2011.
Selection criteria
Only randomised controlled trials (RCT) of speech and language therapy versus placebo or no intervention were included.
Data collection and analysis
Data were abstracted independently by CH and CT and differences settled by discussion.
Main results
Three randomised controlled trials with a total of 63 participants were found comparing SLT with placebo for speech disorders in Parkinson's disease. Data were available from 41 participants in two trials. Vocal loudness for reading a passage increased by 6.3 dB (P = 0.0007) in one trial, and 11.0 dB (P = 0.0002) in another trial. An increase was also seen in both of these trials for monologue speaking of 5.4 dB (P = 0.002) and 11.0 dB (P = 0.0002), respectively. It is likely that these are clinically significant improvements. After six months, patients from the first trial were still showing a statistically significant increase of 4.5 dB (P = 0.0007) for reading and 3.5 dB for monologue speaking. Some measures of speech monotonicity and articulation were investigated; however, all these results were non-significant.
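To put the reported decibel gains in perspective, the short sketch below converts a gain in dB into a linear ratio. It assumes the reported values are changes in sound pressure level, which the abstract does not state explicitly; the function names are illustrative only.

```python
def db_gain_to_pressure_ratio(gain_db: float) -> float:
    """Convert a level change in dB to a sound-pressure (amplitude) ratio.

    Assumes the dB values are sound pressure level differences,
    so ratio = 10 ** (gain_db / 20).
    """
    return 10 ** (gain_db / 20)


def db_gain_to_power_ratio(gain_db: float) -> float:
    """Convert a level change in dB to a power (intensity) ratio."""
    return 10 ** (gain_db / 10)


if __name__ == "__main__":
    # Reading-passage gains reported for the two trials in the abstract.
    for gain in (6.3, 11.0):
        ratio = db_gain_to_pressure_ratio(gain)
        print(f"{gain} dB -> pressure ratio {ratio:.2f}")
```

Under that assumption, the 6.3 dB and 11.0 dB gains correspond to pressure ratios of about 2.1 and about 3.5, respectively.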
Authors' conclusions
Although improvements in speech impairments were noted in these studies, due to the small number of patients examined, methodological flaws, and the possibility of publication bias, there is insufficient evidence to conclusively support or refute the efficacy of SLT for speech problems in Parkinson's disease. A large well designed placebo-controlled RCT is needed to demonstrate SLT's effectiveness in Parkinson's disease. The trial should conform to CONSORT guidelines. Outcome measures with particular relevance to patients with Parkinson’s disease should be chosen and patients followed for at least six months to determine the duration of any improvement.
SpeCT - The Speech Corpus Toolkit for Praat
Formerly known as "Mietta's Praat scripts"
The aim of the Speech Corpus Toolkit (SpeCT) is to provide an organized collection of well-documented Praat scripts that can be easily downloaded, modified and used in order to perform small tasks during the various stages of building, organizing, annotating, analysing, searching and exporting data from a speech corpus.
MR-compatible registration of speech-related movements using a bend sensor
by Peter Soros
Published in:
Front Hum Neurosci. 2010 Mar 22;4:24.
fMRI-Compatible Registration of Jaw Movements Using a Fiber-Optic Bend Sensor.
Sörös P, Macintosh BJ, Tam F, Graham SJ.
A real-time articulatory controlled vowel synthesizer for research on speech motor learning
by Jordan Green
Introduction
When learning to speak, young children use auditory feedback to learn associations between articulatory movements and their acoustic consequences (Guenther et al., 1998). Presumably, this process involves the inverse mapping from acoustic goals to vocal tract shapes to muscular forces. Adults with acquired speech impairments may also undergo a similar inverse mapping process when regaining speech following injury to the vocal tract or the neural structures that govern speech. The principles underlying speech motor learning and re-learning are poorly understood though such knowledge is essential for treatments designed to improve speech. In this investigation, we examine the usefulness of a real-time articulatory-controlled vowel synthesizer for conducting experiments on auditory-motor associative learning in speech. Experiments were conducted to determine participants’ ability to generate corner vowels using the synthesizer and to adapt to experimental manipulations of the mappings between mouth shape and vowel sounds.
Methods and Results
Three neurologically intact adult participants with normal hearing were studied. Lip movements were registered in 3D at 30 fps using an infrared motion capture system. Mouth shape was computed in near real-time and was used as a MIDI controller to trigger the playback of prerecorded vowel samples. In the first condition, speakers were instructed to use the synthesizer to generate corner vowels using an unaltered mouth shape-to-vowel map. In the second condition, the participants were required to adapt to large alterations in mapping between mouth shapes and vowel sounds. Articulatory accuracy, time-to-target, and path distance were measured across trials for the unaltered and altered map conditions. Preliminary data indicate that all of the participants were able to generate accurate vowel sounds on their first trial using the synthesizer during the unaltered map condition. In addition, two of the participants rapidly adapted to the experimental manipulations of mouth shape-to-sound relations.
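The mapping idea described above can be sketched as a nearest-target lookup. This is a minimal illustration, not the authors' implementation: the two mouth-shape dimensions, the target coordinates, and the way the altered map is built (rotating the targets among the vowels) are all hypothetical.

```python
import math

# Hypothetical normalized mouth-shape targets: (lip aperture, lip width), each in [0, 1].
UNALTERED_MAP = {
    "i": (0.2, 0.9),  # spread lips, small opening
    "a": (0.9, 0.5),  # wide opening
    "u": (0.2, 0.1),  # rounded lips, small opening
}


def nearest_vowel(aperture, width, vowel_map):
    """Return the vowel whose target is closest to the measured mouth shape."""
    shape = (aperture, width)
    return min(vowel_map, key=lambda v: math.dist(shape, vowel_map[v]))


def rotate_map(vowel_map):
    """Build an altered map in which each vowel takes the next vowel's target."""
    vowels = list(vowel_map)
    targets = [vowel_map[v] for v in vowels]
    shifted = targets[1:] + targets[:1]
    return dict(zip(vowels, shifted))


if __name__ == "__main__":
    altered = rotate_map(UNALTERED_MAP)
    shape = (0.25, 0.85)  # a spread-lip posture
    print(nearest_vowel(*shape, UNALTERED_MAP))  # vowel triggered by the unaltered map
    print(nearest_vowel(*shape, altered))        # vowel triggered by the altered map
```

In a real-time setup the returned label would select which prerecorded sample to play; here it is only printed.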
Discussion
These preliminary findings suggest that the articulatory-controlled synthesizer is a useful tool for investigating auditory-motor associative learning. Participants were able to use auditory feedback provided by the synthesizer to form new mappings between articulatory targets and acoustic goals. This finding is consistent with prior studies demonstrating that healthy adult talkers rapidly adapt to perturbations of speech movements and speech/vocal feedback, and alterations of vocal tract anatomy (Houde & Jordan, 2002; Tremblay, Shiller, et al. 2002). Additional work is needed to determine if these preliminary findings generalize to a larger group of subjects.
References
Guenther, F. H. (1995). Speech sound acquisition, coarticulation, and rate effects in a neural network model of speech production. Psychological Review, 102, 594-621.
Houde, J. F. & Jordan, M. I. (1998). Sensorimotor adaptation in speech production. Science, 279, 1213-1216.
A real-time articulatory controlled vowel synthesizer for research on speech motor learning
by Jordan Green
Jordan R. Green, Lien Phan, Ignatius Nip, and Antje Mefferd
Department of Special Ed. and Com. Disorders, University of Nebraska - Lincoln,
Dysarthria following stroke – the patient’s perspective on management and rehabilitation. Clinical Rehabilitation. 2011;25:935-952. doi:10.1177/0269215511405079.
by Marian Brady
Brady MC, Clarke A, Dickson S, Paton G, Barbour R.
Objective: To explore the perceptions of people with stroke-related dysarthria in relation to the management and rehabilitation of dysarthria.
Design: Qualitative semi-structured interviews.
Setting: Community setting
Subjects: Twenty-four people with an acquired dysarthria as a result of a stroke in the previous three years. All were living at home at the time of the interview. None exhibited a co-existing impairment (for example, aphasia, apraxia or cognitive impairment) that might have contributed to their communicative experiences.
Results: Participants described the considerable efforts they made to maximize their communicative effectiveness prior to, and during, communicative interactions. Activities described included careful articulation and vocal projection as well as more inconspicuous strategies including pre-planning interactions, focused, effortful speech and word substitution. Communication was facilitated by a range of strategies including drafting, rehearsal, manoeuvring and ongoing monitoring and repair. Self-led speech rehabilitation activities were functionally based and often undertaken regularly. Some novel reading-aloud and speaking-aloud activities were described.
Conclusion: The quantity and nature of inconspicuous, internalized, cognitive activities people with dysarthria engage in to maximize their communicative effectiveness should be considered in evaluating the impact of dysarthria following stroke. Focusing upon externally observable characteristics alone is insufficient. Challenging, functionally relevant, patient-focused activities, materials and targets are more likely to be perceived by the patient as relevant and worthwhile and are thus more likely to ensure adherence to recommended rehabilitation activities.
Relative Timing of Bilabial Gesture in Finnish
O'Dell, Michael & Šimko, Juraj & Nieminen, Tommi & Vainio, Martti & Lehtinen, Mona 2011: Relative timing of bilabial gesture in Finnish. - Wai-Sum Lee & Eric Zee (eds.), Proceedings of the 17th International Congress of Phonetic Sciences. City University of Hong Kong.
The Embodied Task Dynamic model of gestural sequencing predicts that an intervocalic consonantal lip closing gesture should come later if the tongue is moving from /i/ to /a/ rather than from /a/ to /i/ because this relation is more efficient in terms of production and perceptibility. We tested this prediction for Finnish /ipa/ and /api/ using EMA to track articulation. The results confirmed the predictions of the model for single /p/ and also revealed a significantly greater lag for geminate /pp/. This quantity effect is also borne out by the model.
Quantitative association of vocal-tract and facial behavior
@ARTICLE{hani-98a,
author = "H. Yehia and P. Rubin and E. Vatikiotis-Bateson",
title = "Quantitative Association of Vocal-Tract and Facial Behavior",
journal = {Speech Communication},
volume = 26,
number = "1--2",
pages = "23--43",
year = 1998
}
This paper examines the degrees of correlation among vocal-tract and facial movement data and the speech
acoustics. Multilinear techniques are applied to support the claims that facial motion during speech is largely a
by-product of producing the speech acoustics and further that the spectral envelope of the speech acoustics can be better
estimated by the 3D motion of the face than by the midsagittal motion of the anterior vocal-tract (lips, tongue and jaw).
Experimental data include measurements of the motion of markers placed on the face and in the vocal-tract, as well as
the speech acoustics, for two subjects. The numerical results obtained show that, for both subjects, 91% of the total
variance observed in the facial motion data could be determined from vocal-tract motion by means of simple linear
estimators. For the inverse path, i.e. recovery of vocal-tract motion from facial motion, the results indicate that about
80% of the variance observed in the vocal-tract can be estimated from the face. Regarding the speech acoustics, it is
observed that, in spite of the nonlinear relation between vocal-tract geometry and acoustics, linear estimators are
sufficient to determine between 72 and 85% (depending on subject and utterance) of the variance observed in the RMS
amplitude and LSP parametric representation of the spectral envelope. A dimensionality analysis is also carried out,
and shows that between four and eight components are sufficient to represent the mappings examined. Finally, it is
shown that even the tongue, which is an articulator not necessarily coupled with the face, can be recovered reasonably
well from facial motion since it frequently displays the same kind of temporal pattern as the jaw during speech. © 1998
Elsevier Science B.V. All rights reserved.
Keywords: Vocal-tract motion; Facial motion; Line spectrum pair (LSP); Singular value decomposition; Principal component analysis; Dynamic time warping (DTW); Linear estimator
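The "percentage of variance determined by a linear estimator" reported above can be illustrated with a deliberately simplified sketch. The paper applies multilinear estimators to multidimensional motion data; the example below assumes a single predictor and a single output, fits a least-squares line, and reports the fraction of output variance the line accounts for. The function names and the toy data are illustrative, not taken from the paper.

```python
from statistics import mean


def fit_line(x, y):
    """Least-squares fit of y ~ slope * x + intercept for 1-D sequences."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def variance_explained(x, y):
    """Fraction of the variance in y recovered by the linear estimate from x."""
    slope, intercept = fit_line(x, y)
    my = mean(y)
    ss_total = sum((yi - my) ** 2 for yi in y)
    ss_resid = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    return 1 - ss_resid / ss_total


if __name__ == "__main__":
    # Toy stand-ins for one vocal-tract coordinate and one facial coordinate.
    tract = [0.0, 1.0, 2.0, 3.0]
    face = [0.0, 1.0, 0.0, 1.0]
    print(f"variance explained: {variance_explained(tract, face):.2f}")
```

A value near 1 means the output is almost fully determined by the linear estimate; the 91% and 80% figures in the abstract are the multidimensional analogue of this quantity.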
VoxSecure 2: An Engine for Perception-Based Speaker Identification
M.Sc. dissertation:
A dissertation submitted to the Department of Intelligent Computer Systems
in fulfilment of the requirements for the degree of Master of Science in Computer Science and Artificial Intelligence - Faculty of ICT, University of Malta.
Citation:
A. DeMarco, “VoxSecure 2: An Engine for Perception-Based Speaker
Identification”. M.Sc. dissertation, University of Malta, December 2010.
In a previous undergraduate final year project we developed a state-of-the-art baseline library for voice recognition and verification. The target of this project (to evaluate and discuss a baseline speaker identification system) was reached. However, scenarios that challenge this biometric technique still exist. Just as research on voice biometrics is valid, so is research on ways to break the system. Any biometric system has to be continuously improved.
However, as our project showed, voice recognition is a computationally intensive task. Therefore we must keep in mind that any process we add to its pipeline will result in a longer delay for a result. The dependence on strong computing power starts to automatically rule out its use on power-limited devices, and therefore the idea starts to show itself as an impractical solution in the real world.
The acoustic features of speech are represented using cepstral vectors. These vectors represent voice features over a very short time segment (25 ms to 40 ms). This time window is a rough estimate for the duration of phoneme sounds in speech. Therefore, the actual characteristics that are being gathered should collectively build a voice model over the entire distribution of phonemes as uttered by an individual speaker. However, speech signals are never “pure”. There are unvoiced regions, there is noise, and there is no way to correctly map data to a specific phoneme if the boundaries are simply an arbitrary calculation over an entire speech signal.
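The fixed-window segmentation described above can be sketched as follows. This is a minimal illustration, not the dissertation's code: the 10 ms hop between frames and the function name are assumptions, and the cepstral analysis that would follow on each frame is omitted.

```python
def frame_signal(samples, sample_rate, frame_ms=25.0, hop_ms=10.0):
    """Split a signal into fixed-length, overlapping frames.

    frame_ms follows the 25 ms lower bound mentioned in the text;
    hop_ms (the step between frame starts) is an assumed value.
    Trailing samples that do not fill a whole frame are dropped.
    """
    frame_len = int(round(sample_rate * frame_ms / 1000))
    hop_len = int(round(sample_rate * hop_ms / 1000))
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop_len):
        frames.append(samples[start:start + frame_len])
    return frames


if __name__ == "__main__":
    one_second = [0.0] * 16000  # one second of silence at 16 kHz
    frames = frame_signal(one_second, 16000)
    print(len(frames), "frames of", len(frames[0]), "samples")
```

Because the frame boundaries are set by the clock rather than by the speech itself, a frame can straddle two phonemes or fall on silence or noise, which is the misalignment the text describes.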
Therefore, even though the statistical model that is built over the cepstral vectors represents the vocal range of an individual, the model is in fact gathering data that has nothing to do with the individual, and the probabilistic peaks that will be used to infer an identity are misaligned. When the size of the speaker population starts to grow, these misalignments will cause erroneous detections.
In this dissertation we design and develop an enhanced voice recognition system, with the task of optimizing performance via a new recognition algorithm that focuses on perceived voiced speech units rather than the entire acoustic data train.
An Accurate and Robust Gender Identification Algorithm
INTERSPEECH 2011, 12th Annual Conference of the International Speech Communication Association, Florence, Italy, August 27-31.
We describe a robust, unsupervised method of automatic gender identification from speech. We first design a baseline gender classifier based on MFCC features, and add a second classifier that uses context-dependent but text-independent pitch features. The results of these classifiers are then examined for disagreements in gender classification. Any disagreements are resolved by the use of a novel pitch-shifting mechanism applied to the utterances. We show how the acoustic context classifier provides very good gender identification results, and how these are further enhanced by the pitch-shifting process. Furthermore, this enhancement is preserved across a set of different corpora.
Index Terms: gender identification, speaker recognition, pitch

