TED-LIUM: an Automatic Speech Recognition dedicated corpus
URL for downloading the corpus: http://www-lium.univ-lemans.fr/TED-LIUM
This paper presents the corpus developed by the LIUM for Automatic Speech Recognition (ASR), based on the TED Talks. This corpus was built during the IWSLT 2011 Evaluation Campaign, and is composed of 118 hours of speech with its accompanying automatically aligned transcripts. We describe the content of the corpus, how the data was collected and processed, how it will be publicly available and how we built an ASR system using this data leading to a WER score of 17.4%. The official results we obtained at the IWSLT 2011 evaluation campaign are also discussed.
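The 17.4% figure above is a word error rate (WER). As a hedged illustration (not LIUM's scoring tool), WER is commonly computed as the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. The minimal Python sketch below assumes whitespace tokenization and no text normalization; the function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N, computed as a word-level Levenshtein distance.

    Assumes plain whitespace tokenization (no case or punctuation normalization).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the first i-1 reference words
    # and the first j hypothesis words (rolling-row dynamic programming).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion of a reference word
                          curr[j - 1] + 1,     # insertion of a hypothesis word
                          prev[j - 1] + cost)  # substitution or exact match
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

For example, the reference "the cat sat on the mat" against the hypothesis "the cat sit on mat" contains one substitution and one deletion, giving a WER of 2/6, about 33.3%.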
Iban Speech Recognition using Hidden Markov Model
In Proceedings of the Second Malaysian Joint Conference on Artificial Intelligence (MJCAI), Kuching, Malaysia, 2010.
Vietnamese Language Recognition with Sphinx4
by Khai Tran
Research into speech technology began as early as 1936 at Bell Labs.
However, Bell Labs abandoned the project at that time because it had taken the wrong path:
artificial intelligence is the key to success. There was no further progress until the
1970s, when Lenny Baum of Princeton University invented the Hidden Markov Model (HMM),
which provided a statistics-based approach to generating text from speech. From that point
until now, HMMs, helped by the remarkable growth in computer processing power, have been
widely used by companies and organizations to develop their own speech recognition systems.
Currently, the USA is the leading country in Automatic Speech Recognition (ASR) research.
American ASR systems are known to recognize natural speech under ideal conditions with
accuracy above 90%. ASR systems have also been developed for other languages, for example
Mexican, Australian, Spanish, French …. However, because of language particularities and
business competition, each country or company has its own recognition technology and
normally keeps it confidential within the country or organization.
In Viet Nam, ASR is quite new, and research groups all over the country are trying to
develop ASR for Vietnamese. However, they work independently with their own engines and
have not yet released any practical product. Therefore, to contribute to the research
community, we are trying to develop an open and practical Vietnamese Speech Recognition
Technology (VSRT).
The purpose of the thesis is to propose a practical method for recognizing Vietnamese
words and to put it into a real software application able to recognize digits (0-10).
The first two chapters of the thesis serve as preparatory material, gathering, classifying
and analyzing the sources of knowledge about ASR technology and the Vietnamese language
needed to build a new Vietnamese digit recognition system. The proposal, technical details
and evaluation results are presented in Chapters 3 and 4.
LIUM’s systems for the IWSLT 2011 Speech Translation Tasks
This paper describes the three systems developed by the LIUM for the IWSLT 2011 evaluation campaign. We participated in three of the proposed tasks, namely the Automatic Speech Recognition task (ASR), the ASR system combination task (ASR_SC) and the Spoken Language Translation task (SLT), since these tasks are all related to speech translation. We present the approaches and specific features we developed for each task.
Automatic Speech Recognition for Assistive Technology Devices
by Pat Parslow
A P Harvey, R J McCrindle, K. Lundqvist and P Parslow,
Proceedings of the 8th International Conference on Disability, Virtual Reality and Associated Technologies, Valparaiso, Chile, 2010, ICDVRAT, ISBN 978 07049 15022
Speech and language therapy for aphasia following stroke (2012 update)
by Marian Brady
Brady MC, Kelly H, Enderby P. Speech and language therapy for aphasia following stroke. Cochrane Database of Systematic Reviews. (submitted)
A comparison of waveform fractal dimension techniques for voice pathology classification
Accepted for presentation at the 2012 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP’12), Japan, 25th-30th March 2012.
In this paper, an attempt is made to compare and analyze various waveform fractal dimension techniques for voice pathology classification. Three methods of estimating the fractal dimension directly from the time-domain waveform have been compared: the Katz algorithm, the Higuchi algorithm and the Hurst exponent calculated using rescaled range (R/S) analysis. Furthermore, the effects of the window size, the base waveform used and score-level fusion with Mel frequency cepstral coefficients (MFCC) have also been evaluated. The features have been extracted from two different base waveforms: the speech signal and the Teager energy operator (TEO) phase of the speech signal. Experiments have been carried out on a subset of the Massachusetts Eye and Ear Infirmary (MEEI) database, and the classifier used is a 2nd order polynomial classifier. A classification accuracy of 97.54% was achieved on fusion, an improvement of about 2% over MFCC alone.
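Two of the waveform fractal dimension estimators named above can be sketched in a few lines. This is a minimal Python illustration, not the paper's implementation: it assumes unit sample spacing on the time axis for Katz, a hypothetical default `k_max=8` for Higuchi, and a non-constant input signal; windowing, the TEO phase base waveform and the R/S Hurst estimator are not reproduced. All function names are hypothetical.

```python
import math
import random


def katz_fd(x):
    """Katz fractal dimension of a sampled waveform (unit spacing on the time axis)."""
    n = len(x) - 1  # number of steps along the curve
    if n < 2:
        raise ValueError("need at least 3 samples")
    # Total curve length L: sum of Euclidean distances between successive points.
    L = sum(math.hypot(1.0, x[i + 1] - x[i]) for i in range(n))
    # Planar extent d: largest distance from the first point to any other point.
    d = max(math.hypot(i, x[i] - x[0]) for i in range(1, n + 1))
    # D = log10(n) / (log10(n) + log10(d / L))
    return math.log10(n) / (math.log10(n) + math.log10(d / L))


def _slope(xs, ys):
    """Least-squares slope of ys against xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    den = sum((a - mx) ** 2 for a in xs)
    return num / den


def higuchi_fd(x, k_max=8):
    """Higuchi fractal dimension: slope of log L(k) versus log(1/k)."""
    N = len(x)
    if k_max < 2 or N < 2 * k_max:
        raise ValueError("need k_max >= 2 and len(x) >= 2 * k_max")
    log_inv_k, log_len = [], []
    for k in range(1, k_max + 1):
        lengths = []
        for m in range(k):  # start offset of each sub-sampled curve
            n_m = (N - 1 - m) // k  # number of increments from offset m
            total = sum(abs(x[m + i * k] - x[m + (i - 1) * k])
                        for i in range(1, n_m + 1))
            # Normalize to the full series length, then divide by k.
            lengths.append(total * (N - 1) / (n_m * k) / k)
        log_inv_k.append(math.log(1.0 / k))
        log_len.append(math.log(sum(lengths) / len(lengths)))
    return _slope(log_inv_k, log_len)


if __name__ == "__main__":
    ramp = [0.5 * i for i in range(200)]
    print(katz_fd(ramp), higuchi_fd(ramp))  # both close to 1 for a straight line
    random.seed(0)
    noise = [random.gauss(0.0, 1.0) for _ in range(2000)]
    print(higuchi_fd(noise))  # close to 2 for white noise
```

A straight line yields a dimension of 1 under both estimators, while white noise pushes the Higuchi estimate toward 2, which is the range in which such features separate smooth from irregular voice waveforms.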
Novel VTEO Based Mel Cepstral Features for Classification of Normal and Pathological Voices
in Proc. of 12th Annual Conference of ISCA (INTERSPEECH'11), Florence, Italy, Aug. 28-31, 2011, pp 509-512.
In this paper, novel Variable length Teager Energy Operator (VTEO) based Mel cepstral features, viz., VTMFCC, are proposed for automatic classification of normal and pathological voices. Experiments have been carried out using this proposed feature set, MFCC and their score-level fusion. Classification was performed using a 2nd order polynomial classifier on a subset of the MEEI database. The equal error rate (EER) on fusion was 3.2% lower than the EER of MFCC alone, which was used as the baseline. The effectiveness of the proposed feature set was also investigated under degraded conditions using the NOISEX-92 database for babble and high frequency channel noise.
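The operator and the evaluation metric mentioned above can be illustrated with a short Python sketch. It assumes the commonly cited VTEO form ψ_i[x(n)] = x²(n) − x(n−i)x(n+i), where lag i = 1 gives the classic Teager energy operator; the lag used in the paper, the Mel cepstral stage and the polynomial classifier are not reproduced here. The helper `equal_error_rate` (a hypothetical name) shows one common way to approximate the EER by sweeping a decision threshold over observed classifier scores.

```python
import math


def vteo(x, lag=1):
    """Variable-length TEO: psi_i[x(n)] = x(n)^2 - x(n - i) * x(n + i).

    lag=1 reduces to the classic Teager energy operator. Only samples with both
    neighbours available are returned, so the output has len(x) - 2*lag values.
    """
    if lag < 1:
        raise ValueError("lag must be >= 1")
    return [x[n] * x[n] - x[n - lag] * x[n + lag] for n in range(lag, len(x) - lag)]


def equal_error_rate(positive_scores, negative_scores):
    """Approximate the EER by sweeping the threshold over all observed scores.

    A sample is labelled positive (e.g. pathological) when score >= t.
    FRR = fraction of positives rejected, FAR = fraction of negatives accepted.
    Returns the mean of FAR and FRR at the threshold where they are closest.
    """
    best_gap, best_eer = float("inf"), None
    for t in sorted(set(positive_scores) | set(negative_scores)):
        frr = sum(s < t for s in positive_scores) / len(positive_scores)
        far = sum(s >= t for s in negative_scores) / len(negative_scores)
        if abs(far - frr) < best_gap:
            best_gap, best_eer = abs(far - frr), (far + frr) / 2
    return best_eer


if __name__ == "__main__":
    tone = [2.0 * math.cos(0.3 * n + 0.1) for n in range(50)]
    print(vteo(tone, lag=2)[:3])  # constant 4 * sin(0.6)^2 for this pure tone
    print(equal_error_rate([0.6, 0.4], [0.5, 0.3]))
```

For a pure tone A cos(Ωn + φ), the operator returns the constant A² sin²(iΩ), so larger lags make the output more sensitive to low-frequency content than the lag-1 TEO.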
