MIT Lincoln Laboratory: Publications: Journal Archives: 8-2

Volume 8, Number 2

Automatic Language Identification of Telephone Speech
Marc A. Zissman

Lincoln Laboratory has investigated the development of a system that can automatically identify the language of a speech utterance. To perform the task of automatic language identification, we have experimented with four approaches: Gaussian mixture model classification; single-language phone recognition followed by language modeling (PRLM); parallel PRLM, which uses multiple single-language phone recognizers, each trained in a different language; and language-dependent parallel phone recognition. These four approaches, which span a wide range of training requirements and levels of recognition complexity, were evaluated with the Oregon Graduate Institute Multi-Language Telephone Speech Corpus. Our results show that the three systems with phone recognizers achieved higher performance than the simpler Gaussian mixture classifier. The top-performing system was parallel PRLM, which performed two-language, closed-set, forced-choice classification with a 2% error rate for 45-sec utterances and a 5% error rate for 10-sec utterances. For eleven-language classification, parallel PRLM exhibited an 11% error rate for 45-sec utterances and a 21% error rate for 10-sec utterances.
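
The language-modeling stage of a PRLM-style system can be illustrated with a small sketch. This is not the implementation evaluated in the article. It assumes a phone recognizer has already converted each utterance into a string of phone symbols, and it assumes add-alpha smoothed bigram models, one per language. The toy phone inventory, the training strings, and the names train_bigram_model, score, and identify_language are all hypothetical.

```python
import math
from collections import Counter

def train_bigram_model(phone_sequences, vocab, alpha=1.0):
    """Return a function giving add-alpha smoothed bigram log-probabilities."""
    bigrams = Counter()
    contexts = Counter()
    for seq in phone_sequences:
        for prev, cur in zip(seq, seq[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1
    v = len(vocab)

    def logprob(prev, cur):
        # Smoothing keeps unseen bigrams from having zero probability.
        return math.log((bigrams[(prev, cur)] + alpha) / (contexts[prev] + alpha * v))

    return logprob

def score(logprob, seq):
    """Log-likelihood of a phone sequence under one language's bigram model."""
    return sum(logprob(prev, cur) for prev, cur in zip(seq, seq[1:]))

def identify_language(models, seq):
    """Closed-set, forced-choice decision: pick the highest-scoring language."""
    return max(models, key=lambda lang: score(models[lang], seq))

# Toy phone strings standing in for the output of a phone recognizer (assumed data).
vocab = ["a", "k", "s", "t"]
training = {
    "lang_A": [list("katakata"), list("takataka")],
    "lang_B": [list("sastsast"), list("stasstas")],
}
models = {lang: train_bigram_model(seqs, vocab) for lang, seqs in training.items()}
guess = identify_language(models, list("kataka"))
```

In a parallel PRLM arrangement, several such front ends, each trained in a different language, would each feed their own set of language models, and the resulting scores would be combined before the final decision. The sketch omits that combination step.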

Toolkit for Image Mining: User-Trainable Search Tools
Richard L. Delanoy

A computer environment called the Toolkit for Image Mining (TIM) is being developed to assist users in creating search tools for pattern-matching tasks such as content-based image retrieval. TIM provides users who have diverse interests and skill levels with the ability to create and refine search tools in an interactive process. The user simply points at examples and counterexamples of the object of interest. A learning algorithm then uses these inputs to incrementally build a model of the user's intentions, from which a search tool is constructed, and visual feedback of the search results is presented to the user. The user may then point at mistakes made by the search tool to refine performance further.

Search tools are constructed in the form of functional templates, which are generalized matched filters capable of knowledge-based image processing. The ability of this system to learn the user's intentions from experience contrasts with other existing approaches to content-based image retrieval that perform searches on the characteristics of a single input example or on a predefined and semantically constrained textual query. Currently, TIM is capable of learning spectral and textural patterns, but should be adaptable to learning shapes as well. Other possible applications of TIM include quantitative image analysis, generation of metadata for annotating images, prioritization or reduction of data in bandwidth-limited situations, and construction of components for more complex computer-vision algorithms.

Fabrication and Theory of Diamond Emitters
Michael W. Geis, Jonathan C. Twichell, and Theodore M. Lyszczarz

This article describes the fabrication of gated diamond field-emission cathodes and discusses a theory of their operation. These cathodes are made by using commercial diamond grit with added nickel and cesium salts to enhance emission. The resulting structure resembles a Spindt-type field-emission cathode with the internal metal cone replaced by a layer of diamond grit approximately one hundred nanometers thick. Emission of electrons from these cathodes occurs at the lowest reported gate voltage of any field-emission device and is unaffected by operation at pressures of over 100 Pa of nitrogen. Operation in oxygen and hydrogen sulfide at pressures of 6 X 10^-2 Pa degrades emission, but the cathodes recover once the ambient pressure is reduced to below 1 X 10^-4 Pa. The emission current noise is 2.5% rms over an eight-hour period and 1% rms over a three-millisecond period. These cathodes suffer from high gate current that varies from 0.2 to 10^4 times the emitted current. The high gate current is known to be process dependent and is not inherent to the cathodes.

The emission performance is explained by the stable negative electron affinity of diamond, which allows for injection of electrons from diamond into vacuum with little to no electric field. Cathode operation is limited by the injection of electrons into the diamond at the back metal-diamond interface, which depends on the doping of the diamond and the roughness of the interface.

Automatic Speaker Recognition Using Gaussian Mixture Speaker Models
Douglas A. Reynolds

Speech conveys several levels of information. On a primary level, speech conveys the words or message being spoken, but on a secondary level, speech also reveals information about the speaker. The Speech Systems Technology group at Lincoln Laboratory has developed and experimented with approaches for automatically recognizing the words being spoken, the language being spoken, and the topic of a conversation. In this article we present an overview of our research efforts in a fourth area—automatic speaker recognition. We base our approach on a statistical speaker-modeling technique that represents the underlying characteristic sounds of a person's voice. Using these models, we build speaker recognizers that are computationally inexpensive and capable of recognizing a speaker regardless of what is being said. Performance of the systems is evaluated for a wide range of speech quality, from clean speech to telephone speech, by using several standard speech corpora.
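
Scoring with Gaussian mixture speaker models can be sketched as follows. This is a minimal illustration, not the system described in the article. It assumes diagonal-covariance mixtures whose parameters are already trained, and it assumes the decision rule is to pick the speaker whose model gives the highest average frame log-likelihood. The two-dimensional feature vectors, the model parameters, and all function names are hypothetical.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log-density of a Gaussian with diagonal covariance at feature vector x."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_log_likelihood(x, gmm):
    """Log-likelihood of one frame under a mixture given as (weight, mean, var) tuples."""
    terms = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
    top = max(terms)
    # Log-sum-exp keeps the sum over components numerically stable.
    return top + math.log(sum(math.exp(t - top) for t in terms))

def average_log_likelihood(frames, gmm):
    """Mean per-frame log-likelihood of an utterance under one speaker model."""
    return sum(gmm_log_likelihood(x, gmm) for x in frames) / len(frames)

def identify_speaker(frames, speaker_models):
    """Closed-set identification: choose the best-scoring speaker model."""
    return max(
        speaker_models,
        key=lambda name: average_log_likelihood(frames, speaker_models[name]),
    )

# Hand-set toy models (assumed values; real models would be trained from speech).
speaker_models = {
    "speaker_1": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [2.0, 2.0], [1.0, 1.0])],
    "speaker_2": [(0.5, [5.0, 5.0], [1.0, 1.0]), (0.5, [7.0, 7.0], [1.0, 1.0])],
}
frames = [[0.1, -0.2], [1.9, 2.2], [0.4, 0.3]]
best = identify_speaker(frames, speaker_models)
```

Because each frame is scored independently of its neighbors, the score does not depend on the order of the sounds, which is one way to read the statement that a speaker can be recognized regardless of what is being said.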
