Areas of Application

Phone-based speech dialog systems
Speech control
Text generation
Translation technology
Information retrieval
Internet search agents with natural language interface
Information extraction
Question-answer systems
Automatic text summary
Spell checking


Speech-to-text technology is used in dictation systems. A dictation system converts spoken language input (captured via microphone) into the corresponding written text and displays that text on a computer screen.

The spoken language input is analyzed by a speech recognition system. A speech recognition system recognizes a spoken input (utterance) by converting the utterance's sound wave into a corresponding speech signal and analyzing that signal in real time. During the analysis, signal-specific information is extracted from the speech signal. Using this information together with calculations from probability theory, the system tries to assign matching words from a lexicon to the speech signal. For this purpose, the words in the lexicon are stored with a transcription in a specific transcription alphabet.

During the development of speech recognition systems, it must be taken into account that identical utterances by different speakers produce speech signals that are not completely identical, because humans have different voices. Reasons for this are:
- male vs. female voices,
- accent due to regional origin,
- age-dependent voice.
Nevertheless, a speech recognition system must always arrive at the same result for different speakers.



With the help of text-to-speech systems, written texts (stored on a computer) can be read out automatically. The text-to-speech system computes the acoustic speech output and plays it through a loudspeaker. Text-to-speech systems allow some parameters to be customized, such as the language (English, German, etc.) or a male vs. female voice.

Text-to-speech systems read out their texts with artificial "voices". When text is read out, correct pronunciation and intonation matter. A single spoken word consists of individual sounds, and each sound consists of individual sound components. A text-to-speech system contains all necessary sound components. Each sound component is a piece of speech signal. To generate a sound or an acoustic word, such sound components are connected by appending the individual speech signal pieces. Which sound components have to be selected also depends on the adjoining sound components.
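
The concatenation step described above can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: the unit inventory UNITS, its keys and the numeric "signal pieces" are invented for the example. Real systems store recorded waveform segments and smooth the joins between them.

```python
# Toy sketch of concatenative synthesis. All names and data are hypothetical.
# Each unit is keyed by (sound, following sound), so the piece chosen for a
# sound depends on its neighbor, as described in the text.
UNITS = {
    ("h", "i"): [0.1, 0.2],
    ("i", None): [0.3, 0.1],
    ("h", "a"): [0.1, 0.4],
    ("a", None): [0.5, 0.2],
}


def synthesize(sounds):
    """Append the stored signal pieces for a sequence of sounds."""
    signal = []
    for pos, sound in enumerate(sounds):
        following = sounds[pos + 1] if pos + 1 < len(sounds) else None
        signal.extend(UNITS[(sound, following)])
    return signal
```

Calling synthesize(["h", "i"]) and synthesize(["h", "a"]) selects different pieces for the sound "h", because the following sound differs.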


Phone-based speech dialog systems:

Phone-based speech dialog systems enable a dialog between a human and a computer. The purpose of such systems is to relieve call center staff of answering routine questions, or to offer fully automatic information platforms (voice portals), which normally only have to answer standardized questions.

For example, a phone-based flight schedule information system lets callers query flights. After the caller has provided data as natural language input, the speech dialog system analyzes the input (sentence) and checks which data have been provided and which data are still missing before available flights can be output. If the sentence already contains all relevant data, the dialog system can present an answer immediately. If some data are still missing, the speech dialog system must ask for them in separate dialog steps.

Two examples illustrate how such a dialog works for an automatic flight schedule information system. For such a system, the relevant data are the arrival airport, arrival day, arrival time and departure airport. (The departure airport is queried separately.) In the first example, the user input already contains all relevant arrival data, so the information system can present an answer as soon as the departure airport is known.

System>  When do you want to be at the arrival airport?
User>       I want to be in Paris next Saturday at 10 a.m.
System>  Please say the departure location!
User>       Frankfurt.
System>  I can offer you the following flights:
                  Flight ... from Frankfurt to Paris at [date and time].
                  Flight ... from Frankfurt to Paris at [date and time].

The next example shows the dialog's behavior when the user input does not yet contain all relevant data.
In this example, the arrival time is still missing.

System>  When do you want to be at the arrival airport?
User>       I want to be in Paris next Saturday.
System>  At what time do you want to be in Paris?
User>       At 10 a.m.
System>  Please say the departure location!
User>       Frankfurt.
System>  I can offer you the following flights:
                  Flight ... from Frankfurt to Paris at [date and time].
                  Flight ... from Frankfurt to Paris at [date and time].

Now all data are available because the missing arrival time was provided in an additional dialog step.
The dialog system is therefore able to output the corresponding flights.
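
The logic behind both dialogs (collect the data, then ask only for what is still missing) can be sketched as follows. This is a minimal illustration, not the method of any particular system: the slot names, the city and weekday lists and the simple keyword matching are assumptions made for the example and stand in for real speech recognition and language processing.

```python
import re

# Hypothetical slot names for the four relevant data items.
SLOTS = ["arrival_airport", "arrival_day", "arrival_time", "departure_airport"]
DAYS = ("monday", "tuesday", "wednesday", "thursday", "friday",
        "saturday", "sunday")
CITIES = ("Paris", "Frankfurt")  # tiny made-up vocabulary


def extract(utterance, asked_slot=None):
    """Return the slot values found in one user utterance."""
    found = {}
    text = utterance.lower()
    for day in DAYS:
        if day in text:
            found["arrival_day"] = day
    time_match = re.search(r"\b(\d{1,2}) ?(a\.m\.|p\.m\.)", text)
    if time_match:
        found["arrival_time"] = f"{time_match.group(1)} {time_match.group(2)}"
    for city in CITIES:
        if city.lower() in text:
            # A city answers the airport slot the system asked for;
            # without a question it is taken as the arrival airport.
            if asked_slot in ("arrival_airport", "departure_airport"):
                found[asked_slot] = city
            else:
                found["arrival_airport"] = city
    return found


def next_missing(state):
    """Return the first slot that still has no value, or None."""
    for slot in SLOTS:
        if slot not in state:
            return slot
    return None
```

Starting with an empty state, the state is updated with the result of extract() after every user turn. As long as next_missing() returns a slot name, the system asks for that slot; once it returns None, the flights can be looked up.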

In a phone-based speech dialog system, the caller's spoken utterances are analyzed in real time by a speech recognition and language processing system. The language processing system computes a formal meaning of the utterance so that the speech dialog system can react accordingly. The dialog system can only understand utterances that refer to its application domain (in the example: flight schedule information). Based on the speech input, the dialog system computes an answer, which is formulated by a text generation system and read out by a text-to-speech system.


Speech control:

Speech control of electronic devices means that the functions of an electronic device (a
cell phone, for example) are not operated by pressing buttons and selecting menu items on the screen
but by means of voice commands. Thus, voice commands replace manual (haptic) operation.

Speech control exists, for example, for:
- cars (for example, to operate the media player by voice while driving),
- smartphones, and
- the "intelligent house" (in which actions in and around the house can be triggered by voice).

Speech control typically uses a speech dialog system as its human-machine interface (HMI).
The speech dialog system responds to a voice command with a spoken reaction.
This reaction can be an acoustic confirmation that an action is being executed (for
example, that a desired contact is now being dialed) or a request to speak the next action step.

A voice command can be a word, a sequence of words or a sentence:

- A single-word input can be, for example, a confirmation with "yes" or a rejection with "no".

- Sequences of words as speech input make it possible to operate more complex electronic devices by voice.
   Each sequence formulates one action step of the operation.
   Typically, voice commands have to be learned beforehand (except for those that are intuitively clear).
   Sometimes two (or more) action steps have to be performed by voice before the desired technical
   action can be executed.

   An example of cell phone speech control would be:
   User> "Call contact".
   System> "Please say the contact name".
   User> "Peter Fisher".
   System> "Calling Peter Fisher"   => as a result, the call is executed afterwards.

   The next example combines the two user dialog steps into a single one.
   User> "Call contact Peter Fisher".
   System> "Calling Peter Fisher"   => as a result, the call is executed afterwards.

- Sentences as speech input allow actions to be formulated relatively freely. In principle, it should be
   possible to put all the required information into one single utterance instead of
   using several dialog steps. This form of interaction is also called "natural language".

   An example of natural language input in smartphone speech control would be:
   User> "Please call Peter Fisher from the address book".
   System> "Calling Peter Fisher"   => as a result, the call is executed afterwards.
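
The three input styles above can be illustrated with a small command interpreter that works on already recognized text. This is a hedged sketch: the patterns, the fallback reply and the function name interpret are assumptions for the example, and a real system would use a proper grammar or language model instead of two regular expressions.

```python
import re

# Hypothetical command patterns for the phone examples above.
CALL_PATTERNS = [
    re.compile(r"^call contact (?P<name>.+)$", re.IGNORECASE),
    re.compile(r"^(?:please )?call (?P<name>.+?)(?: from the address book)?$",
               re.IGNORECASE),
]


def interpret(utterance):
    """Map recognized text to a pair (system reply, action or None)."""
    text = utterance.strip().rstrip(".")
    if text.lower() == "call contact":
        # First step of the two-step dialog: the name is still missing.
        return "Please say the contact name", None
    for pattern in CALL_PATTERNS:
        match = pattern.match(text)
        if match:
            name = match.group("name")
            return f"Calling {name}", ("call", name)
    return "Sorry, I did not understand", None
```

Both "Call contact Peter Fisher" and "Please call Peter Fisher from the address book" lead to the same action ("call", "Peter Fisher"), while "Call contact" alone triggers the request for the next dialog step.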

To execute a voice command, a so-called speech recognizer is necessary.
The speech recognizer's task is to identify the words in the acoustic utterance and to extract them
so that the words, or rather the utterance as a whole, can be processed further.

There are two approaches to the technical realization:
a) Speech control runs locally, that is, it is integrated in the electronic device.
     A voice command is processed in the device, converted into
     an executable action and then executed.
b) Speech control runs on a server. The voice command is sent to a server
     via mobile communication, processed there, converted into an executable action and then
     executed on the device.


Text generation:

Text generation systems automatically generate a completely new text. Such a system is usually based on a non-linguistic data source such as a table. Text generation systems are normally developed for specific application domains, such as the automatic generation of weather forecasts.

A text generation system has a text planning component and a sentence planning component, as well as a knowledge base that contains domain-specific knowledge and into which the data from the data source are loaded. For this purpose, the knowledge base provides so-called application concepts, to which the corresponding data from the data source are assigned. Application concepts for the domain "weather forecast" could be wind speed, temperature and air pressure.

The text planning component's task is to describe what the text should express (the content) and how the text should express it (the means). To do so, it accesses the knowledge base. The sentence planning component first forms content-based sentence descriptions and then the corresponding sentences (with the help of linguistic knowledge and a domain-specific lexicon). The motivation for developing text generation systems is that information in text form is usually easier to understand than information in a formal structure.
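
A strongly simplified sketch of this division of labor for the weather domain is shown below. The function names, the wind threshold and the sentence templates are assumptions chosen for illustration. Real sentence planning relies on linguistic knowledge rather than fixed templates.

```python
# Hypothetical templates standing in for a domain-specific lexicon.
TEMPLATES = {
    "temperature": "The temperature will reach {} degrees Celsius.",
    "wind_speed": "Expect strong wind of up to {} km/h.",
}


def plan_text(data):
    """Text planning: decide which application concepts to mention."""
    messages = [("temperature", data["temperature"])]
    if data["wind_speed"] >= 30:  # arbitrary threshold, assumed for the sketch
        messages.append(("wind_speed", data["wind_speed"]))
    return messages


def realize(messages):
    """Sentence planning: turn each planned message into a sentence."""
    return " ".join(TEMPLATES[concept].format(value)
                    for concept, value in messages)
```

A table row such as {"temperature": 21, "wind_speed": 45, "air_pressure": 1013} is first reduced to the messages worth reporting and then worded sentence by sentence. Air pressure is loaded but not mentioned, which shows that content selection is a separate decision from wording.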


Translation technology:

There are two approaches in the area of translation technology:
- machine translation and
- computer-aided translation.

Machine translation systems are systems that can independently translate a given text from one language into another.

There are two approaches to machine translation:
- The rule-based approach and
- the statistical approach.

Machine translation systems based on the rule-based approach (such as the transfer approach) analyze the sentences of the source text one after another by looking at their sentence structure. For the structures of each source sentence, corresponding target-language structures are computed, and from those the corresponding target sentences are generated. Machine translation systems based on the statistical approach select the most probable translation for each source sentence. To make this possible, statistical machine translation systems must be trained with very large collections of electronic reference translations. After training, a statistical translation system is in principle able to translate new sentences.

Machine translation systems are often confronted with ambiguous words. An example of an ambiguous word is the German word "Bank", which can be translated into English as "bench" or as "bank" (depending on context). With the help of certain procedures, the right reading can be determined from the context. The longer and more complex the sentences to be translated, the harder the translation process. Therefore pre-editing or post-editing is often necessary. In application domains that get by with simple sentences and a restricted vocabulary (such as technical documentation), so-called "controlled language" can be used. This is intended to ensure that the text can be translated automatically without problems.

In contrast to machine translation, computer-aided translation deals with tools that support a human translator's work. So-called translation memory systems belong to this category. A translation memory system contains finished, correct translations in the form of bilingual sentence pairs. When a human translator wants to translate a sentence, he or she can look up that sentence in the translation memory system. If the translation memory finds a translation for the sentence, that translation is presented. The translation memory does not have to contain exactly the same sentence; it is sufficient if a similar sentence is found. The degree of similarity is shown, for example, as a percentage.
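
The similarity lookup can be sketched with Python's standard difflib module. The memory contents, the function name lookup and the 70 percent threshold are assumptions for this example. Commercial translation memory systems use their own matching methods, so the character-based ratio used here is only a stand-in.

```python
from difflib import SequenceMatcher

# Hypothetical translation memory: source sentence -> stored translation.
MEMORY = {
    "Press the red button.": "Drücken Sie den roten Knopf.",
    "Close the cover.": "Schließen Sie die Abdeckung.",
}


def similarity(a, b):
    """Character-based similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a, b).ratio()


def lookup(sentence, threshold=0.7):
    """Return (stored source, translation, similarity in percent) or None."""
    best = max(MEMORY, key=lambda source: similarity(sentence, source))
    score = similarity(sentence, best)
    if score < threshold:
        return None
    return best, MEMORY[best], round(score * 100)
```

An exact match is reported with 100 percent. For a sentence such as "Press the green button.", the stored pair for "Press the red button." is returned with a lower percentage, and the translator adapts the stored translation by hand.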

Translation agencies use translation memory systems in order to translate a given sentence identically every time (quality assurance).


Information retrieval:

The purpose of information retrieval systems is to find documents (e.g. on the internet) that match the criteria of an entered search query consisting of search terms. These search terms are compared with the indices of the available documents. Indices are terms that describe a document as exactly as possible. If the terms in the search query are also contained in the document (as document indices), that document is selected as a "hit".
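
The comparison of search terms with document indices can be sketched as a simple set test. The document names and index terms below are invented for the example. Real retrieval systems use inverted indexes and ranking rather than this exact-match check.

```python
# Hypothetical document index: document name -> set of index terms.
INDEX = {
    "doc1": {"blue", "vehicle"},
    "doc2": {"red", "vehicle"},
    "doc3": {"blue", "sky"},
}


def search(query):
    """Return the documents whose index terms contain every query term."""
    terms = set(query.lower().split())
    return sorted(doc for doc, index in INDEX.items() if terms <= index)
```

With this data, search("blue vehicle") returns only doc1, while search("vehicle") returns doc1 and doc2.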

In order to also find documents that do not contain the exact search terms but correct alternative terms, information retrieval systems can be equipped with "linguistic intelligence".

The following example shows how a retrieval system with linguistic intelligence works: For the German search term "blaues Fahrzeug" (blue vehicle), such a retrieval system can find documents with the document indices {"blau", "Fahrzeug"} as well as {"blaues", "Fahrzeug"}, because it can analyze the entered search terms morpho-syntactically. That is, the word form "blau" can be derived automatically from the word form "blaues".

A further option for retrieval systems is the use of semantic methods. That is, for an entered search term, documents with semantically equivalent indices can be found. To make semantic search possible, so-called thesauri are necessary. A thesaurus stores alternative words for a given word. Thus it is possible, for example, to find documents containing only the indices "car" or "Toyota" when the search query contains only the term "vehicle". A combination with morpho-syntactic analysis is possible.
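
Both kinds of linguistic intelligence can be combined in one small sketch. The two lookup tables are hypothetical stand-ins: BASE_FORMS replaces a real morphological analysis and THESAURUS replaces a real thesaurus. Both contain only the entries needed for the German example.

```python
# Hypothetical stand-ins for morphological analysis and a thesaurus.
BASE_FORMS = {"blaues": "blau", "blauer": "blau", "fahrzeuge": "fahrzeug"}
THESAURUS = {"fahrzeug": {"auto", "toyota"}}


def base_form(term):
    """Reduce a word form to its base form (lower-cased)."""
    term = term.lower()
    return BASE_FORMS.get(term, term)


def expand(term):
    """Return the base form of a term plus its thesaurus alternatives."""
    base = base_form(term)
    return {base} | THESAURUS.get(base, set())


def matches(query, index_terms):
    """True if every query term, or one of its alternatives, is indexed."""
    normalized = {base_form(term) for term in index_terms}
    return all(expand(term) & normalized for term in query.split())
```

The query "blaues Fahrzeug" then matches a document indexed with {"blau", "Fahrzeug"} through the base forms, and a document indexed with {"blaues", "Toyota"} through the thesaurus entry.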


Internet search agents with natural language interface:

An internet search agent is a computer program that independently searches the internet for given information. In doing so, it uses search engines and internet databases. The agent analyzes the information found, processes it, presents it, and stores it on the computer or sends it to the user by e-mail. The information to be searched for can be given to the agent in the form of an (interrogative) sentence via a natural language interface.

The so-called Semantic Web ("Web 3.0") is well suited to the work of a search agent with a natural language interface. In the Semantic Web, information is described (semantically) with special formal description languages. The purpose of the Semantic Web is that computer programs, and not just humans, can read and use the information contained in web sites.

Within the Semantic Web, semantic descriptions (called annotations) are added to data on the internet. In addition, logical conclusions can be drawn automatically from the annotations. This is achieved by storing the annotations and their logical relations in a so-called ontology.

A use case for such an internet-based search agent could be a system that searches for certain courses of study. A corresponding natural language query could look as follows:
> Bachelor in chemistry, maximum 100 kilometers away from Stuttgart.
In order to find matching offers, Semantic Web sites are necessary that contain information about the course of study and the geographic position, namely: course of studies, degree, university, location.

For this purpose, the relevant internet sites, for example those of a university, must be provided in Semantic Web form and thus contain annotations. The text "chemistry" would be assigned to the annotation [course of studies], "Bachelor" would be assigned to the annotation [degree], and the name and location of the university would be assigned to the annotations [university] and [location], respectively. If the search agent also accesses an ontology, it could additionally find courses of study derived from chemistry, such as biochemistry or food chemistry.
Note: The distances between cities could be computed by a separate web-based information source, to which the current location and the maximum radius are passed.
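
How annotations, an ontology and a distance source could work together for the example query is sketched below. Everything in it is hypothetical: the offers, the placeholder city names and the distances from Stuttgart are invented, and the dictionaries stand in for annotated web sites, an ontology and a web-based distance service.

```python
# Invented annotated records, as if read from Semantic Web sites.
OFFERS = [
    {"course": "chemistry", "degree": "Bachelor",
     "university": "University A", "location": "City A"},
    {"course": "biochemistry", "degree": "Bachelor",
     "university": "University B", "location": "City B"},
    {"course": "chemistry", "degree": "Bachelor",
     "university": "University C", "location": "City C"},
    {"course": "chemistry", "degree": "Master",
     "university": "University D", "location": "City A"},
]

# Hypothetical ontology: course of studies -> courses derived from it.
ONTOLOGY = {"chemistry": {"biochemistry", "food chemistry"}}

# Invented distances in kilometers from Stuttgart.
DISTANCE_KM = {"City A": 40, "City B": 90, "City C": 250}


def find_offers(course, degree, max_km):
    """Return universities offering the course or a course derived from it."""
    wanted = {course} | ONTOLOGY.get(course, set())
    return [offer["university"] for offer in OFFERS
            if offer["course"] in wanted
            and offer["degree"] == degree
            and DISTANCE_KM[offer["location"]] <= max_km]
```

For the query "Bachelor in chemistry, maximum 100 kilometers away from Stuttgart", find_offers("chemistry", "Bachelor", 100) returns University A and, through the ontology, University B with its biochemistry course.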

Within such a search agent with a natural language interface, each entered sentence is read in by the natural language interface and passed to a natural language processing program. This program analyzes the sentence syntactically and computes a formal meaning for it that is suitable for the Semantic Web. After this, the search process is started.


Information extraction:

Information extraction systems are domain-specific systems that can analyze texts with regard to their content. The task of such a system is to find domain-specific information in a text, to extract it and to store it in a database for further use. For this, the information extraction system must provide domain-specific concepts to which information from the text being analyzed can be assigned.

An information extraction system must analyze all sentences one after another in order to compute sentence meanings (sentence semantics) and to assign information from the underlying text to the corresponding concepts. Only on the basis of the computed sentence meanings can the system decide which sentences contain relevant information.

An example illustrates how an information extraction system works: An information extraction system for the domain "enterprise-specific information" analyzes texts that contain information about the sale of enterprises. For this, the following concepts are provided to the information extraction system:
- [sold enterprise]:
- [purchasing enterprise]:
- [purchase price]:
For a given text, the corresponding information from the text must be assigned to the concepts defined above. A fictitious text could contain the following sentence:
- "Smith Inc. was sold for 5 million euros to Miller Inc."
Based on this sentence, the information extraction system could make the following assignment:
- [sold enterprise]: "Smith Inc."
- [purchasing enterprise]: "Miller Inc."
- [purchase price]: "5,000,000 Euro"
The information extracted from the text can now be stored in a database. For this purpose, a corresponding table structure must be provided. Once the information is stored in a database, other people can access and work with the extracted data.
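
For this one sentence pattern, the assignment can be sketched with a regular expression. This is a deliberately narrow illustration: the pattern and the function name extract are assumptions, and a pattern match replaces the computation of sentence meanings that a real information extraction system would perform.

```python
import re

# Hypothetical pattern covering only sentences shaped like the example.
PATTERN = re.compile(
    r"(?P<sold>.+?) was sold for (?P<amount>\d+) million euros "
    r"to (?P<buyer>.+)$"
)


def extract(sentence):
    """Assign parts of the sentence to the three concepts, or return None."""
    match = PATTERN.match(sentence)
    if match is None:
        return None  # sentence contains no relevant information
    price = int(match.group("amount")) * 1_000_000
    return {
        "sold enterprise": match.group("sold"),
        "purchasing enterprise": match.group("buyer"),
        "purchase price": f"{price:,} Euro",
    }
```

Applied to the example sentence, the function yields the three assignments listed above. For any sentence that does not fit the pattern it returns None, which corresponds to the decision that the sentence is not relevant.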


Question-answer systems:

Question-answer systems are systems that generate and output an answer text for a natural language question given as input. Input normally occurs in written form (keyboard). The answer text can be presented in written form (computer screen) or acoustically (loudspeaker).

The answer text should be detailed and informative. If the answer text is displayed on a computer screen, it can be combined with additional multimedia elements such as graphics or sound effects. Normally, question-answer systems are restricted to a given application domain, such as medical or technical knowledge.

There are two approaches for question-answer systems:
- Generating a new answer text,
- Answer extraction.

In the first case, the question-answer system consists of an analysis part and a generation part. A given interrogative sentence is passed to a language processing system, which analyzes it and compares the analysis result with a knowledge base. The knowledge base contains domain knowledge (e.g. technical knowledge). For the knowledge entries that fit the interrogative sentence, a corresponding answer text can be generated by a text generation system. In the second case, a fitting text segment is sought in an underlying text as the answer to a given interrogative sentence. For this, information extraction methods are applied to the underlying text. That is, concepts are provided that can conceptually describe sentences occurring in the text. If a natural language question fits certain concepts and the text information assigned to them, the corresponding text segment is presented as the answer text.


Automatic text summary:

Automatic text summary systems can automatically build a summary of a given text. Such systems can be used, for example, for searching the internet (and intranets). Because people do not want to read every retrieved document in full, it is helpful to generate an automatic summary for each retrieved document. This way, each user can more easily get a general idea of the text. Users of such systems should be able to adjust the degree of summarization (i.e., from a very short summary to a longer one).

There are two approaches for automatic summary:
- Knowledge-based approach and
- statistical approach.

One way of implementing the knowledge-based approach is to separate the important parts of a text segment from the unimportant ones. For this, the central assertion of each text segment is located. The remaining assertions of the segment are treated as merely supplementary information. To be meaningful, a summary produced this way must contain at least all central assertions of the original text, so that the content of the original text is preserved. Depending on the chosen degree of summarization, the summary can also contain additional assertions. A second way of implementing the knowledge-based approach is to use an information extraction system. This system extracts the relevant information from the underlying text and assigns it to domain-specific concepts. Based on the concepts and the assigned information, a text generation system can build a text summary.

In the statistical approach, the sentences selected for the summary are those that contain a high number of "important" words. Which words are important must be determined beforehand, during development, by analyzing very large text corpora. The more frequently a word occurs in such a corpus, the more important that word is. If a text summary system is built for domain-specific texts, the text corpora used must also contain corresponding domain-specific texts. In this way, the importance of words reflects the respective domain. (The reason is that a word can be important in one domain but less important in another.)
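
The sentence selection of the statistical approach can be sketched as follows. The function name, the simple sentence splitting and the example counts are assumptions. The word counts passed in are taken to come from a domain corpus with very common function words already removed, because otherwise words like "the" would dominate the scores.

```python
import re


def summarize(text, corpus_counts, n_sentences=1):
    """Keep the n highest-scoring sentences, in their original order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())

    def score(sentence):
        # Importance of a sentence: summed corpus frequency of its words.
        words = re.findall(r"\w+", sentence.lower())
        return sum(corpus_counts.get(word, 0) for word in words)

    selected = sorted(sentences, key=score, reverse=True)[:n_sentences]
    return " ".join(s for s in sentences if s in selected)
```

The parameter n_sentences plays the role of the adjustable degree of summarization. With weather-domain counts such as {"storm": 6, "rain": 5, "wind": 4}, the text "The storm brings rain and wind. We had lunch. It was nice." is reduced to its first sentence.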


Spell checking:

Spell checkers are programs that can find possible spelling mistakes, highlight them and correct them independently. Spell checkers are frequently an integral part of word processing programs, but they can also be separate applications.

A spell checker has a system lexicon that contains all common words of a language (e.g. English). A simple spell checker runs through a text and compares the words occurring in it with the lexicon entries. If the spell checker finds a "word" (a sequence of letters) that is not contained in the system lexicon, it tries to find a similar word in the lexicon. Which words count as similar is determined by suitable similarity measures. However, proceeding purely word by word is not always sufficient. More advanced spell checkers therefore also take context into account; i.e., they also look at the words surrounding a detected spelling mistake.
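
A simple word-by-word spell checker of this kind can be sketched with Python's standard difflib module. The tiny lexicon, the function name check and the similarity cutoff of 0.8 are assumptions for the example. The sketch does not look at context, so it covers only the simple variant described above.

```python
from difflib import get_close_matches

# Hypothetical miniature system lexicon.
LEXICON = ["environment", "similar", "spelling", "mistake", "checker"]


def check(text):
    """Return a dict mapping each unknown word to suggested corrections."""
    suggestions = {}
    for word in text.lower().split():
        word = word.strip(".,;:!?\"'")
        if word and word not in LEXICON:
            # Similarity is measured by difflib's character-based ratio.
            suggestions[word] = get_close_matches(word, LEXICON,
                                                  n=3, cutoff=0.8)
    return suggestions
```

For the input "similiar enviroment", the checker flags both words and suggests "similar" and "environment". A word that is neither in the lexicon nor close to any entry is flagged with an empty suggestion list.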