Windows speech recognition

An illustrative real-life case of the principle "if a person has control over any function, it can also be used to control the computer": Windows Speech Recognition


The IT field has made significant progress over the past decades, and brain-controlled computer interfaces are now under development and may one day become an everyday reality. In the meantime, we can talk about something that already exists: controlling a computer with a basic human function, the voice. Not everyone knows that the Windows operating system has a built-in speech recognition feature. With it, a person can speak to Windows to issue commands, open applications, dictate text and perform other tasks. The rest of this blog post looks at the feature in more detail.

 

 

Windows Speech Recognition (WSR) is a speech recognition component developed by Microsoft for Windows Vista. It lets users control the desktop user interface with voice commands, dictate text in electronic documents and e-mail, browse websites, perform keyboard shortcuts and control the mouse cursor. It supports custom macros to perform additional or auxiliary tasks. WSR processes speech locally and does not rely on cloud computing for accuracy, dictation, or recognition. Instead, it adapts to the user based on contexts, grammars, speech patterns, training sessions, and vocabularies. It also provides a personal dictionary that lets users include or exclude words or expressions from dictation and record pronunciations to improve recognition accuracy.

 

WSR allows a user to control applications and the Windows desktop user interface through voice commands. Users can dictate text within documents, email, and forms, control the operating system user interface, perform keyboard shortcuts and move the mouse cursor. Most of the applications built into Windows Vista can be controlled by voice; third-party applications must support the Text Services Framework for dictation. The supported languages are English (U.S.), English (U.K.), French, German, Japanese, Mandarin Chinese and Spanish.

 

 

When started for the first time, WSR presents a microphone setup wizard and an optional interactive step-by-step tutorial that teaches the basic commands while adapting the recognizer to the user's voice characteristics. It is important to configure the microphone properly at this stage. The accuracy of the recognizer increases through regular use, as it adapts to the user's contexts, grammar, patterns, and vocabularies. Custom language models for the specific contexts, phonetics, and terminologies of users in particular occupational fields, such as legal, medical and IT, are also supported.

 

The most commonly used commands for controlling the computer by voice are listed below. Square brackets mark a placeholder that the user replaces with an actual word, number or name.

 

Dictation commands: "New line"; "New paragraph"; "Tab"; "Literal [word]"; "Numeral [number]"; "Go to [word]"; "Go after [word]"; "No space"; "Go to start of sentence"; "Go to end of sentence"; "Go to start of paragraph"; "Go to end of paragraph"; "Go to start of document"; "Go to end of document"; "Go to [field name]" (e.g., go to address, cc, or subject).

 

Special characters, such as a comma, are dictated by speaking the name of the character.
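To make the idea of dictation commands more concrete, here is a minimal Python sketch of how a stream of recognized phrases could be turned into text. It is only an illustration and not how WSR itself is implemented: the function name render_dictation, the SPOKEN_TO_TEXT table and the spoken names "period" and "question mark" are assumptions made for this example.

```python
# Hypothetical mapping from spoken phrases to the text they insert.
# Illustration only; this is not WSR's actual command table.
SPOKEN_TO_TEXT = {
    "new line": "\n",
    "new paragraph": "\n\n",
    "tab": "\t",
    "comma": ",",
    "period": ".",
    "question mark": "?",
}

# Punctuation that attaches to the previous word without a space.
ATTACHING = {",", ".", "?"}


def render_dictation(phrases):
    """Turn a list of recognized phrases into plain text."""
    out = ""
    for phrase in phrases:
        # Phrases that are not commands are inserted as ordinary words.
        text = SPOKEN_TO_TEXT.get(phrase.lower(), phrase)
        is_whitespace = text.strip() == ""
        # Add a separating space only before an ordinary word that
        # follows non-whitespace text.
        if out and not out[-1].isspace() and not is_whitespace and text not in ATTACHING:
            out += " "
        out += text
    return out


if __name__ == "__main__":
    spoken = ["Hello", "comma", "world", "period", "new line", "Bye"]
    print(render_dictation(spoken))
```

In this sketch the speaker says "comma" and "period" out loud, and the function replaces them with the characters themselves, which mirrors the rule described above.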

 

Navigation commands:

Keyboard shortcuts: "Press [keyboard key]"; "Press Shift plus a"; "Press capital b".

Keys that can be pressed without first giving the press command include: Backspace, Delete, End, Enter, Home, Page Down, Page Up, and Tab.
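The keyboard rules above can be sketched as a small parser. This is a hedged Python illustration, not WSR's real logic: the function parse_key_command and the BARE_KEYS set are names invented for this example, and treating "capital b" as Shift plus the letter is an assumption.

```python
# Keys the post lists as usable without saying "press" first.
BARE_KEYS = {"backspace", "delete", "end", "enter", "home",
             "page down", "page up", "tab"}


def parse_key_command(utterance):
    """Return a list of key names, or None if this is not a key command."""
    words = utterance.strip().rstrip(".").lower()
    if words in BARE_KEYS:
        return [words]
    if not words.startswith("press "):
        return None
    rest = words[len("press "):]
    if rest.startswith("capital "):
        # Assumption: "Press capital b" means Shift plus the letter.
        return ["shift", rest[len("capital "):]]
    # "Press Shift plus a" becomes ["shift", "a"].
    return [part.strip() for part in rest.split(" plus ")]


if __name__ == "__main__":
    for spoken in ["Press Shift plus a", "Press capital b", "Page Down", "Open Notepad"]:
        print(spoken, "->", parse_key_command(spoken))
```

A phrase such as "Open Notepad" returns None here because it is neither a bare key nor a "press" command, so a real system would hand it to a different command handler.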

 

Mouse commands: "Click"; "Click that"; "Double-click"; "Double-click that"; "Mark"; "Mark that"; "Right-click"; "Right-click that"; "MouseGrid".
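The "MouseGrid" command deserves a short illustration. As far as I know, it overlays a numbered 3 by 3 grid on the screen, and saying a number zooms into that cell with a new, smaller grid until the pointer is where the user wants it. The Python sketch below shows that narrowing idea only; the functions narrow and target_point are hypothetical, and the numbering (left to right, top to bottom) is an assumption of this example.

```python
def narrow(region, cell):
    """Return the sub-rectangle for cell 1..9 of a 3x3 grid over region.

    region is (left, top, width, height). Cells are assumed to be
    numbered left to right, top to bottom.
    """
    if not 1 <= cell <= 9:
        raise ValueError("cell must be between 1 and 9")
    left, top, width, height = region
    row, col = divmod(cell - 1, 3)
    return (left + col * width / 3, top + row * height / 3,
            width / 3, height / 3)


def target_point(screen, cells):
    """Apply each spoken cell number in turn and return the final center."""
    region = screen
    for cell in cells:
        region = narrow(region, cell)
    left, top, width, height = region
    return (left + width / 2, top + height / 2)


if __name__ == "__main__":
    screen = (0, 0, 900, 900)
    # Saying "1" and then "9" narrows to the top-left cell,
    # then to the bottom-right cell inside it.
    print(target_point(screen, [1, 9]))
```

Each spoken number shrinks the region to one ninth of its area, so a few numbers are enough to reach any point on the screen.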

 

Window management commands: "Close (alternatively maximize, minimize or restore) window"; "Close that"; "Close [name of open application]"; "Switch applications"; "Switch to [name of open application]"; "Scroll [direction]"; "Scroll [direction] in [number of pages]"; "Show desktop"; "Show Numbers".

 

Speech recognition commands: "Start listening"; "Stop listening"; "Show speech options"; "Open speech dictionary"; "Move speech recognition"; "Minimize speech recognition"; "Restore speech recognition".

 

In English, the applicable commands can be shown by saying "What can I say?" Users can also ask the recognizer about tasks in Windows by saying "How do I [task name]" (e.g., "How do I install a printer?"), which opens the related help documentation.

 

 

Resources:

https://en.wikipedia.org/wiki/Windows_Speech_Recognition

