March 8, 2007

Coding "Hello, World" with Windows Speech Recognition

Richard Costall has a great article on the Windows Vista beta experience portal, entitled "Look, no hands", that showcases Windows Speech Recognition. I especially liked his demonstration of using Visual Studio 2005 via speech. In it, he points out several frustrations that I have also experienced with the program, but he also shows that many excellent features in Windows Speech Recognition can be used to sidestep some of Visual Studio's accessibility issues. In fact, he highlights the Start Typing command as his means of actually coding the obligatory "Hello, World" application in his demonstration. If you're interested in more details on using the Start Typing command, be sure to take a look at my earlier post as well.

March 5, 2007

Using a Simple File Mutex to Integrate Complex Disparate Applications

As a Windows developer, one of the most challenging (and fun) tasks I sometimes get to do is integrating multiple disparate applications so they can communicate with one another. Most of the time this involves blending a .NET application with an automation application and doing the regular COM interop one would expect. However, occasionally something more interesting comes along. One such occasion popped up recently for me, as I had to design and implement a solution to integrate our mainframe terminal emulation client with an intranet web application.

The main goal of this particular project was to provide a way to pre-fill an existing web form application with data from a mainframe screen. Normally, the user would have to read the appropriate data from the mainframe screen, manually enter the data into the web form, submit the form, and return to the mainframe terminal and key in a simple log entry indicating that the process was complete. Obviously, due to the nature of jumping back and forth between applications and doing data entry, this was a very tedious and error-prone activity.

To understand the scenario a bit better, it's critical to know how the two systems work. The existing web application, which handles most of the business logic in this process, is a traditional dynamic web application; it works pretty much like any other form-based web application you have seen before. On the other hand, our terminal emulation environment may seem a bit unusual to some of you; it is basically a Windows client program connected via our internal network to our legacy system. From a software development perspective, this emulator is more than just a typical dumb terminal; it also provides a handy automation environment in which we can produce macros for screen-scraping data, automating keyboard commands, and executing other Windows-related tasks. However, it's important to note that it is not a web interface by any means, so integrating it with web applications is not entirely straightforward.

To get these two applications to work together, I needed to devise a very simple way for them to communicate with each other. So, as with any interop activity, I had to determine the level at which both programs could communicate. Since the two environments are quite different, I had to go down to the level they have in common: the operating system and, in particular, the file system. Both environments can read and write files, so I opted to use a simple mechanism, which I will call a file mutex, to communicate between the two applications.

Here's the complete process flow that I came up with:

  1. The user clicks a button on the appropriate screen of the terminal emulation client to initiate an automation macro.
  2. The automation macro creates a new file mutex simply by creating a uniquely named temp file on the file system.
  3. The automation macro kicks off a new Internet Explorer session via a shell command, passing all the screen-scraped data and the mutex identifier (the temp file path) in the query string.
  4. The automation macro is then temporarily put to sleep, while it waits for the mutex to be released.
  5. The user completes the Internet Explorer web form and clicks submit, passing the mutex identifier yet again in a hidden form field to the associated confirmation page for this web form.
  6. Internet Explorer displays a successful confirmation screen and releases the mutex by deleting the temp file. (A custom client-side ActiveX object had to be installed in Internet Explorer at this stage to perform the file deletion since this is out of the scope of JavaScript.)
  7. The waiting automation macro sees that the mutex has been released and automatically writes the appropriate log entry on the mainframe system.

And that's all there is to it! In case you're curious about the ActiveX object in step six, it is a very simple custom VB6 automation object (only ten lines of code) that deletes the temp file passed to it as an argument. Nothing fancy there. But that's the whole point.
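For readers who want to see the shape of the flow above in code, here is a minimal sketch in Python. It is not the original implementation: the real macro ran in the terminal emulator's automation environment and the release step was a VB6 ActiveX object. The function names, the `mutex` query parameter, and the timeout are all assumptions made for illustration.

```python
import os
import tempfile
import time
import urllib.parse


def create_file_mutex() -> str:
    """Step 2: create a uniquely named temp file; its path identifies the mutex."""
    fd, path = tempfile.mkstemp(prefix="mutex_", suffix=".tmp")
    os.close(fd)  # only the file's existence matters, not its contents
    return path


def build_form_url(base_url: str, fields: dict, mutex_path: str) -> str:
    """Step 3: put the scraped fields and the mutex identifier in the query string.

    The parameter name "mutex" is hypothetical.
    """
    params = dict(fields)
    params["mutex"] = mutex_path
    return base_url + "?" + urllib.parse.urlencode(params)


def release_file_mutex(mutex_path: str) -> None:
    """Step 6: release the mutex by deleting the temp file."""
    if os.path.exists(mutex_path):
        os.remove(mutex_path)


def wait_for_release(mutex_path: str, poll_seconds: float = 0.5,
                     timeout: float = 600.0) -> bool:
    """Steps 4 and 7: sleep until the file disappears.

    Returns True if the mutex was released, False if the (assumed) timeout
    elapsed first, for example because the user closed the browser.
    """
    deadline = time.monotonic() + timeout
    while os.path.exists(mutex_path):
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_seconds)
    return True
```

In this sketch the macro side would call `create_file_mutex()`, open the URL from `build_form_url(...)` in a browser (for instance with the standard library's `webbrowser.open`), and write the log entry only if `wait_for_release(...)` returns True. The timeout is an addition of mine; without one, an abandoned form would leave the macro waiting indefinitely.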

March 3, 2007

"Start Typing" with Windows Speech Recognition

As a software developer with a physical disability that makes using a keyboard practically impossible for me, one of the most important capabilities of speech recognition that I always look for is keyboard emulation.  And by keyboard emulation, I’m not talking about entering a bunch of common words and phrases like I’m doing while writing this article.  This is called dictation.  Rather, I’m referring strictly to the ability to key short (or not-so-short) sequences of characters and/or key combinations like myVariableName or myFile.doc.  Words like these aren’t easily understood by the built-in speech recognition dictation engine because they are not in any dictionaries I know of (nor should they be), so another speech recognition mechanism is needed.  This is called typing.

Vista’s speech recognition tutorial and the “what can I say” Windows help documents suggest one good way to type single keyboard keys: Press X.  For example, you can say Press a to type the letter a, and you can say Press b to type the letter b.  To improve accuracy, you can even say something like Press a as in apple to key the character a in case Windows Speech Recognition is having problems with your short single-letter utterances.

This method works perfectly well and is indeed the best way to key a single character.  However, using this command over and over to type multi-character sequences is quite tedious and inefficient.  It is slow mainly because Press behaves like any other command: you must pause immediately before and after saying each Press command in order for it to process correctly.  Imagine spelling myVariableName with Press m (pause) Press y (pause) Press Capital v (pause) Press a (pause) Press r (pause)…  You get the picture.  Luckily, there is another way.

What should you say?  To enter a special typing mode, you can say Start Typing, and to leave this mode, you can say Stop Typing.  While in this special mode, you cannot dictate words and you cannot do most of the command-and-control features available in the standard mode.  It’s geared for typing—no more, no less.

What’s great about it is that you can key long sequences of characters with minimal pausing, which is a huge performance boost if you do this frequently like I do.  For example, you can say Start Typing (pause) m y (pause) Shift v a r i a b l e (pause) Shift n a m e (pause) Stop Typing (pause) to type myVariableName.  Sure, it doesn’t beat ten agile fingers pounding on a keyboard, but some of us (and some devices) don’t have that luxury.

To improve your typing accuracy, I strongly recommend that you learn the NATO phonetic alphabet (alpha, bravo, charlie, and so on).  Windows Speech Recognition properly interprets these code words into their corresponding characters when you’re typing.  I use the phonetic alphabet all the time when typing because it allows me to achieve near perfect typing accuracy.  So to say myFile.doc, I would recommend saying Start Typing (pause) mike yankee (pause) Shift foxtrot india lima echo dot delta oscar charlie (pause) Stop Typing (pause).  It looks like a mouthful, but it’s really not all that difficult once you get used to it.

Not to confuse the issue, but the NATO phonetic alphabet also makes the Press command much more useful, because with it a single Press command can effectively enter a short multi-character sequence.  To type http, you can say Press hotel tango tango papa.
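If you want to practice, a small script can turn any identifier into the words to say. The following Python sketch is only a cheat-sheet generator based on the examples in this post; it is not part of Windows Speech Recognition, and the function name, the handling of capitals (a pause followed by Shift), and the limited character set (letters, digits, and the dot) are my own assumptions.

```python
import string

# Code words as spoken; "alpha" follows this post's spelling.
_WORDS = [
    "alpha", "bravo", "charlie", "delta", "echo", "foxtrot", "golf",
    "hotel", "india", "juliett", "kilo", "lima", "mike", "november",
    "oscar", "papa", "quebec", "romeo", "sierra", "tango", "uniform",
    "victor", "whiskey", "x-ray", "yankee", "zulu",
]
CODE_WORDS = dict(zip(string.ascii_lowercase, _WORDS))
DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]


def to_spoken(text: str) -> str:
    """Return the sequence of words to say for `text` while in typing mode."""
    tokens = []
    for ch in text:
        if ch in string.ascii_letters:
            word = CODE_WORDS[ch.lower()]
            if ch.isupper():
                # Mirror the post's examples: pause, then Shift + code word.
                if tokens:
                    tokens.append("(pause)")
                tokens.append("Shift " + word)
            else:
                tokens.append(word)
        elif ch in string.digits:
            tokens.append("number " + DIGIT_WORDS[int(ch)])
        elif ch == ".":
            tokens.append("dot")
        else:
            # Other characters are outside the scope of this sketch.
            raise ValueError(f"no spoken form defined for {ch!r}")
    return " ".join(tokens)


print(to_spoken("myFile.doc"))
```

Say Start Typing before the generated sequence and Stop Typing after it, as described above.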

As always, the best way to really learn how to type effectively using Windows Speech Recognition is by actually practicing doing it, so I’ll leave you with a list of the characters you’ll use most often when typing and their phonetic alphabet equivalents.

Character    NATO Phonetic Alphabet Code Word
0            number zero
1            number one
2            number two
3            number three
4            number four
5            number five
6            number six
7            number seven
8            number eight
9            number nine