December 27, 2007

Talking to the Web

A fellow web development aficionado recently asked me a question I commonly receive concerning web accessibility, so I thought I'd share my thoughts here in the hope that others might benefit from my ideas (and hopefully expand upon them). Here's the question:

Is there anything in particular in terms of accessibility or even just coding in general that you find to be the most helpful when using the web?

This is obviously a very broad question, and limiting the response to a single blog post probably does not do it justice. However, it is a simple, honest question that deserves a simple, honest reply, so I'll try my best to offer my advice here. Of course, you should keep in mind that my suggestions are focused on my own experience in accessibility. My vision and hearing are actually quite good, so I'm not as familiar with accessibility in those areas. But I can tell you a lot about how speech recognition works as far as web pages are concerned.

I suppose the main suggestion I tend to stress to other web developers about accessibility is to always focus on carefully and completely designing links. Links are, after all, what the web is all about. Always use the a (anchor) element for links, as anchor elements tend to be better supported by things like speech recognition, screen readers, etc. It can be tempting to sometimes use other HTML elements with events like onclick to simulate what an anchor element might do, but this route is a surefire way to create accessibility problems.

With that thought in mind, it's also a good idea to make sure that every link on a web page can be reached by a straightforward spoken command, which is usually the text of the link. Good speech recognition software extracts the text from the visible screen, so when you say something, it tries to find the anchor element whose text matches what you said and goes there. That makes this an easier task for us web developers than some of us might think. However, there are caveats to keep in mind. If several links sound alike, the software cannot go directly to one of them by voice; instead it offers options, generally a numbered list to choose from, which becomes a nuisance when many links sound alike. This should give you some ideas about what makes a good name for a link and what does not. So try to give each link a unique name and, in general, avoid redundancy as much as you can.
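As a rough way to check a page for this problem, here is a minimal Python sketch that lists link names used more than once. It is only an illustration built on assumptions: the class and function names (LinkTextCollector, find_ambiguous_links) and the sample markup are made up for this example, and it only catches links with identical text, not links that merely sound alike.

```python
from collections import Counter
from html.parser import HTMLParser


class LinkTextCollector(HTMLParser):
    """Collects the visible text of every a (anchor) element that has an href."""

    def __init__(self):
        super().__init__()
        self.link_texts = []
        self._current = None  # text pieces of the link being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "a" and dict(attrs).get("href") is not None:
            self._current = []

    def handle_data(self, data):
        if self._current is not None:
            self._current.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            # Collapse runs of whitespace so "Read   more" equals "Read more".
            text = " ".join("".join(self._current).split())
            self.link_texts.append(text)
            self._current = None


def find_ambiguous_links(html):
    """Return the link names (lowercased) that appear more than once."""
    collector = LinkTextCollector()
    collector.feed(html)
    collector.close()
    counts = Counter(text.lower() for text in collector.link_texts if text)
    return sorted(name for name, count in counts.items() if count > 1)


# Hypothetical sample page: two links share the same spoken name.
sample_page = """
<a href="/news/1">Read more</a>
<a href="/news/2">Read more</a>
<a href="/about">About this site</a>
"""

print(find_ambiguous_links(sample_page))
```

Any name this reports is one a speech user would probably have to pick from a numbered list rather than reach directly.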

Also, for what it's worth, when dealing with graphical links (and certain abbreviated textual links), it has been my experience with Windows Speech Recognition that implementing the title attribute of the anchor element is the most important component for the speech recognizer (at least for WSR). For images, the title attribute does not necessarily have to be identical to the alt attribute of the embedded img (image) element, but it usually makes sense for both to be the same so that the tooltips match whatever the spoken command would be. I'm not sure if this conforms exactly to a specific web accessibility initiative or not, but this process has generally been what I've used in practice. Also, it is best to try to use this title attribute as a simple, non-redundant command that makes sense in the context of what the link does, not necessarily describing exactly what the image is. Just be sure to remember that the link is always more important than the image. Substance before style.

September 29, 2007

Using the On-Screen Keyboard as an Alternative to Typing with a Physical Keyboard

As an individual with a physical disability who touts speech recognition so much, I occasionally get asked how I ever use the computer when speech recognition is not available, since I cannot move my arms well enough to operate a standard physical keyboard. This is a good question, since speech recognition is not one of the most portable tools around. For example, I've never come across a public computer at a library or hotel that was set up with a good microphone and sound card combo, which are necessities for using speech recognition. So, when the necessary hardware is unavailable, I have to look for software that can stand in for the keyboard instead: in this case, the On-Screen Keyboard.

The On-Screen Keyboard is nothing new to Windows; it's been one of the standard accessibility tools for several versions now, not just Vista. It's pretty simple, really, but is extremely useful for users like me who cannot utilize a traditional physical keyboard. Basically, the On-Screen Keyboard application displays a window that looks exactly like a standard keyboard, only it is on the screen. You can select the different keys simply by pointing and clicking with a mouse or other pointing device. So, in truth, there is still a need for hardware since something has to do the pointing, but, generally, using a mouse or other similar device is a lot less strenuous than using a physical keyboard. Believe me, I know. Currently, I can still move my fingers adequately on one hand, so I can still handle a mouse, but a traditional keyboard is just not an option for me.

Interestingly, the On-Screen Keyboard and other similar applications have become a little more mainstream now than they have ever been thanks to the rise of tablet computing. Touchscreen devices like these typically allow writing with your finger or a stylus, but occasionally it may be necessary to find a way to input keys like we would on a normal keyboard. So this means a software keyboard can actually be a necessity for everyone, not just those with disabilities.

How do you actually start the On-Screen Keyboard in Windows? Since it's generally not a tool most will need, Microsoft tends to bury it down several levels in the start menu. (They've done this in every version of Windows I can remember.) In Vista, you can launch the program by mouse as follows:

Click Start > All Programs > Accessories > Ease of Access > On-Screen Keyboard

There you go! You found it! Using the program is pretty straightforward and really doesn't need any explanation. It's simple and does its job, which is all one should expect from a tool like this.

July 6, 2007

Using IE Procedural Surfaces to Generate Icons

Graphics are important to programmers. As much as we'd like to stick to foo and bar as much as possible, there always comes a time when we need to put a little thought into how an interface should look. That often involves graphics.

In most of the applications I've created, the main type of images I've used have been icons. They're small, easy to work with, and help to enhance the visual interface of almost any application. Unfortunately, my computer graphics skills are quite limited, so, when the time comes to needing an icon, I'm usually stuck trying to find a nice affordable (read: free) one on the web.

Lucky for me, not long ago, I happened to discover an excellent site called IconBuffet which provides a number of high-quality icons at no charge to its members. IB is more than just your average icon download site, though. The images are professionally done, and there is an interesting community of users to communicate with when swapping icon sets. Note that you can't get all their icons at once, though. You have to play by IB's rules so-to-speak, where you have a limited number of tokens to buy certain icons, and you can build up points by sharing your set with others. I know; it's definitely a scheme designed to keep users around and active on the site. But in this case, I think it works well, and it's relatively easy to get a hold of the images you want fairly quickly after learning how the site works. It's fun, trust me. And if you sign up with me as the referral, I'll get some nice bonus points. Thanks. Ah, another pyramid scheme...but at least it's free!

Back to the point. Sorry about the spiel, but it does bring me to the topic at hand. What happens if you still cannot find the simple icon you're looking for and you're not a computer graphics whiz? Well, you can make your own using some pretty simple tools that almost everyone owns: Notepad, Internet Explorer, and Paint.

Basically, you can make a really nice text-based icon by utilizing some cool DirectX visual extensions in IE with just a little bit of code:

<!-- saved from url=(0014)about:internet -->
<style type="text/css">
#myicon
{
width : 48px;
height : 48px;
font-size : 24px;
text-align : center;
line-height : 48px;
background-color : #00f;
color : #fff;
font-family : sans-serif;
font-weight : bold;
filter : progid:DXImageTransform.Microsoft.Gradient(
gradientType=1,
startColorStr=#bf0000,
endColorStr=#00007f);
}
</style>
<div id="myicon">US</div>




You'll see that the code is pretty straightforward. Just a div element with some styles applied. Pretty standard HTML/CSS. The one thing that might seem strange to you is the IE visual filter style attribute. This is what applies the funky gradient background.

And now that you've got your image displayed in the IE browser, all you have to do is press the Print Screen key on the keyboard to copy the screen to the clipboard and then paste it into Paint. From there, you will need to crop off the rest of the window surrounding the icon that also got copied, but that's pretty simple if you've ever used Paint before. Once you have your icon cropped to the 48-pixel size you specified, just save the image in the file format you would like, and you're done. Voilà, you're an artist!

June 24, 2007

Shoot Ghosts with Windows Speech Recognition

Sorry about the lengthy blogging hiatus. I've been extremely busy at work and just have not found the time to spend on fun things like my blog. I know that's a lame excuse, so I'll give you another one. In what little free time I've managed to find, I've actually been playing a game. :-) And, guess what, I've been using Windows Speech Recognition to help me win.

What game have I been playing, you ask? Well, my current game of choice happens to be Desktop Tower Defense, a relatively simple but strategically complex game. In fact, I would have never known about it without reading Text Services Framework guru Eric Brown's blog. Thanks, Eric! Now, I'm addicted, too.

The object of this free Flash-based game is pretty simple. Shoot all the little ghosts before they escape the maze of towers that you create. It sounds simple enough, but it can get extremely difficult as the game progresses. In fact, a lot of the challenge involves managing and upgrading your maze of towers when the screen is already littered with ghosts. I soon discovered that just using the mouse to control the game was not enough. Timing is of the essence during the more difficult levels. I needed to find a way to issue the accompanying keyboard commands so I could keep the mouse on the playing board at all times; otherwise, I was doomed.

So what does a guy like me do who has virtually no movement in his left hand and is stuck using the mouse with his right hand? It's time for Windows Speech Recognition to save the day again! By turning on typing mode by saying Start typing, I was able to quickly adjust all the common game play elements by voice, saying Number one for pellet towers, Number two for squirt towers, and so on. You get the picture. There's no need to touch the keyboard when Windows Speech Recognition can do it for you. :-) Very cool, and very fun!

March 8, 2007

Coding "Hello, World" with Windows Speech Recognition

There's a great article out on the Windows Vista beta experience portal showcasing Windows Speech Recognition by Richard Costall entitled "Look, no hands". I especially liked his demonstration of using Visual Studio 2005 via speech. In it, he points out several frustrations that I have also experienced using the program, but he proves that there are many excellent features in Windows Speech Recognition that can be used to sidestep some of Visual Studio's accessibility issues. In fact, he highlights the use of the Start Typing command as his means of actually coding the obligatory "Hello, World" application that he is demonstrating. If you're interested in more details on using the Start Typing command, be sure to take a look at my earlier post as well.

March 5, 2007

Using a Simple File Mutex to Integrate Complex Disparate Applications

As a Windows developer, one of the most challenging (and fun) tasks I sometimes get to do is integrating multiple disparate applications so they can communicate with one another. Most of the time this involves blending a .NET application with an automation application and doing the regular COM interop one would expect. However, occasionally something more interesting comes along. One such occasion popped up recently for me, as I had to design and implement a solution to integrate our mainframe terminal emulation client with an intranet web application.

The main goal of this particular project was to provide a way to pre-fill an existing web form application with data from a mainframe screen. Normally, the user would have to read the appropriate data from the mainframe screen, manually enter the data into the web form, submit the form, and return to the mainframe terminal and key in a simple log entry indicating that the process was complete. Obviously, due to the nature of jumping back and forth between applications and doing data entry, this was a very tedious and error-prone activity.

To understand the scenario a bit better, it's critical to know how the two systems work. The existing web application which does most of the business logic in this entire process is just a traditional dynamic web application; it works pretty much like any other form-based web application you have seen before. On the other hand, our terminal emulation environment may seem a bit unusual to some of you; it is basically a Windows client program connected via our internal network to our legacy system. From a software development perspective, this emulator is more than just a typical dumb terminal; it also provides a handy automation environment in which we can produce macros for screen-scraping data, automating keyboard commands, and executing other Windows-related tasks. However, it's important to note that it is definitely not a web interface by any means, so integrating it with web applications is not entirely straightforward.

In order to get these two applications to work together, I needed to devise a way for them to communicate with each other in a very simple manner. So, like any interop activity, I had to determine the level at which both programs could communicate. Since the two environments are quite different from each other, I had to go down to the level they had in common: the operating system and, in particular, the file system. Basically, both environments have the ability to read and write files, so I opted to use a simple mechanism, which I will call a file mutex, to communicate between the two applications.

Here's the complete process flow that I came up with:

  1. The user clicks a button on the appropriate screen of the terminal emulation client to initiate an automation macro.
  2. The automation macro creates a new file mutex simply by creating a uniquely named temp file on the file system.
  3. The automation macro kicks off a new Internet Explorer session via a shell command, passing all the screen-scraped data and the mutex identifier (the temp file path) in the query string.
  4. The automation macro is then temporarily put to sleep, while it waits for the mutex to be released.
  5. The user completes the Internet Explorer web form and clicks submit, passing the mutex identifier yet again in a hidden form field to the associated confirmation page for this web form.
  6. Internet Explorer displays a successful confirmation screen and releases the mutex by deleting the temp file. (A custom client-side ActiveX object had to be installed in Internet Explorer at this stage to perform the file deletion since this is out of the scope of JavaScript.)
  7. While waiting, the automation macro now sees that the mutex has been released and can then automatically write the appropriate log entry on the mainframe system.

And that's all there is to it! In case you're curious about the ActiveX object in step six, it is a very simple custom VB6 automation object (only ten lines of code) that deletes the temp file passed to it as an argument. Nothing fancy there. But that's the whole point.
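To make the mechanism concrete, here is a minimal Python sketch of the same idea. It is not the original macro or the VB6 object; the function names, the polling interval, and the timeout are all assumptions added for illustration (the process described above does not mention a timeout). One function creates the uniquely named temp file, one deletes it, and one sleeps until the file is gone.

```python
import os
import tempfile
import threading
import time


def create_file_mutex():
    """Acquire the 'mutex' by creating a uniquely named temp file; return its path."""
    handle, path = tempfile.mkstemp(prefix="mutex_", suffix=".tmp")
    os.close(handle)  # the file only has to exist, not stay open
    return path


def release_file_mutex(path):
    """Release the 'mutex' by deleting the temp file (the ActiveX object's job above)."""
    if os.path.exists(path):
        os.remove(path)


def wait_for_release(path, poll_seconds=0.5, timeout_seconds=600.0):
    """Sleep until the temp file disappears; return False if the timeout expires.

    The timeout is an assumption of this sketch so that a crashed browser
    cannot leave the waiting side asleep forever.
    """
    deadline = time.monotonic() + timeout_seconds
    while os.path.exists(path):
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_seconds)
    return True


# Demo: a timer thread stands in for the web page that deletes the file.
mutex_path = create_file_mutex()
threading.Timer(0.3, release_file_mutex, args=[mutex_path]).start()
if wait_for_release(mutex_path, poll_seconds=0.05, timeout_seconds=5.0):
    print("Mutex released; the log entry can be written now.")
```

Polling the file system is crude next to a real operating system mutex, but it needs nothing beyond the ability to create, delete, and check for a file, which is all the two applications had in common.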

March 3, 2007

"Start Typing" with Windows Speech Recognition

As a software developer with a physical disability that makes using a keyboard practically impossible for me, one of the most important capabilities of speech recognition that I always look for is keyboard emulation.  And by keyboard emulation, I’m not talking about entering a bunch of common words and phrases like I’m doing while writing this article.  This is called dictation.  Rather, I’m referring strictly to the ability to key short (or not-so-short) sequences of characters and/or key combinations like myVariableName or myFile.doc.  Words like these aren’t easily understood by the built-in speech recognition dictation engine because they are not in any dictionaries I know of (nor should they be), so another speech recognition mechanism is needed.  This is called typing.

Vista’s speech recognition tutorial and the “What can I say?” Windows help documents suggest one good way to type single keyboard keys: Press X. For example, you can say Press a to type the letter a, and you can say Press b to type the letter b. To improve accuracy, you can even say something like Press a as in apple to key the character a in case Windows Speech Recognition is having problems with your short single-letter utterances.

This method works perfectly well and is indeed the best way to key a single character. However, using this command over and over to type multi-character sequences is quite tedious and inefficient. It is slow because Press behaves like any other command; you must pause immediately before and after saying each Press command in order for it to be processed correctly. Imagine spelling myVariableName with Press m (pause) Press y (pause) Press Capital v (pause) Press a (pause) Press r (pause)… You get the picture. Luckily, there is another way.

What should you say?  To enter a special typing mode, you can say Start Typing, and to leave this mode, you can say Stop Typing.  While in this special mode, you cannot dictate words and you cannot do most of the command-and-control features available in the standard mode.  It’s geared for typing—no more, no less.

What’s great about it is that you can key long sequences of characters with minimal pausing, which is a huge performance boost if you do this frequently like I do.  For example, you can say Start Typing (pause) m y (pause) Shift v a r i a b l e (pause) Shift n a m e (pause) Stop Typing (pause) to type myVariableName.  Sure, it doesn’t beat ten agile fingers pounding on a keyboard, but some of us (and some devices) don’t have that luxury.

To improve your typing accuracy, I strongly recommend that you learn the NATO phonetic alphabet (alpha, bravo, charlie, and so on). Windows Speech Recognition properly interprets these code words into their corresponding characters when you’re typing. I use the phonetic alphabet all the time when typing because it allows me to achieve near perfect typing accuracy. So to type myFile.doc, I would recommend saying Start Typing (pause) mike yankee (pause) Shift foxtrot india lima echo dot delta oscar charlie (pause) Stop Typing (pause). It looks like a mouthful, but it’s really not all that difficult once you get used to it.

Not to confuse the issue, but the NATO phonetic alphabet also makes the Press command much more useful, because with it Press can effectively enter short multi-character sequences as well. To type http, you can say Press hotel tango tango papa.

As always, the best way to really learn how to type effectively using Windows Speech Recognition is by actually practicing doing it, so I’ll leave you with a list of the characters you’ll use most often when typing and their phonetic alphabet equivalents.

Character    NATO Phonetic Alphabet Code Word
a            alpha
b            bravo
c            charlie
d            delta
e            echo
f            foxtrot
g            golf
h            hotel
i            india
j            juliet
k            kilo
l            lima
m            mike
n            november
o            oscar
p            papa
q            quebec
r            romeo
s            sierra
t            tango
u            uniform
v            victor
w            whiskey
x            xray
y            yankee
z            zulu
0            number zero
1            number one
2            number two
3            number three
4            number four
5            number five
6            number six
7            number seven
8            number eight
9            number nine
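
If you want to practice, the list above is easy to turn into a small helper that spells out what to say for a given string. This is just a Python sketch of my own, not anything that ships with Windows Speech Recognition; the spoken_keys name is made up, and it only knows letters, digits, and the period (spoken as dot, as in the myFile.doc example). It also leaves out the pauses and the Start Typing and Stop Typing commands.

```python
# Code words copied from the list above.
NATO_CODE_WORDS = {
    "a": "alpha", "b": "bravo", "c": "charlie", "d": "delta", "e": "echo",
    "f": "foxtrot", "g": "golf", "h": "hotel", "i": "india", "j": "juliet",
    "k": "kilo", "l": "lima", "m": "mike", "n": "november", "o": "oscar",
    "p": "papa", "q": "quebec", "r": "romeo", "s": "sierra", "t": "tango",
    "u": "uniform", "v": "victor", "w": "whiskey", "x": "xray",
    "y": "yankee", "z": "zulu",
}

DIGIT_WORDS = {
    "0": "number zero", "1": "number one", "2": "number two",
    "3": "number three", "4": "number four", "5": "number five",
    "6": "number six", "7": "number seven", "8": "number eight",
    "9": "number nine",
}


def spoken_keys(text):
    """Return the words to say in typing mode to key the given text."""
    words = []
    for ch in text:
        lower = ch.lower()
        if lower in NATO_CODE_WORDS:
            if ch.isupper():
                words.append("Shift")  # capital letters are spoken as Shift + code word
            words.append(NATO_CODE_WORDS[lower])
        elif ch in DIGIT_WORDS:
            words.append(DIGIT_WORDS[ch])
        elif ch == ".":
            words.append("dot")
        else:
            raise ValueError(f"no spoken form known for {ch!r}")
    return " ".join(words)


print(spoken_keys("myFile.doc"))
# prints: mike yankee Shift foxtrot india lima echo dot delta oscar charlie
```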

January 29, 2007

Microsoft Is Listening: Vista Speech Recognition Is Worth Talking About

As a professional programmer who also happens to be afflicted with spinal muscular atrophy (a severe neuromuscular disorder), PC accessibility is of paramount importance to me. Accessibility (or the lack of it) directly influences how efficiently I am able to work, which invariably influences my bottom line. More than that, it affects my state of mind. Being able to click that little red 'x' to close a window on your desktop may seem easy to most of you, but it can become quite tiresome or perhaps even be impossible to do for many users with disabilities. So when the world's most influential software maker introduces a new or updated accessibility feature, I take notice. And test driving Windows Vista's speech recognition engine most certainly opened my eyes, er, mouth!

Why am I so excited? Well, for one, speech recognition has finally become a first-class citizen in Windows. Before Vista, speech recognition was never installed by default in Windows (and for good reason). It used to only be effective in a very limited number of scenarios, like dictating in Microsoft Word, but, now, it is useful almost everywhere. Why is that? The short answer: It's truly integrated in the OS, which gives it much more power than ever before. The long answer: Nearly all Windows controls (text boxes, dropdown lists, menus, etc.) are now interfacing with the new Text Services Framework, but you can learn the details elsewhere from the experts.

So what does all of this really mean? Now, I can surf the web by voice without touching a mouse; I can click a point on the screen by speech alone; and I can dictate this article without typing on a keyboard. Pretty cool!

Of course, most of this has been available before in third-party applications, like Nuance Dragon NaturallySpeaking (DNS), but, in my opinion, never so elegantly and effectively across the entire user experience. Just try comparing Windows Speech Recognition and DNS when surfing the web in Internet Explorer or finding a file in Windows Explorer, and you'll quickly understand what I mean.

Windows Speech Recognition (WSR) still has room for improvement. One significant shortcoming of WSR is that there is no macro support yet. Also, my dictation is still more accurate in DNS, but the difference is minimal, and, with more use, WSR may very well eliminate that gap. Command-and-control is significantly superior with WSR, though, and the price is right (it's included in the OS). All in all, the speech recognition competition will definitely benefit consumers.

I, for one, am appreciative of all of Microsoft's effort put into speech recognition and am grateful it has become a mainstream feature in Windows. Indeed, I may have actually experienced a genuine "wow" moment because of it. ;-)