Chapter 19

Creating SAPI Applications with C++



Most of the speech features you'll want to add to Windows applications can be handled using the OLE automation libraries for speech recognition (VCMDAUTO.TLB) and text-to-speech (VTXTAUTO.TLB). These libraries can be called from Visual Basic or any VBA-compliant application such as Excel, Access, or Word. However, some tasks can be done only in C or C++, including building permanently stored grammars, calling SAPI dialog boxes, and using other low-level functions. For this reason, it is a good idea to know how to perform some basic SAPI functions in C++. Even if you do not regularly program in C or C++, you can still learn a lot by reading through the code in these examples.

In this chapter, you'll get a quick tour of a TTS demonstration application and an SR demonstration application. The applications reviewed here are part of the Microsoft Speech SDK. If you have the Speech SDK installed on your workstation, you can follow along with the code and compile and test the resulting application. If you do not have the SDK installed or do not have a copy of C++ with which to compile the application, you can still follow along with the review of the various functions used in the C++ programs.

When you are finished with this chapter, you will know how to build simple TTS and SR programs using C++ and the Microsoft Speech SDK.

Note
All the examples in this chapter are shipped with the Microsoft Speech SDK. The compiler used in this chapter is Microsoft Visual C++ 4.1 (VC++). If you are using another compiler, you may need to modify some of the code in order for it to work for you.

The TTS Demo Project

Creating a TTS application with C++ involves just a few basic steps. To see how this is done, you can look at the DEMO.CPP source code of the TTSDEMO project that ships with the Microsoft Speech SDK. You can find this file in the \SPEECHSDK\SAMPLES\TTSDEMO folder that was created when you installed the Microsoft Speech SDK.

Note
If you do not have the Microsoft Speech SDK installed or do not have Visual C++, you can still follow along with the code examples shown in this chapter.

The rest of this section reviews the contents of the TTSDEMO.MAK project. There are two components to review: the DEMO.CPP source code file and the DEMO.RC file. The DEMO.CPP source code file contains all the code needed to implement TTS services for Windows using C++. The DEMO.RC file contains a simple dialog box that you can use to accept text input from the user and then send that text to the TTS engine for playback.

The Initial Header and Declarations for DEMO.CPP

Before you code the various routines for the TTS demo, there are a handful of include statements and a couple of global declarations that must be added. Listing 19.1 shows the include statements needed to implement TTS services in VC++.


Listing 19.1. The include statements and global declarations for DEMO.CPP.
/***********************************************************************
Demo.Cpp - Code to demo tts.

Copyright (c) 1995 by Microsoft Corporation

*/


#include <windows.h>
#include <string.h>
#include <stdio.h>
#include <mmsystem.h>
#include <initguid.h>
#include <objbase.h>
#include <objerror.h>
#include <ole2ver.h>

#include <speech.h>
#include "resource.h"


/*************************************************************************
Globals */

HINSTANCE         ghInstance;                // instance handle
PITTSCENTRALW     gpITTSCentral = NULL;

Most of the include files are part of VC++. The speech.h header file is shipped as part of the Microsoft Speech SDK. And the resource.h header file is created when you build the dialog box for the project.

The WinMain Procedure of DEMO.CPP

Since the SAPI model is implemented using the Component Object Model (COM), you need to begin and end an OLE session as part of normal processing. After starting the OLE session, you need to locate and select an available TTS engine, which gives you a TTSCentral interface to that engine. Once you have a session started with a valid TTS engine, you can use a simple dialog box to accept text input and send that text to the TTS engine using the TextData method of the TTSCentral interface. After exiting the dialog box, you need to release the connection to TTSCentral and then end the OLE session.

The code in Listing 19.2 shows the WinMain procedure for the DEMO.CPP file of the TTSDEMO project.


Listing 19.2. The WinMain procedure of the TTSDEMO project.
/*************************************************************************
winmain - Windows main code.
*/

int PASCAL WinMain(HINSTANCE hInstance, HINSTANCE hPrevInstance,
                   LPSTR lpszCmdLine, int nCmdShow)
{
TTSMODEINFOW   ModeInfo;

// try to begin ole

   if (!BeginOLE())
      {
      MessageBox (NULL, "Can't create OLE.", NULL, MB_OK);
      return 1;
      }

// find the right object
   memset (&ModeInfo, 0, sizeof(ModeInfo));
   gpITTSCentral = FindAndSelect (&ModeInfo);
   if (!gpITTSCentral) {
      MessageBox (NULL, "Can't create a TTS engine.", NULL, MB_OK);
      return 1;
      };

// Bring up the dialog box
   DialogBox (hInstance, MAKEINTRESOURCE(IDD_TTS),
      NULL, (FARPROC) DialogProc);

// try to close ole
   gpITTSCentral->Release();

   if (!EndOLE())
      MessageBox (NULL, "Can't shut down OLE.", NULL, MB_OK);

   return 0;
}

You can see the basic steps mentioned earlier: start the OLE session, get a TTS object, and start the dialog box. Once the dialog is completed, you need to release the TTS object and end the OLE session. The rest of the code all supports the code in WinMain.
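Notice that the early return paths in Listing 19.2 leave WinMain without calling EndOLE. One way to keep the begin/end pairs balanced on every path is a small scope guard. The following is only a sketch in portable, modern C++ (not the VC++ 4.1 dialect used by the SDK sample). The ScopeGuard class and the runSession function are hypothetical names, and the real SAPI and OLE calls are replaced by entries written to a log so the ordering is easy to see.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: runs a cleanup action when it goes out of scope.
class ScopeGuard {
public:
    explicit ScopeGuard(std::function<void()> cleanup)
        : cleanup_(std::move(cleanup)) {}
    ~ScopeGuard() { if (cleanup_) cleanup_(); }
    ScopeGuard(const ScopeGuard&) = delete;
    ScopeGuard& operator=(const ScopeGuard&) = delete;
private:
    std::function<void()> cleanup_;
};

// Stand-in for WinMain: records each step in 'log' instead of calling
// the real BeginOLE, FindAndSelect, DialogBox, Release, and EndOLE.
int runSession(bool engineAvailable, std::vector<std::string>& log) {
    log.push_back("begin OLE");
    ScopeGuard endOle([&log] { log.push_back("end OLE"); });

    if (!engineAvailable) {
        log.push_back("no engine");
        return 1;                      // the guard still ends the OLE session
    }
    log.push_back("select engine");
    ScopeGuard releaseEngine([&log] { log.push_back("release engine"); });

    log.push_back("run dialog");
    return 0;                          // guards run in reverse order of creation
}
```

Because local objects are destroyed in reverse order of construction, the engine is released before the OLE session ends, which matches the order used at the bottom of WinMain.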

Starting and Ending the OLE Session

The code needed to start and end the OLE session is pretty basic. The code in Listing 19.3 shows both the BeginOLE and EndOLE procedures.


Listing 19.3. The BeginOLE and EndOLE procedures.
/*************************************************************************
BeginOLE - This begins the OLE.

inputs
   none
returns
   BOOL - TRUE if it succeeds
*/

BOOL BeginOLE (void)
{
   DWORD    dwVer;

// Initialize OLE

   SetMessageQueue(96);
   dwVer = CoBuildVersion();

   if (rmm != HIWORD(dwVer)) return FALSE;         // error

   if (FAILED(CoInitialize(NULL))) return FALSE;

   return TRUE;
}


/*************************************************************************
EndOLE - This closes up the OLE.

inputs
   none
returns
   BOOL - TRUE if it succeeds
*/

BOOL EndOLE (void)
{
// Free up all of OLE

   CoUninitialize ();

   return TRUE;
}

Listing 19.3 shows three steps in the BeginOLE routine. The first line creates a message queue that can hold up to 96 messages. The next two lines check the OLE version: CoBuildVersion returns the version of the installed OLE libraries, and the high word of that value must match rmm, the major version number defined in ole2ver.h. The last line actually initializes the OLE session. The EndOLE routine simply closes the OLE session you started in BeginOLE.
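The version check in BeginOLE depends on how a 32-bit version value is split into two 16-bit halves, with the major number in the high word. The sketch below shows that arithmetic in portable C++. The function names highWord, lowWord, and majorVersionMatches are hypothetical stand-ins for the Windows HIWORD and LOWORD macros and for the comparison against rmm, and any version numbers you pass in are illustrative only.

```cpp
#include <cstdint>

// Hypothetical portable equivalents of the Windows HIWORD and LOWORD macros.
constexpr std::uint16_t highWord(std::uint32_t value) {
    return static_cast<std::uint16_t>((value >> 16) & 0xFFFFu);
}

constexpr std::uint16_t lowWord(std::uint32_t value) {
    return static_cast<std::uint16_t>(value & 0xFFFFu);
}

// Mirrors the test in BeginOLE: the major part (high word) of the
// installed version must equal the major version the program expects.
constexpr bool majorVersionMatches(std::uint32_t installed,
                                   std::uint16_t expectedMajor) {
    return highWord(installed) == expectedMajor;
}
```

For example, a made-up version value of 0x00170005 has a high word of 0x17 and a low word of 5, so it matches an expected major version of 0x17 regardless of the minor number.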

Selecting the TTS Engine Object

After creating the OLE session, you need to locate and select a valid TTS engine object. There are just a few steps to the process. First, you need a few local variables to keep track of your progress. Listing 19.4 shows the initial declarations for the FindAndSelect procedure.


Listing 19.4. Declarations for the FindAndSelect procedure.
/*************************************************************************
FindAndSelect - This finds and selects according to the specific TTSMODEINFOW.

inputs
   PTTSMODEINFOW     pTTSInfo - desired mode
returns
   PITTSCENTRALW - ITTSCentral interface to TTS engine
sets:

*/

PITTSCENTRALW FindAndSelect (PTTSMODEINFOW pTTSInfo)
{
   HRESULT        hRes;
   TTSMODEINFOW        ttsResult;     // final result
   WCHAR          Zero = 0;
   PITTSFINDW      pITTSFind;         // find interface
   PIAUDIOMULTIMEDIADEVICE    pIAMM;  // multimedia device interface for audio-dest
   PITTSCENTRALW  pITTSCentral;       // central interface

Next, you need to create an instance of the TTSFind object. This will be used to select an available TTS engine. Listing 19.5 shows how this is done.


Listing 19.5. Creating the TTSFind object.
   hRes = CoCreateInstance(CLSID_TTSEnumerator, NULL, CLSCTX_ALL, IID_ITTSFindW,
                           (void**)&pITTSFind);
   if (FAILED(hRes)) return NULL;

   hRes = pITTSFind->Find(pTTSInfo, NULL, &ttsResult);

   if (hRes)
      {
      pITTSFind->Release();
      return NULL;     // error
      }

The next step is to locate and select an audio output object. This will be used by the TTS engine for playback of the text. Listing 19.6 shows the code needed to select an available audio device.


Listing 19.6. Selecting an available audio device.
// Get the audio dest
   hRes = CoCreateInstance(CLSID_MMAudioDest, NULL, CLSCTX_ALL,
                           IID_IAudioMultiMediaDevice, (void**)&pIAMM);
   if (hRes)
      {
      pITTSFind->Release();
      return NULL;     // error
      }
   pIAMM->DeviceNumSet (WAVE_MAPPER);

The code in Listing 19.6 passes WAVE_MAPPER to the DeviceNumSet method of the MMAudioDest object, which lets Windows choose a suitable wave output device on the PC.

Once you have successfully created the TTSFind object and the MMAudioDest object, you're ready to use the Select method of TTSFind to return a pointer to a valid TTS engine object. After getting the pointer, you can release the TTSFind object because it was needed only to locate a valid TTS engine object. Listing 19.7 shows how this is done.


Listing 19.7. Selecting the TTS engine and releasing TTSFind.
// Pass off the multi-media-device interface as an IUnknown (since it is one)

// Should do select now

   hRes = pITTSFind->Select(ttsResult.gModeID, &pITTSCentral, (LPUNKNOWN) pIAMM);

   if (hRes) {
      pITTSFind->Release();
      return NULL;
      };

// free random stuff up

   pITTSFind->Release();

   return pITTSCentral;
}

After getting a valid TTS engine and a valid audio output, you can start sending text to the TTS engine using the TextData method of the TTSCentral interface.

Sending Text to the TTS Engine

The TTSDEMO project uses a simple dialog box to accept text from the user and send it to the TTS engine. Figure 19.1 shows the dialog box in design mode.

Figure 19.1 : The dialog box from the TTSDEMO project.

This dialog box has a single text window and two command buttons: Speak and Exit. When the user presses the Speak button (which the code handles as IDOK), the text typed into the window is sent to the TTS engine for playback. The code in Listing 19.8 shows the callback routine that handles the dialog box events.


Listing 19.8. Handling the dialog box callback.
/*************************************************************************
DialogProc
*/
BOOL CALLBACK DialogProc(HWND hWnd, UINT uMsg, WPARAM wParam, LPARAM lParam)
{
switch (uMsg) {
   case WM_COMMAND:
         switch (LOWORD(wParam))
            {
            case IDOK:
               {
               char  szSpeak[1024];
               WCHAR wszSpeak[1024];
               SDATA data;

               // Speak
               GetDlgItemText (hWnd, IDC_EDIT, szSpeak, sizeof(szSpeak));
               data.dwSize = (DWORD)
                  MultiByteToWideChar(CP_ACP, 0, szSpeak, -1, wszSpeak,
                  sizeof(wszSpeak) / sizeof(WCHAR)) * sizeof(WCHAR);
               data.pData = wszSpeak;
               gpITTSCentral->TextData (CHARSET_TEXT, 0,
                  data, NULL,
                  IID_ITTSBufNotifySinkW);
               }
               return TRUE;
            case IDCANCEL:
               EndDialog (hWnd, IDCANCEL);
               return TRUE;
            }
      break;
   };

return FALSE;  // didn't handle
}

Note that there are a couple of extra steps involved in sending the text to the TTS engine. The whole process involves filling an SDATA structure to pass to the TTS engine. First, GetDlgItemText copies the text in the window into a local buffer. Next, the MultiByteToWideChar function converts that text to wide character format. Its return value, the number of wide characters written (including the terminating null), is multiplied by the size of a wide character to get the size in bytes, which is stored in the dwSize member of the SDATA structure. The pData member is then set to point to the converted text. Finally, the structure is passed to the TTS engine using the TextData method of the TTSCentral interface.
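The size calculation is the part that is easiest to get wrong, because the size is counted in bytes rather than characters and includes the terminating null. The sketch below shows the same arithmetic in portable C++. TextBlock is a hypothetical stand-in for SDATA, and widenAscii is a simplified, ASCII-only substitute for MultiByteToWideChar, so treat it as an illustration of the bookkeeping rather than a replacement for the Windows call.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for SDATA: a pointer to a block plus its size in bytes.
struct TextBlock {
    const wchar_t* data;
    std::size_t    sizeInBytes;
};

// Widens plain ASCII text. This is a simplification: the listing uses
// MultiByteToWideChar, which also handles non-ASCII code pages.
std::wstring widenAscii(const std::string& text) {
    return std::wstring(text.begin(), text.end());
}

// Describes 'wide' as a block whose size counts the terminating null
// and is measured in bytes, not characters. The returned pointer is
// valid only while 'wide' is alive and unmodified.
TextBlock describe(const std::wstring& wide) {
    return TextBlock{ wide.c_str(), (wide.size() + 1) * sizeof(wchar_t) };
}
```

A two-character string therefore occupies three wide characters. On Windows a wide character is two bytes; other platforms often use four, which is why the sketch multiplies by sizeof(wchar_t) instead of a fixed number.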

That's all there is to it. You can test the application by compiling it yourself or by loading the TTSDEMO.EXE application from the CD-ROM that ships with this book.

The VCMD Demo Project

Building SR applications in C++ is not much different. The biggest change is that you need to create a menu object and load it with commands that will be spoken to the SR engine. In fact, the process of loading the commands and then checking the spoken phrase against the command list is the largest part of the code.

The C++ example reviewed here is part of the Microsoft Speech SDK. The project VCMDDEMO.MAK is installed when you install the SDK. If you have the SDK installed and own a copy of C++, you can load the project and review the source code while you read this chapter.

Note
If you do not have a copy of the SDK or C++, you can still learn a lot by reviewing the code shown here.

There are two main parts to the project. The first is the DEMO.CPP source code. This file contains the main C++ code for the project. The second part of the project is the VCMDDEMO.RC resource file. This file contains the definitions of two dialog boxes used in the project.

The Initial Header and Declarations for the VCMDDEMO Project

The first thing that must be done is to add the include and declaration statements to the source code. These will be used throughout the project. Most of the include files are a part of the VC++ system. However, the last two items (speech.h and resource.h) are unique. The speech.h file ships with the SDK. The resource.h file contains information about the two dialog boxes used in the project. The define statements establish some constant values that will be used to track timer events later in the project. Listing 19.9 shows the header and include code for the project.


Listing 19.9. The include and header code for the VCMDDEMO project.
/***********************************************************************
Demo.Cpp - Code to quickly demo voice commands.

Copyright (c) 1995 by Microsoft Corporation

*/


#include <windows.h>
#include <string.h>
#include <stdio.h>
#include <mmsystem.h>
#include <initguid.h>
#include <objbase.h>
#include <objerror.h>
#include <ole2ver.h>

#include <speech.h>
#include "resource.h"

#define  TIMER_CHANGECOMMAND     (52)
#define  TIMER_CLEARRESULT       (53)

The VCMDDEMO project uses notification callbacks to receive messages from the SR engine. Since the SAPI system is based on the Component Object Model (COM), you'll need to include some code that creates the needed class object and associated methods. Listing 19.10 shows the code needed to declare the class and methods of the event notification routine.


Listing 19.10. Declaring the event notification class and methods.
// Voice Command notifications
class CIVCmdNotifySink : public IVCmdNotifySink {
    private:
    DWORD   m_dwMsgCnt;
    HWND    m_hwnd;

    public:
    CIVCmdNotifySink(void);
    ~CIVCmdNotifySink(void);


    // IUnknown members that delegate to m_punkOuter
    // Non-delegating object IUnknown
    STDMETHODIMP         QueryInterface (REFIID, LPVOID FAR *);
    STDMETHODIMP_(ULONG) AddRef(void);
    STDMETHODIMP_(ULONG) Release(void);

    // IVCmdNotifySink members
    STDMETHODIMP CommandRecognize (DWORD, PVCMDNAME, DWORD, DWORD, PVOID,
                                   DWORD,PSTR, PSTR);
    STDMETHODIMP CommandOther     (PVCMDNAME, PSTR);
    STDMETHODIMP MenuActivate     (PVCMDNAME, BOOL);
    STDMETHODIMP UtteranceBegin   (void);
    STDMETHODIMP UtteranceEnd     (void);
    STDMETHODIMP CommandStart     (void);
    STDMETHODIMP VUMeter          (WORD);
    STDMETHODIMP AttribChanged    (DWORD);
    STDMETHODIMP Interference     (DWORD);
};
typedef CIVCmdNotifySink * PCIVCmdNotifySink;

There is one more step needed as part of the initial declarations. The VCMDDEMO project must declare a list of commands to be loaded into the menu. These are the commands that the SR engine will be able to recognize when a user speaks. The project also uses a handful of other global declarations. Listing 19.11 shows the final set of declarations for the project.


Listing 19.11. Declaring the menu commands for the project.
/*************************************************************************
Globals */

HINSTANCE         ghInstance;                // instance handle
CIVCmdNotifySink  gVCmdNotifySink;
HWND              ghwndResultsDisplay = NULL;
HWND              ghwndDialog = NULL;
PIVOICECMD        gpIVoiceCommand = NULL;
PIVCMDDIALOGS     gpIVCmdDialogs = NULL;
PIVCMDMENU        gpIVCmdMenu = NULL;
char              *gpszCommands = NULL; // Commands
char              *gpszCurCommand = NULL;  // current command that looking at

char  gszDefaultSet[] = // default command set
   "Help\r\n"
   "Minimize window.\r\n"
   "Maximize window.\r\n"
   "What time is it?\r\n"
   "What day is it?\r\n"
   "Create a new file.\r\n"
   "Delete the current file\r\n"
   "Open a file\r\n"
   "Switch to Word.\r\n"
   "Switch to Excel.\r\n"
   "Switch to calculator.\r\n"
   "Change the background color.\r\n"
   "Go to sleep.\r\n"
   "Wake up.\r\n"
   "Print the document.\r\n"
   "Speak the text.\r\n"
   "Paste\r\n"
   "Copy\r\n";

 BOOL        bNonFatalShutDown = FALSE;
 int CheckNavigator(void);
 DWORD VCMDState(PIVCMDATTRIBUTES);

The WinMain Procedure of the VCMDDEMO Project

The main routine of the project is quite simple. First, the project performs basic initialization by starting the OLE session, initializing the SR engine, and making sure it is up and running. After the OLE routines are done, the default list of commands is loaded, and the first dialog box is presented. This first dialog box simply displays a list of possible commands and listens to see if the user utters any of them (see Figure 19.2).

Figure 19.2 : Main dialog box of the VCMDDEMO project.

This same dialog box has a button that the user can press to bring up a secondary dialog box. This second dialog box allows the user to create a custom list of commands to recognize. Once the commands are entered, they are used to replace the default set, and the listening dialog box returns (see Figure 19.3).

Figure 19.3 : Secondary dialog box of the VCMDDEMO project.

Once the user exits the top dialog box, string resources are freed and the OLE objects are released. Listing 19.12 shows the complete WinMain procedure.


Listing 19.12. The VCMDDEMO project WinMain procedure.
/*************************************************************************
winmain - Windows main code.
*/

int PASCAL WinMain(HINSTANCE hInstance, HINSTANCE hPrevInstance,
                   LPSTR lpszCmdLine, int nCmdShow)
{
ghInstance = hInstance;

// try to begin ole

   if (!BeginOLE())
   {
      if(!bNonFatalShutDown)
          MessageBox (NULL, "Can't open. OLE or a VoiceCommands call failed", NULL, MB_OK);
      return 1;
   }


// Create a menu out of the default
gpszCommands = (char*) malloc (strlen(gszDefaultSet)+1);
if (!gpszCommands)
   return 1;
strcpy (gpszCommands, gszDefaultSet);
gpszCurCommand = gpszCommands;

// Bring up the dialog box
   DialogBox (hInstance, MAKEINTRESOURCE(IDD_VCMD),
      NULL, (FARPROC) DialogProc);

if (gpszCommands)
   free (gpszCommands);

// try to close ole
   if (!EndOLE())
      MessageBox (NULL, "Can't shut down OLE.", NULL, MB_OK);

   return 0;
}

Starting and Ending the OLE Session

All C++ programs need to use the OLE services of Windows to access SAPI services. The BeginOLE routine of the VCMDDEMO project covers a lot of ground. This one routine handles OLE initialization, the registration of SAPI services, creation of several SR objects, checking the registry for SAPI-related entries, and checking the status of the SR engine on the workstation.

The first step is to establish the start of OLE services. Listing 19.13 shows this part of the BeginOLE routine.


Listing 19.13. Starting the OLE services.
/*************************************************************************
BeginOLE - This begins the OLE and creates the voice commands object,
   registers with it, and creates a temporary menu.

inputs
   none
returns
   BOOL - TRUE if it succeeds
*/

BOOL BeginOLE (void)
{
   DWORD    dwVer; // OLE version
   HRESULT  hRes;
   VCMDNAME VcmdName; // Command Name
   LANGUAGE Language; // language to use
   PIVCMDATTRIBUTES  pIVCmdAttributes;


   gpIVoiceCommand = NULL;
   gpIVCmdDialogs = NULL;
   gpIVCmdMenu = NULL;

// Initialize OLE

   SetMessageQueue(96);
   dwVer = CoBuildVersion();

   if (rmm != HIWORD(dwVer)) return FALSE;         // error

   if (FAILED(CoInitialize(NULL))) return FALSE;

The next step is to create a Voice Command object, get a pointer to one of the training dialog boxes provided by SAPI, and register the VCMDDEMO application to receive notifications when the SR engine recognizes a command. Listing 19.14 shows this part of the BeginOLE routine.


Listing 19.14. Creating the Voice Command object and registering the application.
// Create the voice commands object
if (CoCreateInstance(CLSID_VCmd,
   NULL,
   CLSCTX_LOCAL_SERVER,
   IID_IVoiceCmd,
   (LPVOID *)&gpIVoiceCommand) != S_OK)
      goto fail;

// Get the dialogs object
hRes = gpIVoiceCommand->QueryInterface(
   IID_IVCmdDialogs, (LPVOID FAR *)&gpIVCmdDialogs);
if (hRes)
   goto fail;

// Register
hRes = gpIVoiceCommand->Register("", &gVCmdNotifySink,
      IID_IVCmdNotifySink, VCMDRF_ALLMESSAGES, NULL);
if (hRes)
   goto fail;

If all that goes well, the next section of code creates a link to a command object attribute interface and checks the status of SAPI services on the workstation. The CheckNavigator routine checks the registry to see if speech services are present. If they are, the VCMDState routine is used to return a value indicating the status of SAPI services. Based on the return value, several different messages are displayed in dialog boxes for the user to review. These two routines are reviewed later in this chapter. Listing 19.15 shows the next part of the BeginOLE routine.


Listing 19.15. Checking the attributes of the SR engine.
//The following code checks for a navigator app and
//checks the state of voice commands

hRes = gpIVoiceCommand->QueryInterface(
   IID_IVCmdAttributes, (LPVOID FAR *)&pIVCmdAttributes);
if (pIVCmdAttributes)
{
       int     iRes;
    DWORD    dwRes;

    iRes = CheckNavigator();

    if (iRes == -1)//navigator not installed or has never been run(entries not in registry)
    {
         MessageBox(NULL, "A navigator application is not installed on your system or it has \
been installed but has not been run. \r\nVCMD Demo can not continue", "Error", MB_OK | MB_ICONSTOP);
        pIVCmdAttributes->Release();
        bNonFatalShutDown = TRUE;
        goto fail;
    }

    else if(iRes == 0)// navigator installed but not running
    {
         int iMBRes;

         iMBRes = MessageBox(NULL, "A navigator application is installed but not running. \
You can press \"Cancel\" and enable speech recognition by starting the navigator application or \
press \"OK\" and VCMD Demo will enable Speech Recognition without starting the navigator application", "Speech Recognition Status", MB_OKCANCEL | MB_ICONQUESTION);

        if(iMBRes == IDCANCEL)
        {
             pIVCmdAttributes->Release();
            bNonFatalShutDown = TRUE;
            goto fail;
        }
        else if(iMBRes == IDOK)
        {
            pIVCmdAttributes->EnabledSet( TRUE );
              pIVCmdAttributes->AwakeStateSet( TRUE );
        }
    }

    else if (iRes == 1)// navigator installed and running
    {
         dwRes = VCMDState(pIVCmdAttributes);
        if(dwRes == 0)
             MessageBox(NULL, "Speech recognition is currently turned off. Please turn it on using the Navigator application.", "Speech Recognition Status", MB_ICONINFORMATION);
        else if(dwRes == 1)
            MessageBox(NULL, "Speech recognition is currently in standby mode. Please turn it on using the Navigator application.", "Speech Recognition Status", MB_ICONINFORMATION);
        else if(dwRes == 3)
        {
            MessageBox(NULL, "Voice Commands Call failed. This application will terminate.", "Error", MB_ICONSTOP | MB_OK);
            pIVCmdAttributes->Release();
            goto fail;
        }
    }

       pIVCmdAttributes->Release();
};

Finally, the routine creates a menu object to hold the command list. The final part of the routine contains code that is invoked in case of errors. This code releases any collected resources. Listing 19.16 shows the final portion of the BeginOLE routine.


Listing 19.16. Creating the Voice Command menu object.
// Create a menu object
lstrcpy (VcmdName.szApplication, "Voice Commands Demo");
lstrcpy (VcmdName.szState, "Main");
Language.LanguageID = LANG_ENGLISH;
lstrcpy (Language.szDialect, "US English");
hRes = gpIVoiceCommand->MenuCreate( &VcmdName,
&Language,
VCMDMC_CREATE_TEMP,
&gpIVCmdMenu
     );
if (hRes)
goto fail;
return TRUE;

// else failed
fail:
if (gpIVoiceCommand)
gpIVoiceCommand->Release();
if (gpIVCmdDialogs)
gpIVCmdDialogs->Release();
if (gpIVCmdMenu)
gpIVCmdMenu->Release();
gpIVoiceCommand = NULL;
gpIVCmdDialogs = NULL;
gpIVCmdMenu = NULL;
return FALSE;
}
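The failure path repeats the same "release if set, then clear the pointer" pattern for each interface. In newer C++ that pattern is often folded into a small helper. The sketch below is one possible form, not something the SDK sample provides: safeRelease is a hypothetical name, and FakeInterface is a stand-in that only counts references so the behavior can be observed without COM.

```cpp
// Hypothetical stand-in for a COM interface: only the reference count
// matters for this illustration.
struct FakeInterface {
    unsigned long refs = 1;
    unsigned long Release() { return --refs; }
};

// Releases the interface if the pointer is set, then clears the pointer
// so that calling the helper a second time is harmless.
template <typename T>
void safeRelease(T*& p) {
    if (p) {
        p->Release();
        p = nullptr;
    }
}
```

Clearing each pointer right after releasing it is what makes it safe for both the failure path and a later cleanup routine to run the same code.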

After the user exits the main dialog box, the WinMain routine calls the EndOLE procedure. This procedure releases SAPI resources and closes out the OLE session. Listing 19.17 shows the code for the EndOLE procedure.


Listing 19.17. The EndOLE procedure.
/*************************************************************************
EndOLE - This closes up the OLE and frees everything else.

inputs
   none
returns
   BOOL - TRUE if succeed
*/

BOOL EndOLE (void)
{
// Free the interfaces
   if (gpIVoiceCommand)
      gpIVoiceCommand->Release();
   if (gpIVCmdDialogs)
      gpIVCmdDialogs->Release();
   if (gpIVCmdMenu)
      gpIVCmdMenu->Release();
   gpIVoiceCommand = NULL;
   gpIVCmdDialogs = NULL;
   gpIVCmdMenu = NULL;

// Free up all of OLE

   CoUninitialize ();

   return TRUE;
}

Checking the Status of Speech Services

The VCMDDEMO project contains two routines that check the status of speech services on the workstation. The first routine (CheckNavigator) checks the Windows registry to see if speech services have been installed on the machine. Listing 19.18 shows the CheckNavigator procedure.


Listing 19.18. The CheckNavigator procedure.
/****************************************************************************
*    CheckNavigator:
*
*    Checks the registry entries to see if a navigator application
*    has been installed on the machine. If the Navigator is installed
*    CheckNavigator returns its state(0 [not running], 1 [running]) else if no
*    navigator is found it returns -1.
****************************************************************************/
int CheckNavigator(void)
{
    HKEY  hKey;
    DWORD dwType=REG_DWORD, dwSize=sizeof(DWORD), dwVal;

    if( RegOpenKeyEx(HKEY_CURRENT_USER, "Software\\Voice", 0, KEY_READ, &hKey) != ERROR_SUCCESS )
        return -1;

    if( RegQueryValueEx (hKey, "UseSpeech", 0, &dwType, (LPBYTE)&dwVal, &dwSize) != ERROR_SUCCESS )
    {
        RegCloseKey (hKey);   // close the key on the failure path too
        return -1;
    }

    RegCloseKey (hKey);

    return (int)dwVal;
}

The second routine in VCMDDEMO that checks the status of speech services is the VCMDState procedure. This routine uses the EnabledGet and AwakeStateGet methods of the Attributes interface to see whether the SR engine is turned off, asleep, or already listening for audio input. Listing 19.19 shows the code for the VCMDState routine.


Listing 19.19. The VCMDState procedure.
/****************************************************************************
*    VCMDState:
*
*    Determines what listening state Voice Commands is in. Returns an int
*    specifying a state( 0 [not listening state], 1 [sleep state], 2 [listening
*    state]) or in case of error returns 3.
****************************************************************************/

DWORD VCMDState(PIVCMDATTRIBUTES pIVCmdAttributes)
{
     DWORD dwAwake, dwEnabled;

    dwAwake = dwEnabled = 0;
    if((FAILED(pIVCmdAttributes->EnabledGet(&dwEnabled))) || (FAILED(pIVCmdAttributes->AwakeStateGet(&dwAwake))))
         return 3;// function failed
    else
    {
        if(dwEnabled == 0)
            return 0; //not listening state
        else if(dwEnabled == 1 && dwAwake == 0)
            return 1; //sleep state
        else
            return 2; //listening state
    }
}
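The decision table inside VCMDState is easier to check when it is separated from the interface calls. The sketch below restates it in portable C++. The ListenState names and the classify function are hypothetical, and the callsSucceeded parameter stands in for the two FAILED() checks in the listing.

```cpp
// Hypothetical names for the four values that VCMDState returns.
enum class ListenState { Off = 0, Asleep = 1, Listening = 2, Error = 3 };

// Same decision table as VCMDState, with the two interface calls replaced
// by plain parameters. 'callsSucceeded' stands in for the FAILED() checks.
ListenState classify(bool callsSucceeded,
                     unsigned long enabled,
                     unsigned long awake) {
    if (!callsSucceeded)            return ListenState::Error;
    if (enabled == 0)               return ListenState::Off;
    if (enabled == 1 && awake == 0) return ListenState::Asleep;
    return ListenState::Listening;
}
```

Named values such as ListenState::Asleep make the later comparisons in BeginOLE (dwRes == 0, dwRes == 1, dwRes == 3) easier to read than bare numbers.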

Handling the Main Dialog Box Events

Once the main dialog box starts, there are four events to handle. First, when the dialog box is initialized, the commands are loaded into the menu object and the timer is started. Second, the user can press one of the three command buttons on the form. If Cancel is selected, the dialog box closes and the program ends. If the Train button is selected, a general dialog box (supplied by the engine) is called. If the Change button is pressed, the secondary dialog box is presented.

Third, while waiting for the user to press a button, the timer event fires every two seconds (the 2000-millisecond interval passed to SetTimer). Each time the timer event occurs, the program displays a new command on the main form and waits for the user to speak the phrase. Fourth, when the dialog box is destroyed, the timer is canceled. Listing 19.20 shows the code for this procedure.


Listing 19.20. Handling the main dialog box events.
/*************************************************************************
DialogProc
*/
BOOL CALLBACK DialogProc(HWND hWnd, UINT uMsg, WPARAM wParam, LPARAM lParam)
{
switch (uMsg) {
   case WM_INITDIALOG:
      ghwndResultsDisplay = GetDlgItem (hWnd, IDC_HEARD);
      ghwndDialog = hWnd;

      if (UseCommands (gpszCommands, gpIVCmdMenu))
         return 1;   // error

      SetTimer (hWnd, TIMER_CHANGECOMMAND, 2000, NULL);
      PostMessage (hWnd, WM_TIMER, TIMER_CHANGECOMMAND, 0);
      return FALSE;
   case WM_COMMAND:
         switch (LOWORD(wParam))
            {
            case IDC_CHANGE:
               // Change commands dialog box
               DialogBox (ghInstance, MAKEINTRESOURCE(IDD_CHANGE),
                  hWnd, (FARPROC) ChangeProc);
               return TRUE;
            case IDC_TRAIN:
               gpIVCmdDialogs->GeneralDlg (hWnd, "Demo Training & General Control");
               return TRUE;
            case IDCANCEL:
               EndDialog (hWnd, IDCANCEL);
               return TRUE;
            }
      break;
   case WM_TIMER:
      if (wParam == TIMER_CHANGECOMMAND) {
         char     *pszToDisplay;
         DWORD    dwSize;
         char     cTemp;

         // go to the next command
         if (!gpszCurCommand)
            gpszCurCommand = gpszCommands;
         gpszCurCommand = NextCommand (gpszCurCommand,
            &pszToDisplay, &dwSize);
         if (gpszCurCommand) {
            cTemp = pszToDisplay[dwSize];
            pszToDisplay[dwSize] = '\0';
            SetDlgItemText (hWnd, IDC_COMMAND, pszToDisplay);
            pszToDisplay[dwSize] = cTemp;
            };
         }
      else {
         // clear the static
         KillTimer (hWnd, TIMER_CLEARRESULT);
         SetDlgItemText (hWnd, IDC_HEARD, "");
         };
      return TRUE;
   case WM_DESTROY:
      KillTimer (hWnd, TIMER_CHANGECOMMAND);
      break;   // continue on
   };

return FALSE;  // didn't handle
}
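The WM_TIMER handler steps through a command list in which each command ends with a carriage return and line feed, and it starts over from the top after the last one. The sketch below shows that idea in portable C++ using standard containers instead of raw pointers into the buffer. The names splitCommands and CommandCycler are hypothetical, and this is not the sample's NextCommand routine. As a simplification, the sketch wraps to the first command immediately, whereas the listing resets its pointer on the following timer tick.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Splits a "\r\n"-separated command list into individual commands,
// skipping empty entries.
std::vector<std::string> splitCommands(const std::string& list) {
    std::vector<std::string> commands;
    std::size_t start = 0;
    while (start < list.size()) {
        std::size_t end = list.find("\r\n", start);
        if (end == std::string::npos) end = list.size();
        if (end > start) commands.push_back(list.substr(start, end - start));
        start = end + 2;   // step past the "\r\n" separator
    }
    return commands;
}

// Hypothetical cursor that hands out one command per timer tick and
// wraps around to the first command after the last one.
class CommandCycler {
public:
    explicit CommandCycler(std::vector<std::string> commands)
        : commands_(std::move(commands)) {}

    // Returns the next command, or an empty string if the list is empty.
    std::string next() {
        if (commands_.empty()) return std::string();
        const std::string& current = commands_[index_];
        index_ = (index_ + 1) % commands_.size();
        return current;
    }

private:
    std::vector<std::string> commands_;
    std::size_t index_ = 0;
};
```

Splitting the list once up front trades a little memory for simpler timer code, since the handler no longer has to patch a temporary null terminator into the shared buffer before displaying each command.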

Handling the Change Dialog Box Events

The secondary dialog box allows the user to create a new, customized menu. When the OK button is pressed, the new command list is copied to the menu object using the UseCommands routine. Listing 19.21 contains the code for the ChangeProc procedure.


Listing 19.21. Handling the change dialog events.
/*************************************************************************
ChangeProc
*/
BOOL CALLBACK ChangeProc(HWND hWnd, UINT uMsg, WPARAM wParam, LPARAM lParam)
{
switch (uMsg) {
   case WM_INITDIALOG:
      SetDlgItemText (hWnd, IDC_EDIT, gpszCommands);
      return FALSE;
   case WM_COMMAND:
         switch (LOWORD(wParam))
            {
            case IDOK:
               {
               char     *pszNew;
               DWORD    dwSize;
               // Throw out the old buffer & copy the
               // new one in. Then set us to use it
               pszNew = (char*) malloc (dwSize =
                  GetWindowTextLength(GetDlgItem(hWnd, IDC_EDIT)) + 1);
               if (pszNew) {
                  GetDlgItemText (hWnd, IDC_EDIT, pszNew, dwSize);
                  free (gpszCommands);
                  gpszCommands = pszNew;
                  gpszCurCommand = pszNew;
                  if (UseCommands (gpszCommands, gpIVCmdMenu))
                     return 1;   // error

                  };
               EndDialog (hWnd, IDOK);
               }
               return TRUE;
            case IDCANCEL:
               EndDialog (hWnd, IDCANCEL);
               return TRUE;
            }
      break;
   };

return FALSE;  // didn't handle
}

Handling the Menu Commands

The VCMDDEMO project uses three routines to handle the process of loading commands into the voice menu object and responding to recognized spoken commands. The UseCommands procedure is the high-level routine that loads the voice menu object. There are five steps to complete for loading the menu. First, the current menu is deactivated. Then the number of commands currently in the menu is retrieved (pMenu->Num). Next, all the commands in the existing menu are removed using the pMenu->Remove method.

Once all commands are removed, the GetCommands procedure is called to collect all the new commands into a single data block. This block is then used as the source for adding the new commands (pMenu->Add). Notice that the C++ Add method allows you to add all the commands at once by passing the total number of commands along with the data block. After the data is loaded, the memory is freed and the menu is reactivated using the pMenu->Activate method. Listing 19.22 shows how this looks in the VCMDDEMO code.


Listing 19.22. The UseCommands procedure.
/************************************************************************
UseCommands - This accepts a NULL-terminated string with commands
   separated by new-lines and loads them into the voice-menu object,
   replacing any old commands.

inputs
   char     *pszCommands - String.
   PIVCMDMENU  pMenu - Menu
returns
   HRESULT - error
*/
HRESULT UseCommands (char *pszCommands, PIVCMDMENU pMenu)
{
HRESULT  hRes;
SDATA    data;
DWORD    dwNum, dwStart;

hRes = pMenu->Deactivate ();
if (hRes) return hRes;

hRes = pMenu->Num (&dwNum);
if (hRes) return hRes;

if (dwNum)
   hRes = pMenu->Remove (1, dwNum, VCMD_BY_POSITION);
if (hRes) return hRes;

if (!GetCommands(pszCommands, &data, &dwNum))
   return ResultFromScode (E_OUTOFMEMORY);

hRes = pMenu->Add (dwNum, data, &dwStart);

// free memory (even if Add failed, so the block is not leaked)
free (data.pData);
if (hRes) return hRes;

hRes = pMenu->Activate(ghwndDialog, 0);
return hRes;
}
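
The order of operations in UseCommands can be tried out without the Speech SDK. The following is only a minimal sketch: FakeMenu and ReplaceCommands are hypothetical names invented for this illustration and are not part of SAPI. Refusing an empty command list is also an assumption of the sketch, not something the VCMDDEMO code does.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the voice menu object; not the SAPI interface.
struct FakeMenu
{
    bool active = false;
    std::vector<std::string> commands;
};

// Replace the whole command list in the same order UseCommands uses:
// deactivate, remove the old commands, add the new ones, reactivate.
// Returns false and leaves the menu inactive if the new list is empty
// (a choice made for this sketch only).
bool ReplaceCommands(FakeMenu &menu, const std::vector<std::string> &fresh)
{
    menu.active = false;      // deactivate while the menu is edited
    menu.commands.clear();    // remove every existing command
    if (fresh.empty())
        return false;         // nothing to load
    // add all new commands in a single step
    menu.commands.insert(menu.commands.end(), fresh.begin(), fresh.end());
    menu.active = true;       // reactivate the menu
    return true;
}
```

Deactivating first means the menu is never live while it is only partly loaded, which is the same reason the SDK sample calls Deactivate before Remove and Add.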

The GetCommands procedure converts the text strings stored in the memory block into the menu commands structure understood by the SAPI system. The first step is a call to NextCommand to get a command line to load. Then, after computing the total length of the command, a series of steps is executed to build a valid menu structure. This continues in a loop until the NextCommand procedure reports that all command strings have been converted. Listing 19.23 shows the source code for the GetCommands procedure.


Listing 19.23. The GetCommands procedure.
/*****************************************************************
GetCommands - Takes a block of memory containing command strings and
   converts it into a list of VCMDCOMMAND structures.

inputs
   char     *pszMemory - NULL terminated string. Commands are
               separated by \n or \r.
   PSDATA   pData - This is filled in with a pointer to memory and
               size for the vcmdcommand structure. The memory
               must be freed by the caller with free().
   DWORD    *pdwNumCommands - Filled with the number of commands
*/
BOOL GetCommands(char *pszMemory, PSDATA pData, DWORD *pdwNumCommands)
{
    PSTR pTemp;
    DWORD dwTotal, dwSize, dwSizeDesc, dwSizeCat;
    DWORD dwSizeCmd;
    PVCMDCOMMAND pCmd, pCmdNew;
    CHAR    *pszBegin;
    DWORD   dwCmdSize;
    DWORD   dwCmds = 0;  // Current count
    DWORD   dwCount = 1; // Command number
    char    szCat[] = "Main";

    dwTotal = dwSize = 0;

    pTemp = (PSTR)malloc(0);
    if (!pTemp)
        return FALSE;

    pCmd = (PVCMDCOMMAND)pTemp;
    for( ;; ) {
        pszMemory = NextCommand (pszMemory, &pszBegin, &dwCmdSize);
        if (!pszMemory)
            break;   // no more

        // size of header
        dwSize = sizeof(VCMDCOMMAND);

        // get command length
        dwSizeCmd = (dwCmdSize + 1);

        // doubleword align
        dwSizeCmd += 3;
        dwSizeCmd &= (~3);
        dwSize += dwSizeCmd;

        // get description length
        dwSizeDesc = (dwCmdSize + 1);

        // doubleword align
        dwSizeDesc += 3;
        dwSizeDesc &= (~3);
        dwSize += dwSizeDesc;

        // get category length
        dwSizeCat = lstrlen(szCat) + 1;

        // doubleword align
        dwSizeCat += 3;
        dwSizeCat &= (~3);
        dwSize += dwSizeCat;

        // action indicator
        dwSize += sizeof(DWORD);

        // accumulate total size
        dwTotal += dwSize;

        // reallocate enough memory to hold this command
        pTemp = (PSTR)realloc((PVOID)pCmd, dwTotal);
        if (!pTemp) {
            free (pCmd);   // realloc failed; release the old block
            return FALSE;
        }

        // fill in the new command
        pCmd = (PVCMDCOMMAND)pTemp;
        pTemp += (dwTotal-dwSize);
        pCmdNew = (PVCMDCOMMAND)pTemp;
        memset (pCmdNew, 0, dwSize);

        pCmdNew->dwSize = dwSize;
        pCmdNew->dwFlags = 0;
        pCmdNew->dwAction = (DWORD)(pCmdNew->abData-(PBYTE)pTemp);
        pCmdNew->dwActionSize = sizeof(DWORD);
        pCmdNew->dwCommandText = NULL;

        // point past header to begin of data
        pTemp += (pCmdNew->abData-(PBYTE)pTemp);

        // action index
        *(DWORD *)pTemp = dwCount++;
        pTemp += sizeof(DWORD);

        // command
        pCmdNew->dwCommand = (DWORD)((PBYTE)pTemp - (PBYTE)pCmdNew);
        strncpy(pTemp, pszBegin, dwCmdSize);
        pTemp += dwSizeCmd;

        // description
        pCmdNew->dwDescription = (DWORD)((PBYTE)pTemp - (PBYTE)pCmdNew);
        strncpy(pTemp, pszBegin, dwCmdSize);
        pTemp += dwSizeDesc;

        // category
        pCmdNew->dwCategory = (DWORD)((PBYTE)pTemp - (PBYTE)pCmdNew);
        strcpy(pTemp, szCat);

        // we just added another command
        dwCmds++;
    }

    pData->pData = (PVOID)pCmd;
    pData->dwSize = dwTotal;
    *pdwNumCommands = dwCmds;
    return TRUE;
}
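
The size arithmetic in GetCommands is easier to follow in isolation. The sketch below shows the same round-up-to-four idiom and the per-command size total. AlignDword and RecordSize are hypothetical helper names, and headerSize is a stand-in for sizeof(VCMDCOMMAND), whose actual value depends on the SDK headers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Round n up to the next multiple of 4 (doubleword alignment).
// This is the same "+= 3; &= ~3" idiom used in GetCommands.
inline std::size_t AlignDword(std::size_t n)
{
    return (n + 3) & ~static_cast<std::size_t>(3);
}

// Hypothetical helper: bytes needed for one packed command record made of
// a fixed header, a DWORD-sized action value, and three padded strings.
inline std::size_t RecordSize(std::size_t headerSize,
                              const char *command,
                              const char *category)
{
    // command text plus its terminating zero, padded to a doubleword
    std::size_t cmd = AlignDword(std::strlen(command) + 1);
    // the description reuses the command text, so it has the same size
    std::size_t desc = cmd;
    // category text plus its terminating zero, padded to a doubleword
    std::size_t cat = AlignDword(std::strlen(category) + 1);
    return headerSize + sizeof(std::uint32_t) + cmd + desc + cat;
}
```

For example, with an assumed 32-byte header, the command "open" in category "Main" needs 32 + 4 + 8 + 8 + 8 = 60 bytes, because each five-byte string (four characters plus the terminator) is padded to eight.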

The final routine that handles the processing of menu commands is the NextCommand procedure. This routine skips any leading newline or carriage-return characters and then scans the command list data block until the next newline, carriage return, or end of the string is found. The resulting string is assumed to be a valid command. A pointer to the start of the command and its length are returned, along with the position from which the next call to this routine should continue. Listing 19.24 shows the code for the NextCommand routine.


Listing 19.24. The NextCommand procedure.
/****************************************************************
NextCommand - This looks in the memory and finds the next command.

inputs
   CHAR     *pszMemory - Memory to start looking at
   PCHAR    *pBegin - Filled in with a pointer to the
         beginning of the command string.
   DWORD    *pdwSize - Filled in with the number of bytes in
         the string (excluding any NULL termination)
returns
   CHAR * - The next place that NextCommand should be called from,
         or NULL if no command string was found.
*/
CHAR * NextCommand (CHAR *pszMemory, PCHAR *pBegin,
   DWORD *pdwSize)
{
DWORD i;

for( ;; ) {
   // try to find a non-newline
   while ((*pszMemory == '\n') || (*pszMemory == '\r')) {
      if (*pszMemory == '\0')
         return NULL;
      pszMemory++;
      };

   // Try to find a new-line
   for (i = 0;
      (pszMemory[i] != '\n') && (pszMemory[i] != '\r') && (pszMemory[i] != '\0');
      i++);
   if (!i) {
      if (!pszMemory[i])
         return NULL;   // end
      pszMemory++;
      continue;   // try again
      };

   // Else, we've found a string
   *pBegin = pszMemory;
   *pdwSize = i;
   return pszMemory + i;
   };
}
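
If you want to experiment with the same scanning logic outside the SDK sample, it can be written with the standard library. This is a sketch only: SplitCommands is a hypothetical name, and it returns all the commands at once instead of one per call as NextCommand does.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Split text into commands separated by '\n' or '\r', skipping empty lines.
std::vector<std::string> SplitCommands(const std::string &text)
{
    std::vector<std::string> commands;
    const char *seps = "\r\n";
    std::string::size_type pos = 0;

    while (pos < text.size()) {
        // skip any run of separators to find the start of a command
        std::string::size_type start = text.find_first_not_of(seps, pos);
        if (start == std::string::npos)
            break;   // only separators were left
        // the command ends at the next separator or at the end of the text
        std::string::size_type end = text.find_first_of(seps, start);
        if (end == std::string::npos)
            end = text.size();
        commands.push_back(text.substr(start, end - start));
        pos = end;
    }
    return commands;
}
```

Calling SplitCommands("open\r\nclose\n\nsave") yields the three strings "open", "close", and "save". The blank line is skipped in the same way NextCommand skips consecutive newline characters.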

The Notification Events

The final code section to review is the code that handles the various notification events of the voice command menu object. In this program, most of the events are ignored. However, the two most important events, CommandRecognize and CommandOther, contain code that displays the command spoken. The standard COM methods (QueryInterface, AddRef, and Release) are also coded. Listing 19.25 contains the code for the notification events.


Listing 19.25. Handling the notification events.
/**************************************************************************
 *  Voice Command notification objects
 **************************************************************************/

CIVCmdNotifySink::CIVCmdNotifySink (void)
{
    m_dwMsgCnt = 0;
}

CIVCmdNotifySink::~CIVCmdNotifySink (void)
{
// this space intentionally left blank
}

STDMETHODIMP CIVCmdNotifySink::QueryInterface (REFIID riid, LPVOID *ppv)
{
    *ppv = NULL;

    /* always return our IUnknown for IID_IUnknown */
    if (IsEqualIID (riid, IID_IUnknown) || IsEqualIID(riid,IID_IVCmdNotifySink)) {
        *ppv = (LPVOID) this;
        return NOERROR;
    }

    // otherwise, can't find
    return ResultFromScode (E_NOINTERFACE);
}

STDMETHODIMP_ (ULONG) CIVCmdNotifySink::AddRef (void)
{
    // normally this increases a reference count, but this object
    // is going to be freed as soon as the app is freed, so it doesn't
    // matter
    return 1;
}

STDMETHODIMP_(ULONG) CIVCmdNotifySink::Release (void)
{
    // normally this releases a reference count, but this object
    // is going to be freed when the application is freed so it doesn't
    // matter
    return 1;
};


STDMETHODIMP CIVCmdNotifySink::CommandRecognize(DWORD dwID, PVCMDNAME pName,
   DWORD dwFlags, DWORD dwActionSize, PVOID pAction, DWORD dwNumLists,
   PSTR pszListValues, PSTR pszCommand)
{
// This is called when a recognition occurs for the current application

if (!ghwndResultsDisplay)
   return NOERROR;

SetWindowText (ghwndResultsDisplay,
   pszCommand ? pszCommand : "[Unrecognized]");

// Kill the timer & restart it
if (ghwndDialog) {
   KillTimer (ghwndDialog, TIMER_CLEARRESULT);
   SetTimer (ghwndDialog, TIMER_CLEARRESULT, 2000, NULL);
   };

return NOERROR;
}


STDMETHODIMP CIVCmdNotifySink::CommandOther(PVCMDNAME pName, PSTR pszCommand)
{
// This is called when a recognition occurs for another application,
// or an unknown recognition occurs

if (!ghwndResultsDisplay)
   return NOERROR;

SetWindowText (ghwndResultsDisplay,
   pszCommand ? pszCommand : "[Unrecognized]");

// Kill the timer & restart it
if (ghwndDialog) {
   KillTimer (ghwndDialog, TIMER_CLEARRESULT);
   SetTimer (ghwndDialog, TIMER_CLEARRESULT, 2000, NULL);
   };


return NOERROR;
}

STDMETHODIMP CIVCmdNotifySink::MenuActivate(PVCMDNAME pName, BOOL bActive)
{
// Called when a menu is activated or deactivated. We don't care.

    return NOERROR;
}

STDMETHODIMP CIVCmdNotifySink::AttribChanged(DWORD dwAttribute)
{
// Called when an attribute changes. We don't care.
    return NOERROR;
}

STDMETHODIMP CIVCmdNotifySink::Interference(DWORD dwType)
{
// Called when audio interference is happening. We don't care.
    return NOERROR;
}

STDMETHODIMP CIVCmdNotifySink::CommandStart(void)
{
// Called when SR starts processing a command. We don't care.
    return NOERROR;
}

STDMETHODIMP CIVCmdNotifySink::UtteranceBegin(void)
{
// Called when an utterance begins. We don't care.
    return NOERROR;
}

STDMETHODIMP CIVCmdNotifySink::UtteranceEnd()
{
// Called when an utterance finishes. We don't care.
    return NOERROR;
}

STDMETHODIMP CIVCmdNotifySink::VUMeter(WORD wLevel)
{
// Called for VU meter notifications. We don't care.
    return NOERROR;
}
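
The AddRef and Release methods in Listing 19.25 simply return 1 because the sink lives as long as the application. For comparison, the sketch below shows what a conventional reference-counted pair usually looks like. CountedSink is a hypothetical class written for this illustration and is not part of SAPI. It is also not thread-safe; Windows COM objects commonly use InterlockedIncrement and InterlockedDecrement for the counter instead.

```cpp
#include <cassert>

// Hypothetical stand-in for a COM-style object; it is not an SAPI class.
class CountedSink
{
public:
    CountedSink() : m_cRef(1) {}   // the creator holds the first reference

    unsigned long AddRef()
    {
        return ++m_cRef;
    }

    unsigned long Release()
    {
        unsigned long cRef = --m_cRef;
        if (cRef == 0)
            delete this;           // last reference gone: destroy the object
        return cRef;
    }

private:
    ~CountedSink() {}              // private: callers must use Release()
    unsigned long m_cRef;
};
```

An object of this kind is created with new and is never deleted directly. Each holder calls Release when it is finished, and the object destroys itself when the count reaches zero.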

That completes the review of the VCMDDEMO project. You can test this project by compiling the project or just by running the VCMDDEMO.EXE program. You'll find this in the SPEECH\BIN directory that was created when you installed the Microsoft Speech SDK. You can also find this program on the CD-ROM that ships with this book.

Summary

In this chapter, you learned how to write simple TTS and SR applications using C++. You reviewed (and hopefully were able to build) a simple TTS program that you can use to cut and paste any text for playback. You also reviewed (and perhaps built and tested) a simple SR interface that illustrates the techniques required to add SR services to existing applications.

In the next chapter, you'll build a complete program in Visual Basic 4.0 that uses both SR and TTS services to implement a voice-activated text reader.