How to Create a Speech-Powered Geocoding and Routing Experience in a Windows Store App

In this post, we will speech-enable a XAML-based Windows Store app to provide a speech-driven geocoding and routing experience.

Two very common activities in map-based applications are searching for locations and obtaining driving directions. In many scenarios, it can be faster and more convenient for users to provide the input for these activities via speech rather than a keyboard, especially on devices where a full keyboard is not present. In this post, we will enable speech-based geocoding and routing using Bing Maps for Windows Store Apps, the Bing Speech Recognition Control for Windows 8.1, and the speech synthesis capabilities of the Windows 8.1 SDK.

The prerequisites for building our application include:

  • Visual Studio 2013
  • The Bing Maps SDK for Windows 8.1 Store apps, along with a Bing Maps Key
  • The Bing Speech Recognition Control for Windows 8.1, along with the Client ID and client secret obtained when registering the control

We can refer to the Bing Speech Recognition Control documentation for detailed instructions on installing and registering the control and enabling projects for speech recognition.

Speech-enabling Our Project

In Visual Studio 2013, we will first create a new project using the Visual C# Windows Store Blank App (XAML) template, and will name our project SpeechGeoRoute.

We now add the following references to our project:

  • Bing Maps for C#, C++ or Visual Basic
  • Microsoft Visual C++ Runtime Package
  • Bing.Speech

We must use Configuration Manager to select an individual platform to compile for, rather than Any CPU, to satisfy a Visual C++ Runtime requirement:

[Screenshot: selecting an individual platform in Configuration Manager]

We will add Microphone and Location capabilities to our app in our Package.appxmanifest, as detailed in the Bing Speech Recognition Control documentation. We declare the "microphone" and "location" device capabilities, and add an Extensions section, as shown below:

<Capabilities>
  <Capability Name="internetClient" />
  <DeviceCapability Name="location" />
  <DeviceCapability Name="microphone" />
</Capabilities>
<Extensions>
  <Extension Category="windows.activatableClass.inProcessServer">
    <InProcessServer>
      <Path>Microsoft.Speech.VoiceService.MSSRAudio.dll</Path>
      <ActivatableClass ActivatableClassId="Microsoft.Speech.VoiceService.MSSRAudio.Encoder" ThreadingModel="both" />
    </InProcessServer>
  </Extension>
  <Extension Category="windows.activatableClass.proxyStub">
    <ProxyStub ClassId="5807FC3A-A0AB-48B4-BBA1-BA00BE56C3BD">
      <Path>Microsoft.Speech.VoiceService.MSSRAudio.dll</Path>
      <Interface Name="IEncodingSettings" InterfaceId="C97C75EE-A76A-480E-9817-D57D3655231E" />
    </ProxyStub>
  </Extension>
  <Extension Category="windows.activatableClass.proxyStub">
    <ProxyStub ClassId="F1D258E4-9D97-4BA4-AEEA-50A8B74049DF">
      <Path>Microsoft.Speech.VoiceService.Audio.dll</Path>
      <Interface Name="ISpeechVolumeEvent" InterfaceId="946379E8-A397-46B6-B9C4-FBB253EFF6AE" />
      <Interface Name="ISpeechStatusEvent" InterfaceId="FB0767C6-7FAA-4E5E-AC95-A3C0C4D72720" />
    </ProxyStub>
  </Extension>
</Extensions>
Laying Out Our UI

Our UI will be map-centric, with a right-side ScrollViewer for displaying geocoding and routing output, along with an app bar on the bottom to allow the user to trigger speech input.

[Screenshot: the app UI, with map, right-side results panel, and bottom app bar]

In our XAML code, we add:

  • XML namespace declarations for Bing.Speech.Xaml and Bing.Maps
  • A Map control, to which we add our Bing Maps Key as Credentials
  • A ProgressBar to enable users to track asynchronous geocoding and routing requests
  • A MediaElement to enable playing of our synthesized speech
  • A ListBox element which we will data bind to our geocoding results
  • A StackPanel element which we will data bind to our directions itinerary
  • A SpeechRecognizerUx element in our AppBar for speech capture
  • AppBarButtons in our AppBar for initiating speech capture, and clearing the map

Our final markup should appear as shown below:

<Page
    x:Class="SpeechGeoRoute.MainPage"
    xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
    xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
    xmlns:local="using:SpeechGeoRoute"
    xmlns:d="http://schemas.microsoft.com/expression/blend/2008"
    xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
    xmlns:bm="using:Bing.Maps"
    xmlns:bs="using:Bing.Speech.Xaml"
    mc:Ignorable="d">

    <Grid Background="{StaticResource ApplicationPageBackgroundThemeBrush}">
        <Grid.ColumnDefinitions>
            <ColumnDefinition/>
            <ColumnDefinition Width="375" />
        </Grid.ColumnDefinitions>

        <!-- Map and Progress bar -->
        <bm:Map Credentials="InsertBingMapsKeyHere" x:Name="myMap" Grid.Column="0"></bm:Map>
        <ProgressBar x:Name="progressBar" IsIndeterminate="True" Height="10" Width="300" Visibility="Collapsed" Grid.Column="0" VerticalAlignment="Top" />

        <!-- Right Side Panel -->
        <ScrollViewer Background="Gray" Grid.Column="1">
            <StackPanel Margin="10,10,20,10">
                <!-- Captured Speech -->
                <TextBlock Text="Speech Captured:" FontSize="24" Foreground="LightGray" />
                <TextBlock x:Name="txtSpeech" HorizontalAlignment="Left" FontSize="20" />
                <MediaElement x:Name="media" AutoPlay="False"></MediaElement>

                <!-- Geocode Results Panel -->
                <ListBox x:Name="lbGeocodeResults"
                         SelectionChanged="lbGeocodeResults_SelectionChanged" Margin="0,10,0,0">
                    <ListBox.ItemTemplate>
                        <DataTemplate>
                            <TextBlock Text="{Binding Name}"/>
                        </DataTemplate>
                    </ListBox.ItemTemplate>
                </ListBox>

                <!-- Route Itinerary Panel -->
                <StackPanel x:Name="spRouteResults">
                    <ListBox ItemsSource="{Binding RouteLegs}">
                        <ListBox.ItemTemplate>
                            <DataTemplate>
                                <ListBox x:Name="lbItinerary" ItemsSource="{Binding ItineraryItems}" SelectionChanged="lbItinerary_SelectionChanged">
                                    <ListBox.ItemTemplate>
                                        <DataTemplate>
                                            <TextBlock Text="{Binding Instruction.Text}"
                                                       TextWrapping="Wrap" Width="300" />
                                        </DataTemplate>
                                    </ListBox.ItemTemplate>
                                </ListBox>
                            </DataTemplate>
                        </ListBox.ItemTemplate>
                    </ListBox>
                </StackPanel>
            </StackPanel>
        </ScrollViewer>
    </Grid>

    <Page.BottomAppBar>
        <AppBar IsSticky="True" IsOpen="True">
            <StackPanel x:Name="RightPanel" Orientation="Horizontal" HorizontalAlignment="Right">
                <!-- speech control -->
                <bs:SpeechRecognizerUx x:Name="srSpeechControl" Height="85" VerticalAlignment="Center"/>
                <!-- button to capture speech -->
                <AppBarButton x:Name="btnSpeech" Icon="Microphone" Label="Find or Route" Click="btnSpeech_Click"></AppBarButton>
                <AppBarSeparator></AppBarSeparator>
                <!-- clear map button -->
                <AppBarButton x:Name="btnClearMap" Icon="Clear" Label="Clear Map" Click="btnClearMap_Click"></AppBarButton>
            </StackPanel>
        </AppBar>
    </Page.BottomAppBar>
</Page>
Adding Our C# Code

In our MainPage.xaml.cs code-behind, we add the following using statements:

using Bing.Speech;
using Bing.Maps;
using Bing.Maps.Search;
using Bing.Maps.Directions;
using System.Threading.Tasks;
using Windows.Devices.Geolocation;
using Windows.UI;               // Colors, used for the route pushpins
using Windows.UI.Xaml.Media;    // SolidColorBrush
We now declare private variables for our SpeechRecognizer and Geolocator, and add a handler for our page loaded event. This handler will:

  • instantiate our Geolocator, to enable us to find the user’s current location as the start point for routes
  • instantiate our SpeechRecognizer with our credentials we obtained when registering the control
  • associate our SpeechRecognizer with the SpeechRecognizerUx element in our XAML
  • add custom ‘Tips’ to guide users, including a tip on how to obtain driving directions by saying ‘Directions to’ and a destination place name

private SpeechRecognizer speechRecognizer;
private Geolocator geolocator;

public MainPage()
{
    this.InitializeComponent();
    this.Loaded += MainPage_Loaded;
}

void MainPage_Loaded(object sender, RoutedEventArgs e)
{
    // instantiate geolocator, to find start point for routes:
    geolocator = new Geolocator();

    // configure speech credentials:
    var credentials = new SpeechAuthorizationParameters();
    credentials.ClientId = "yourClientId";
    credentials.ClientSecret = "yourClientSecret";
    speechRecognizer = new SpeechRecognizer("en-US", credentials);
    srSpeechControl.SpeechRecognizer = speechRecognizer;

    // Add tips specific to usage:
    srSpeechControl.Tips = new string[]
    {
        "Say a place name to navigate the map to that location.",
        "If you are not getting accurate results, try using a headset microphone.",
        "Speak with a consistent volume.",
        "To obtain directions, say 'Directions to' and a place name."
    };
}
…
When the user clicks the Microphone button in our AppBar, we will use the Speech Recognition Control to capture the user’s speech, and take action.

  • if the user’s speech contains the phrase ‘Directions to’, we will attempt to obtain driving directions, using the content of the speech after this phrase as our destination
  • if the user’s speech does not contain that phrase, we will attempt to geocode the spoken location
  • if the ‘Confidence’ of the speech recognition is not sufficient, we will inform the user

private async void btnSpeech_Click(object sender, RoutedEventArgs e)
{
    try
    {
        var result = await speechRecognizer.RecognizeSpeechToTextAsync();

        // Check confidence, and proceed unless confidence is Rejected:
        if (result.TextConfidence != SpeechRecognitionConfidence.Rejected)
        {
            // Clear current map contents:
            ClearMap();

            // Show captured speech:
            txtSpeech.Text = result.Text;

            // If the user requested directions, calculate directions;
            // otherwise, attempt to geocode the input. Use a case-insensitive
            // search, as the recognizer may capitalize the phrase:
            int dirIndex = result.Text.IndexOf("directions to", StringComparison.OrdinalIgnoreCase);
            if (dirIndex > -1)
            {
                string destination = result.Text.Remove(0, dirIndex + "directions to".Length).Trim();
                Speak("Getting directions to " + destination);
                await Directions(destination);
            }
            else
            {
                await Geocode(result.Text);
            }
        }
        else
        {
            // Inform user of Rejected confidence:
            Speak("I didn't understand that.");
        }
    }
    catch (Exception)
    {
        // inform user of error, and ensure progress bar is hidden:
        Speak("Error encountered processing speech.");
        progressBar.Visibility = Visibility.Collapsed;
    }
}
When we capture speech for geocoding, we use the SearchManager to issue an asynchronous geocoding request, using the captured text as our ‘Query’ input.

It is worth noting that using speech recognition to capture location information can work very well for populated places (such as 'Atlanta, Georgia'), administrative districts (such as 'Florida' or 'Ontario'), countries, and landmarks (such as 'The Empire State Building'). It is less effective at capturing locations with numeric components, such as full street addresses or zip codes.
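
Given this, it can help to lightly tidy the recognized text before using it as a geocode query; recognizers often append sentence punctuation (for example, "Atlanta, Georgia."). The `NormalizeQuery` helper below is our own minimal sketch, not part of the original sample:

```csharp
// Hypothetical helper: tidy recognized speech before using it as a geocode query.
// Strips trailing sentence punctuation the recognizer may append, and trims whitespace.
private static string NormalizeQuery(string recognizedText)
{
    if (string.IsNullOrWhiteSpace(recognizedText))
    {
        return string.Empty;
    }

    return recognizedText.Trim().TrimEnd('.', '?', '!').Trim();
}
```

We could then call `await Geocode(NormalizeQuery(result.Text));` in our click handler instead of passing the raw recognized text.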

We analyze our response data, and for each location returned, we place a pushpin on the map, with a Tapped event handler. When tapped, the name of the location will be spoken to the user. We also use data binding to populate our geocoding results ListBox with the location data. When the selection in the ListBox is changed, the selected location will also be shown and the name spoken to the user:

private async Task Geocode(string query)
{
    if (!string.IsNullOrWhiteSpace(query))
    {
        // display progress bar:
        progressBar.Visibility = Visibility.Visible;

        // Set the address string to geocode
        GeocodeRequestOptions requestOptions = new GeocodeRequestOptions(query);

        // Make the geocode request
        SearchManager searchManager = myMap.SearchManager;
        LocationDataResponse response = await searchManager.GeocodeAsync(requestOptions);

        if (response != null &&
            response.HasError != true &&
            response.LocationData.Count > 0)
        {
            int i = 1;

            foreach (GeocodeLocation l in response.LocationData)
            {
                // Get the location of each result
                Bing.Maps.Location location = l.Location;

                // Create a pushpin for each location
                Pushpin pin = new Pushpin()
                {
                    Tag = l.Name,
                    Text = i.ToString()
                };

                i++;

                // Add a Tapped event, which will speak the name of the location:
                pin.Tapped += (s, a) =>
                {
                    var p = s as Pushpin;
                    Speak(p.Tag as string);
                };

                // Set the location of the pushpin
                MapLayer.SetPosition(pin, location);

                // Add the pushpin to the map
                myMap.Children.Add(pin);
            }

            // Set the map view based on the best view of the first location
            myMap.SetView(response.LocationData[0].Bounds);

            // Pass the results to the item source of the GeocodeResult ListBox
            lbGeocodeResults.ItemsSource = response.LocationData;
        }
        else
        {
            Speak("No geocoding results found.");
        }

        // hide progress bar:
        progressBar.Visibility = Visibility.Collapsed;
    }
    else
    {
        Speak("Error encountered geocoding input.");
    }
}
When we identify the desire to obtain driving directions, we use the DirectionsManager to issue an asynchronous request for directions, using the captured destination text as our destination waypoint, and we use the coordinates of the user’s current location, obtained from the Geolocator, as our starting waypoint.

We check our route response, and if we have successfully obtained a route, we use the first route returned, and:

  • display the route path
  • add labeled pushpins for the start and endpoints of the route
  • use data binding to populate our route results ListBox with the itinerary instructions for our route; note that we have also added an event handler such that when the selection in the ListBox is changed, the selected itinerary instruction will be shown and spoken to the user
  • Inform the user of the calculated drive time to the destination with speech
public async Task Directions(string destination)
{
    // show progress bar:
    progressBar.Visibility = Visibility.Visible;

    // get current location for starting point:
    Geoposition locCurrent = await geolocator.GetGeopositionAsync();

    // Set the start (current location) and end (spoken destination) waypoints
    Waypoint startWaypoint = new Waypoint(new Location(locCurrent.Coordinate.Point.Position.Latitude, locCurrent.Coordinate.Point.Position.Longitude));
    Waypoint endWaypoint = new Waypoint(destination);

    WaypointCollection waypoints = new WaypointCollection();
    waypoints.Add(startWaypoint);
    waypoints.Add(endWaypoint);

    // configure directions manager:
    DirectionsManager directionsManager = myMap.DirectionsManager;
    directionsManager.Waypoints = waypoints;
    directionsManager.RenderOptions.AutoSetActiveRoute = false;

    // Calculate route directions
    RouteResponse response = await directionsManager.CalculateDirectionsAsync();

    // Ensure we have a calculated route:
    if (response.HasError != true && response.Routes.Count > 0)
    {
        // Use first route returned:
        Route myRoute = response.Routes[0];

        // Display the route on the map
        directionsManager.ShowRoutePath(myRoute);

        // Add custom start and end pushpins
        Pushpin start = new Pushpin()
        {
            Text = "A",
            Background = new SolidColorBrush(Colors.Green)
        };

        myMap.Children.Add(start);
        MapLayer.SetPosition(start,
            new Bing.Maps.Location(myRoute.RouteLegs[0].ActualStart.Latitude,
                myRoute.RouteLegs[0].ActualStart.Longitude));

        Pushpin end = new Pushpin()
        {
            Text = "B",
            Background = new SolidColorBrush(Colors.Red)
        };

        myMap.Children.Add(end);
        MapLayer.SetPosition(end,
            new Bing.Maps.Location(myRoute.RouteLegs[0].ActualEnd.Latitude,
                myRoute.RouteLegs[0].ActualEnd.Longitude));

        // Pass the route to the Data context of the Route Results StackPanel:
        spRouteResults.DataContext = myRoute;

        // set the view to display the route path:
        myMap.SetView(myRoute.Bounds);

        // speak the calculated driving time to the destination:
        Speak("Driving time to destination is: " + (Math.Round(myRoute.TravelDuration / 60)).ToString() + " minutes.");
    }
    else
    {
        // no route has been calculated:
        Speak("Error finding route.");
    }

    // hide progress bar:
    progressBar.Visibility = Visibility.Collapsed;
}
In our previous code blocks, we have used synthesized speech to convey information to the user. This is accomplished by using a SpeechSynthesizer object to create an audio stream and output speech based on a plain text string:

private async void Speak(string txtSpeech)
{
    // The object for controlling the speech synthesis engine (voice).
    Windows.Media.SpeechSynthesis.SpeechSynthesizer synth = new Windows.Media.SpeechSynthesis.SpeechSynthesizer();

    // Generate the audio stream from plain text.
    Windows.Media.SpeechSynthesis.SpeechSynthesisStream stream = await synth.SynthesizeTextToStreamAsync(txtSpeech);

    // Send the stream to the media object.
    this.media.AutoPlay = true;
    media.SetSource(stream, stream.ContentType);
    media.Play();
}
We now have the core of our app in place. With the addition of a little code to handle our ListBox selection events and to clear data from the map, our application is ready to go.
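
Those remaining pieces — ClearMap, the Clear Map button handler, and the two SelectionChanged handlers — might be sketched as follows. This is our own minimal sketch, matching the element and handler names declared in the XAML above; the original sample's implementations may differ:

```csharp
// Remove pushpins and bound results from the map and panels.
// (Depending on the SDK version, a route path rendered via the
// DirectionsManager may need separate cleanup as well.)
private void ClearMap()
{
    myMap.Children.Clear();
    lbGeocodeResults.ItemsSource = null;
    spRouteResults.DataContext = null;
}

private void btnClearMap_Click(object sender, RoutedEventArgs e)
{
    ClearMap();
    txtSpeech.Text = string.Empty;
}

// When a geocode result is selected, show its extent on the map and speak its name:
private void lbGeocodeResults_SelectionChanged(object sender, SelectionChangedEventArgs e)
{
    var location = lbGeocodeResults.SelectedItem as GeocodeLocation;
    if (location != null)
    {
        myMap.SetView(location.Bounds);
        Speak(location.Name);
    }
}

// When an itinerary item is selected, speak the maneuver instruction.
// (The full sample also pans the map to the maneuver location.)
private void lbItinerary_SelectionChanged(object sender, SelectionChangedEventArgs e)
{
    var lb = sender as ListBox;
    var item = (lb == null) ? null : lb.SelectedItem as ItineraryItem;
    if (item != null)
    {
        Speak(item.Instruction.Text);
    }
}
```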

When running the application, we find that we can quickly navigate our map from place to place simply by tapping our Microphone button and speaking location names, often faster than we could with a full keyboard present. The synthesized speech of place names and itinerary instructions adds reinforcement and context to the visual map presentation as well.

[Screenshot: the completed app displaying a route, itinerary, and app bar]

Try the following activities:

  • Click the Microphone button and say "The Eiffel Tower"
  • Click the Microphone button and say "London"; you should see multiple options presented in the right panel, any of which you can select to have the place shown on the map and its name spoken
  • Click the Microphone button and say "Directions to Redmond, Washington" (or a suitable nearby town or landmark). After the directions have been calculated and presented, you should be informed of the driving time in minutes from your current location. Select one of the itinerary items in the right panel to navigate to that maneuver location and have the maneuver instruction spoken

By powering our map navigation with Bing Speech, we can extend the usability of our mapping applications across a wider range of scenarios and devices, and make our end-user experience more efficient and engaging.

The complete source code for the project can be found here.

– Geoff Innis, Bing Maps Technical Specialist