One of the most intriguing capabilities that Bing offers for always-connected devices is the ability to visually recognize and understand artifacts in the real world. This is made available to developers via the Bing Optical Character Recognition (OCR) control for Windows 8.1 Store apps. The control reads text from the outside world by taking a picture with an on-device camera and analyzing the picture via a web service, which returns the text and its position for interpretation. In this post, we will use the Bing OCR control to capture addresses from real-world artifacts, geocode the addresses with the Bing Maps REST Locations API to obtain accurate coordinates and other metadata, and visualize the locations using Bing Maps for Windows Store Apps.
The prerequisites for building and successfully testing our application include:
- Windows 8.1 and Visual Studio 2013
- A Windows 8.x certified device with a built-in rear-facing camera that supports 1280x720 or 640x480 resolution in photo mode
- The Bing Maps SDK for Windows Store Apps
- The Bing OCR Control
- A subscription to the Bing OCR Control in the Windows Azure Marketplace
- Registration of your application in the Windows Azure Marketplace
We can refer to Bing OCR Control documentation for detailed instructions on installing and registering the control, and enabling projects for optical character recognition.
Enabling Our Project for OCR
In Visual Studio 2013, we will first create a new project using the Visual C# Windows Store Blank App (XAML) template, and will name our project OCRAddressCapture.
We now add the following references to our project:
- Bing Maps for C#, C++ or Visual Basic
- Microsoft Visual C++ Runtime Package
- Bing Optical Character Recognition (OCR) Control
We must use Configuration Manager to select an individual platform to compile for, rather than Any CPU, to satisfy a Visual C++ Runtime requirement.
We will add the Webcam capability to our app in our Package.appxmanifest, as detailed in the Bing OCR Control documentation.
Laying Out Our UI
Our UI will be divided into four quadrants:
- In the top-left corner, we will place the Bing OCR control; the control is effectively a viewing area showing video from the available camera, which, when clicked or tapped, will capture the current image and send it to the OCR service for processing
- In the top-right corner, we will display a Canvas which will be used to display the captured image; onto the canvas, individual ToggleButtons will be positioned for each Word the OCR control recognizes
- In the bottom-left corner, we will show a Bing Map displaying the results of geocoding requests
- In the bottom-right corner, we will display the address components and coordinates for geocoding results
Also included in the UI will be Buttons to allow the user to choose when to submit selected address components for geocoding, and to clear all current data.
In our XAML code, we add:
- XML namespace declarations for Bing.Ocr and Bing.Maps
- Column and Row Definitions for our main Grid
- An OcrControl with a basic InstructionOverlay for user information
- A Canvas element to present OCR results, along with a bordered TextBlock in the same Grid cell to enhance presentation
- Two Buttons to enable resetting of data, and submitting of geocoding requests
- A bordered Map control, to which we add our Bing Maps Key as Credentials
- A bordered Grid to present geocoding results in
Our final markup should appear as shown below:
<Page
    x:Class="OCRAddressCapture.MainPage"
    xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
    xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
    xmlns:local="using:OCRAddressCapture"
    xmlns:d="http://schemas.microsoft.com/expression/blend/2008"
    xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
    xmlns:bm="using:Bing.Maps"
    xmlns:ocr="using:Bing.Ocr"
    mc:Ignorable="d">
    <Grid Background="{ThemeResource ApplicationPageBackgroundThemeBrush}">
        <Grid.ColumnDefinitions>
            <ColumnDefinition></ColumnDefinition>
            <ColumnDefinition></ColumnDefinition>
        </Grid.ColumnDefinitions>
        <Grid.RowDefinitions>
            <RowDefinition></RowDefinition>
            <RowDefinition Height="50"></RowDefinition>
            <RowDefinition></RowDefinition>
        </Grid.RowDefinitions>
        <ocr:OcrControl x:Name="ocr" Grid.Row="0" Grid.Column="0" Margin="10,10,10,10" BorderBrush="Gray">
            <ocr:OcrControl.InstructionOverlay>
                <TextBlock Text="Click or tap to capture image." IsHitTestVisible="False" />
            </ocr:OcrControl.InstructionOverlay>
        </ocr:OcrControl>
        <Border Grid.Column="1" Grid.Row="0" Margin="10,10,10,10" BorderThickness="2" BorderBrush="Gray">
            <TextBlock VerticalAlignment="Center" HorizontalAlignment="Center" FontSize="30" Foreground="Gray">
                Captured OCR image will appear here.
            </TextBlock>
        </Border>
        <Canvas x:Name="OcrResultsCanvas" Grid.Row="0" Grid.Column="1" Margin="10,10,10,10" >
        </Canvas>
        <Button x:Name="btnReset" Grid.Row="1" Grid.Column="0" Content="Reset All" HorizontalAlignment="Center" Click="btnReset_Click" Foreground="Gray" BorderBrush="Gray"></Button>
        <Button x:Name="btnGeocode" Grid.Row="1" Grid.Column="1" Content="Submit Selected Address Fields" HorizontalAlignment="Center" Click="btnGeocode_Click" Foreground="Gray" BorderBrush="Gray"></Button>
        <Border Grid.Column="0" Grid.Row="2" Margin="10,10,10,10" BorderThickness="2" BorderBrush="Gray">
            <bm:Map Name="myMap" Credentials="your bing maps key" BorderThickness="2" BorderBrush="Gray"/>
        </Border>
        <Border Grid.Column="1" Grid.Row="2" Margin="10,10,10,10" BorderThickness="2" BorderBrush="Gray">
            <Grid Margin="10,10,10,10" Background="{ThemeResource ApplicationPageBackgroundThemeBrush}">
                <Grid.ColumnDefinitions>
                    <ColumnDefinition Width="180"></ColumnDefinition>
                    <ColumnDefinition></ColumnDefinition>
                </Grid.ColumnDefinitions>
                <Grid.RowDefinitions>
                    <RowDefinition Height="50"></RowDefinition>
                    <RowDefinition Height="30"></RowDefinition>
                    <RowDefinition Height="30"></RowDefinition>
                    <RowDefinition Height="30"></RowDefinition>
                    <RowDefinition Height="30"></RowDefinition>
                    <RowDefinition Height="30"></RowDefinition>
                    <RowDefinition Height="30"></RowDefinition>
                    <RowDefinition Height="30"></RowDefinition>
                    <RowDefinition Height="30"></RowDefinition>
                    <RowDefinition></RowDefinition>
                </Grid.RowDefinitions>
                <TextBlock Text="Address of Selected Result" FontSize="30" FontWeight="Bold" Grid.ColumnSpan="2" Grid.Column="0" Grid.Row="0" Foreground="Gray"></TextBlock>
                <TextBlock Text="Display Name:" FontSize="20" FontWeight="Bold" Grid.Column="0" Grid.Row="1" Foreground="Gray" />
                <TextBlock x:Name="txtDisplay" Text="" FontSize="20" Grid.Column="1" Grid.Row="1" Foreground="Gray"/>
                <TextBlock Text="Street:" FontSize="20" FontWeight="Bold" Grid.Column="0" Grid.Row="2" Foreground="Gray" />
                <TextBlock x:Name="txtStreet" Text="" FontSize="20" Grid.Column="1" Grid.Row="2" Foreground="Gray"/>
                <TextBlock Text="Town:" FontSize="20" FontWeight="Bold" Grid.Column="0" Grid.Row="3" Foreground="Gray" />
                <TextBlock x:Name="txtTown" Text="" FontSize="20" Grid.Column="1" Grid.Row="3" Foreground="Gray"/>
                <TextBlock Text="State:" FontSize="20" FontWeight="Bold" Grid.Column="0" Grid.Row="4" Foreground="Gray" />
                <TextBlock x:Name="txtState" Text="" FontSize="20" Grid.Column="1" Grid.Row="4" Foreground="Gray"/>
                <TextBlock Text="Zip/Postal:" FontSize="20" FontWeight="Bold" Grid.Column="0" Grid.Row="5" Foreground="Gray" />
                <TextBlock x:Name="txtPC" Text="" FontSize="20" Grid.Column="1" Grid.Row="5" Foreground="Gray"/>
                <TextBlock Text="Country:" FontSize="20" FontWeight="Bold" Grid.Column="0" Grid.Row="6" Foreground="Gray" />
                <TextBlock x:Name="txtCountry" Text="" FontSize="20" Grid.Column="1" Grid.Row="6" Foreground="Gray"/>
                <TextBlock Text="Latitude:" FontSize="20" FontWeight="Bold" Grid.Column="0" Grid.Row="7" Foreground="Gray" />
                <TextBlock x:Name="txtLat" Text="" FontSize="20" Grid.Column="1" Grid.Row="7" Foreground="Gray"/>
                <TextBlock Text="Longitude:" FontSize="20" FontWeight="Bold" Grid.Column="0" Grid.Row="8" Foreground="Gray" />
                <TextBlock x:Name="txtLon" Text="" FontSize="20" Grid.Column="1" Grid.Row="8" Foreground="Gray"/>
            </Grid>
        </Border>
    </Grid>
</Page>
Adding Our C# Code
We will use the Bing Maps REST Locations API for geocoding, as it exposes more geocoding metadata than the Bing.Maps.Search API in the Windows Store Apps control. We will add a new item to our project: a Visual C# Code File, which we will name BingMapsRESTServices.cs. We will populate this file with the JSON data contracts for the REST Services found here. For a more complete review of how to consume the REST services via .NET, see the technical article on MSDN.
In our MainPage.xaml.cs code-behind, we add the following using statements:
using Bing.Maps;
using Bing.Ocr;
using BingMapsRESTService.Common.JSON;
using System.Runtime.Serialization.Json;
using System.Threading.Tasks;
We now declare a private variable to capture our OCR image width, and add a handler for our page loaded event. This handler will:
- apply our OCR credentials we obtained when registering the control
- assign event handlers for the OCR Completed, Failed, and FrameCaptured events
- initiate the OCR control image preview, such that we are ready to capture an image
…
private double imageWidth;

public MainPage()
{
    this.InitializeComponent();
    this.Loaded += MainPage_Loaded;
}

async void MainPage_Loaded(object sender, RoutedEventArgs e)
{
    // client id and client secret for OCR control:
    ocr.ClientId = "your client id";
    ocr.ClientSecret = "your client secret";

    // Assign event handlers for OCR:
    ocr.Completed += ocr_Completed;
    ocr.Failed += ocr_Failed;
    ocr.FrameCaptured += ocr_FrameCaptured;

    // Start the OCR control:
    await ocr.StartPreviewAsync();
}
…
When the end-user taps or clicks the OCR control video preview area, the OCR control will raise the FrameCaptured event. We will add an event handler which will:
- retrieve the captured BitmapImage
- use the captured image as the background for our Canvas on which we will place the Words recognized during OCR
- determine the width in pixels of the captured image; this width will enable us to determine an appropriate scale to use when placing found Words on our results Canvas
void ocr_FrameCaptured(object sender, FrameCapturedEventArgs e)
{
    // Retrieve captured bitmap image:
    var bitmap = new Windows.UI.Xaml.Media.Imaging.BitmapImage();
    bitmap.SetSource(e.CapturedImage);

    // determine image width for scale calculation and placement of results on canvas:
    imageWidth = bitmap.PixelWidth;

    // Set Background of canvas with captured image:
    ImageBrush ibBackground = new ImageBrush();
    ibBackground.ImageSource = bitmap;
    OcrResultsCanvas.Background = ibBackground;
}
When the OCR control receives a response from the Bing OCR Service, the Completed event is raised. Our event handler will:
- determine an appropriate scale to use for placement of results in the Canvas, based on the actual width of the OcrResultsCanvas as compared to the imageWidth of the captured image
- clear any previously captured words from the OcrResultsCanvas
- loop through each captured Word in each Line, and show each word in a ToggleButton that we position as children on the OcrResultsCanvas
- restart the OCR control preview camera
private async void ocr_Completed(object sender, Bing.Ocr.OcrCompletedEventArgs e)
{
    // Determine scale, for presentation of words for selection:
    var scale = OcrResultsCanvas.ActualWidth / imageWidth;

    // Clear results canvas of words:
    OcrResultsCanvas.Children.Clear();

    // Confirm that we have captured text:
    if (e.Result.Lines.Count == 0)
    {
        // Inform user of error
        Notify("No text found.");
        await ocr.StartPreviewAsync();
        return;
    }

    // Read the captured content, and present for selection
    foreach (Bing.Ocr.Line l in e.Result.Lines)
    {
        foreach (Word word in l.Words)
        {
            ToggleButton tbWord = new ToggleButton();
            tbWord.Content = word.Value;
            OcrResultsCanvas.Children.Add(tbWord);
            Canvas.SetLeft(tbWord, word.Box.Left * scale);
            Canvas.SetTop(tbWord, word.Box.Top * scale);
        }
    }

    // Restart the preview camera.
    await ocr.StartPreviewAsync();
}
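To make the scale calculation concrete, here is a minimal, self-contained sketch of the same arithmetic outside the page class. The OcrLayout class and its ScalePosition method are hypothetical names used only for illustration; they are not part of the Bing OCR control. Like the handler above, the sketch applies the horizontal ratio to both offsets, which assumes the canvas shows the image at roughly its original aspect ratio.

```csharp
using System;

// Hypothetical helper, for illustration only (not part of Bing.Ocr).
public static class OcrLayout
{
    // Maps a word's top-left corner from image pixels to canvas units,
    // using the ratio of canvas width to captured image width.
    public static Tuple<double, double> ScalePosition(
        double canvasWidth, double imageWidth, double boxLeft, double boxTop)
    {
        if (imageWidth <= 0)
        {
            throw new ArgumentOutOfRangeException(
                "imageWidth", "Image width must be positive.");
        }

        double scale = canvasWidth / imageWidth;
        return Tuple.Create(boxLeft * scale, boxTop * scale);
    }
}
```

For example, with a canvas 640 units wide and a captured image 1280 pixels wide, the scale is 0.5, so a word whose box starts at (400, 200) in the image would be placed at (200, 100) on the canvas.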
If an error occurs in the OCR Control, the Failed event will be raised. Our event handler will capture the error details, and notify the user. The OCR Control will also be started or reset as appropriate:
async void ocr_Failed(object sender, Bing.Ocr.OcrErrorEventArgs e)
{
    // Display error message.
    string errorText = e.ErrorMessage;

    // Give guidance on specific errors.
    switch (e.ErrorCode)
    {
        case Bing.Ocr.ErrorCode.CameraBusy:
            errorText += "\nClose any other applications that may be using the camera and try again.";
            break;
        case Bing.Ocr.ErrorCode.CameraLowQuality:
            errorText += "\nAttach a camera that meets the requirements for OCR and try again.";
            break;
        case Bing.Ocr.ErrorCode.CameraNotAvailable:
            errorText += "\nAttach a camera and try again.";
            break;
        case Bing.Ocr.ErrorCode.CameraPermissionDenied:
            errorText += "\nTurn camera permissions on in your application settings.";
            break;
        case Bing.Ocr.ErrorCode.NetworkUnavailable:
            errorText += "\nCheck your Internet connection and try again.";
            break;
        default:
            errorText += "\nNotify your application provider.";
            break;
    }

    Notify(errorText);

    // Continue or cancel, depending on the error.
    if (e.ErrorCode == Bing.Ocr.ErrorCode.Success)
        await ocr.StartPreviewAsync();
    else
    {
        await ocr.ResetAsync();
    }
}
Since captured images may contain text that is not address-related, we give the user the ability to select the address elements from the captured words. This is done by tapping or clicking each of the ToggleButtons containing the address elements. Once the desired address elements have been selected, the user can click the btnGeocode Button, which will initiate the geocoding process. The event handler will ensure that there are some child elements on the OcrResultsCanvas, and if so, will concatenate the content of all selected child elements into an address string. We are making the assumption that the address elements will appear in our captured text in an appropriate order, and concatenating them as such. Once we have created our address string, we pass it as a parameter to our asynchronous Geocode method.
private async void btnGeocode_Click(object sender, RoutedEventArgs e)
{
    // Check to see if we have any selected words:
    if (OcrResultsCanvas.Children.Count > 0)
    {
        // Construct address string from selected words:
        // Note that we are appending the selected elements
        // in the order they were captured in the image
        // To-do: Consider allowing selection of order of
        // address elements.
        string address = "";
        foreach (ToggleButton tbWord in OcrResultsCanvas.Children)
        {
            // If the word is selected, we append it to our address string
            if (tbWord.IsChecked == true)
            {
                address += tbWord.Content + " ";
            }
        }

        // If we have an address string, we will geocode it:
        if (address != "")
        {
            // geocode address
            await Geocode(address);
        }
        else
        {
            // inform user no words have been selected:
            Notify("No elements have been selected.");
        }
    }
    else
    {
        // inform user no words have been captured
        Notify("No address elements are available.");
    }
}
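The concatenation step can also be separated from the UI, which makes it easier to check on its own. The sketch below is one possible way to do that, assuming each recognized word has been reduced to its text and a selected flag; the AddressBuilder class and Build method are hypothetical names, not part of the project described here. Using string.Join also avoids the trailing space that repeated appending leaves behind.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, for illustration only.
public static class AddressBuilder
{
    // Joins the selected words with single spaces, preserving capture order.
    // Each pair holds the word text (Key) and whether it is selected (Value).
    public static string Build(IEnumerable<KeyValuePair<string, bool>> words)
    {
        if (words == null)
        {
            throw new ArgumentNullException("words");
        }

        return string.Join(" ", words.Where(w => w.Value).Select(w => w.Key));
    }
}
```

Given the words "Visit" (not selected), "1", "Microsoft" and "Way" (all selected), Build returns "1 Microsoft Way"; with nothing selected it returns an empty string, which corresponds to the "No elements have been selected." case in the handler.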
Our Geocode method will clear our map and address display fields, and use the received address string as the query parameter in a request to the Bing Maps REST Locations API. The JSON response data will be deserialized against the data contracts we previously added to our project. We loop through each Location returned in our results, and:
- add a Pushpin to our map, numbering it and adding the Location as the pushpin Tag
- add a Tapped event handler to each pushpin, which will display the address and coordinates when the pushpin is selected
- display the details of the first Location result by default
- set the map view as appropriate, depending on whether we have a single result, or multiple results
private async Task Geocode(string query)
{
    // Clear pushpins from map:
    myMap.Children.Clear();

    // Clear all address fields:
    ClearAddressFields();

    // Ensure we have a query value:
    if (!string.IsNullOrWhiteSpace(query))
    {
        // Obtain Map session credentials:
        string BingMapsKey = await myMap.GetSessionIdAsync();

        // Create the request URL for the Geocoding service.
        // The query is trimmed and URL-encoded so that spaces and
        // characters such as '&' or '#' do not break the request:
        Uri geocodeRequest = new Uri(
            string.Format("http://dev.virtualearth.net/REST/v1/Locations?q={0}&key={1}",
                Uri.EscapeDataString(query.Trim()), BingMapsKey));

        //Make a request and get the response
        Response r = await GetResponse(geocodeRequest);

        if (r != null && r.ResourceSets != null &&
            r.ResourceSets.Length > 0 &&
            r.ResourceSets[0].Resources != null &&
            r.ResourceSets[0].Resources.Length > 0)
        {
            LocationCollection locations = new LocationCollection();
            int i = 1;

            foreach (BingMapsRESTService.Common.JSON.Location l in r.ResourceSets[0].Resources)
            {
                //Get the location of each result
                Bing.Maps.Location location =
                    new Bing.Maps.Location(l.Point.Coordinates[0], l.Point.Coordinates[1]);

                //Create a pushpin for each location
                Pushpin pin = new Pushpin()
                {
                    // make location available to be used in pushpin Tapped handler
                    Tag = l,
                    Text = i.ToString()
                };
                i++;

                //Add a tapped event that will display the address of the location
                pin.Tapped += (s, a) =>
                {
                    var p = s as Pushpin;
                    DisplayAddress(p.Tag as BingMapsRESTService.Common.JSON.Location);
                };

                //Set the location of the pushpin
                MapLayer.SetPosition(pin, location);

                //Add the pushpin to the map
                myMap.Children.Add(pin);

                //Add the coordinates of the location to a location collection
                locations.Add(location);
            }

            // show first address found:
            BingMapsRESTService.Common.JSON.Location topLoc =
                r.ResourceSets[0].Resources[0] as BingMapsRESTService.Common.JSON.Location;
            DisplayAddress(topLoc);

            //Set the map view based on the location collection:
            if (locations.Count == 1)
            {
                myMap.SetView(new LocationRect(
                    new Bing.Maps.Location(topLoc.BoundingBox[2], topLoc.BoundingBox[1]),
                    new Bing.Maps.Location(topLoc.BoundingBox[0], topLoc.BoundingBox[3])));
            }
            else
            {
                myMap.SetView(new LocationRect(locations));
            }
        }
        else
        {
            Notify("No Results found.");
        }
    }
    else
    {
        Notify("Invalid Location Data Input");
    }
}
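The index order used for the single-result map view can look arbitrary at first glance. The REST Locations API returns each bounding box as south latitude, west longitude, north latitude, east longitude, while the LocationRect created above is built from a north-west corner and a south-east corner. The following sketch spells out that mapping with plain arrays; BoundingBoxHelper and ToCorners are hypothetical names introduced only for this illustration.

```csharp
using System;

// Hypothetical helper, for illustration only.
public static class BoundingBoxHelper
{
    // REST bounding boxes are ordered [south, west, north, east].
    // Returns the north-west and south-east corners, each as
    // a two-element array of { latitude, longitude }.
    public static Tuple<double[], double[]> ToCorners(double[] bbox)
    {
        if (bbox == null || bbox.Length != 4)
        {
            throw new ArgumentException(
                "Expected [south, west, north, east].", "bbox");
        }

        double[] northWest = { bbox[2], bbox[1] };
        double[] southEast = { bbox[0], bbox[3] };
        return Tuple.Create(northWest, southEast);
    }
}
```

For a bounding box of { 47.6, -122.2, 47.7, -122.1 }, the north-west corner is (47.7, -122.2) and the south-east corner is (47.6, -122.1), matching the BoundingBox[2], [1] and BoundingBox[0], [3] pairs passed to the two Bing.Maps.Location constructors.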
The DisplayAddress method shows the address and coordinate details for geocoding results as appropriate:
private void DisplayAddress(BingMapsRESTService.Common.JSON.Location location)
{
    // Populate address details for the selected result. Missing components
    // are shown as blank, so that values from a previously selected
    // result do not remain on screen:
    txtDisplay.Text = location.Address.FormattedAddress ?? string.Empty;
    txtStreet.Text = location.Address.AddressLine ?? string.Empty;
    txtTown.Text = location.Address.Locality ?? string.Empty;
    txtState.Text = location.Address.AdminDistrict ?? string.Empty;
    txtPC.Text = location.Address.PostalCode ?? string.Empty;
    txtCountry.Text = location.Address.CountryRegion ?? string.Empty;

    // Coordinates are returned as [latitude, longitude]:
    txtLat.Text = location.Point.Coordinates[0].ToString();
    txtLon.Text = location.Point.Coordinates[1].ToString();
}
We also add methods to our class to retrieve the REST responses (GetResponse), reset the data (btnReset_Click and ClearAddressFields), and notify the user of errors (Notify).
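These helper methods are not listed in the post; the author's versions are in the downloadable source. As a rough idea of what the response retrieval might look like, the sketch below fetches a URI with HttpClient and deserializes the JSON body with DataContractJsonSerializer. The RestJson class and its method names are assumptions made for this example, and returning null on any failure is a design choice of the sketch that fits the null check in Geocode.

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Runtime.Serialization.Json;
using System.Threading.Tasks;

// Hypothetical helper, for illustration only; the downloadable project may differ.
public static class RestJson
{
    // Deserializes a JSON stream into a data-contract type T.
    public static T Deserialize<T>(Stream stream) where T : class
    {
        var serializer = new DataContractJsonSerializer(typeof(T));
        return serializer.ReadObject(stream) as T;
    }

    // Requests the URI and deserializes the JSON body.
    // Returns null if the request or the deserialization fails.
    public static async Task<T> GetAsync<T>(Uri uri) where T : class
    {
        try
        {
            using (var client = new HttpClient())
            using (var response = await client.GetAsync(uri))
            {
                if (!response.IsSuccessStatusCode)
                {
                    return null;
                }

                using (var stream = await response.Content.ReadAsStreamAsync())
                {
                    return Deserialize<T>(stream);
                }
            }
        }
        catch (Exception)
        {
            // Network and serialization problems are reported
            // to the caller as "no response".
            return null;
        }
    }
}
```

Under these assumptions, a GetResponse method in the page class could simply return await RestJson.GetAsync<Response>(uri), where Response is the root data contract added earlier.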
When running the application, we can point our device camera towards clear text and capture an image by tapping the OCR Control. If the capture is successful, we can select the address elements from the results and tap the submit button to have the location data geocoded. If the input can be geocoded, all results will be displayed on the map, and the address details of each result can be viewed by tapping the respective pushpin.
By capturing address details from real-world artifacts using the OCR Control and geocoding them with Bing Maps, we can enable scenarios such as:
- Quickly and accurately capturing address details for new contacts from business cards or other promotional material, and adding them into our CRM system
- Capturing addresses from real-estate brochures and viewing the house locations using rich Bing Maps imagery
- Obtaining driving directions to restaurants, offices, or other locations using printed menus, billboards, or printed listings
By integrating the robust cloud-based optical character recognition capabilities of the OCR Control with the powerful location-based tools offered by Bing Maps, developers can build applications that help bridge the gap between what we see in the real world and what we can do on our devices.
The complete source code for the project can be found here.
- Geoff Innis, Bing Maps Technical Specialist