Enable a voice interface in your VR iOS app using the IBM Watson SDK for Unity

In this post, we will create a simple VR iOS app to showcase how to use the IBM Watson SDK for Unity to add a voice interface to an application.

The app accepts voice commands from the user through the iOS device’s microphone and changes the color of a 3D sphere accordingly. The tutorial below is for a Mac environment.

Prerequisites:

  • Unity IDE. Version 2019.1 is recommended.
  • IBM Cloud account. Register here for free.
  • Familiarity with the Unity IDE and game engine.
  • Git client.

Create a Watson Speech to Text (STT) service instance in IBM Cloud:

You first need to create a Watson STT service instance in IBM Cloud: log in to the platform with your IBM ID, then search for “Speech to Text” in the Catalog and follow the instructions.

Once the service has been created, navigate to the service’s dashboard and copy the “API Key” token to use later in the code. You can either use the auto-generated key or create a new one as necessary.

Import the IBM Watson SDK for Unity into your Unity project

Create a 3D Unity project; an Assets directory will be generated under it. Use git to clone the SDKs from IBM’s GitHub into the Assets directory:

$ git clone https://github.com/IBM/unity-sdk-core.git
$ git clone https://github.com/watson-developer-cloud/unity-sdk

Once the sdk-core and unity-sdk repositories have finished downloading, pick ExampleStreaming from the examples list and add it to the project.

3D Unity project view

After that, add a Sphere and a Plane 3D GameObject to the project (GameObject >> 3D Object on the menu).

The next step is to create three material assets corresponding to the three colors we want to set for the sphere. Name the materials red_sphere, green_sphere, and yellow_sphere, and set the color of each accordingly.

Create 3 asset materials
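
If you just want to see what the material swap accomplishes, an equivalent alternative is to tint the sphere’s renderer directly from code instead of creating material assets; the rest of this tutorial sticks with the three materials. The SphereTint class below is purely illustrative (the class name is made up for this sketch).

using UnityEngine;

// Illustrative alternative only: tint the sphere's current material in place
// instead of swapping between red/green/yellow material assets.
public class SphereTint : MonoBehaviour
{
    public MeshRenderer sphereMeshRenderer;  // drag the Sphere's MeshRenderer here in the Inspector

    public void TintRed()    { sphereMeshRenderer.material.color = Color.red; }
    public void TintGreen()  { sphereMeshRenderer.material.color = Color.green; }
    public void TintYellow() { sphereMeshRenderer.material.color = Color.yellow; }
}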

Add code to set the color of the sphere via voice commands

Now open the ExampleStreaming.cs file in a text editor and add these lines to the ExampleStreaming class, right after the declaration of ResultsField, to declare the MeshRenderer and the three materials as public variables. These variables will then show up in the Inspector view of the ExampleStreaming asset in Unity.

[Tooltip("Text field to display the results of streaming.")]
public Text ResultsField;

public MeshRenderer sphereMeshRenderer;
public Material red_sphere;
public Material yellow_sphere;
public Material green_sphere;

And this part sets the color of the sphere based on the keyword found in the speech command. The first two lines below are already in the sample; add the if/else block right after them (a sketch of the surrounding callback follows the snippet):

Log.Debug("ExampleStreaming.OnRecognize()", text);
ResultsField.text = text;

if (alt.transcript.Contains("red"))
{
    sphereMeshRenderer.material = red_sphere;
}
else if (alt.transcript.Contains("green"))
{
    sphereMeshRenderer.material = green_sphere;
}
else if (alt.transcript.Contains("yellow"))
{
    sphereMeshRenderer.material = yellow_sphere;
}
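
Here is roughly where that code sits. The sketch below is based on the SDK’s ExampleStreaming sample and the OnRecognize callback it already defines; the exact loop structure and type names may differ slightly between SDK releases.

// Rough sketch of ExampleStreaming's OnRecognize callback (details vary by SDK version).
private void OnRecognize(SpeechRecognitionEvent result)
{
    if (result == null || result.results.Length == 0)
        return;

    foreach (var res in result.results)
    {
        foreach (var alt in res.alternatives)
        {
            // The sample already formats and displays the interim/final transcript.
            string text = string.Format("{0} ({1}, {2:0.00})\n",
                alt.transcript, res.final ? "Final" : "Interim", alt.confidence);
            Log.Debug("ExampleStreaming.OnRecognize()", text);
            ResultsField.text = text;

            // Add the red/green/yellow if/else block from above right here,
            // so it runs for every recognized alternative.
        }
    }
}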

Now link the materials to the corresponding variables using the Inspector pane, and enter the API key you got from IBM Cloud for the STT service.

Link the materials to code using Inspector
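
Depending on the SDK version you cloned, ExampleStreaming reads that API key from a serialized field shown in the Inspector and uses it to build the Speech to Text service inside a CreateService coroutine. Here is a minimal sketch of that wiring, assuming the IAM-based authentication of recent SDK releases (names such as IamAuthenticator, SpeechToTextService, and the _iamApikey field come from the SDK sample and may differ in older versions):

// Minimal sketch, assuming the SDK's IAM authentication flow.
// This code runs inside the sample's CreateService coroutine.
IamAuthenticator authenticator = new IamAuthenticator(apikey: _iamApikey);

// Wait until the authenticator has obtained a token before creating the service.
while (!authenticator.CanAuthenticate())
    yield return null;

SpeechToTextService service = new SpeechToTextService(authenticator);
service.StreamMultipart = true;  // stream microphone audio instead of sending complete files

If your copy of ExampleStreaming already contains similar code, you do not need to change it; just make sure the API key field in the Inspector is filled in.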

Export the project to run on the iOS platform

Go to the Build Settings view of the project and switch the platform to iOS. Make sure to open Player Settings and set API Compatibility Level to .NET 4.x.

Setup Player Settings …

After exporting to an Xcode project, make sure to add the Privacy - Microphone Usage Description property (NSMicrophoneUsageDescription) to Info.plist to allow the app to access the device’s microphone. The app will simply crash otherwise.

Xcode project settings, Info.plist

Run the application on iPhone

Make sure your iOS device has an internet connection, then launch the app. When you speak to the device, the app will change the color of the sphere according to the keywords found in the commands. Here is how it looks on my iPhone.

The text is streamed on the fly from the IBM Watson service to the device.

Live demo:

Keep your hands dirty – Build a small robot which can see, speak and read

This post guides you through connecting your Raspberry Pi to IBM Watson services and building a simple robot that can listen to your commands, recognize objects, and report back in voice. It can also read English text content aloud.

If you follow the instructions (and are lucky), you will have something like this:

Architecture overview

Hardware requirements

  • A Raspberry Pi. I use a Raspberry Pi 3 for this tutorial, which has built-in Wi-Fi. If you have another model, make sure you have a Wi-Fi dongle or another way to get your Pi connected to the internet (e.g., sharing your workstation’s internet connection over Ethernet).
  • A speaker. I use an Anker A7910 mini speaker.
  • A Raspberry Pi camera.

Other requirements

  • An IBM Cloud account (free). Click here to register.
  • Refer to this link to learn how to set up your Pi. Try to get the latest OS version.
  • Git, Node.js, and npm on your Raspberry Pi:

$ sudo apt-get install nodejs npm node-semver

Steps to “cook” this recipe

Create a Watson Text to Speech service

Follow these instructions (https://github.com/dnguyenv/distance-bot#create-bluemix-text-to-speech-service) to create a Text to Speech service in the IBM Cloud environment. Again, it’s free.
Create a Watson Speech to Text service

Log in to IBM Cloud (https://bluemix.net) with your registered ID, then go to the Catalog and search for Speech to Text, or click here and select the service.

Name the service and credential if you want, or just leave the defaults, then select Create.

Once the service is created, go to Service credentials >> View credentials and record the username and password. You will need them later for the app. You can always create a new credential to access the service as needed by selecting New credential.

Create a Watson Visual Recognition service

Similar to the Text to Speech and Speech to Text services, create a Visual Recognition service and record the credentials to be used later in the code.

Install the Tesseract OCR engine on your Pi

$ sudo apt-get install tesseract-ocr-eng

Clone the code and install dependencies

Clone the source code:
Log into your Pi using SSH or a VNC client. From the terminal on your Raspberry Pi, run this command to get the source code:

$ sudo git clone https://github.com/dnguyenv/seebot.git

Put your Watson service credentials (Text to Speech, Speech to Text, and Visual Recognition) into the config.js file, following the template. You can also configure other values in the file to meet your needs.

Run the code:

$ cd seebot
$ sudo npm install
$ sudo npm start

Now you can talk to the robot and experience what you see in the demo video.