Speaking of cool Web stuff...

The Web Speech APIs

"The Plan"...

  • Efforts to bring speech to the Web / Current state of standardization
  • Code examples / walkthrough of APIs (with demos)
  • Discussion of what's good and bad about the APIs
  • What's next?
Today -26 years
  • 1991: The Web

  • October 1, 1994: W3C Founded

  • December 15, 1994: Netscape Released

  • Microsoft didn't really enter the picture in a serious way until ~1996

Today -20 years
<object ID="AgentControl" width="0" height="0" CLASSID="clsid:D45FD31B-5C6E-11D1-9EC1-00C04FD7081F" CODEBASE="http://server/path/msagent.exe#VERSION=2,0,0,0">
</object>

But wow.... 1997!!!

A lot of people were thinking about speech on the Web...

Today -19 years

1998: CSS2

Aural Stylesheets!!!

Today -18 years

Like HTML, but for voice...

  • March 1999: VoiceXML Forum
    AT&T Corporation, IBM, Lucent, and Motorola
  • Handed over to W3C in 2000
    Ask Jeeves, AT&T, Avaya, BT, Canon, Cisco, France Telecom, General Magic, Hitachi, HP, IBM, isSound, Intel, Locus Dialogue, Lucent, Microsoft, Mitre, Motorola, Nokia, Nortel, Nuance, Philips, PipeBeach, SpeechWorks, Sun, Telecom Italia, Tellme.com, and Unisys
It's happening!

OMG!

Today -17 years

So many XMLs

You get an XML, and you get an XML, and...

None of these made it to the browser...

Standards

Community Groups vs Working Groups

Today -7 years

2010: A Decade After VoiceXML...

HTML Speech Incubator Group (XG)

Some of us at Google have been working on extending HTML elements with speech...

They got a lot more than they bargained for...

But more than that: they got actual competing draft proposals from Google, Microsoft, Mozilla, and Voxeo as well

It didn't.

Today -5 years

2012: Another Community Group!

Today -4 years

June 10, 2013: Extensible Web Manifesto

OK... so let's talk about where we are now...

The Web: Present Day

  • Implementations are buggy / inconsistent
  • Implementations and bugs are low priority
  • The APIs aren't super great
  • There is no W3C standard, official draft or WG

But wait...

Let's go to the details!

window.speechSynthesis

A top-level object
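Since it hangs off `window` (and support still varies), a minimal sketch of feature-detecting it before use, assuming nothing beyond the standard global:

```javascript
// Returns true only when the Web Speech synthesis API is available.
// In non-browser environments (no window), this simply reports false.
const hasSpeechSynthesis = () =>
  typeof window !== 'undefined' && 'speechSynthesis' in window;

if (hasSpeechSynthesis()) {
  window.speechSynthesis.speak(
    new SpeechSynthesisUtterance('It exists!')
  );
}
```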

speechSynthesis.speak(...)
SpeechSynthesisUtterance

An utterance is the thing that the synthesizer "speaks"

speechSynthesis.speak(
  new SpeechSynthesisUtterance(
    'Hello Darkness, my old friend'
  )
)
.pitch, .rate, .volume

0-2, 0.1-10, 0-1

let utterance = new SpeechSynthesisUtterance(
  `dude...setting the pitch,
  rate and volume is easy,
  but really weird.`
)
utterance.pitch = 0.1  // so low
utterance.rate = 0.5   // half speed
utterance.volume = 0.9 // a little quieter than normal
speechSynthesis.speak(utterance)

Expressiveness

  • Voices
  • ...?...
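Voices are basically the one expressive knob you get. A hedged sketch of picking one by language: `speechSynthesis.getVoices()` can return an empty list until `voiceschanged` fires (a common gotcha), and `pickVoice` here is just an illustrative helper, not part of the API:

```javascript
// Illustrative helper: prefer an exact BCP 47 match ('it-IT'),
// fall back to any voice in the same language ('it-*'), else null.
const pickVoice = (voices, lang) =>
  voices.find(v => v.lang === lang) ||
  voices.find(v => v.lang.startsWith(lang.split('-')[0])) ||
  null;

// Browser usage (guarded so this is inert elsewhere):
if (typeof speechSynthesis !== 'undefined') {
  speechSynthesis.addEventListener('voiceschanged', () => {
    let utterance = new SpeechSynthesisUtterance('Ciao!');
    utterance.voice = pickVoice(speechSynthesis.getVoices(), 'it-IT');
    speechSynthesis.speak(utterance);
  });
}
```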
Today -78 years!

Bell Labs demonstrated electronic voice synthesis
(the Voder) at the World's Fair in 1939

Speech, at the end of the day, is full of really, really hard problems.

If you find this interesting, I wrote a whole
piece on The History of Speech

.lang

let apollonia = new SpeechSynthesisUtterance(
   `io so l'inglese:
      Monday Tuesday Thursday Wednesday
      Friday Sunday Saturday
    `
  )
apollonia.pitch = 1.1
apollonia.lang = 'it-US'
speechSynthesis.speak(apollonia)
Utterance events

.onstart, .onend, .onerror

let outEl = document.querySelector('#zepplin-out'),
    utteranceOne = new SpeechSynthesisUtterance(
       `We come from the land of the ice and snow`
    ),
    utteranceTwo = new SpeechSynthesisUtterance(
      `From the midnight sun where the hot springs flow`
    ),
    syncUIHandler = (event) => {
       outEl.innerText = event.target.text
    }

  utteranceOne.onstart = syncUIHandler
  utteranceTwo.onstart = syncUIHandler

  utteranceTwo.onend = () => {
    outEl.innerText = 'Ahh! Ahh!... Any questions?'
  }

speechSynthesis.speak(utteranceOne)
speechSynthesis.speak(utteranceTwo)

Err...

.speak() is async: it just queues the utterance, so calling it doesn't mean "started speaking"

// pauses processing of the queue
// utterances have a corresponding onpause
speechSynthesis.pause()

// resumes processing of the queue
// utterances have a corresponding onresume
speechSynthesis.resume()

// empties the queue, no effect on paused state
speechSynthesis.cancel()
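Since the queue exposes a `paused` flag alongside `pause()`/`resume()`, a play/pause toggle falls out naturally. A minimal sketch, written against any object with that shape (so the logic can be exercised outside the browser):

```javascript
// Toggle playback on a speechSynthesis-shaped object
// ({ paused, pause(), resume() }). Returns the new paused state.
const togglePause = (synth) => {
  if (synth.paused) {
    synth.resume();  // picks the queue back up where it left off
  } else {
    synth.pause();   // freezes the queue mid-utterance
  }
  return synth.paused;
};

// In a browser: togglePause(window.speechSynthesis)
```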

Now, speak!