1991: The Web
October 1, 1994: W3C Founded
December 15, 1994: Netscape Released
Microsoft didn't really enter the picture in a serious way until ~1996
<object ID="AgentControl" width="0" height="0" CLASSID="clsid:D45FD31B-5C6E-11D1-9EC1-00C04FD7081F" CODEBASE="http://server/path/msagent.exe#VERSION=2,0,0,0">
</object>
But wow.... 1997!!!
A lot of people were thinking about speech on the Web...
1998: CSS2
Aural Stylesheets!!!

VoiceXML: like HTML, but for voice...
OMG!
So many XMLs
You get an XML, and you get an XML, and...
None of these made it to the browser...
Community Groups vs Working Groups
2010: A Decade After VoiceXML...
The HTML Speech Incubator Group (XG)
Some of us at Google have been working on extending HTML elements with speech...
They got a lot more than they bargained for...
Actual competing draft proposals from Google, Microsoft, Mozilla, and Voxeo
OK... so let's talk about where we are now...
A top-level object
speechSynthesis.speak(...)
SpeechSynthesisUtterance
An utterance is the thing that the synthesizer "speaks"
speechSynthesis.speak(
  new SpeechSynthesisUtterance(
    'Hello Darkness, my old friend'
  )
)
.pitch, .rate, .volume
pitch: 0–2, rate: 0.1–10, volume: 0–1
let utterance = new SpeechSynthesisUtterance(
  `dude...setting the pitch,
rate and volume is easy,
but really weird.`
)
utterance.pitch = 0.1 // so low
utterance.rate = 0.5 // half speed
utterance.volume = 0.9 // a little quieter than normal
speechSynthesis.speak(utterance)
Bell Labs demonstrated electronic voice
synthesis (the Voder) at the World's Fair in 1939
Speech, at the end of the day, is full of really, really hard problems.
If you find this interesting, I wrote a whole
piece on The History of Speech
.lang
let apollonia = new SpeechSynthesisUtterance(
  // "io so l'inglese" = "I know English"
  `io so l'inglese:
Monday Tuesday Thursday Wednesday
Friday Sunday Saturday
`
)
apollonia.pitch = 1.1
apollonia.lang = 'it-US'
speechSynthesis.speak(apollonia)
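Which voice you actually hear depends on what the platform has installed. A minimal sketch of matching a BCP 47 tag against the real `speechSynthesis.getVoices()` list — `pickVoice` is my own helper, not part of the API (note that a tag like `it-US` has no exact match anywhere, so a language-prefix fallback helps):

```javascript
// Pick the first available voice whose BCP 47 tag matches the
// requested language: exact match first, then language-only prefix.
// `voices` is the array returned by speechSynthesis.getVoices().
function pickVoice(voices, lang) {
  const exact = voices.find(v => v.lang === lang)
  if (exact) return exact
  const prefix = lang.split('-')[0]
  return voices.find(v => v.lang.split('-')[0] === prefix) || null
}

// In the browser (assuming at least one Italian voice is installed):
// let utterance = new SpeechSynthesisUtterance('buongiorno')
// utterance.voice = pickVoice(speechSynthesis.getVoices(), 'it-IT')
// speechSynthesis.speak(utterance)
```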
Utterance events
.onstart, .onend, .onerror
let outEl = document.querySelector('#zepplin-out'),
  utteranceOne = new SpeechSynthesisUtterance(
    `We come from the land of the ice and snow`
  ),
  utteranceTwo = new SpeechSynthesisUtterance(
    `From the midnight sun where the hot springs flow`
  ),
  syncUIHandler = (event) => {
    outEl.innerText = event.target.text
  }

utteranceOne.onstart = syncUIHandler
utteranceTwo.onstart = syncUIHandler
utteranceTwo.onend = () => {
  outEl.innerText = 'Ahh! Ahh!... Any questions?'
}

speechSynthesis.speak(utteranceOne)
speechSynthesis.speak(utteranceTwo)
.speak() is async — it just queues the utterance; .onstart doesn't fire until the synthesizer actually begins speaking
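One common pattern is wrapping the queue-and-forget `.speak()` in a Promise so you can `await` the actual end of speech. `speakAsync` is my own sketch, not part of the spec — it only relies on the standard `onend`/`onerror` events:

```javascript
// speak() returns immediately; the onend/onerror events tell you
// when the utterance actually finished (or failed).
function speakAsync(synth, utterance) {
  return new Promise((resolve, reject) => {
    utterance.onend = resolve
    utterance.onerror = (event) =>
      reject(new Error(event.error || 'speech error'))
    synth.speak(utterance)
  })
}

// In the browser:
// await speakAsync(speechSynthesis, new SpeechSynthesisUtterance('hi'))
```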
// pauses processing of the queue
// utterances have a corresponding onpause
speechSynthesis.pause()

// resumes processing of the queue
// utterances have a corresponding onresume
speechSynthesis.resume()

// empties the queue, no effect on paused state
speechSynthesis.cancel()
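Putting pause and resume together: a minimal play/pause toggle. `toggleSpeech` is my own name for the sketch; it relies only on the spec's `paused` and `speaking` flags on the `speechSynthesis` object:

```javascript
// Toggle between paused and speaking. `synth` is expected to look
// like the global speechSynthesis object.
function toggleSpeech(synth) {
  if (synth.paused) {
    synth.resume()
    return 'resumed'
  }
  if (synth.speaking) {
    synth.pause()
    return 'paused'
  }
  return 'idle' // nothing queued, nothing to toggle
}

// In the browser:
// button.addEventListener('click', () => toggleSpeech(speechSynthesis))
```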