Fun with Text-to-Speech

There is no greater fun than playing around with Text-to-Speech on the AT&T Labs website.


Text-to-Speech has great advantages for aiding the communication needs of the disabled: Those without voice are given one; those who cannot see can hear text. Text-to-Speech can also be used in the everyday world of the ordinary consumer. I have visited the AT&T Labs website to have phrases like “Janna is calling!” made into .WAV files.

I save the files and upload them to a service like Coolservice.dk so I can easily download them to my cellular phone. Once the sound files are on my BlackBerry, I can create special “ringtones” that play a certain sound file when a particular person calls. Here’s what I hear on my BlackBerry when Janna calls me from her BlackBerry:

That phrase is spoken over and over by “Crystal” when Janna calls and it is an eerie feeling. Here are some more Text-to-Speech examples I created on the AT&T Labs site using different “voices.” The sentence each voice speaks is: “Welcome to David W. Boles’ Urban Semiotic!”


I wanted to challenge the Text-to-Speech system with a long, unusual, and complex phrase, and it worked! Listen to the results by clicking on the following links to the sound files and see if you can understand what is being spoken from the text:

How do you feel about the “accents” applied to the speech? Are the voices stereotypical or are they appropriately representative and culturally authentic?

35 Comments

fruey (Let's Have It) says:

January 4, 2007 at 9:25 am

Well the UK English and the French sound reasonably accurate, but the plethora of UK English accents is somewhat missing from (an admittedly US based) project.
-Fruey

David W. Boles says:

January 4, 2007 at 9:29 am

Hiya fruey!
Thanks for the feedback!
Which UK accents are missing?

fruey (Let's Have It) says:

January 4, 2007 at 9:49 am

David,
Well there are quite a few. Major ones I can kind of imitate (mostly badly)
Cockney, Scouse, Mancunian, Scottish, Glasgow Scottish, Welsh, Brummie, West Country, generic Northern (“northern” vowels), generic Southern (“southern” vowels), Geordie…
An exercise for other readers to put towns / regions to some of those, though all UK readers will know I expect 🙂

David W. Boles says:

January 4, 2007 at 9:58 am

Ah! What a list, fruey!
I wonder how and why ATT picked the accents for their Text-to-Speech feature?
It’s interesting that the only example of a sound file I provided where the speaker’s intonation goes up at the end of the sentence is the UK version.

Shirley says:

January 4, 2007 at 12:21 pm

Good morning, David. What a world this is—this electronic age in which we live. This morning as I sat in our living room Jerry asked, “Have you seen my phone, Shirley?”
I hadn’t. He punched his number into my cell phone and walked about the house, down into the garage, back upstairs, and, finally, within his closet he heard a muffled ring. Yep, there inside the suit coat he had worn yesterday was the targeted phone.
A bit later, he was speaking, “Can you hear me now? How about now?” as he moved the phone from ear to ear.
My very intelligent mom died when I was 12, and should she be snapped back to life, and have been with me this morning, I’m sure she would have been startled to hear Jerry asking if I had seen his phone, and then to observe him walking about the house, pinging his own instrument. The AT&T Text to Speech program that in a flash said in a clear voice, “Welcome to Shirley Buxton’s blog” would surely have sent my mother into a confused shaking of her head.
An exciting world, indeed.
Shirley

Nicola says:

January 4, 2007 at 12:49 pm

I am trying to decide if this is a good or a bad day not to have sound rigged up on my computer?

David W. Boles says:

January 4, 2007 at 1:13 pm

Hi Shirley!
Ah, what a lovely insight! Technology compresses time and space and forces us to act quicker and to think faster and that may not always be the best method for memory retention or for interacting with each other.
It must have been surreal in many ways watching Jerry use technology to tag his lost memory. Perhaps one day the phones will do all the thinking for us and ping us when they lose us.
😀
I’m sorry to hear you lost your mother so early in your life. I agree the technological advancements in your life are three times those that touched her life and the current generation will probably have technological advances that circle our achievements four times over…

David W. Boles says:

January 4, 2007 at 1:14 pm

Nicola —
I can’t believe you’re missing all the fun today!
We need your ear and your evaluation of the UK accent.
You can also speed up, pause and slow down the ATT Text-to-Speech engine to get some really funky things to happen and to also add some clarity if things get too hard to understand.

Nicola says:

January 4, 2007 at 2:24 pm

Will see if I can borrow a computer with sound later

David W. Boles says:

January 4, 2007 at 2:26 pm

Cool!
Don’t forget to go to the ATT site later and make your own funky voice files!
😀

Chris says:

January 4, 2007 at 3:03 pm

Hi David,
I like the idea of playing with the voice files. I can see where the program might be a good tool to use with podcasting stories.

David W. Boles says:

January 4, 2007 at 3:13 pm

Hi Chris!
Yes, it’s a lot of fun. A wonderful site.
Do you have sound today? If so, do the voice accents sound authentic to you or not? Are the accents even necessary?

Chris says:

January 4, 2007 at 4:04 pm

Hi David,
The voices didn’t sound too bad and they were understandable. I like the idea of accents. It makes things more interesting and provides variety.
One thing I’d like to have is a choice of a Spanish accented English speaker — that’d be nice! Maybe a Penelope Cruz accent. 🙂

Chris says:

January 4, 2007 at 4:09 pm

Here’s the perfect accent that is needed for the text-to-voice machine:
Sofia Vergara of ABC’s The Knights of Prosperity.

David W. Boles says:

January 4, 2007 at 4:31 pm

Hi Chris!
Thanks for that feedback. I agree.
Hey, Sofia is pretty zippy! Looks a lot like Zeta-Jones!

Chris says:

January 4, 2007 at 4:40 pm

Hi David,
Mummmmm … A Catherine Zeta-Jones voice wouldn’t be bad either. 😉

David W. Boles says:

January 4, 2007 at 4:52 pm

Oh, yeah! They both definitely have looks that match their fab voices!

Lisa says:

January 4, 2007 at 5:28 pm

This is a cool find. I have been using it to create ringtones as well. It’s a great way to personalize profiles.
I have personally used it on my blackberry but I have friends that have tried it with Nextel as well as LG phones on Verizon.
Hopefully it will stay free for a while.

Kathakali Chatterjee says:

January 4, 2007 at 5:29 pm

Hi David,
Anjali sounds a lot like a South Indian…it’s pretty close.

David W. Boles says:

January 4, 2007 at 5:38 pm

Thanks for the info, Lisa.

David W. Boles says:

January 4, 2007 at 5:39 pm

Katha —
Does providing an “Indian” accent via this service seem appropriate to you or is it pandering to please a culture?

Kathakali Chatterjee says:

January 4, 2007 at 5:58 pm

It seems AT&T does have an Indian customer base they want to cater to…and the service provider always tries to keep the customer happy!

David W. Boles says:

January 4, 2007 at 6:24 pm

Katha —
Let me make my question darker. There isn’t a “Southern English” accent or a “New Jersey English” accent — yet there is an “Indian English” accent.
Does that strike you as racist and stereotypical in any way, providing an “Apu-like” Simpsons Indian accent as a voicing choice?

Kathakali Chatterjee says:

January 4, 2007 at 8:30 pm

David,
“Apu” is a comedy character and as an average comedy generally goes over the top – ‘The Simpsons’ is no exception.
Is it stereotypical? Yes, no less than any over used pre conceived notion.
Is that racist?
Well, after watching “There’s Something About Mary”, “Wedding Crashers” and “40 Year Old Virgin” if someone concludes that Americans don’t understand anything better than “gross humor” – would you call them racist?
AT & T just copied a worn-out issue. Why blame the poor company?
Let there be light! 😀

David W. Boles says:

January 4, 2007 at 11:40 pm

Hi Katha!

AT & T just copied a worn-out issue.

That’s what bothers me so much!
They’re an excellent company that knows better than to take the low, lazy road!

Kathakali Chatterjee says:

January 5, 2007 at 12:03 am

I understand David!
It’s people pleasing and it’s cross cultural – happens every time, everywhere.
Why?
That’s the easy way – I think.

David W. Boles says:

January 5, 2007 at 12:04 am

I’m not a big fan of easy, Katha!

fruey (Let's Have It) says:

January 5, 2007 at 4:48 am

Cultural stereotypes aside, there is a wide range of charming accents in international English, and some local grammar and colloquialisms which are sufficiently prevalent to be not stereotypical but in “common usage”.
India has a fine history in English literature and English language culture and I love the accents that I hear from friends and colleagues. I really like imitating other English accents but have been accused of racism for that. I see it rather more as a linguistic challenge. Of course, it should be done in fun and never to belittle a nation by only saying stupid things in that accent, but rather to try to capture what it is that makes up a local accent and try to speak differently from usual. Sometimes accents can break down psychological barriers, I wrote about speaking “silly English” a while back.
Singapore has a lot of idiosyncratic usages as do Australia, New Zealand, etc. In fact, when I speak to English speakers from around the world I am sometimes infected by their usage and accent and reply in an accent which is a mix of my own and the one I’m hearing. I feel terrible sometimes, because I can’t stop myself doing it.
Indian English has its own habits and colloquialisms, like using present continuous too much “I am being pleased” for example. But here we’re talking just about an accent. Why Indian is singled out might be to do with the prevalence of Indians in the tech field in the US, perhaps? Note there isn’t even Canadian as a choice… (CA French, yes… don’t get me started on that).
Wasn’t it Oscar Wilde who said England and America are two nations divided by a common language?

David W. Boles says:

January 5, 2007 at 9:58 am

Excellent response, fruey, and I think you’re right on all counts.
I, too, love the sound of the human voice and its regional accents and international pidgins, but — like you — I, too have found myself in trouble for the imitation as being disrespectful and inconsiderate. I always thought I was complimenting the person by doing a perfect imitation of their accent and intonation.
😀
FYI… reading your blog via RSS reveals your real first name as the author of the post… if that still matters to you.

Kathakali Chatterjee says:

January 5, 2007 at 3:09 pm

I know you are not fond of “easy” David – neither am I. AT&T have chosen the easy path though!
On the other hand, it is possible that AT&T solely followed the trend – which was not expected from them being a leader.
Talking about accent, Indian English has various kinds of accents depending on the region; I agree with fruey, I suppose the reason for choosing the South Indian accent is its significant presence in the tech field in the USA.

David W. Boles says:

January 5, 2007 at 3:15 pm

I’m not much fond of “trends” either, Katha!
😀

Kathakali Chatterjee says:

January 5, 2007 at 3:22 pm

oooof! I know…neither am I – I just tried to explain! 🙁

David W. Boles says:

January 5, 2007 at 3:28 pm

Okay, Katha, I understand!

vocamedia says:

June 25, 2010 at 1:25 pm

I used many voices to turn anything into podcasts and audiobooks. ATT voices are great. Although 16 kHz, the recording quality was superb. I also put in background music to make the listening more enjoyable. I can listen for hours and hours without getting bored.

David W. Boles says:

June 25, 2010 at 2:43 pm

Thank you for your expert comment! Your new WP.com blog’s really interesting.

Comments are closed.
