Here is my initial review of Google’s new Aloud service for auto-dubbing

March 14, 2022

Google’s Aloud auto-dubs your English video in Castilian or Portuguese, free

A couple of weeks ago, I announced that the renowned Descript service can finally auto-transcribe audio or audiovisual recordings originally made in Castilian, Catalán and 20 other languages (article here). Now, just a couple of weeks later, Google has announced its free Aloud service, which can automatically transcribe original English-language videos, let you correct the auto-transcription manually, then (nearly instantly) translate it into either Castilian or Portuguese and overdub it (not subtitle it) with humanesque-sounding artificial voices in the matching target language, with more languages coming soon. Both Descript and Google currently misname the Castilian language as “Spanish”, a topic covered in several of my books, radio shows/podcasts and a bilingual song. Ahead, I’ll cover the new Aloud service from Google and give my commentary about it.

Aloud transcribes your video. (Of course, it could be an audiogram made from your audio-only content. I have covered audiograms in past articles.)
You review and edit the auto-transcription (which takes a fraction of the time it would take for a human being to do it).
Aloud automatically translates and dubs your video (or audiogram). (So far, Google hasn’t mentioned whether there is any option to review and edit the translation, which, as all bilinguals and certified translators know, is essential to achieve a proper translation. I certainly hope that Google offers that capability and simply wanted to make the marketing spiel work in three simple steps.)

Observations about the sample dubs supplied by Google

I only analyzed the Castilian dubs, which Google imprecisely calls “Spanish” dubs. As I have covered in several books (including The Castilian Conspiracy and La conspiración del castellano), radio shows/podcasts (CapicúaFM and SpeakCastilian) and a recent bilingual song (Let it be called Castilian — Debe llamarse castellano), there are actually at least six official languages in Spain, all of which are Spanish languages. Each of the official Spanish languages has a specific and unique name. The most widespread of all of the official Spanish languages is properly named Castilian (castellano). By calling the Castilian language “Spanish”, both Descript and Google are sadly fomenting the coverup about the attempted linguicide (linguistic genocide) from the past century, which I have covered in detail in my books. The proper Castilian (castellano) name is actually protected by law in eight countries, seven of which do so in their respective Constitutions and one in its Federal law. Both Descript and Google should read my books (which contain all of the evidence) and correct the language name accordingly, ASAP.

For both of the current target languages, Google’s Aloud offers two different regional “accents” in the humanesque artificial voices to achieve the maximum acceptability within a region. Just as there are different accents in the English language spoken in Australia, the Bahamas, Canada, South Africa, the United Kingdom and the United States, there are different ones in all of the countries where Castilian is spoken, which include most countries in the Americas, Spain, Equatorial Guinea and some regions in the Philippines. Of course, just as there are different accents within the United States (a Bostonian sounds very different from a Texan), there are many different accents among (for example) Argentina, Chile, Cuba, Colombia, Dominican Republic, México, Perú, Puerto Rico and Venezuela. I can immediately distinguish among the Castilian accents and slang of those countries (and several others). In the particular cases of Colombia and Venezuela, I can even distinguish between internal regional accents. However, at least for now, Google’s Aloud doesn’t attempt to offer so many varieties. Instead, Google chose to offer two accents for the Castilian language and two for Portuguese.

Just as Google’s Aloud currently misnames the Castilian language as “Spanish”, it currently misnames its generic subclassification for Castilian for the Américas as “Spanish-United States”. For those who are unaware, the United States ranks number two by population among Castilian-speaking countries, second only to México. After the US intervention in México, which resulted in the Treaty of Guadalupe Hidalgo, signed on February 2, 1848, México ceded about 50% of its territory to the United States. As a result, an enormous number of Castilian-speaking Mexicans suddenly became part of the United States, without having to move. That of course increased the percentage of Castilian-speakers in the United States instantly and perhaps more drastically than any Hispanic immigration after 1848. According to my research, as of 2018, approximately 62% of the US’s Hispanic population was of Mexican origin. Another 9.6% was of Puerto Rican origin, with about 3.9% each of Cuban and Salvadoran origin and 3.4% of Dominican origin. The remainder was of other Central American or South American origin, or of origin directly from Spain. Two thirds of all Hispanics living in the US were born in the United States.

The above statistics probably explain why Aloud’s so-called “Spanish-United States” sounds so “Mexican” to me. Although, as I explained above, I can distinguish between many Castilian accents from countries in the Americas, I can only distinguish regional accents within two countries: Colombia and Venezuela. That’s why, in preparation for this article, I asked two Mexican friends to analyze Aloud’s so-called “Spanish-United States” artificial voices to get their opinions about how they sounded to them, and perhaps which region of México they most resemble. The two friends are Ana Cristina Pérez de la Mora (a professional video editor I interviewed on CapicúaFM in 2017) and Memo Sauceda, an Emmy Award-winning actor who was a panelist on CapicúaFM in 2020 to analyze the Castilian accents used by multiple characters played by actor Santiago Cabrera on Star Trek: Picard. Ana Cristina is from the capital, México City, and Memo is originally from Monterrey. Memo said that although the Google/Aloud example was too short to be sure, his best guess would be an accent from México City. Ana Cristina emphasized that even though it didn’t sound exactly like an authentic accent from México City per se, she believes that the artificial voice is designed to be a generic Castilian accent which does share some similarities with the Castilian spoken in that city, which is in the central part of México.

To that I will add the following: Google properly used a Seseo pronunciation for that one and a Distinction pronunciation for the Iberian Castilian accent. That is good for Google/Aloud.

Listen to episode 1 of SpeakCastilian (illustrated above) to get a full explanation of the difference between the Ceseo, Distinction and Seseo pronunciations.

However, Google completely messed up the pronunciation of the word vídeo in the Iberian Castilian version of the dub (although they may have fixed it by the time you listen to it). There are two acceptable spellings of the Castilian word for video. In the Americas, the Castilian word is spelled the same as in English (video) but is pronounced quite differently. In Castilian in the Americas, the emphasized syllable is the second-to-last: vi-DE-o, since that is the default emphasized syllable for Castilian words that do not end in a consonant (other than n or s) and do not have any written accent mark to override that rule.

In Iberian Castilian (Spain), the word video has an accent mark on the i (vídeo) and the emphasized syllable is the third-to-last (i.e. the first one), just as in English: VI-de-o. So the current artificial voice that Aloud assigns to Iberian Castilian sounds like a guy from Madrid trying to imitate how a guy from the Americas would say that particular word (video), even though the rest of his pronunciation is indeed authentic.
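For readers who like to see a rule written out, here is a minimal Python sketch of the stress rule described above. It is only an illustration under stated assumptions: the word must already be split into syllables by hand, the function names (stressed_syllable and show) are hypothetical ones I made up for this example, and it has nothing to do with how Google/Aloud actually generates its voices. It also includes the other half of the standard rule, which is not discussed above: words ending in any other consonant are stressed on the last syllable by default.

```python
ACCENTED_VOWELS = set("áéíóú")
PLAIN_VOWELS = set("aeiou")


def stressed_syllable(syllables):
    """Return the 0-based index of the emphasized syllable.

    Assumption: `syllables` is a non-empty list of hand-split syllables,
    for example ["vi", "de", "o"].
    """
    if not syllables:
        raise ValueError("expected at least one syllable")
    # A written accent mark overrides the default rule.
    for index, syllable in enumerate(syllables):
        if any(ch in ACCENTED_VOWELS for ch in syllable.lower()):
            return index
    if len(syllables) == 1:
        return 0
    last_letter = syllables[-1][-1].lower()
    # Default: words ending in a vowel, n or s stress the second-to-last syllable.
    if last_letter in PLAIN_VOWELS or last_letter in "ns":
        return len(syllables) - 2
    # Words ending in any other consonant stress the last syllable.
    return len(syllables) - 1


def show(syllables):
    """Render the word with the emphasized syllable in capitals."""
    stressed = stressed_syllable(syllables)
    return "-".join(
        s.upper() if i == stressed else s.lower()
        for i, s in enumerate(syllables)
    )


print(show(["vi", "de", "o"]))  # spelling used in the Americas
print(show(["ví", "de", "o"]))  # Iberian spelling with the accent mark
```

With the unaccented spelling, the default rule picks the second syllable. With the accent mark on the i, the override picks the first one, which is the emphasis the Iberian voice should have used.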

Opinions about the Aloud service

I am glad to see the disclaimer Google/Aloud put under one of the dubbed videos on its Aloud page:

This video has been dubbed using an artificial voice to increase accessibility.

From that perspective, it is true. Google states that

80% of the world doesn’t speak English.

Given that, it’s great to make content understandable to more people. However, I am concerned that Google/Aloud has not (yet) mentioned any capability to edit/correct the automatic translation. So far, Google/Aloud only talks about editing/correcting the automatic transcription. As all professional translators know, automatic translations are getting better all the time, but they are still not perfect. I hope that Google/Aloud will both offer that option and encourage all translations to be verified by a human professional translator.

Even if Google/Aloud does that, there are still other precautions:

As is, the service does not allow different voices for multiple speakers/characters. It offers a single voice for everything, which is fine for content whose original version uses a single voice, but not appropriate for content with multiple voices.
As is, the service does not reproduce the human emotion of the original performance.

I would summarize by saying that Aloud is appropriate for informational content which uses a single voice, as long as it offers the possibility to correct/edit the auto-translation before the autodubbing occurs. Aloud is certainly not as good as using a professional voice talent even for those cases, but it is much better than not translating/localizing the content at all. Finally, Aloud is quite far from being usable for dramatic material with multiple voices, something that our friends at Centauro have been doing for many decades for theatrical films and more.

To clarify: Aloud is for automatic dubbing, not for subtitles. For many years, Google’s YouTube has offered free and automatic transcription (which is correctable/editable) and automatic translation for subtitles. Descript is for automatic transcription which can be edited/corrected. Descript delivers a final transcript for subtitling and other purposes, but does not currently offer dubbing. However, the transcript could be used as a source for translation and later dubbing using professional voice talent. Descript also offers automatic filler word elimination (i.e. ums). After removing them from the text, Descript removes them from the audio also.

For more information about Google’s Aloud service, click here.

I thank James Cridland of Podnews.net for informing me about Aloud via his bulletin.

(Re-)Subscribe for upcoming articles, reviews, radio shows, books and seminars/webinars

Stand by for upcoming articles, reviews, books and courses by subscribing to my bulletins.

In English:

Email bulletins, bulletins.AllanTepper.com
In Telegram, t.me/TecnoTurBulletins
Twitter (bilingual), AllanLTepper

En castellano:

Boletines por correo electrónico, boletines.AllanTepper.com
En Telegram, t.me/boletinesdeAllan
Twitter (bilingüe), AllanLTepper

Most of my current books are at books.AllanTepper.com, and also visit AllanTepper.com and radio.AllanTepper.com.

FTC disclosure

There is currently no financial relationship between Google and Allan Tépper or TecnoTur LLC (other than the fact that Allan Tépper and TecnoTur LLC have purchased products and services from Google and that Allan Tépper receives book royalties from Google Books). Some of the other manufacturers listed above have contracted Tépper and/or TecnoTur LLC to carry out consulting and/or translations/localizations/transcreations. Many of the manufacturers listed above have sent Allan Tépper review units. So far, none of the manufacturers listed above is/are sponsors of the TecnoTur, BeyondPodcasting, CapicúaFM or TuSaludSecreta programs, although they are welcome to do so, and some are, may be (or may have been) sponsors of ProVideo Coalition magazine. Some links to third parties listed in this article and/or on this web page may indirectly benefit TecnoTur LLC via affiliate programs. Allan Tépper’s opinions are his own. Allan Tépper is not liable for misuse or misunderstanding of information he shares.