Tuesday, October 30, 2007

Google MT and dotSUB - an atomic combination

I wanted to do something cool for my presentation at the ATA this weekend, "Quality Still Doesn't Matter." The talk focuses on topics that should really matter to translators and LSPs, such as productivity, new technologies, and sales. I decided to open it with a video subtitled in dotSUB.

dotSUB is a cool site that lets you upload a video, then transcribe and subtitle it very quickly. I actually wrote about it in the Global Watchtower almost a year ago.

So... this week I read the news that Google is abandoning SYSTRAN and using its own statistical machine translation engine instead. I played with it a little by writing some text in Portuguese and having it translated into English, and I was very surprised by the quality of the English output. That's when I decided to really play with Google Translate and dotSUB.

Here is what I did:

1) Wrote a script in Portuguese.
2) Had it translated into English with Google Translate.
3) Read the script to my webcam.
4) Pasted the Portuguese text into dotSUB.
5) Pasted the English translation into dotSUB.

All of this took me no more than 10 minutes.

Then I decided to have some fun. I edited the English translation a little bit and used Google Translate to go from English into Arabic, Spanish, French, Italian, Japanese, Chinese, and Russian.

The quality of the translations was much better than I expected. I can judge the Spanish, French, and Italian myself, and I asked someone here in the office to check the Russian, but I have no idea how the Japanese, Chinese, and Arabic came out. I don't even know whether pasting the subtitles in by eye broke any words in the middle.

Check it out for yourself. Try switching languages using the small arrows in the bottom-right corner of the video.




What I am going to say in my presentation is essentially that translators who are not using Google to pre-process their jobs are doing too much work. MT is here to stay. As I predicted a couple of years ago, it could be the disruptive player in this market.

Look out for mash-ups of translation memory technologies with Google from Elanex, XML-Intl, Proz.com, and other players. I can now picture huge projects that combine all these new technologies: a Ning portal for discussion and training, Google Translate for pre-processing translations, a shared translation memory repository from LingoTek, and a wiki on wikidot.com for editing the translations collaboratively. These are all free technologies, and together they would let a company manage a huge project far more efficiently than it could with today's tools. The whole project could be managed in ]Project Open[, and the sales process could be tracked in FreeCRM or SugarCRM.