Penn and Google preserve endangered languages

Anyone can submit text, audio and video files to the online database

July 11, 2012, 9:04 pm

With about half of the world’s 7,000 languages threatened with extinction by the year 2100, Penn Libraries is collaborating with Google and other organizations to preserve them in the Endangered Languages Project.

The initiative uses technology to support both language documentation and revitalization efforts, Google project manager Jason Rissman wrote in an email. “It’s unique in allowing individuals from around the world to contribute directly to the effort, enabling them to accelerate research about these languages as well as use the internet to preserve and share documentation of them.”

Anyone with knowledge about endangered languages can submit text, audio and video files to the online database, he added.

David McKnight, director of Penn’s Rare Books and Manuscripts Library, and Penn Museum librarian John Weeks contributed most of Penn’s rare Berendt manuscripts, a 183-item collection of more than 40 Mexican and Central American languages.

“Many of these languages are now moribund or extinct, making the collection one of the most important of its kind,” Weeks said in a statement. Lyle Campbell, a linguistics professor at the University of Hawaii at Manoa, had suggested that Google contact Weeks to obtain this digitized data.

“They would have preferred language recordings, but the transcriptions are just as important,” Weeks wrote in an email.

Linguistics professor Mark Liberman sees the Endangered Languages Project as a worthwhile cataloguing initiative but feels it does not add to existing documentation. There is “a critical issue that this effort doesn’t really address,” he wrote in an email.

“A key problem for today’s endangered languages is that few if any of them are written or read to any significant extent,” he wrote. While oral forms such as storytelling, poetry and rhetoric exist, “the amount of these ‘oral texts’ that are being recorded and transcribed for the historical record, including for the descendants of speakers of the language, is in most cases pathetically small.”

He added that documenting a language — classical Latin for example — requires about 500 hours of diverse recordings.

“For how many endangered languages do we have 500 hours of diverse transcribed and analyzed recordings? Essentially none,” he said. “For how many is a process under way for creating this? The answer is a depressing one.”

Although Google’s “collecting links to existing documentation of many endangered languages is a good thing to do,” Liberman added, “the trouble is that this documentation is inadequate in almost all cases.”

To obtain more recorded documentation in Papua New Guinea, Liberman and Steven Bird, associate director of the Linguistic Data Consortium, suggested using a technique called “basic oral language documentation,” in which digital voice recorders are put in the hands of speakers of endangered languages.

Liberman and Bird believe the technique would help further the goals of the Endangered Languages Project, whose website states that “with every language that dies we lose an enormous cultural heritage … and most importantly, we lose the expression of communities’ humor, love and life. In short, we lose the testimony of centuries of life.”
