The Daily Pennsylvanian is a student-run nonprofit.

In a discreet office complex at 36th and Market streets, a group of linguists is working on the cutting edge of language analysis.

The Linguistic Data Consortium, directed by Linguistics Professor Mark Liberman, was established in 1992 as a research center that "creates, collects and distributes speech and text databases, lexicons and other resources." One of only eight such programs in the nation, the consortium has undertaken tasks such as putting dictionaries of Mandarin Chinese and Russian on the World Wide Web.

It is also analyzing hundreds of gigabytes of speech to give companies the data they need to build voice-activated computers, and it is providing data to improve automated-voice systems such as 411 information numbers. Its other purposes include collecting speech samples from volunteers around the country, analyzing speech from broadcast news programs and making the computerized data it gathers available to other consortium members.

Institutions that subscribe to the LDC's research service include Carnegie Mellon University and the Massachusetts Institute of Technology, as well as companies such as AT&T and IBM.

The LDC, which is funded in part by the National Science Foundation, also provides "multilingual data for teaching purposes," according to Liberman. "It will make a difference in the way language teaching is done at Penn and elsewhere," he said.

Students from both Penn and Drexel University work on the LDC's "shop floor," entering lexicon data and transcribing multilingual speech samples. Anne Johnson, a 1996 College graduate, and College sophomore Renata Pavlovic spent Wednesday entering a Polish dictionary into the LDC computer system.

"It's really interesting," said Pavlovic, who is proficient or fluent in five languages and chose the job because of her interest and skills in languages and computers.

Johnson, who majored in Anthropology and is fluent in Russian, took the job as an undergraduate partly to learn to cope with an increasingly "digitized world."

David Miller, a Germanic Languages graduate student, also works at the LDC, balancing the part-time position with a second job as an administrative fellow at the Modern Languages College House. "It's laid back," said Miller, who emphasized the uniqueness of the consortium's work.

Lead programmer David Graff described the nonprofit consortium's "phenomenal system" in more detail. "We have three major funded projects at the moment," Graff said. In one, LDC workers travel to various parts of the country to record volunteers' voices on a variety of telephones. The data collected from a "socially homogenous group" is then entered into the computer and made available to consortium members, who pay either $2,000 or $20,000 for access to the LDC's research, depending on whether the institution is nonprofit or for-profit.

Liberman, one of the founders of the LDC, commented on members' access to the "very large archives" the consortium creates. "We make it possible for a large number of organizations to get transcribed data on terms that they wouldn't be able to negotiate individually," Liberman said. "Our most important function is as an intermediary for intellectual property rights."

Using the LDC system, members can view the data online before committing major research dollars to projects.

"We're like a publishing clearinghouse," Liberman said, emphasizing that the LDC's "low overhead" costs allow it to provide CD-ROMs of speech data to members at a reasonable price. "There remains a need [for computerized data] that we fill," he added.