Volume 4, Issue 4
May 2004




Lab Notes, Research from the College of Engineering

Seeing Patterns
by David Pescovitz


Photo of Jordan

Professor Michael I. Jordan is also working with computer science professors David Patterson and Randy Katz to apply machine learning techniques to the Internet infrastructure. The aim is to provide routers, servers, and other hardware with self-diagnosis and self-repair capabilities. (photo courtesy the researcher)

The computers in UC Berkeley professor Michael I. Jordan's laboratory are on a treasure hunt. Loaded with the software he and his colleagues have developed, the machines are sifting through jumbles of data for gems, from the buried bugs that cause computer software to crash to the relatively small number of genes hidden in the three billion letters of human DNA. Through innovative algorithms that draw from the esoteric science of statistics, Jordan is enabling computers to recognize subtle patterns in myriad kinds of data and learn from what they uncover.

"Learning is a branch of statistics," says Jordan, who holds joint faculty positions in the Computer Science Division and the Department of Statistics. "Classical statistics generally centered around a person sitting down with some data, applying some procedure, and ending up with a pattern or hypothesis. We want to more thoroughly automate the process so computers can learn from data in a statistical sense."

The trick is developing pattern-finding algorithms that can compare various sets of historical data and identify commonalities among them. Those commonalities can then be used to generate a predictive model of what's likely to occur in the future.

One application of this kind of artificial intelligence is software program analysis, or "bug isolation in an imperfect world," as computer science graduate student Ben Liblit describes it. Jordan, Liblit, graduate student Alice Zheng, and former UC Berkeley professor Alex Aiken have launched the Cooperative Bug Isolation Project, an effort to use statistical algorithms to reveal the bugs in several popular open source applications, including email, spreadsheet, and music-playing software.

Users are invited to download the software from the project Web site. The applications are instrumented with special code called a "sampler" that, with the user's approval, invisibly monitors the application's behavior. Every so often, the sampler grabs the state of certain variables in the software application and "sends those numbers back to home base," Jordan says. If the application crashes on a given run, the reason should be hidden inside the captured data. Back at Berkeley, the algorithm compares the data from tens of thousands of crashes to suss out the events that seem to be most predictive of failure.

"Those are likely to be the smoking guns," Jordan says.
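To make the idea concrete, here is a minimal sketch in Python of ranking sampled observations by how strongly they predict a crash. It is an illustration only, not the project's actual algorithm: the run format, the predicate names, and the scoring rule (crash rate when a predicate is observed, minus the overall crash rate) are all assumptions made for this example.

```python
from collections import Counter


def rank_predicates(runs):
    """Rank predicates by how much more often runs crash when they are observed.

    runs: list of (observed, crashed) pairs, where `observed` is the set of
    predicate names the sampler saw as true during that run. This format is
    hypothetical and chosen only for illustration.
    """
    total = len(runs)
    if total == 0:
        return []

    # Overall fraction of runs that crashed.
    base_rate = sum(1 for _, crashed in runs if crashed) / total

    seen = Counter()           # runs in which each predicate was observed
    seen_in_crash = Counter()  # crashing runs in which it was observed
    for observed, crashed in runs:
        for predicate in observed:
            seen[predicate] += 1
            if crashed:
                seen_in_crash[predicate] += 1

    # Score: crash rate given the predicate, minus the overall crash rate.
    scores = {p: seen_in_crash[p] / seen[p] - base_rate for p in seen}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Four made-up runs: two crashed, two did not.
runs = [
    ({"ptr == NULL", "len > 0"}, True),
    ({"len > 0"}, False),
    ({"ptr == NULL"}, True),
    ({"len > 0", "i < n"}, False),
]
ranking = rank_predicates(runs)
```

In this toy data, the made-up predicate "ptr == NULL" shows up only in crashing runs, so it rises to the top of the ranking. A real system would also have to account for sparse sampling and for predicates that merely accompany a failure without causing it.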

One of the most interesting characteristics of Jordan's research, though, is the breadth of its applications. To Jordan, data is data, even if it's the entire human genome. In fact, he and graduate student Jon McAuliffe are collaborating with Berkeley mathematics professor Lior Pachter on gene finding. Biologists now estimate that there are approximately 30,000 genes in the human genome. However, current methods that automatically scan the human genome for patterns that indicate a gene are accurate only about half the time, Jordan says. The researchers expect to beat that with a pattern-finding technique that compares the genomes of multiple species.

"Our system goes through the human genome and other primate genomes at the same time," Jordan says. "If you find evidence of the start of a gene in all of the species, it's probably a gene. Finding the commonalities cuts down on the false positives."
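A toy sketch in Python shows why requiring agreement across species cuts down on false positives. It is not the researchers' method: it assumes the sequences are already aligned position by position, uses the start codon ATG as the only evidence of a gene start, and the species names and sequences are invented for this example.

```python
def shared_start_codons(aligned):
    """Return alignment positions where every species has the start codon ATG.

    aligned: dict mapping a species name to its sequence. The sequences are
    assumed to be pre-aligned so that index i refers to the same position in
    every species (a simplifying assumption for this sketch).
    """
    seqs = list(aligned.values())
    if not seqs:
        return []
    length = min(len(s) for s in seqs)
    return [
        i
        for i in range(length - 2)
        if all(s[i:i + 3] == "ATG" for s in seqs)
    ]


# Invented sequences; the human sequence alone contains ATG at positions 2 and 6.
aligned = {
    "human":  "CCATGGATGA",
    "chimp":  "CCATGGTTGA",
    "baboon": "GCATGGATGC",
}
hits = shared_start_codons(aligned)
```

Scanning the human sequence by itself would flag two candidate starts. Only position 2 survives the cross-species check, because the invented chimp sequence has no ATG at position 6.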


So far, Jordan and his colleagues have proven the concept by using their "toolbox" of pattern-finding algorithms to locate five known genes in a dozen species. Eventually, Jordan hopes biologists will use it to tackle the entire human genome.

Perhaps artificial intelligence is finally ready for prime time, even if it's not HAL 9000.

"It's very hard to define intelligence," Jordan says. "Because it's hard to define, it's also hard to operationalize with a computer. It's much easier to operationalize pattern recognition and learning."


Related Sites
Michael I. Jordan's home page

Cooperative Bug Isolation Project

Lior Pachter's home page


Lab Notes is published online by the Public Affairs Office of the UC Berkeley College of Engineering. The Lab Notes mission is to illuminate groundbreaking research underway today at the College of Engineering that will dramatically change our lives tomorrow.

Media contact: Teresa Moore, Lab Notes editor, Director of Public Affairs
Writer, Researcher: David Pescovitz
Web Manager: Michele Foley

Subscribe or send comments to the Engineering Public Affairs Office: lab-notes@coe.berkeley.edu.

© 2004 UC Regents. Updated 4/30/04.