We focus primarily on Internet-scale search. We try to understand
how information at that scale develops, and we work on the core
technologies for providing the most relevant and freshest search
results possible. These technologies and research aims include:
- Crawling: finding the raw documents to search
- Duplicate detection and suppression: finding documents at most
once
- Spam detection: finding documents to exclude completely
- Indexing: locating documents from keywords
- Relevancy: locating the most useful documents
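
As a rough illustration of two of the items above (indexing and duplicate suppression), here is a minimal sketch in Python. It is not a description of any production system: the names `build_index` and `search` are hypothetical, and the duplicate test assumes exact byte-for-byte copies, detected with a content hash, which is far simpler than what a real search engine would need.

```python
import hashlib
from collections import defaultdict


def build_index(docs):
    """Build a toy inverted index, skipping exact duplicate documents.

    `docs` maps a document id to its text. The function name and the
    exact-hash duplicate test are illustrative assumptions.
    """
    index = defaultdict(set)   # keyword -> set of document ids
    seen = set()               # content hashes already indexed
    for doc_id in sorted(docs):
        text = docs[doc_id]
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen:     # duplicate suppression: index each text once
            continue
        seen.add(digest)
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every keyword in `query`."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result
```

With three documents where two have identical text, only the first copy is indexed, so a query matching that text returns a single id. Ranking the matches (relevancy) is deliberately left out of this sketch.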
In addition, we look at theoretical models for the Web, trying to
abstract the properties of the Web graph formed by the links
between pages.
This allows us to evaluate algorithms in an abstract setting without
the distractions of spam, duplicates, aliases, crawling
restrictions, and other pragmatic concerns.
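
To make the idea of an abstract setting concrete, the following sketch generates a small synthetic Web graph on which an algorithm could be tried out. The text above does not name a particular model; the preferential-attachment rule used here (each new page links to one earlier page, chosen in proportion to its in-degree plus one) is an assumption chosen only for illustration, and the function names are hypothetical.

```python
import random


def preferential_attachment_graph(n_pages, seed=0):
    """Return a list of (source, target) links for a synthetic Web graph.

    Each new page links to one earlier page, chosen with probability
    proportional to (in-degree + 1). This model is an assumption made
    for illustration only.
    """
    rng = random.Random(seed)
    links = []
    # One entry per page plus one per inbound link, so a uniform draw
    # from `targets` is proportional to (in-degree + 1).
    targets = [0]
    for page in range(1, n_pages):
        target = rng.choice(targets)
        links.append((page, target))
        targets.append(target)  # the target gained an inbound link
        targets.append(page)    # the new page becomes a candidate
    return links


def in_degrees(links, n_pages):
    """Count inbound links per page."""
    counts = [0] * n_pages
    for _, target in links:
        counts[target] += 1
    return counts
```

Because the graph is generated from a fixed seed and contains no spam, duplicates, or aliases, results on it are repeatable and reflect only the link structure.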