PhD Thesis: Web Crawling
About the thesis
Title: Effective Web Crawling
Author: Carlos Castillo
Advisor: Ricardo Baeza-Yates
Committee: Mauricio Marin, Alistair Moffat, Gonzalo Navarro, Nivio Ziviani
Date: November 2004
Download the full thesis
Ph.D. Thesis: EFFECTIVE WEB CRAWLING by Carlos Castillo [4 MB, 180 pages]
Download the thesis by chapters
Thesis divided by chapters. Main chapters are marked in bold.
- Front matter: cover, indexes, etc.
- Introduction
- Related work: bibliographic review and survey of the state of the art in Web crawling.
- A new crawling model and architecture: framework and classification of Web crawlers.
- Scheduling algorithms for effective Web crawling: long-term and short-term scheduling.
- Crawling the infinite Web, when to stop crawling: dynamic Web sites can be unbounded.
- Cooperation schemes for Web servers: to improve their representation in the search engine.
- Crawler implementation: algorithms and data structures.
- Web characterization: study of the Chilean Web.
- Conclusions
- Appendix, practical Web crawling problems: issues and caveats of Web crawling in practice.
The abstract was published in ACM SIGIR Forum: "Effective Web Crawling (Doctoral Abstract)". ACM SIGIR Forum, Vol. 39, No. 1, pp. 55-56, June 2005. [acm]
This thesis is part of the WIRE project for developing an open-source Web Information Retrieval Environment.