Class Indexer

java.lang.Object
   |
   +----java.lang.Thread
           |
           +----Indexer

public class Indexer
extends Thread

NetRand Project

Software Engineering - CS536

University of Wisconsin - Milwaukee

Authors:


File: Indexer.java
Note: This file was originally part of a web crawler program written by Tim Macinta in 1997 that gathered links from the internet and built a web search database. The files containing the crawling logic were taken from that program and slightly modified for the NetRand project.
The Indexer is a thread that indexes URLs that have been cached using the URLStatus class. Use the queueURL() method to add cached URLs to the Indexer's queue. Once the start() method is called, the Indexer begins processing the URLs in its queue. More URLs can be added after calling start(); in fact, this may be the best way to use the Indexer. Calling the stopWhenDone() method causes the Indexer thread to stop as soon as its queue empties.
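
The queue, start, and stop-when-done lifecycle described above can be sketched as follows. Indexer depends on URLStatus, Crawler, and EnginePrefs, which are not reproduced on this page, so the sketch uses a hypothetical stand-in class, QueueWorkerSketch, that takes plain URL strings. Its fields borrow the names q, q_mutex, and exit_when_done from the Variable Index, but the wait/notify internals are an assumption and not the actual Indexer source.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical stand-in for Indexer: drains a FIFO queue of URL strings and
// exits once the queue is empty and stopWhenDone(true) has been called.
class QueueWorkerSketch extends Thread {
    private final Queue<String> q = new ArrayDeque<>();
    private final Object q_mutex = new Object();
    private boolean exit_when_done = false;
    // Written only by the worker thread; read it only after join().
    final List<String> processed = new ArrayList<>();

    public void queueURL(String url) {
        synchronized (q_mutex) {
            q.add(url);
            q_mutex.notifyAll(); // wake the worker if it is waiting
        }
    }

    public void stopWhenDone(boolean exit_when_done) {
        synchronized (q_mutex) {
            this.exit_when_done = exit_when_done;
            q_mutex.notifyAll();
        }
    }

    @Override
    public void run() {
        while (true) {
            String url;
            synchronized (q_mutex) {
                // Sleep until there is work or we have been told to finish.
                while (q.isEmpty() && !exit_when_done) {
                    try {
                        q_mutex.wait();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                }
                if (q.isEmpty()) {
                    return; // queue drained and exit_when_done is set
                }
                url = q.remove();
            }
            processed.add(url); // placeholder for the real indexing work
        }
    }
}

class QueueWorkerDemo {
    static List<String> runDemo() {
        QueueWorkerSketch worker = new QueueWorkerSketch();
        worker.queueURL("http://example.com/a");
        worker.start();
        worker.queueURL("http://example.com/b"); // queued after start()
        worker.stopWhenDone(true);
        try {
            worker.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return worker.processed;
    }

    public static void main(String[] args) {
        System.out.println(runDemo());
    }
}
```

Because the second URL is queued before stopWhenDone(true) is called, the worker processes both entries in FIFO order before it exits.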


Variable Index

 o crawler
 o exit_when_done
 o prefs
 o q
 o q_mutex
 o running
 o total_bytes
 o working_dir

Constructor Index

 o Indexer(File, Crawler, EnginePrefs)
"working_dir" should be a directory that only this Indexer and a given Crawler will be accessing.

Method Index

 o addNewURLs(LinkExtractor)
Adds new URLs to the crawler's queue.
 o cleanUp()
Removes all ".tmp" files in the directory "working_dir".
 o queueURL(URLStatus)
Use this method to add a cached URL to the Indexer.
 o run()
This is where the actual indexing is done.
 o start()
Starts the Indexer.
 o stopWhenDone(boolean)
Causes this Indexer to stop whenever it finishes indexing the URLs in its queue.

Variables

 o working_dir
 File working_dir
 o q
 FIFOQueue q
 o q_mutex
 Object q_mutex
 o running
 boolean running
 o crawler
 Crawler crawler
 o prefs
 EnginePrefs prefs
 o exit_when_done
 boolean exit_when_done
 o total_bytes
 long total_bytes

Constructors

 o Indexer
 public Indexer(File working_dir,
                Crawler crawler,
                EnginePrefs prefs)
"working_dir" should be a directory that only this Indexer and a given Crawler will be accessing. This means that if several Indexers run simultaneously, each should be given a different "working_dir" directory. Also, no other thread should write to this directory (except for the selected Crawler).

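One way to satisfy the one-directory-per-Indexer requirement is to create a separate subdirectory for each Indexer before constructing it. The following is a minimal sketch under stated assumptions: the class WorkingDirSketch, the temporary base directory, and the "indexer-N" naming are all hypothetical and do not come from the NetRand source.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;

class WorkingDirSketch {
    // Creates `count` distinct subdirectories under a fresh temporary base
    // directory, one per Indexer. The "indexer-N" naming is hypothetical.
    static List<File> createWorkingDirs(int count) {
        try {
            File base = Files.createTempDirectory("netrand-sketch").toFile();
            List<File> dirs = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                File dir = new File(base, "indexer-" + i);
                Files.createDirectories(dir.toPath());
                dirs.add(dir);
            }
            return dirs;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (File dir : createWorkingDirs(3)) {
            // Each directory would be handed to exactly one Indexer, e.g.
            // new Indexer(dir, crawler, prefs), together with its Crawler.
            System.out.println(dir.getName());
        }
    }
}
```
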
Methods

 o queueURL
 public void queueURL(URLStatus url)
Use this method to add a cached URL to the Indexer.

 o start
 public void start()
Starts the Indexer.

Overrides:
start in class Thread

 o run
 public void run()
This is where the actual indexing is done.

Overrides:
run in class Thread
 o stopWhenDone
 public void stopWhenDone(boolean exit_when_done)
Causes this Indexer to stop whenever it finishes indexing the URLs in its queue.

 o cleanUp
 void cleanUp()
Removes all ".tmp" files in the directory "working_dir".
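
The behavior described for cleanUp() can be approximated with the standard java.io.File API. This is a sketch only: the class CleanUpSketch, the method name removeTmpFiles, and the demo file names are hypothetical, and the actual cleanUp() implementation is not shown on this page. The sketch assumes that only files directly inside the directory are considered, with no recursion into subdirectories.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;

class CleanUpSketch {
    // Deletes regular files whose names end in ".tmp" directly inside
    // workingDir and returns how many were deleted.
    static int removeTmpFiles(File workingDir) {
        File[] tmpFiles = workingDir.listFiles((dir, name) -> name.endsWith(".tmp"));
        if (tmpFiles == null) {
            return 0; // not a directory, or an I/O error occurred
        }
        int deleted = 0;
        for (File f : tmpFiles) {
            if (f.isFile() && f.delete()) {
                deleted++;
            }
        }
        return deleted;
    }

    // Builds a throwaway directory holding two ".tmp" files and one other file.
    static File makeDemoDir() {
        try {
            File dir = Files.createTempDirectory("cleanup-sketch").toFile();
            Files.createFile(new File(dir, "a.tmp").toPath());
            Files.createFile(new File(dir, "b.tmp").toPath());
            Files.createFile(new File(dir, "keep.idx").toPath());
            return dir;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        File dir = makeDemoDir();
        System.out.println("deleted: " + removeTmpFiles(dir));
    }
}
```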

 o addNewURLs
 void addNewURLs(LinkExtractor urls)
Adds new URLs to the crawler's queue.