Class Indexer
java.lang.Object
   |
   +----java.lang.Thread
           |
           +----Indexer
public class Indexer
extends Thread
Netrand Project
Software Engineering - CS536
University of Wisconsin - Milwaukee
Authors:
- Spring 1998 - Francis William Kasper
File: Indexer.java
Note:
This file was originally part of a Web Crawler program written by
Tim Macinta in 1997 that gathered links from the Internet and formed
a web search database. The files containing the logic for crawling
across the Internet were taken from this program and slightly modified
for the purpose of the NetRand project.
The Indexer is a thread which can index URLs that have been
cached using the URLStatus class. Use the queueURL() method
to add cached URLs to the Indexer's list of URLs. Once the
start() method is called, the Indexer will start processing
URLs in its queue. More URLs can also be added after calling
start(); in fact, this may be the best way to use the Indexer.
Calling the stopWhenDone() method will cause the Indexer
thread to stop as soon as its queue empties.
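The lifecycle described above (queue, start, keep queueing, then stopWhenDone) can be illustrated with a minimal stand-in. This is a sketch, not the NetRand Indexer source: the class QueueWorkerSketch, the index() and processedCount() methods, and the use of plain strings in place of URLStatus are all assumptions made so the example compiles on its own. It only mirrors the documented roles of q, q_mutex and exit_when_done.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Hypothetical stand-in for the Indexer's queue/stop protocol. */
class QueueWorkerSketch extends Thread {
    private final Queue<String> q = new ArrayDeque<>(); // plays the role of q
    private final Object qMutex = new Object();          // plays the role of q_mutex
    private boolean exitWhenDone = false;                // plays the role of exit_when_done
    private int processed = 0;                           // hypothetical counter for the demo

    /** Adds an item; may be called before or after start(). */
    public void queueURL(String url) {
        synchronized (qMutex) {
            q.add(url);
            qMutex.notifyAll(); // wake the worker if it is waiting for work
        }
    }

    /** When true, the worker exits once its queue is empty. */
    public void stopWhenDone(boolean exitWhenDone) {
        synchronized (qMutex) {
            this.exitWhenDone = exitWhenDone;
            qMutex.notifyAll(); // wake the worker so it can re-check the flag
        }
    }

    /** Number of items taken from the queue and processed so far. */
    public int processedCount() {
        synchronized (qMutex) {
            return processed;
        }
    }

    /** Placeholder for the real indexing work (assumption: does nothing here). */
    protected void index(String url) {
    }

    @Override
    public void run() {
        while (true) {
            String url;
            synchronized (qMutex) {
                // Sleep until there is work or a stop has been requested.
                while (q.isEmpty() && !exitWhenDone) {
                    try {
                        qMutex.wait();
                    } catch (InterruptedException e) {
                        return;
                    }
                }
                if (q.isEmpty()) {
                    return; // queue drained and stop requested
                }
                url = q.remove();
            }
            index(url); // done outside the lock so callers of queueURL() are not blocked
            synchronized (qMutex) {
                processed++;
            }
        }
    }
}
```

In this sketch a caller would queue one or more items, call start(), continue to queue items while the thread runs, then call stopWhenDone(true) and join() to wait for the queue to drain.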
crawler
exit_when_done
prefs
q
q_mutex
running
total_bytes
working_dir
Indexer(File, Crawler, EnginePrefs)
- "working_dir" should be a directory that only this
Indexer and a given Crawler will be
accessing.
addNewURLs(LinkExtractor)
- Adds new URLs to the crawler's queue.
cleanUp()
- Removes all ".tmp" files in the directory "working_dir".
queueURL(URLStatus)
- Use this method to add a cached URL to the Indexer.
run()
- This is where the actual indexing is done.
start()
- Starts the Indexer.
stopWhenDone(boolean)
- Causes this Indexer to stop as soon as it finishes indexing the URLs
in its queue.
working_dir
File working_dir
q
FIFOQueue q
q_mutex
Object q_mutex
running
boolean running
crawler
Crawler crawler
prefs
EnginePrefs prefs
exit_when_done
boolean exit_when_done
total_bytes
long total_bytes
Indexer
public Indexer(File working_dir,
Crawler crawler,
EnginePrefs prefs)
- "working_dir" should be a directory that only this
Indexer and a given Crawler will be
accessing. This means that if several Indexers are running
simultaneously, they should all be given different "working_dir"
directories. Also, no other threads should write to this
directory (except for the selected Crawler).
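One way to satisfy the one-directory-per-Indexer requirement is to create a fresh subdirectory for each Indexer under a common base. The helper below is a sketch under that assumption; WorkingDirsSketch, allocate() and the "indexer-" prefix are hypothetical names and are not part of the NetRand sources.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that hands out one private directory per Indexer. */
class WorkingDirsSketch {
    /** Creates 'count' new, distinct directories under 'base' and returns them. */
    static List<Path> allocate(Path base, int count) throws IOException {
        Files.createDirectories(base); // make sure the base directory exists
        List<Path> dirs = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            // createTempDirectory picks a unique name, so no two calls collide
            dirs.add(Files.createTempDirectory(base, "indexer-" + i + "-"));
        }
        return dirs;
    }
}
```

Each returned path could then be converted with toFile() and passed as the working_dir argument of a separate Indexer.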
queueURL
public void queueURL(URLStatus url)
- Use this method to add a cached URL to the Indexer.
start
public void start()
- Starts the Indexer.
- Overrides:
- start in class Thread
run
public void run()
- This is where the actual indexing is done.
- Overrides:
- run in class Thread
stopWhenDone
public void stopWhenDone(boolean exit_when_done)
- Causes this Indexer to stop as soon as it finishes indexing the URLs
in its queue.
cleanUp
void cleanUp()
- Removes all ".tmp" files in the directory "working_dir".
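The behavior described for cleanUp() can be approximated as follows. This is a sketch only; the actual implementation is not shown in this documentation, and TmpCleanerSketch and removeTmpFiles() are hypothetical names. It assumes that only regular files directly inside the directory are candidates and that subdirectories are left alone.

```java
import java.io.File;

/** Hypothetical illustration of removing ".tmp" files from a working directory. */
class TmpCleanerSketch {
    /** Deletes regular files ending in ".tmp" directly inside 'dir'; returns the number deleted. */
    static int removeTmpFiles(File dir) {
        File[] tmpFiles = dir.listFiles((d, name) -> name.endsWith(".tmp"));
        if (tmpFiles == null) {
            return 0; // 'dir' does not exist, is not a directory, or could not be read
        }
        int deleted = 0;
        for (File f : tmpFiles) {
            // Skip anything that is not a regular file; count only successful deletions.
            if (f.isFile() && f.delete()) {
                deleted++;
            }
        }
        return deleted;
    }
}
```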
addNewURLs
void addNewURLs(LinkExtractor urls)
- Adds new URLs to the crawler's queue.