Class URLStatus

java.lang.Object
   |
   +----URLStatus

public class URLStatus
extends Object

Netrand Project

Software Engineering - CS536

University of Wisconsin - Milwaukee

Authors:


File: URLStatus.java
Note: This file was originally part of a web crawler written by Tim Macinta in 1997 that gathered links from the Internet and built a web search database. The files containing the crawling logic were taken from that program and slightly modified for the NetRand project.
This class holds information about the content at a particular URL. It can also be used to fetch and parse a URL.


Variable Index

 o actual_url
 o DUPLICATE
 o eng_prefs
 o given_url
 o IO_ERROR
 o LOADED
 o mime_type
 o MISC_ERROR
 o MISSING
 o MOVED
 o NOT_LOADED
 o status
 o temp_file
 o TIMED_OUT
 o UNSUPPORTED_MIMETYPE

Constructor Index

 o URLStatus(URL, File, EnginePrefs)
"url" is the location of the information, and "temp_file" is a temporary file that can be used to store the contents of this URL.

Method Index

 o finalize()
Gets rid of the temporary file.
 o getCacheFile()
Returns the file that is used to cache the contents of this URL.
 o getContentLength()
Returns the length of the content, or 0 if it's unknown.
 o getLinkExtractor()
Returns a LinkExtractor that can handle this URL's mime type.
 o loaded()
Returns true if and only if this URL was loaded without an error.
 o mimeTypeUnderstood(String)
Returns true if and only if this mime type can be processed.
 o moved()
Returns true if and only if this URL causes a redirection.
 o pipe(InputStream, OutputStream)
Pipes "in" to "out" until "in" is exhausted, then closes "in".
 o readContent()
Downloads the content of the given URL and stores it in a temporary cache file.
 o readGeneric()
This method provides a fallback to the default Java implementation for protocols which have not been re-implemented.
 o readHTTP()
Downloads a file using the HTTP protocol.
 o readLine(PushbackInputStream)
A replacement for the readLine() method of java.io.DataInputStream, which does not return the line-ending characters as it should.

Variables

 o given_url
 URL given_url
 o actual_url
 URL actual_url
 o temp_file
 File temp_file
 o eng_prefs
 EnginePrefs eng_prefs
 o mime_type
 String mime_type
 o LOADED
 static final int LOADED
 o NOT_LOADED
 static final int NOT_LOADED
 o MOVED
 static final int MOVED
 o DUPLICATE
 static final int DUPLICATE
 o MISSING
 static final int MISSING
 o TIMED_OUT
 static final int TIMED_OUT
 o IO_ERROR
 static final int IO_ERROR
 o UNSUPPORTED_MIMETYPE
 static final int UNSUPPORTED_MIMETYPE
 o MISC_ERROR
 static final int MISC_ERROR
 o status
 int status

Constructors

 o URLStatus
 public URLStatus(URL url,
                  File temp_file,
                  EnginePrefs eng_prefs)
"url" is the location of the information, and "temp_file" is a temporary file that can be used to store the contents of this URL.

Methods

 o loaded
 public boolean loaded()
Returns true if and only if this URL was loaded without an error.

 o getLinkExtractor
 public LinkExtractor getLinkExtractor() throws IOException
Returns a LinkExtractor that can handle this URL's mime type. To add support for a new mime type, add a LinkExtractor that handles it here, add the appropriate WordExtractors to the getWordExtractor() method, and add the mime type to the list in the mimeTypeUnderstood() method.

 o mimeTypeUnderstood
 public boolean mimeTypeUnderstood(String mime_type)
Returns true if and only if this mime type can be processed.
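The check described above could look roughly like the following. This is only a minimal sketch: the class name MimeTypeSketch and the two listed mime types are assumptions for illustration, since this page does not show which types the real method accepts. The sketch also assumes that parameters such as "; charset=..." should be ignored.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class MimeTypeSketch {
    // Hypothetical list; the types the real class accepts are not documented here.
    private static final Set<String> UNDERSTOOD =
            new HashSet<>(Arrays.asList("text/html", "text/plain"));

    // Returns true if and only if the base type (without parameters) is in the list.
    static boolean mimeTypeUnderstood(String mimeType) {
        if (mimeType == null) {
            return false;
        }
        int semi = mimeType.indexOf(';');
        String base = (semi >= 0 ? mimeType.substring(0, semi) : mimeType)
                .trim()
                .toLowerCase(Locale.ROOT);
        return UNDERSTOOD.contains(base);
    }
}
```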

 o getCacheFile
 public File getCacheFile()
Returns the file that is used to cache the contents of this URL.

 o readContent
 public void readContent()
Downloads the content of the given URL and stores it in a temporary cache file.

 o readHTTP
 void readHTTP() throws IOException
Downloads a file using the HTTP protocol. It was necessary to write a method to do this from scratch rather than using the default method in Java because:

 o readLine
 String readLine(PushbackInputStream in) throws IOException
A replacement for the readLine() method of java.io.DataInputStream, which does not return the line-ending characters as it should.
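A line reader that keeps the line-ending characters might be sketched as follows. The class name ReadLineSketch is hypothetical, and the handling of "\r", "\n", and "\r\n" as terminators, as well as returning null at end of stream, are assumptions rather than the documented behavior of this method.

```java
import java.io.IOException;
import java.io.PushbackInputStream;

class ReadLineSketch {
    // Reads one line, including its terminator ("\n", "\r", or "\r\n").
    // Returns null when the stream is already at its end (an assumption).
    static String readLine(PushbackInputStream in) throws IOException {
        StringBuilder line = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            line.append((char) c);
            if (c == '\n') {
                break;
            }
            if (c == '\r') {
                // Peek at the next byte: consume it if it completes "\r\n",
                // otherwise push it back for the next call.
                int next = in.read();
                if (next == '\n') {
                    line.append('\n');
                } else if (next != -1) {
                    in.unread(next);
                }
                break;
            }
        }
        return line.length() == 0 ? null : line.toString();
    }
}
```

The one-byte pushback is what makes a PushbackInputStream convenient here: a lone "\r" can be treated as a terminator without losing the byte that follows it.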

 o readGeneric
 void readGeneric() throws IOException
This method provides a fallback to the default Java implementation for protocols which have not been re-implemented.
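A fallback of this kind can be built on the standard java.net.URL and URLConnection classes. The sketch below is an assumption about the general shape, not the actual implementation: the class name ReadGenericSketch and the explicit parameters are hypothetical, standing in for the given_url and temp_file fields.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;
import java.net.URLConnection;

class ReadGenericSketch {
    // Lets the JDK's built-in protocol handler open the URL and copies
    // the content into the given cache file.
    static void readGeneric(URL url, File tempFile) throws IOException {
        URLConnection conn = url.openConnection();
        try (InputStream in = conn.getInputStream();
             OutputStream out = new FileOutputStream(tempFile)) {
            byte[] buffer = new byte[4096]; // buffer size is an arbitrary choice
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        }
    }
}
```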

 o finalize
 public void finalize() throws Throwable
Gets rid of the temporary file.

Throws: Throwable
Overrides: finalize in class Object

 o pipe
 void pipe(InputStream in,
           OutputStream out) throws IOException
Pipes "in" to "out" until "in" is exhausted, then closes "in".
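The copy loop described above can be sketched as follows. The class name PipeSketch and the buffer size are assumptions. Closing "in" in a finally block, so that it is also closed when an I/O error interrupts the copy, is a choice made for this sketch and is not stated in the description.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class PipeSketch {
    // Copies every byte from in to out, then closes in. The output
    // stream is flushed but left open for the caller.
    static void pipe(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[4096]; // buffer size is an arbitrary choice
        try {
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            out.flush();
        } finally {
            in.close();
        }
    }
}
```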

 o moved
 public boolean moved()
Returns true if and only if this URL causes a redirection.
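The predicates loaded() and moved() presumably test the status field against the status constants. A minimal sketch under that assumption follows. The class name StatusSketch and the numeric values are hypothetical, because this page does not show the constants' values or the method bodies.

```java
class StatusSketch {
    // Hypothetical values; the real constants' values are not shown here.
    static final int LOADED = 0;
    static final int NOT_LOADED = 1;
    static final int MOVED = 2;

    // Assumed initial state before any content has been read.
    int status = NOT_LOADED;

    // True only when the content was fetched without an error.
    boolean loaded() {
        return status == LOADED;
    }

    // True only when the URL answered with a redirection.
    boolean moved() {
        return status == MOVED;
    }
}
```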

 o getContentLength
 public long getContentLength()
Returns the length of the content, or 0 if it's unknown.