Class URLStatus
java.lang.Object
|
+----URLStatus
- public class URLStatus
- extends Object
Netrand Project
Software Engineering - CS536
University of Wisconsin - Milwaukee
Authors:
- Spring 1998 - Francis William Kasper
File: URLStatus.java
Note:
This file was originally part of a Web Crawler program written by
Tim Macinta in 1997 that gathered links from the internet and formed
a web search database. The files containing the logic for crawling
across the internet were taken from this program and slightly modified
for the purpose of the NetRand project.
This class holds information about the content at a particular URL.
It can also be used to fetch and parse a URL.
actual_url
DUPLICATE
eng_prefs
given_url
IO_ERROR
LOADED
mime_type
MISC_ERROR
MISSING
MOVED
NOT_LOADED
status
temp_file
TIMED_OUT
UNSUPPORTED_MIMETYPE
URLStatus(URL, File, EnginePrefs)
- "url" is the location of the information and "temp_file" is the
temporary file that can be used to store the contents of this
url.
finalize()
- Gets rid of the temporary file.
getCacheFile()
- Returns the file that is used to cache the contents of this URL.
getContentLength()
- Returns the length of the content, or 0 if it's unknown.
getLinkExtractor()
- Returns a LinkExtractor that can handle this URL's mime type.
loaded()
- Returns true if and only if this URL was loaded without an error.
mimeTypeUnderstood(String)
- Returns true if and only if this mime type can be processed.
moved()
- Returns true if and only if this URL causes a redirection.
pipe(InputStream, OutputStream)
- Pipes "in" to "out" until "in" is exhausted, then closes "in".
readContent()
- Downloads the content of the given URL and stores it in a temporary
cache file.
readGeneric()
- This method provides a fallback to the default Java implementation
for protocols which have not been re-implemented.
readHTTP()
- Downloads a file using the HTTP protocol.
readLine(PushbackInputStream)
- A replacement for java.io.DataInputStream.readLine(), which does not
return the line-ending characters as it should.
given_url
URL given_url
actual_url
URL actual_url
temp_file
File temp_file
eng_prefs
EnginePrefs eng_prefs
mime_type
String mime_type
LOADED
static final int LOADED
NOT_LOADED
static final int NOT_LOADED
MOVED
static final int MOVED
DUPLICATE
static final int DUPLICATE
MISSING
static final int MISSING
TIMED_OUT
static final int TIMED_OUT
IO_ERROR
static final int IO_ERROR
UNSUPPORTED_MIMETYPE
static final int UNSUPPORTED_MIMETYPE
MISC_ERROR
static final int MISC_ERROR
status
int status
URLStatus
public URLStatus(URL url,
File temp_file,
EnginePrefs eng_prefs)
- "url" is the location of the information and "temp_file" is the
temporary file that can be used to store the contents of this
url.
loaded
public boolean loaded()
- Returns true if and only if this URL was loaded without an error.
getLinkExtractor
public LinkExtractor getLinkExtractor() throws IOException
- Returns a LinkExtractor that can handle this URL's mime type.
To add support for new mime types, add a LinkExtractor that handles
those mime types here and add appropriate WordExtractors to the
getWordExtractor() method. Also, add the mime type to the list in
the mimeTypeUnderstood() method.
mimeTypeUnderstood
public boolean mimeTypeUnderstood(String mime_type)
- Returns true if and only if this mime type can be processed.
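A check of this kind can be sketched as a lookup in a fixed set of types. This is only an illustration, not the original source: the class name MimeTypeSketch and the two listed types are assumptions, since the types URLStatus actually accepts are not documented here. The sketch also strips parameters such as "; charset=..." before comparing, which the original may or may not do.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class MimeTypeSketch {
    // Hypothetical list: the real set of supported types is not given in this page.
    private static final Set<String> UNDERSTOOD =
            new HashSet<>(Arrays.asList("text/html", "text/plain"));

    // Returns true only when the base type (parameters removed) is in the set.
    static boolean mimeTypeUnderstood(String mimeType) {
        if (mimeType == null) {
            return false;
        }
        int semi = mimeType.indexOf(';');
        String base = (semi >= 0 ? mimeType.substring(0, semi) : mimeType)
                .trim()
                .toLowerCase(Locale.ROOT);
        return UNDERSTOOD.contains(base);
    }

    public static void main(String[] args) {
        System.out.println(mimeTypeUnderstood("text/html; charset=ISO-8859-1"));
        System.out.println(mimeTypeUnderstood("image/gif"));
    }
}
```

Supporting a new type in this sketch means adding it to the set, which mirrors the note under getLinkExtractor() about keeping the list in mimeTypeUnderstood() up to date.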
getCacheFile
public File getCacheFile()
- Returns the file that is used to cache the contents of this URL.
readContent
public void readContent()
- Downloads the content of the given URL and stores it in a temporary
cache file.
readHTTP
void readHTTP() throws IOException
- Downloads a file using the HTTP protocol. It was necessary to
write a method to do this from scratch rather than using the default
method in Java because:
- There is no means for specifying the user agent
using the default method.
- There is a bug in the Java 1.0 implementation that makes
it incompatible with HTTP version 1.1.
- Redirects are automatically followed (at least in
Java 1.0) without providing a way to determine
whether a redirect has occurred.
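The pieces a hand-written HTTP reader needs for the three points above can be sketched as follows. This is a minimal illustration and not the original readHTTP() code: the class HttpHeadSketch, its method names, the host and the user agent string are all hypothetical, and the set of status codes treated as redirects is an assumption. It works on strings only, so it runs without a network connection.

```java
final class HttpHeadSketch {
    // Builds a request head that names the user agent explicitly.
    static String buildRequest(String host, String path, String userAgent) {
        return "GET " + path + " HTTP/1.0\r\n"
                + "Host: " + host + "\r\n"
                + "User-Agent: " + userAgent + "\r\n"
                + "\r\n";
    }

    // Extracts the numeric code from a status line such as
    // "HTTP/1.1 301 Moved Permanently". The version token is not inspected,
    // so HTTP/1.0 and HTTP/1.1 replies are treated alike.
    // Throws a runtime exception if the line is malformed.
    static int statusCode(String statusLine) {
        String[] parts = statusLine.trim().split(" ", 3);
        return Integer.parseInt(parts[1]);
    }

    // Assumed set of redirect codes; the original may recognize a different set.
    static boolean isRedirect(int code) {
        return code == 301 || code == 302 || code == 303 || code == 307 || code == 308;
    }

    // Returns the value of a header line if its name matches (case-insensitive),
    // or null otherwise.
    static String headerValue(String headerLine, String name) {
        int colon = headerLine.indexOf(':');
        if (colon < 0) {
            return null;
        }
        if (!headerLine.substring(0, colon).trim().equalsIgnoreCase(name)) {
            return null;
        }
        return headerLine.substring(colon + 1).trim();
    }

    public static void main(String[] args) {
        System.out.print(buildRequest("example.com", "/", "ExampleCrawler/0.1"));
        int code = statusCode("HTTP/1.1 301 Moved Permanently");
        System.out.println(code + " redirect=" + isRedirect(code));
    }
}
```

Because the status line is parsed by the caller rather than by the library, the caller can see the redirect and read the Location header itself instead of having it followed silently.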
readLine
String readLine(PushbackInputStream in) throws IOException
- A replacement for java.io.DataInputStream.readLine(), which does not
return the line-ending characters as it should.
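One way such a line reader could work is sketched below. It is not the original implementation: the class name ReadLineSketch is hypothetical, and the choices to keep the line ending in the returned string and to return null at end of input are assumptions drawn from the description above. The PushbackInputStream is used to put back the byte that follows a lone carriage return.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

final class ReadLineSketch {
    // Reads one line, including its terminator ("\n", "\r\n" or a lone "\r").
    // Returns null when no bytes are left. Bytes are mapped one-to-one to chars.
    static String readLine(PushbackInputStream in) throws IOException {
        StringBuilder line = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            line.append((char) c);
            if (c == '\n') {
                break;
            }
            if (c == '\r') {
                int next = in.read();
                if (next == '\n') {
                    line.append('\n');
                } else if (next != -1) {
                    in.unread(next); // belongs to the following line
                }
                break;
            }
        }
        return line.length() == 0 ? null : line.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n\r\n"
                .getBytes(StandardCharsets.US_ASCII);
        PushbackInputStream in = new PushbackInputStream(new ByteArrayInputStream(data));
        String line;
        while ((line = readLine(in)) != null) {
            System.out.println(line.length() + " chars read");
        }
    }
}
```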
readGeneric
void readGeneric() throws IOException
- This method provides a fallback to the default Java implementation
for protocols which have not been re-implemented.
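A fallback of this kind can be sketched with the standard java.net.URL stream handlers. The class GenericReadSketch and its method signature are hypothetical; the original readGeneric() takes no arguments and presumably works on the object's own URL and cache file.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;

final class GenericReadSketch {
    // Lets Java's built-in protocol handler open the URL and copies the
    // content into cacheFile. Returns the number of bytes copied.
    static long readGeneric(URL url, File cacheFile) throws IOException {
        long total = 0;
        try (InputStream in = url.openStream();
             OutputStream out = new FileOutputStream(cacheFile)) {
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
                total += n;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java GenericReadSketch <url>
        if (args.length != 1) {
            System.err.println("expected one URL argument");
            return;
        }
        File cache = File.createTempFile("urlstatus", ".tmp");
        long bytes = readGeneric(new URL(args[0]), cache);
        System.out.println(bytes + " bytes cached in " + cache);
    }
}
```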
finalize
public void finalize() throws Throwable
- Gets rid of the temporary file.
- Throws: Throwable
- Overrides: finalize in class Object
pipe
void pipe(InputStream in,
OutputStream out) throws IOException
- Pipes "in" to "out" until "in" is exhausted, then closes "in".
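The behaviour described here is a plain buffered copy loop. A minimal sketch follows; PipeSketch is a hypothetical name, the buffer size is arbitrary, and closing "in" inside a finally block (so it is also closed when the copy fails) is an assumption about error handling that the description does not confirm.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

final class PipeSketch {
    // Copies every byte from "in" to "out", then closes "in". "out" stays open.
    static void pipe(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[4096];
        try {
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        } finally {
            in.close();
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        pipe(new ByteArrayInputStream("hello".getBytes(StandardCharsets.US_ASCII)), out);
        System.out.println(new String(out.toByteArray(), StandardCharsets.US_ASCII)); // prints "hello"
    }
}
```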
moved
public boolean moved()
- Returns true if and only if this URL causes a redirection.
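The relationship between the status field and the loaded() and moved() predicates can be sketched as below. Only the constant names come from this page; the numeric values, the class StatusSketch, the helper recordHttpStatus() and its mapping from HTTP codes to states are all invented for illustration.

```java
final class StatusSketch {
    // Names follow the field list; the values are made up for this sketch.
    static final int NOT_LOADED = 0;
    static final int LOADED = 1;
    static final int MOVED = 2;
    static final int MISSING = 3;

    int status = NOT_LOADED;

    boolean loaded() {
        return status == LOADED;
    }

    boolean moved() {
        return status == MOVED;
    }

    // Hypothetical helper: maps an HTTP response code to one of the states.
    // Codes not covered here leave the status unchanged.
    void recordHttpStatus(int code) {
        if (code >= 200 && code < 300) {
            status = LOADED;
        } else if (code >= 300 && code < 400) {
            status = MOVED;
        } else if (code == 404) {
            status = MISSING;
        }
    }

    public static void main(String[] args) {
        StatusSketch s = new StatusSketch();
        s.recordHttpStatus(301);
        System.out.println("moved=" + s.moved() + " loaded=" + s.loaded());
    }
}
```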
getContentLength
public long getContentLength()
- Returns the length of the content, or 0 if it's unknown.
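The "0 if unknown" convention can be sketched as a tolerant parse of a Content-Length header value. How URLStatus actually obtains the length is not stated on this page, so the class ContentLengthSketch and the idea of parsing a header string are assumptions.

```java
final class ContentLengthSketch {
    // Returns the parsed length, or 0 when the value is absent, malformed
    // or negative.
    static long contentLength(String headerValue) {
        if (headerValue == null) {
            return 0L;
        }
        try {
            long n = Long.parseLong(headerValue.trim());
            return n < 0 ? 0L : n;
        } catch (NumberFormatException e) {
            return 0L;
        }
    }

    public static void main(String[] args) {
        System.out.println(contentLength("1234")); // prints 1234
        System.out.println(contentLength(null));   // prints 0
    }
}
```

Callers should keep in mind that a result of 0 cannot distinguish an empty document from one whose length was never reported.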