Class HTMLLinkExtractor
java.lang.Object
|
+----HTMLLinkExtractor
- public class HTMLLinkExtractor
- extends Object
- implements LinkExtractor
Netrand Project
Software Engineering - CS536
University of Wisconsin - Milwaukee
Authors:
- Spring 1998 - Francis William Kasper
File: HTMLLinkExtractor.java
Note:
This file was originally part of a Web Crawler program written by
Tim Macinta in 1997 that gathered links from the Internet and built
a web search database. The files containing the crawling logic
were taken from that program and slightly modified
for the purposes of the NetRand project.
This LinkExtractor can extract URLs from HTML files.
-
base
-
-
next_url
-
-
url_count
-
-
urls
-
-
HTMLLinkExtractor(File, URL)
- Creates a new HTMLLinkExtractor that will enumerate all the
URLs in the given "cache_file".
-
addURL(URL)
- Adds "url" to the list of URLs.
-
analyze(String)
- Analyzes "param", which should be the contents between a '<' and a '>',
and adds any URLs that are found to the list of URLs.
-
analyzeAnchor(String)
- Analyzes the <a> (anchor) tag.
-
analyzeFrame(String)
- Analyzes the <frame> tag.
-
extract(String, String)
- Returns the value in "line" associated with "key", or null if "key"
is not found.
-
extractBase(String)
- Extracts the base URL from the <base> tag.
-
hasMoreElements()
-
-
nextElement()
-
-
reset()
- Resets this enumeration.
urls
Vector urls
next_url
int next_url
url_count
int url_count
base
URL base
HTMLLinkExtractor
public HTMLLinkExtractor(File cache_file,
URL base_url) throws IOException
- Creates a new HTMLLinkExtractor that will enumerate all the
URLs in the given "cache_file".
analyze
public void analyze(String param)
- Analyzes "param", which should be the contents between a '<' and a '>',
and adds any URLs that are found to the list of URLs.
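The following is a minimal sketch of how the text "between a '<' and a '>'" might be isolated before it is handed to a method like analyze. It is not the original implementation: the class TagScanSketch and the helpers tagContents and tagName are hypothetical names introduced here for illustration, and the scan deliberately ignores complications such as HTML comments or '>' characters inside quoted attribute values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class TagScanSketch {
    // Hypothetical helper: returns the text between each '<' and the next '>'.
    static List<String> tagContents(String html) {
        List<String> out = new ArrayList<>();
        int open = html.indexOf('<');
        while (open >= 0) {
            int close = html.indexOf('>', open + 1);
            if (close < 0) {
                break; // unterminated tag: stop scanning
            }
            out.add(html.substring(open + 1, close));
            open = html.indexOf('<', close + 1);
        }
        return out;
    }

    // Hypothetical helper: lower-cased first token of the tag contents,
    // which a dispatcher could compare against "a", "frame" or "base".
    static String tagName(String param) {
        String t = param.trim();
        int end = 0;
        while (end < t.length() && !Character.isWhitespace(t.charAt(end))) {
            end++;
        }
        return t.substring(0, end).toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        for (String param : tagContents("<p>See <A HREF=\"x.html\">x</A></p>")) {
            System.out.println(tagName(param) + " | " + param);
        }
    }
}
```

Each string returned by tagContents plays the role of "param" above; a dispatcher could then route on tagName to anchor, frame, or base handling.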
analyzeAnchor
void analyzeAnchor(String anchor)
- Analyzes the <a> (anchor) tag.
analyzeFrame
void analyzeFrame(String frame)
- Analyzes the <frame> tag.
extractBase
void extractBase(String b)
- Extracts the base URL from the <base> tag.
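As an illustration of why a base URL matters, the sketch below resolves relative links against a base. This page does not state that the class resolves links this way, so this is an assumption. The class BaseUrlSketch and the helper resolve are hypothetical, and java.net.URI is used for resolution purely as a convenience of this sketch.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

class BaseUrlSketch {
    // Hypothetical helper (not part of the documented class): resolves
    // "href" against "base" using the standard relative-reference rules.
    static URL resolve(URL base, String href) throws MalformedURLException {
        try {
            return base.toURI().resolve(href).toURL();
        } catch (URISyntaxException | IllegalArgumentException e) {
            // Report unparsable input the same way java.net.URL would.
            throw new MalformedURLException(e.getMessage());
        }
    }

    public static void main(String[] args) throws MalformedURLException {
        // Assumed starting point: the URL the document was fetched from.
        URL base = URI.create("http://example.com/docs/index.html").toURL();
        System.out.println(resolve(base, "page2.html"));
        System.out.println(resolve(base, "../img/logo.gif"));

        // A <base href="..."> tag would replace the base for later links.
        base = URI.create("http://mirror.example.org/root/").toURL();
        System.out.println(resolve(base, "page2.html"));
    }
}
```

Absolute links pass through resolve unchanged, so only relative links depend on which base is in effect.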
addURL
public void addURL(URL url)
- Adds "url" to the list of URLs.
hasMoreElements
public boolean hasMoreElements()
nextElement
public Object nextElement()
reset
public void reset()
- Resets this enumeration.
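The enumeration methods above can be pictured with the following minimal sketch of a resettable enumeration backed by a Vector and an index, loosely mirroring the "urls" and "next_url" fields. The class UrlEnumerationSketch is a hypothetical stand-in, not the documented class, and its internals are assumptions rather than the original code.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.Enumeration;
import java.util.NoSuchElementException;
import java.util.Vector;

class UrlEnumerationSketch implements Enumeration<Object> {
    private final Vector<URL> urls = new Vector<>(); // collected links
    private int nextUrl = 0;                         // index of the next link to return

    public void addURL(URL url) {
        urls.addElement(url);
    }

    public boolean hasMoreElements() {
        return nextUrl < urls.size();
    }

    public Object nextElement() {
        if (!hasMoreElements()) {
            throw new NoSuchElementException();
        }
        return urls.elementAt(nextUrl++);
    }

    // Rewinds to the first link so the same list can be walked again.
    public void reset() {
        nextUrl = 0;
    }

    public static void main(String[] args) throws MalformedURLException {
        UrlEnumerationSketch links = new UrlEnumerationSketch();
        links.addURL(URI.create("http://example.com/a").toURL());
        links.addURL(URI.create("http://example.com/b").toURL());
        for (int pass = 0; pass < 2; pass++) {
            while (links.hasMoreElements()) {
                System.out.println(links.nextElement());
            }
            links.reset(); // second pass sees the same URLs
        }
    }
}
```

Callers cast each returned Object to URL, as was usual with the untyped Enumeration interface.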
extract
String extract(String line,
String key)
- Returns the value in "line" associated with "key", or null if "key"
is not found. For instance, if "line" were 'a href="blah blah blah"'
and "key" were "href", this method would return "blah blah blah".
Keys are case insensitive.
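The behavior described for extract can be approximated by the sketch below. It is written only from the description above, not from the original source: the class ExtractSketch and the helper skipSpaces are hypothetical, and the handling of unquoted values and single quotes is an assumption.

```java
class ExtractSketch {
    // Returns the value paired with "key" in "line", or null if "key" is absent.
    static String extract(String line, String key) {
        for (int at = 0; at + key.length() <= line.length(); at++) {
            // Case-insensitive comparison of the key at this position.
            if (!line.regionMatches(true, at, key, 0, key.length())) {
                continue;
            }
            // The key must start a word, so "href" does not match inside "xhref".
            if (at > 0 && !Character.isWhitespace(line.charAt(at - 1))) {
                continue;
            }
            int i = skipSpaces(line, at + key.length());
            if (i >= line.length() || line.charAt(i) != '=') {
                continue; // the key text appeared, but not as "key = value"
            }
            i = skipSpaces(line, i + 1);
            if (i >= line.length()) {
                return ""; // "key=" with nothing after it
            }
            char q = line.charAt(i);
            if (q == '"' || q == '\'') {
                // Quoted value: runs to the matching quote, or to end of line.
                int end = line.indexOf(q, i + 1);
                return end < 0 ? line.substring(i + 1) : line.substring(i + 1, end);
            }
            // Unquoted value (assumed): runs to the next whitespace.
            int end = i;
            while (end < line.length() && !Character.isWhitespace(line.charAt(end))) {
                end++;
            }
            return line.substring(i, end);
        }
        return null;
    }

    static int skipSpaces(String s, int i) {
        while (i < s.length() && Character.isWhitespace(s.charAt(i))) {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        System.out.println(extract("a href=\"blah blah blah\"", "href"));
        System.out.println(extract("A HREF=page.html TARGET=main", "href"));
        System.out.println(extract("a name=top", "href"));
    }
}
```

The first call reproduces the example from the description; the other two show the case-insensitive match and the null result for a missing key.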