Class HTMLLinkExtractor

java.lang.Object
   |
   +----HTMLLinkExtractor

public class HTMLLinkExtractor
extends Object
implements LinkExtractor

NetRand Project

Software Engineering - CS536

University of Wisconsin - Milwaukee

Authors:


File: HTMLLinkExtractor.java
Note: This file was originally part of a Web Crawler program written by Tim Macinta in 1997 that gathered links from the internet and built a web search database. The files containing the crawling logic were taken from that program and slightly modified for the NetRand project.
This LinkExtractor can extract URLs from HTML files.


Variable Index

 o base
 o next_url
 o url_count
 o urls

Constructor Index

 o HTMLLinkExtractor(File, URL)
Creates a new HTMLLinkExtractor that will enumerate all the URLs in the given "cache_file".

Method Index

 o addURL(URL)
Adds "url" to the list of URLs.
 o analyze(String)
Analyzes "param", which should be the contents between a '<' and a '>', and adds any URLs that are found to the list of URLs.
 o analyzeAnchor(String)
Analyzes the <a> (anchor) tag.
 o analyzeFrame(String)
Analyzes the <frame> tag.
 o extract(String, String)
Returns the value in "line" associated with "key", or null if "key" is not found.
 o extractBase(String)
Extracts the base URL from the <base> tag.
 o hasMoreElements()
 o nextElement()
 o reset()
Resets this enumeration.

Variables

 o urls
 Vector urls
 o next_url
 int next_url
 o url_count
 int url_count
 o base
 URL base

Constructors

 o HTMLLinkExtractor
 public HTMLLinkExtractor(File cache_file,
                          URL base_url) throws IOException
Creates a new HTMLLinkExtractor that will enumerate all the URLs in the given "cache_file".

Methods

 o analyze
 public void analyze(String param)
Analyzes "param", which should be the contents between a '<' and a '>', and adds any URLs that are found to the list of URLs.
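
The original source is not reproduced on this page, so the following is only a minimal sketch of how tag contents could be routed to the helpers listed here (analyzeAnchor, analyzeFrame, extractBase). The class TagDispatchSketch and the method linkAttribute are hypothetical names; the attribute choices (href for <a> and <base>, src for <frame>) follow standard HTML rather than anything stated in this documentation.

```java
import java.util.Locale;

// Hypothetical illustration; not part of the NetRand sources.
class TagDispatchSketch {

    // Given the text between '<' and '>', returns the name of the attribute
    // that would carry a URL for that tag, or null if the tag is not one of
    // the three kinds this page mentions (anchor, frame, base).
    static String linkAttribute(String param) {
        String trimmed = param.trim();

        // The tag name runs up to the first whitespace character.
        int end = 0;
        while (end < trimmed.length() && !Character.isWhitespace(trimmed.charAt(end))) {
            end++;
        }
        String tag = trimmed.substring(0, end).toLowerCase(Locale.ROOT);

        switch (tag) {
            case "a":
            case "base":
                return "href";
            case "frame":
                return "src";
            default:
                return null; // closing tags and all other tags carry no link here
        }
    }
}
```

A caller could combine this with an attribute lookup such as extract(param, "href") to obtain the link text for the matching tag.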

 o analyzeAnchor
 void analyzeAnchor(String anchor)
Analyzes the <a> (anchor) tag.

 o analyzeFrame
 void analyzeFrame(String frame)
Analyzes the <frame> tag.

 o extractBase
 void extractBase(String b)
Extracts the base URL from the <base> tag.

 o addURL
 public void addURL(URL url)
Adds "url" to the list of URLs.

 o hasMoreElements
 public boolean hasMoreElements()
 o nextElement
 public Object nextElement()
 o reset
 public void reset()
Resets this enumeration.
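
The enumeration methods above (addURL, hasMoreElements, nextElement, reset) can be pictured as a cursor over a Vector. The sketch below is an assumption about that behavior, not the original code: UrlCursorSketch and its parse helper are hypothetical, the field names mirror the Variable Index, and url_count is left out because its role is not documented here.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.Enumeration;
import java.util.NoSuchElementException;
import java.util.Vector;

// Hypothetical illustration; not part of the NetRand sources.
class UrlCursorSketch implements Enumeration<Object> {

    private final Vector<URL> urls = new Vector<>(); // collected links
    private int next_url = 0;                        // index of the next link to hand out

    // Appends a link to the end of the list.
    public void addURL(URL url) {
        urls.addElement(url);
    }

    @Override
    public boolean hasMoreElements() {
        return next_url < urls.size();
    }

    @Override
    public Object nextElement() {
        if (!hasMoreElements()) {
            throw new NoSuchElementException("no more URLs");
        }
        return urls.elementAt(next_url++);
    }

    // Rewinds the cursor so the same links can be enumerated again.
    public void reset() {
        next_url = 0;
    }

    // Convenience helper for examples: wraps the checked exception.
    static URL parse(String spec) {
        try {
            return URI.create(spec).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

After a full pass, hasMoreElements() returns false until reset() is called, at which point the next call to nextElement() returns the first link again.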

 o extract
 String extract(String line,
                String key)
Returns the value in "line" associated with "key", or null if "key" is not found. For instance, if "line" were a href="blah blah blah" and "key" were "href", this method would return "blah blah blah".

Keys are case-insensitive.
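
One way to get the lookup described above is sketched here. It is a hedged reconstruction based only on the documented contract (case-insensitive key, null when the key is missing); ExtractSketch and skipSpaces are hypothetical names, and the handling of unquoted values and single quotes is an assumption.

```java
// Hypothetical illustration; not part of the NetRand sources.
class ExtractSketch {

    // Returns the value paired with "key" in "line", or null if "key" is absent.
    // Limitation: a key name that appears inside another attribute's quoted
    // value may be matched by mistake.
    static String extract(String line, String key) {
        int n = line.length();
        for (int k = 0; k + key.length() <= n; k++) {
            // Case-insensitive comparison of the key at position k.
            if (!line.regionMatches(true, k, key, 0, key.length())) continue;
            // The key must start the line or follow whitespace.
            if (k > 0 && !Character.isWhitespace(line.charAt(k - 1))) continue;

            int i = skipSpaces(line, k + key.length());
            if (i >= n || line.charAt(i) != '=') continue; // not a key=value pair
            i = skipSpaces(line, i + 1);
            if (i >= n) return ""; // "key=" with nothing after it

            char quote = line.charAt(i);
            if (quote == '"' || quote == '\'') {
                // Quoted value: runs to the matching quote, or to end of line.
                int end = line.indexOf(quote, i + 1);
                return line.substring(i + 1, end < 0 ? n : end);
            }
            // Unquoted value: runs to the next whitespace.
            int end = i;
            while (end < n && !Character.isWhitespace(line.charAt(end))) end++;
            return line.substring(i, end);
        }
        return null;
    }

    // Returns the index of the first non-whitespace character at or after "from".
    private static int skipSpaces(String s, int from) {
        while (from < s.length() && Character.isWhitespace(s.charAt(from))) from++;
        return from;
    }
}
```

With this sketch, looking up "href" in a href="blah blah blah" yields blah blah blah, and the same lookup on a line with no href attribute yields null.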