Project Name

Intelligent Web Project

Members

Hiroyuki Sano, Tomotaka Tsujino, Tatsuya Doi, Ryoji Suzuki

Keywords

Web Page Segmentation, Web Mining, Structured Web Information, Information Extraction

Purpose

Web Page Segmentation and Analysis.

Outline

Machines have difficulty extracting information from Web pages or searching within them, because Web pages are written in HTML, which is only a semi-structured document format.

Our goal is to establish an algorithm that structures the information on existing Web pages from the visitor's point of view, and a new technology that supports Web browsing more effectively. A Web page contains many kinds of content, for example the main content, site logo, advertisements, and site menu. We define each such piece of Web content that carries information as a "Web Block". Web Blocks include not only static content but also dynamic content provided by Web services. We propose an algorithm that extracts Web Blocks from existing Web pages and converts HTML documents into structured documents written in "WBML" (Web Block Markup Language).
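
The idea of turning an HTML page into a list of typed Web Blocks can be illustrated with a minimal sketch. This is not the project's algorithm, and the WBML schema is not given here: the `<wbml>` and `<block type="...">` element names, the `BLOCK_HINTS` table, and the rule of classifying a `<div>` by its `id` or `class` are all assumptions made only for illustration. The sketch uses Python's standard `html.parser` module.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr

# Hypothetical mapping from id/class tokens to block types (an assumption,
# not part of the project description).
BLOCK_HINTS = {
    "logo": "logo",
    "menu": "menu",
    "nav": "menu",
    "ad": "advertisement",
    "main": "main",
    "content": "main",
}


class BlockExtractor(HTMLParser):
    """Collects the text of each outermost <div> whose id or class matches a hint."""

    def __init__(self):
        super().__init__()
        self.blocks = []    # list of (block_type, text) in document order
        self._type = None   # type of the block currently being read, if any
        self._depth = 0     # <div> nesting depth inside the current block
        self._text = []     # text fragments of the current block

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self._type is not None:
            # Nested <div> inside a block: only track the depth.
            self._depth += 1
            return
        attr = dict(attrs)
        hint = (attr.get("id") or "") + " " + (attr.get("class") or "")
        for token in hint.lower().split():
            if token in BLOCK_HINTS:
                self._type = BLOCK_HINTS[token]
                self._depth = 1
                self._text = []
                break

    def handle_endtag(self, tag):
        if tag != "div" or self._type is None:
            return
        self._depth -= 1
        if self._depth == 0:
            # Collapse runs of whitespace before storing the block text.
            text = " ".join(" ".join(self._text).split())
            self.blocks.append((self._type, text))
            self._type = None

    def handle_data(self, data):
        if self._type is not None:
            self._text.append(data)


def to_wbml(html):
    """Return a WBML-like XML string (hypothetical schema) for the given HTML."""
    parser = BlockExtractor()
    parser.feed(html)
    parser.close()
    lines = ["<wbml>"]
    for block_type, text in parser.blocks:
        lines.append("  <block type=%s>%s</block>" % (quoteattr(block_type), escape(text)))
    lines.append("</wbml>")
    return "\n".join(lines)
```

Calling `to_wbml` on a page yields one `<block>` element per recognized `<div>`, in document order. A realistic segmentation method would need far richer cues than `id` and `class` names, such as layout and visual features; the sketch only shows the shape of the input and output.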

To verify that our algorithm is useful, we will implement a management system for Web Blocks, in which the Web Blocks are managed by intelligent agents. If this technology is combined with Semantic Web technology, the handling of Web content will become more powerful, and we expect a new research domain concerned with reusing existing Web information to emerge.


Copyright (c) 2010 Shintani Lab. All rights reserved.