Project Name

Web Intelligence

Members

Yuasa, Niwa (2013)

Sano, Yuasa, Niwa (2012)

Keywords

Web Page Segmentation, Web Agent, Web Data Structuring

Purpose

We aim to extract the structure of Web data from the viewpoint of users in order to improve the reuse of Web content.

Outline

We study a new Web page segmentation method that extracts the semantic structure of a Web page. A typical Web page consists of multiple elements with different functions, such as main content, navigation panels, copyright and privacy notices, and advertisements. Web page segmentation divides a page into visually and semantically cohesive pieces, which we call "Web blocks".

To develop the new segmentation method, we investigated how Web visitors view pages. We implemented a Web agent that collects examples of Web blocks from many Web visitors.

We then proposed a new segmentation method. The method first divides a Web page into minimum blocks, and then assembles these minimum blocks into Web content blocks by using title blocks. Although minimum blocks can play many roles, this study focuses on those that serve as titles of individual pieces of Web content. Web page designers assign a title block to each piece of content on a page to make it easier to read, and these title blocks can be used as separators between the different parts of the page. The method therefore groups each title block with the consecutive non-title blocks that follow it, up to the next title block.
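The two steps above (division into minimum blocks, then title-based assembly) can be sketched in Python as follows. This is a minimal illustration, not the project's implementation. The description does not say how minimum blocks are delimited or how title blocks are recognized, so the sketch assumes, hypothetically, that a minimum block is a run of text between block-level HTML tags and that a title block is text inside a heading tag (h1 to h6). The names `MinimumBlockParser`, `assemble`, `TITLE_TAGS`, and `BLOCK_TAGS` are illustrative only.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser
from typing import List, Optional

# Hypothetical heuristics: heading tags mark title blocks, and these
# block-level tags delimit minimum blocks. The real method may differ.
TITLE_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
BLOCK_TAGS = TITLE_TAGS | {"p", "li", "div", "td", "th", "dt", "dd", "pre", "blockquote"}


@dataclass
class MinimumBlock:
    text: str
    is_title: bool


@dataclass
class ContentBlock:
    title: Optional[str]          # None for content that precedes the first title
    body: List[str] = field(default_factory=list)


class MinimumBlockParser(HTMLParser):
    """Step 1: split HTML into minimum blocks (text runs between block-level tags)."""

    def __init__(self):
        super().__init__()
        self.blocks: List[MinimumBlock] = []
        self._buf: List[str] = []
        self._title_depth = 0     # > 0 while inside a heading tag

    def _flush(self):
        # Collapse whitespace; emit a block only if some text remains.
        text = " ".join("".join(self._buf).split())
        if text:
            self.blocks.append(MinimumBlock(text, self._title_depth > 0))
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self._flush()         # text before this tag ends the previous block
        if tag in TITLE_TAGS:
            self._title_depth += 1

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self._flush()         # flush before leaving the heading, so it is a title
        if tag in TITLE_TAGS:
            self._title_depth = max(0, self._title_depth - 1)

    def handle_data(self, data):
        self._buf.append(data)

    def close(self):
        super().close()
        self._flush()


def assemble(blocks: List[MinimumBlock]) -> List[ContentBlock]:
    """Step 2: group each title block with the consecutive non-title blocks after it."""
    result: List[ContentBlock] = []
    current: Optional[ContentBlock] = None
    for block in blocks:
        if block.is_title:
            current = ContentBlock(title=block.text)
            result.append(current)
        else:
            if current is None:
                # Non-title blocks before any title form an untitled content block.
                current = ContentBlock(title=None)
                result.append(current)
            current.body.append(block.text)
    return result


if __name__ == "__main__":
    html = "<div><h2>News</h2><p>Item one</p><p>Item two</p>" \
           "<h2>Events</h2><ul><li>Seminar</li></ul></div>"
    parser = MinimumBlockParser()
    parser.feed(html)
    parser.close()
    for content in assemble(parser.blocks):
        print(content.title, content.body)
    # → News ['Item one', 'Item two']
    # → Events ['Seminar']
```

Keeping the two steps as separate functions mirrors the description: the title heuristic lives only in step 1, so a different way of recognizing title blocks could replace it without changing the assembly logic.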


Copyright (c) 2012-2013 Shintani Lab. All rights reserved.