Web Block Extraction System based on Client-Side Imaging for Clickable Image Map
佐野博之, 白松俊, 大囿忠親, 新谷虎松


We propose a new Web information extraction system and describe its outline and its extraction algorithm. A typical Web page consists of multiple elements with different functions, such as main content, navigation panels, copyright and privacy notices, and advertisements, yet visitors usually need only a small part of a page. A system that extracts a piece of a Web page is therefore needed. Our system lets users extract Web blocks simply by setting a clipping area with the mouse. A Web block is a clickable image map, generated on the client side by rendering the clipped area as an image and detecting the hyperlink areas inside it. The distinguishing feature of our system is that Web blocks preserve the layout and hyperlinks of the original Web page. Users can access and manage their Web blocks via Evernote, a cloud storage service. In addition, HTML snippets for Web blocks let users easily reuse Web content on their own Web sites.
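To make the idea of a clickable image map concrete, the following is a minimal sketch of how an HTML snippet for a Web block might be assembled once the clipped image and the hyperlink rectangles are known. It is not the system's actual implementation: the names `Rect`, `clip_to`, and `build_image_map`, the rectangle representation, and the example URLs are all assumptions made for illustration. Hyperlink rectangles are assumed to be given in page coordinates; each is intersected with the clipping area and translated into image-relative coordinates.

```python
from html import escape
from typing import NamedTuple, Optional, Sequence, Tuple


class Rect(NamedTuple):
    """Axis-aligned rectangle in page coordinates (hypothetical representation)."""
    left: int
    top: int
    width: int
    height: int


def clip_to(link: Rect, clip: Rect) -> Optional[Tuple[int, int, int, int]]:
    """Return the part of `link` inside `clip` as (x1, y1, x2, y2),
    relative to the top-left corner of `clip`, or None if they do not overlap."""
    x1 = max(link.left, clip.left)
    y1 = max(link.top, clip.top)
    x2 = min(link.left + link.width, clip.left + clip.width)
    y2 = min(link.top + link.height, clip.top + clip.height)
    if x1 >= x2 or y1 >= y2:
        return None
    return (x1 - clip.left, y1 - clip.top, x2 - clip.left, y2 - clip.top)


def build_image_map(image_src: str, clip: Rect,
                    links: Sequence[Tuple[str, Rect]],
                    name: str = "webblock") -> str:
    """Build an <img> plus <map> snippet for the clipped area.

    `links` is a sequence of (href, rectangle) pairs; links that fall
    entirely outside the clipping area are dropped.
    """
    lines = [
        '<img src="{}" usemap="#{}" width="{}" height="{}">'.format(
            escape(image_src, quote=True), escape(name, quote=True),
            clip.width, clip.height),
        '<map name="{}">'.format(escape(name, quote=True)),
    ]
    for href, rect in links:
        coords = clip_to(rect, clip)
        if coords is None:
            continue  # hyperlink lies outside the clipped block
        lines.append('  <area shape="rect" coords="{}" href="{}">'.format(
            ",".join(str(c) for c in coords), escape(href, quote=True)))
    lines.append("</map>")
    return "\n".join(lines)
```

In a browser, the rectangles would typically come from the rendered positions of the anchor elements inside the clipping area. Here they are passed in directly so that the sketch stays self-contained.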