Welcome to Project Solutions!

Project Solutions is the leading provider of final-year engineering projects for IT and Computer Science students across India.

FoCUS: Learning to Crawl Web Forums

Name: FoCUS: Learning to Crawl Web Forums
Category: Data mining

In this paper, we present FoCUS (Forum Crawler Under Supervision), a supervised web-scale forum crawler. The goal of FoCUS is to crawl only relevant forum content from the web with minimal overhead. Forum threads contain the information content that forum crawlers target. Although forums have different layouts or styles and are powered by different forum software packages, they always have similar implicit navigation paths, connected by specific URL types, that lead users from entry pages to thread pages. Based on this observation, we reduce the web forum crawling problem to a URL type recognition problem and show how to learn accurate and effective regular expression patterns of implicit navigation paths from an automatically created training set, using aggregated results from weak page type classifiers. Robust page type classifiers can be trained from as few as 5 annotated forums and applied to a large set of unseen forums. Our test results show that FoCUS achieved over 98% effectiveness and 97% coverage on a large set of test forums powered by over 150 different forum software packages.
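The idea of treating crawling as URL type recognition can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the URL layout, the type names (`index`, `thread`, `page_flipping`), and the helper functions `classify_url` and `select_links` are hypothetical, and the patterns are written by hand, whereas FoCUS learns its patterns from an automatically created training set.

```python
import re
from urllib.parse import urlparse

# Hypothetical, hand-written patterns for one imaginary forum layout.
# In FoCUS such patterns are learned; here they are assumed.
URL_PATTERNS = {
    "index": re.compile(r"^/forumdisplay\.php\?f=\d+$"),
    "thread": re.compile(r"^/showthread\.php\?t=\d+$"),
    "page_flipping": re.compile(
        r"^/(?:forumdisplay\.php\?f|showthread\.php\?t)=\d+&page=\d+$"
    ),
}


def classify_url(url):
    """Return the assumed URL type for `url`, or None if no pattern matches."""
    parts = urlparse(url)
    # Match against path plus query string, ignoring scheme and host.
    target = parts.path + ("?" + parts.query if parts.query else "")
    for url_type, pattern in URL_PATTERNS.items():
        if pattern.match(target):
            return url_type
    return None


def select_links(urls):
    """Keep only links that lie on a navigation path toward thread pages."""
    return [u for u in urls if classify_url(u) is not None]
```

A crawler built this way would follow only the links returned by `select_links` and skip everything else (for example member profiles or login pages), which is one way to keep overhead low.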

IEEE paper year: 2013
Price range: High