Cost-Sensitive Feature Extraction and Selection in Genre Classification

Title	Cost-Sensitive Feature Extraction and Selection in Genre Classification
Publication Type	Journal Article
Year of Publication	2009
Authors	Levering, Ryan, and Michal Cutler
Journal	Journal for Language Technology and Computational Linguistics
Volume	24
Pagination	57–72
Keywords	automation, classification, digital, genre, information science, web
Abstract	Automatic genre classification of Web pages is currently young compared to other Web classification tasks. Corpora are just starting to be collected and organized in a systematic way, feature extraction techniques are inconsistent and not well detailed, genres are constantly in dispute, and novel applications have not been implemented. This paper attempts to review and make progress in the area of feature extraction, an area that we believe can benefit all Web page classification, and genre classification in particular. We first present a framework for the extraction of various Web-specific feature groups from distinct data models based on a tree of potential models and the transformations that create them. Then we introduce the concept of cost-sensitivity to this tree and provide an algorithm for performing wrapper-based feature selection on this tree. Finally, we apply the cost-sensitive feature selection algorithm on two genre corpora and analyze the performance of the classification results.
