%0 Conference Paper
%B 35th Annual Hawaii International Conference on System Sciences
%D 2002
%T Towards Automatic Web Genre Identification
%A Rehm, Georg
%K automatic detection
%K classification
%K corpus
%K genre
%K personal homepage
%K web
%X We argue for a systematic analysis of one particular, well-structured domain (academic Web pages) with regard to a special class of digital genres: Web genres. For this purpose, we have developed a database-driven system that will ultimately consist of more than 3 000 000 HTML documents, written in German, which are the empirical basis for our research. We introduce the notions of Web genre type, which constitutes the basic framework for a certain Web genre, and of compulsory and optional Web genre modules. These act as building blocks which go together to make up the structure characterised by the Web genre type and, furthermore, operate as modifiers for the default assignment involved. The analysis of a 200-document sample illustrates our notion of Web genre hierarchy, into which Web genre types and modules are embedded. The analysis of four different documents of the Web genre Academic’s Personal Homepage illustrates not only our approach but also our long-term goal of automatically extracting the contents of Web genre modules in order to build up structured XML documents from groups of unstructured HTML documents.
%P 1143–1152
%8 2002
%G eng