A rich family of generic Information Extraction (IE) techniques has been developed by researchers. This paper proposes WebKER, a system that automatically extracts knowledge from semi-structured content on Web pages using wrappers and domain ontologies. During extraction, wrappers are learned through suffix arrays. Domain ontologies then automatically align the raw data extracted by the wrappers, and knowledge is generated by describing the data with Resource Description Framework (RDF) statements. After a merging process, the newly generated knowledge is added to the Knowledge Base (KB), where users can query it regardless of the source from which it was derived. A prototype of WebKER has been implemented. This paper also evaluates the performance of the system and compares querying information in the KB with querying information in a traditional database; the comparison indicates the superiority of our system. In addition, evaluations of the outstanding wrapper and of the method for merging knowledge are presented.
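To make the two central ideas concrete, the following is a minimal sketch and not WebKER's actual algorithm, which the abstract does not specify. It assumes a page has already been tokenized into tags, with text nodes replaced by a `#text` placeholder, and uses a naively built suffix array to find the longest repeated token run as a candidate record template. It then describes one extracted record as RDF-style (subject, predicate, object) triples. The function names, the sample page, and the `http://example.org/` URIs are all hypothetical.

```python
def build_suffix_array(tokens):
    # Naive construction: sort suffix start positions by the suffix itself.
    # Fine for a sketch; real systems use faster construction algorithms.
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])


def longest_repeated_run(tokens):
    # Adjacent suffixes in sorted order share the longest common prefixes,
    # so the longest repeated run is found by comparing neighbours only.
    sa = build_suffix_array(tokens)
    best = []
    for a, b in zip(sa, sa[1:]):
        k = 0
        while (a + k < len(tokens) and b + k < len(tokens)
               and tokens[a + k] == tokens[b + k]):
            k += 1
        if k > len(best):
            best = tokens[a:a + k]
    return best


def to_triples(subject, record, ns="http://example.org/onto#"):
    # Describe a record as (subject, predicate, object) triples.
    # The namespace is a placeholder, not an ontology from the paper.
    return [(subject, ns + prop, value) for prop, value in record.items()]


# Hypothetical tokenized page: a list with two similarly structured items.
page = ["<ul>",
        "<li>", "<b>", "#text", "</b>", "</li>",
        "<li>", "<b>", "#text", "</b>", "</li>",
        "</ul>"]

template = longest_repeated_run(page)
triples = to_triples("http://example.org/item/1", {"title": "WebKER"})
```

Here `template` is the repeated list-item structure, which a wrapper could use to locate data fields, and `triples` shows the shape of the statements that would be merged into a knowledge base.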
Xi Bai, Jigui Sun, Haiyan Che, Lian Shi. Towards Knowledge Acquisition from Semi-Structured Content. International Journal of Software and Informatics, 2008, 2(2): 233~248