SWSE - Mission Statement
You can now try out the SWSE prototype.
Although the Semantic Web (SW) is still very much in its infancy, there is already a lot of data out there that conforms to the proposed SW standards (e.g. RDF and OWL). Small vertical vocabularies and ontologies have emerged, and the community of people using them is growing daily. People publish descriptions of themselves using FOAF (Friend of a Friend), news providers publish newsfeeds in RSS (RDF Site Summary), and pictures are being annotated using various RDF vocabularies. The amount of available formal data is growing steadily, but a means to find and thus utilize this data is still missing. What is needed is the equivalent of the services a search engine currently provides for the HTML web: a service that continuously explores and indexes the Semantic Web and provides an easy-to-use interface through which users can find the data they are looking for. We are therefore developing a Semantic Web Search Engine (SWSE, pronounced "swizzy").
Extended Query Functionality
Because of the inherent semantics of RDF and other SW languages, the search and information retrieval capabilities of SWSE are potentially much more powerful than those of current search engines. The use of vocabularies and ontologies makes it possible to apply powerful inferencing techniques and get much more accurate search results.
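As a rough illustration of how an ontology can improve search results, the following minimal sketch answers a "find all persons" query over a handful of triples, first without and then with subclass inferencing. The data, the `ex:` names and both helper functions are hypothetical and chosen only for this example; they do not describe how SWSE itself performs inferencing.

```python
from collections import defaultdict

# Hypothetical toy data: (subject, predicate, object) triples.
TRIPLES = [
    ("ex:alice", "rdf:type", "ex:Researcher"),
    ("ex:bob", "rdf:type", "ex:Student"),
    ("ex:carol", "rdf:type", "ex:Person"),
    ("ex:Researcher", "rdfs:subClassOf", "ex:Person"),
    ("ex:Student", "rdfs:subClassOf", "ex:Person"),
    ("ex:rex", "rdf:type", "ex:Dog"),
]


def subclasses_of(cls, triples):
    """Return cls plus every class that reaches it via rdfs:subClassOf."""
    children = defaultdict(set)
    for s, p, o in triples:
        if p == "rdfs:subClassOf":
            children[o].add(s)
    found, todo = {cls}, [cls]
    while todo:
        for child in children[todo.pop()]:
            if child not in found:
                found.add(child)
                todo.append(child)
    return found


def instances_of(cls, triples, infer=True):
    """List resources typed as cls; with infer=True, also as any subclass."""
    wanted = subclasses_of(cls, triples) if infer else {cls}
    return sorted(s for s, p, o in triples if p == "rdf:type" and o in wanted)


plain = instances_of("ex:Person", TRIPLES, infer=False)
inferred = instances_of("ex:Person", TRIPLES, infer=True)
```

Without inferencing, only resources typed directly as `ex:Person` are found; with it, the researcher and the student are returned as well, because the ontology states that both classes are subclasses of `ex:Person`.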
SWSE will combine the experience and results that its individual members have gained, and are still gaining, in other projects:
- The goal of SWAN (Semantic Web Annotator) is to provide a means to extract formal SW data from existing legacy data that is already available on the web. To reach this goal, we are applying ontology-aware Information Extraction (IE) techniques. To allow the processing of a considerable portion of the web, SWAN is based on a highly scalable cluster architecture.
- YARS (Yet Another RDF Store) provides distributed storage and retrieval facilities. Its indexing structures are optimized for retrieval of RDF statements including context (quads) while minimising the need for joins, and it adds Lucene fulltext indexing for efficient keyword searches. Since YARS needs to accommodate a very large amount of data, distribution of the index to a number of nodes is crucial to allow for cost-effective scalability and extensibility.
- SECO provides a means to crawl RDF files and to integrate repositories that are potentially dispersed across the Internet. Highly dynamic sites with frequently changing content can be "virtually" integrated without the need to replicate and materialize the entire dataset of the sites. Additionally, if sites provide a fast and feature-rich query interface, SECO can act as a "meta-query and integration engine", virtually combining data sets that are made available from SWAN or SIOC-enabled sites.
- The Semantically Interlinked Online Community (SIOC) ontology is needed to semantically enable and link current community sites, and will prove a useful testbed for SWSE due to the large amount of community data that can be made available from bulletin boards, newsgroups and mailing lists. SIOC combines terms from vocabularies that already exist with new terms needed to describe the relationships between concepts in the realm of online community sites.
- JeromeDL, a digital library with semantics, will utilize the power of SWSE during both local and distributed (P2P) queries, while the MarcOnt initiative will provide an ontology and related tools that will make this searching process better suited to users' requirements and possible even in heterogeneous P2P networks.
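
To make the idea of storing statements together with their context more concrete, here is a minimal in-memory sketch of a quad store. It is only an illustration under simplifying assumptions: the `QuadStore` class, its method names and the sample data are hypothetical, the per-position hash indexes are not the index layout YARS actually uses, and the naive word index merely stands in for a fulltext engine such as Lucene.

```python
from collections import defaultdict


class QuadStore:
    """Toy quad store for illustration only; not the YARS design."""

    def __init__(self):
        self.quads = set()
        # One hash index per position: 0=subject, 1=predicate, 2=object, 3=context.
        self.by_position = [defaultdict(set) for _ in range(4)]
        # Naive word index over object values, standing in for a fulltext engine.
        self.by_keyword = defaultdict(set)

    def add(self, s, p, o, c):
        quad = (s, p, o, c)
        self.quads.add(quad)
        for pos, value in enumerate(quad):
            self.by_position[pos][value].add(quad)
        for word in o.lower().split():
            self.by_keyword[word].add(quad)

    def match(self, s=None, p=None, o=None, c=None):
        """Return all quads matching the pattern; None acts as a wildcard."""
        candidates = None
        for pos, value in enumerate((s, p, o, c)):
            if value is None:
                continue
            hits = self.by_position[pos].get(value, set())
            candidates = hits if candidates is None else candidates & hits
        return set(self.quads) if candidates is None else set(candidates)

    def keyword(self, word):
        """Return all quads whose object contains the given word."""
        return set(self.by_keyword.get(word.lower(), set()))


# Hypothetical sample data: the fourth element records where a statement came from.
store = QuadStore()
store.add("ex:alice", "foaf:name", "Alice Example", "http://example.org/alice.rdf")
store.add("ex:alice", "foaf:knows", "ex:bob", "http://example.org/alice.rdf")
store.add("ex:bob", "foaf:name", "Bob Example", "http://example.org/bob.rdf")

from_alice_file = store.match(c="http://example.org/alice.rdf")
names_of_alice = store.match(s="ex:alice", p="foaf:name")
keyword_hits = store.keyword("example")
```

Because the context is stored as a fourth element of every statement, a query such as "everything published in this file" is a single lookup rather than a join against a separate provenance table. A distributed store would additionally partition these indexes across nodes, which this sketch does not attempt.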