DCMI: IllumiNet Corpus

Corpus is a search engine written in Java that indexes the content, context and metadata of documents on the network or on the local file system. It can be used as a "distributed relational database" for document-oriented solutions, with metadata serving as the indexed keys.

It also features:

Rich result overviews, structured by document context or by clustering
XML and RSS result output from a built-in HTTP server, easy to integrate with XSL transforms
Parsers for multiple formats, including HTML, XML formats, PDF, DOC, etc.
Agent that automatically generates metadata using LDAP and URL mappings
Direct indexing from WebDAV logs or file system auditing events
Web and command-line administration
APIs for new agents, spiders and parsers

Contact:
Jonas Bosson, IllumiNet AB, Stockholm, Sweden
+46-(0)8 666 96 61 (CET)
http://www.illuminet.se/