The Search Service is the element of DILIGENT that will receive an end-user query, along with any supplementary data it might contain (such as a document to be used for content-based search). The query is an XML-based construct, managed through dedicated elements (classes), and the system user interface is responsible for producing it.
After accepting the query, the Search Service will have to collect system information regarding its resources and their status. This will be required in order to determine whether and how it can execute the query in a way that satisfies the user in terms of accuracy and performance without misusing system resources.
At this point the system could already create a plan for executing the query. However, the DILIGENT Search Service adds an advanced query-preprocessing layer that interprets and modifies (rewrites) the query in order to improve the user experience in several respects. The two major query preprocessors in DILIGENT are Query Personalisation and Content Source Selection. The former modifies the query so that it matches the user preferences recorded in his/her profile. The latter attempts to filter out content collections that appear to hold no information relevant to the user query, so that the system does not spend resources on a nearly meaningless operation. Linguistic processing (such as stemming) and the use of ontologies to expand search terms are further examples of such preprocessing; although they are supported, they are not part of the DILIGENT priority implementation.
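As a rough illustration of how the two preprocessors could be chained, the following Python sketch rewrites a query in two passes. The `Query` class, the profile dictionary, the term-set collection descriptions and both rewriting rules are assumptions made for this example; they are not the DILIGENT data model or its actual selection algorithm.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional, Set, Tuple


@dataclass(frozen=True)
class Query:
    """Hypothetical, simplified stand-in for the XML-based query."""
    terms: Tuple[str, ...]
    collections: Tuple[str, ...]
    language: Optional[str] = None


def personalise(query: Query, profile: Dict[str, str]) -> Query:
    # Assumed rule: fill in a missing language from the user's profile.
    if query.language is None and "language" in profile:
        return replace(query, language=profile["language"])
    return query


def select_sources(query: Query, descriptions: Dict[str, Set[str]]) -> Query:
    # Assumed rule: keep a collection only if its description
    # contains at least one of the query terms.
    kept = tuple(
        name
        for name in query.collections
        if any(term in descriptions.get(name, set()) for term in query.terms)
    )
    return replace(query, collections=kept)


def preprocess(query: Query, profile: Dict[str, str],
               descriptions: Dict[str, Set[str]]) -> Query:
    # Each preprocessor returns a rewritten copy of the query.
    return select_sources(personalise(query, profile), descriptions)


query = Query(terms=("fresco",), collections=("art", "weather"))
profile = {"language": "en"}
descriptions = {"art": {"fresco", "painting"}, "weather": {"rain", "wind"}}
rewritten = preprocess(query, profile, descriptions)
print(rewritten)
```

In this example the "weather" collection is dropped because its description shares no term with the query, and the language is taken from the profile.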
To complete the picture of the aforementioned query preprocessors, it should be mentioned that a separate set of elements of the Personalisation service is responsible for managing profiles and their schemas, while the Content Source Description element creates collection descriptions from the statistics of their indices.
Once the query has been finalised by the preprocessors, the system has to construct a plan for its execution. When the user poses a complex query, this execution might involve a long series of steps, and there might be a number of alternative ways of reaching its target. For example, semantically equivalent series of operations might exist, or multiple alternative implementations and/or instances of services might be available in a particular Digital Library. Thus the system, through its planner, has to find a way to serve the query “optimally”.
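The planner's choice can be pictured as a cost comparison over alternatives. The sketch below assumes, hypothetically, that each alternative is a sequence of operator names and that a numeric cost estimate is known for every service instance. The operator names, node names and additive cost model are invented for illustration and are not taken from DILIGENT.

```python
from typing import Dict, List, Tuple

Binding = Tuple[str, str]  # (operator name, chosen service instance)


def cost_of(sequence: List[str],
            instance_costs: Dict[str, Dict[str, float]]) -> Tuple[float, List[Binding]]:
    """Bind every operator to its cheapest instance and sum the costs."""
    total = 0.0
    bindings: List[Binding] = []
    for operator in sequence:
        instance, cost = min(instance_costs[operator].items(),
                             key=lambda item: item[1])
        bindings.append((operator, instance))
        total += cost
    return total, bindings


def choose_plan(alternatives: List[List[str]],
                instance_costs: Dict[str, Dict[str, float]]) -> Tuple[float, List[Binding]]:
    """Return the cheapest of several semantically equivalent sequences."""
    candidates = (cost_of(sequence, instance_costs) for sequence in alternatives)
    return min(candidates, key=lambda candidate: candidate[0])


# Two assumed-equivalent ways of answering the same query.
alternatives = [
    ["index_lookup", "sort"],
    ["full_scan", "filter", "sort"],
]
instance_costs = {
    "index_lookup": {"node-a": 2.0, "node-b": 1.5},
    "full_scan": {"node-a": 9.0},
    "filter": {"node-a": 1.0},
    "sort": {"node-a": 3.0, "node-c": 2.5},
}
total, plan = choose_plan(alternatives, instance_costs)
print(total, plan)
```

A real planner would also have to account for data transfer between services and for current resource status, which this additive model ignores.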
The outcome of the planner is the query execution plan, which is a workflow of so-called Search Operators. Search Operators are services that each carry out a particular, highly focused task. Examples of operators include sorting XML results, looking up images similar to a provided prototype, evaluating XPath expressions on XML, and merging results coming from different collections.
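To make the idea of a workflow of narrowly focused operators concrete, here is a minimal Python sketch in which three operators (merge, path filter, sort) are plain functions composed into a small plan. The function names and the sample records are hypothetical. Python's `xml.etree.ElementTree` supports only a limited subset of XPath, so this is a stand-in for a full XPath operator rather than an equivalent of it.

```python
import xml.etree.ElementTree as ET
from typing import List


def merge(*result_sets: List[ET.Element]) -> List[ET.Element]:
    """Concatenate result lists coming from different collections."""
    merged: List[ET.Element] = []
    for results in result_sets:
        merged.extend(results)
    return merged


def xpath_filter(results: List[ET.Element], path: str) -> List[ET.Element]:
    """Keep the records that contain at least one node matching the path."""
    return [record for record in results if record.find(path) is not None]


def sort_by(results: List[ET.Element], path: str) -> List[ET.Element]:
    """Sort records by the text of the first node matching the path."""
    return sorted(results, key=lambda record: record.findtext(path, default=""))


collection_a = [ET.fromstring("<doc><title>Zeus</title><year>1999</year></doc>")]
collection_b = [
    ET.fromstring("<doc><title>Apollo</title><year>2001</year></doc>"),
    ET.fromstring("<doc><year>2003</year></doc>"),  # no title: filtered out
]

# The "plan": merge both collections, keep titled records, sort by title.
results = sort_by(xpath_filter(merge(collection_a, collection_b), "title"), "title")
titles = [record.findtext("title") for record in results]
print(titles)
```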
Among these operators, Index Lookup, Data Fusion and Feature Extraction are special cases. Index lookups are made possible by a mechanism that creates and maintains indices efficiently and supports index-specific matching operations. Data Fusion consolidates information about the result sources in order to merge ranked results into a single meaningful ranked list. Feature Extraction uses complex, computationally intensive algorithms to calculate indicators and extract features from raw data, in order to facilitate content-based search (e.g. searching for images similar to a prototype).
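The text does not say which fusion algorithm DILIGENT uses. Purely as an illustration of merging several ranked lists into one, the sketch below applies reciprocal rank fusion, a well-known rank-based technique chosen here as a stand-in. The document identifiers are made up, and the constant `k=60` is a conventional default, not a DILIGENT parameter.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(ranked_lists: List[List[str]], k: int = 60) -> List[str]:
    """Merge ranked lists: each document scores sum(1 / (k + rank))."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document id.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Ranked results from two hypothetical collections.
from_collection_a = ["d1", "d2", "d3"]
from_collection_b = ["d3", "d1", "d4"]
fused = reciprocal_rank_fusion([from_collection_a, from_collection_b])
print(fused)
```

Documents that appear near the top of several lists (here `d1` and `d3`) end up ahead of documents that appear in only one. A rank-based method like this sidesteps the problem that raw scores from different indices are not directly comparable.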
Although the Search Service constructs the workflow of operations, it is not the one that will execute it by invoking the various services and transferring the data among them. The workflow will be authored in a standard language (BPEL) and submitted to the dedicated DILIGENT mechanism provided by the Process Management group of services.
The last operation of this workflow will be to return the results to the workflow originator, i.e. the Search Service. Finally the Search Service will deliver the results back to its requestor.
In the above diagram elements of the Index and Search group of services are coloured in yellow tones while thematically related services are grouped with bounding boxes. The main flows of information (and control) are drawn with heavy arrows while secondary flows are drawn with light ones.