Developer Guide
Community Edition | Enterprise Edition


Architecture

SWIRL AI Search Architecture

SWIRL AI Search Architecture Part 2

SWIRL RAG Architecture

SWIRL RAG Architecture

SWIRL AI Search Assistant

SWIRL AI Search Assistant Architecture

Workflow

  1. Creating a Search

    A new Search object is created at the /swirl/search/ endpoint.

    • This calls the create method in swirl/views.py.
    • SWIRL responds with the id of the newly created search.
    • The federation process is then managed by swirl/search.py.
  2. Executing the Search

    swirl/search.py performs:

    • Pre-query processing using Search.pre_query_processors.
    • Federation by creating a federate_task for each SearchProvider.
  3. Waiting for Results

    • SWIRL waits for all tasks to complete or until settings.SWIRL_TIMEOUT is reached.
    • Meanwhile, each federate_task:

      • Creates a Connector.
      • Processes the query using Search.query_processors.
      • Builds and validates the query (url, query_template, query_mappings).
      • Sends the query to the SearchProvider.
      • Normalizes and processes results (Search.result_processors).
      • Saves results in the database.
  4. Post-Processing and Relevancy Ranking

    Once results are available (or the timeout occurs):

    • search.py invokes Search.post_result_processors.
    • Relevancy ranking and duplicate detection are applied.
    • The Search.status is updated to FULL_RESULTS_READY or PARTIAL_RESULTS_READY.
  5. Retrieving Results

    To retrieve results, use /swirl/results:

    • All result objects are listed.
    • Individual results can be retrieved using their id.
    • Adding search_id groups results using the Result Mixer.
  6. Continuous Updates with subscribe

    If Search.subscribe = true:

    • SWIRL will periodically re-run the search.
    • The sort order is set to date, fetching newer results.
    • Merging and de-duplication ensure no duplicate results.

To retrieve only new results, use Search.new_result_url or select a NewItems Mixer.
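A client can follow this workflow by polling the Search object until its status is one of the two terminal values listed in step 4. The sketch below is a minimal illustration, not SWIRL code: `fetch_search` is a hypothetical callable that returns the Search object as a dict (for example by GETting /swirl/search/<id>/), and the "PENDING" status in the stand-in fetcher is a placeholder rather than a documented SWIRL status.

```python
import time

# Terminal statuses named in the workflow above.
DONE_STATUSES = {"FULL_RESULTS_READY", "PARTIAL_RESULTS_READY"}


def wait_for_results(fetch_search, search_id, timeout=30.0, interval=0.5,
                     sleep=time.sleep):
    """Poll a Search object until its status is terminal or `timeout` passes.

    `fetch_search` is a hypothetical callable returning the Search as a dict.
    """
    deadline = time.monotonic() + timeout
    while True:
        search = fetch_search(search_id)
        if search.get("status") in DONE_STATUSES:
            return search
        if time.monotonic() >= deadline:
            raise TimeoutError(f"search {search_id} not ready after {timeout}s")
        sleep(interval)


def fake_fetch(search_id,
               _statuses=iter(["PENDING", "PENDING", "FULL_RESULTS_READY"])):
    # Stand-in for an HTTP call; "PENDING" is a placeholder status.
    return {"id": search_id, "status": next(_statuses)}


ready = wait_for_results(fake_fetch, 10, sleep=lambda seconds: None)
print(ready["status"])
```

Passing the fetcher and the sleep function as arguments keeps the polling logic independent of any particular HTTP client.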

How To…

Work with JSON Endpoints

When using a browser to interact with SWIRL API endpoints (such as those in this guide), disable prefetching to prevent accidental creation of multiple objects via ?q= and ?qs= parameters.

Create a Search Object via API

  1. Navigate to: http://localhost:8000/swirl/search/

    SWIRL Search Form

  2. Scroll to the form at the bottom of the page.
  3. Switch to Raw data mode and clear any pre-filled content.
  4. Copy and paste an example Search object.
  5. Click POST.

SWIRL responds with the newly created Search object, including its id:

SWIRL Search Created - Google PSE Example

Save the id value—it is required for retrieving ranked results.
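The same POST can be issued from a script. The sketch below only builds the request and does not send it; the helper name `build_create_search_request` is hypothetical, and authentication is omitted because it depends on the deployment.

```python
import json
from urllib.request import Request


def build_create_search_request(search, base_url="http://localhost:8000"):
    """Build (but do not send) a POST request that creates a Search object."""
    return Request(
        f"{base_url}/swirl/search/",
        data=json.dumps(search).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_create_search_request({"query_string": "knowledge management"})
print(request.get_method(), request.full_url)
# Sending it would look like urllib.request.urlopen(request), with whatever
# credentials the SWIRL installation requires.
```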

Create a Search Object with the q= URL Parameter

To create a Search object with only a query_string (and default settings), append ?q=your-query-string to the API URL.

Example:
http://localhost:8000/swirl/search?q=knowledge+management

After a few seconds, SWIRL redirects to the fully mixed results page:

SWIRL Results Header
SWIRL Results, Ranked by Cosine Vector Similarity

Limitations of q=:

  • The query must be URL-encoded (e.g., spaces → +). Use a free URL encoder for assistance.
  • All active and default SearchProviders are queried.
  • Limited error handling—if no results appear, inspect the Search object:
    http://localhost:8000/swirl/search/<your-search-id>

Specify SearchProviders with providers= URL Parameter

Use the providers= parameter to specify a single SearchProvider or a list of Tags.

Example: Querying a single provider

http://localhost:8000/swirl/search/?q=knowledge+management&providers=maritime

Example: Querying multiple providers by Tag

http://localhost:8000/swirl/search/?q=knowledge+management&providers=maritime,news
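Rather than encoding the query by hand, the URL can be assembled with Python's standard library. This is a small sketch; the helper name `search_url` is an assumption made for illustration.

```python
from urllib.parse import urlencode


def search_url(query, providers=None, base_url="http://localhost:8000"):
    """Build a q= search URL, optionally restricted to providers or Tags."""
    params = {"q": query}
    if providers:
        params["providers"] = ",".join(providers)
    # urlencode uses quote_plus by default, so spaces become "+";
    # safe="," keeps the comma-separated provider list readable.
    return f"{base_url}/swirl/search/?{urlencode(params, safe=',')}"


url = search_url("knowledge management", providers=["maritime", "news"])
print(url)
```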

Get Synchronous Results with qs= URL Parameter

The qs= parameter functions like q=, except that it immediately returns the first page of results instead of redirecting.

Example:
http://localhost:8000/swirl/search?qs=knowledge+management

qs= Supports:

  • Filtering by SearchProviders using providers=.
  • Using custom Mixers via result_mixer=.
  • Enabling RAG processing in a single call:
    ?qs=metasearch&rag=true

Overriding RAG Timeout

Starting in SWIRL 3.7.0, you can override the default AI Summary timeout:

Example:
http://localhost:8000/galaxy/?q=gig%20economics&rag=true&rag_timeout=90

rag_timeout is specified in seconds.

Paging with qs=

&page= is NOT supported with qs=.

Instead, use the next_page property from the info.results structure:

"results": {
    "retrieved_total": 30,
    "retrieved": 10,
    "federation_time": 2.2,
    "result_blocks": ["ai_summary"],
    "next_page": "http://localhost:8000/swirl/results/?search_id=2&page=2"
}
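Following next_page can be sketched as a generator. The code assumes the response is a dict with the info.results nesting shown above; `fetch_json` is a hypothetical callable that GETs a URL and returns the parsed JSON, replaced here by a dictionary lookup so the example runs offline.

```python
def iter_pages(first_page, fetch_json):
    """Yield result pages, following info.results.next_page until absent."""
    page = first_page
    while page is not None:
        yield page
        next_url = page.get("info", {}).get("results", {}).get("next_page")
        page = fetch_json(next_url) if next_url else None


# Stand-in data: the first page points at a second page that has no next_page.
page2_url = "http://localhost:8000/swirl/results/?search_id=2&page=2"
fake_pages = {page2_url: {"info": {"results": {"retrieved": 10}}}}
first = {"info": {"results": {"retrieved": 10, "next_page": page2_url}}}

fetched = list(iter_pages(first, fake_pages.__getitem__))
print(len(fetched))
```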

Request Date-Sorted Results

If "sort": "date" is specified in a Search object, SWIRL will request results in chronological order from providers that support date sorting.

However, by default, SWIRL still applies relevancy ranking, ensuring a mix of the most recent and most relevant results.

SWIRL Results Header, Sort/Date, Relevancy Mixer
SWIRL Results, Sort/Date, Relevancy Mixer

Handling Missing Date Information

Some sources do not provide a date_published field.

To address this, use the DateFinderResultProcessor to detect dates from content fields and map them to date_published.

Use an LLM to Rewrite Queries

SWIRL AI Search (Community Edition) supports query rewriting using ChatGPTQueryProcessor.

To enable it, add "ChatGPTQueryProcessor" to SearchProvider.query_processors.

For details, see: Developer Reference - Query Processors.

Adjust swirl_score for Starred Results in Galaxy UI

SWIRL Community Edition

  • Configured via "minimumSwirlScore" in static/api/config/default.
  • Default: 100. Increase this to reduce starred results.

SWIRL Enterprise Edition

  • Configured via "minimumConfidenceScore" in static/api/config/default.
  • Default: 0.7. Increase this to reduce starred results.

SWIRL AI Search 4.0 Results

Handle NOT Queries

If a SearchProvider returns a result containing a NOT-ted term, SWIRL logs a Relevancy Explain message.

Solution

  1. Verify the SearchProvider supports NOT queries.
  2. Ensure the correct NOT query-mapping is set.

Subscribe to a Search

When "subscribe": true, SWIRL automatically re-runs the search every four hours, with sort set to "date" to fetch new results.

Example Search Object with Subscription

{
    "id": 10,
    "query_string": "electric vehicles NOT tesla",
    "sort": "relevancy",
    "subscribe": true,
    "status": "FULL_RESULTS_READY",
    "result_url": "http://localhost:8000/swirl/results?search_id=10&result_mixer=RelevancyMixer",
    "new_result_url": "http://localhost:8000/swirl/results?search_id=10&result_mixer=RelevancyNewItemsMixer"
}

Updating a Subscription

Once SWIRL updates the Search, it sets:

"status": "FULL_UPDATE_READY"

New results will have "new": 1. Use new_result_url to retrieve only new results.

Example: Updated Search Object

{
    "id": 10,
    "query_string": "electric vehicles NOT tesla",
    "sort": "date",
    "subscribe": true,
    "status": "FULL_UPDATE_READY",
    "messages": [
        "[16:51:43] DedupeByFieldPostResultProcessor deleted 2 results",
        "[16:55:02] CosineRelevancyPostResultProcessor updated 58 results",
        "[17:00:02] DedupeByFieldPostResultProcessor deleted 30 results"
    ],
    "result_url": "http://localhost:8000/swirl/results?search_id=10&result_mixer=RelevancyMixer",
    "new_result_url": "http://localhost:8000/swirl/results?search_id=10&result_mixer=RelevancyNewItemsMixer"
}

The messages field logs federation processing details, while individual Result objects contain source-specific messages.

Viewing Only New Results

Use the NewItems Mixers to retrieve only newly added results.

Detect and Remove Duplicate Results

SWIRL includes two PostResultProcessors for duplicate detection:

Processor | Description | Notes
DedupeByFieldPostResultProcessor | Removes duplicates based on an exact match of a field. | The field is set in swirl_server/settings.py (default: url).
DedupeBySimilarityPostResultProcessor | Removes duplicates based on the similarity of title and body. | The similarity threshold is configured in swirl_server/settings.py.

Default Configuration

DedupeByFieldPostResultProcessor is enabled by default in Search.post_result_processors.

To modify this, edit the getSearchPostResultProcessorsDefault method in swirl/models.py.
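The two strategies can be illustrated with a short sketch. This is not SWIRL's implementation: the function names are hypothetical, and `difflib.SequenceMatcher` with a 0.9 threshold stands in for whatever similarity measure and threshold SWIRL actually uses.

```python
from difflib import SequenceMatcher


def dedupe_by_field(results, field="url"):
    """Keep the first result for each distinct value of `field`."""
    seen, kept = set(), []
    for result in results:
        key = result.get(field)
        if key is not None and key in seen:
            continue
        seen.add(key)
        kept.append(result)
    return kept


def dedupe_by_similarity(results, threshold=0.9):
    """Drop results whose title+body is nearly identical to a kept result."""
    kept, kept_texts = [], []
    for result in results:
        text = f"{result.get('title', '')} {result.get('body', '')}".lower()
        if any(SequenceMatcher(None, text, seen).ratio() >= threshold
               for seen in kept_texts):
            continue
        kept.append(result)
        kept_texts.append(text)
    return kept


results = [
    {"url": "https://example.com/a", "title": "Knowledge management",
     "body": "An overview of knowledge management."},
    {"url": "https://example.com/a", "title": "Knowledge management",
     "body": "An overview of knowledge management."},
    {"url": "https://example.com/b", "title": "Knowledge Management",
     "body": "An overview of knowledge management!"},
    {"url": "https://example.com/c", "title": "Sailing",
     "body": "Ships and harbors."},
]

unique_urls = dedupe_by_field(results)
unique_content = dedupe_by_similarity(unique_urls)
print([r["url"] for r in unique_content])
```

Field matching only catches exact repeats, while similarity matching also removes the near-identical copy published under a different URL.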

Manage Search Objects

To edit a Search, append its id to the /swirl/search/ URL:

Example:
http://localhost:8000/swirl/search/1/

SWIRL Edit Search - Google PSE Example

Available Actions:

  • DELETE the Search (permanently deletes associated Results).
  • Edit the request body and PUT the updated Search.

Deleting a Search also deletes all associated Results immediately. Future versions may change this behavior.

To discard previous results and re-run a Search, use:

http://localhost:8000/swirl/search?rerun=1
  • This restarts the search from scratch.
  • The re-run URL is included in the info.search section of every mixed result response.

To re-run a Search but keep previous results, use:

http://localhost:8000/swirl/search/?update=<search-id>

Behavior:

  • Changes Search.sort to "date" to prioritize new results.
  • De-duplicates results using the url field.
  • Updates Search and Result messages as the process runs.

Use RelevancyNewItemsMixer and DateNewItemsMixer to retrieve only new results.

Improve Relevancy for a Single SearchProvider

To filter results where the query string is not in the title, use:

"RequireQueryStringInTitleResultProcessor"

How It Works:

  • Install it after MappingResultProcessor in result_processors.
  • Removes results that do not contain the query in the title.

When to Use:

  • Recommended for sources like LinkedIn, which may return related but irrelevant profiles.
  • Normally, SWIRL ranks these results poorly—this eliminates them entirely.
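The effect of this filter can be sketched in a few lines. The function below is illustrative only and assumes a case-insensitive match of the whole query string against the title; the actual processor may tokenize or match differently.

```python
def require_query_in_title(results, query_string):
    """Keep only results whose title contains the query string."""
    needle = query_string.casefold()
    return [r for r in results if needle in r.get("title", "").casefold()]


profiles = [
    {"title": "Jane Doe - Knowledge Management Lead"},
    {"title": "John Roe - Sales Director"},
]
kept = require_query_in_title(profiles, "knowledge management")
print(kept)
```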

Find Dates in Body/Title Responses

To detect and extract dates from result content, use:

"DateFinderResultProcessor"

How It Works:

  • Finds dates in results that lack a date_published field.
  • Copies the detected date into date_published.

Usage:

  • Add to SearchProvider.result_processors → Processes results from that provider only.
  • Add to Search.post_result_processors → Attempts date detection for all results.
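The idea behind date finding can be sketched with two regular expressions. This is an illustration under stated assumptions, not the processor's code: it recognizes only ISO dates and English "Month D, YYYY" dates, and it writes date_published as an ISO date string, which may differ from the format SWIRL stores.

```python
import re
from datetime import datetime

# Two illustrative patterns; real content needs many more.
_ISO = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")
_LONG = re.compile(r"\b([A-Z][a-z]+) (\d{1,2}), (\d{4})\b")


def find_date(text):
    """Return the first recognizable date in `text`, or None."""
    match = _ISO.search(text)
    if match:
        try:
            return datetime(*map(int, match.groups()))
        except ValueError:
            pass  # e.g. 2023-13-45 looks like a date but is not one
    match = _LONG.search(text)
    if match:
        try:
            return datetime.strptime(" ".join(match.groups()), "%B %d %Y")
        except ValueError:
            pass  # the capitalized word was not a month name
    return None


def fill_date_published(result):
    """Copy a date found in title/body into an empty date_published."""
    if not result.get("date_published"):
        text = f"{result.get('title', '')} {result.get('body', '')}"
        found = find_date(text)
        if found:
            result["date_published"] = found.date().isoformat()
    return result


filled = fill_date_published(
    {"title": "Report", "body": "Published on April 5, 2023 by staff."}
)
print(filled["date_published"])
```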

Automatically Map Results Using Profiling

The AutomaticPayloadMapperResultProcessor profiles response data to find the best matches for:

  • title
  • body
  • date_published

When to Use:

  • Recommended for SearchProviders with poor or missing result_mappings.
  • Allows SWIRL to auto-map relevant fields.

Configuration:

  • Install after MappingResultProcessor.
  • Leave result_mappings blank.

Visualize Structured Data Results

To organize a columnar response into a structured dataset:

"result_mappings": "DATASET"

Example Output: Galaxy UI with charts displayed

Key Features:

  • Fully compatible with result_mappings, including NO_PAYLOAD.
  • Automatically generates visualizations using chart.js.

Chart Selection Logic:

  1. No Numeric Fields → Adds a pseudo-count field → Bar Chart.
  2. One Numeric Field → Uses Bar Chart.
  3. Two Numeric Fields → Uses Scatter Chart (if both ranges are positive), otherwise Bar Chart.
  4. Three+ Numeric Fields → Uses Bubble Chart (if a valid range is found), otherwise Bar Chart.
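The selection rules above can be written as a small decision function. This sketch makes two assumptions that the list does not spell out: a range counts as "positive" when its minimum is greater than zero, and the "valid range" check for bubble charts is passed in as a flag because its exact definition is not given here.

```python
def choose_chart(numeric_ranges, valid_bubble_range=True):
    """Pick a chart type from the (min, max) range of each numeric field."""
    count = len(numeric_ranges)
    if count <= 1:
        # Zero fields: a pseudo-count field is added; one field: plain bar.
        return "bar"
    if count == 2:
        both_positive = all(low > 0 for low, _high in numeric_ranges)
        return "scatter" if both_positive else "bar"
    return "bubble" if valid_bubble_range else "bar"


print(choose_chart([(1, 9), (2, 40)]))
```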

For assistance, please contact support.

Expire Search Objects

If Search Expiration Service is enabled, users can set Search retention policies.

Retention Value | Meaning
0 | Retain indefinitely (default)
1 | Retain for 1 hour
2 | Retain for 1 day
3 | Retain for 1 month
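The retention values map naturally onto time spans. The sketch below is illustrative: the names `RETENTION` and `expires_at` are hypothetical, and "1 month" is approximated as 30 days because the exact length used by the Search Expiration Service is not stated here.

```python
from datetime import datetime, timedelta

# Retention value -> lifetime; None means "retain indefinitely".
RETENTION = {
    0: None,
    1: timedelta(hours=1),
    2: timedelta(days=1),
    3: timedelta(days=30),  # assumption: one month treated as 30 days
}


def expires_at(created, retention):
    """Return when a Search created at `created` becomes eligible for deletion."""
    lifetime = RETENTION[retention]
    return None if lifetime is None else created + lifetime


print(expires_at(datetime(2024, 1, 1, 12, 0), 1))
```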

Manage Results

To delete or edit a Result, use its id:

Example:
http://localhost:8000/swirl/results/1/

Available Actions:

  • DELETE the result permanently.
  • Edit the result and PUT it back.

Deleting a Result does NOT delete the associated Search.

Get Unified Results

Result Mixers organize results from multiple SearchProviders into unified result sets.

Key Features:

  • Mixers operate on saved results, not live federated data.
  • Re-running a search updates mixed results dynamically.
  • Different mixers can be applied on-the-fly via URL parameters.

Retrieve Unified Results

To fetch results for a specific Search, use:

http://localhost:8000/swirl/results?search_id=<search-id>

Example:
http://localhost:8000/swirl/results?search_id=1

SWIRL returns results using the result_mixer specified in the Search object.

SWIRL Results Header
SWIRL Results, Ranked by Cosine Vector Similarity

Override Mixer in Real Time

To apply a different mixer, append result_mixer=:

http://localhost:8000/swirl/results?search_id=<search-id>&result_mixer=<mixer-name>

Example:
http://localhost:8000/swirl/results?search_id=1&result_mixer=Stack1Mixer

Page Through Results

By default, SWIRL retrieves at least 10 results per SearchProvider.

To navigate results, append page=:

http://localhost:8000/swirl/results?search_id=<search-id>&page=<page-number>

Example:
http://localhost:8000/swirl/results?search_id=1&page=2

Increase Available Results

To store more results for paging, update results_per_query in the SearchProvider configuration.

  • Default: 10
  • Recommended for extensive paging: 20, 50, or 100

Increasing results_per_query requires re-running the search to fetch more results.

Get Search Times

SWIRL reports search execution times per source in the info block:

"info": {
    "Web (Google PSE)": {
        "found": 8640,
        "retrieved": 10,
        "search_time": 2.1
    }
}

The total federation time appears in info.results:

"results": {
    "retrieved_total": 50,
    "retrieved": 10,
    "federation_time": 3.2,
    "next_page": "http://localhost:8000/swirl/results/?search_id=507&page=2"
}

Timing Details:

  • Units: Seconds (rounded to 0.1 precision).
  • Federation Time Includes: Query execution, response processing, post-processing.
  • Mixer Processing Time is NOT included in federation time.

Configure Pipelines

Result processing happens in two stages:

  1. SearchProvider.result_processors → Initial processing.
  2. Search.post_result_processors → Final processing & ranking.

Example: Google PSE Result Processors

"result_processors": [
    "MappingResultProcessor",
    "DateFinderResultProcessor",
    "CosineRelevancyResultProcessor"
]

Modify Default Pipelines

To customize:

  • Post Result Processors: Edit getSearchPostResultProcessorsDefault() in swirl/models.py.
  • Default Mixer: Change the Search.result_mixer default:

result_mixer = models.CharField(max_length=200, default='RelevancyMixer', choices=MIXER_CHOICES)

Configure Relevancy Field Weights

To adjust field weights for relevancy scoring, update RELEVANCY_CONFIG in:
swirl_server/settings.py.

Default Weights:

Field | Weight | Notes
title | 1.5 |
body | 1.0 | Base relevancy score
author | 1.0 |

Configure Stopwords Language

By default, SWIRL loads English stopwords. To change this:

  1. Update SWIRL_DEFAULT_QUERY_LANGUAGE in:
    swirl_server/settings.py.
  2. Set it to another NLTK stopword language.

Redact or Remove Personally Identifiable Information (PII)

SWIRL supports PII removal and redaction using Microsoft Presidio.

RemovePIIQueryProcessor (Redacts Queries)

Removes PII before querying.

Enable for a Specific SearchProvider:

"query_processors": [
    "AdaptiveQueryProcessor",
    "RemovePIIQueryProcessor"
]

Enable for ALL SearchProviders:

Modify swirl/models.py:

def getSearchPreQueryProcessorsDefault():
    return ["RemovePIIQueryProcessor"]

More details: ResultProcessors

RemovePIIResultProcessor (Redacts Results)

Redacts PII in results (e.g., "James T. Kirk" → "<PERSON>").

Enable for a Specific SearchProvider:

"result_processors": [
    "MappingResultProcessor",
    "DateFinderResultProcessor",
    "CosineRelevancyResultProcessor",
    "RemovePIIResultProcessor"
]

More details: ResultProcessors

RemovePIIPostResultProcessor

This processor applies PII redaction after all results are processed.

Understand the Explain Structure

The CosineRelevancyProcessor outputs a JSON structure explaining swirl_score calculations.

Viewing the Explain Data:

  • Enabled by default.
  • To disable, add &explain=False to the mixer URL.

Example:
SWIRL Result with Explain

Explain Match Types:

Postfix | Meaning | Example
_* | Query partially matched against the entire result field. | "knowledge_management_*", 0.7332...
_s* | Query matched one or more sentences, highest similarity recorded. | "knowledge_management_s*", 0.7332...
_n | Query matched at word position 'n' in the field. | "Knowledge_Management_0", 0.7332...

Additional Data:

  • stems → Shows matching stems.
  • Result and query length adjustments are also recorded.
  • hits → Displays zero-offset token positions for each match.
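When post-processing explain output, the postfix tells you which kind of match a key describes. The helper below is a hypothetical sketch that classifies keys using only the three postfixes listed in the table above.

```python
import re


def classify_explain_key(key):
    """Return (match_type, term, position) for an explain key."""
    if key.endswith("_s*"):
        return ("sentence", key[:-3], None)
    if key.endswith("_*"):
        return ("field", key[:-2], None)
    match = re.fullmatch(r"(.+)_(\d+)", key)
    if match:
        return ("position", match.group(1), int(match.group(2)))
    return ("unknown", key, None)


for key in ("knowledge_management_*", "knowledge_management_s*",
            "Knowledge_Management_0"):
    print(classify_explain_key(key))
```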

Develop New Connectors

To connect to a new endpoint using an existing Connector (e.g., RequestsGet), create a new SearchProvider instead.

Example: The Google PSE SearchProvider JSON demonstrates how one Connector can be used to define hundreds of SearchProviders.

When to Develop a New Connector

Create a new Connector if:

  • The target API requires a unique transport method not supported by existing connectors.
  • A high-quality Python package exists to interface with the API.

Connector Base Class

All Connectors extend the Connector base class, which defines the workflow in federate().
Source: swirl/connectors.

Connector Workflow (federate() Method)

def federate(self):
    '''
    Executes the workflow for a given search and provider
    ''' 
    self.start_time = time.time()

    if self.status == 'READY':
        self.status = 'FEDERATING'
        try:
            self.process_query()
            self.construct_query()
            if self.validate_query():
                self.execute_search()
                if self.status == 'FEDERATING':
                    self.normalize_response()
                    self.process_results()
                if self.status == 'READY':
                    return self.save_results()
            else:
                self.error('validate_query() failed')
        except Exception as err:
            self.error(f'{err}')
    return False

Connector Execution Stages

Stage | Description | Notes
process_query | Calls the Query Processor to adapt the query for this SearchProvider. |
construct_query | Assembles the final query format. |
validate_query | Checks if the query is valid and error-free. | Returns False if invalid.
execute_search | Connects to the SearchProvider, executes the query, and stores the response. |
normalize_response | Transforms the provider’s response into JSON format for SWIRL. |
process_results | Runs Result Processors to map data to SWIRL’s schema. |
save_results | Saves results in the Django database. |

A new Connector must override:

  • execute_search() → Handles the API connection & query execution.
  • normalize_response() → Converts raw API responses into structured JSON.
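The division of labor between the two overrides can be sketched without importing SWIRL. In the example below, `SketchConnector` is a simplified stand-in for the real Connector base class (it skips query processing, validation, result processing, and saving), and `InMemoryConnector` is a hypothetical subclass that reads a JSON string instead of calling an API.

```python
import json


class SketchConnector:
    """Simplified stand-in for SWIRL's Connector base class (not the real API)."""

    def __init__(self, query_string):
        self.query_string = query_string
        self.status = "READY"
        self.response = None
        self.results = []

    def federate(self):
        # Reduced workflow: execute, then normalize if nothing went wrong.
        self.status = "FEDERATING"
        self.execute_search()
        if self.status == "FEDERATING":
            self.normalize_response()
            self.status = "READY"
        return self.status == "READY"

    def execute_search(self):
        raise NotImplementedError

    def normalize_response(self):
        raise NotImplementedError


class InMemoryConnector(SketchConnector):
    """Hypothetical connector that searches a JSON string instead of an API."""

    DATA = '{"hits": [{"name": "Knowledge management 101"}, {"name": "Sailing"}]}'

    def execute_search(self):
        # A real connector would call its transport here and record errors.
        self.response = json.loads(self.DATA)

    def normalize_response(self):
        # Convert the provider-specific payload into a plain list of dicts.
        needle = self.query_string.lower()
        self.results = [hit for hit in self.response["hits"]
                        if needle in hit["name"].lower()]


connector = InMemoryConnector("knowledge")
ok = connector.federate()
print(ok, connector.results)
```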

Connector Development Guidelines

  • Import new connectors in swirl/connectors/__init__.py.
  • Register new Connectors in CHOICES inside swirl/models.py (requires a database migration).
  • Limit imports to only the required libraries (e.g., requests, elasticsearch, sqlite3).
  • To extend an existing transport, subclass it and override normalize_response().
  • Ensure execute_search() supports:
    • results_per_query > 10 (handle paging if needed).
    • Date sorting (if supported by the data source).

Using eval_credentials for Secure Authentication

To use session-based credentials dynamically in a SearchProvider:

  1. Store the authentication token in a session variable.
  2. Use eval_credentials to inject it into the SearchProvider.

Example:

{
   "eval_credentials": "session['my-connector-token']",
   "credentials": "myusername:{credentials}"
}

Required Query Mappings

When developing a new Connector, implement query_mappings:

  • DATE_SORT → Enables date-based sorting.
  • PAGE → Enables pagination support.
  • NOT_CHAR / NOT → Defines negation behavior.
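A query_mappings value is a comma-separated list of key=value pairs in which the value itself may contain "=". The sketch below shows one way to split such a string, using the Google PSE mapping string that appears elsewhere in this guide; the parser is illustrative and does not handle values that contain commas.

```python
def parse_query_mappings(mappings):
    """Split 'KEY=value,KEY=value' into a dict, splitting on the first '='."""
    parsed = {}
    for item in mappings.split(","):
        key, _, value = item.strip().partition("=")
        if key:
            parsed[key] = value
    return parsed


mappings = parse_query_mappings(
    "cx=0c38029ddd002c006,DATE_SORT=sort=date,PAGE=start=RESULT_INDEX,NOT_CHAR=-"
)
print(mappings["DATE_SORT"], mappings["PAGE"], mappings["NOT_CHAR"])
```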

Required Result Processing

Each Connector should process results using a Result Processor, ideally:

"result_processors": [
    "MappingResultProcessor"
]

More details: MappingResultProcessor.

Develop New Processors

Processor classes are located in: swirl/processors.

Key Guidelines:

  • Processors execute in sequence and should perform one transformation only.
  • Inherit from QueryProcessor, ResultProcessor, or PostResultProcessor.
  • Override process() for simple changes or define new variables in __init__.
  • Use validate() to check input values.
  • Return: Processed data (for Query/Result processors) or an integer count of results updated (for PostResultProcessors).

Develop New Mixers

Mixer classes are located in: swirl/mixers.

Mixer Workflow

def mix(self):
    '''
    Executes the workflow for a given mixer
    '''
    self.order()
    self.finalize()
    return self.mix_wrapper
  • Most Mixers override order().
  • order() should sort and save self.all_results into self.mixed_results.

Example: Basic Paging Mixer

def order(self):
    '''
    Orders all_results into mixed_results
    Base class, intended to be overridden!
    '''
    self.mixed_results = self.all_results[(self.page-1)*self.results_requested:(self.page)*self.results_requested]

Example: RelevancyMixer

from operator import itemgetter

class RelevancyMixer(Mixer):

    type = 'RelevancyMixer'

    def order(self):
        # Sort results by SWIRL score, then by SearchProvider rank
        self.mixed_results = sorted(
            sorted(self.all_results, key=itemgetter('searchprovider_rank')),
            key=itemgetter('swirl_score'),
            reverse=True
        )

Finalizing Results

  • finalize() trims self.mixed_results, adds metadata, and returns mix_wrapper.
  • Mixers automatically page results if enough are available.
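The ordering and paging logic from the two examples can be tried on plain dictionaries. This sketch uses standalone functions rather than Mixer subclasses so it runs without SWIRL; the function names and sample data are hypothetical.

```python
from operator import itemgetter


def relevancy_order(all_results):
    """Sort by swirl_score (descending); ties keep searchprovider_rank order."""
    by_rank = sorted(all_results, key=itemgetter("searchprovider_rank"))
    # Python's sort is stable, even with reverse=True, so results with equal
    # scores stay in ascending rank order.
    return sorted(by_rank, key=itemgetter("swirl_score"), reverse=True)


def page_slice(results, page, results_requested):
    """Return the 1-based `page` of `results_requested` items."""
    start = (page - 1) * results_requested
    return results[start:start + results_requested]


all_results = [
    {"title": "A", "swirl_score": 2.0, "searchprovider_rank": 2},
    {"title": "B", "swirl_score": 2.0, "searchprovider_rank": 1},
    {"title": "C", "swirl_score": 5.0, "searchprovider_rank": 3},
    {"title": "D", "swirl_score": 1.0, "searchprovider_rank": 1},
]

ordered = relevancy_order(all_results)
print([r["title"] for r in page_slice(ordered, 1, 2)])
print([r["title"] for r in page_slice(ordered, 2, 2)])
```

The two-pass sort works because the second pass never reorders items whose scores are equal.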

Retrieval Augmented Generation WebSocket API

WebSocket Interaction Protocol for UI Developers

This section outlines the protocol for WebSocket interactions with the SWIRL server.

1. Initialize the WebSocket

  • Action: Create a WebSocket connection.
  • Parameters:
    • searchId: SWIRL search ID (required).
    • ragItems: (Optional) Array of integers identifying search results for RAG processing.
  • Behavior:
    • Opens a new WebSocket connection.
    • Sends initial parameters (searchId, ragItems) to the server.
    • Authentication should be handled before opening the WebSocket.

2. Sending Data

  • Action: Send a RAG request over the WebSocket.
  • Parameters:
    • Empty message → Starts RAG processing.
    • "stop" → Cancels an ongoing RAG request for the searchId.

3. Receiving Data

  • Action: Receive RAG results from the WebSocket.
  • Response Format: JSON or "No data" if an error occurs.
{
    "message": {
        "date_published": "<timestamp-response-creation>",
        "title": "<query-string>",
        "body": "<ai_response>",
        "author": "ChatGPT",
        "searchprovider": "ChatGPT",
        "searchprovider_rank": 1,
        "result_block": "ai_summary",
        "rag_query_items": [<list-of-rag-items-passed-in>]
    }
}

4. Connection Teardown

  • Action: Close the WebSocket gracefully when finished.
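On the receiving side, a client has to distinguish the JSON payload from the "No data" error string. The sketch below handles only that parsing step and leaves the WebSocket transport to whichever client library is in use; the function name is hypothetical.

```python
import json


def parse_rag_message(raw):
    """Return the AI summary dict, or None for "No data" or malformed input."""
    if raw.strip() == "No data":
        return None
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return None
    message = payload.get("message") if isinstance(payload, dict) else None
    return message if isinstance(message, dict) else None


sample = json.dumps({
    "message": {
        "title": "gig economics",
        "body": "A short summary.",
        "result_block": "ai_summary",
        "rag_query_items": [1, 2],
    }
})
summary = parse_rag_message(sample)
print(summary["body"])
```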

Using Query Transformations

Query Transformation Rules

Developers can apply query transformation rules using the Query Transformation feature.

  • Pre-query rules → Apply before queries are sent to all sources.
  • Per-source rules → Apply to individual SearchProviders.

Supported Transformation Types:

Type | Description
Replace | Replaces a string in the query (or removes it entirely).
Synonym | Replaces a term with an OR expression containing synonyms.
Synonym Bag | Expands a term into an OR expression containing multiple synonyms.

Rules are provided as CSV files uploaded to SWIRL.


Replace/Rewrite Rules

CSV Format:

Column 1 | Column 2
List of patterns to replace (separated by ;). Supports * wildcards (non-leading). | Replacement string (leave blank to remove the term).

Example Configuration:

# column1, column2
mobiles; ombile; mo bile, mobile
computers, computer
cheap* smartphones, cheap smartphone
on

Example Transformations:

Query | Transformed Query
mobiles | mobile
ombile | mobile
mo bile | mobile
on computing | computing
cheaper smartphones | cheap smartphone
computers go figure | computer go figure
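To make the rule semantics concrete, the sketch below applies the example configuration to the example queries. It is an illustration, not SWIRL's implementation: it assumes whole-word, case-sensitive matching, treats * as "any run of word characters", and applies rules in file order.

```python
import re

RULES_CSV = """\
# column1, column2
mobiles; ombile; mo bile, mobile
computers, computer
cheap* smartphones, cheap smartphone
on
"""


def parse_replace_rules(text):
    """Return a list of (pattern, replacement) pairs from the CSV text."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        patterns, _, replacement = line.partition(",")
        for pattern in patterns.split(";"):
            if pattern.strip():
                rules.append((pattern.strip(), replacement.strip()))
    return rules


def pattern_to_regex(pattern):
    # Escape the pattern, then turn the escaped "*" into a word wildcard.
    escaped = re.escape(pattern).replace(r"\*", r"\w*")
    return re.compile(rf"\b{escaped}\b")


def apply_replace_rules(query, rules):
    for pattern, replacement in rules:
        query = pattern_to_regex(pattern).sub(lambda m: replacement, query)
    # Collapse whitespace left behind by removed terms.
    return " ".join(query.split())


rules = parse_replace_rules(RULES_CSV)
for query in ("mobiles", "on computing", "cheaper smartphones"):
    print(query, "->", apply_replace_rules(query, rules))
```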

Synonym Rules

CSV Format:

Column 1 | Column 2
Term | Synonym

Example Configuration:

# column1, column2
notebook, laptop
laptop, personal computer
pc, personal computer
personal computer, pc
car, ride

Example Transformations:

Query | Transformed Query
notebook | (notebook OR laptop)
pc | (pc OR personal computer)
personal computer | (personal computer OR pc)
I love my notebook | I love my (notebook OR laptop)
This pc, it is better than a notebook | This (pc OR personal computer), it is better than a (notebook OR laptop)
My favorite song is "You got a fast car" | My favorite song is "You got a fast (car OR ride)"

Synonym Bag Rules

CSV Format:

Column 1 | Column 2…N
Term | List of synonyms

Example Configuration:

# column1, column2, column3, column4
notebook, personal computer, laptop, pc
car, automobile, ride

Example Transformations:

Query | Transformed Query
car | (car OR automobile OR ride)
automobile | (automobile OR car OR ride)
ride | (ride OR car OR automobile)
pimp my ride | pimp my (ride OR car OR automobile)
automobile, yours is fast | (automobile OR car OR ride), yours is fast
I love the movie The Notebook | I love the movie The Notebook
My new notebook is slow | My new (notebook OR personal computer OR laptop OR pc) is slow
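Synonym bag expansion can be sketched as a single-pass substitution. The code below is illustrative, not SWIRL's implementation: it assumes whole-word, case-sensitive matching, which happens to reproduce the "The Notebook" row above, and it places the matched term first, followed by the rest of the bag in file order.

```python
import re

BAGS_CSV = """\
# column1, column2, column3, column4
notebook, personal computer, laptop, pc
car, automobile, ride
"""


def parse_bags(text):
    """Return each non-comment CSV line as a list of terms."""
    bags = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            bags.append([t.strip() for t in line.split(",") if t.strip()])
    return bags


def expand_query(query, bags):
    """Replace every bag term with an OR expression over the whole bag."""
    expansions = {}
    for bag in bags:
        for term in bag:
            others = [t for t in bag if t != term]
            expansions[term] = "(" + " OR ".join([term] + others) + ")"
    # Longest terms first so multi-word terms are tried before shorter ones.
    terms = sorted(expansions, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, terms)) + r")\b")
    # A single pass means inserted synonyms are never expanded again.
    return pattern.sub(lambda m: expansions[m.group(0)], query)


bags = parse_bags(BAGS_CSV)
print(expand_query("pimp my ride", bags))
print(expand_query("My new notebook is slow", bags))
```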

Uploading a Query Transformation CSV

  1. Log in as an admin user on the SWIRL homepage.
  2. Select Upload Query Transform CSV:

    Upload CSV option

  3. Enter a Name and select a Type:

    Name and Type

  4. Choose the CSV file to upload:

    Choose CSV file

  5. Click Upload:

    Upload button

Using the Uploaded CSV

Once uploaded, reference the file as <name>.<type>.

Example: If the file was named TestQueryTransform with type synonym, the reference is:

TestQueryTransform.synonym

Pre-Query Processing

Apply query transformations before execution:

Option 1: Use pre_query_processor in the API

/api/swirl/search/search?q=notebook&pre_query_processor=TestQueryTransform.synonym

Option 2: Update the SWIRL Search Object

Modify pre_query_processors in the Search object to include the transformation.

More details: Creating a Search Object with the API.


Query Processing

Update the SearchProvider’s query_processors field:

{
    "name": "TEST Web (Google PSE) with synonym processor",
    "active": "true",
    "default": "true",
    "connector": "RequestsGet",
    "query_processors": [
        "AdaptiveQueryProcessor",
        "TestQueryTransform.synonym"
    ],
    "query_mappings": "cx=0c38029ddd002c006,DATE_SORT=sort=date,PAGE=start=RESULT_INDEX,NOT_CHAR=-",
    "result_processors": [
        "MappingResultProcessor",
        "CosineRelevancyResultProcessor"
    ]
}

Integrate Source Synonyms into SWIRL Relevancy

SWIRL can extract source-specific synonym feedback and integrate it into relevancy scoring.

Why?

Some search engines apply synonyms internally (e.g., notebook → laptop), but SWIRL’s relevancy scoring is not aware of these extra terms. Hit highlighting extraction enables SWIRL to detect them.

Supported SearchProviders

  • OpenSearch
  • Elasticsearch
  • Solr

Configuration

1. Enable Hit Highlighting in the SearchProvider

Modify the query_template to enable hit highlighting on all fields:

"query_template": {
    "highlight": { "fields": { "*": {} } }
}

Synonym Relevancy - 1

Consult the search engine’s documentation for additional highlighting options.


2. Map Highlighted Fields in result_mappings

Assign highlighted synonyms to the following SWIRL result fields:

  • title_hit_highlights
  • body_hit_highlights

Example: Elasticsearch Response

{
    "_source": {
        "title": "Laptop computer",
        "content": "I need a new laptop computer for work."
    },
    "highlight": {
        "title": ["<em>Laptop</em> computer"],
        "content": ["I need a new <em>laptop</em> computer for work."]
    }
}

Mapping Configuration in result_mappings

title_hit_highlights=highlight.title, body_hit_highlights=highlight.content
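To see what the mapped highlight fields contain, the sketch below pulls the emphasized terms out of an Elasticsearch-style highlight block. It illustrates the idea only: the function name is hypothetical, the sample hit is made up for this example, and SWIRL's own extraction may differ.

```python
import re

# Highlighted terms are wrapped in <em>...</em> by default in Elasticsearch.
EM = re.compile(r"<em>(.*?)</em>", re.IGNORECASE | re.DOTALL)


def highlighted_terms(highlight):
    """Collect the lowercased terms wrapped in <em> across all fields."""
    terms = set()
    for fragments in highlight.values():
        for fragment in fragments:
            terms.update(term.lower() for term in EM.findall(fragment))
    return terms


hit = {
    "highlight": {
        "title": ["<em>Laptop</em> computer"],
        "content": ["I need a new <em>laptop</em> computer for work."],
    }
}
print(highlighted_terms(hit["highlight"]))
```

For a query of "notebook", a highlighted "laptop" signals that the engine matched on a synonym rather than the literal query term.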

Synonym Relevancy - 2


Results

The configuration appears in the info section of the results:

Synonym Relevancy - 3

  • The original query term was "notebook".
  • The search engine used "laptop" as a synonym.
  • Both terms were extracted and used in SWIRL's relevancy ranking.

Complete Highlighted Synonyms

The full hit highlighting content is available in:

  • body_hit_highlights → Synonym highlights in content.
  • title_hit_highlights → Synonym highlights in titles.

Synonym Relevancy - 4

Example Search Objects

Runs a default configuration:

  • Retrieves 10 results.
  • Uses the RelevancyMixer.
{
    "query_string": "search engine"
}

Run as a GET request

Using the q= URL parameter:

http://localhost:8000/swirl/search?q=search+engine

Using NOT Queries

{
    "query_string": "search engine -SEO"
}
{
    "query_string": "generative ai NOT chatgpt"
}

Note:

  • SWIRL may rewrite these queries based on query_mappings in the SearchProvider.
  • See: Search Syntax.

Sorting by Date

{
    "query_string": "search engine",
    "sort": "date"
}

Using the DateMixer (instead of RelevancyMixer)

{
    "query_string": "search engine",
    "sort": "date",
    "result_mixer": "DateMixer"
}

Spellcheck Example

{
    "query_string": "search engine",
    "pre_query_processors": "SpellcheckQueryProcessor"
}
  • Spellcheck runs before federated search.
  • The corrected query is sent to each SearchProvider.
  • Not recommended for Google PSE, as it handles spellchecking natively.

Searches specifying "sort", "result_mixer", or "pre_query_processors" must be POSTed to the Search API.


Advanced Search Example

This request:

  • Retrieves 20 results.
  • Queries SearchProviders 1 & 3 only.
  • Uses the RoundRobinMixer instead of relevancy ranking.
  • Sets a retention time of 1 hour.
{
    "query_string": "search engine",
    "results_requested": 20,
    "searchprovider_list": [1, 3],
    "result_mixer": "RoundRobinMixer",
    "retention": 1
}

Retention setting (retention: 1) ensures the search is deleted after 1 hour, assuming the Search Expiration Service is running.


Funding Dataset Examples

If the Funding Dataset is installed, the following queries work:

electric vehicle company:tesla
social media company:facebook
company:slack