Table of Contents

Architecture
Workflow
How To…
Retrieval Augmented Generation WebSocket API
- WebSocket Interaction Protocol for UI Developers
Using Query Transformations
Integrate Source Synonyms into SWIRL Relevancy
- Configuration
- Results
Example Search Objects

Developer Guide
Community Edition | Enterprise Edition

Architecture

SWIRL AI Search

SWIRL AI Search Architecture

SWIRL AI Search Architecture Part 2

SWIRL RAG Architecture

SWIRL AI Search Assistant

SWIRL AI Search Assistant Architecture

Workflow

Creating a Search

A new Search object is created at the /swirl/search/ endpoint.
- This calls the create method in swirl/views.py.
- SWIRL responds with the id of the newly created search.
- The federation process is then managed by swirl/search.py.
Executing the Search

swirl/search.py performs:
- Pre-query processing using Search.pre_query_processors.
- Federation by creating a federate_task for each SearchProvider.
Waiting for Results
- SWIRL waits for all tasks to complete or until settings.SWIRL_TIMEOUT is reached.
- Meanwhile, each federate_task:
  - Creates a Connector.
  - Processes the query using Search.query_processors.
  - Builds and validates the query (url, query_template, query_mappings).
  - Sends the query to the SearchProvider.
  - Normalizes and processes results (Search.result_processors).
  - Saves results in the database.
Post-Processing and Relevancy Ranking

Once results are available (or the timeout occurs):
- search.py invokes Search.post_result_processors.
- Relevancy ranking and duplicate detection are applied.
- The Search.status is updated to FULL_RESULTS_READY or PARTIAL_RESULTS_READY.
Retrieving Results

To retrieve results, use /swirl/results:
- All result objects are listed.
- Individual results can be retrieved using their id.
- Adding search_id groups results using the Result Mixer.
Continuous Updates with subscribe

If Search.subscribe = true:
- SWIRL will periodically re-run the search.
- The sort order is set to date, fetching newer results.
- Merging and de-duplication ensure no duplicate results.

To retrieve only new results, use Search.new_results_url or select a NewItem Mixer.

How To…

Work with JSON Endpoints

When using a browser to interact with SWIRL API endpoints (such as those in this guide), disable prefetching to prevent accidental creation of multiple objects via ?q= and ?qs= parameters.

Create a Search Object via API

Navigate to: http://localhost:8000/swirl/search/
Scroll to the form at the bottom of the page.
Switch to Raw data mode and clear any pre-filled content.
Copy and paste an example Search object.
Click POST.

SWIRL responds with the newly created Search object, including its id:

SWIRL Search Created - Google PSE Example

Save the id value—it is required for retrieving ranked results.

Create a Search Object with the `q=` URL Parameter

To create a Search object with only a query_string (and default settings), append ?q=your-query-string to the API URL.

Example:
http://localhost:8000/swirl/search?q=knowledge+management

After a few seconds, SWIRL redirects to the fully mixed results page:

SWIRL Results Header
SWIRL Results, Ranked by Cosine Vector Similarity

Limitations of q=:

The query must be URL-encoded (e.g., spaces → +). Use a free URL encoder for assistance.
All active and default SearchProviders are queried.
Limited error handling—if no results appear, inspect the Search object:
http://localhost:8000/swirl/search/<your-search-id>

Specify SearchProviders with `providers=` URL Parameter

Use the providers= parameter to specify a single SearchProvider or a list of Tags.

Example: Querying a single provider

http://localhost:8000/swirl/search/?q=knowledge+management&providers=maritime

Example: Querying multiple providers by Tag

http://localhost:8000/swirl/search/?q=knowledge+management&providers=maritime,news

Get Synchronous Results with `qs=` URL Parameter

The qs= parameter functions like q=, except that it immediately returns the first page of results instead of redirecting.

Example:
http://localhost:8000/swirl/search?qs=knowledge+management

qs= Supports:

Filtering by SearchProviders using providers=.
Using custom Mixers via result_mixer=.
Enabling RAG processing in a single call:
?qs=metasearch&rag=true

Overriding RAG Timeout

Starting in SWIRL 3.7.0, you can override the default AI Summary timeout:

Example:
http://localhost:8000/galaxy/?q=gig%20economics&rag=true&rag_timeout=90

rag_timeout is specified in seconds.

Paging with qs=

&page= is NOT supported with qs=.

Instead, use the next_page property from the info.results structure:

"results": {
    "retrieved_total": 30,
    "retrieved": 10,
    "federation_time": 2.2,
    "result_blocks": ["ai_summary"],
    "next_page": "http://localhost:8000/swirl/results/?search_id=2&page=2"
}

Request Date-Sorted Results

If "sort": "date" is specified in a Search object, SWIRL will request results in chronological order from providers that support date sorting.

However, by default, SWIRL still applies relevancy ranking, ensuring a mix of the most recent and most relevant results.

SWIRL Results Header, Sort/Date, Relevancy Mixer
SWIRL Results, Sort/Date, Relevancy Mixer

Handling Missing Date Information

Some sources do not provide a date_published field.

To address this, use the DateFindingResultProcessor to detect dates from content fields and map them to date_published.

Use an LLM to Rewrite Queries

SWIRL AI Search supports query rewriting using ChatGPTQueryProcessor (for the Community edition) or the GenAIQueryProcessor (for Enterprise).

To enable it, add the processor to SearchProvider.query_processors.

For details, see: Developer Reference - Query Processors.

Adjust `swirl_score` for Starred Results in Galaxy UI

**SWIRL Community Edition **

Configured via "theminimumSwirlScore" in static/api/config/default.
Default: 100. Increase this to reduce starred results.

**SWIRL Enterprise Edition **

Configured via "minimumConfidenceScore" in static/api/config/default.
Default: 0.7. Increase this to reduce starred results.

SWIRL AI Search 4.0 Results

Handle NOT Queries

If a SearchProvider returns a result containing a NOT-ted term, SWIRL logs a Relevancy Explain message.

Solution

Verify the SearchProvider supports NOT queries.
Ensure the correct NOT query-mapping is set.

When "subscribe": true, SWIRL automatically re-runs the search every four hours, with sort set to "date" to fetch new results.

Example Search Object with Subscription

{
    "id": 10,
    "query_string": "electric vehicles NOT tesla",
    "sort": "relevancy",
    "subscribe": true,
    "status": "FULL_RESULTS_READY",
    "result_url": "http://localhost:8000/swirl/results?search_id=10&result_mixer=RelevancyMixer",
    "new_result_url": "http://localhost:8000/swirl/results?search_id=10&result_mixer=RelevancyNewItemsMixer"
}

Updating a Subscription

Once SWIRL updates the Search, it sets:

"status": "FULL_UPDATE_READY"

New results will have "new": 1. Use new_result_url to retrieve only new results.

Example: Updated Search Object

{
    "id": 10,
    "query_string": "electric vehicles NOT tesla",
    "sort": "date",
    "subscribe": true,
    "status": "FULL_UPDATE_READY",
    "messages": [
        "[16:51:43] DedupeByFieldPostResultProcessor deleted 2 results",
        "[16:55:02] CosineRelevancyPostResultProcessor updated 58 results",
        "[17:00:02] DedupeByFieldPostResultProcessor deleted 30 results"
    ],
    "result_url": "http://localhost:8000/swirl/results?search_id=10&result_mixer=RelevancyMixer",
    "new_result_url": "http://localhost:8000/swirl/results?search_id=10&result_mixer=RelevancyNewItemsMixer"
}

The messages field logs federation processing details, while individual Result objects contain source-specific messages.

Viewing Only New Results

Use the NewItems Mixers to retrieve only newly added results.

Detect and Remove Duplicate Results

SWIRL includes two PostResultProcessors for duplicate detection:

Processor	Description	Notes
DedupeByFieldResultProcessor	Removes duplicates based on exact match of a field.	The field is set in `swirl_server/settings.py` (default: `url`).
DedupeBySimilarityResultProcessor	Removes duplicates based on similarity of `title` and `body`.	The similarity threshold is configured in `settings.py`.

Default Configuration

DedupeByFieldResultProcessor is enabled by default in Search.post_result_processors.

To modify this, edit the getSearchPostResultProcessorsDefault method in swirl/models.py.

Manage Search Objects

To edit a Search, append its id to the /swirl/search/ URL:

Example:
http://localhost:8000/swirl/search/1/

SWIRL Edit Search - Google PSE Example

Available Actions:

DELETE the Search (permanently deletes associated Results).
Edit the request body and PUT the updated Search.

Deleting a Search also deletes all associated Results immediately. Future versions may change this behavior.

Re-Run a Search

To discard previous results and re-run a Search, use:

http://localhost:8000/swirl/search?rerun=1

This restarts the search from scratch.
The re-run URL is included in the info.search section of every mixed result response.

Update a Search

To re-run a Search but keep previous results, use:

http://localhost:8000/swirl/search/?update=<search-id>

Behavior:

Changes Search.sort to "date" to prioritize new results.
De-duplicates results using the url field.
Updates Search and Result messages as the process runs.

Use RelevancyNewItemsMixer and DateNewItemsMixer to retrieve only new results.

Improve Relevancy for a Single SearchProvider

To filter results where the query string is not in the title, use:

"RequireQueryStringInTitleResultProcessor"

How It Works:

Install it after MappingResultProcessor in result_processors.
Removes results that do not contain the query in the title.

When to Use:

Recommended for sources like LinkedIn, which may return related but irrelevant profiles.
Normally, SWIRL ranks these results poorly—this eliminates them entirely.

Find Dates in Body/Title Responses

To detect and extract dates from result content, use:

"DateFindingResultProcessor"

How It Works:

Finds dates in results that lack a date_published field.
Copies the detected date into date_published.

Usage:

Add to SearchProvider.result_processors → Processes results from that provider only.
Add to Search.post_result_processors → Attempts date detection for all results.

Automatically Map Results Using Profiling

The AutomaticPayloadMapperResultProcessor profiles response data to find the best matches for:

title
body
date_published

When to Use:

Recommended for SearchProviders with poor or missing result_mappings.
Allows SWIRL to auto-map relevant fields.

Configuration:

Install after MappingResultProcessor.
Leave result_mappings blank.

Visualize Structured Data Results

To organize a columnar response into a structured dataset:

"result_mappings": "DATASET"

Example Output: Galaxy UI with charts displayed

Key Features:

Fully compatible with result_mappings, including NO_PAYLOAD.
Automatically generates visualizations using chart.js.

Chart Selection Logic:

No Numeric Fields → Adds a pseudo-count field → Bar Chart.
One Numeric Field → Uses Bar Chart.
Two Numeric Fields → Uses Scatter Chart (if both ranges are positive), otherwise Bar Chart.
Three+ Numeric Fields → Uses Bubble Chart (if a valid range is found), otherwise Bar Chart.

For assistance, please contact support.

Expire Search Objects

If Search Expiration Service is enabled, users can set Search retention policies.

Retention Value	Meaning
0	Retain indefinitely (default)
1	Retain for 1 hour
2	Retain for 1 day
3	Retain for 1 month

Expiration Timing:

Controlled by Celery Beat Configuration.
Runs based on Search Expiration Service settings.

Manage Results

To delete or edit a Result, use its id:

Example:
http://localhost:8000/swirl/results/1/

Available Actions:

DELETE the result permanently.
Edit the result and PUT it back.

Deleting a Result does NOT delete the associated Search.

Get Unified Results

Result Mixers organize results from multiple SearchProviders into unified result sets.

Key Features:

Mixers operate on saved results, not live federated data.
Re-running a search updates mixed results dynamically.
Different mixers can be applied on-the-fly via URL parameters.

Retrieve Unified Results

To fetch results for a specific Search, use:

http://localhost:8000/swirl/results?search_id=<search-id>

Example:
http://localhost:8000/swirl/results?search_id=1

SWIRL returns results using the result_mixer specified in the Search object.

SWIRL Results Header
SWIRL Results, Ranked by Cosine Vector Similarity

Override Mixer in Real Time

To apply a different mixer, append result_mixer=:

http://localhost:8000/swirl/results?search_id=<search-id>&result_mixer=<mixer-name>

Example:
http://localhost:8000/swirl/results?search_id=1&result_mixer=Stack1Mixer

Page Through Results

By default, SWIRL retrieves at least 10 results per SearchProvider.

To navigate results, append page=:

http://localhost:8000/swirl/results?search_id=<search-id>&page=<page-number>

Example:
http://localhost:8000/swirl/results?search_id=1&page=2

Increase Available Results

To store more results for paging, update results_per_query in the SearchProvider configuration.

Default: 10
Recommended for extensive paging: 20, 50, or 100

Increasing results_per_query requires re-running the search to fetch more results.

Get Search Times

SWIRL reports search execution times per source in the info block:

"info": "Web (Google PSE)": {
    "found": 8640,
    "retrieved": 10,
    "search_time": 2.1
}

The total federation time appears in info.results:

"results": {
    "retrieved_total": 50,
    "retrieved": 10,
    "federation_time": 3.2,
    "next_page": "http://localhost:8000/swirl/results/?search_id=507&page=2"
}

Timing Details:

Units: Seconds (rounded to 0.1 precision).
Federation Time Includes: Query execution, response processing, post-processing.
Mixer Processing Time is NOT included in federation time.

Configure Pipelines

Result processing happens in two stages:

SearchProvider.result_processors → Initial processing.
Search.post_result_processors → Final processing & ranking.

Example: Google PSE Result Processors

"result_processors": [
    "MappingResultProcessor",
    "DateFinderResultProcessor",
    "CosineRelevancyResultProcessor"
]

Modify Default Pipelines

To customize:

Post Result Processors: Edit getSearchPostResultProcessorsDefault() in swirl/models.py.
Default Mixer: Change the Search.result_mixer default.

result_mixer = models.CharField(max_length=200, default='RelevancyMixer', choices=MIXER_CHOICES)

Configure Relevancy Field Weights

To adjust field weights for relevancy scoring, update RELEVANCY_CONFIG in:
swirl_server/settings.py.

Default Weights:

Field	Weight	Notes
title	`1.5`
body	`1.0`	Base relevancy score
author	`1.0`

Configure Stopwords Language

By default, SWIRL loads English stopwords. To change this:

Update SWIRL_DEFAULT_QUERY_LANGUAGE in:
swirl_server/settings.py.
Set it to another NLTK stopword language.

Redact or Remove Personally Identifiable Information (PII)

SWIRL supports PII removal and redaction using Microsoft Presidio.

RemovePIIQueryProcessor (Redacts Queries)

Removes PII before querying.

Enable for a Specific SearchProvider:

"query_processors": [
    "AdaptiveQueryProcessor",
    "RemovePIIQueryProcessor"
]

Enable for ALL SearchProviders:

Modify swirl/models.py:

def getSearchPreQueryProcessorsDefault():
    return ["RemovePIIQueryProcessor"]

More details: ResultProcessors

RemovePIIResultProcessor (Redacts Results)

Redacts PII in results (e.g., "James T. Kirk" → "<PERSON>").

Enable for a Specific SearchProvider:

"result_processors": [
    "MappingResultProcessor",
    "DateFinderResultProcessor",
    "CosineRelevancyResultProcessor",
    "RemovePIIResultProcessor"
]

More details: ResultProcessors

RemovePIIPostResultProcessor

This processor applies PII redaction after all results are processed.

Understand the Explain Structure

The CosineRelevancyProcessor outputs a JSON structure explaining swirl_score calculations.

Viewing the Explain Data:

Enabled by default.
To disable, add &explain=False to the mixer URL.

Example:
SWIRL Result with Explain

Explain Match Types:

Postfix	Meaning	Example
*`_`**	Query partially matched against the entire result field.	`"knowledge_management_*", 0.7332...`
*`_s`**	Query matched one or more sentences, highest similarity recorded.	`"knowledge_management_s*", 0.7332...`
`_n`	Query matched at word position `'n'` in the field.	`"Knowledge_Management_0", 0.7332...`

Additional Data:

stems → Shows matching stems.
result and query length adjustments are recorded.
hits → Displays zero-offset token positions for each match.

Develop New Connectors

To connect to a new endpoint using an existing Connector (e.g., RequestsGet), create a new SearchProvider instead.

Example: The Google PSE SearchProvider JSON demonstrates how one Connector can be used to define hundreds of SearchProviders.

When to Develop a New Connector

Create a new Connector if:

The target API requires a unique transport method not supported by existing connectors.
A high-quality Python package exists to interface with the API.

Connector Base Class

All Connectors extend the Connector base class, which defines the workflow in federate().
Source: swirl/connectors.

Connector Workflow (federate() Method)

def federate(self):
    '''
    Executes the workflow for a given search and provider
    ''' 
    self.start_time = time.time()

    if self.status == 'READY':
        self.status = 'FEDERATING'
        try:
            self.process_query()
            self.construct_query()
            if self.validate_query():
                self.execute_search()
                if self.status == 'FEDERATING':
                    self.normalize_response()
                    self.process_results()
                if self.status == 'READY':
                    return self.save_results()
            else:
                self.error('validate_query() failed')
        except Exception as err:
            self.error(f'{err}')
    return False

Connector Execution Stages

Stage	Description	Notes
process_query	Calls the Query Processor to adapt the query for this SearchProvider.
construct_query	Assembles the final query format.
validate_query	Checks if the query is valid and error-free.	Returns `False` if invalid.
execute_search	Connects to the SearchProvider, executes the query, and stores the response.
normalize_response	Transforms the provider’s response into JSON format for SWIRL.
process_results	Runs Result Processors to map data to SWIRL’s schema.
save_results	Saves results in the Django database.

A new Connector must override:

execute_search() → Handles the API connection & query execution.
normalize_response() → Converts raw API responses into structured JSON.

Connector Development Guidelines

Import new connectors in swirl/connectors/__init__.py.
Register new processors in CHOICES inside swirl/models.py (requires a database migration).
Limit imports to only the required libraries (e.g., requests, elasticsearch, sqlite3).
To extend an existing transport, subclass it and override normalize_response().
Ensure execute_search() supports:
- results_per_query > 10 (handle paging if needed).
- Date sorting (if supported by the data source).

Using `eval_credentials` for Secure Authentication

To use session-based credentials dynamically in a SearchProvider:

Store the authentication token in a session variable.
Use eval_credentials to inject it into the SearchProvider.

Example:

{
   "eval_credentials": "session['my-connector-token']",
   "credentials": "myusername:{credentials}"
}

Required Query Mappings

When developing a new Connector, implement query_mappings:

DATE_SORT → Enables date-based sorting.
PAGE → Enables pagination support.
NOT_CHAR / NOT → Defines negation behavior.

Required Result Processing

Each Connector should process results using a Result Processor, ideally:

"result_processors": [
    "MappingResultProcessor"
]

More details: MappingResultProcessor.

Develop New Processors

Processor classes are located in: swirl/processors.

Key Guidelines:

Processors execute in sequence and should perform one transformation only.
Inherit from QueryProcessor, ResultProcessor, or PostResultProcessor.
Override process() for simple changes or define new variables in __init__.
Use validate() to check input values.
Return: Processed data (for Query/Result processors) or an integer count of results updated (for PostResultProcessors).

Development Notes:

Import new processors in: swirl/processors/__init__.py.
Register processors in CHOICES inside: swirl/models.py (requires a database migration).
PostResultProcessors should be the only processors accessing model data.
Ensure process() returns either:
- Processed data (Query/Result processors).
- Number of updated results (PostResultProcessors).
Use helper functions in: swirl/processors/utils.py.

Develop New Mixers

Mixer classes are located in: swirl/mixers.

Mixer Workflow

def mix(self):
    '''
    Executes the workflow for a given mixer
    '''
    self.order()
    self.finalize()
    return self.mix_wrapper

Most Mixers override order().
order() should sort and save self.all_results into self.mixed_results.

Example: **Basic Paging Mixer

def order(self):
    '''
    Orders all_results into mixed_results
    Base class, intended to be overridden!
    '''
    self.mixed_results = self.all_results[(self.page-1)*self.results_requested:(self.page)*self.results_requested]

Example: **RelevancyMixer

class RelevancyMixer(Mixer):

    type = 'RelevancyMixer'

    def order(self):
        # Sort results by SWIRL score, then by SearchProvider rank
        self.mixed_results = sorted(
            sorted(self.all_results, key=itemgetter('searchprovider_rank')), 
            key=itemgetter('swirl_score'), 
            reverse=True
        )

Finalizing Results

finalize() trims self.mixed_results, adds metadata, and returns mix_wrapper.
Mixers automatically page results if enough are available.

Development Notes:

Import new mixers in: swirl/mixers/__init__.py.
Register mixers in CHOICES inside: swirl/models.py (requires a database migration).

Retrieval Augmented Generation WebSocket API

WebSocket Interaction Protocol for UI Developers

This section outlines the protocol for WebSocket interactions with the SWIRL server.

1. Initialize the WebSocket

Action: Create a WebSocket connection.
Parameters:
- searchId: SWIRL search ID (required).
- ragItems: (Optional) Array of integers identifying search results for RAG processing.
Behavior:
- Opens a new WebSocket connection.
- Sends initial parameters (searchId, ragItems) to the server.
- Authentication should be handled before opening the WebSocket.

2. Sending Data

Action: Send a RAG request over the WebSocket.
Parameters:
- Empty message → Starts RAG processing.
- "stop" → Cancels an ongoing RAG request for the searchId.

3. Receiving Data

Action: Receive RAG results from the WebSocket.
Response Format: JSON or "No data" if an error occurs.

{
    "message": {
        "date_published": "<timestamp-response-creation>",
        "title": "<query-string>",
        "body": "<ai_response>",
        "author": "ChatGPT",
        "searchprovider": "ChatGPT",
        "searchprovider_rank": 1,
        "result_block": "ai_summary",
        "rag_query_items": [<list-of-rag-items-passed-in>]
    }
}

4. Connection Teardown

Action: Close the WebSocket gracefully when finished.

Using Query Transformations

Query Transformation Rules

Developers can apply query transformation rules using the Query Transformation feature.

Pre-query rules → Apply before queries are sent to all sources.
Per-source rules → Apply to individual SearchProviders.

Supported Transformation Types:

Type	Description
Replace	Replaces a string in the query (or removes it entirely).
Synonym	Replaces a term with an OR expression containing synonyms.
Synonym Bag	Expands a term into an OR expression containing multiple synonyms.

Rules are provided as CSV files uploaded to SWIRL.

Replace/Rewrite Rules

CSV Format:

Column 1	Column 2
List of patterns to replace (separated by `;`). Supports `*` wildcards (non-leading).	Replacement string (leave blank to remove the term).

Example Configuration:

# column1, column2
mobiles; ombile; mo bile, mobile
computers, computer
cheap* smartphones, cheap smartphone
on

Example Transformations:

Query	Transformed Query
`mobiles`	`mobile`
`ombile`	`mobile`
`mo bile`	`mobile`
`on computing`	`computing`
`cheaper smartphones`	`cheap smartphone`
`computers go figure`	`computer go figure`

Synonym Rules

CSV Format:

Column 1	Column 2
Term	Synonym

Example Configuration:

# column1, column2
notebook, laptop
laptop, personal computer
pc, personal computer
personal computer, pc
car, ride

Example Transformations:

Query	Transformed Query
`notebook`	`(notebook OR laptop)`
`pc`	`(pc OR personal computer)`
`personal computer`	`(personal computer OR pc)`
`I love my notebook`	`I love my (notebook OR laptop)`
`This pc, it is better than a notebook`	`This (pc OR personal computer), it is better than a (notebook OR laptop)`
`My favorite song is "You got a fast car"`	`My favorite song is "You got a fast (car OR ride)"`

Synonym Bag Rules

CSV Format:

Column 1	Column 2…N
Term	List of synonyms

Example Configuration:

# column1, column2, column3, column4
notebook, personal computer, laptop, pc
car, automobile, ride

Example Transformations:

Query	Transformed Query
`car`	`(car OR automobile OR ride)`
`automobile`	`(automobile OR car OR ride)`
`ride`	`(ride OR car OR automobile)`
`pimp my ride`	`pimp my (ride OR car OR automobile)`
`automobile, yours is fast`	`(automobile OR car OR ride), yours is fast`
`I love the movie The Notebook`	`I love the movie The Notebook`
`My new notebook is slow`	`My new (notebook OR personal computer OR laptop OR pc) is slow`

Uploading a Query Transformation CSV

Log in as an admin user on the SWIRL homepage.
Select Upload Query Transform CSV:
Enter a Name and select a Type:
Choose the CSV file to upload:
Click Upload:

Using the Uploaded CSV

Once uploaded, reference the file as <name>.<type>.

Example: If the file was named TestQueryTransform with type synonym, the reference is:

TestQueryTransform.synonym

Pre-Query Processing

Apply query transformations before execution:

Option 1: Use pre_query_processor in the API

/api/swirl/search/search?q=notebook&pre_query_processor=TestQueryTransform.synonym

Option 2: Update the **SWIRL Search Object

Modify pre_query_processors in the Search object to include the transformation.

More details: Creating a Search Object with the API.

Query Processing

Update the SearchProvider’s query_processors field:

{
    "name": "TEST Web (Google PSE) with synonym processor",
    "active": "true",
    "default": "true",
    "connector": "RequestsGet",
    "query_processors": [
        "AdaptiveQueryProcessor",
        "TestQueryTransform.synonym"
    ],
    "query_mappings": "cx=0c38029ddd002c006,DATE_SORT=sort=date,PAGE=start=RESULT_INDEX,NOT_CHAR=-",
    "result_processors": [
        "MappingResultProcessor",
        "CosineRelevancyResultProcessor"
    ]
}

Integrate Source Synonyms into SWIRL Relevancy

SWIRL can extract source-specific synonym feedback and integrate it into relevancy scoring.

Why?

Some search engines apply synonyms internally (e.g., notebook → laptop), but SWIRL’s relevancy scoring is not aware of these extra terms. Hit highlighting extraction enables SWIRL to detect them.

Supported SearchProviders

OpenSearch
Elasticsearch
Solr

Configuration

1. Enable Hit Highlighting in the SearchProvider

Modify the query_template to enable hit highlighting on all fields:

"query_template": {
    "highlight": { "fields": { "*": {} } }
}

Synonym Relevancy - 1

Consult the search engine’s documentation for additional highlighting options.

2. Map Highlighted Fields in results_mapping

Assign highlighted synonyms to the following SWIRL result fields:

title_hit_highlights
body_hit_highlights

Example: Elasticsearch Response

{
    "_source": {
        "title": "Laptop computer",
        "content": "I need a new laptop computer for work."
    },
    "highlight": {
        "title": ["<em>Notebook</em> computer"],
        "content": ["I need a new <em>notebook</em> computer for work."]
    }
}

Mapping Configuration in results_mapping

title_hit_highlights=highlight.title, body_hit_highlights=highlight.content

Synonym Relevancy - 2

Results

The configuration appears in the info section of the results:

Synonym Relevancy - 3

The original query term was "notebook".
The search engine used "laptop" as a synonym.
Both terms were extracted and used in SWIRL's relevancy ranking.

Complete Highlighted Synonyms

The full hit highlighting content is available in:

body_hit_highlights → Synonym highlights in content.
title_hit_highlights → Synonym highlights in titles.

Synonym Relevancy - 4

Example Search Objects

Basic Search

Runs a default configuration:

Retrieves 10 results.
Uses the RelevancyMixer.

{
    "query_string": "search engine"
}

Run as a GET request

Using the q= URL parameter:

http://localhost:8000/swirl/search?q=search+engine

Using NOT Queries

{
    "query_string": "search engine -SEO"
}

{
    "query_string": "generative ai NOT chatgpt"
}

Note:

SWIRL may rewrite these queries based on query_mappings in the SearchProvider.
See: Search Syntax.

Sorting by Date

{
    "query_string": "search engine",
    "sort": "date"
}

Using the DateMixer (instead of RelevancyMixer)

{
    "query_string": "search engine",
    "sort": "date",
    "result_mixer": "DateMixer"
}

Spellcheck Example

{
    "query_string": "search engine",
    "pre_query_processors": "SpellcheckQueryProcessor"
}

Spellcheck runs before federated search.
The corrected query is sent to each SearchProvider.
Not recommended for Google PSE, as it handles spellchecking natively.

Searches specifying "sort", "result_mixer", or "pre_query_processors" must be POSTed to the Search API.

Advanced Search Example

This request:

Retrieves 20 results.
Queries SearchProviders 1 & 3 only.
Uses the RoundRobinMixer instead of relevancy ranking.
Sets a retention time of 1 hour.

{
    "query_string": "search engine",
    "results_requested": 20,
    "searchprovider_list": [1, 3],
    "result_mixer": "RoundRobinMixer",
    "retention": 1
}

Retention setting (retention: 1) ensures the search is deleted after 1 hour, assuming the Search Expiration Service is running.

Funding Dataset Examples

If the Funding Dataset is installed, the following queries work:

electric vehicle company:tesla

social media company:facebook

company:slack

Architecture

SWIRL AI Search

SWIRL RAG Architecture

SWIRL AI Search Assistant

Workflow

How To…

Work with JSON Endpoints

Create a Search Object via API

Create a Search Object with the q= URL Parameter

Specify SearchProviders with providers= URL Parameter

Get Synchronous Results with qs= URL Parameter

Request Date-Sorted Results

Use an LLM to Rewrite Queries

Adjust swirl_score for Starred Results in Galaxy UI

Handle NOT Queries

Subscribe to a Search

Detect and Remove Duplicate Results

Manage Search Objects

Re-Run a Search

Update a Search

Improve Relevancy for a Single SearchProvider

Find Dates in Body/Title Responses

Automatically Map Results Using Profiling

Visualize Structured Data Results

Expire Search Objects

Manage Results

Get Unified Results

Page Through Results

Get Search Times

Configure Pipelines

Configure Relevancy Field Weights

Configure Stopwords Language

Redact or Remove Personally Identifiable Information (PII)

Understand the Explain Structure

Develop New Connectors

Using eval_credentials for Secure Authentication

Required Query Mappings

Required Result Processing

Develop New Processors

Develop New Mixers

Retrieval Augmented Generation WebSocket API

WebSocket Interaction Protocol for UI Developers

Using Query Transformations

Query Transformation Rules

Replace/Rewrite Rules

Synonym Rules

Synonym Bag Rules

Uploading a Query Transformation CSV

Pre-Query Processing

Query Processing

Integrate Source Synonyms into SWIRL Relevancy

Configuration

Results

Example Search Objects

Basic Search

Using NOT Queries

Sorting by Date

Using the DateMixer (instead of RelevancyMixer)

Spellcheck Example

Advanced Search Example

Funding Dataset Examples

Create a Search Object with the `q=` URL Parameter

Specify SearchProviders with `providers=` URL Parameter

Get Synchronous Results with `qs=` URL Parameter

Adjust `swirl_score` for Starred Results in Galaxy UI

Using `eval_credentials` for Secure Authentication