
SearchProviders Guide
Community Edition | Enterprise Edition


SearchProviders are the core of SWIRL, enabling easy connections to various data sources without writing any code. SWIRL includes preconfigured providers for sources like M365, iManage, Box.com, ServiceNow, Salesforce, HubSpot, arXiv, Google News, and many more.

Default Providers

SWIRL comes with active SearchProviders for arXiv.org, Europe PMC, and Google News that work "out of the box" if internet access is available.

Additionally, inactive SearchProviders for Google Web Search and SWIRL Documentation use Google Programmable Search Engine (PSE). These require a Google API key. See the SearchProvider Guide for setup details.

SearchProvider Example JSON

Preloaded SearchProviders

To view the list of preloaded SearchProviders, use the admin console: http://localhost:8000/admin/

SWIRL Admin Console showing SearchProvider link

Click the SearchProviders link at the bottom left of the screen, under the SWIRL menu. This will bring up a list of the current SearchProviders:

SWIRL Admin Console showing SearchProvider list

Editing a SearchProvider

From the admin console click on a SearchProvider to edit it.

SWIRL Admin Console showing SearchProvider list

Use the form that appears to make changes:

SWIRL Admin Console showing SearchProvider editing

Click the "SAVE" button, at the bottom of the form, to commit changes.

SWIRL Admin Console showing SearchProvider editing with SAVE button

Cloning a SearchProvider

To clone a SearchProvider, open it as described above, then:

  • Change the name of the SearchProvider to a unique value
  • Click the "Save as new" button at the bottom of the page.

The "Save as new" button is pictured above.

Activating a SearchProvider

To activate a SearchProvider, edit it using the Admin Console as shown above. Then check the active field if it is not already checked.

SWIRL Admin Console showing SearchProvider active setting checked

Click the "SAVE" button, at the bottom of the page, to commit this change.

Adding SearchProviders

Using the Admin Console

From the admin console, go to the SearchProvider page:

SWIRL admin console showing SearchProvider

Click the "Add SearchProvider" button at the top right of the SearchProvider list that appears:

SWIRL admin console showing Add SearchProvider

Fill in the field(s) with the appropriate values for your SearchProvider. Contact SWIRL for assistance.

Via Copy/Paste

If you have a SearchProvider JSON file, copy and paste it into the form at the bottom of the SearchProvider API endpoint.

SWIRL API

Steps:

  1. Go to http://localhost:8000/swirl/searchproviders/
  2. Click the Raw data tab at the bottom of the page.
  3. Paste the SearchProvider JSON (either a single record or a list of records).
  4. Click the POST button.
  5. SWIRL will confirm the new SearchProvider(s).
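
The same POST can also be scripted. The following is a minimal sketch, not SWIRL's own tooling: it assumes the endpoint accepts HTTP Basic authentication for the admin user, and the helper name build_searchprovider_post and the sample record are hypothetical. It only builds the request; passing the result to urllib.request.urlopen would send it.

```python
import base64
import json
import urllib.request

SEARCHPROVIDERS_URL = "http://localhost:8000/swirl/searchproviders/"


def build_searchprovider_post(provider, username, password, url=SEARCHPROVIDERS_URL):
    """Build (but do not send) a POST request carrying one SearchProvider record."""
    body = json.dumps(provider).encode("utf-8")
    # Assumption: the endpoint accepts HTTP Basic credentials.
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


# Hypothetical, incomplete record used only to show the call shape.
request = build_searchprovider_post(
    {"name": "My Provider", "active": False}, "admin", "your-admin-password"
)
```

A list of records could be sent the same way by passing a list instead of a single dictionary.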

Using the Bulk Loader

Use the swirl_load.py script to bulk-load SearchProviders.

Steps:

  1. Open a terminal and navigate to your SWIRL home directory:
    cd <swirl-home>
    
  2. Run the following command:
    python swirl_load.py SearchProviders/provider-name.json -u admin -p your-admin-password
    
  3. The script will load all configurations from the specified file.
  4. Visit http://localhost:8000/swirl/searchproviders/ to verify.

Example:

SWIRL SearchProviders List - Google PSE Example 1
SWIRL SearchProviders List - Google PSE Example 2

Query Templates

Most SearchProviders require a query_template, which binds to query_mappings during the federation process.

For example, the original query_template for the MongoDB movie SearchProvider:

    "query_template": "{'$text': {'$search': '{query_string}'}}"

This value is a string, not nested JSON. It uses single quotes internally because the enclosing JSON string is delimited by double quotes.

Starting in SWIRL 3.2.0, MongoDB SearchProviders use the query_template_json field, which stores the template as valid JSON:

"query_template_json": {
    "$text": {
        "$search": "{query_string}"
    }
}
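
To illustrate why a JSON template is easier to work with, here is a minimal sketch of filling the {query_string} placeholder in a nested template. It is an illustration only, not SWIRL's connector code; the function name fill_template and the sample query are assumptions.

```python
def fill_template(template, query_string):
    """Return a copy of a nested template with {query_string} replaced in every string."""
    if isinstance(template, dict):
        return {key: fill_template(value, query_string) for key, value in template.items()}
    if isinstance(template, list):
        return [fill_template(item, query_string) for item in template]
    if isinstance(template, str):
        return template.replace("{query_string}", query_string)
    return template  # numbers, booleans and None pass through unchanged


query_template_json = {"$text": {"$search": "{query_string}"}}
mongo_query = fill_template(query_template_json, "space odyssey")
```

Because the template is already structured data, no quote juggling is needed and the original template is left untouched.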

Organizing SearchProviders with Active, Default, and Tags

SearchProviders have three properties that control their participation in queries:

Property | Description
Active | true/false – If false, the SearchProvider will not receive queries, even if specified in a searchprovider_list.
Default | true/false – If false, the SearchProvider will only be queried if explicitly listed in searchprovider_list.
Tags | List of strings grouping providers by topic. Tags can be used in searchprovider_list, as a providers= URL parameter, or as tag:term in a query.

Best Practices for SearchProvider Organization:

  • General-purpose providers should have "Default": true to be included in broad searches.
  • Topic-specific providers should have "Default": false and use "Tags": ["topic1", "topic2"].
  • Users can target specific providers using a mix of Tags, SearchProvider names, or IDs.

This ensures broad searches use the best general providers, while topic-specific searches can target precise data sources.
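
The interaction of the three properties can be sketched as follows. This is a simplified model of the rules in the table above, not SWIRL's actual selection code; the function name select_providers, the exact-match comparison, and the sample providers are assumptions.

```python
def select_providers(providers, searchprovider_list=None):
    """Return the names of providers that would be queried under the rules above."""
    requested = {str(item) for item in (searchprovider_list or [])}
    selected = []
    for provider in providers:
        if not provider.get("active", False):
            continue  # inactive providers never receive queries
        if not requested:
            # No explicit list: only default providers take part.
            if provider.get("default", False):
                selected.append(provider["name"])
            continue
        # Explicit list: match by ID, name, or any tag.
        keys = {str(provider["id"]), provider["name"], *provider.get("tags", [])}
        if keys & requested:
            selected.append(provider["name"])
    return selected


providers = [
    {"id": 1, "name": "Google Web", "active": True, "default": True, "tags": ["web"]},
    {"id": 2, "name": "Company Wiki", "active": True, "default": False, "tags": ["internal"]},
    {"id": 3, "name": "Old Archive", "active": False, "default": True, "tags": ["internal"]},
]
broad = select_providers(providers)
by_tag = select_providers(providers, ["internal"])
mixed = select_providers(providers, [2, "web"])
```

In this model the inactive provider is skipped even when its tag is requested, and the non-default provider appears only when named by ID, name, or tag.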

Sharing a SearchProvider with Specific Users

By default, SearchProviders are shared with all SWIRL users. To enable group-level access to a SearchProvider, the admin user should do the following:

  1. Create a new Django Group and add one or more SWIRL users to it. Use the Django Admin tool as described here: https://docs.swirlaiconnect.com/Admin-Guide.html#django-admin

  2. Edit the SearchProvider using the Admin Console:

  • Change the shared property to false.
  • Add the name of the group created in step 1 to the SearchProvider groups property. Note that this field must be a list. For example:
    "groups": ["some_new_group"]
    

SWIRL Admin Console showing SearchProvider group and shared options

  3. Click the SAVE button at the bottom of the form to save changes.

Now, only the users who are members of the newly created group (from step 1) can use that SearchProvider, and it should appear for them in the Galaxy sources pulldown.

Query Mappings

SearchProvider query_mappings are key-value pairs that define how queries are structured for a given SearchProvider.

These mappings configure field replacements and query transformations that SWIRL's processors (such as AdaptiveQueryProcessor) use to adapt the query format to each provider's requirements.

Available query_mappings Options

Mapping Format | Description | Example
key = value | Replaces {key} in the query_template with value. | "query_template": "{url}?cx={cx}&key={key}&q={query_string}", "query_mappings": "cx=google-pse-key"
DATE_SORT=url-snippet | Inserts the specified string into the URL when date sorting is enabled. | "query_mappings": "DATE_SORT=sort=date"
RELEVANCY_SORT=url-snippet | Inserts the specified string into the URL when relevancy sorting is enabled. | "query_mappings": "RELEVANCY_SORT=sort=relevancy"
PAGE=url-snippet | Enables pagination by inserting either RESULT_INDEX (absolute result number) or RESULT_PAGE (page number). | "query_mappings": "PAGE=start=RESULT_INDEX"
NOT=True | Indicates that the provider supports basic NOT operators. | elon musk NOT twitter
NOT_CHAR=- | Defines a character for NOT operators. | elon musk -twitter
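
As an illustration of the NOT_CHAR option, the sketch below rewrites NOT terms using the configured character. It is a simplified stand-in for what a query processor might do, not the AdaptiveQueryProcessor itself; the function name rewrite_not is hypothetical.

```python
def rewrite_not(query, not_char):
    """Rewrite each 'NOT term' in the query as '<not_char>term'."""
    rewritten = []
    negate_next = False
    for word in query.split():
        if word == "NOT":
            negate_next = True  # the next word is the term to exclude
            continue
        rewritten.append(not_char + word if negate_next else word)
        negate_next = False
    return " ".join(rewritten)


adapted = rewrite_not("elon musk NOT twitter", "-")
```

A provider configured with NOT=True would instead receive the query unchanged.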

Query Field Mappings

In query_mappings, keys enclosed in braces within query_template are replaced with mapped values.

Example Configuration

"url": "https://www.googleapis.com/customsearch/v1",
"query_template": "{url}?cx={cx}&key={key}&q={query_string}",
"query_processors": [
        "AdaptiveQueryProcessor"
    ],
"query_mappings": "cx=0c38029ddd002c006,DATE_SORT=sort=date,PAGE=start=RESULT_INDEX",

Example Query Output

At query execution time, this configuration generates a URL like the following (the key parameter, which the template also includes, is not shown):

https://www.googleapis.com/customsearch/v1?cx=0c38029ddd002c006&q=some_query_string

Key Configuration Guidelines:

  • The url field is specific to each SearchProvider and should contain static parameters that never change.
  • query_mappings allow dynamic replacements using query-time values.
  • The query_string is populated by SWIRL as described in the Developer Guide.
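
The substitution described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not SWIRL's implementation: the helper names parse_mappings and build_url are hypothetical, the query string is assumed to be URL-encoded, and the key value is assumed to come from a credentials string of the same name=value form.

```python
import urllib.parse


def parse_mappings(mappings):
    """Split 'a=b,c=d=e' into {'a': 'b', 'c': 'd=e'}; only the first '=' separates key and value."""
    parsed = {}
    for item in mappings.split(","):
        key, _, value = item.strip().partition("=")
        if key:
            parsed[key] = value
    return parsed


def build_url(provider, query_string, credentials=""):
    """Fill every {placeholder} in query_template that has a known value."""
    values = parse_mappings(provider["query_mappings"])
    values.update(parse_mappings(credentials))
    values["url"] = provider["url"]
    values["query_string"] = urllib.parse.quote_plus(query_string)
    url = provider["query_template"]
    for key, value in values.items():
        url = url.replace("{" + key + "}", value)
    return url


provider = {
    "url": "https://www.googleapis.com/customsearch/v1",
    "query_template": "{url}?cx={cx}&key={key}&q={query_string}",
    "query_mappings": "cx=0c38029ddd002c006,DATE_SORT=sort=date,PAGE=start=RESULT_INDEX",
}
full_url = build_url(provider, "some query", credentials="key=abc")
```

Entries such as DATE_SORT and PAGE have no placeholder in this template, so the sketch leaves them unused; handling sort and paging is outside its scope.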

HTTP Request Headers

The optional http_request_headers field allows custom HTTP headers to be sent along with a query.

For example, the GitHub SearchProvider uses this to request enhanced search snippets, which are then mapped to SWIRL's body field:

"http_request_headers": {
    "Accept": "application/vnd.github.text-match+json"
},

"result_mappings": "title=name,body=text_matches[*].fragment, ..."

This feature ensures richer, more relevant search results by enabling source-specific header configurations.

Result Processors

Each SearchProvider can define its own Result Processing pipeline. A typical configuration looks like this:

"result_processors": [
    "MappingResultProcessor",
    "CosineRelevancyResultProcessor"
],

Enabling Relevancy Ranking

If Relevancy Ranking is required:

  1. The CosineRelevancyResultProcessor must be the last item in the result_processors list.
  2. The CosineRelevancyPostResultProcessor must be included in Search.post_result_processors, which is defined in swirl/models.py.

For more details, refer to the Relevancy Ranking Guide.

Additional ResultProcessors

SWIRL provides other ResultProcessors that may be useful in specific cases. See the Developer Guide for more details.

Authentication & Credentials

The credentials property stores authentication information required by a SearchProvider.

Supported Authentication Formats

Key-Value Format (Appended to URL)

Used when an API key is passed as a query parameter.

Example: Google PSE SearchProvider

"credentials": "key=your-google-api-key-here",
"query_template": "{url}?cx={cx}&key={key}&q={query_string}",

Bearer Token (Sent in HTTP Header)

Supported by the RequestsGet and RequestsPost connectors.

Example: Miro SearchProvider

"credentials": "bearer=your-miro-api-token",

X-Api-Key Format (Sent in HTTP Header)

"credentials": "X-Api-Key=<your-api-key>",

HTTP Basic/Digest/Proxy Authentication

Supported by the RequestsGet, ElasticSearch, and OpenSearch connectors.

Example: Solr with Auth SearchProvider

"credentials": "HTTPBasicAuth('solr-username','solr-password')",

Other Authentication Methods

For advanced authentication techniques, consult the Developer Guide.
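
The three name=value credential formats described above differ only in where the secret ends up. The sketch below illustrates that distinction; it is not the connectors' actual code, the function name apply_credentials is hypothetical, and the HTTPBasicAuth-style format is deliberately not handled.

```python
def apply_credentials(credentials, query_template):
    """Return (query_template, headers) after applying a 'name=value' credentials string."""
    name, _, value = credentials.partition("=")
    headers = {}
    if name.lower() == "bearer":
        headers["Authorization"] = f"Bearer {value}"  # sent in an HTTP header
    elif name.lower() == "x-api-key":
        headers["X-Api-Key"] = value  # sent in an HTTP header
    else:
        # Key-value format: fill the matching placeholder in the URL template.
        query_template = query_template.replace("{" + name + "}", value)
    return query_template, headers


template = "{url}?cx={cx}&key={key}&q={query_string}"
url_style = apply_credentials("key=your-google-api-key-here", template)
bearer_style = apply_credentials("bearer=your-miro-api-token", template)
```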

Response Mappings

SearchProvider response_mappings determine how each source's response is normalized into JSON.
They are processed by the Connector's normalize_response method.

Example: Google PSE Response Mappings

"response_mappings": "FOUND=searchInformation.totalResults,RETRIEVED=queries.request[0].count,RESULTS=items",

Response Mapping Options

Mapping | JSONPath Source | Required? | Example
FOUND | Total number of results available for the query (default: same as RETRIEVED if not specified) | No | FOUND=searchInformation.totalResults
RETRIEVED | Number of results returned for this query (default: length of RESULTS list) | No | RETRIEVED=queries.request[0].count
RESULTS | Path to the list of result items | Yes | RESULTS=items
RESULT | Path to the document (if result items are stored within a dictionary/wrapper) | No | RESULT=document

Proper response mappings ensure consistent search results across different sources.
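
The defaults for FOUND and RETRIEVED can be made concrete with a short sketch. It is an approximation for illustration, not the Connector's normalize_response method: the helper names resolve and normalize are hypothetical, the path resolver supports only dotted names with numeric indexes, and the sample response is invented.

```python
import re


def resolve(data, path):
    """Follow a dotted path with optional [n] indexes, e.g. 'queries.request[0].count'."""
    for part in path.split("."):
        match = re.fullmatch(r"([^\[\]]+)((?:\[\d+\])*)", part)
        if match is None:
            raise ValueError(f"unsupported path segment: {part!r}")
        data = data[match.group(1)]
        for index in re.findall(r"\[(\d+)\]", match.group(2)):
            data = data[int(index)]
    return data


def normalize(response, response_mappings):
    """Extract found, retrieved and results, applying the documented defaults."""
    mappings = dict(item.split("=", 1) for item in response_mappings.split(","))
    results = resolve(response, mappings["RESULTS"])  # RESULTS is required
    if "RETRIEVED" in mappings:
        retrieved = int(resolve(response, mappings["RETRIEVED"]))
    else:
        retrieved = len(results)  # default: length of the RESULTS list
    if "FOUND" in mappings:
        found = int(resolve(response, mappings["FOUND"]))
    else:
        found = retrieved  # default: same as RETRIEVED
    return {"found": found, "retrieved": retrieved, "results": results}


# Invented response shaped like the Google PSE mapping example.
response = {
    "searchInformation": {"totalResults": "1413"},
    "queries": {"request": [{"count": 2}]},
    "items": [{"link": "https://example.com/a"}, {"link": "https://example.com/b"}],
}
full = normalize(
    response,
    "FOUND=searchInformation.totalResults,RETRIEVED=queries.request[0].count,RESULTS=items",
)
minimal = normalize(response, "RESULTS=items")
```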

Result Mappings

SearchProvider result_mappings define how JSON result sets from external sources are mapped to SWIRL's standard result schema. Each mapping follows JSONPath conventions.

Configuration Options

Use the following configuration options to override default SearchProvider behavior.

They must be placed in the "config" block.

Retrieval Augmented Generation (RAG)

The following configuration items change the RAG defaults for a single SearchProvider:

"swirl": {
    "rag": {
        "swirl_rag_max_to_consider": <integer-max-to-consider>,
        "swirl_rag_fetch_timeout": <integer-rag-fetch-timeout>,
        "swirl_rag_score_inclusion_threshold": <float-rag-score-inclusion-threshold>,
        "swirl_rag_distribution_strategy": "<rag-distribution-strategy>",
        "swirl_rag_inclusion_field": "<swirl_confidence_score|swirl_score>"
    }
}

The following are valid RAG distribution strategies that can be selected by swirl_rag_distribution_strategy:

  • distributed
  • roundrobin
  • sorted
  • roundrobinthreshold

For example:

"swirl": {
    "rag": {
        "swirl_rag_inclusion_field": "swirl_score",
        "swirl_rag_distribution_strategy": "sorted",
        "swirl_rag_score_inclusion_threshold": 2500,
        "swirl_rag_max_to_consider": 4,
        "swirl_rag_fetch_timeout": 1
    }
},

Page Fetching

The following configuration items allow modification of the page fetching defaults for a single SearchProvider:

"config": {
    "swirl": {
        "fetch_url_body": {
            "body_pagefetch_min_tokens": <min-tokens>,
            "body_pagefetch_token_length": <token-length>,
            "body_pagefetch_fallback_token_length": <fallback-token-length>,
            "body_pagefetch_generation_method": "<generation-method>",
            "body_pagefetch_text_extract_timeout": <text-extraction-timeout>
        }
    }
}

The following are valid generation methods that may be selected using body_pagefetch_generation_method:

  • TERM_COUNT
  • TERM_VECTOR

For example:

"config": {
    "swirl": {
        "fetch_url_body": {
            "body_pagefetch_min_tokens": 5,
            "body_pagefetch_token_length": 64,
            "body_pagefetch_fallback_token_length": 128,
            "body_pagefetch_generation_method": "TERM_COUNT",
            "body_pagefetch_text_extract_timeout": 30
        }
    }
}

Google Calendar

The following configuration items allow modification of the Google Calendar defaults:

"config": {
        "swirl": {
            "google_calendar": {
               "calendar_lookback_days": <lookback-days>,
               "calendar_lookahead_days": <lookahead-days>
            }
        }
    }

In both cases, specify the number of days. For example:

"config": {
        "swirl": {
            "google_calendar": {
               "calendar_lookback_days": 30,
               "calendar_lookahead_days": 30
            }
        }
    }

This feature is only supported in SWIRL Enterprise.

To retrieve more results when the user (or the Search Assistant) selects a single SearchProvider for a search, add the following to the config block:

"config":{
  "swirl": {
    "connector_use": {
      "single_provider_results_requested": 50
    }
  }
}

SWIRL will retrieve the number of results specified by single_provider_results_requested, instead of results_per_query.

To disable this behavior, remove the configuration item.

In addition, you can pass single_provider_results_requested=<int> to a GET /api/swirl/search REST request. If the SearchProvider list contains exactly one SearchProvider ID, SWIRL fetches the number of results passed in. If the value is also set in that SearchProvider's configuration, the value passed in the request takes precedence.
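
The precedence rules above can be summarized in a small function. This is a sketch of the described behavior only; the function name results_to_request and its parameters are hypothetical and do not correspond to SWIRL internals.

```python
def results_to_request(provider_ids, results_per_query, config_value=None, request_value=None):
    """Decide how many results to ask a provider for.

    config_value: single_provider_results_requested from the provider's config, if set.
    request_value: single_provider_results_requested passed on the search request, if any.
    """
    if len(provider_ids) != 1:
        return results_per_query  # the override applies only to single-provider searches
    if request_value is not None:
        return request_value  # the request parameter wins over the provider config
    if config_value is not None:
        return config_value
    return results_per_query


from_config = results_to_request([7], 10, config_value=50)
from_request = results_to_request([7], 10, config_value=50, request_value=25)
multi_provider = results_to_request([7, 8], 10, config_value=50, request_value=25)
```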

Default SWIRL Fields

Field Name | Description
author | Author of the item (not always reliable for web content).
body | Main content extracted from the result.
date_published | Original publication date (not always reliable for web content).
date_retrieved | Date and time SWIRL retrieved the result.
title | Title of the item.
url | URL of the result item.

Example: Google PSE Result Mapping

"result_mappings": "url=link,body=snippet,author=displayLink,cacheId,pagemap.metatags[*].['og:type'],pagemap.metatags[*].['og:site_name'],pagemap.metatags[*].['og:description'],NO_PAYLOAD"

Here, url=link and body=snippet map Google PSE result fields to SWIRL result fields.

XML to JSON Conversion

The requests.py connector automatically converts XML to JSON for mapping.

It also handles list-of-list responses, where the first list element contains field names.

Example:

[
    ["urlkey", "timestamp", "original", "mimetype", "statuscode"],
    ["today,swirl)/", "20221012214440", "http://swirl.today/", "text/html"]
]

This format is automatically converted into a structured JSON array.
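
A minimal sketch of this kind of conversion is shown below, using the example rows above. It illustrates the idea rather than the connector's code; the function name rows_to_records is hypothetical.

```python
def rows_to_records(rows):
    """Turn [[field names], [values], ...] into a list of dictionaries."""
    header, *data = rows
    # zip stops at the shorter sequence, so a short row simply omits trailing fields.
    return [dict(zip(header, row)) for row in data]


rows = [
    ["urlkey", "timestamp", "original", "mimetype", "statuscode"],
    ["today,swirl)/", "20221012214440", "http://swirl.today/", "text/html"],
]
records = rows_to_records(rows)
```

Each record can then be addressed by field name in result_mappings, for example original for the URL.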

Constructing URLs from Mappings

If a SearchProvider does not return full URLs, JSONPath syntax can construct them dynamically.

Example: Europe PubMed Central

"url='https://europepmc.org/article/{source}/{id}'"

Here, {source} and {id} are values from the JSON result, inserted into the URL dynamically.
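
The effect is similar to Python string formatting against the result dictionary, as the following sketch suggests. The function name build_result_url and the sample result values are hypothetical; SWIRL's own template handling may differ.

```python
def build_result_url(template, result):
    """Fill {field} placeholders in a URL template from one result dictionary."""
    return template.format_map(result)


# Hypothetical result item; only the fields named in the template are used.
result = {"source": "MED", "id": "12345678", "title": "An example article"}
url = build_result_url("https://europepmc.org/article/{source}/{id}", result)
```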

Aggregating Field Values

To aggregate list values into a single string, use JSONPath syntax.

Example: Google PSE Metadata Aggregation

"pagemap.metatags[*].['og:type']"

This merges all og:type values from the metadata into a single result field.

Example: ArXiv Author Aggregation

"author[*].name"

This collects all author names into a single field.
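
In plain Python, a path such as author[*].name corresponds to gathering one field from every item in a list. The sketch below shows that idea; the function name collect, the sample entry, and the choice to join values with a comma are assumptions for illustration.

```python
def collect(items, field):
    """Emulate 'items[*].field': gather the field from every item that has it."""
    return [item[field] for item in items if field in item]


# Invented entry shaped like a feed item with several authors.
entry = {"author": [{"name": "A. Author"}, {"name": "B. Writer"}, {"affiliation": "none"}]}
names = collect(entry["author"], "name")
author = ", ".join(names)
```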

Multiple Mappings

SWIRL allows multiple source fields to map to a single SWIRL field.

"result_mappings": "body=content|description,..."
  • If one field is populated, it maps to body.
  • If both fields contain data, the second field is moved to PAYLOAD as <swirl_field>_<source_field>.

Example Result Object:

{
    "swirl_rank": 1,
    "title": "What The Mid-Term Elections Mean For U.S. Energy",
    "url": "https://www.forbes.com/sites/davidblackmon/2022/11/13/what-the-mid-term-elections-mean-for-us-energy/",
    "body": "Leaders in U.S. domestic energy sectors should expect President Joe Biden to feel emboldened...",
    "payload": {
        "body_description": "Leaders in U.S. domestic energy sectors should expect President Joe Biden to feel emboldened..."
    }
}
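
The fallback behavior for body=content|description can be sketched as follows. This is a simplified model of the two rules above, not the MappingResultProcessor; the function name map_multiple and the sample values are hypothetical.

```python
def map_multiple(swirl_field, source_fields, source_result):
    """Map the first populated source field; send later populated ones to the payload."""
    mapped = {"payload": {}}
    for source_field in source_fields:
        value = source_result.get(source_field)
        if not value:
            continue  # skip missing or empty source fields
        if swirl_field not in mapped:
            mapped[swirl_field] = value
        else:
            mapped["payload"][f"{swirl_field}_{source_field}"] = value
    return mapped


both = map_multiple("body", ["content", "description"], {"content": "C", "description": "D"})
only_second = map_multiple("body", ["content", "description"], {"description": "D"})
```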

Result Mapping Options

Mapping Format | Description | Example
swirl_key = source_key | Maps a field from the source provider to SWIRL. | "body=_source.email"
swirl_key = source_key1|source_key2 | Maps multiple fields; the first populated field is mapped, others go to PAYLOAD. | "body=content|description"
swirl_key='template {variable}' | Formats multiple values into a single string. | "title='{x}: {y}'"
source_key | Maps a field from the raw source result into PAYLOAD. | "cacheId, _source.products"
sw_urlencode | URL-encodes the specified value. | "url=sw_urlencode(<hitId>)"
sw_btcconvert | Converts Satoshi to Bitcoin. | "sw_btcconvert(<fee>)"
NO_PAYLOAD | Disables automatic copying of all source fields to PAYLOAD. | "NO_PAYLOAD"
FILE_SYSTEM | Treats the SearchProvider as a file system, increasing body weight in ranking. | "FILE_SYSTEM"
LC_URL | Converts url to lowercase. | "LC_URL"
BLOCK | Used in SWIRL's RAG processing; stores output in the info block of the result object. | "BLOCK=ai_summary"
DATASET | Formats columnar responses into a single result. | "DATASET"

Controlling date_published Display

As of SWIRL 2.1, different values can be mapped to date_published and date_published_display.

"result_mappings": "... date_published=foo.bar.date1,date_published_display=foo.bar.date2 ..."

This results in:

"date_published": "2010-01-01 00:00:00",
"date_published_display": "c2010"

Result Schema

The JSON result schema is defined in:

Result Mixers further process and merge data from multiple sources.

PAYLOAD Field

The PAYLOAD field stores all unmapped result data from the source.

Using NO_PAYLOAD Effectively

To exclude unnecessary fields from PAYLOAD:

  1. Run a test query without NO_PAYLOAD to inspect raw fields.
  2. Add specific mappings for the fields you need.
  3. Enable "NO_PAYLOAD" to discard unmapped data.

SWIRL copies all source data to PAYLOAD by default unless NO_PAYLOAD is specified.
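
The difference NO_PAYLOAD makes can be modeled in a few lines. The sketch below is an illustration of the behavior described in this section, not SWIRL's code; the function name build_payload, its parameters, and the sample result are assumptions.

```python
def build_payload(source_result, mapped_keys, payload_keys, no_payload=False):
    """Collect PAYLOAD content for one result.

    mapped_keys: source fields already mapped to SWIRL fields (e.g. link, snippet).
    payload_keys: source fields listed on their own in result_mappings (e.g. cacheId).
    """
    if no_payload:
        # Only explicitly listed source fields are kept.
        wanted = [key for key in payload_keys if key in source_result]
    else:
        # Default: every unmapped source field is copied.
        wanted = [key for key in source_result if key not in mapped_keys]
    return {key: source_result[key] for key in wanted}


# Invented source result for illustration.
source = {"link": "https://example.com", "snippet": "text", "cacheId": "abc", "kind": "result"}
default_payload = build_payload(source, {"link", "snippet"}, ["cacheId"])
trimmed_payload = build_payload(source, {"link", "snippet"}, ["cacheId"], no_payload=True)
```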

Rate Limiting or Throttling of SWIRL by Sources

Please note: SWIRL queries may be subject to rate limits or throttling imposed by the sources being queried. Consult the terms for the service in question for details.

SWIRL honors 429 responses to HTTP requests (including those from the MS Graph API) and automatically backs off for a configurable time period, or for the time reported by the source.

SWIRL may be configured to limit the rate at which queries are sent to any given SearchProvider. Contact support for assistance.
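
For readers building their own clients, the 429 handling described above might look like the following sketch. It is not SWIRL's implementation: the function name backoff_seconds and the 30-second default are hypothetical, and it assumes the source reports its wait time in a Retry-After header expressed in seconds.

```python
def backoff_seconds(status_code, headers, default_seconds=30.0):
    """Return how long to wait before retrying, or 0.0 when no back-off is needed."""
    if status_code != 429:
        return 0.0
    retry_after = headers.get("Retry-After")
    try:
        # Use the time reported by the source when it is a number of seconds.
        return max(0.0, float(retry_after))
    except (TypeError, ValueError):
        # Header missing or given as an HTTP date: fall back to the configured default.
        return default_seconds


reported = backoff_seconds(429, {"Retry-After": "12"})
fallback = backoff_seconds(429, {})
```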