Table of Contents

Preloaded SearchProviders
Activating a SearchProvider
Activating a Google Programmable Search Engine (PSE) SearchProvider
- Create a Google Programmable Search Engine (PSE)
- Create a Google API Key
- Activate the Google PSE SearchProvider
Copy/Paste Install
- Steps:
Bulk Loading
- Steps:
- Example:
Editing a SearchProvider
- Available Actions:
Query Templating
Organizing SearchProviders with Active, Default, and Tags
- Best Practices for SearchProvider Organization:
Query Mappings
- Available query_mappings Options
Query Field Mappings
- Example Configuration
- Example Query Output
- Key Configuration Guidelines:
HTTP Request Headers
Result Processors
- Enabling Relevancy Ranking
- Additional ResultProcessors
Authentication & Credentials
- Supported Authentication Formats
Response Mappings
- Example: Google PSE Response Mappings
- Response Mapping Options
Result Mappings
Configuration Options
- Retrieval Augmented Generation (RAG)
- Page Fetching
- Google Calendar
Default SWIRL Fields
Example: Google PSE Result Mapping
- XML to JSON Conversion
Constructing URLs from Mappings
Aggregating Field Values
Multiple Mappings
Result Mapping Options
- Controlling date_published Display
Result Schema
PAYLOAD Field
- Using NO_PAYLOAD Effectively

SearchProviders Guide
Community Edition | Enterprise Edition

SWIRL queries may be subject to rate limits or throttling imposed by the sources being queried.

SearchProviders are the core of SWIRL, enabling easy connections to various data sources without writing any code.

Each SearchProvider is a JSON object. SWIRL includes preconfigured providers for sources like Elastic, Solr, PostgreSQL, BigQuery, NLResearch.com, Miro.com, Atlassian, and more.

SWIRL comes with active SearchProviders for arXiv.org, European PMC, and Google News that work "out of the box" if internet access is available.

Additionally, inactive SearchProviders for Google Web Search and SWIRL Documentation use Google Programmable Search Engine (PSE). These require a Google API key. See Activating a Google Programmable Search Engine (PSE) SearchProvider below for setup details.

[Figure: SearchProvider example JSON]

Preloaded SearchProviders

SearchProvider	Description	Notes
arxiv.json	Searches the arXiv.org repository of scientific papers	No authentication required
asana.json	Searches tasks in Asana	Requires Asana personal access token
atlassian.json	Searches Atlassian Confluence Cloud, Jira Cloud, and Trello	Requires a bearer token and/or Trello API key
blockchain-bitcoin.json	Searches Blockchain.com for Bitcoin addresses and transactions	Requires Blockchain.com API key
chatgpt.json	OpenAI ChatGPT AI chatbot	Requires OpenAI API key
company_snowflake.json	Queries the Snowflake `FreeCompanyResearch` dataset	Requires Snowflake username and password
crunchbase.json	Searches organizations via Crunchbase API	Requires Crunchbase API key
document_db.json	SQLite3 document database	Sample Data
elastic_cloud.json	ElasticSearch (cloud version)	Enron Email Dataset
elasticsearch.json	ElasticSearch (local install)	Enron Email Dataset
europe_pmc.json	Searches EuropePMC.org for life sciences literature	No authentication required
funding_db_bigquery.json	BigQuery funding database	Funding Dataset
funding_db_postgres.json	PostgreSQL funding database	Funding Dataset
funding_db_sqlite3.json	SQLite3 funding database	Funding Dataset
github.json	Searches public repositories for Code, Commits, Issues, and PRs	Requires GitHub bearer token
google_news.json	Queries Google News	No authentication required
google_pse.json	Web search via Google Programmable Search Engine (PSE)	Requires Google API key
google_workspace.json	Queries Google Workspace	See the Google Workspace Guide
hacker_news.json	Queries Hacker News	No authentication required
http_get_with_auth.json	Generic HTTP GET with authentication	Requires URL and credentials
http_post_with_auth.json	Generic HTTP POST with authentication	Requires URL and credentials
hubspot.json	Searches the HubSpot CRM for Companies, Contacts, and Deals	Requires an API token with the required scopes
internet_archive.json	Queries the Internet Archive	No authentication required
littlesis.json	Queries LittleSis.org database of influential business and government figures	No authentication required
microsoft.json	Queries Microsoft 365 (Outlook, OneDrive, SharePoint, Teams)	See the M365 Guide
miro.json	Searches Miro.com boards	Requires bearer token
movies_mongodb.json	Queries MongoDB Atlas `sample_mflix.movies` dataset	Requires MongoDB credentials
newsdata_io.json	Searches Newsdata.io	Requires API key
nlresearch.json	Searches NLResearch.com premium content	Requires credentials
open_sanctions.json	Queries OpenSanctions.org	Requires API key
opensearch.json	OpenSearch 2.x	See the Developer Guide
oracle.json	Queries Oracle 23c Free (and earlier versions)	Requires Oracle credentials
preloaded.json	All preloaded SearchProviders	Default in SWIRL
servicenow.json	Searches ServiceNow Knowledge and Service Catalog	Requires username and password
solr.json	Queries Apache Solr (local install)	Requires host, port, collection
solr_with_auth.json	Secured Solr instance	Requires credentials
youtrack.json	Searches JetBrains YouTrack	Requires bearer token

This table provides a high-level overview of the available SearchProviders. Detailed configurations can be found in the SearchProviders repository.

Activating a SearchProvider

To activate a preloaded SearchProvider, edit it and change:

    "active": false

to:

    "active": true

Click the PUT button to save the change. You can also use the HTML Form at the bottom of the page for convenience.

[Figure: SearchProvider HTML form]

Activating a Google Programmable Search Engine (PSE) SearchProvider

SWIRL includes an inactive Google PSE configuration that allows searching the web or a defined "slice" of it.
Google PSE is not free and requires a valid Google API key.

Create a Google Programmable Search Engine (PSE)

1. Go to Google Programmable Search Engine.
2. Click Get Started and log in with your Google account.
3. Follow the steps to create a PSE and note the cx parameter (your Google PSE ID).

Create a Google API Key

1. Visit the Google API Custom Search overview.
2. Follow the instructions to generate an API key.

Activate the Google PSE SearchProvider

1. Edit the Google PSE provider.
2. Change:
```
    "active": false
```
to:
```
    "active": true
```
Or use the HTML form at the bottom of the page.
3. Update the query_mappings field with your Google PSE ID (cx parameter):
```
    "query_mappings": "cx=<your-Google-PSE-id>"
```
4. Update the credentials field with your Google API key, using the key= prefix:
```
    "credentials": "key=<your-Google-API-key>"
```
5. Click the PUT button to save the changes.
6. Reload SWIRL Galaxy. Your new source will appear in the source selector.

Copy/Paste Install

If you have a SearchProvider JSON file, you can copy and paste it into the form at the bottom of the SearchProvider endpoint.

[Figure: SWIRL API]

Steps:

1. Go to http://localhost:8000/swirl/searchproviders/
2. Click the Raw data tab at the bottom of the page.
3. Paste the SearchProvider JSON (either a single record or a list of records).
4. Click the POST button.

SWIRL will confirm the new SearchProvider(s).

Bulk Loading

Use the swirl_load.py script to bulk-load SearchProviders.

Steps:

1. Open a terminal and navigate to your SWIRL home directory:
```
cd <swirl-home>
```

2. Run the following command:
```
python swirl_load.py SearchProviders/provider-name.json -u admin -p your-admin-password
```

The script will load all configurations from the specified file.

3. Visit http://localhost:8000/swirl/searchproviders/ to verify.

Example:

[Figure: SWIRL SearchProviders list, Google PSE example]

Editing a SearchProvider

To edit a SearchProvider, append its id to the end of the /swirl/searchproviders URL.

For example:
http://localhost:8000/swirl/searchproviders/1/

[Figure: SWIRL SearchProvider instance, Google PSE]

Available Actions:

- DELETE: permanently removes the SearchProvider.
- PUT: saves your changes after you modify the configuration.

Query Templating

Most SearchProviders require a query_template, which binds to query_mappings during the federation process.

For example, the original query_template for the MongoDB movie SearchProvider:

    "query_template": "{'$text': {'$search': '{query_string}'}}"

This format is a string, not valid JSON. The single quotes are required because the JSON itself uses double quotes.

Starting in SWIRL 3.2.0, MongoDB SearchProviders use the query_template_json field, which stores the template as valid JSON:

"query_template_json": {
    "$text": {
        "$search": "{query_string}"
    }
}
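To make the difference between the two forms concrete, here is a minimal sketch, assuming a hypothetical helper `render_json_template` (not part of SWIRL) that fills in `{query_string}` on the parsed structure. It shows that the legacy single-quoted string is rejected by a JSON parser, while the query_template_json form can be filled in directly.

```python
import json

def render_json_template(template, query_string):
    """Return a copy of template with "{query_string}" replaced in every string value.

    Hypothetical helper for illustration; SWIRL's own substitution code may differ.
    """
    if isinstance(template, dict):
        return {key: render_json_template(value, query_string) for key, value in template.items()}
    if isinstance(template, list):
        return [render_json_template(item, query_string) for item in template]
    if isinstance(template, str):
        return template.replace("{query_string}", query_string)
    return template  # numbers, booleans, and None pass through unchanged

# The legacy single-quoted string form cannot be parsed as JSON.
legacy = "{'$text': {'$search': '{query_string}'}}"
try:
    json.loads(legacy)
except json.JSONDecodeError:
    print("legacy query_template is not valid JSON")

# The query_template_json form is already a JSON object, so it can be filled in directly.
query_template_json = {"$text": {"$search": "{query_string}"}}
query = render_json_template(query_template_json, 'the "godfather"')
print(json.dumps(query))
```

Because substitution happens on the parsed structure rather than on raw text, a query that contains double quotes stays safely inside a string value instead of breaking the template.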

Organizing SearchProviders with Active, Default, and Tags

SearchProviders have three properties that control their participation in queries:

Property	Description
Active	`true/false` – If `false`, the SearchProvider will not receive queries, even if specified in a `searchprovider_list`.
Default	`true/false` – If `false`, the SearchProvider will only be queried if explicitly listed in `searchprovider_list`.
Tags	List of strings grouping providers by topic. Tags can be used in `searchprovider_list`, as a `providers=` URL parameter, or as `tag:term` in a query.

Best Practices for SearchProvider Organization:

- General-purpose providers should have "default": true so they are included in broad searches.
- Topic-specific providers should have "default": false and use "tags": ["topic1", "topic2"].
- Users can target specific providers using a mix of tags, SearchProvider names, or IDs.

This ensures broad searches use the best general providers, while topic-specific searches can target precise data sources.
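The selection rules in the table above can be modeled in a few lines. This is a sketch under stated assumptions, not SWIRL's implementation. The `Provider` class and `select_providers` function are hypothetical, tag and name matching is exact and case-sensitive, and the `providers=` URL parameter and `tag:term` query syntax are not modeled.

```python
from dataclasses import dataclass, field

@dataclass
class Provider:
    """Hypothetical stand-in for a SearchProvider record."""
    id: int
    name: str
    active: bool
    default: bool
    tags: list = field(default_factory=list)

def select_providers(providers, searchprovider_list=None):
    """Illustrative selection rules based on the active/default/tags table."""
    candidates = [p for p in providers if p.active]  # inactive providers never receive queries
    if not searchprovider_list:
        # No explicit list: only default providers take part.
        return [p for p in candidates if p.default]
    wanted = {str(item) for item in searchprovider_list}
    # An explicit list may mix ids, names, and tags.
    return [
        p for p in candidates
        if str(p.id) in wanted or p.name in wanted or not wanted.isdisjoint(p.tags)
    ]

providers = [
    Provider(1, "Google Web", active=True, default=True, tags=["News"]),
    Provider(2, "PubMed", active=True, default=False, tags=["Medical"]),
    Provider(3, "Old Solr", active=False, default=True),
]
print([p.name for p in select_providers(providers)])               # ['Google Web']
print([p.name for p in select_providers(providers, ["Medical"])])  # ['PubMed']
print([p.name for p in select_providers(providers, [3])])          # []
```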

Query Mappings

SearchProvider query_mappings are key-value pairs that define how queries are structured for a given SearchProvider.

These mappings configure field replacements and query transformations that SWIRL's processors (such as AdaptiveQueryProcessor) use to adapt the query format to each provider's requirements.

Available `query_mappings` Options

Mapping Format	Description	Example
key = value	Replaces `{key}` in the `query_template` with `value`.	`"query_template": "{url}?cx={cx}&key={key}&q={query_string}", "query_mappings": "cx=<your-Google-PSE-id>"`
DATE_SORT=url-snippet	Inserts the specified string into the URL when date sorting is enabled.	`"query_mappings": "DATE_SORT=sort=date"`
RELEVANCY_SORT=url-snippet	Inserts the specified string into the URL when relevancy sorting is enabled.	`"query_mappings": "RELEVANCY_SORT=sort=relevancy"`
PAGE=url-snippet	Enables pagination by inserting either `RESULT_INDEX` (absolute result number) or `RESULT_PAGE` (page number).	`"query_mappings": "PAGE=start=RESULT_INDEX"`
NOT=True	Indicates that the provider supports basic `NOT` operators.	`elon musk NOT twitter`
NOT_CHAR=-	Defines a character for `NOT` operators.	`elon musk -twitter`
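Several of these options carry an `=` inside their value (for example `DATE_SORT=sort=date`), so a query_mappings string has to be split on the first `=` only. The sketch below shows one way to do that. `parse_query_mappings` is a hypothetical helper, and it assumes that no value contains a comma.

```python
def parse_query_mappings(mappings: str) -> dict:
    """Split a query_mappings string into a dict (illustrative sketch).

    Entries are comma-separated. Only the first '=' separates key from value,
    so "DATE_SORT=sort=date" yields {"DATE_SORT": "sort=date"}.
    """
    parsed = {}
    for entry in mappings.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate stray or trailing commas
        key, _, value = entry.partition("=")
        parsed[key] = value
    return parsed

print(parse_query_mappings("cx=0c38029ddd002c006,DATE_SORT=sort=date,PAGE=start=RESULT_INDEX,NOT=True"))
```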

Query Field Mappings

Each key in query_mappings replaces the matching {key} placeholder in query_template with its mapped value.

Example Configuration

"url": "https://www.googleapis.com/customsearch/v1",
"query_template": "{url}?cx={cx}&key={key}&q={query_string}",
"query_processors": [
        "AdaptiveQueryProcessor"
    ],
"query_mappings": "cx=0c38029ddd002c006,DATE_SORT=sort=date,PAGE=start=RESULT_INDEX",

Example Query Output

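At query execution time, this configuration generates a URL like the following. The {key} placeholder is filled from the credentials field, and the DATE_SORT and PAGE snippets are added only when date sorting or pagination applies:

https://www.googleapis.com/customsearch/v1?cx=0c38029ddd002c006&key=<your-Google-API-key>&q=some_query_string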
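The substitution step can be sketched with Python's `str.format_map`. This is a minimal illustration under several assumptions, not SWIRL's code. `render_query` is hypothetical. Credentials in key=value form are assumed to supply placeholders such as `{key}`. The query string is URL-encoded with `quote_plus`. The PAGE snippet is appended only when a result offset is passed in.

```python
from urllib.parse import quote_plus

def render_query(template, url, mappings, credentials, query_string, result_index=None):
    """Fill a query_template's {placeholders} (illustrative sketch, not SWIRL's code)."""
    values = {"url": url, "query_string": quote_plus(query_string)}
    values.update(mappings)     # e.g. {"cx": "..."}; unused keys are simply ignored
    values.update(credentials)  # assumption: "key=..." credentials supply {key}
    rendered = template.format_map(values)
    if result_index is not None and "PAGE" in mappings:
        # Hypothetical pagination step: swap RESULT_INDEX for the requested offset.
        rendered += "&" + mappings["PAGE"].replace("RESULT_INDEX", str(result_index))
    return rendered

template = "{url}?cx={cx}&key={key}&q={query_string}"
mappings = {"cx": "0c38029ddd002c006", "PAGE": "start=RESULT_INDEX"}
credentials = {"key": "<your-Google-API-key>"}
print(render_query(template, "https://www.googleapis.com/customsearch/v1",
                   mappings, credentials, "some query"))
```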

Key Configuration Guidelines:

- The url field is specific to each SearchProvider and should contain static parameters that never change.
- query_mappings allow dynamic replacements using query-time values.
- The query_string is populated by SWIRL as described in the Developer Guide.

HTTP Request Headers

The optional http_request_headers field allows custom HTTP headers to be sent along with a query.

For example, the GitHub SearchProvider uses this to request enhanced search snippets, which are then mapped to SWIRL's body field:

"http_request_headers": {
    "Accept": "application/vnd.github.text-match+json"
},

"result_mappings": "title=name,body=text_matches[*].fragment, ..."

Use http_request_headers whenever a source requires or supports specific headers, such as an Accept header that selects a richer response format, as in the GitHub example above.
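As a rough picture of what this field does, the sketch below attaches the GitHub provider's header to an outgoing GET request using only the standard library. `build_request` is a hypothetical helper. The request is built but never sent, and the search endpoint and query are illustrative only.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Value copied from the GitHub SearchProvider's http_request_headers field.
http_request_headers = {"Accept": "application/vnd.github.text-match+json"}

def build_request(base_url, params, headers):
    """Build, but do not send, a GET request that carries the provider's extra headers."""
    return Request(f"{base_url}?{urlencode(params)}", headers=headers, method="GET")

req = build_request("https://api.github.com/search/code", {"q": "swirl"}, http_request_headers)
print(req.full_url)              # https://api.github.com/search/code?q=swirl
print(req.get_header("Accept"))  # application/vnd.github.text-match+json
```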

Result Processors

Each SearchProvider can define its own Result Processing pipeline. A typical configuration looks like this:

"result_processors": [
    "MappingResultProcessor",
    "CosineRelevancyResultProcessor"
],

Enabling Relevancy Ranking

If Relevancy Ranking is required:

- The CosineRelevancyResultProcessor must be the last item in the result_processors list.
- The CosineRelevancyPostResultProcessor must be included in the Search.post_result_processors method, located in swirl/models.py.
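The two rules above are easy to break when editing a configuration by hand. A small lint like the following can catch both mistakes. This is a hypothetical sketch: `check_relevancy_pipeline` is not a SWIRL function, and it assumes you have already extracted both processor lists as plain Python lists.

```python
def check_relevancy_pipeline(result_processors, post_result_processors):
    """Return problems with a relevancy-ranking setup, per the two rules above (sketch)."""
    problems = []
    if "CosineRelevancyResultProcessor" not in result_processors:
        return problems  # relevancy ranking not requested for this provider
    if result_processors[-1] != "CosineRelevancyResultProcessor":
        problems.append("CosineRelevancyResultProcessor must be the last result processor")
    if "CosineRelevancyPostResultProcessor" not in post_result_processors:
        problems.append("CosineRelevancyPostResultProcessor is missing from post_result_processors")
    return problems

print(check_relevancy_pipeline(
    ["MappingResultProcessor", "CosineRelevancyResultProcessor"],
    ["CosineRelevancyPostResultProcessor"]))  # []
```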

For more details, refer to the Relevancy Ranking Guide.

Additional ResultProcessors

SWIRL provides other ResultProcessors that may be useful in specific cases. See the Developer Guide for more details.

Authentication & Credentials

The credentials property stores authentication information required by a SearchProvider.

Supported Authentication Formats

Key-Value Format (Appended to URL)

Used when an API key is passed as a query parameter.

Example: Google PSE SearchProvider

"credentials": "key=your-google-api-key-here",
"query_template": "{url}?cx={cx}&key={key}&q={query_string}",

Bearer Token (Sent in HTTP Header)

Supported by the RequestsGet and RequestsPost connectors.

Example: Miro SearchProvider

"credentials": "bearer=your-miro-api-token",

X-Api-Key Format (Sent in HTTP Header)

"credentials": "X-Api-Key=<your-api-key>",

HTTP Basic/Digest/Proxy Authentication

Supported by RequestsGet, ElasticSearch, and OpenSearch connectors.

Example: Solr with Auth SearchProvider

"credentials": "HTTPBasicAuth('solr-username','solr-password')",

Other Authentication Methods

For advanced authentication techniques, consult the Developer Guide.
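To show where each credentials format ends up on the wire, here is a sketch that turns the example strings above into query parameters and headers. `apply_credentials` is hypothetical and is not SWIRL's parser. SWIRL passes HTTPBasicAuth to its HTTP client, whereas this stdlib version builds the equivalent Basic Authorization header by hand. Digest and Proxy authentication are not covered.

```python
import base64
import re

def apply_credentials(credentials):
    """Translate a credentials string into (query_params, headers). Illustrative sketch only."""
    params, headers = {}, {}
    basic = re.fullmatch(r"HTTPBasicAuth\('([^']*)','([^']*)'\)", credentials)
    if basic:
        # HTTP Basic: base64-encode "username:password" into the Authorization header.
        token = base64.b64encode(f"{basic.group(1)}:{basic.group(2)}".encode()).decode()
        headers["Authorization"] = f"Basic {token}"
    elif credentials.startswith("bearer="):
        headers["Authorization"] = "Bearer " + credentials[len("bearer="):]
    elif credentials.startswith("X-Api-Key="):
        headers["X-Api-Key"] = credentials[len("X-Api-Key="):]
    else:
        # key=value form: appended to the URL as a query parameter.
        name, _, value = credentials.partition("=")
        params[name] = value
    return params, headers

print(apply_credentials("key=your-google-api-key-here"))
print(apply_credentials("bearer=your-miro-api-token"))
print(apply_credentials("HTTPBasicAuth('solr-username','solr-password')"))
```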

Response Mappings

SearchProvider response_mappings determine how each source's response is normalized into JSON.
They are processed by the Connector's normalize_response method.

Example: Google PSE Response Mappings

"response_mappings": "FOUND=searchInformation.totalResults,RETRIEVED=queries.request[0].count,RESULTS=items",

Response Mapping Options

Mapping	Description	Required?	Example
FOUND	Total number of results available for the query (default: same as `RETRIEVED` if not specified)	No	`FOUND=searchInformation.totalResults`
RETRIEVED	Number of results returned for this query (default: length of `RESULTS` list)	No	`RETRIEVED=queries.request[0].count`
RESULTS	Path to the list of result items	Yes	`RESULTS=items`
RESULT	Path to the document (if result items are stored within a dictionary/wrapper)	No	`RESULT=document`

Proper response mappings ensure consistent search results across different sources.
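The sketch below applies the Google PSE response_mappings to a made-up response and shows the documented defaults for FOUND and RETRIEVED. `resolve` and `apply_response_mappings` are hypothetical helpers. They support only dotted paths with `[n]` indexes and skip the RESULT mapping. SWIRL's connectors use fuller JSONPath handling.

```python
import re

def resolve(doc, path):
    """Follow a dotted path such as 'queries.request[0].count' (supports [n] indexes only)."""
    for part in path.split("."):
        match = re.fullmatch(r"(\w+)(?:\[(\d+)\])?", part)
        if match is None:
            raise ValueError(f"unsupported path segment: {part}")
        doc = doc[match.group(1)]
        if match.group(2) is not None:
            doc = doc[int(match.group(2))]
    return doc

def apply_response_mappings(response, mappings):
    """Illustrative sketch of FOUND/RETRIEVED/RESULTS handling, including the documented defaults."""
    out = {}
    for entry in mappings.split(","):
        name, _, path = entry.strip().partition("=")
        out[name] = resolve(response, path)
    out.setdefault("RETRIEVED", len(out["RESULTS"]))  # default: length of RESULTS
    out.setdefault("FOUND", out["RETRIEVED"])         # default: same as RETRIEVED
    return out

# Made-up response shaped like the Google PSE example above.
response = {
    "searchInformation": {"totalResults": 1240},
    "queries": {"request": [{"count": 2}]},
    "items": [{"title": "First hit"}, {"title": "Second hit"}],
}
full = apply_response_mappings(
    response, "FOUND=searchInformation.totalResults,RETRIEVED=queries.request[0].count,RESULTS=items")
print(full["FOUND"], full["RETRIEVED"])        # 1240 2
minimal = apply_response_mappings(response, "RESULTS=items")
print(minimal["FOUND"], minimal["RETRIEVED"])  # 2 2
```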

Result Mappings

SearchProvider result_mappings define how JSON result sets from external sources are mapped to SWIRL's standard result schema. Each mapping follows JSONPath conventions.

Configuration Options

Use the following configuration options to override the default behavior of an individual SearchProvider.

They must be placed in the "config" block.

Retrieval Augmented Generation (RAG)

The following configuration items change the RAG defaults for a single SearchProvider:

"swirl": { 
    "rag": {
        "swirl_rag_max_to_consider": <integer-max-to-consider>,
        "swirl_rag_fetch_timeout": <integer-rag-fetch-timeout>,
        "swirl_rag_score_inclusion_threshold": <float-rag-score-inclusion-threshold>,
        "swirl_rag_distribution_strategy": <rag-distribution-strategy>,
        "swirl_rag_inclusion_field": "<swirl_confidence_score|swirl_score>"
     }
}

The following are valid RAG distribution strategies that can be selected by swirl_rag_distribution_strategy:

- distributed
- roundrobin
- sorted
- roundrobinthreshold

For example:

"swirl": {
    "rag": {
        "swirl_rag_inclusion_field": "swirl_score",
        "swirl_rag_distribution_strategy": "sorted",
        "swirl_rag_score_inclusion_threshold": 2500,
        "swirl_rag_max_to_consider": 4,
        "swirl_rag_fetch_timeout": 1
    }
},
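Because a typo in a strategy name or inclusion field is easy to miss, a check like the one below can validate a provider's rag block against the values listed above. `validate_rag_config` is a hypothetical helper. It assumes the block sits under `"swirl"` inside the provider's config, and it checks only the value sets and integer fields documented here.

```python
VALID_STRATEGIES = {"distributed", "roundrobin", "sorted", "roundrobinthreshold"}
VALID_INCLUSION_FIELDS = {"swirl_confidence_score", "swirl_score"}

def validate_rag_config(provider_config):
    """Check a SearchProvider config's rag block against the documented values (sketch)."""
    rag = provider_config.get("swirl", {}).get("rag", {})
    errors = []
    strategy = rag.get("swirl_rag_distribution_strategy")
    if strategy is not None and strategy not in VALID_STRATEGIES:
        errors.append(f"unknown distribution strategy: {strategy}")
    inclusion_field = rag.get("swirl_rag_inclusion_field")
    if inclusion_field is not None and inclusion_field not in VALID_INCLUSION_FIELDS:
        errors.append(f"unknown inclusion field: {inclusion_field}")
    for key in ("swirl_rag_max_to_consider", "swirl_rag_fetch_timeout"):
        if key in rag and not isinstance(rag[key], int):
            errors.append(f"{key} must be an integer")
    return errors

example = {"swirl": {"rag": {
    "swirl_rag_inclusion_field": "swirl_score",
    "swirl_rag_distribution_strategy": "sorted",
    "swirl_rag_score_inclusion_threshold": 2500,
    "swirl_rag_max_to_consider": 4,
    "swirl_rag_fetch_timeout": 1,
}}}
print(validate_rag_config(example))  # []
```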

Page Fetching

The following configuration items allow modification of the page fetching defaults for a single SearchProvider:

"config": {
        "swirl": {
            "fetch_url_body": {
               "body_pagefetch_min_tokens": <min-tokens>,
               "body_pagefetch_token_length":  <token-length>,
               "body_pagefetch_fallback_token_length": <fallback-token-length>,
               "body_pagefetch_generation_method":"<generation-method>",
               "body_pagefetch_text_extract_timeout": <text-extraction-timeout>
             }
        }
    }

The following are valid generation methods that may be selected using body_pagefetch_generation_method:

TERM_COUNT
TERM_VECTOR

For example:

"config": {
        "swirl": {
            "fetch_url_body": {
               "body_pagefetch_min_tokens": 5,
               "body_pagefetch_token_length":64,
               "body_pagefetch_fallback_token_length":128,
               "body_pagefetch_generation_method":"TERM_COUNT",
               "body_pagefetch_text_extract_timeout":30
             }
        }
    }

Google Calendar

The following configuration items allow modification of the Google Calendar defaults:

"config": {
        "swirl": {
            "google_calendar": {
               "calendar_lookback_days": <lookback-days>,
               "calendar_lookahead_days": <lookahead-days>
            }
        }
    }

In both cases, specify the number of days. For example:

"config": {
        "swirl": {
            "google_calendar": {
               "calendar_lookback_days": 30,
               "calendar_lookahead_days": 30
            }
        }
    }
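Assuming the two settings define a search window that runs from lookback days before now to lookahead days after now (an interpretation suggested by the names, not confirmed here), the arithmetic looks like this. `calendar_window` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

def calendar_window(lookback_days, lookahead_days, now=None):
    """Assumed interpretation: events between now - lookback and now + lookahead."""
    now = now or datetime.now(timezone.utc)
    return now - timedelta(days=lookback_days), now + timedelta(days=lookahead_days)

start, end = calendar_window(30, 30, now=datetime(2024, 6, 15, tzinfo=timezone.utc))
print(start.isoformat(), end.isoformat())  # 2024-05-16T00:00:00+00:00 2024-07-15T00:00:00+00:00
```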

Default SWIRL Fields

Field Name	Description
author	Author of the item (not always reliable for web content).
body	Main content extracted from the result.
date_published	Original publication date (not always reliable for web content).
date_retrieved	Date and time SWIRL retrieved the result.
title	Title of the item.
url	URL of the result item.

Example: Google PSE Result Mapping

"result_mappings": "url=link,body=snippet,author=displayLink,cacheId,pagemap.metatags[*].['og:type'],pagemap.metatags[*].['og:site_name'],pagemap.metatags[*].['og:description'],NO_PAYLOAD"

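Here, url=link, body=snippet, and author=displayLink map Google PSE result fields to SWIRL result fields. The bare entries (cacheId and the pagemap.metatags paths) are copied into PAYLOAD, and NO_PAYLOAD prevents any other source fields from being copied.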
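One way to read a result_mappings string is to sort its comma-separated entries into three groups: field mappings (`swirl_key=source`), bare source paths destined for PAYLOAD, and upper-case flags such as NO_PAYLOAD. The sketch below does that for the Google PSE example. `parse_result_mappings` is hypothetical. It assumes that no entry contains a comma, and it treats `BLOCK=...` like any other `key=value` entry.

```python
def parse_result_mappings(mappings):
    """Split result_mappings into field maps, PAYLOAD paths, and flags (illustrative sketch)."""
    fields, payload_paths, flags = {}, [], set()
    for entry in (part.strip() for part in mappings.split(",")):
        if not entry:
            continue
        if "=" in entry:
            swirl_key, _, source = entry.partition("=")
            fields[swirl_key] = source.split("|")  # alternatives, in priority order
        elif entry.isupper():
            flags.add(entry)                       # e.g. NO_PAYLOAD, FILE_SYSTEM, LC_URL
        else:
            payload_paths.append(entry)            # bare source paths are copied to PAYLOAD
    return fields, payload_paths, flags

gpse = ("url=link,body=snippet,author=displayLink,cacheId,"
        "pagemap.metatags[*].['og:type'],pagemap.metatags[*].['og:site_name'],"
        "pagemap.metatags[*].['og:description'],NO_PAYLOAD")
fields, payload_paths, flags = parse_result_mappings(gpse)
print(fields)
print(payload_paths)
print(flags)
```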

XML to JSON Conversion

The requests.py connector automatically converts XML to JSON for mapping.

It also handles list-of-list responses, where the first list element contains field names.

Example:

[
    ["urlkey", "timestamp", "original", "mimetype", "statuscode"],
    ["today,swirl)/", "20221012214440", "http://swirl.today/", "text/html"]
]

This format is automatically converted into a JSON array of objects, one per data row, keyed by the field names in the first row.
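The list-of-lists conversion amounts to pairing each data row with the header row. The sketch below is one way to do it; it is not the connector's actual code. `rows_to_records` is a hypothetical helper. It pads missing trailing values with None, which matters for the example above because its data row has four values for five field names. Any extra values beyond the header are ignored.

```python
def rows_to_records(rows):
    """Turn a list-of-lists response into a list of dicts keyed by the header row (sketch).

    Missing trailing values become None; extra values beyond the header are ignored.
    """
    header, *data = rows
    return [
        {name: row[i] if i < len(row) else None for i, name in enumerate(header)}
        for row in data
    ]

rows = [
    ["urlkey", "timestamp", "original", "mimetype", "statuscode"],
    ["today,swirl)/", "20221012214440", "http://swirl.today/", "text/html"],
]
print(rows_to_records(rows))
```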

Constructing URLs from Mappings

If a SearchProvider does not return full URLs, a template mapping that references fields in each result can construct them dynamically.

Example: Europe PubMed Central

"url='https://europepmc.org/article/{source}/{id}'"

Here, {source} and {id} are values from the JSON result, inserted into the URL dynamically.
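The template substitution can be pictured as replacing each `{field}` with that field's value from a single result. `fill_template` below is a hypothetical helper, and the result values are made up. Treating missing fields as empty strings is an assumption of this sketch, not documented SWIRL behavior.

```python
import re

def fill_template(template, result):
    """Replace each {field} with that field's value from one result (illustrative sketch)."""
    return re.sub(r"\{(\w+)\}", lambda m: str(result.get(m.group(1), "")), template)

result = {"source": "MED", "id": "12345678"}  # made-up Europe PMC-style result
print(fill_template("https://europepmc.org/article/{source}/{id}", result))
# https://europepmc.org/article/MED/12345678
```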

Aggregating Field Values

To aggregate list values into a single string, use JSONPath syntax.

Example: Google PSE Metadata Aggregation

"pagemap.metatags[*].['og:type']"

This merges all og:type values from the metadata into a single result field.

Example: ArXiv Author Aggregation

"author[*].name"

This collects all author names into a single field.
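Both aggregation examples fan out over a list with `[*]` and then pick a key from each element. The sketch below follows those two paths over made-up data and joins the values. `collect` is a hypothetical helper that handles only dotted keys, `[*]` suffixes, and bracket-quoted keys such as `['og:type']`. The `"; "` separator is an assumption; SWIRL's own separator may differ.

```python
def collect(doc, path):
    """Gather every value at a dotted path; a '[*]' suffix fans out over a list (sketch)."""
    values = [doc]
    for part in path.split("."):
        fan_out = part.endswith("[*]")
        key = part[:-3] if fan_out else part
        if key.startswith("['") and key.endswith("']"):
            key = key[2:-2]  # bracket-quoted key such as ['og:type']
        next_values = []
        for value in values:
            if not isinstance(value, dict) or key not in value:
                continue  # skip items that lack the key
            item = value[key]
            if fan_out and isinstance(item, list):
                next_values.extend(item)
            else:
                next_values.append(item)
        values = next_values
    return values

arxiv = {"author": [{"name": "A. Author"}, {"name": "B. Author"}]}
print("; ".join(collect(arxiv, "author[*].name")))  # A. Author; B. Author

pse = {"pagemap": {"metatags": [{"og:type": "website"}, {"og:type": "article"}]}}
print(collect(pse, "pagemap.metatags[*].['og:type']"))  # ['website', 'article']
```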

Multiple Mappings

SWIRL allows multiple source fields to map to a single SWIRL field.

"result_mappings": "body=content|description,..."

The first populated source field in the list is mapped to body.
If more than one listed field contains data, each additional one is copied to PAYLOAD under the key <swirl-field>_<source-field>, such as body_description in the example below.

Example Result Object:

{
    "swirl_rank": 1,
    "title": "What The Mid-Term Elections Mean For U.S. Energy",
    "url": "https://www.forbes.com/sites/davidblackmon/2022/11/13/what-the-mid-term-elections-mean-for-us-energy/",
    "body": "Leaders in U.S. domestic energy sectors should expect President Joe Biden to feel emboldened...",
    "payload": {
        "body_description": "Leaders in U.S. domestic energy sectors should expect President Joe Biden to feel emboldened..."
    }
}
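The first-populated rule can be sketched as follows, reproducing the `body_description` key from the example result above. `map_alternatives` is a hypothetical helper. Treating empty strings as "not populated" is an assumption of this sketch.

```python
def map_alternatives(result, swirl_field, source_fields):
    """First populated source field fills swirl_field; the rest go to payload (sketch)."""
    mapped, payload = {}, {}
    for name in source_fields:
        value = result.get(name)
        if not value:
            continue  # empty or missing: try the next alternative
        if swirl_field not in mapped:
            mapped[swirl_field] = value
        else:
            payload[f"{swirl_field}_{name}"] = value
    return mapped, payload

raw = {"content": "Leaders in U.S. domestic energy...", "description": "Leaders in U.S. domestic energy..."}
mapped, payload = map_alternatives(raw, "body", ["content", "description"])
print(mapped)   # {'body': 'Leaders in U.S. domestic energy...'}
print(payload)  # {'body_description': 'Leaders in U.S. domestic energy...'}
```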

Result Mapping Options

Mapping Format	Description	Example
swirl_key = source_key	Maps a field from the source provider to SWIRL.	`"body=_source.email"`
swirl_key = source_key1|source_key2	Maps multiple fields; the first populated field is mapped, others go to PAYLOAD.	`"body=content|description"`
swirl_key='template {variable}'	Formats multiple values into a single string.	`"title='{x}: {y}'"`
source_key	Maps a field from the raw source result into PAYLOAD.	`"cacheId, _source.products"`
sw_urlencode	URL-encodes the specified value.	`"url=sw_urlencode(<hitId>)"`
sw_btcconvert	Converts Satoshi to Bitcoin.	`"sw_btcconvert(<fee>)"`
NO_PAYLOAD	Disables automatic copying of all source fields to PAYLOAD.	`"NO_PAYLOAD"`
FILE_SYSTEM	Treats the SearchProvider as a file system, increasing `body` weight in ranking.	`"FILE_SYSTEM"`
LC_URL	Converts `url` to lowercase.	`"LC_URL"`
BLOCK	Used in SWIRL's RAG processing; stores output in the info block of the result object.	`"BLOCK=ai_summary"`
DATASET	Formats columnar responses into a single result.	`"DATASET"`

Controlling `date_published` Display

As of SWIRL 2.1, different values can be mapped to date_published and date_published_display.

"result_mappings": "... date_published=foo.bar.date1,date_published_display=foo.bar.date2 ..."

This results in:

"date_published": "2010-01-01 00:00:00",
"date_published_display": "c2010"

Result Schema

The JSON result schema is defined in:

Result Mixers further process and merge data from multiple sources.

PAYLOAD Field

The PAYLOAD field stores all unmapped result data from the source.

Using `NO_PAYLOAD` Effectively

To exclude unnecessary fields from PAYLOAD:

1. Run a test query without NO_PAYLOAD to inspect raw fields.
2. Add specific mappings for the fields you need.
3. Enable "NO_PAYLOAD" to discard unmapped data.

SWIRL copies all source data to PAYLOAD by default unless NO_PAYLOAD is specified.
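The effect of NO_PAYLOAD can be seen on a flat, made-up result. Without the flag, every unmapped top-level field lands in PAYLOAD. With it, only explicitly listed source keys are kept. `build_payload` is a hypothetical helper, and it handles top-level keys only.

```python
def build_payload(raw_result, mapped_source_keys, payload_paths=(), no_payload=False):
    """Sketch of the PAYLOAD rules above, for top-level keys only."""
    if no_payload:
        # Only explicitly listed source keys are kept.
        return {k: raw_result[k] for k in payload_paths if k in raw_result}
    # Default: every source field that was not mapped to a SWIRL field.
    return {k: v for k, v in raw_result.items() if k not in mapped_source_keys}

raw = {"link": "https://swirl.today", "snippet": "...", "cacheId": "abc", "kind": "customsearch#result"}
mapped_keys = {"link", "snippet"}
print(build_payload(raw, mapped_keys))                                     # all unmapped fields
print(build_payload(raw, mapped_keys, ["cacheId"], no_payload=True))  # only cacheId
```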
