
SearchProviders Guide
Community Edition | Enterprise Edition


SWIRL queries may be subject to rate limits or throttling imposed by the sources being queried.

SearchProviders are the core of SWIRL, enabling easy connections to various data sources without writing any code.

Each SearchProvider is a JSON object. SWIRL includes preconfigured providers for sources like Elastic, Solr, PostgreSQL, BigQuery, NLResearch.com, Miro.com, Atlassian, and more.

SWIRL comes with active SearchProviders for arXiv.org, European PMC, and Google News that work "out of the box" if internet access is available.

Additionally, inactive SearchProviders for Google Web Search and SWIRL Documentation use Google Programmable Search Engine (PSE). These require a Google API key; see Activating a Google Programmable Search Engine (PSE) SearchProvider below for setup details.

SearchProvider Example JSON

Preloaded SearchProviders

SearchProvider | Description | Notes
arxiv.json | Searches the arXiv.org repository of scientific papers | No authentication required
asana.json | Searches tasks in Asana | Requires Asana personal access token
atlassian.json | Searches Atlassian Confluence Cloud, Jira Cloud, and Trello | Requires a bearer token and/or Trello API key
blockchain-bitcoin.json | Searches Blockchain.com for Bitcoin addresses and transactions | Requires Blockchain.com API key
chatgpt.json | OpenAI ChatGPT AI chatbot | Requires OpenAI API key
company_snowflake.json | Queries the Snowflake FreeCompanyResearch dataset | Requires Snowflake username and password
crunchbase.json | Searches organizations via Crunchbase API | Requires Crunchbase API key
document_db.json | SQLite3 document database | Sample Data
elastic_cloud.json | ElasticSearch (cloud version) | Enron Email Dataset
elasticsearch.json | ElasticSearch (local install) | Enron Email Dataset
europe_pmc.json | Searches EuropePMC.org for life sciences literature | No authentication required
funding_db_bigquery.json | BigQuery funding database | Funding Dataset
funding_db_postgres.json | PostgreSQL funding database | Funding Dataset
funding_db_sqlite3.json | SQLite3 funding database | Funding Dataset
github.json | Searches public repositories for Code, Commits, Issues, and PRs | Requires GitHub bearer token
google_news.json | Queries Google News | No authentication required
google_pse.json | Web search via Google Programmable Search Engine (PSE) | Requires Google API key
hacker_news.json | Queries Hacker News | No authentication required
http_get_with_auth.json | Generic HTTP GET with authentication | Requires URL and credentials
http_post_with_auth.json | Generic HTTP POST with authentication | Requires URL and credentials
hubspot.json | Searches the HubSpot CRM for Companies, Contacts, and Deals | Requires API token with these scopes
internet_archive.json | Queries the Internet Archive | No authentication required
littlesis.json | Queries LittleSis.org database of influential business and government figures | No authentication required
microsoft.json | Queries Microsoft 365 (Outlook, OneDrive, SharePoint, Teams) | See the M365 Guide
miro.json | Searches Miro.com boards | Requires bearer token
movies_mongodb.json | Queries MongoDB Atlas sample_mflix.movies dataset | Requires MongoDB credentials
newsdata_io.json | Searches Newsdata.io | Requires API key
nlresearch.json | Searches NLResearch.com premium content | Requires credentials
open_sanctions.json | Queries OpenSanctions.org | Requires API key
opensearch.json | OpenSearch 2.x | Developer Guide
oracle.json | Queries Oracle 23c Free (and earlier versions) | Requires Oracle credentials
preloaded.json | All preloaded SearchProviders | Default in SWIRL
servicenow.json | Searches ServiceNow Knowledge and Service Catalog | Requires username and password
solr.json | Queries Apache Solr (local install) | Requires host, port, collection
solr_with_auth.json | Secured Solr instance | Requires credentials
youtrack.json | Searches JetBrains YouTrack | Requires bearer token

This table provides a high-level overview of the available SearchProviders. Detailed configurations can be found in the SearchProviders repository.

Activating a SearchProvider

To activate a preloaded SearchProvider, edit it and change:

    "active": false

to

    "active": true

Click the PUT button to save the change. You can also use the HTML Form at the bottom of the page for convenience.

SearchProvider HTML form
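If you keep SearchProvider configurations in JSON files, the same flag can be set before loading them. The following is a minimal Python sketch, assuming a file that holds either one SearchProvider object or a list of them; activate_providers and the sample records are hypothetical, not part of SWIRL.

```python
import json


def activate_providers(providers, names=None):
    """Set "active": true on SearchProvider records (hypothetical helper).

    providers: one SearchProvider dict or a list of them.
    names: optional set of provider names to activate; None activates all.
    """
    records = providers if isinstance(providers, list) else [providers]
    for record in records:
        if names is None or record.get("name") in names:
            record["active"] = True
    return providers


# Sample records for illustration only
raw = '[{"name": "Google PSE", "active": false}, {"name": "arXiv", "active": false}]'
updated = activate_providers(json.loads(raw), names={"Google PSE"})
print(json.dumps(updated, indent=4))
```

The edited JSON can then be pasted into the form or loaded in bulk as described below.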

Activating a Google Programmable Search Engine (PSE) SearchProvider

SWIRL includes an inactive Google PSE configuration that allows searching the web or a defined "slice" of it.
Google PSE requires a valid Google API key, and queries beyond Google's free daily quota are billed.

Create a Google Programmable Search Engine (PSE)

  1. Go to Google Programmable Search Engine
  2. Click Get Started and log in with your Google account
  3. Follow the steps to create a PSE and note the cx parameter (your Google PSE ID)

Create a Google API Key

  1. Visit the Google API Custom Search overview
  2. Follow the instructions to generate an API key

Activate the Google PSE SearchProvider

  1. Edit the Google PSE provider
  2. Change:

        "active": false

    to:

        "active": true

    Or use the HTML form at the bottom of the page.

  3. Update the query_mappings field with your Google PSE ID (cx parameter):
        "query_mappings": "cx=<your-Google-PSE-id>"
    
  4. Update the credentials field with your Google API key, using the key= prefix:
        "credentials": "key=<your-Google-API-key>"
    
  5. Click the PUT button to save the changes.
  6. Reload SWIRL Galaxy. Your new source will appear in the source selector.
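Steps 2 through 4 can also be applied to a copy of the provider JSON before loading it. A minimal sketch, assuming the provider has been parsed into a Python dict; configure_google_pse is a hypothetical helper. It replaces only the cx entry so that other query_mappings, such as the DATE_SORT and PAGE entries shown under Query Field Mappings, are kept.

```python
def configure_google_pse(provider, pse_id, api_key):
    """Apply the activation steps to a Google PSE SearchProvider dict (sketch)."""
    entries = [e.strip() for e in provider.get("query_mappings", "").split(",") if e.strip()]
    # Remove the old cx= entry and put the new one first; keep everything else
    others = [e for e in entries if not e.startswith("cx=")]
    provider["query_mappings"] = ",".join([f"cx={pse_id}"] + others)
    # The API key uses the key= prefix (step 4)
    provider["credentials"] = f"key={api_key}"
    provider["active"] = True
    return provider


pse = {
    "name": "Google PSE",
    "active": False,
    "query_mappings": "cx=OLD_ID,DATE_SORT=sort=date,PAGE=start=RESULT_INDEX",
    "credentials": "",
}
configure_google_pse(pse, "<your-Google-PSE-id>", "<your-Google-API-key>")
print(pse["query_mappings"])
```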

Copy/Paste Install

If you have a SearchProvider JSON file, you can copy and paste it into the form at the bottom of the SearchProvider endpoint.

SWIRL API

Steps:

  1. Go to http://localhost:8000/swirl/searchproviders/
  2. Click the Raw data tab at the bottom of the page.
  3. Paste the SearchProvider JSON (either a single record or a list of records).
  4. Click the POST button.
  5. SWIRL will confirm the new SearchProvider(s).
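The same POST can be scripted. A minimal standard-library sketch; it assumes the endpoint accepts a JSON body and HTTP Basic authentication for your SWIRL admin user, which you should confirm for your deployment. build_post_request is a hypothetical helper, and the request is only built here, not sent.

```python
import base64
import json
import urllib.request


def build_post_request(providers, user, password,
                       endpoint="http://localhost:8000/swirl/searchproviders/"):
    """Build (but do not send) a POST carrying one SearchProvider or a list of them."""
    body = json.dumps(providers).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        endpoint,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: Basic authentication is enabled for the SWIRL API
            "Authorization": f"Basic {token}",
        },
    )


req = build_post_request({"name": "My Provider", "active": True},
                         "admin", "your-admin-password")
# To actually send it: urllib.request.urlopen(req)
```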

Bulk Loading

Use the swirl_load.py script to bulk-load SearchProviders.

Steps:

  1. Open a terminal and navigate to your SWIRL home directory:
    cd <swirl-home>
    
  2. Run the following command:
    python swirl_load.py SearchProviders/provider-name.json -u admin -p your-admin-password
    
  3. The script will load all configurations from the specified file.
  4. Visit http://localhost:8000/swirl/searchproviders/ to verify.

Example:

SWIRL SearchProviders List - Google PSE Example 1
SWIRL SearchProviders List - Google PSE Example 2
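Before running swirl_load.py, it can help to confirm that the file parses and that each record looks like a SearchProvider. A hypothetical pre-flight check; the REQUIRED_FIELDS set is an assumption for illustration, not SWIRL's own validation rule.

```python
import json

# Assumption: adjust this set to the fields your SWIRL version requires
REQUIRED_FIELDS = {"name", "connector"}


def check_provider_file(text):
    """Return a list of problems found in SearchProvider JSON text (sketch)."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    records = data if isinstance(data, list) else [data]
    problems = []
    for i, record in enumerate(records):
        if not isinstance(record, dict):
            problems.append(f"record {i}: not a JSON object")
            continue
        missing = REQUIRED_FIELDS - set(record)
        if missing:
            problems.append(f"record {i}: missing {sorted(missing)}")
    return problems


# Usage:
# with open("SearchProviders/provider-name.json") as f:
#     print(check_provider_file(f.read()))
```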

Editing a SearchProvider

To edit a SearchProvider, append its id to the end of the /swirl/searchproviders URL.

For example:
http://localhost:8000/swirl/searchproviders/1/

SWIRL SearchProvider Instance - Google PSE

Available Actions:

  • DELETE the SearchProvider permanently.
  • Modify the configuration and click PUT to save changes.
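An edit can also be scripted against the same per-id URL. A sketch assuming the endpoint accepts a JSON body; authentication is omitted and must be added for your deployment, and build_put_request is hypothetical. Because PUT conventionally replaces the whole record, send the complete SearchProvider JSON rather than only the changed field.

```python
import json
import urllib.request


def build_put_request(provider_id, provider,
                      base="http://localhost:8000/swirl/searchproviders/"):
    """Build (but do not send) a PUT that replaces SearchProvider <provider_id>."""
    return urllib.request.Request(
        f"{base}{provider_id}/",  # e.g. .../swirl/searchproviders/1/
        data=json.dumps(provider).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


req = build_put_request(1, {"name": "Google PSE", "active": True})
```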

Query Templating

Most SearchProviders require a query_template, which binds to query_mappings during the federation process.

For example, the original query_template for the MongoDB movie SearchProvider:

    "query_template": "{'$text': {'$search': '{query_string}'}}"

This format is a string, not valid JSON. The single quotes are required because the JSON itself uses double quotes.

Starting in SWIRL 3.2.0, MongoDB SearchProviders use the query_template_json field, which stores the template as valid JSON:

"query_template_json": {
    "$text": {
        "$search": "{query_string}"
    }
}
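One practical benefit of the JSON form is that the query can be substituted into an already-parsed structure, so quotes inside the user's query cannot break the template. A minimal sketch of that idea; render_template_json is hypothetical and is not SWIRL's MongoDB connector code.

```python
import json


def render_template_json(template, query_string):
    """Return a new structure with "{query_string}" filled in every string value."""
    def fill(node):
        if isinstance(node, dict):
            return {key: fill(value) for key, value in node.items()}
        if isinstance(node, list):
            return [fill(item) for item in node]
        if isinstance(node, str):
            return node.replace("{query_string}", query_string)
        return node
    return fill(template)


template = json.loads('{"$text": {"$search": "{query_string}"}}')
# The apostrophe would need escaping in the single-quoted string form
query = render_template_json(template, "it's a wonderful life")
print(json.dumps(query))
```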

Organizing SearchProviders with Active, Default, and Tags

SearchProviders have three properties that control their participation in queries:

Property | Description
active | true/false. If false, the SearchProvider will not receive queries, even if specified in a searchprovider_list.
default | true/false. If false, the SearchProvider will only be queried if explicitly listed in searchprovider_list.
tags | List of strings grouping providers by topic. Tags can be used in searchprovider_list, as a providers= URL parameter, or as tag:term in a query.

Best Practices for SearchProvider Organization:

  • General-purpose providers should have "default": true so they are included in broad searches.
  • Topic-specific providers should have "default": false and use "tags": ["topic1", "topic2"].
  • Users can target specific providers using a mix of Tags, SearchProvider names, or IDs.

This ensures broad searches use the best general providers, while topic-specific searches can target precise data sources.
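The interaction of these three properties can be modeled in a few lines. This is a simplified sketch of the rules in the table above, not SWIRL's implementation; select_providers and the sample records are hypothetical, and comparing IDs as strings is an assumption.

```python
def select_providers(providers, searchprovider_list=None):
    """Pick SearchProviders for a query (simplified model of active/default/tags).

    - Inactive providers are never queried.
    - With no searchprovider_list, only default providers are queried.
    - Otherwise a provider matches if its id, name, or any tag is listed.
    """
    active = [p for p in providers if p.get("active")]
    if not searchprovider_list:
        return [p for p in active if p.get("default")]
    wanted = {str(item) for item in searchprovider_list}
    return [
        p for p in active
        if str(p.get("id")) in wanted
        or p.get("name") in wanted
        or wanted.intersection(p.get("tags", []))
    ]


providers = [  # hypothetical records
    {"id": 1, "name": "Google PSE", "active": True, "default": True, "tags": ["web"]},
    {"id": 2, "name": "PubMed", "active": True, "default": False, "tags": ["medical"]},
    {"id": 3, "name": "Old Source", "active": False, "default": True, "tags": ["web"]},
]
print([p["name"] for p in select_providers(providers, ["medical"])])
```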

Query Mappings

SearchProvider query_mappings are key-value pairs that define how queries are structured for a given SearchProvider.

These mappings configure field replacements and query transformations that SWIRL's processors (such as AdaptiveQueryProcessor) use to adapt the query format to each provider's requirements.

Available query_mappings Options

Mapping Format | Description | Example
key = value | Replaces {key} in the query_template with value. | "query_template": "{url}?cx={cx}&key={key}&q={query_string}", "query_mappings": "cx=<your-Google-PSE-id>"
DATE_SORT=url-snippet | Inserts the specified string into the URL when date sorting is enabled. | "query_mappings": "DATE_SORT=sort=date"
RELEVANCY_SORT=url-snippet | Inserts the specified string into the URL when relevancy sorting is enabled. | "query_mappings": "RELEVANCY_SORT=sort=relevancy"
PAGE=url-snippet | Enables pagination by inserting either RESULT_INDEX (absolute result number) or RESULT_PAGE (page number). | "query_mappings": "PAGE=start=RESULT_INDEX"
NOT=True | Indicates that the provider supports basic NOT operators. | elon musk NOT twitter
NOT_CHAR=- | Defines a character for NOT operators. | elon musk -twitter
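Because every mapping shares one comma-separated string, and some values (such as sort=date) themselves contain =, each entry has to be split on its first = only. A hypothetical parser illustrating this, assuming no value contains a comma:

```python
def parse_query_mappings(mappings):
    """Split a query_mappings string into a {key: value} dict (sketch)."""
    result = {}
    for item in mappings.split(","):
        item = item.strip()
        if not item:
            continue
        # partition splits on the first '=' only: DATE_SORT=sort=date -> "sort=date"
        key, sep, value = item.partition("=")
        result[key] = value if sep else ""
    return result


print(parse_query_mappings(
    "cx=0c38029ddd002c006,DATE_SORT=sort=date,PAGE=start=RESULT_INDEX,NOT_CHAR=-"
))
```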

Query Field Mappings

Each key defined in query_mappings replaces the matching {key} placeholder in the query_template.

Example Configuration

"url": "https://www.googleapis.com/customsearch/v1",
"query_template": "{url}?cx={cx}&key={key}&q={query_string}",
"query_processors": [
        "AdaptiveQueryProcessor"
    ],
"query_mappings": "cx=0c38029ddd002c006,DATE_SORT=sort=date,PAGE=start=RESULT_INDEX",

Example Query Output

At query execution time, this configuration generates the following URL ({key} is filled from the credentials field):

https://www.googleapis.com/customsearch/v1?cx=0c38029ddd002c006&key=<your-Google-API-key>&q=some_query_string

Key Configuration Guidelines:

  • The url field is specific to each SearchProvider and should contain static parameters that never change.
  • query_mappings allow dynamic replacements using query-time values.
  • The query_string is populated by SWIRL as described in the Developer Guide.
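The substitution described above can be sketched as follows. render_query_template is hypothetical; it assumes credentials in the key=value form and URL-encodes the query string with quote_plus, which may differ from how SWIRL encodes queries.

```python
from urllib.parse import quote_plus


def render_query_template(template, url, query_mappings, credentials, query_string):
    """Fill {placeholders} in a query_template (simplified sketch).

    query_mappings: dict of parsed mappings, e.g. {"cx": "..."}; extra keys
    such as DATE_SORT are ignored by format_map.
    credentials: a "name=value" string such as "key=<your-Google-API-key>".
    """
    cred_name, _, cred_value = credentials.partition("=")
    values = {
        **query_mappings,
        cred_name: cred_value,
        "url": url,
        "query_string": quote_plus(query_string),
    }
    return template.format_map(values)


print(render_query_template(
    "{url}?cx={cx}&key={key}&q={query_string}",
    "https://www.googleapis.com/customsearch/v1",
    {"cx": "0c38029ddd002c006", "DATE_SORT": "sort=date"},
    "key=MY_API_KEY",
    "swirl search",
))
```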

HTTP Request Headers

The optional http_request_headers field allows custom HTTP headers to be sent along with a query.

For example, the GitHub SearchProvider uses this to request enhanced search snippets, which are then mapped to SWIRL's body field:

"http_request_headers": {
    "Accept": "application/vnd.github.text-match+json"
},

"result_mappings": "title=name,body=text_matches[*].fragment, ..."

Source-specific headers like this let a SearchProvider request richer result data, such as match fragments, than the source returns by default.
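To show what the setting does on the wire, here is a standard-library sketch that attaches http_request_headers to an outgoing GET request. build_search_request is hypothetical, and authentication (GitHub requires a bearer token) is omitted.

```python
import urllib.request
from urllib.parse import quote_plus


def build_search_request(url_template, query_string, http_request_headers):
    """Build (but do not send) a GET request carrying the configured headers."""
    url = url_template.format(query_string=quote_plus(query_string))
    return urllib.request.Request(url, headers=http_request_headers)


req = build_search_request(
    "https://api.github.com/search/code?q={query_string}",
    "swirl language:python",
    {"Accept": "application/vnd.github.text-match+json"},
)
print(req.full_url)
```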

Result Processors

Each SearchProvider can define its own Result Processing pipeline. A typical configuration looks like this:

"result_processors": [
    "MappingResultProcessor",
    "CosineRelevancyResultProcessor"
],

Enabling Relevancy Ranking

If Relevancy Ranking is required:

  1. The CosineRelevancyResultProcessor must be the last item in the result_processors list.
  2. The CosineRelevancyPostResultProcessor must be included in the Search.post_result_processors method, located in swirl/models.py.

For more details, refer to the Relevancy Ranking Guide.
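The first rule is easy to check mechanically when reviewing a SearchProvider. A hypothetical helper; it checks only the ordering of result_processors, not the post_result_processors setting:

```python
def relevancy_order_ok(result_processors):
    """True if CosineRelevancyResultProcessor is absent or is the last processor."""
    name = "CosineRelevancyResultProcessor"
    return name not in result_processors or result_processors[-1] == name


print(relevancy_order_ok(["MappingResultProcessor", "CosineRelevancyResultProcessor"]))
```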

Additional ResultProcessors

SWIRL provides other ResultProcessors that may be useful in specific cases. See the Developer Guide for more details.

Authentication & Credentials

The credentials property stores authentication information required by a SearchProvider.

Supported Authentication Formats

Key-Value Format (Appended to URL)

Used when an API key is passed as a query parameter.

Example: Google PSE SearchProvider

"credentials": "key=your-google-api-key-here",
"query_template": "{url}?cx={cx}&key={key}&q={query_string}",

Bearer Token (Sent in HTTP Header)

Supported by the RequestsGet and RequestsPost connectors.

Example: Miro SearchProvider

"credentials": "bearer=your-miro-api-token",

X-Api-Key Format (Sent in HTTP Header)

"credentials": "X-Api-Key=<your-api-key>",

HTTP Basic/Digest/Proxy Authentication

Supported by RequestsGet, ElasticSearch, and OpenSearch connectors.

Example: Solr with Auth SearchProvider

"credentials": "HTTPBasicAuth('solr-username','solr-password')",

Other Authentication Methods

For advanced authentication techniques, consult the Developer Guide.
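The formats above differ mainly in where the secret ends up. An illustrative sketch of that mapping; apply_credentials is hypothetical, covers only the key-value, bearer, X-Api-Key, and HTTPBasicAuth forms shown here (not Digest or Proxy), and does not reproduce how SWIRL's connectors parse credentials. For simplicity it appends key-value credentials as a query parameter rather than filling a {key} placeholder in the query_template.

```python
import base64
import re
from urllib.parse import urlencode


def apply_credentials(credentials, url, headers):
    """Return (url, headers) with a credentials string applied (illustrative only)."""
    headers = dict(headers)  # do not mutate the caller's dict
    basic = re.fullmatch(r"HTTPBasicAuth\('([^']*)',\s*'([^']*)'\)", credentials.strip())
    if basic:
        user, password = basic.groups()
        token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
        headers["Authorization"] = f"Basic {token}"
        return url, headers
    name, _, value = credentials.partition("=")
    if name.lower() == "bearer":
        headers["Authorization"] = f"Bearer {value}"
    elif name.lower() == "x-api-key":
        headers["X-Api-Key"] = value
    else:
        # Key-value format: add the pair to the URL's query string
        separator = "&" if "?" in url else "?"
        url = f"{url}{separator}{urlencode({name: value})}"
    return url, headers


print(apply_credentials("bearer=your-miro-api-token", "https://example.com", {}))
```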

Response Mappings

SearchProvider response_mappings determine how each source's response is normalized into JSON.
They are processed by the Connector's normalize_response method.

Example: Google PSE Response Mappings

"response_mappings": "FOUND=searchInformation.totalResults,RETRIEVED=queries.request[0].count,RESULTS=items",

Response Mapping Options

Mapping | Description | Required? | Example
FOUND | Total number of results available for the query (default: same as RETRIEVED if not specified) | No | FOUND=searchInformation.totalResults
RETRIEVED | Number of results returned for this query (default: length of RESULTS list) | No | RETRIEVED=queries.request[0].count
RESULTS | Path to the list of result items | Yes | RESULTS=items
RESULT | Path to the document (if result items are stored within a dictionary/wrapper) | No | RESULT=document

Proper response mappings ensure consistent search results across different sources.
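To make the defaults concrete, here is a simplified sketch of normalizing a Google PSE style response with these mappings. get_path supports only dotted keys and [n] indexes, not full JSONPath, and normalize_with_mappings is a hypothetical helper, not the Connector's normalize_response method. The sample response is made up.

```python
import re


def get_path(data, path):
    """Resolve a dotted path with [n] indexes, e.g. "queries.request[0].count"."""
    for part in path.split("."):
        match = re.fullmatch(r"([^\[\]]+)((?:\[\d+\])*)", part)
        if match is None:
            raise ValueError(f"unsupported path segment: {part!r}")
        data = data[match.group(1)]
        for index in re.findall(r"\[(\d+)\]", match.group(2)):
            data = data[int(index)]
    return data


def normalize_with_mappings(response, response_mappings):
    """Apply FOUND/RETRIEVED/RESULTS mappings with the documented defaults."""
    mappings = dict(item.split("=", 1) for item in response_mappings.split(","))
    results = get_path(response, mappings["RESULTS"])  # RESULTS is required
    if "RETRIEVED" in mappings:
        retrieved = int(get_path(response, mappings["RETRIEVED"]))
    else:
        retrieved = len(results)  # default: length of RESULTS
    if "FOUND" in mappings:
        found = int(get_path(response, mappings["FOUND"]))
    else:
        found = retrieved  # default: same as RETRIEVED
    return {"found": found, "retrieved": retrieved, "results": results}


sample = {  # trimmed, made-up response in the Google PSE shape
    "searchInformation": {"totalResults": "1230"},
    "queries": {"request": [{"count": 2}]},
    "items": [{"link": "https://example.com/a"}, {"link": "https://example.com/b"}],
}
print(normalize_with_mappings(
    sample,
    "FOUND=searchInformation.totalResults,RETRIEVED=queries.request[0].count,RESULTS=items",
))
```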

Result Mappings

SearchProvider result_mappings define how JSON result sets from external sources are mapped to SWIRL's standard result schema. Each mapping follows JSONPath conventions.

Default SWIRL Fields

Field Name | Description
author | Author of the item (not always reliable for web content).
body | Main content extracted from the result.
date_published | Original publication date (not always reliable for web content).
date_retrieved | Date and time SWIRL retrieved the result.
title | Title of the item.
url | URL of the result item.

Example: Google PSE Result Mapping

"result_mappings": "url=link,body=snippet,author=displayLink,cacheId,pagemap.metatags[*].['og:type'],pagemap.metatags[*].['og:site_name'],pagemap.metatags[*].['og:description'],NO_PAYLOAD"

Here, url=link and body=snippet map Google PSE result fields to SWIRL result fields.

XML to JSON Conversion

The requests.py connector automatically converts XML to JSON for mapping.

It also handles list-of-list responses, where the first list element contains field names.

Example:

[
    ["urlkey", "timestamp", "original", "mimetype", "statuscode"],
    ["today,swirl)/", "20221012214440", "http://swirl.today/", "text/html"]
]

SWIRL automatically converts this format into a JSON array of objects, one per data row, keyed by the field names in the first row.
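The list-of-list conversion can be sketched in a few lines. rows_to_objects is hypothetical and covers only this case, not XML conversion; because it uses zip, a row shorter than the header (like the sample row above, which has no statuscode value) simply omits the trailing fields.

```python
def rows_to_objects(rows):
    """Turn [[field names], [values], ...] into a list of dicts (sketch)."""
    header, *data = rows
    # zip stops at the shorter sequence, so short rows omit trailing fields
    return [dict(zip(header, row)) for row in data]


rows = [
    ["urlkey", "timestamp", "original", "mimetype", "statuscode"],
    ["today,swirl)/", "20221012214440", "http://swirl.today/", "text/html"],
]
print(rows_to_objects(rows))
```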

Constructing URLs from Mappings

If a SearchProvider does not return full URLs, a quoted template in result_mappings can construct them from fields in each result.

Example: Europe PubMed Central

"url='https://europepmc.org/article/{source}/{id}'"

Here, {source} and {id} are values from the JSON result, inserted into the URL dynamically.
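A sketch of how such a template mapping could be applied to one result. apply_template_mapping and the sample record values are hypothetical:

```python
def apply_template_mapping(mapping, result):
    """Apply a mapping like url='https://.../{source}/{id}' to one result dict."""
    swirl_field, _, template = mapping.partition("=")
    # Strip the surrounding single quotes, then fill {placeholders} from the result
    return swirl_field, template.strip("'").format_map(result)


record = {"source": "MED", "id": "12345678"}  # made-up values
print(apply_template_mapping("url='https://europepmc.org/article/{source}/{id}'", record))
```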

Aggregating Field Values

To aggregate list values into a single string, use JSONPath syntax.

Example: Google PSE Metadata Aggregation

"pagemap.metatags[*].['og:type']"

This merges all og:type values from the metadata into a single result field.

Example: arXiv Author Aggregation

"author[*].name"

This collects all author names into a single field.
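The effect of these aggregations can be sketched as joining one key across a list of objects. aggregate is hypothetical, and the separator SWIRL actually uses is an assumption you should confirm against real results:

```python
def aggregate(items, key, separator=" "):
    """Join item[key] across a list of dicts, skipping missing or empty values."""
    return separator.join(str(item[key]) for item in items if item.get(key))


metatags = [{"og:type": "article"}, {"og:type": "website"}, {}]
authors = [{"name": "A. Author"}, {"name": "B. Author"}]  # made-up data
print(aggregate(metatags, "og:type"))
print(aggregate(authors, "name", separator=", "))
```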

Multiple Mappings

SWIRL allows multiple source fields to map to a single SWIRL field.

"result_mappings": "body=content|description,..."
  • If one field is populated, it maps to body.
  • If both fields contain data, the second field is moved to PAYLOAD as <swirl-field>_<source_field>.

Example Result Object:

{
    "swirl_rank": 1,
    "title": "What The Mid-Term Elections Mean For U.S. Energy",
    "url": "https://www.forbes.com/sites/davidblackmon/2022/11/13/what-the-mid-term-elections-mean-for-us-energy/",
    "body": "Leaders in U.S. domestic energy sectors should expect President Joe Biden to feel emboldened...",
    "payload": {
        "body_description": "Leaders in U.S. domestic energy sectors should expect President Joe Biden to feel emboldened..."
    }
}
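The rule above can be sketched as follows. map_alternatives is hypothetical and models only a single body=content|description style mapping:

```python
def map_alternatives(result, swirl_field, source_fields):
    """Apply a mapping like body=content|description (simplified sketch).

    The first populated source field fills the SWIRL field; any other
    populated alternatives go to payload as <swirl_field>_<source_field>.
    """
    mapped = {}
    payload = {}
    for source_field in source_fields:
        value = result.get(source_field)
        if not value:
            continue
        if swirl_field not in mapped:
            mapped[swirl_field] = value
        else:
            payload[f"{swirl_field}_{source_field}"] = value
    if payload:
        mapped["payload"] = payload
    return mapped


print(map_alternatives({"content": "A", "description": "B"}, "body", ["content", "description"]))
```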

Result Mapping Options

Mapping Format | Description | Example
swirl_key = source_key | Maps a field from the source provider to SWIRL. | "body=_source.email"
swirl_key = source_key1|source_key2 | Maps multiple fields; the first populated field is mapped, others go to PAYLOAD. | "body=content|description"
swirl_key='template {variable}' | Formats multiple values into a single string. | "'{x}: {y}'=title"
source_key | Maps a field from the raw source result into PAYLOAD. | "cacheId, _source.products"
sw_urlencode | URL-encodes the specified value. | "url=sw_urlencode(<hitId>)"
sw_btcconvert | Converts Satoshi to Bitcoin. | "sw_btcconvert(<fee>)"
NO_PAYLOAD | Disables automatic copying of all source fields to PAYLOAD. | "NO_PAYLOAD"
FILE_SYSTEM | Treats the SearchProvider as a file system, increasing body weight in ranking. | "FILE_SYSTEM"
LC_URL | Converts url to lowercase. | "LC_URL"
BLOCK | Used in SWIRL's RAG processing; stores output in the info block of the result object. | "BLOCK=ai_summary"
DATASET | Formats columnar responses into a single result. | "DATASET"

Controlling date_published Display

As of SWIRL 2.1, different values can be mapped to date_published and date_published_display.

"result_mappings": "... date_published=foo.bar.date1,date_published_display=foo.bar.date2 ..."

This results in:

"date_published": "2010-01-01 00:00:00",
"date_published_display": "c2010"

Result Schema

The JSON result schema is defined in:

Result Mixers further process and merge data from multiple sources.

PAYLOAD Field

The PAYLOAD field stores all unmapped result data from the source.

Using NO_PAYLOAD Effectively

To exclude unnecessary fields from PAYLOAD:

  1. Run a test query without NO_PAYLOAD to inspect raw fields.
  2. Add specific mappings for the fields you need.
  3. Enable "NO_PAYLOAD" to discard unmapped data.

SWIRL copies all source data to PAYLOAD by default unless NO_PAYLOAD is specified.
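A simplified sketch of this behavior. build_payload is hypothetical; it assumes, as the Google PSE result mapping example suggests by combining cacheId with NO_PAYLOAD, that explicitly listed source keys still reach PAYLOAD when NO_PAYLOAD is set.

```python
def build_payload(raw_result, mapped_fields, payload_fields=(), no_payload=False):
    """Build PAYLOAD for one result (sketch).

    mapped_fields: source fields already mapped to SWIRL fields (e.g. link, snippet).
    payload_fields: source keys listed explicitly in result_mappings (e.g. cacheId).
    """
    payload = {k: raw_result[k] for k in payload_fields if k in raw_result}
    if not no_payload:
        # Default behavior: copy every remaining unmapped field
        for key, value in raw_result.items():
            if key not in mapped_fields and key not in payload:
                payload[key] = value
    return payload


raw = {"link": "https://example.com", "snippet": "text", "cacheId": "c1", "kind": "customsearch#result"}
print(build_payload(raw, {"link", "snippet"}, ["cacheId"], no_payload=True))
```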