Table of Contents
- Preloaded SearchProviders
- Activating a SearchProvider
- Activating a Google Programmable Search Engine (PSE) SearchProvider
- Copy/Paste Install
- Bulk Loading
- Editing a SearchProvider
- Query Templating
- Organizing SearchProviders with Active, Default, and Tags
- Query Mappings
- Query Field Mappings
- HTTP Request Headers
- Result Processors
- Authentication & Credentials
- Response Mappings
- Result Mappings
- Default SWIRL Fields
- Example: Google PSE Result Mapping
- Constructing URLs from Mappings
- Aggregating Field Values
- Multiple Mappings
- Result Mapping Options
- Result Schema
- PAYLOAD Field
SearchProviders Guide
Community Edition | Enterprise Edition
SWIRL queries may be subject to rate limits or throttling imposed by the sources being queried.
SearchProviders are the core of SWIRL, enabling easy connections to various data sources without writing any code.
Each SearchProvider is a JSON object. SWIRL includes preconfigured providers for sources like Elastic, Solr, PostgreSQL, BigQuery, NLResearch.com, Miro.com, Atlassian, and more.
SWIRL comes with active SearchProviders for arXiv.org, European PMC, and Google News that work "out of the box" if internet access is available.
Additionally, inactive SearchProviders for Google Web Search and SWIRL Documentation use Google Programmable Search Engine (PSE). These require a Google API key. See Activating a Google Programmable Search Engine (PSE) SearchProvider below for setup details.
Preloaded SearchProviders
SearchProvider | Description | Notes |
---|---|---|
arxiv.json | Searches the arXiv.org repository of scientific papers | No authentication required |
asana.json | Searches tasks in Asana | Requires Asana personal access token |
atlassian.json | Searches Atlassian Confluence Cloud, Jira Cloud, and Trello | Requires a bearer token and/or Trello API key |
blockchain-bitcoin.json | Searches Blockchain.com for Bitcoin addresses and transactions | Requires Blockchain.com API key |
chatgpt.json | OpenAI ChatGPT AI chatbot | Requires OpenAI API key |
company_snowflake.json | Queries the Snowflake FreeCompanyResearch dataset | Requires Snowflake username and password |
crunchbase.json | Searches organizations via Crunchbase API | Requires Crunchbase API key |
document_db.json | SQLite3 document database | Sample Data |
elastic_cloud.json | ElasticSearch (cloud version) | Enron Email Dataset |
elasticsearch.json | ElasticSearch (local install) | Enron Email Dataset |
europe_pmc.json | Searches EuropePMC.org for life sciences literature | No authentication required |
funding_db_bigquery.json | BigQuery funding database | Funding Dataset |
funding_db_postgres.json | PostgreSQL funding database | Funding Dataset |
funding_db_sqlite3.json | SQLite3 funding database | Funding Dataset |
github.json | Searches public repositories for Code, Commits, Issues, and PRs | Requires GitHub bearer token |
google_news.json | Queries Google News | No authentication required |
google_pse.json | Web search via Google Programmable Search Engine (PSE) | Requires Google API key |
hacker_news.json | Queries Hacker News | No authentication required |
http_get_with_auth.json | Generic HTTP GET with authentication | Requires URL and credentials |
http_post_with_auth.json | Generic HTTP POST with authentication | Requires URL and credentials |
hubspot.json | Searches the HubSpot CRM for Companies, Contacts, and Deals | Requires an API token with the appropriate scopes |
internet_archive.json | Queries the Internet Archive | No authentication required |
littlesis.json | Queries LittleSis.org database of influential business and government figures | No authentication required |
microsoft.json | Queries Microsoft 365 (Outlook, OneDrive, SharePoint, Teams) | See the M365 Guide |
miro.json | Searches Miro.com boards | Requires bearer token |
movies_mongodb.json | Queries MongoDB Atlas sample_mflix.movies dataset | Requires MongoDB credentials |
newsdata_io.json | Searches Newsdata.io | Requires API key |
nlresearch.json | Searches NLResearch.com premium content | Requires credentials |
open_sanctions.json | Queries OpenSanctions.org | Requires API key |
opensearch.json | OpenSearch 2.x | See the Developer Guide |
oracle.json | Queries Oracle 23c Free (and earlier versions) | Requires Oracle credentials |
preloaded.json | All preloaded SearchProviders | Default in SWIRL |
servicenow.json | Searches ServiceNow Knowledge and Service Catalog | Requires username and password |
solr.json | Queries Apache Solr (local install) | Requires host, port, collection |
solr_with_auth.json | Secured Solr instance | Requires credentials |
youtrack.json | Searches JetBrains YouTrack | Requires bearer token |
This table provides a high-level overview of the available SearchProviders. Detailed configurations can be found in the SearchProviders repository.
Activating a SearchProvider
To activate a preloaded SearchProvider, edit it and change:
"active": false
to:
"active": true
Click the PUT button to save the change. You can also use the HTML form at the bottom of the page.
Activating a Google Programmable Search Engine (PSE) SearchProvider
SWIRL includes an inactive Google PSE configuration that allows searching the web or a defined "slice" of it.
Google PSE requires a valid Google API key, and usage may incur charges from Google.
Create a Google Programmable Search Engine (PSE)
- Go to Google Programmable Search Engine
- Click Get Started and log in with your Google account
- Follow the steps to create a PSE and note the cx parameter (your Google PSE ID)
Create a Google API Key
- Visit the Google API Custom Search overview
- Follow the instructions to generate an API key
Activate the Google PSE SearchProvider
- Edit the Google PSE provider.
- Change:
"active": false
to:
"active": true
Or use the HTML form at the bottom of the page.
- Update the query_mappings field with your Google PSE ID (the cx parameter):
"query_mappings": "cx=<your-Google-PSE-id>"
- Update the credentials field with your Google API key, using the key= prefix:
"credentials": "key=<your-Google-API-key>"
- Click the PUT button to save the changes.
- Reload SWIRL Galaxy. Your new source will appear in the source selector.
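Before you edit the provider, you can check that your cx value and API key work by calling the Google Custom Search JSON API directly. This is a minimal sketch outside SWIRL. The helper names are hypothetical. Each call to check_pse uses one query from your Google quota.

```python
import json
import urllib.parse
import urllib.request

CUSTOM_SEARCH_URL = "https://www.googleapis.com/customsearch/v1"


def pse_test_url(cx: str, api_key: str, query: str) -> str:
    """Build a Custom Search JSON API request URL from a PSE ID and API key."""
    params = urllib.parse.urlencode({"cx": cx, "key": api_key, "q": query})
    return f"{CUSTOM_SEARCH_URL}?{params}"


def check_pse(cx: str, api_key: str, query: str = "swirl") -> int:
    """Run one live query and return how many items came back.

    urllib raises HTTPError if Google rejects the key or the cx value.
    """
    with urllib.request.urlopen(pse_test_url(cx, api_key, query)) as resp:
        data = json.load(resp)
    return len(data.get("items", []))
```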
Copy/Paste Install
If you have a SearchProvider JSON file, you can copy and paste it into the form at the bottom of the SearchProvider endpoint.
Steps:
- Go to http://localhost:8000/swirl/searchproviders/
- Click the Raw data tab at the bottom of the page.
- Paste the SearchProvider JSON (either a single record or a list of records).
- Click the POST button.
- SWIRL will confirm the new SearchProvider(s).
Bulk Loading
Use the swirl_load.py script to bulk-load SearchProviders.
Steps:
- Open a terminal and navigate to your SWIRL home directory:
cd <swirl-home>
- Run the following command:
python swirl_load.py SearchProviders/provider-name.json -u admin -p your-admin-password
- The script will load all configurations from the specified file.
- Visit http://localhost:8000/swirl/searchproviders/ to verify.
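Before you bulk-load a file, you may want to check what it contains. The sketch below assumes each record uses the name, active, and default keys found in the preloaded files. The helper names are hypothetical, and the code is not part of swirl_load.py.

```python
import json
from pathlib import Path


def load_providers(path: str) -> list[dict]:
    """Load a SearchProvider file holding either one record or a list of records."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    return data if isinstance(data, list) else [data]


def summarize(providers: list[dict]) -> list[str]:
    """Return one line per provider with its name and active/default flags."""
    return [
        f"{p.get('name', '<unnamed>')}: active={p.get('active')}, default={p.get('default')}"
        for p in providers
    ]
```

For example, printing each line of summarize(load_providers("SearchProviders/preloaded.json")) lists the providers that file would load.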
Editing a SearchProvider
To edit a SearchProvider, append its id to the end of the /swirl/searchproviders URL.
For example:
http://localhost:8000/swirl/searchproviders/1/
Available Actions:
- DELETE the SearchProvider permanently.
- Modify the configuration and click PUT to save changes.
Query Templating
Most SearchProviders require a query_template, which binds to query_mappings during the federation process.
For example, the original query_template for the MongoDB movie SearchProvider:
"query_template": "{'$text': {'$search': '{query_string}'}}"
This format is a string, not valid JSON. The single quotes are required because the JSON itself uses double quotes.
Starting in SWIRL 3.2.0, MongoDB SearchProviders use the query_template_json field instead, which stores the template as valid JSON:
"query_template_json": {
"$text": {
"$search": "{query_string}"
}
}
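The sketch below shows the difference between the two forms when {query_string} is filled in. It assumes a simple placeholder substitution and is not SWIRL's own code. With the string form, a query containing an apostrophe would break the single-quoted template. With the JSON form, the query lands inside an ordinary string value, so no quoting is involved.

```python
def fill_string_template(template: str, query_string: str) -> str:
    """Substitute {query_string} in a string-style query_template.

    str.format() would fail on the literal braces of a MongoDB-style
    template, so a plain text replace is used instead.
    """
    return template.replace("{query_string}", query_string)


def fill_json_template(template, query_string: str):
    """Recursively substitute {query_string} inside a query_template_json value."""
    if isinstance(template, dict):
        return {key: fill_json_template(value, query_string) for key, value in template.items()}
    if isinstance(template, list):
        return [fill_json_template(value, query_string) for value in template]
    if isinstance(template, str):
        return template.replace("{query_string}", query_string)
    return template  # numbers, booleans, and None pass through unchanged
```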
Organizing SearchProviders with Active, Default, and Tags
SearchProviders have three properties that control their participation in queries:
Property | Description |
---|---|
Active | true/false – If false, the SearchProvider will not receive queries, even if specified in a searchprovider_list. |
Default | true/false – If false, the SearchProvider will only be queried if explicitly listed in searchprovider_list. |
Tags | List of strings grouping providers by topic. Tags can be used in searchprovider_list, as a providers= URL parameter, or as tag:term in a query. |
Best Practices for SearchProvider Organization:
- General-purpose providers should have "default": true so they are included in broad searches.
- Topic-specific providers should have "default": false and use "tags": ["topic1", "topic2"].
- Users can target specific providers using a mix of tags, SearchProvider names, or IDs.
This ensures broad searches use the best general providers, while topic-specific searches can target precise data sources.
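The selection rules in the table can be approximated as follows. This is an illustrative sketch of the documented behavior, not SWIRL's implementation. It assumes exact, case-sensitive matching of names and tags, and the function name is hypothetical.

```python
from typing import Optional


def select_providers(providers: list[dict], searchprovider_list: Optional[list] = None) -> list[dict]:
    """Pick the SearchProviders that would receive a query.

    - Inactive providers never receive queries.
    - With no searchprovider_list, only default providers are queried.
    - Otherwise a provider is selected if the list names its id, its name,
      or one of its tags.
    """
    active = [p for p in providers if p.get("active")]
    if not searchprovider_list:
        return [p for p in active if p.get("default")]
    wanted = {str(item) for item in searchprovider_list}
    return [
        p for p in active
        if str(p.get("id")) in wanted
        or p.get("name") in wanted
        or wanted.intersection(p.get("tags", []))
    ]
```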
Query Mappings
SearchProvider query_mappings are key-value pairs that define how queries are structured for a given SearchProvider.
These mappings configure field replacements and query transformations that SWIRL's processors (such as AdaptiveQueryProcessor) use to adapt the query format to each provider's requirements.
Available query_mappings Options
Mapping Format | Description | Example |
---|---|---|
key = value | Replaces {key} in the query_template with value. | "query_template": "{url}?cx={cx}&key={key}&q={query_string}", "query_mappings": "cx=<your-Google-PSE-id>" |
DATE_SORT=url-snippet | Inserts the specified string into the URL when date sorting is enabled. | "query_mappings": "DATE_SORT=sort=date" |
RELEVANCY_SORT=url-snippet | Inserts the specified string into the URL when relevancy sorting is enabled. | "query_mappings": "RELEVANCY_SORT=sort=relevancy" |
PAGE=url-snippet | Enables pagination by inserting either RESULT_INDEX (absolute result number) or RESULT_PAGE (page number). | "query_mappings": "PAGE=start=RESULT_INDEX" |
NOT=True | Indicates that the provider supports basic NOT operators. | elon musk NOT twitter |
NOT_CHAR=- | Defines a character for NOT operators. | elon musk -twitter |
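A query_mappings value is a comma-separated list of entries. Each entry is split on its first = only, so a value such as sort=date in DATE_SORT=sort=date stays intact. The parser below is a minimal sketch, and its name is hypothetical. It does not handle values that themselves contain commas.

```python
def parse_query_mappings(mappings: str) -> dict[str, str]:
    """Split a query_mappings string into {key: value} pairs."""
    result = {}
    for entry in mappings.split(","):
        entry = entry.strip()
        if not entry:
            continue
        # partition() splits on the FIRST '=' only
        key, sep, value = entry.partition("=")
        result[key.strip()] = value.strip() if sep else ""
    return result
```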
Query Field Mappings
Each key defined in query_mappings that appears in braces in the query_template (for example, {cx}) is replaced with its mapped value.
Example Configuration
"url": "https://www.googleapis.com/customsearch/v1",
"query_template": "{url}?cx={cx}&key={key}&q={query_string}",
"query_processors": [
"AdaptiveQueryProcessor"
],
"query_mappings": "cx=0c38029ddd002c006,DATE_SORT=sort=date,PAGE=start=RESULT_INDEX",
Example Query Output
At query execution time, this configuration generates:
https://www.googleapis.com/customsearch/v1?cx=0c38029ddd002c006&q=some_query_string
Key Configuration Guidelines:
- The url field is specific to each SearchProvider and should contain static parameters that never change.
- query_mappings allow dynamic replacements using query-time values.
- The query_string is populated by SWIRL as described in the Developer Guide.
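The substitution can be sketched as follows. The sketch assumes that key=value entries from both query_mappings and credentials can fill placeholders, and that the query string is URL-encoded. It is not SWIRL's implementation, the function name is hypothetical, and it does not apply DATE_SORT or PAGE, which take effect only when sorting or paging is requested.

```python
import urllib.parse


def build_query_url(url: str, query_template: str, query_mappings: str,
                    credentials: str, query_string: str) -> str:
    """Fill {url}, {query_string}, and {key}-style placeholders in a query_template."""
    values = {"url": url, "query_string": urllib.parse.quote_plus(query_string)}
    # Assumption: key=value entries from query_mappings and credentials both
    # supply placeholders; keys that do not appear in the template are ignored.
    for source in (query_mappings, credentials):
        for entry in source.split(","):
            key, sep, value = entry.strip().partition("=")
            if sep:
                values.setdefault(key, value)
    filled = query_template
    for key, value in values.items():
        filled = filled.replace("{" + key + "}", value)
    return filled
```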
HTTP Request Headers
The optional http_request_headers field allows custom HTTP headers to be sent along with a query.
For example, the GitHub SearchProvider uses this to request enhanced search snippets, which are then mapped to SWIRL's body field:
"http_request_headers": {
"Accept": "application/vnd.github.text-match+json"
},
"result_mappings": "title=name,body=text_matches[*].fragment, ..."
Source-specific headers like this let a SearchProvider request richer response data, such as highlighted snippets, which can then be mapped into SWIRL fields.
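In practice, the headers are simply added to the outgoing HTTP request. This standard-library sketch prepares, but does not send, a GitHub code-search request that carries the header above. It assumes a bearer token. The helper name is hypothetical, and this is not SWIRL's connector code.

```python
import urllib.parse
import urllib.request

# Same value as the GitHub SearchProvider's http_request_headers field.
http_request_headers = {"Accept": "application/vnd.github.text-match+json"}


def build_github_request(query: str, token: str) -> urllib.request.Request:
    """Prepare a GitHub code-search request that carries the provider's custom headers."""
    url = "https://api.github.com/search/code?" + urllib.parse.urlencode({"q": query})
    headers = {**http_request_headers, "Authorization": f"Bearer {token}"}
    return urllib.request.Request(url, headers=headers)
```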
Result Processors
Each SearchProvider can define its own Result Processing pipeline. A typical configuration looks like this:
"result_processors": [
"MappingResultProcessor",
"CosineRelevancyResultProcessor"
],
Enabling Relevancy Ranking
If Relevancy Ranking is required:
- The CosineRelevancyResultProcessor must be the last item in the result_processors list.
- The CosineRelevancyPostResultProcessor must be included in Search.post_result_processors, located in swirl/models.py.
For more details, refer to the Relevancy Ranking Guide.
Additional ResultProcessors
SWIRL provides other ResultProcessors that may be useful in specific cases. See the Developer Guide for more details.
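The two relevancy requirements are easy to check mechanically. This is a minimal sketch with a hypothetical function name. It returns a list of problems, which is empty when the configuration meets both rules.

```python
def check_relevancy_config(result_processors: list[str],
                           post_result_processors: list[str]) -> list[str]:
    """Report violations of the relevancy-ranking requirements (empty list if OK)."""
    problems = []
    if not result_processors or result_processors[-1] != "CosineRelevancyResultProcessor":
        problems.append("CosineRelevancyResultProcessor must be the last result processor")
    if "CosineRelevancyPostResultProcessor" not in post_result_processors:
        problems.append("CosineRelevancyPostResultProcessor is missing from post_result_processors")
    return problems
```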
Authentication & Credentials
The credentials property stores authentication information required by a SearchProvider.
Supported Authentication Formats
Key-Value Format (Appended to URL)
Used when an API key is passed as a query parameter.
Example: Google PSE SearchProvider
"credentials": "key=your-google-api-key-here",
"query_template": "{url}?cx={cx}&key={key}&q={query_string}",
Bearer Token (Sent in HTTP Header)
Supported by the RequestsGet and RequestsPost connectors.
Example: Miro SearchProvider
"credentials": "bearer=your-miro-api-token",
X-Api-Key Format (Sent in HTTP Header)
"credentials": "X-Api-Key=<your-api-key>",
HTTP Basic/Digest/Proxy Authentication
Supported by the RequestsGet, ElasticSearch, and OpenSearch connectors.
Example: Solr with Auth SearchProvider
"credentials": "HTTPBasicAuth('solr-username','solr-password')",
Other Authentication Methods
For advanced authentication techniques, consult the Developer Guide.
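This sketch shows how each credentials format could translate into an HTTP request: a URL parameter, an Authorization header, or an X-Api-Key header. It is an illustration, not SWIRL's connector code, and the function name is hypothetical. For HTTP authentication it handles only the HTTPBasicAuth('user','password') form. Digest and proxy authentication are not covered.

```python
import base64
import re


def apply_credentials(url: str, credentials: str) -> tuple[str, dict[str, str]]:
    """Return (url, headers) with a SearchProvider credentials string applied."""
    headers: dict[str, str] = {}
    basic = re.fullmatch(r"HTTPBasicAuth\('([^']*)',\s*'([^']*)'\)", credentials.strip())
    if basic:
        user_pass = f"{basic.group(1)}:{basic.group(2)}".encode()
        headers["Authorization"] = "Basic " + base64.b64encode(user_pass).decode()
    elif credentials.startswith("bearer="):
        headers["Authorization"] = "Bearer " + credentials[len("bearer="):]
    elif credentials.startswith("X-Api-Key="):
        headers["X-Api-Key"] = credentials[len("X-Api-Key="):]
    elif "=" in credentials:
        # Plain key=value pairs travel as URL query parameters.
        url = url + ("&" if "?" in url else "?") + credentials
    return url, headers
```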
Response Mappings
SearchProvider response_mappings determine how each source's response is normalized into JSON.
They are processed by the Connector's normalize_response method.
Example: Google PSE Response Mappings
"response_mappings": "FOUND=searchInformation.totalResults,RETRIEVED=queries.request[0].count,RESULTS=items",
Response Mapping Options
Mapping | Description | Required? | Example |
---|---|---|---|
FOUND | Path to the total number of results available for the query (defaults to RETRIEVED if not specified) | No | FOUND=searchInformation.totalResults |
RETRIEVED | Path to the number of results returned for this query (defaults to the length of the RESULTS list) | No | RETRIEVED=queries.request[0].count |
RESULTS | Path to the list of result items | Yes | RESULTS=items |
RESULT | Path to the document, if each result item is wrapped in a dictionary | No | RESULT=document |
Proper response mappings ensure consistent search results across different sources.
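This sketch applies response mappings, including the documented defaults: RETRIEVED falls back to the length of RESULTS, and FOUND falls back to RETRIEVED. It supports only dotted paths with numeric [n] indices, not full JSONPath. It is not SWIRL's normalize_response, and the function names are hypothetical.

```python
import re


def resolve_path(data, path: str):
    """Follow a dotted path with optional [n] indices, e.g. 'queries.request[0].count'."""
    for part in path.split("."):
        match = re.fullmatch(r"([^\[\]]+)((?:\[\d+\])*)", part)
        if not match:
            raise ValueError(f"unsupported path segment: {part!r}")
        data = data[match.group(1)]
        for index in re.findall(r"\[(\d+)\]", match.group(2)):
            data = data[int(index)]
    return data


def normalize_response(response: dict, response_mappings: str) -> dict:
    """Extract found/retrieved counts and the result list using response_mappings."""
    mappings = dict(entry.strip().split("=", 1)
                    for entry in response_mappings.split(",") if entry.strip())
    results = resolve_path(response, mappings["RESULTS"])
    if "RESULT" in mappings:
        # Unwrap each item when documents sit inside a wrapper dictionary.
        results = [resolve_path(item, mappings["RESULT"]) for item in results]
    retrieved = (int(resolve_path(response, mappings["RETRIEVED"]))
                 if "RETRIEVED" in mappings else len(results))
    found = int(resolve_path(response, mappings["FOUND"])) if "FOUND" in mappings else retrieved
    return {"found": found, "retrieved": retrieved, "results": results}
```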
Result Mappings
SearchProvider result_mappings define how JSON result sets from external sources are mapped to SWIRL's standard result schema. Each mapping follows JSONPath conventions.
Default SWIRL Fields
Field Name | Description |
---|---|
author | Author of the item (not always reliable for web content). |
body | Main content extracted from the result. |
date_published | Original publication date (not always reliable for web content). |
date_retrieved | Date and time SWIRL retrieved the result. |
title | Title of the item. |
url | URL of the result item. |
Example: Google PSE Result Mapping
"result_mappings": "url=link,body=snippet,author=displayLink,cacheId,pagemap.metatags[*].['og:type'],pagemap.metatags[*].['og:site_name'],pagemap.metatags[*].['og:description'],NO_PAYLOAD"
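Here, url=link, body=snippet, and author=displayLink map Google PSE result fields to SWIRL result fields. The bare entries (cacheId and the pagemap.metatags paths) are copied into PAYLOAD, and NO_PAYLOAD stops all other source fields from being copied.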
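This sketch applies simple result mappings to one raw result. It handles only top-level keys. JSONPath entries such as the pagemap paths, templates, and multiple-field mappings are not supported. It illustrates the rules in this guide rather than SWIRL's MappingResultProcessor, and the function name is hypothetical.

```python
def map_result(raw: dict, result_mappings: str) -> dict:
    """Apply swirl_field=source_field mappings to one raw result.

    Bare source keys (no '=') are copied into payload. Unless NO_PAYLOAD is
    present, every other unmapped source field is copied into payload too.
    """
    entries = [e.strip() for e in result_mappings.split(",") if e.strip()]
    result, payload, used = {}, {}, set()
    for entry in entries:
        if entry == "NO_PAYLOAD":
            continue
        if "=" in entry:
            swirl_key, source_key = entry.split("=", 1)
            if source_key in raw:
                result[swirl_key] = raw[source_key]
                used.add(source_key)
        elif entry in raw:
            payload[entry] = raw[entry]
            used.add(entry)
    if "NO_PAYLOAD" not in entries:
        payload.update({k: v for k, v in raw.items() if k not in used})
    result["payload"] = payload
    return result
```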
XML to JSON Conversion
The requests.py connector automatically converts XML to JSON for mapping.
It also handles list-of-list responses, where the first list element contains the field names.
Example:
[
["urlkey", "timestamp", "original", "mimetype", "statuscode"],
["today,swirl)/", "20221012214440", "http://swirl.today/", "text/html"]
]
This format is automatically converted into a structured JSON array.
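The list-of-list conversion amounts to pairing the header row with each data row. This is a minimal sketch, not the connector's code, and the function name is hypothetical. Because zip() stops at the shorter sequence, a row with fewer values than field names produces a record without the trailing fields. The example above has four values for five names, so its record has no statuscode.

```python
def rows_to_records(rows: list[list]) -> list[dict]:
    """Convert a list-of-lists response whose first row holds the field names."""
    if not rows:
        return []
    header, *data = rows
    return [dict(zip(header, row)) for row in data]
```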
Constructing URLs from Mappings
If a SearchProvider does not return full URLs, a template mapping (swirl_key='template {variable}') can construct them from fields in each result.
Example: Europe PubMed Central
"url='https://europepmc.org/article/{source}/{id}'"
Here, {source} and {id} are values from the JSON result, inserted into the URL dynamically.
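Filling such a template amounts to string formatting against the result's fields. This sketch returns None when a referenced field is missing, so it never emits a half-built link. That choice belongs to the sketch, not necessarily to SWIRL. The function name and the sample field values are illustrative.

```python
import string


def build_url(template: str, result: dict):
    """Fill a template such as 'https://europepmc.org/article/{source}/{id}' from result fields."""
    fields = [name for _, name, _, _ in string.Formatter().parse(template) if name]
    if any(name not in result for name in fields):
        return None  # a required field is absent; skip rather than build a broken URL
    return template.format(**{name: result[name] for name in fields})
```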
Aggregating Field Values
To aggregate list values into a single string, use JSONPath syntax.
Example: Google PSE Metadata Aggregation
"pagemap.metatags[*].['og:type']"
This merges all og:type values from the metadata into a single result field.
Example: ArXiv Author Aggregation
"author[*].name"
This collects all author names into a single field.
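A [*] mapping collects one key from every element of a list. In this sketch, the ", " separator is an assumption, and SWIRL's own join format may differ. The function name is hypothetical.

```python
def aggregate(raw: dict, list_path: str, item_key: str, sep: str = ", ") -> str:
    """Join raw[list_path][*][item_key] into one string (like 'author[*].name')."""
    node = raw
    for part in list_path.split("."):
        node = node.get(part, {}) if isinstance(node, dict) else {}
    items = node if isinstance(node, list) else []
    values = [str(item[item_key]) for item in items
              if isinstance(item, dict) and item.get(item_key) is not None]
    return sep.join(values)
```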
Multiple Mappings
SWIRL allows multiple source fields to map to a single SWIRL field.
"result_mappings": "body=content|description,..."
- If only one field is populated, it maps to body.
- If both fields contain data, the first maps to body and the second is moved to PAYLOAD as <swirl-field>_<source_field>.
Example Result Object:
{
"swirl_rank": 1,
"title": "What The Mid-Term Elections Mean For U.S. Energy",
"url": "https://www.forbes.com/sites/davidblackmon/2022/11/13/what-the-mid-term-elections-mean-for-us-energy/",
"body": "Leaders in U.S. domestic energy sectors should expect President Joe Biden to feel emboldened...",
"payload": {
"body_description": "Leaders in U.S. domestic energy sectors should expect President Joe Biden to feel emboldened..."
}
}
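The rule for body=content|description can be sketched as follows, with a hypothetical function name. The first populated source field fills the SWIRL field. Each later populated field goes to the payload under <swirl-field>_<source_field>, which matches the body_description key in the example above.

```python
def map_alternatives(raw: dict, swirl_field: str, source_fields: list[str]) -> tuple[dict, dict]:
    """Map the first populated source field; move later populated ones to payload."""
    result, payload = {}, {}
    for source in source_fields:
        value = raw.get(source)
        if not value:
            continue  # empty or missing fields are skipped
        if swirl_field not in result:
            result[swirl_field] = value
        else:
            payload[f"{swirl_field}_{source}"] = value
    return result, payload
```

For body=content|description, call it as map_alternatives(raw, "body", "content|description".split("|")).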
Result Mapping Options
Mapping Format | Description | Example |
---|---|---|
swirl_key = source_key | Maps a field from the source provider to SWIRL. | "body=_source.email" |
swirl_key = source_key1\|source_key2 | Maps multiple fields; the first populated field is mapped, others go to PAYLOAD. | "body=content\|description" |
swirl_key='template {variable}' | Formats multiple values into a single string. | "'{x}: {y}'=title" |
source_key | Maps a field from the raw source result into PAYLOAD. | "cacheId, _source.products" |
sw_urlencode | URL-encodes the specified value. | "url=sw_urlencode(<hitId>)" |
sw_btcconvert | Converts Satoshi to Bitcoin. | "sw_btcconvert(<fee>)" |
NO_PAYLOAD | Disables automatic copying of all source fields to PAYLOAD. | "NO_PAYLOAD" |
FILE_SYSTEM | Treats the SearchProvider as a file system, increasing body weight in ranking. | "FILE_SYSTEM" |
LC_URL | Converts url to lowercase. | "LC_URL" |
BLOCK | Used in SWIRL's RAG processing; stores output in the info block of the result object. | "BLOCK=ai_summary" |
DATASET | Formats columnar responses into a single result. | "DATASET" |
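For reference, here are the conversions that sw_btcconvert and sw_urlencode describe, written as plain functions. One bitcoin is 100,000,000 satoshi. Percent-encoding every reserved character is an assumption about sw_urlencode's exact rules. These are illustrative equivalents with hypothetical names, not SWIRL's code.

```python
from decimal import Decimal
from urllib.parse import quote

SATOSHI_PER_BTC = Decimal(100_000_000)


def btc_from_satoshi(satoshi: int) -> Decimal:
    """Convert satoshi to bitcoin exactly, using Decimal to avoid float rounding."""
    return Decimal(satoshi) / SATOSHI_PER_BTC


def urlencode_value(value) -> str:
    """Percent-encode a value, including '/', for safe use inside a URL."""
    return quote(str(value), safe="")
```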
Controlling date_published Display
As of SWIRL 2.1, different values can be mapped to date_published and date_published_display.
"result_mappings": "... date_published=foo.bar.date1,date_published_display=foo.bar.date2 ..."
This results in:
"date_published": "2010-01-01 00:00:00",
"date_published_display": "c2010"
Result Schema
The JSON result schema is defined in:
Result Mixers further process and merge data from multiple sources.
PAYLOAD Field
The PAYLOAD field stores all unmapped result data from the source.
Using NO_PAYLOAD Effectively
To exclude unnecessary fields from PAYLOAD:
- Run a test query without NO_PAYLOAD to inspect the raw fields.
- Add specific mappings for the fields you need.
- Enable "NO_PAYLOAD" to discard unmapped data.
By default, SWIRL copies all unmapped source fields to PAYLOAD unless NO_PAYLOAD is specified.
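To support the first step, this sketch lists the top-level source fields that no mapping consumes. Without NO_PAYLOAD, these are the fields that would land in PAYLOAD. It reduces each mapping to its first path segment and does not interpret templates or JSONPath filters. The function name is hypothetical.

```python
def unmapped_fields(raw: dict, result_mappings: str) -> list[str]:
    """List top-level source fields not referenced by any mapping entry."""
    used = set()
    for entry in (e.strip() for e in result_mappings.split(",")):
        if not entry or entry == "NO_PAYLOAD":
            continue
        source = entry.split("=", 1)[1] if "=" in entry else entry
        for alternative in source.split("|"):
            # 'pagemap.metatags[*]...' counts as using the top-level 'pagemap' field
            used.add(alternative.split(".")[0].split("[")[0])
    return sorted(key for key in raw if key not in used)
```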