Admin Guide

This document applies to all SWIRL Editions.

Configuring SWIRL AI Connect

Configuring the Environment

SWIRL uses django-environ to load important values such as hostname from a file called .env.

The file .env.dist contains expected defaults. If no .env file is created, then the install.sh script copies this file to .env prior to startup.

SECRET_KEY=your-secret-key
ALLOWED_HOSTS=localhost
PROTOCOL=http
SWIRL_EXPLAIN=True
SQL_ENGINE=django.db.backends.sqlite3
SQL_DATABASE=db.sqlite3
SQL_USER=user
SQL_PASSWORD=password
SQL_HOST=localhost
SQL_PORT=5432
MICROSOFT_CLIENT_ID=''
MICROSOFT_CLIENT_SECRET=''
MICROSOFT_REDIRECT_URI=''
OPENAI_API_KEY=

To configure a SWIRL server to listen on a particular port, hostname, via HTTPS, etc., modify the .env file and then restart SWIRL. There should never be a .env file in the SWIRL repo, and when updating SWIRL to a new version, no migration of these settings should be needed. They remain in .env.

The SWIRL_EXPLAIN item determines whether SWIRL includes the relevancy explain structure in its responses.

The SECRET_KEY is used by Django for cryptographic signing. SWIRL recommends generating a new one for production use.
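
One way to generate a fresh value is Python's standard secrets module. The sketch below is a minimal example, not part of SWIRL; the helper name make_secret_key_line is hypothetical. It returns a line that can be pasted into .env.

```python
import secrets


def make_secret_key_line(nbytes: int = 50) -> str:
    """Return a SECRET_KEY=... line for pasting into .env.

    make_secret_key_line is a hypothetical helper, not part of SWIRL.
    """
    # token_urlsafe() emits only letters, digits, '-' and '_',
    # so the value needs no quoting in a .env file.
    return f"SECRET_KEY={secrets.token_urlsafe(nbytes)}"


if __name__ == "__main__":
    print(make_secret_key_line())
```

Restart SWIRL after updating .env so the new value is loaded.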

Creating a SWIRL Super User

To start over with a new database, delete or rename the db.sqlite3 file. Then run this command:

python swirl.py setup

This will create a new, blank database. To create a Super User, run the following command:

python manage.py createsuperuser --email admin@example.com --username admin

Changing the Super User Password

If you already have an admin user, you can change the password with this command:

python manage.py changepassword admin

If you select a password that is too simple, Django will object. For more information see: django-admin and manage.py

Using the Django Admin

To use the Django Admin to change the Super User Password:

  1. Go to the SWIRL homepage. For example http://localhost:8000/swirl

  2. Click the CHANGE PASSWORD link at the top left:

Django Admin Change Password

  3. This should bring up the password change screen:

Django Admin Change Password

Don't forget to click CHANGE MY PASSWORD at the end.

Adding Normal Users

You may use the Django Admin UI to add users:

http://localhost:8000/admin/

Django Admin - Users

This isn't necessary if using OpenID Connect to provision users. Please refer to the AI Connect Guide for more information.

Permissioning Normal Users

There are four permissions – add, change, delete, and view – for each of the core SWIRL objects: SearchProviders, Search, Result, and Query Transform.

Django Admin - Permissions


The following table shows how to configure these for various scenarios:

Scenario                       | SearchProvider | Search    | Result    | Query Transform
Admin                          | ALL            | ALL       | ALL       | ALL
Search ONLY                    | NONE           | Add       |           |
Result ONLY                    | NONE           | NONE      | View      |
Search & View Results          | NONE           | Add, View | Add, View | Add, View
Manage Search including re-run | NONE           | ALL       | ALL       | ALL
SearchProvider Admin           | ALL            | Add       | View      | View
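
The scenarios in the table above can also be written down as data, which may help when scripting group setup. The following is a sketch only: Django names model permissions "<action>_<model>" with the model name in lowercase, but the lowercase model names used here are assumptions, and blank table cells are treated as "no permissions". The names MODEL_NAMES, SCENARIOS and permission_codenames are hypothetical.

```python
ALL = ("add", "change", "delete", "view")

# Assumed lowercase model names; check the Django Admin permission list
# for the codenames your installation actually uses.
MODEL_NAMES = {
    "SearchProvider": "searchprovider",
    "Search": "search",
    "Result": "result",
    "Query Transform": "querytransform",
}

# One entry per table row; objects that are omitted get no permissions.
SCENARIOS = {
    "Admin": {name: ALL for name in MODEL_NAMES},
    "Search ONLY": {"Search": ("add",)},
    "Result ONLY": {"Result": ("view",)},
    "Search & View Results": {
        "Search": ("add", "view"),
        "Result": ("add", "view"),
        "Query Transform": ("add", "view"),
    },
    "Manage Search including re-run": {
        "Search": ALL,
        "Result": ALL,
        "Query Transform": ALL,
    },
    "SearchProvider Admin": {
        "SearchProvider": ALL,
        "Search": ("add",),
        "Result": ("view",),
        "Query Transform": ("view",),
    },
}


def permission_codenames(scenario):
    """Return Django-style permission codenames for one table row."""
    grants = SCENARIOS[scenario]
    return [
        f"{action}_{MODEL_NAMES[obj]}"
        for obj, actions in grants.items()
        for action in actions
    ]


if __name__ == "__main__":
    for name in SCENARIOS:
        print(name, permission_codenames(name))
```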

Object Ownership

SearchProvider, Search, Result, and Query Transform objects are owned by, and private to, the Django user who creates them.

Shared SearchProviders and Query Transformations

SWIRL supports shared SearchProviders (v. 1.7) and Query Transformations (v. 2.0). These default to "false" for all Users except the Django Super User (admin), for which they default to "true". This makes it easy to add users without having to duplicate SearchProviders or Query Transformations.

For installations with a large number of users, create groups with the desired permissions first, then assign each User to the appropriate group.

Deploying SWIRL for Production Use

The SWIRL application is designed to be deployed behind a reverse-proxy. There are many reasons for this:

  • Scalability: A reverse-proxy allows SWIRL to scale horizontally. With multiple SWIRL application VMs deployed behind it, the reverse-proxy accepts incoming connections and distributes them across a pool of backend SWIRL servers. If demand for SWIRL increases, additional VMs can be provisioned on demand. These VMs can also be shut down when traffic drops below certain thresholds to save on hosting costs.

  • Security: Offloading the SSL/TLS overhead to a dedicated public endpoint such as a reverse-proxy alleviates the CPU load on the application server that serves up SWIRL.

  • Performance: Using a reverse-proxy separates the task of serving up static content from the application server. This makes it possible to deploy SWIRL with a content delivery network (CDN) which places static content close to the end user for a faster SWIRL experience.

  • Availability: A reverse-proxy adds resiliency to any setup by spreading traffic across a backend pool of SWIRL servers. The reverse-proxy monitors the pool and removes failed servers from it.

Popular reverse-proxy projects/products include HAProxy, Nginx, Azure Application Gateway and AWS Application Load Balancer.

Contact support to discuss this topic anytime.
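
When TLS is terminated at the reverse-proxy, Django generally has to be told to trust the forwarded headers. The fragment below is a sketch using standard Django settings; whether SWIRL's settings.py needs them, or already sets them, is an assumption to verify for your deployment.

```python
# Sketch for swirl_server/settings.py behind a TLS-terminating proxy.
# These are standard Django settings; their use with SWIRL is an assumption.

# Treat a request as HTTPS when the proxy sets X-Forwarded-Proto: https.
# Only enable this if the proxy overwrites that header on incoming requests.
SECURE_PROXY_SSL_HEADER = ("HTTP_X_FORWARDED_PROTO", "https")

# Use the X-Forwarded-Host header when Django builds absolute URLs.
USE_X_FORWARDED_HOST = True
```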

Upgrading SWIRL

Please contact support for instructions on upgrading Docker containers!

  1. Update the swirl-search repository:
    git pull
    
  2. Run the install script:
    ./install.sh
    
  3. Set up SWIRL:
    python swirl.py setup
    
  4. If upgrading Galaxy UI, execute:
    ./install-ui.sh
    
  5. Restart SWIRL:
    python swirl.py restart
    

Consult the release notes for more information on each release.

Configuring SWIRL

SWIRL uses the following configuration items, defined in swirl_server/settings.py:

Configuration Item | Explanation | Example
CELERY_BEAT_SCHEDULE | Defines the schedule for the Search Expiration Service and Search Subscriber Service | See the linked sections.
SWIRL_DEFAULT_QUERY_LANGUAGE | Determines which stopword dictionary is loaded | SWIRL_DEFAULT_QUERY_LANGUAGE = 'english'
SWIRL_TIMEOUT | The number of seconds to wait until declaring federation complete, and terminating any connectors that haven't responded | SWIRL_TIMEOUT = 10
SWIRL_SUBSCRIBE_WAIT | The number of seconds to wait before timing out and reporting an error when updating a search | SWIRL_SUBSCRIBE_WAIT = 20
SWIRL_DEDUPE_FIELD | The field to use when detecting and removing duplicates with the DedupeByFieldPostResultProcessor | SWIRL_DEDUPE_FIELD = 'url'
SWIRL_DEDUPE_SIMILARITY_MINIMUM | The minimum similarity score that constitutes a duplicate, when detecting and removing duplicates with the DedupeBySimilarityPostResultProcessor | SWIRL_DEDUPE_SIMILARITY_MINIMUM = 0.95
SWIRL_DEDUPE_SIMILARITY_FIELDS | A list of the fields to use when determining the similarity between documents when detecting and removing duplicates with the DedupeBySimilarityPostResultProcessor | SWIRL_DEDUPE_SIMILARITY_FIELDS = ['title', 'body']
SWIRL_RELEVANCY_CONFIG | Defines the relevancy score weights for important fields | See below
SWIRL_MAX_MATCHES | Configures the maximum number of matches counted for any given result before being cut off. This helps protect against favoring very long articles. | SWIRL_MAX_MATCHES = 5
SWIRL_MIN_SIMILARITY | Configures the minimum threshold at which a query hit in a result will be scored. Lower values will increase recall but lower precision. | SWIRL_MIN_SIMILARITY = 0.54
SWIRL_EXPLAIN | Configures the delivery of relevancy explain structures in Mixed responses | SWIRL_EXPLAIN = False
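
To illustrate what a setting such as SWIRL_DEDUPE_FIELD = 'url' controls, here is a minimal sketch of field-based deduplication. It is a hypothetical illustration of the idea, not the code of SWIRL's DedupeByFieldPostResultProcessor; the function name and the sample results are invented for the example.

```python
def dedupe_by_field(results, field="url"):
    """Keep the first result seen for each distinct value of `field`."""
    seen = set()
    unique = []
    for result in results:
        key = result.get(field)
        # Results without the field cannot be compared, so they are kept.
        if key is None or key not in seen:
            unique.append(result)
        if key is not None:
            seen.add(key)
    return unique


if __name__ == "__main__":
    sample = [
        {"title": "A", "url": "https://example.com/a"},
        {"title": "A (copy)", "url": "https://example.com/a"},
        {"title": "B", "url": "https://example.com/b"},
    ]
    for item in dedupe_by_field(sample):
        print(item["title"])
```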

Example SWIRL_RELEVANCY_CONFIG

SWIRL_RELEVANCY_CONFIG = {
    'title': {
        'weight': 1.5
    },
    'body': {
        'weight': 1.0
    },
    'author': {
        'weight': 1.0
    }
}

Note that all configuration names must be UPPER_CASE per the Django settings convention.
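
Because a typo in this dictionary is easy to miss, a small sanity check can be useful before restarting SWIRL. The sketch below assumes only the shape shown in the example (each field maps to a dictionary with a numeric 'weight'); check_relevancy_config is a hypothetical helper, not part of SWIRL.

```python
from numbers import Real


def check_relevancy_config(config):
    """Return a list of problems found in a SWIRL_RELEVANCY_CONFIG-style dict."""
    problems = []
    for field, options in config.items():
        weight = options.get("weight") if isinstance(options, dict) else None
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(weight, bool) or not isinstance(weight, Real):
            problems.append(f"{field}: 'weight' must be a number")
    return problems


if __name__ == "__main__":
    example = {
        "title": {"weight": 1.5},
        "body": {"weight": 1.0},
        "author": {"weight": 1.0},
    }
    print(check_relevancy_config(example))
```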

Search Expiration Service

The Expirer service can automatically delete Search objects and their associated (linked) Result objects after a specified period of time, so that SWIRL doesn't retain everything ever searched.

Service Frequency

Although you may specify various expiration settings for SWIRL Search and associated Result objects, the service to expire them runs on a regular schedule.

By default, this service runs every hour. The frequency is defined in the Django settings:

from celery.schedules import crontab

CELERY_BEAT_SCHEDULE = {
    # Executes every hour, on the hour
    'expire': {
        'task': 'expirer',
        'schedule': crontab(minute=0, hour='*'),
    },
}

Temporary changes can also be made via the Django Console here:

http://localhost:8000/admin/django_celery_beat/crontabschedule/ 

Django console crontab page

SWIRL AI Connect, Enterprise Edition, supports a 5-minute service expiration schedule. Please contact support for more information.

If you change the crontab entry in the database but not CELERY_BEAT_SCHEDULE, the schedule defined in CELERY_BEAT_SCHEDULE will be restored when SWIRL restarts.

Search Subscriber Service

When one or more Search objects have the subscribe property set to True, SWIRL will periodically update that Search.

By default, the Subscriber service runs every four hours. The frequency is defined in the Django settings:

CELERY_BEAT_SCHEDULE = {
    # Executes every four hours
    'subscribe': {
        'task': 'subscriber',
        'schedule': crontab(minute=0, hour='*/4'),   # minute='*/10'
    },
}

Temporary changes can also be made via the Django Console here:

http://localhost:8000/admin/django_celery_beat/crontabschedule/

Django console crontab page

If you change the crontab entry in the database but not CELERY_BEAT_SCHEDULE, the schedule defined in CELERY_BEAT_SCHEDULE will be restored when SWIRL restarts.

Service Startup & Daemonization

Using swirl.py

For normal operations, use swirl.py to start, stop or restart services. It is located in SWIRL's install directory (along with manage.py).

  • To start services:
python swirl.py start

One or more services may be specified, e.g.:

python swirl.py start celery-beats

  • To check the status of SWIRL:
python swirl.py status

SWIRL will report the current running services, and their pids:

__S_W_I_R_L__2_._6______________________________________________________________

Service: django...RUNNING, pid:1620
Service: celery-worker...RUNNING, pid:1625

  PID TTY           TIME CMD
 1620 ttys000    0:13.26 /Users/erikspears/.pyenv/versions/3.11.5/bin/python3.11 /Users/erikspears/.pyenv/versions/3.11.5/bin/daphne -b 0.0.0.0 -p 8000 swirl_server.asgi:application
 1625 ttys000    0:40.70 /Users/erikspears/.pyenv/versions/3.11.5/bin/python3.11 /Users/erikspears/.pyenv/versions/3.11.5/bin/celery -A swirl_server worker --loglevel INFO

Command successful!

  • To terminate services:
python swirl.py stop

  • To restart services:
python swirl.py restart

One or more services may be specified, e.g.:

python swirl.py restart celery-worker consumer

  • To get help:
python swirl.py help

Customizing

The services invoked by swirl.py are defined in swirl/services.py.

Modify the list to start celery-beats automatically.

Managing Django Users

Django Admin

Most users can be managed through the Django Admin, which is located here:

http://localhost:8000/admin/

View this YouTube Video on Django Administration for more information.

To Change a User's Password

python manage.py changepassword <user_name>

Management Tools

Django Console

Django has a built-in web UI for managing users, groups, crontabs, and more.

Django console

The URL is: http://localhost:8000/admin/

View these YouTube Videos on how to use Django Admin for more information.

Django dbshell

Django has a built-in shell for managing the database. You can run it in the swirl-search directory as follows:

./manage.py dbshell

Wiping the Database

python manage.py flush

All SWIRL objects will be deleted, once you confirm.

You must create a new SWIRL Super User after doing this.

sqlite-web

The open source sqlite-web project offers a solid web GUI!

pip install sqlite-web
sqlite_web my_database.db                   # makes it run locally http://localhost:8080/
sqlite_web --host 0.0.0.0 my_database.db    # run it on the lan

Don't forget to give it the path to db.sqlite3 in swirl-search when you invoke it.

Database Migration

If you change almost anything in swirl/models.py, you will have to perform a database migration. Detailing this process is beyond the scope of this document. For the most part, all you have to do is:

python swirl.py migrate

For more information: https://docs.djangoproject.com/en/4.0/topics/migrations/

  • Migration is usually simple/easy if you are just adding fields or changing defaults
  • It's a good idea to delete existing data before doing anything drastic like changing the name of an id or relationship - use sqlite-web (see above)!
  • If things go wrong:
    • Delete db.sqlite3
    • Delete all swirl/migrations/
    • Run: python manage.py flush
    • Then, repeat this process

Don't forget to create a SWIRL Super User after flushing the database!

Configuring Django

There are many values you can configure in swirl_server/settings.py.

These include:

  • Hostname
  • Protocol
# PUT the FQDN first in the list below
ALLOWED_HOSTS = ['localhost']
HOSTNAME = ALLOWED_HOSTS[0]
PROTOCOL = 'http'

The FQDN that SWIRL should listen on must be the first entry in the ALLOWED_HOSTS list.

  • Time Zone
TIME_ZONE = 'US/Eastern'
...
CELERY_TIMEZONE = "US/Eastern"
CELERY_TIME_ZONE = "US/Eastern"

  • Celery Beats

The configuration for Celery-Beats is also here, in case you are using the Search Expiration Service or the Search Subscriber Service:

CELERY_BEAT_SCHEDULE = {
    # Executes every hour
    'expire': {
        'task': 'expirer',
        'schedule': crontab(minute=0, hour='*'),
    },
}

  • Database Provider
DATABASES = {
    "default": {
        "ENGINE": os.environ.get("SQL_ENGINE", "django.db.backends.sqlite3"),
        "NAME": os.environ.get("SQL_DATABASE", BASE_DIR / "db.sqlite3"),
        "USER": os.environ.get("SQL_USER", "user"),
        "PASSWORD": os.environ.get("SQL_PASSWORD", "password"),
        "HOST": os.environ.get("SQL_HOST", "localhost"),
        "PORT": os.environ.get("SQL_PORT", "5432"),
    }
}

To configure PostgreSQL as the Django back-end:

  1. Install PostgreSQL (if not already installed)
  2. Ensure that pg_config from the PostgreSQL distribution is in the PATH and runs from the command line
  3. Install the psycopg2 driver:
    pip install psycopg2
    

    Then follow the appropriate section of Configuring Django Database Back-Ends.

  4. Uncomment the PostgreSQL Connector in the following modules:
    • swirl/connectors/__init__.py
      # uncomment this to enable PostgreSQL
      # from swirl.connectors.postgresql import PostgreSQL
      
    • swirl/models.py
      CONNECTOR_CHOICES = [
       ('ChatGPT', 'ChatGPT Query String'),
       ('RequestsGet', 'HTTP/GET returning JSON'),
       ('RequestsPost', 'HTTP/POST returning JSON'),
       ('Elastic', 'Elasticsearch Query String'),
       ('OpenSearch', 'OpenSearch Query String'),
       # Uncomment the line below to enable PostgreSQL
       # ('PostgreSQL', 'PostgreSQL'),
       ('BigQuery', 'Google BigQuery'),
       ('Sqlite3', 'Sqlite3'),
       ('M365OutlookMessages', 'M365 Outlook Messages'),
       ('M365OneDrive', 'M365 One Drive'),
       ('M365OutlookCalendar', 'M365 Outlook Calendar'),
       ('M365SharePointSites', 'M365 SharePoint Sites'),
       ('MicrosoftTeams', 'Microsoft Teams'),
      ]
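
Since the DATABASES block shown earlier reads its values from the environment, switching to PostgreSQL is largely a matter of setting the SQL_* entries in .env. The sketch below mirrors that block with a plain dictionary in place of os.environ so the effect is easy to see. The engine name django.db.backends.postgresql is Django's standard PostgreSQL backend; the database name, user, password and the build_databases helper are placeholders invented for illustration.

```python
from pathlib import Path

# Stand-in for the BASE_DIR defined in settings.py.
BASE_DIR = Path(".")


def build_databases(env):
    """Mirror the DATABASES block, reading from `env` instead of os.environ."""
    return {
        "default": {
            "ENGINE": env.get("SQL_ENGINE", "django.db.backends.sqlite3"),
            "NAME": env.get("SQL_DATABASE", BASE_DIR / "db.sqlite3"),
            "USER": env.get("SQL_USER", "user"),
            "PASSWORD": env.get("SQL_PASSWORD", "password"),
            "HOST": env.get("SQL_HOST", "localhost"),
            "PORT": env.get("SQL_PORT", "5432"),
        }
    }


# Example .env values for PostgreSQL; all values except the engine name
# are placeholders.
POSTGRES_ENV = {
    "SQL_ENGINE": "django.db.backends.postgresql",
    "SQL_DATABASE": "swirl",
    "SQL_USER": "swirl_user",
    "SQL_PASSWORD": "change-me",
    "SQL_HOST": "localhost",
    "SQL_PORT": "5432",
}

if __name__ == "__main__":
    print(build_databases(POSTGRES_ENV)["default"]["ENGINE"])
```

With no SQL_* values set, the same function falls back to the SQLite defaults, which matches the behavior of a fresh install.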
      

Configuring Celery & Redis

Celery is used to execute a SWIRL metasearch. It uses Redis as a result back-end for asynchronous operation. Both of these systems must be configured correctly.

Celery is configured in at least three locations. They must be the same!

  1. swirl_server/celery.py
    app = Celery('swirl_server', 
              broker='redis://localhost:6379/0', 
              backend='redis://localhost:6379/0')
    

    If this is set up correctly, you should see the backend setting appear when you run Celery from the command line:

    > transport:   amqp://guest:**@localhost:6379//
    - ** ---------- .> results:     rpc://
    
  2. swirl_server/settings.py (Django settings):
# Celery Configuration Options
CELERY_TIMEZONE = 'US/Eastern'
CELERY_TIME_ZONE = 'US/Eastern'
CELERY_TASK_TRACK_STARTED = True
CELERY_TASK_TIME_LIMIT = 30 * 60
CELERY_BEAT_SCHEDULE = {
    # Executes every hour
    'expire': {
         'task': 'expirer',
         'schedule': crontab(minute=0,hour='*'),
        },
    # Executes every four hours
    'subscribe': {
         'task': 'subscriber',
         'schedule': crontab(minute=0,hour='*/4'),   # minute='*/10'
        },
}
CELERY_BROKER_URL = 'redis://localhost:6379/0'
# CELERY_BROKER_URL = 'amqp://guest:guest@localhost:6379//'

CELERY_RESULT_BACKEND = 'redis://localhost:6379/0'
# CELERY_RESULT_BACKEND='rpc://'

The settings.py file also contains the configuration for the Search Expiration Service and the Search Subscriber Service.
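
Because the broker and result back-end URLs are repeated in more than one file, they can drift apart. The following sketch compares the values from swirl_server/celery.py and swirl_server/settings.py once they have been copied into two dictionaries; find_celery_mismatches is a hypothetical helper, not something SWIRL provides.

```python
def find_celery_mismatches(celery_py, settings_py):
    """Compare broker/backend URLs taken from celery.py and settings.py.

    Both arguments are dicts with 'broker' and 'backend' keys.
    Returns a list of human-readable mismatch descriptions.
    """
    mismatches = []
    for key in ("broker", "backend"):
        left, right = celery_py.get(key), settings_py.get(key)
        if left != right:
            mismatches.append(
                f"{key}: celery.py has {left!r}, settings.py has {right!r}"
            )
    return mismatches


if __name__ == "__main__":
    celery_py = {
        "broker": "redis://localhost:6379/0",
        "backend": "redis://localhost:6379/0",
    }
    settings_py = {
        "broker": "redis://localhost:6379/0",   # CELERY_BROKER_URL
        "backend": "redis://localhost:6379/0",  # CELERY_RESULT_BACKEND
    }
    print(find_celery_mismatches(celery_py, settings_py) or "Settings match")
```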

Security

The Django Secret Key

In swirl_server/settings.py, there is a configuration item for a SECRET_KEY. Changing it is low-risk: the only visible effect is that active users will have to log in again.

To generate a new key to replace the one that is in the repo:

python -c "import secrets; print(secrets.token_urlsafe())"

Read more about the Django Secret Key

SWIRL User & Group Support

You can use Django's built-in authentication support, which adds User and Group objects, or implement your own. The following sections detail how to access these.

URL | Explanation
/swirl/users/ | List User objects; create a new one using the form at bottom
/swirl/users/id/ | Retrieve a User object; destroy it using the Delete button; edit it using the form at bottom

URL | Explanation
/swirl/groups/ | List Group objects; create a new one using the form at bottom
/swirl/groups/id/ | Retrieve a Group object; destroy it using the Delete button; edit it using the form at bottom

You can also edit these tables using the Django Console:

Django console user object

For more information, see: User authentication in Django