RT 4.2.17 Documentation

Full text indexing

Go to latest version →

NAME

Full text indexing in RT

LIMITATIONS

While all of the below solutions can search for Unicode characters, they are not otherwise Unicode aware, and do no case folding, normalization, or the like. That is, a string that contains U+0065 LATIN SMALL LETTER E followed by U+0301 COMBINING ACUTE ACCENT will not match a search for U+00E9 LATIN SMALL LETTER E WITH ACUTE. They also only know how to tokenize latin-1-ish languages where words are separated by whitespace or similar characters; as such, support for searching for Japanese and Chinese content is extremely limited.

POSTGRES

Creating and configuring the index

Postgres 8.3 and above support full-text searching natively; to set up the required ts_vector column, and create either a GiN or GiST index on it, run:

    /opt/rt4/sbin/rt-setup-fulltext-index

If you have a non-standard database administrator username or password, you may need to pass the --dba or --dba-password options:

    /opt/rt4/sbin/rt-setup-fulltext-index --dba postgres --dba-password secret

This will then tokenize and index all existing attachments in your database; it may take quite a while if your database already has a large number of tickets in it.

Finally, it will output an appropriate %FullTextSearch configuration to add to your RT_SiteConfig.pm; you will need to restart your webserver after making these changes.

Updating the index

To keep the index up-to-date, you will need to run:

    /opt/rt4/sbin/rt-fulltext-indexer

...at regular intervals. By default, this will only tokenize up to 200 tickets at a time; you can adjust this upwards by passing --limit 500. Larger batch sizes will take longer and consume more memory.

If there is already an instances of rt-fulltext-indexer running, new ones will exit abnormally (with exit code 1) and the error message "rt-fulltext-indexer is already running." You can suppress this message and end those processes normally (with exit code 0) using the --quiet option; this is particularly useful when running the command via cron:

    /opt/rt4/sbin/rt-fulltext-indexer --quiet

MYSQL

On MySQL, full-text search can either be done using native support (which may use MyISAM tables on pre-5.6 versions of MySQL), or RT can integrate with the external Sphinx full-text search engine.

Native MySQL

As RT marks attachment data as BINARY, MySQL cannot index this content without creating an additional table. To create the required table (which is InnoDB on versions of MySQL which support it), run:

    /opt/rt4/sbin/rt-setup-fulltext-index

If you have a non-standard database administrator username or password, you may need to pass the --dba or --dba-password options:

    /opt/rt4/sbin/rt-setup-fulltext-index --dba root --dba-password secret

This will then tokenize and index all existing attachments in your database; it may take quite a while if your database already has a large number of tickets in it.

Finally, it will output an appropriate %FullTextSearch configuration to add to your RT_SiteConfig.pm; you will need to restart your webserver after making these changes.

Updating the index

To keep the index up-to-date, you will need to run:

    /opt/rt4/sbin/rt-fulltext-indexer

...at regular intervals. By default, this will only tokenize up to 200 tickets at a time; you can adjust this upwards by passing --limit 500. Larger batch sizes will take longer and consume more memory.

If there is already an instances of rt-fulltext-indexer running, new ones will exit abnormally (with exit code 1) and the error message "rt-fulltext-indexer is already running." You can suppress this message and end those processes normally (with exit code 0) using the --quiet option; this is particularly useful when running the command via cron:

    /opt/rt4/sbin/rt-fulltext-indexer --quiet

Caveats

Searching is done in "boolean mode." As such, the TicketSQL query Content LIKE 'winter 2014' will return tickets with transactions that contain either word. To find transactions which contain both (but not necessarily adjacent), use Content LIKE '+winter +2014'. To find transactions containing the precise phrase, use Content LIKE '"winter 2014".

See the mysql documentation, at http://dev.mysql.com/doc/refman/5.6/en/fulltext-boolean.html, for a list of the full capabilities.

MySQL with Sphinx

RT can also integrate with the external Sphinx engine, available from http://sphinxsearch.com. Unfortunately, Sphinx integration (using SphinxSE) does require that you recompile MySQL from source. Most distribution-provided packages for MySQL do not include SphinxSE integration, merely the external Sphinx tools; these are not sufficient for RT's needs.

Compiling MySQL and SphinxSE

MySQL 5.1 supports adding pluggable storage engines; after compiling against the appropriate version of MySQL, the ha_sphinx.so file is the only that needs to be installed in production, generally into /usr/lib/mysql/plugin/. It can then be enabled via:

    INSTALL PLUGIN Sphinx SONAME "ha_sphinx.so"

Sphinx versions 0.9.x and 2.0.x are known-working versions, but later versions may work as well. Complete compilation and installation instructions for MySQL with SphinxSE can be found at http://sphinxsearch.com/docs/current.html#sphinxse-mysql51.

Creating and configuring the index

Once MySQL has been recompiled with SphinxSE, and Sphinx itself is installed, you may create the required SphinxSE communication table via:

    /opt/rt4/sbin/rt-setup-fulltext-index

If you have a non-standard database administrator username or password, you may need to pass the --dba or --dba-password options:

    /opt/rt4/sbin/rt-setup-fulltext-index --dba root --dba-password secret

This will also provide you with the appropriate %FullTextSearch configuration to add to your RT_SiteConfig.pm; you will need to restart your webserver after making these changes. It will also print a sample Sphinx configuration, which should be placed in /etc/sphinx.conf, or equivalent.

To fill the index, you will need to run the indexer command-line tool provided by Sphinx:

    indexer rt

Finally, start the Sphinx search daemon:

    searchd

Updating the index

To keep the index up-to-date, you will need to run:

    indexer rt --rotate

...at regular intervals in order to pick up new and updated attachments from RT's database. Failure to do so will result in stale data.

Caveats

RT's integration with Sphinx relies on the use of a special index; there exist queries where the MySQL optimizer elects to not use that index, instead electing to scan the table, which causes no results to be returned. However, this is rare, and generally only occurs on complex queries.

Sphinx also only returns a finite number of matches to any query; this number is controlled by max_matches in /etc/sphinx.conf and %FullTextSearch's MaxMatches in RT_SiteConfig.pm, which must be kept in sync. The default, set during rt-setup-fulltext-index, is 10000. This limit may lead to false negatives in search results if the maximum number of matches is reached but the results returned do not match RT's other criteria. However, a too-large value will notably degrade performance, as it adds memory allocation overhead to every query.

Take, for example, the instance where Sphinx is configured to return a maximum of three results, and tickets 1, 2, 3, 4, and 5 contain the string "target", but only ticket 5 is in status "Open". A search for Content LIKE 'target' AND Status = 'Open' may return no results, despite ticket 5 matching those criteria, as Sphinx will only return tickets 1, 2, and 3 as possible matches.

After index creation, altering MaxMatches in RT_SiteConfig.pm is insufficient to adjust this limit; both max_matches in /etc/sphinx.conf and %FullTextSearch's MaxMatches in RT_SiteConfig.pm must be updated.

ORACLE

Creating and configuring the index

Oracle supports full-text indexing natively using the Oracle Text package. Once Oracle Text is installed and configured, run:

    /opt/rt4/sbin/rt-setup-fulltext-index

If you have a non-standard database administrator username or password, you may need to pass the --dba or --dba-password options:

    /opt/rt4/sbin/rt-setup-fulltext-index --dba sysdba --dba-password secret

This will create an Oracle CONTEXT index on the Content column in the Attachments table, as well as several preferences, functions and triggers to support this index. The script will also output an appropriate %FullTextSearch configuration to add to your RT_SiteConfig.

Updating the index

To update the index, you will need to run the following at regular intervals:

    /opt/rt4/sbin/rt-fulltext-indexer

This, in effect, simply runs:

    begin
    ctx_ddl.sync_index('rt_fts_index', '2M');
    end;

The amount of memory used for the sync can be controlled with the --memory option:

    /opt/rt4/sbin/rt-fulltext-indexer --memory 10M

If there is already an instance of rt-fulltext-indexer running, new ones will exit abnormally (with exit code 1) and the error message "rt-fulltext-indexer is already running." You can suppress this message and end those processes normally (with exit code 0) using the --quiet option; this is particularly useful when running the command via cron:

    /opt/rt4/sbin/rt-fulltext-indexer --quiet

Instead of being run via cron, this may instead be run via a DBMS_JOB; read the Managing DML Operations for a CONTEXT Index chapter of Oracle's Text Application Developer's Guide for details how to keep the index optimized, perform garbage collection, and other tasks.

UNINDEXED SEARCH

It is also possible to enable full-text search without database indexing support, simply by setting the Enable key of the %FullTextSearch option to 1, while leaving Indexed set to 0:

    Set(%FullTextSearch,
        Enable  => 1,
        Indexed => 0,
    );

This is not generally suggested, as unindexed full-text searching can cause severe performance problems.

LIMIT ATTACHMENT SIZE

On some systems, very large attachments can cause memory and other performance issues for the indexer making it unable to complete indexing. See "$MaxFulltextAttachmentSize" in RT_Config.pm for details on setting a maximum attachment size to index.

← Back to index