sam

how do I amend stop words in postgres for english?

Researcher_bot

I’ll search for information on how to amend stop words in PostgreSQL for English text search.

Based on the documentation, here’s how to amend stop words in PostgreSQL for English:

Methods to Modify Stop Words

1. Create a Custom Dictionary with Modified Stop Words

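The usual approach is to create a new Snowball dictionary that stems English words the same way as the built-in english_stem dictionary but reads its stop words from a custom file:

CREATE TEXT SEARCH DICTIONARY custom_english_dict (
    TEMPLATE = snowball,
    Language = english,
    STOPWORDS = my_custom_stopwords
);

Create the stop words file $SHAREDIR/tsearch_data/my_custom_stopwords.stop before running this statement, because PostgreSQL reads it when the dictionary is created. The STOPWORDS value is the file's base name, without the .stop extension[1]. A convenient starting point is a copy of the stock list, $SHAREDIR/tsearch_data/english.stop, with words added or removed.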

To find your $SHAREDIR location:

pg_config --sharedir

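Common locations are /usr/share/postgresql/<version>/ and /usr/local/share/postgresql/. Run the command on the database server host, since the file has to live on the server, not on the client.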
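
If you only have a SQL connection, a superuser can read the same path from the pg_config system view (available since PostgreSQL 9.6 and readable only by superusers by default). A minimal sketch:

```sql
-- Server-side equivalent of `pg_config --sharedir`, with the
-- tsearch_data subdirectory (where .stop files live) appended.
SELECT setting || '/tsearch_data' AS tsearch_data_dir
FROM pg_config
WHERE name = 'SHAREDIR';
```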

Stop words file format:

  • One word per line
  • Blank lines and trailing spaces are ignored
  • Upper case is folded to lower case
  • Plain text, UTF-8 encoded[1]
  • Words are matched before stemming, so list every form you want dropped (e.g. both “be” and “being”)
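
To decide which words to add or remove, you can check how a dictionary treats a word with ts_lexize: it returns an empty array for a stop word and the stemmed lexeme otherwise. This sketch queries the built-in english_stem dictionary; once your own dictionary exists, pass its name (for example custom_english_dict) instead to confirm your edits took effect.

```sql
-- An empty array means the word is discarded as a stop word.
SELECT ts_lexize('english_stem', 'the');    -- {}
-- A non-stop word comes back in stemmed form.
SELECT ts_lexize('english_stem', 'stars');  -- {star}
```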

2. Create a Dictionary Without Stop Words

If you want to eliminate stop word filtering entirely:

CREATE TEXT SEARCH DICTIONARY english_stem_nostop (
    TEMPLATE = snowball,
    Language = english
);

Note: Omitting the STOPWORDS parameter disables stop word filtering[2]. The dictionary still has to be bound to a configuration, in the same way as in step 3.

3. Apply Your Custom Dictionary to a Configuration

After creating your custom dictionary, you need to bind it to a text search configuration:

-- Create a custom configuration
CREATE TEXT SEARCH CONFIGURATION my_english (COPY = english);

-- Modify the mapping to use your custom dictionary
ALTER TEXT SEARCH CONFIGURATION my_english
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart
    WITH custom_english_dict, english_stem;
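
To see which dictionary handles each token and what it emits, ts_debug is useful. This sketch runs against the built-in english configuration so it works on any installation; substitute my_english to inspect your custom mapping.

```sql
-- One row per token: the dictionary that handled it and the resulting lexemes.
-- lexemes = {} marks a stop word; NULL means no dictionary applied (e.g. spaces).
SELECT alias, token, dictionary, lexemes
FROM ts_debug('english', 'The cats sat');
```

To use my_english without naming it in every call, you can make it the session default with SET default_text_search_config = 'public.my_english'; (this assumes the configuration was created in the public schema).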

4. Reload Dictionary Changes

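A session reads a dictionary's files only once, so after editing a stop words file, force existing sessions to re-read it:

ALTER TEXT SEARCH DICTIONARY custom_english_dict ( dummy );

This is a “dummy” update that changes nothing (removing an option that was never set is not an error), but any ALTER on the dictionary makes existing sessions reload its files[1].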

Important Notes

  • File Encoding: Stop words files must be stored in UTF-8 encoding[1]
  • Placement: List narrower dictionaries (such as synonym or thesaurus dictionaries) before broad ones; a Snowball stemmer recognizes every word, so any dictionary listed after it is never consulted[1]
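  • Reindexing: Small changes such as adding or removing stop words don't force reindexing (unlike thesaurus changes)[1]. Existing tsvector values still reflect the old list, though, so a word you remove from the list won't match rows indexed before the change until their vectors are recomputed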
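
If existing rows should follow the amended list, recompute what was stored under the old one. A minimal sketch, assuming a hypothetical documents table with a precomputed tsvector column tsv (GIN-indexed) plus a hypothetical expression index; the built-in english configuration stands in for my_english so the example runs as-is. Updating the column also maintains its index, whereas an expression index is only rebuilt by REINDEX.

```sql
-- Hypothetical table: tsv holds the precomputed search vector for body.
CREATE TEMP TABLE documents (
    id   int PRIMARY KEY,
    body text NOT NULL,
    tsv  tsvector
);
CREATE INDEX documents_tsv_idx ON documents USING GIN (tsv);
-- Hypothetical expression index computed directly from body.
CREATE INDEX documents_body_fts_idx ON documents USING GIN (to_tsvector('english', body));

INSERT INTO documents (id, body) VALUES (1, 'The cat sat on the mat');

-- After amending the stop words file and reloading the dictionary:
-- 1. Recompute stored vectors; the UPDATE also maintains documents_tsv_idx.
UPDATE documents SET tsv = to_tsvector('english', body);
-- 2. Expression indexes keep their old entries, so rebuild them.
REINDEX INDEX documents_body_fts_idx;
```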

  1. https://www.postgresql.org/docs/current/textsearch-dictionaries.html

  2. https://stackoverflow.com/questions/1497895/can-i-configure-postgresql-programmatically-to-not-eliminate-stop-words-in-full