FTS

pyinaturalist_convert.fts

Build and search full text search tables for taxa and observations using SQLite FTS5.

Extra dependencies: sqlalchemy (only for building the database; not required for searches)

Taxon Autocomplete

TaxonAutocompleter works similarly to the API endpoint get_taxa_autocomplete(), which powers the taxon autocomplete feature on inaturalist.org:

../_images/inat-taxon-autocomplete.png

Build database with all taxa from GBIF archive:

>>> from pyinaturalist_convert import (
...     aggregate_taxon_db, enable_logging, load_dwca_tables, load_fts_taxa
... )

>>> # Optional, but recommended:
>>> enable_logging()
>>> load_dwca_tables()
>>> aggregate_taxon_db()

>>> # Load FTS table for all languages (defaults to English names only):
>>> load_fts_taxa(languages='all')

Note

Running aggregate_taxon_db() is optional. It makes search rankings more accurate by basing them on taxon counts, but it takes a couple of hours to complete.
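
To illustrate why taxon counts improve rankings, here is a minimal sketch using only the standard library sqlite3 module. It is not the library's implementation: the table name, column names, and counts are made up for illustration, and it assumes your Python build includes the SQLite FTS5 extension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: obs_count is stored alongside the name but not tokenized
conn.execute(
    "CREATE VIRTUAL TABLE taxon_names USING fts5("
    "name, taxon_id UNINDEXED, obs_count UNINDEXED)"
)
conn.executemany(
    "INSERT INTO taxon_names (name, taxon_id, obs_count) VALUES (?, ?, ?)",
    [
        ("Avesicaria", 1043988, 12),  # made-up count
        ("Aves", 3, 1000000),         # made-up count
    ],
)

# Prefix match on the partial input, then put the most-observed taxa first;
# FTS5's built-in 'rank' column breaks ties by text relevance.
rows = conn.execute(
    "SELECT name, CAST(taxon_id AS INTEGER) FROM taxon_names "
    "WHERE name MATCH ? "
    "ORDER BY CAST(obs_count AS INTEGER) DESC, rank",
    ("ave*",),
).fetchall()
names = [name for name, _ in rows]
```

Without the counts, both names match the prefix equally well, so a rarely observed taxon could be listed ahead of a common one.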

Search taxa:

>>> from pyinaturalist_convert import TaxonAutocompleter

>>> ta = TaxonAutocompleter()

>>> # Search by scientific name
>>> ta.search('aves')
[
    Taxon(id=3, name='Aves'),
    Taxon(id=1043988, name='Avesicaria'),
    ...,
]

>>> # Or by common name
>>> ta.search('frill')
[
    Taxon(id=56447, name='Acid Frillwort'),
    Taxon(id=614339, name='Antilles Frillfin'),
    ...,
]

>>> # Or by common name in a specific language
>>> ta.search('flughund', language='german')
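
For readers curious about what an autocomplete query can look like underneath, the following is a rough sketch with plain sqlite3 and FTS5. The schema, the to_prefix_query() helper, and the search() function are hypothetical stand-ins, not the code TaxonAutocompleter runs, and the German row uses a placeholder ID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema for illustration only
conn.execute(
    "CREATE VIRTUAL TABLE taxon_names USING fts5("
    "name, taxon_id UNINDEXED, language UNINDEXED)"
)
conn.executemany(
    "INSERT INTO taxon_names (name, taxon_id, language) VALUES (?, ?, ?)",
    [
        ("Acid Frillwort", 56447, "english"),
        ("Antilles Frillfin", 614339, "english"),
        ("Flughunde", 1, "german"),  # placeholder ID, not a real taxon ID
    ],
)


def to_prefix_query(text: str) -> str:
    """Quote user input as an FTS5 string and mark it as a prefix."""
    escaped = text.replace('"', '""')
    return f'"{escaped}"*'


def search(q: str, language: str = "english"):
    """Return (taxon_id, name) pairs whose names contain a token starting with q."""
    rows = conn.execute(
        "SELECT CAST(taxon_id AS INTEGER), name FROM taxon_names "
        "WHERE name MATCH ? AND language = ? ORDER BY rank",
        (to_prefix_query(q), language),
    )
    return rows.fetchall()


english = search("frill")
german = search("flughund", language="german")
```

Quoting the input before appending `*` keeps characters that FTS5 treats as query syntax from being interpreted as operators.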

Observation Autocomplete

ObservationAutocompleter provides observation text search features that are not available in the web UI.

Query all of your own observations:

>>> from pyinaturalist import iNatClient

>>> client = iNatClient()
>>> observations = client.observations.search(user_id='my_username').all()

Create table and index observations:

>>> from pyinaturalist_convert import create_observation_fts_table, index_observation_text

>>> create_observation_fts_table()
>>> index_observation_text(observations)

Search observations:

>>> from pyinaturalist_convert import ObservationAutocompleter

>>> obs_autocompleter = ObservationAutocompleter()
>>> obs_autocompleter.search('test')
[
    (12345, 'test description text'),
    (67890, 'test comment text'),
]

Main classes and functions:

ObservationAutocompleter

Observation autocomplete search.

TaxonAutocompleter

Taxon autocomplete search.

TextField

create_taxon_fts_table

Create a SQLite FTS5 table for taxonomic names

create_observation_fts_table

Create a SQLite FTS5 table for observation text

index_observation_text

Index observation text in FTS table: descriptions, places, comments, and identification comments.

load_fts_taxa

Create full text search tables for taxonomic names.

class pyinaturalist_convert.fts.ObservationAutocompleter(db_path=PosixPath('/home/docs/.local/share/pyinaturalist/observations.db'), limit=10, truncate_match_chars=50)

Bases: object

Observation autocomplete search. Runs full text search on observation descriptions, comments, identification comments, and place names.

Parameters:
  • db_path (Union[Path, str]) – Path to SQLite database; uses platform-specific data directory by default

  • limit (int) – Maximum number of results to return per query. Set to -1 to disable.

  • truncate_match_chars (int) – Truncate matched text to this many characters. Set to -1 to disable.
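
The "-1 to disable" convention for limit and truncate_match_chars can be sketched as follows. This is an illustration under assumptions, not the library's code: truncate() and fetch() are hypothetical helpers, and the trailing "..." is an arbitrary choice. The limit part relies on documented SQLite behavior, where a negative LIMIT means no upper bound.

```python
import sqlite3


def truncate(text: str, max_chars: int) -> str:
    """Hypothetical helper: shorten text to max_chars; -1 leaves it unchanged."""
    if max_chars < 0 or len(text) <= max_chars:
        return text
    return text[:max_chars] + "..."


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO notes (body) VALUES (?)",
    [(f"note {i}",) for i in range(25)],
)


def fetch(limit: int):
    """Return up to `limit` rows; a negative limit returns every row."""
    return conn.execute("SELECT id, body FROM notes LIMIT ?", (limit,)).fetchall()


limited = fetch(10)
unlimited = fetch(-1)
```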

search(q, fields=None)

Search for observations by text.

Parameters:
  • q (str) – Search query

  • fields (Optional[List[TextField]]) – Specific text fields to search (description, comment, identification, and/or place). If not specified, all fields will be searched.

Return type:

List[Tuple[int, str]]

Returns:

Tuples of (observation_id, truncated_matched_text)
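
One way to picture the fields filter is the sketch below. It assumes a hypothetical FTS5 table that stores a field type next to each piece of text; the actual table layout may differ. The TextField class here is a local stand-in that mirrors the enum values documented on this page.

```python
import sqlite3
from enum import Enum
from typing import List, Optional, Tuple


class TextField(Enum):
    """Local stand-in mirroring the documented enum values."""
    DESCRIPTION = 1
    COMMENT = 2
    IDENTIFICATION = 3
    PLACE = 4


conn = sqlite3.connect(":memory:")
# Hypothetical schema for illustration only
conn.execute(
    "CREATE VIRTUAL TABLE observation_text USING fts5("
    "body, observation_id UNINDEXED, field_type UNINDEXED)"
)
conn.executemany(
    "INSERT INTO observation_text (body, observation_id, field_type) VALUES (?, ?, ?)",
    [
        ("test description text", 12345, TextField.DESCRIPTION.value),
        ("test comment text", 67890, TextField.COMMENT.value),
    ],
)


def search(q: str, fields: Optional[List[TextField]] = None) -> List[Tuple[int, str]]:
    """Full text search, optionally restricted to specific field types."""
    fields = fields or list(TextField)
    placeholders = ", ".join("?" for _ in fields)
    sql = (
        "SELECT CAST(observation_id AS INTEGER), body FROM observation_text "
        f"WHERE body MATCH ? AND CAST(field_type AS INTEGER) IN ({placeholders})"
    )
    params = [q] + [field.value for field in fields]
    return conn.execute(sql, params).fetchall()


all_matches = search("test")
comments_only = search("test", fields=[TextField.COMMENT])
```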

class pyinaturalist_convert.fts.TaxonAutocompleter(db_path=PosixPath('/home/docs/.local/share/pyinaturalist/observations.db'), limit=10)

Bases: object

Taxon autocomplete search. Runs full text search on taxon scientific and common names.

Parameters:
  • db_path (Union[Path, str]) – Path to SQLite database; uses platform-specific data directory by default

  • limit (int) – Maximum number of results to return per query. Set to -1 to disable.

search(q, language='en')

Search for taxa by scientific and/or common name.

Parameters:
  • q (str) – Search query

  • language (str) – Language code for common names

Return type:

List[Taxon]

Returns:

Taxon objects (with ID and name only)

class pyinaturalist_convert.fts.TextField(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: Enum

COMMENT = 2
DESCRIPTION = 1
IDENTIFICATION = 3
PLACE = 4

pyinaturalist_convert.fts.create_observation_fts_table(db_path=PosixPath('/home/docs/.local/share/pyinaturalist/observations.db'))

Create a SQLite FTS5 table for observation text

Parameters:

db_path (Union[Path, str]) – Path to SQLite database

pyinaturalist_convert.fts.create_taxon_fts_table(db_path=PosixPath('/home/docs/.local/share/pyinaturalist/observations.db'))

Create a SQLite FTS5 table for taxonomic names

Parameters:

db_path (Union[Path, str]) – Path to SQLite database

pyinaturalist_convert.fts.index_observation_text(observations, db_path=PosixPath('/home/docs/.local/share/pyinaturalist/observations.db'))

Index observation text in FTS table: descriptions, places, comments, and identification comments. Replaces any previously indexed text associated with these observations.
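
The "replaces previously indexed text" behavior can be approximated with a delete-then-insert inside one transaction. The sketch below assumes a hypothetical two-column FTS5 table and a hypothetical index_text() helper; it shows the general technique rather than this function's actual implementation.

```python
import sqlite3
from typing import Iterable, Tuple

conn = sqlite3.connect(":memory:")
# Hypothetical schema for illustration only
conn.execute(
    "CREATE VIRTUAL TABLE observation_text USING fts5(body, observation_id UNINDEXED)"
)


def index_text(rows: Iterable[Tuple[int, str]]) -> None:
    """Index (observation_id, text) pairs, replacing older text for the same IDs."""
    rows = list(rows)
    ids = {obs_id for obs_id, _ in rows}
    with conn:  # commit both steps together, or roll back on error
        conn.executemany(
            "DELETE FROM observation_text WHERE CAST(observation_id AS INTEGER) = ?",
            [(obs_id,) for obs_id in ids],
        )
        conn.executemany(
            "INSERT INTO observation_text (body, observation_id) VALUES (?, ?)",
            [(body, obs_id) for obs_id, body in rows],
        )


index_text([(12345, "first draft of the description")])
index_text([(12345, "test description text")])  # replaces the earlier row

total = conn.execute("SELECT COUNT(*) FROM observation_text").fetchone()[0]
stale = conn.execute(
    "SELECT body FROM observation_text WHERE body MATCH 'draft'"
).fetchall()
fresh = conn.execute(
    "SELECT body FROM observation_text WHERE body MATCH 'test'"
).fetchall()
```

Re-indexing the same observations is therefore safe to repeat: stale text no longer matches, and no duplicate rows accumulate.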

Parameters:
  • observations – Observations to index

  • db_path (Union[Path, str]) – Path to SQLite database

pyinaturalist_convert.fts.load_fts_taxa(csv_dir=PosixPath('/home/docs/.local/share/pyinaturalist/inaturalist-taxonomy.dwca'), db_path=PosixPath('/home/docs/.local/share/pyinaturalist/observations.db'), counts_path=PosixPath('/home/docs/.local/share/pyinaturalist/taxon_counts.parquet'), languages=('english',))

Create full text search tables for taxonomic names. Requires SQLite FTS5 extension and the iNat taxonomy DwC-A archive.

Parameters:
  • csv_dir (Union[Path, str]) – Directory containing extracted CSV files

  • db_path (Union[Path, str]) – Path to SQLite database

  • counts_path (Union[Path, str]) – Path to previously calculated taxon counts (from aggregate_taxon_db())

  • languages (Iterable[str]) – List of common name languages to load, or ‘all’ to load everything

pyinaturalist_convert.fts.optimize_fts_table(table, db_path=PosixPath('/home/docs/.local/share/pyinaturalist/observations.db'))

Some final cleanup after loading a text search table
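
For context, FTS5 itself offers an 'optimize' command that merges the index's internal b-trees into one, which can reduce size and speed up queries after a bulk load. Whether optimize_fts_table() issues exactly this command is an assumption; the sketch below only demonstrates the FTS5 feature, with a hypothetical table and helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table for illustration only
conn.execute("CREATE VIRTUAL TABLE taxon_names USING fts5(name)")
conn.executemany(
    "INSERT INTO taxon_names (name) VALUES (?)",
    [("Aves",), ("Avesicaria",)],
)
conn.commit()


def optimize(table: str) -> None:
    """Run the FTS5 'optimize' command on a table.

    The table name is interpolated into the SQL, so only pass trusted identifiers.
    """
    conn.execute(f"INSERT INTO {table}({table}) VALUES('optimize')")
    conn.commit()


optimize("taxon_names")

# Searches behave the same after optimizing; only the index layout changes
matches = conn.execute(
    "SELECT name FROM taxon_names WHERE name MATCH 'aves*'"
).fetchall()
```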