Taxonomy

pyinaturalist_convert.taxonomy

Helper utilities for navigating tabular taxonomy data as a tree and adding derived information to it.

Extra dependencies:
  • polars

  • sqlalchemy

Example:

>>> from pyinaturalist_convert import load_dwca_tables, aggregate_taxon_db
>>> load_dwca_tables()
>>> aggregate_taxon_db()

Main functions:

aggregate_taxon_db

Add aggregate and hierarchical values to the taxon database.

get_observation_taxon_counts

Get taxon counts based on GBIF export (exact rank counts only, no aggregate counts).

class pyinaturalist_convert.taxonomy.LoggerProgress

Bases: object

Base class for progress display. Just logs messages to a logger, with placeholders for progress bars.

advance(name, amount=1)
log(message)
start(total)
start_task(name, total, description='')
stop()

class pyinaturalist_convert.taxonomy.RichProgress

Bases: LoggerProgress

Container for multiprocessing queues used for progress reporting.

advance(name, amount=1)

Advance progress for a task.

log(message)

Send a log message to the progress display.

start(total=1)

Start the progress display process.

start_task(name, total, description='')

Register a new task with the progress display.

stop()

Stop the progress display process.
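
Both progress classes above list the same five methods, so calling code can be written against those names alone. The following is a minimal sketch of that calling pattern, not the library's implementation: SketchProgress, process_items, and the "items" task name are hypothetical stand-ins, and the call order shown (start, start_task, advance, log, stop) is an assumption based on the method descriptions.

```python
import logging

logger = logging.getLogger("taxonomy_sketch")


class SketchProgress:
    """Hypothetical stand-in that mirrors the documented method names only."""

    def __init__(self):
        self.total = 0
        self.tasks = {}  # task name -> [completed, total]

    def start(self, total=1):
        self.total = total
        logger.info("Starting %d task(s)", total)

    def start_task(self, name, total, description=""):
        self.tasks[name] = [0, total]
        logger.info("Task %s started: %s", name, description)

    def advance(self, name, amount=1):
        self.tasks[name][0] += amount

    def log(self, message):
        logger.info(message)

    def stop(self):
        logger.info("Progress display stopped")


def process_items(items, progress):
    """Drive any object that provides the five documented method names."""
    progress.start(total=1)
    progress.start_task("items", total=len(items), description="Processing items")
    for _ in items:
        progress.advance("items")
    progress.log("All items processed")
    progress.stop()


progress = SketchProgress()
process_items(["a", "b", "c"], progress)
```

Because process_items relies only on the method names, an object of either documented class could in principle be passed in its place, assuming its constructor needs no extra setup.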

pyinaturalist_convert.taxonomy.aggregate_taxon_db(db_path=PosixPath('/home/docs/.local/share/pyinaturalist/observations.db'), backup_path=PosixPath('/home/docs/.local/share/pyinaturalist/taxon_aggregates.parquet'), common_names_path=PosixPath('/home/docs/.local/share/pyinaturalist/inaturalist-taxonomy.dwca/VernacularNames-english.csv'), max_workers=None, progress_bars=True)

Add aggregate and hierarchical values to the taxon database:

  • Ancestor IDs

  • Child IDs

  • Iconic taxon ID

  • Aggregated observation taxon counts

  • Aggregated leaf taxon counts

  • Common names

Requires GBIF datasets to be downloaded and processed first.

Parameters:
  • db_path (Path | str) – Path to SQLite database

  • backup_path (Path | str) – Path to save a minimal copy of aggregate values

  • common_names_path (Path | str) – Path to a CSV file containing taxon common names. See the DwC-A taxonomy dataset for available languages.

  • max_workers (Optional[int]) – Max worker processes for parallel aggregation (None = cpu_count)

  • progress_bars (bool) – Show detailed progress bars in addition to log output

Return type:

DataFrame
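
To make the derived values listed above more concrete, here is a small pure-Python sketch on a toy tree. It is an illustration of the concepts only and does not reflect how aggregate_taxon_db computes them: the parents and exact_counts data, the helper names, and the reading of "leaf taxon count" as the number of leaf descendants (with a leaf counting itself) are all assumptions.

```python
from collections import defaultdict

# Hypothetical toy taxonomy: taxon ID -> parent ID (None marks the root)
parents = {1: None, 2: 1, 3: 1, 4: 2, 5: 2}

# Hypothetical exact-rank observation counts per taxon ID
exact_counts = {2: 1, 3: 7, 4: 10, 5: 5}


def ancestor_ids(taxon_id):
    """Return ancestor IDs ordered from the root down to the direct parent."""
    ancestors = []
    parent = parents[taxon_id]
    while parent is not None:
        ancestors.append(parent)
        parent = parents[parent]
    return ancestors[::-1]


# Child IDs: invert the parent mapping
children = defaultdict(list)
for taxon_id, parent_id in parents.items():
    if parent_id is not None:
        children[parent_id].append(taxon_id)


def aggregate_count(taxon_id):
    """Exact count for this taxon plus the counts of all its descendants."""
    own = exact_counts.get(taxon_id, 0)
    return own + sum(aggregate_count(child) for child in children[taxon_id])


def leaf_count(taxon_id):
    """Number of leaf taxa at or below this taxon (assumed definition)."""
    kids = children[taxon_id]
    if not kids:
        return 1
    return sum(leaf_count(child) for child in kids)


print(ancestor_ids(4))     # [1, 2]
print(aggregate_count(2))  # 16 = 1 (own) + 10 + 5
print(aggregate_count(1))  # 23 = 16 + 7
print(leaf_count(1))       # 3 leaves: taxa 3, 4, and 5
```

The distinction between exact_counts and aggregate_count mirrors the one drawn in this module between exact rank counts and aggregate counts.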

pyinaturalist_convert.taxonomy.get_observation_taxon_counts(db_path=PosixPath('/home/docs/.local/share/pyinaturalist/observations.db'))

Get taxon counts based on GBIF export (exact rank counts only, no aggregate counts).

Return type:

dict[int, int]

pyinaturalist_convert.taxonomy.update_taxon_agg(db_path=PosixPath('/home/docs/.local/share/pyinaturalist/observations.db'), agg_path=PosixPath('/home/docs/.local/share/pyinaturalist/taxon_aggregates.parquet'))

Update an existing taxon database with new aggregate values.

Return type:

DataFrame