Taxonomy

pyinaturalist_convert.taxonomy

Helper utilities for navigating tabular taxonomy data as a tree and adding additional derived information to it.

Extra dependencies:
  • pandas

  • sqlalchemy

Example:

>>> from pyinaturalist_convert import load_dwca_tables, aggregate_taxon_db
>>> load_dwca_tables()
>>> aggregate_taxon_db()

Main functions:

aggregate_taxon_db

Add aggregate and hierarchical values to the taxon database

get_observation_taxon_counts

Get taxon counts based on GBIF export (exact rank counts only, no aggregate counts)

pyinaturalist_convert.taxonomy.aggregate_taxon_db(db_path=PosixPath('/home/docs/.local/share/pyinaturalist/observations.db'), counts_path=PosixPath('/home/docs/.local/share/pyinaturalist/taxon_counts.parquet'), common_names_path=PosixPath('/home/docs/.local/share/pyinaturalist/inaturalist-taxonomy.dwca/VernacularNames-english.csv'), progress_bars=True)

Add aggregate and hierarchical values to the taxon database:

  • Ancestor IDs

  • Child IDs

  • Iconic taxon ID

  • Aggregated observation taxon counts

  • Aggregated leaf taxon counts

  • Common names

Requires GBIF datasets to be downloaded and processed first.
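
To make the derived values listed above more concrete, here is a minimal plain-Python sketch of how ancestor IDs, child IDs, and aggregated observation counts can be derived from a table of parent pointers. It is not the library's implementation: the toy `parents` and `exact_counts` data and the helper names (`get_ancestor_ids`, `get_child_ids`, `aggregate_counts`) are assumptions made up for illustration.

```python
from collections import defaultdict

# Hypothetical toy taxonomy: taxon ID -> parent taxon ID (None marks the root)
parents = {1: None, 2: 1, 3: 1, 4: 2, 5: 2}

# Hypothetical exact-rank counts: observations identified at exactly that taxon
exact_counts = {2: 1, 3: 7, 4: 10, 5: 5}


def get_ancestor_ids(taxon_id, parents):
    """Follow parent pointers up to the root; return ancestor IDs root-first."""
    ancestors = []
    parent_id = parents[taxon_id]
    while parent_id is not None:
        ancestors.append(parent_id)
        parent_id = parents[parent_id]
    return ancestors[::-1]


def get_child_ids(parents):
    """Invert the parent pointers into a mapping of taxon ID -> direct child IDs."""
    children = defaultdict(list)
    for taxon_id, parent_id in parents.items():
        if parent_id is not None:
            children[parent_id].append(taxon_id)
    return dict(children)


def aggregate_counts(parents, exact_counts):
    """Add each taxon's exact count to the taxon itself and to all its ancestors."""
    totals = {taxon_id: 0 for taxon_id in parents}
    for taxon_id, count in exact_counts.items():
        totals[taxon_id] += count
        for ancestor_id in get_ancestor_ids(taxon_id, parents):
            totals[ancestor_id] += count
    return totals


ancestor_ids = get_ancestor_ids(4, parents)
child_ids = get_child_ids(parents)
totals = aggregate_counts(parents, exact_counts)
```

In this toy tree, taxon 2 has an exact count of 1 but an aggregated count of 16, because the counts of its children (taxa 4 and 5) roll up into it. This is the distinction between exact rank counts and aggregate counts.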

Parameters:
  • db_path (Union[Path, str]) – Path to SQLite database

  • counts_path (Union[Path, str]) – Path to optionally save a copy of observation taxon counts

  • common_names_path (Union[Path, str]) – Path to a CSV file containing taxon common names. See the DwC-A taxonomy dataset for available languages.

  • progress_bars (bool) – Show detailed progress bars in addition to log output

Return type:

DataFrame

pyinaturalist_convert.taxonomy.get_observation_taxon_counts(db_path=PosixPath('/home/docs/.local/share/pyinaturalist/observations.db'))

Get taxon counts based on GBIF export (exact rank counts only, no aggregate counts)

Return type:

Dict[int, int]
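
Because the return value is a plain `Dict[int, int]`, standard-library tools can work on it directly. The sketch below assumes the keys are taxon IDs and the values are observation counts. The `taxon_counts` dict is a made-up stand-in for the function's result, not real data.

```python
from collections import Counter

# Hypothetical stand-in for the dict returned by get_observation_taxon_counts()
taxon_counts = {101: 3, 102: 120, 103: 45}

# The two most-observed taxa as (taxon_id, count) pairs, highest count first
top_taxa = Counter(taxon_counts).most_common(2)

# Total number of observations across all taxa in the dict
total_observations = sum(taxon_counts.values())
```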