DwC-A

pyinaturalist_convert.dwca

Download and convert the iNaturalist GBIF and taxonomy datasets from DwC-A to SQLite.

Extra dependencies: sqlalchemy

Example: Download everything and load into a SQLite database:

>>> from pyinaturalist_convert import load_dwca_tables
>>> load_dwca_tables()

Note

By default, data is saved in the recommended platform-specific data directory, for example ~\AppData\Local\ on Windows, or ~/.local/share/ on Linux. Pass the db_path argument to choose a different location.

Main functions:

load_dwca_tables

Download observation and taxonomy archives and load into a SQLite database.

load_dwca_observations

Create or update an observations SQLite table from the GBIF DwC-A archive.

load_dwca_taxa

Create or update a taxonomy SQLite table from the GBIF DwC-A archive.

pyinaturalist_convert.dwca.download_dwca_observations(dest_dir=PosixPath('/home/docs/.local/share/pyinaturalist'))

Download and extract the DwC-A research-grade observations dataset. Reuses local data if it already exists and is up to date.

Example to load into a SQLite database (using the sqlite3 shell, from bash):

export DATA_DIR="$HOME/.local/share/pyinaturalist"
sqlite3 -csv $DATA_DIR/observations.db ".import $DATA_DIR/gbif-observations-dwca/observations.csv observations"

Parameters:

dest_dir (Union[Path, str]) – Alternative download directory

pyinaturalist_convert.dwca.download_dwca_taxa(dest_dir=PosixPath('/home/docs/.local/share/pyinaturalist'))

Download and extract the DwC-A taxonomy dataset. Reuses local data if it already exists and is up to date.

Parameters:

dest_dir (Union[Path, str]) – Alternative download directory

pyinaturalist_convert.dwca.load_dwca_observations(csv_path=PosixPath('/home/docs/.local/share/pyinaturalist/gbif-observations-dwca/observations.csv'), db_path=PosixPath('/home/docs/.local/share/pyinaturalist/observations.db'), progress=None)

Create or update an observations SQLite table from the GBIF DwC-A archive. This keeps only the most relevant subset of columns available in the archive, in a format consistent with API results and other sources.

To load everything as-is, see load_full_dwca_observations().
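
Since only a subset of columns is kept, it can help to check what actually ended up in the database. The sketch below uses only the standard sqlite3 module and makes no assumptions about table or column names; describe_tables is a hypothetical helper, not a pyinaturalist_convert function.

```python
import sqlite3


def describe_tables(db_path):
    """Return a mapping of table name -> list of column names.

    Hypothetical helper for illustration. Note that sqlite3.connect()
    creates an empty database file if db_path does not exist yet.
    """
    conn = sqlite3.connect(db_path)
    try:
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        # The second field of each PRAGMA table_info row is the column name
        return {
            table: [col[1] for col in conn.execute(f'PRAGMA table_info("{table}")')]
            for table in tables
        }
    finally:
        conn.close()
```

Running describe_tables on the database before and after a load gives a quick view of which tables exist and which columns each one has.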

pyinaturalist_convert.dwca.load_dwca_tables(db_path=PosixPath('/home/docs/.local/share/pyinaturalist/observations.db'))

Download observation and taxonomy archives and load into a SQLite database.

As of 2022-05, this requires about 42 GB of free disk space while loading, and the final database is around 8 GB.

Parameters:

db_path (Union[Path, str]) – Path to SQLite database
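
Given the disk space needed while loading, a quick check before starting may save a failed run. The following is a rough sketch using the standard library; has_free_space and REQUIRED_GB are hypothetical names, and the 42 GB threshold is simply the 2022-05 estimate quoted above, which may no longer be accurate.

```python
import shutil
from pathlib import Path

REQUIRED_GB = 42  # estimate quoted above (as of 2022-05); adjust as needed


def has_free_space(path, required_gb=REQUIRED_GB):
    """Return True if the filesystem containing `path` has enough free space.

    Hypothetical helper for illustration.
    """
    path = Path(path).expanduser()
    # Walk up to the nearest existing directory, since the target may not exist yet
    while not path.exists():
        path = path.parent
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= required_gb
```

This only checks the filesystem that holds the given path. If downloads and the database live on different filesystems, each location would need its own check.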

pyinaturalist_convert.dwca.load_dwca_taxa(csv_path=PosixPath('/home/docs/.local/share/pyinaturalist/inaturalist-taxonomy.dwca/taxa.csv'), db_path=PosixPath('/home/docs/.local/share/pyinaturalist/observations.db'), column_map={'id': 'id', 'parentNameUsageID': 'parent_id', 'references': 'reference_url', 'scientificName': 'name', 'taxonRank': 'rank'}, progress=None)

Create or update a taxonomy SQLite table from the GBIF DwC-A archive.
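
The default column_map in the signature suggests that archive columns are renamed to shorter table columns, and that unmapped columns are left out. The sketch below illustrates that idea with the standard csv module. It is an assumption about the mapping's meaning, not the library's actual implementation; rename_columns is a hypothetical helper and the sample row is made up.

```python
import csv
import io

# Default mapping copied from the load_dwca_taxa signature:
# archive (CSV) column name -> database column name
COLUMN_MAP = {
    "id": "id",
    "parentNameUsageID": "parent_id",
    "references": "reference_url",
    "scientificName": "name",
    "taxonRank": "rank",
}


def rename_columns(rows, column_map=COLUMN_MAP):
    """Yield rows with only the mapped columns, under their new names."""
    for row in rows:
        yield {new: row[old] for old, new in column_map.items()}


# Made-up sample data with one extra column ("kingdom") that is not mapped
SAMPLE_CSV = (
    "id,parentNameUsageID,references,scientificName,taxonRank,kingdom\n"
    "2,1,https://example.org/taxa/2,Examplea,genus,Animalia\n"
)

taxa = list(rename_columns(csv.DictReader(io.StringIO(SAMPLE_CSV))))
print(taxa)
```

Under this reading, passing a different column_map would be the way to change which archive columns are read and what they are called in the table.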

pyinaturalist_convert.dwca.load_full_dwca_observations(csv_path=PosixPath('/home/docs/.local/share/pyinaturalist/gbif-observations-dwca/observations.csv'), db_path=PosixPath('/home/docs/.local/share/pyinaturalist/observations.db'))

Create an observations SQLite table from the GBIF DwC-A archive, using all columns exactly as they appear in the archive.

This requires the sqlite3 executable to be installed on the system, since its .import command is by far the fastest way to load a CSV file of this size.
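
Because of this dependency, it may be worth checking for the sqlite3 executable up front. The sketch below uses shutil.which for the check and builds an argument list mirroring the shell example earlier on this page. Both function names are hypothetical, and this is not necessarily how the library itself invokes sqlite3.

```python
import shutil


def sqlite3_available():
    """Return True if a sqlite3 executable is found on PATH."""
    return shutil.which("sqlite3") is not None


def build_import_command(csv_path, db_path, table="observations", exe="sqlite3"):
    """Build an argument list equivalent to the shell example:

        sqlite3 -csv <db_path> ".import <csv_path> <table>"

    Paths containing spaces would need quoting inside the .import argument;
    this sketch assumes they contain none.
    """
    return [exe, "-csv", str(db_path), f".import {csv_path} {table}"]


if not sqlite3_available():
    print("sqlite3 executable not found; install it before loading the full archive")
```

The resulting list could be passed to subprocess.run(..., check=True), which avoids shell quoting issues for the outer command.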