DwC-A
pyinaturalist_convert.dwca
Download and convert the iNaturalist GBIF and taxonomy datasets from DwC-A to SQLite.
Extra dependencies: sqlalchemy
Example: Download everything and load into a SQLite database:
>>> from pyinaturalist_convert import load_dwca_tables
>>> load_dwca_tables()
Note
By default, data is saved in the recommended platform-specific data directory, for example
~\AppData\Local\ on Windows, or ~/.local/share/ on Linux. Use the db_path
argument to use a different location.
Note
As of 2026, this process will require about 200GB of free disk space while loading, and the final database will be about 37GB.
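Given the disk space figures in the note above, it may be worth checking free space before starting a load. The following is a minimal sketch using only the standard library; the helper names `free_space_gb` and `has_enough_space` are hypothetical and not part of pyinaturalist-convert.

```python
import shutil
from pathlib import Path

REQUIRED_FREE_GB = 200  # figure quoted in the note above


def free_space_gb(path: Path) -> float:
    """Return free space, in GB (10**9 bytes), on the filesystem that would hold `path`."""
    # The data directory may not exist yet, so walk up to the nearest existing parent
    probe = path
    while not probe.exists() and probe != probe.parent:
        probe = probe.parent
    return shutil.disk_usage(probe).free / 1e9


def has_enough_space(path: Path, required_gb: float = REQUIRED_FREE_GB) -> bool:
    """Return True if the filesystem holding `path` has at least `required_gb` free."""
    return free_space_gb(path) >= required_gb
```

For example, `has_enough_space(Path.home() / ".local" / "share" / "pyinaturalist")` could be called before `load_dwca_tables()` to fail early instead of partway through the load.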
Main functions:
load_dwca_tables() | Download observation and taxonomy archives and load into a SQLite database.
load_dwca_observations() | Create or update an observations SQLite table from the GBIF DwC-A archive.
load_dwca_taxa() | Create or update a taxonomy SQLite table from the GBIF DwC-A archive.
- pyinaturalist_convert.dwca.download_dwca_observations(dest_dir=PosixPath('/home/docs/.local/share/pyinaturalist'))
Download and extract the DwC-A research-grade observations dataset. Reuses local data if it already exists and is up to date.
Example to load into a SQLite database (using the sqlite3 shell, from bash):
export DATA_DIR="$HOME/.local/share/pyinaturalist"
sqlite3 -csv $DATA_DIR/observations.db ".import $DATA_DIR/gbif-observations-dwca/observations.csv observations"
- pyinaturalist_convert.dwca.download_dwca_taxa(dest_dir=PosixPath('/home/docs/.local/share/pyinaturalist'))
Download and extract the DwC-A taxonomy dataset. Reuses local data if it already exists and is up to date.
- pyinaturalist_convert.dwca.load_dwca_observations(csv_path=PosixPath('/home/docs/.local/share/pyinaturalist/gbif-observations-dwca/observations.csv'), db_path=PosixPath('/home/docs/.local/share/pyinaturalist/observations.db'), progress=None)
Create or update an observations SQLite table from the GBIF DwC-A archive. This keeps only the most relevant subset of columns available in the archive, in a format consistent with API results and other sources.
To load everything as-is, see load_full_dwca_observations().
- pyinaturalist_convert.dwca.load_dwca_tables(db_path=PosixPath('/home/docs/.local/share/pyinaturalist/observations.db'))
Download observation and taxonomy archives and load into a SQLite database.
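If more control is needed than this single call provides, the download and load functions documented on this page can be called individually. The sketch below assumes that archives extract into the same subdirectories under a custom `dest_dir` as they do under the default one (as suggested by the default `csv_path` values); the helper names `archive_paths` and `download_and_load` are hypothetical. It is not necessarily the same sequence that `load_dwca_tables()` runs internally.

```python
from pathlib import Path


def archive_paths(data_dir: Path) -> dict:
    """Derive file locations from a data directory, mirroring the default arguments."""
    return {
        "observations_csv": data_dir / "gbif-observations-dwca" / "observations.csv",
        "taxa_csv": data_dir / "inaturalist-taxonomy.dwca" / "taxa.csv",
        "db": data_dir / "observations.db",
    }


def download_and_load(data_dir: Path) -> Path:
    """Download both archives into `data_dir`, load them, and return the database path."""
    # Imported lazily so the path helper above works without the package installed
    from pyinaturalist_convert.dwca import (
        download_dwca_observations,
        download_dwca_taxa,
        load_dwca_observations,
        load_dwca_taxa,
    )

    paths = archive_paths(data_dir)
    download_dwca_observations(dest_dir=data_dir)
    download_dwca_taxa(dest_dir=data_dir)
    # Assumption: the extracted CSVs land at the paths computed above
    load_dwca_observations(csv_path=paths["observations_csv"], db_path=paths["db"])
    load_dwca_taxa(csv_path=paths["taxa_csv"], db_path=paths["db"])
    return paths["db"]
```

Calling the steps separately makes it possible to, for instance, skip the taxonomy load or point `db_path` at a different disk.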
- pyinaturalist_convert.dwca.load_dwca_taxa(csv_path=PosixPath('/home/docs/.local/share/pyinaturalist/inaturalist-taxonomy.dwca/taxa.csv'), db_path=PosixPath('/home/docs/.local/share/pyinaturalist/observations.db'), column_map={'id': 'id', 'parentNameUsageID': 'parent_id', 'references': 'reference_url', 'scientificName': 'name', 'taxonRank': 'rank'}, progress=None)
Create or update a taxonomy SQLite table from the GBIF DwC-A archive.
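To illustrate what a `column_map` like the default above expresses, here is a standalone sketch that selects and renames CSV columns while loading them into SQLite. It uses only the standard library and is not the library's implementation; the table name `taxon_demo`, the helper `load_with_column_map`, and the sample rows are all invented for the example.

```python
import csv
import io
import sqlite3

# Same mapping as the default `column_map` argument: CSV column -> table column
COLUMN_MAP = {
    "id": "id",
    "parentNameUsageID": "parent_id",
    "references": "reference_url",
    "scientificName": "name",
    "taxonRank": "rank",
}

# Invented sample rows; the real taxa.csv has more columns and different values
SAMPLE_CSV = """id,parentNameUsageID,references,scientificName,taxonRank,extraColumn
1,,https://example.org/taxa/1,Animalia,kingdom,ignored
2,1,https://example.org/taxa/2,Chordata,phylum,ignored
"""


def load_with_column_map(csv_text, conn, table="taxon_demo", column_map=COLUMN_MAP):
    """Insert only the mapped CSV columns into `table`, under their new names."""
    dest_cols = list(column_map.values())
    quoted = ", ".join(f'"{col}"' for col in dest_cols)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({quoted})')
    placeholders = ", ".join("?" for _ in dest_cols)
    # Columns absent from the map (here, extraColumn) are dropped
    rows = [
        tuple(row[src] for src in column_map)
        for row in csv.DictReader(io.StringIO(csv_text))
    ]
    conn.executemany(f'INSERT INTO "{table}" ({quoted}) VALUES ({placeholders})', rows)
    conn.commit()
    return len(rows)
```

Passing a different mapping would keep a different subset of columns, which is presumably the purpose of exposing `column_map` as an argument.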
- pyinaturalist_convert.dwca.load_full_dwca_observations(csv_path=PosixPath('/home/docs/.local/share/pyinaturalist/gbif-observations-dwca/observations.csv'), db_path=PosixPath('/home/docs/.local/share/pyinaturalist/observations.db'))
Create an observations SQLite table from the GBIF DwC-A archive, using all columns exactly as they appear in the archive.
This requires the sqlite3 executable to be installed on the system, since its .import command is by far the fastest way to load this.
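As a rough illustration of that approach, the sketch below checks for the sqlite3 executable and runs the same `.import` command shown in the shell example for download_dwca_observations(), this time from Python. The helper names `build_import_command` and `import_csv` are hypothetical, and this is not necessarily how load_full_dwca_observations() is implemented.

```python
import shutil
import subprocess
from pathlib import Path


def build_import_command(db_path: Path, csv_path: Path, table: str = "observations") -> list:
    """Build the argument list for a sqlite3 CSV import."""
    # Note: paths containing spaces would need quoting inside the .import dot-command
    return ["sqlite3", "-csv", str(db_path), f".import {csv_path} {table}"]


def import_csv(db_path: Path, csv_path: Path, table: str = "observations") -> None:
    """Run the import, failing early if the sqlite3 executable is missing."""
    if shutil.which("sqlite3") is None:
        raise RuntimeError("sqlite3 executable not found on PATH")
    subprocess.run(build_import_command(db_path, csv_path, table), check=True)
```

Checking with `shutil.which` first gives a clearer error than the `FileNotFoundError` that `subprocess.run` would otherwise raise.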