Converters

pyinaturalist_convert.converters

Base utilities for converting observation data to common formats.

Extra dependencies by format:

Excel: pandas, openpyxl
Feather, Parquet: pandas, pyarrow
HDF5: pandas, tables
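
To see which of these extras are present in the current environment before exporting, a check along the following lines may help. This is a minimal sketch using only the standard library. `EXTRA_DEPENDENCIES` and `missing_dependencies` are hypothetical names for illustration and are not part of pyinaturalist-convert; the module names come from the list above.

```python
from importlib.util import find_spec

# Format -> importable module names, copied from the dependency list above
EXTRA_DEPENDENCIES = {
    'excel': ['pandas', 'openpyxl'],
    'feather': ['pandas', 'pyarrow'],
    'parquet': ['pandas', 'pyarrow'],
    'hdf5': ['pandas', 'tables'],
}


def missing_dependencies(file_format: str) -> list[str]:
    """Return the modules needed for a format that cannot be imported.

    Hypothetical helper; formats without extras (e.g. CSV) return an empty list.
    """
    required = EXTRA_DEPENDENCIES.get(file_format.lower(), [])
    return [name for name in required if find_spec(name) is None]
```

An empty result for a given format suggests the corresponding export function should have what it needs.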

Examples:

Get some observations:

>>> from pyinaturalist import iNatClient
>>> client = iNatClient()
>>> observations = client.observations.search(user_id='my_username').all()

Convert to multiple formats:

>>> from pyinaturalist_convert import *
>>>
>>> to_csv(observations, 'my_observations.csv')
>>> to_excel(observations, 'my_observations.xlsx')
>>> to_feather(observations, 'my_observations.feather')
>>> to_hdf(observations, 'my_observations.hdf')
>>> to_json(observations, 'my_observations.json')
>>> to_parquet(observations, 'my_observations.parquet')

Load back into Observation objects:

>>> observations = read('my_observations.csv')
>>> observations = read('my_observations.xlsx')
>>> observations = read('my_observations.feather')
>>> observations = read('my_observations.hdf')
>>> observations = read('my_observations.json')
>>> observations = read('my_observations.parquet')

Export functions:

`to_csv`	Convert observations to CSV
`to_excel`	Convert observations to an Excel spreadsheet (xlsx)
`to_feather`	Convert observations into a Feather file
`to_hdf`	Convert observations into an HDF5 file
`to_json`	Convert observations into a JSON file
`to_parquet`	Convert observations into a Parquet file

Import and helper functions:

`read`	Load observations from any supported file format (JSON, CSV, Feather, HDF5, Parquet, or Excel)
`to_dataframe`	Convert observations into a pandas DataFrame
`to_dataset`	Convert observations to a generic tabular dataset.
`to_dicts`	Convert any supported input type into observation (or other record type) dicts
`to_observations`	Convert any supported input type into Observation objects.
`to_taxa`	Convert any supported input type into Taxon objects

pyinaturalist_convert.converters.flatten_observations(observations, tabular=False, semitabular=False)

Flatten nested dict attributes, for example {"taxon": {"id": 1}} -> {"taxon.id": 1}

Parameters:

semitabular (bool) – Accept one level of nested collections, for formats that can handle them (like parquet)
tabular (bool) – Drop all collections that can’t be flattened (for CSV)
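
The flattening idea can be sketched in plain Python as follows. This is an illustration of the dot-separated key convention only, not the library's implementation. The `flatten` function and its `drop_collections` flag are hypothetical; the flag loosely mimics what `tabular=True` is described as doing.

```python
def flatten(record: dict, parent_key: str = '', sep: str = '.',
            drop_collections: bool = False) -> dict:
    """Flatten nested dicts into a single level with dot-separated keys.

    Hypothetical helper: with drop_collections=True, list/tuple/set values
    are dropped, since they cannot be represented in a single table cell.
    """
    flat = {}
    for key, value in record.items():
        full_key = f'{parent_key}{sep}{key}' if parent_key else str(key)
        if isinstance(value, dict):
            # Recurse into nested dicts, carrying the key prefix along
            flat.update(flatten(value, full_key, sep, drop_collections))
        elif drop_collections and isinstance(value, (list, tuple, set)):
            continue
        else:
            flat[full_key] = value
    return flat
```

For example, `flatten({"taxon": {"id": 1}})` returns `{"taxon.id": 1}`, matching the description above.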

pyinaturalist_convert.converters.read(filename)

Load observations from any of the following file formats:

JSON
CSV (exported from pyinaturalist-convert)
CSV (exported from iNaturalist export tool)
Feather
HDF5
Parquet
Excel

Return type: List[Observation]
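
One plausible way to pick a format from a filename is to dispatch on the file extension. The sketch below assumes that approach; the source does not say how `read` detects formats, so the library may do it differently. `FORMATS_BY_EXTENSION` and `guess_format` are hypothetical names, and the extensions are the ones used in the examples on this page.

```python
from pathlib import Path

# Hypothetical mapping, based on the file extensions used in the examples;
# the library's actual detection logic may differ
FORMATS_BY_EXTENSION = {
    '.json': 'JSON',
    '.csv': 'CSV',
    '.feather': 'Feather',
    '.hdf': 'HDF5',
    '.parquet': 'Parquet',
    '.xlsx': 'Excel',
}


def guess_format(filename: str) -> str:
    """Return a format name for a filename, based on its extension."""
    suffix = Path(filename).suffix.lower()
    try:
        return FORMATS_BY_EXTENSION[suffix]
    except KeyError:
        raise ValueError(f'Unsupported file extension: {suffix!r}') from None
```

Note that an extension alone cannot distinguish the two CSV variants listed above; telling them apart would require inspecting the file contents.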

pyinaturalist_convert.converters.to_csv(observations, filename=None)

Convert observations to CSV

pyinaturalist_convert.converters.to_dataframe(observations)

Convert observations into a pandas DataFrame

pyinaturalist_convert.converters.to_dataset(observations)

Convert observations to a generic tabular dataset. This can be converted to any of the formats supported by tablib.

Return type: Dataset

pyinaturalist_convert.converters.to_dicts(value)

Convert any supported input type into observation (or other record type) dicts

Return type: Iterable[Dict]

pyinaturalist_convert.converters.to_excel(observations, filename)

Convert observations to an Excel spreadsheet (xlsx)

pyinaturalist_convert.converters.to_feather(observations, filename)

Convert observations into a Feather file

pyinaturalist_convert.converters.to_hdf(observations, filename)

Convert observations into an HDF5 file

pyinaturalist_convert.converters.to_json(observations, filename)

Convert observations into a JSON file

pyinaturalist_convert.converters.to_observations(value)

Convert any supported input type into Observation objects. Input types include:

Return type: Iterable[Observation]

pyinaturalist_convert.converters.to_parquet(observations, filename)

Convert observations into a Parquet file

pyinaturalist_convert.converters.to_taxa(value)

Convert any supported input type into Taxon objects

Return type: Iterable[Taxon]

pyinaturalist_convert.converters.write(content, filename, mode='w')

Write converted observation data to a file, creating parent dirs first
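
The "create parent dirs first" behavior can be approximated with `pathlib`, as in the sketch below. This is an assumption about the general pattern, not the library's source. `write_text_file` is a hypothetical name, and expanding `~` is an extra convenience that the original function may or may not provide.

```python
from pathlib import Path


def write_text_file(content: str, filename: str, mode: str = 'w') -> Path:
    """Write content to a file, creating any missing parent directories.

    Hypothetical stand-in for illustration; returns the resolved path.
    """
    path = Path(filename).expanduser()
    # parents=True creates intermediate dirs; exist_ok=True tolerates existing ones
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, mode) as f:
        f.write(content)
    return path
```

Passing `mode='a'` appends to an existing file instead of overwriting it.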