NEWS
taxodist 0.5.0 (2026-06-02)
New features
taxobase: Added a built-in reference dataset containing 50 taxonomic clades.
The dataset includes pre-computed distance matrices, lineage paths, search
outputs, and coverage vectors. It acts as a complete offline fallback, allowing
package examples, vignettes, and unit tests to run instantly without requiring
a network connection to The Taxonomicon.
Bug fixes
- Fixed a bug where any taxon whose name contains the substring "planet" (e.g. Periplaneta) was incorrectly discarded by the astronomical homonym filter in
get_taxonomicon_id() and taxo_search(). The filter now uses word boundaries (\bplanet\b) so that genus names containing "planet" as an infix are no longer mistakenly treated as non-biological entries.
- Fixed a lineage parsing bug affecting genera whose Taxonomicon page does not link back to the genus itself (e.g. Periplaneta, which only links to its constituent species). The lineage is now correctly truncated before child species entries, and the genus name is appended if absent.
- Added
"Epifamily" to the bare rank token filter so that unprefixed rank labels are not retained as spurious lineage nodes.
- Fixed a lineage parsing bug where the superscript type-marker
ᵀ (U+1D40) immediately following a taxon name prevented word-boundary matching in the cutoff search, causing the lineage to overshoot the target node and incorrectly include descendant taxa as ancestors. This silently affected any type genus or type species (i.e. the majority of genera), and was particularly visible in deep clades such as Dinosauria where several child clades were being parsed as ancestor nodes. Higher-rank taxa (clades, orders, families — anything above genus level) were disproportionately affected.
- Fixed a lineage parsing bug where navigation header text prepended to the
#divPageContent block could contain the target taxon name (e.g. in a cited reference title), causing the lineage cutoff to land in the junk header rather than the actual taxonomic tree. Affected taxa received a garbage two- or three-line lineage and were silently discarded by the "Biota" membership check in get_taxonomicon_id(). Drosophila (the fly genus, ID 28940) was one such case.
taxodist 0.4.0 (2026-05-25)
New features
cache_info(): inspects the current session cache, reporting the number of
cached lineages and IDs, total memory used, and the names of all taxa whose
lineages are stored. Returns a named list invisibly for programmatic use.
focal_distances(): computes taxonomic distances from a single focal taxon
to all members of a community vector. Returns a sorted data frame with
columns taxon, distance, mrca, and mrca_depth, making it easy to
identify the closest and most distant relatives of a focal species in a
community context.
Bug fixes
- Fixed a lineage parsing bug where taxa with
Subphylum and Infraphylum
rank prefixes (e.g., Craniata, Vertebrata) were being incorrectly discarded.
The dagger symbol (†) is now stripped before rank prefixes are removed,
ensuring that prefixed names like Subphylum † Craniata are correctly
parsed to Craniata rather than falling through as bare rank tokens.
Subphylum and Infraphylum have been removed from bare_ranks accordingly.
- Fixed parsing of
auct. author annotations (e.g., Subphylum Craniata auct.)
that were preventing rank-prefixed names from being correctly cleaned.
- Fixed intermediate clades being silently dropped from lineages when their
author strings contained ampersands, hyphens (e.g., Cavalier-Smith), or
trailing commas without a year (e.g.,
Romeriida Gauthier,). The author
removal pipeline now handles these patterns correctly.
- Fixed a bug where the dagger extinction marker (
†) appearing between a
rank prefix and a taxon name (e.g., Family † Dromaeosauridae) caused the
taxon to be lost entirely. Dagger removal now precedes all other cleaning
steps.
taxodist 0.3.0 (2026-05-05)
New features
taxo_search(): Added a new interactive search function that queries The Taxonomicon and returns a tidy data frame of all available IDs, ranks, and authors for a given taxon name.
- Direct ID Support:
get_lineage() and all distance functions (e.g., taxo_distance(), distance_matrix()) now accept Taxonomicon numeric IDs. This provides a fallback for computing distances when homonyms or historical ranks cannot be disambiguated by name alone.
Network Handling
- Added a 30-second timeout to all HTTP requests to prevent the R session from hanging indefinitely when The Taxonomicon servers are overloaded.
- Network failures, offline servers, or bad HTTP statuses now immediately emit a clear, informative warning to the user rather than failing silently.
Bug fixes
- Extended rank-prefix filter to remove additional bare rank tokens that were
being retained as spurious lineage nodes:
Subgenus, Section, Division,
Subdivision, Supercohort, Infracohort, Subsection, Candidatus,
Parvphylum, Branch, and Go to (a navigation artefact from The
Taxonomicon page layout).
- Duplicate lineage nodes originating from data-quality issues in The
Taxonomicon (e.g.
Uropygi appearing twice in Thelyphonida, Myxomycetes
appearing twice in Physarum) are now collapsed to a single occurrence.
A warning is emitted when deduplication occurs so the user is aware of the
upstream data issue.
get_taxonomicon_id() now collects all biological matches for a taxon name
before returning, and emits a warning when multiple valid biological entries
are found in The Taxonomicon. This surfaces homonym ambiguity (e.g. Nereis
matching both a polychaete worm and a butterfly genus) that was previously
silent.
get_taxonomicon_id() now properly follows taxonomic redirects and removes interface noise (N|T|P...) from warnings.
- When multiple biological entries are found (e.g.
Bacteria), the warning now lists all available numeric IDs without duplicates, allowing users to make an informed choice.
- Made lineage parsing more robust by strictly truncating any philosophical or pre-basal nodes (e.g.
organisms) appearing before Biota.
- Added validation to
get_lineage_by_id() to silently return NULL for non-numeric strings, preventing upstream server fallback errors.
- Fixed a lineage parsing issue where the parent
Genus was erroneously truncated from species-level queries. Distances and Most Recent Common Ancestors (MRCAs) between congeneric species are now computed accurately.
- Fixed an issue where taxonomic author names enclosed in double quotes (e.g.,
"Redtenbacher, 1906") were evading the lineage cleaning pipeline and being erroneously retained as part of the clade name.
taxodist 0.2.0 (2026-04-21)
New functions
taxo_cluster(): hierarchical clustering of taxa by taxonomic distance.
taxo_ordinate(): ordination (PCoA) of taxa in taxonomic distance space.
save_cache(): serialises the session lineage cache to an .rds file.
load_cache(): restores a previously saved cache, avoiding redundant network requests.
taxo_path(): returns the full node-by-node path between two taxa as a
tidy data frame, ascending from taxon A to their MRCA and descending to
taxon B.
Documentation
- Added a new vignette (
Statistical Applications of taxodist) demonstrating
the integration of taxodist with ape (tree plotting) and vegan
(taxonomic distinctness, Mantel tests, and PERMANOVA).
Minor improvements
- Added
vegan to Suggests in DESCRIPTION.
- Documented the compatibility of
distance_matrix() output with vegan
functions (taxondive(), mantel(), adonis2()).
- Documented the conversion of
taxo_cluster() results into phylo objects
using ape::as.phylo().
Bug fixes
- Fixed incorrect MRCA computation caused by unnamed crown nodes (
[crown])
being matched across different lineages as if they were the same ancestor.
- Fixed parsing of Taxonomicon rank prefixes (
Cohort, Subcohort,
Magnorder, Grandorder, Parvorder, Legion) that were being retained
as bare rank names instead of the actual clade names (e.g., Placentalia,
Boreoeutheria, Galloanserae).
taxodist 0.1.0 (2026-03-23)
- Initial release
- Compute taxonomic distances between any two taxa using The Taxonomicon
- Session-level caching for lineage data to minimize network requests
- Functions:
taxo_distance(), mrca(), distance_matrix(), closest_relative(),
compare_lineages(), shared_clades(), is_member(), filter_clade(),
lineage_depth(), check_coverage(), clear_cache()