Grouping Strategies
This page provides detailed documentation for each molecular structure grouping strategy available in chemsmart.
RMSD-based Strategies
rmsd (Basic RMSD)
Simple Kabsch RMSD alignment without atom reordering.
chemsmart run grouper -f conformers.xyz rmsd
Default threshold: 0.5 Å
This is the fastest RMSD method but assumes atoms are already in the same order. Best for structures from the same source with consistent atom ordering.
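The alignment step can be illustrated with a minimal sketch of Kabsch superposition using NumPy. This is not chemsmart's implementation; the function name `kabsch_rmsd` is hypothetical, and the sketch assumes both structures list the same atoms in the same order, as this strategy requires.

```python
import numpy as np


def kabsch_rmsd(p, q):
    """RMSD between two (N, 3) coordinate sets after optimal rigid superposition."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # Remove translation by centering both structures on their centroids.
    p = p - p.mean(axis=0)
    q = q - q.mean(axis=0)
    # Covariance matrix between the two point sets and its SVD.
    h = p.T @ q
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so the result is a proper rotation,
    # not a reflection.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = (rotation @ p.T).T - q
    return float(np.sqrt((diff ** 2).sum() / len(p)))


reference = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]]
# The same structure rotated 90 degrees about z and shifted.
moved = [[-y + 5.0, x - 2.0, z + 1.0] for x, y, z in reference]
print(kabsch_rmsd(reference, moved))
```

For a rotated and translated copy the result is zero up to floating-point noise. A mirror image of a chiral arrangement gives a nonzero value, because only proper rotations are allowed.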
hrmsd (Hungarian RMSD)
RMSD with optimal atom mapping using the Hungarian algorithm (J. Chem. Inf. Model. 2014, 54, 518).
chemsmart run grouper -f conformers.xyz hrmsd
Default threshold: 0.5 Å
Finds the optimal one-to-one mapping between atoms of the same element type that minimizes RMSD. Useful when atom ordering may differ.
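The mapping objective can be shown on a tiny example. The sketch below finds the element-preserving one-to-one mapping that minimizes the summed squared distance by brute-force enumeration. The Hungarian algorithm solves the same assignment problem in polynomial time, so this is a stand-in for illustration only, not chemsmart's implementation. The function name `best_mapping` is hypothetical, and no superposition is performed before matching.

```python
from itertools import permutations
from math import sqrt


def best_mapping(symbols_a, coords_a, symbols_b, coords_b):
    """Return (mapping, rmsd), where mapping[i] is the atom of B matched to atom i of A."""
    if sorted(symbols_a) != sorted(symbols_b):
        raise ValueError("structures must have the same composition")

    def sq_dist(u, v):
        return sum((x - y) ** 2 for x, y in zip(u, v))

    mapping = [None] * len(symbols_a)
    total = 0.0
    # Atoms may only be matched to atoms of the same element, so each
    # element is an independent assignment problem.
    for element in set(symbols_a):
        idx_a = [i for i, s in enumerate(symbols_a) if s == element]
        idx_b = [j for j, s in enumerate(symbols_b) if s == element]
        # Brute force over all permutations: practical only for a few atoms.
        cost, perm = min(
            (sum(sq_dist(coords_a[i], coords_b[j]) for i, j in zip(idx_a, p)), p)
            for p in permutations(idx_b)
        )
        total += cost
        for i, j in zip(idx_a, perm):
            mapping[i] = j
    return mapping, sqrt(total / len(symbols_a))


# Water listed in two different atom orders.
symbols_a = ["O", "H", "H"]
coords_a = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]
symbols_b = ["H", "O", "H"]
coords_b = [(-0.24, 0.93, 0.0), (0.0, 0.0, 0.0), (0.96, 0.0, 0.0)]
mapping, rmsd = best_mapping(symbols_a, coords_a, symbols_b, coords_b)
print(mapping, rmsd)
```

A plain index-by-index comparison of these two inputs would report a large deviation even though the geometries are identical. With the mapping applied, the deviation vanishes.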
spyrmsd
RMSD calculation using the spyrmsd library with symmetry handling (J. Cheminformatics 2020, 12, 49).
chemsmart run grouper -f conformers.xyz spyrmsd
Default threshold: 0.5 Å
Handles molecular symmetry and equivalent atom permutations using graph isomorphism.
irmsd (Invariant RMSD)
Invariant RMSD that considers molecular symmetry and equivalent atom permutations (J. Chem. Inf. Model. 2025, 65, 4501).
chemsmart run grouper -f conformers.xyz irmsd
chemsmart run grouper -f conformers.xyz -T 0.3 irmsd --inversion on
Default threshold: 0.125 Å
Strategy-specific Options
| Option | Type | Description |
|---|---|---|
| --inversion | choice | Coordinate inversion checking: auto (default), on, off |
Note: Requires the external irmsd package, installed in a separate conda environment with numpy>=2.0. See the installation instructions in the README.
pymolrmsd
PyMOL-based RMSD alignment using the align command (Schrödinger, LLC. The PyMOL Molecular Graphics System, Version 1.8; 2015).
chemsmart run grouper -f conformers.xyz pymolrmsd
Default threshold: 0.5 Å
Delegates superposition to PyMOL's align command. PyMOL supports only single-threaded operation, so run this strategy with -np 1.
Fingerprint-based Strategies
tfd (Torsion Fingerprint Deviation)
Compare conformers based on their torsion angle fingerprints (J. Chem. Inf. Model. 2012, 52, 1499).
chemsmart run grouper -f conformers.xyz tfd
chemsmart run grouper -f conformers.xyz -T 0.2 tfd --no-use-weights
chemsmart run grouper -f conformers.xyz tfd --max-dev spec
Default threshold: 0.1
Particularly effective for flexible molecules where conformational differences are primarily due to bond rotations.
Strategy-specific Options
| Option | Type | Description |
|---|---|---|
| --use-weights/--no-use-weights | bool | Use torsion weights in TFD calculation (default: True) |
| --max-dev | choice | Normalization method: equal (default) or spec |
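As a rough illustration of what a torsion fingerprint deviation measures, the sketch below computes a weighted, normalized mean of circular torsion-angle differences. It assumes the torsion angles (in degrees) and optional weights have already been extracted, and it normalizes every torsion by the same maximum deviation of 180°, which corresponds to the idea behind the equal setting. The function name `torsion_fingerprint_deviation` is hypothetical, and chemsmart's actual calculation may differ in how torsions and weights are chosen.

```python
def torsion_fingerprint_deviation(torsions_a, torsions_b, weights=None, max_dev=180.0):
    """Weighted mean of normalized circular differences between torsion angles (degrees)."""
    if len(torsions_a) != len(torsions_b):
        raise ValueError("both conformers need the same set of torsions")
    if weights is None:
        # Unweighted case: every torsion contributes equally.
        weights = [1.0] * len(torsions_a)
    total = 0.0
    for a, b, w in zip(torsions_a, torsions_b, weights):
        diff = abs(a - b) % 360.0
        # Angles are periodic, so 170 and -170 differ by 20 degrees, not 340.
        diff = min(diff, 360.0 - diff)
        total += w * diff / max_dev
    return total / sum(weights)


conformer_a = [60.0, 180.0, -60.0]
conformer_b = [60.0, -180.0, 60.0]
print(torsion_fingerprint_deviation(conformer_a, conformer_b))
```

Only the third torsion differs here (by 120°), so the unweighted result is (120/180)/3. Passing a larger weight for that torsion raises the value, which is the effect torsion weighting is meant to have for torsions that matter more to the overall shape.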
tanimoto
Group structures using Tanimoto similarity with molecular fingerprints.
chemsmart run grouper -f conformers.xyz tanimoto
chemsmart run grouper -f conformers.xyz -T 0.85 tanimoto --fingerprint-type morgan
Default threshold: 0.9
Strategy-specific Options
| Option | Type | Description |
|---|---|---|
| --fingerprint-type | choice | Fingerprint type (default: rdkit) |
Available fingerprint types:
- rdkit/rdk: RDKit topological fingerprints
- morgan: Morgan circular fingerprints (ECFP-like)
- maccs: MACCS keys
- atompair: Atom pair fingerprints
- torsion: Topological torsion fingerprints
- usr: Ultrafast Shape Recognition (3D shape-based)
- usrcat: USR with Credo Atom Types (3D shape-based)
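For bit-vector fingerprints, Tanimoto similarity is the number of bits set in both fingerprints divided by the number set in either. The sketch below computes it on plain Python sets of on-bit indices. The function name `tanimoto` and the toy bit indices are illustrative only, and returning 1.0 for two empty fingerprints is a convention chosen for this sketch.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto similarity between two fingerprints given as collections of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        # Two empty fingerprints: treated as identical in this sketch.
        return 1.0
    return len(a & b) / len(union)


fp_a = {1, 2, 3, 4}
fp_b = {2, 3, 4, 5}
print(tanimoto(fp_a, fp_b))
```

These two fingerprints share three of five distinct bits, giving 0.6. Unlike the RMSD thresholds, the threshold here is a similarity, so higher values are stricter: at the default of 0.9 this pair would presumably not be grouped together.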
Energy-based Strategies
energy
Group molecules based on their energy differences.
chemsmart run grouper -f conformers.xyz energy
chemsmart run grouper -f conformers.xyz -T 0.5 energy
Default threshold: 1.0 kcal/mol
Note: All molecules must have energy information. For xyz files, energy is read from the comment line. For log files, Gibbs free energy is calculated as SCF energy + thermal correction.
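The energy comparison amounts to a unit conversion followed by a threshold test. The sketch below assumes energies are stored in Hartree and that the thermal correction is the correction to the Gibbs free energy. Both are assumptions for illustration, as are the function names and the sample values.

```python
HARTREE_TO_KCAL_PER_MOL = 627.5095


def gibbs_free_energy(scf_energy, thermal_correction):
    """Gibbs free energy in Hartree as SCF energy plus thermal correction."""
    return scf_energy + thermal_correction


def within_energy_threshold(e1_hartree, e2_hartree, threshold_kcal=1.0):
    """True if two energies (in Hartree) differ by at most threshold_kcal kcal/mol."""
    delta_kcal = abs(e1_hartree - e2_hartree) * HARTREE_TO_KCAL_PER_MOL
    return delta_kcal <= threshold_kcal


# Two made-up conformer energies that differ by 0.001 Hartree (about 0.63 kcal/mol).
g1 = gibbs_free_energy(-76.420, 0.020)
g2 = gibbs_free_energy(-76.421, 0.020)
print(within_energy_threshold(g1, g2))
print(within_energy_threshold(g1, g2, threshold_kcal=0.5))
```

The pair falls inside the default 1.0 kcal/mol threshold but outside a tighter 0.5 kcal/mol threshold, which mirrors the effect of passing -T 0.5.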
Other Strategies
formula
Group molecules by their molecular formula.
chemsmart run grouper -f molecules.xyz formula
This is a quick way to separate different compounds in a mixed set.
connectivity
Group molecules by their molecular connectivity/topology.
chemsmart run grouper -f molecules.xyz connectivity
Groups molecules that have the same bond connectivity graph, regardless of 3D geometry.
isomorphism
Group molecules based on graph isomorphism using RDKit.
chemsmart run grouper -f molecules.xyz isomorphism
Similar to connectivity but uses RDKit’s isomorphism checking.
Complete Linkage
All threshold-based grouping strategies use complete linkage clustering, which ensures that all members within a group are within the threshold distance of each other. This prevents the “chaining effect” where dissimilar structures could end up in the same group through intermediate structures.
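To make the difference from chaining concrete, here is a minimal sketch of agglomerative complete-linkage grouping on a precomputed distance matrix. It is an illustration of the general technique, not chemsmart's code: the function name `complete_linkage_groups` is hypothetical, and treating a distance equal to the threshold as "within" is an assumption.

```python
def complete_linkage_groups(dist, threshold):
    """Group indices so that every pair inside a group has distance <= threshold."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Complete linkage: cluster distance is the LARGEST pairwise distance.
                d = max(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        if best[0] > threshold:
            # Even the closest pair of clusters would violate the threshold.
            break
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Structure 1 is close to both 0 and 2, but 0 and 2 are far apart.
dist = [
    [0.0, 0.3, 0.7],
    [0.3, 0.0, 0.4],
    [0.7, 0.4, 0.0],
]
print(complete_linkage_groups(dist, threshold=0.5))
```

With a threshold of 0.5, structures 0 and 1 are merged, but structure 2 stays separate because its distance to structure 0 (0.7) exceeds the threshold. A single-linkage scheme would have joined all three through the intermediate structure 1.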