.. _chemdraw-organometallic:

#######################################
 ChemDraw Organometallic Complex Files
#######################################

CHEMSMART can read organometallic complexes drawn in ChemDraw (``.cdx`` and
``.cdxml``) and generate 3D structures suitable for quantum chemistry
calculations with Gaussian or ORCA.

.. note::

   Organometallic complex support requires RDKit and, for binary ``.cdx``
   files, Open Babel. See
   :ref:`installation requirements <chemdraw-organometallic-requirements>`
   below.

.. warning::

   **Always inspect the auto-generated 3D structures from difficult organometallic species.**

   The 3D structures produced for organometallic complexes with ring ligands
   (Cp, Cp\*, η6-arene, fused indenyl, etc.) are **initial-guess geometries**
   generated by a rule-based heuristic. While they are suitable starting
   points for DFT geometry optimisation, they may contain:

   - incorrect bond angles or torsions for unusual coordination environments
   - approximate metal–ring distances that differ from the true equilibrium
     geometry
   - imperfect hydrogen positions, particularly for fused or bridged ring
     systems

   Use a molecular viewer (e.g. PyMOL, Avogadro, GaussView) to verify the
   structure when needed.

*****
 Why
*****

Organometallic complexes present several challenges when read from ChemDraw
files:

- RDKit raises ``Can't kekulize mol`` errors for aromatic ligands coordinated
  to a metal centre.
- RDKit raises ``UFFTYPER: Unrecognized atom type`` errors for transition
  metals.
- ChemDraw stores η5/η6 hapticity using ``NodeType="MultiAttachment"`` phantom
  atoms connected to the metal via ``Display="Dash"`` bonds. RDKit reads each
  such node as a real carbon atom, producing spurious CH₃ groups attached to
  the metal.
- ChemDraw can store aromatic ligands (e.g. Cp, benzene) as **separate
  fragments** that need to be combined with the metal fragment before 3D
  coordinates can be generated.

CHEMSMART handles all of these cases automatically.
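
The phantom-atom problem in the list above can be illustrated with a minimal sketch that uses only the Python standard library. The XML fragment is a hand-written, heavily simplified stand-in for real CDXML, and the helper name ``strip_multiattachment`` is hypothetical; this is not CHEMSMART's actual implementation. For simplicity it drops every bond that touches a phantom node rather than checking ``Display="Dash"`` explicitly, and it only looks at direct children of the fragment.

```python
import xml.etree.ElementTree as ET

# Simplified CDXML-like fragment (an assumption for illustration only):
# node 1 is the metal, nodes 2 and 3 are ring carbons, node 4 is a phantom.
CDXML = """
<fragment>
  <n id="1" Element="22"/>
  <n id="2"/>
  <n id="3"/>
  <n id="4" NodeType="MultiAttachment"/>
  <b B="2" E="3" Order="2"/>
  <b B="4" E="1" Display="Dash"/>
</fragment>
"""


def strip_multiattachment(fragment):
    """Remove MultiAttachment nodes and any bond touching them, in place.

    Returns the set of node ids that were removed.
    """
    phantom_ids = {
        node.get("id")
        for node in fragment.findall("n")
        if node.get("NodeType") == "MultiAttachment"
    }
    # Iterate over copies of the child lists because we mutate the parent.
    for node in list(fragment.findall("n")):
        if node.get("id") in phantom_ids:
            fragment.remove(node)
    for bond in list(fragment.findall("b")):
        if bond.get("B") in phantom_ids or bond.get("E") in phantom_ids:
            fragment.remove(bond)
    return phantom_ids


root = ET.fromstring(CDXML.strip())
removed = strip_multiattachment(root)
remaining_nodes = [n.get("id") for n in root.findall("n")]
remaining_bonds = [(b.get("B"), b.get("E")) for b in root.findall("b")]
print(removed, remaining_nodes, remaining_bonds)
```

After stripping, only the metal and the real ring carbons remain, so a downstream parser no longer sees a spurious carbon attached to the metal.
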
However, despite this automatic handling, some complicated organometallic
complexes with unusual ligands may not be interpreted correctly from their
ChemDraw drawings.

*********************************
 Supported Organometallic Inputs
*********************************

The following types of organometallic complexes drawn in ChemDraw are
supported (other types may work but have not been rigorously tested):

- Transition-metal complexes with **η5-cyclopentadienyl (Cp)** ligands,
  including **Cp\*** (pentamethylcyclopentadienyl)
- Transition-metal sandwich complexes (e.g. titanocene, nickelocene,
  ferrocene)
- Transition-metal complexes with **η6-arene** ligands such as benzene (e.g.
  bis-benzene iridium or rhodium complexes)
- Ansa-bridged complexes (two ring ligands connected by a bridging atom, e.g.
  O-bridged bisindenyl)
- General transition-metal complexes with ancillary phosphine, amine,
  carbonyl, halide, or alkyl ligands
- Mixed complexes combining aromatic and non-aromatic ligands

Example usage:

.. code:: bash

   # Submit a Gaussian optimization for a ferrocene-like complex
   chemsmart sub -s server gaussian -p project -f ferrocene.cdxml -c 0 -m 1 opt

   # Binary CDX format (requires Open Babel)
   chemsmart sub -s server gaussian -p project -f complex.cdx -c 0 -m 1 opt

   # Multi-molecule file: select the second molecule
   chemsmart sub -s server gaussian -p project -f complexes.cdxml -i 2 -c 2 -m 3 opt

   # Multi-molecule file: select all molecules
   chemsmart sub -s server gaussian -p project -f complexes.cdxml -i : -c 2 -m 3 opt

.. _chemdraw-organometallic-requirements:

**************
 Requirements
**************

+------------------+------------------------------------------+-----------------------------------+
| Dependency       | Purpose                                  | Required for                      |
+==================+==========================================+===================================+
| RDKit            | Parse ``.cdxml``; sanitize molecules     | Both ``.cdx`` and ``.cdxml``      |
+------------------+------------------------------------------+-----------------------------------+
| Open Babel CLI   | Convert binary ``.cdx`` to SDF for RDKit | ``.cdx`` files only               |
| (``obabel``)     |                                          |                                   |
+------------------+------------------------------------------+-----------------------------------+

If ``obabel`` is not installed and a ``.cdx`` file is provided, CHEMSMART
raises a ``ValueError`` with instructions to install Open Babel or re-save the
file as ``.cdxml``.

**************
 How It Works
**************

CHEMSMART applies the following pipeline when reading ChemDraw files
containing ring ligands:

#. **CDXML preprocessing – strip MultiAttachment nodes** – For ``.cdxml``
   files, the XML is parsed directly to remove ``NodeType="MultiAttachment"``
   atoms and their ``Display="Dash"`` bonds *before* RDKit reads the file.
   These are ChemDraw drawing artefacts that represent η5/η6 hapticity
   graphically; they are not real atoms. Real ligands (e.g. methyl groups
   bonded to the metal) are preserved because they use ordinary single bonds
   without the ``MultiAttachment`` flag.

#. **Parse without sanitization** – ``sanitize=False`` is passed to RDKit
   (``MolsFromCDXMLFile``) to avoid kekulization errors during initial
   parsing. For ``.cdx`` files, Open Babel converts the binary format to SDF
   first.

#. **Update property cache** – ``mol.UpdatePropertyCache(strict=False)`` is
   called on every molecule to avoid *pre-condition violation* errors before
   any further processing.

#. **Combine metal and ligand fragments** – For Ir/Rh-type benzene complexes,
   ChemDraw stores a small metal stub fragment separately from the free
   benzene ring fragments. CHEMSMART detects this pattern and merges the
   fragments into a single molecule. Any residual degree-1 carbon stubs on the
   metal (from the original ChemDraw representation) are removed before
   merging.

#. **Normalize metal bonds** – Aromatic bond flags on any bond involving a
   metal atom are removed (converted to single bonds). RDKit does not support
   aromatic bonds to metal centres.

#. **Add η5 coordination bonds for Cp-type rings** – For each 5-membered
   all-carbon ring that is not yet bonded to the metal, one single bond is
   added from the metal to an anchor ring carbon. The ring is simultaneously
   de-aromatized to an alternating single/double bond pattern
   (SINGLE–DOUBLE–SINGLE–DOUBLE–SINGLE) so that every ring carbon is sp2 with
   exactly one hydrogen. This applies to both pure Cp rings and fused rings
   (e.g. indenyl).

#. **Add η6 coordination bonds for arene rings** – For each 6-membered
   all-carbon benzene ring not yet bonded to the metal, one single bond is
   added from the metal to an anchor ring carbon. The bond pattern around the
   anchor is set so that the anchor carbon retains one hydrogen (total valence
   3: two ring bonds + one metal bond).

#. **Selective sanitization** – Kekulization is skipped for molecules that
   contain metals to avoid ``Can't kekulize mol`` errors. All other
   sanitization steps (valence check, ring detection, etc.) are applied
   normally.

#. **Add hydrogens and generate initial 3D coordinates** – Explicit hydrogens
   are added with ``AddHs``, then 3D coordinates are generated with
   ``EmbedMolecule`` (ETKDG).

#. **Rigid-body ring repositioning** – After ETKDG embedding, each η5/η6 ring
   system is moved as a **rigid body** to the correct haptocentric geometry:

   #. For fused ring systems (e.g. indenyl = Cp fused to benzene), all atoms
      of the fused system are collected by BFS expansion and moved together.
   #. A stacking axis is computed from the centroids of the two ring systems
      (sandwich) or from the ETKDG metal position (half-sandwich/mono-hapto
      cases).
   #. Each ring is rotated so its plane is **perpendicular to the stacking
      axis** (Rodrigues rotation).
   #. Each ring centroid is translated to
      ``metal_position ± ideal_distance × axis``:

      - η5-Cp ring: ideal metal–centroid distance = **2.0 Å**
      - η6-arene ring: ideal metal–centroid distance = **1.75 Å**

   #. The metal atom is placed at the midpoint between the repositioned ring
      centroids.
   #. Bridge atoms (e.g. the O atom in ansa complexes) are placed at the
      midpoint of their bonded ring atoms' new positions.

#. **MMFF geometry refinement** – MMFF94 force-field optimisation is attempted
   to refine bond lengths, angles, and hydrogen positions. MMFF does not have
   parameters for most transition metals, so optimisation may fail silently;
   in that case the rigid-body geometry is kept.

***********************************
 Extracting Ring-Ligand Structures
***********************************

The sections below show concrete examples of how to extract and use 3D
structures from ChemDraw files that contain ring ligands. These are the
structures that were used to test the code.

Titanocene Dimethyl (TiCp₂Me₂)
==============================

A ChemDraw file containing TiCp₂Me₂ (two Cp rings and two methyl groups on Ti)
can be processed directly:

.. code:: bash

   # Extract the first molecule and submit a Gaussian optimization
   chemsmart sub -s server gaussian -p project -f ti_complexes.cdxml -i 1 -c 0 -m 1 opt B3LYP/def2-SVP

CHEMSMART will:

#. Strip the MultiAttachment phantom atoms ChemDraw uses to draw the Cp–Ti
   hapticity lines.
#. Reconnect each Cp ring to Ti via a single η5 anchor bond with alternating
   Cp ring bond orders.
#. Keep the two real Ti–CH₃ methyl groups (which survive stripping because
   they use ordinary bonds).
#. Embed 3D coordinates with ETKDG, then reposition the two Cp rings above
   and below Ti at the ideal 2.0 Å metal–centroid distance.
#. Attempt an MMFF refinement of the repositioned geometry.

Ferrocene / Nickelocene (Sandwich Cp₂M)
=======================================

For sandwich complexes with two Cp rings and no other ligands:

.. code:: bash

   chemsmart sub -s server gaussian -p project -f ferrocene.cdxml -c 0 -m 1 opt B3LYP/def2-SVP

The two Cp rings are placed above and below the metal with D5h-like symmetry
(eclipsed, as a starting point).

Bis-Benzene Iridium / Rhodium Complexes
=======================================

For η6-arene complexes (two benzene ligands above and below the metal):

.. code:: bash

   chemsmart sub -s server gaussian -p project -f bis_benz_ir.cdxml -c 0 -m 1 opt B3LYP/def2-SVP

CHEMSMART combines the benzene ring fragments with the metal stub, sets
alternating bond orders so that every ring carbon retains one hydrogen, and
repositions the rings so that each ring centroid lies 1.75 Å from the metal.

Ansa-Bisindenyl Iron Complex (O-Bridged)
========================================

For ansa complexes where two indenyl ligands are connected by a bridging atom:

.. code:: bash

   chemsmart sub -s server gaussian -p project -f ansa_fe.cdxml -c 0 -m 1 opt B3LYP/def2-SVP

The BFS-based fused-ring collection moves each entire indenyl (9 carbons: Cp
ring + fused benzene ring) as a single rigid body, and the O bridge atom is
placed at the midpoint of its two ring-neighbour carbons' new positions.

.. tip::

   After extraction, always inspect the 3D structure in a molecular viewer
   before submitting:

   .. code:: bash

      # View the extracted structure locally (requires PyMOL)
      chemsmart mol view -f ansa_fe.cdxml

**********************
 Current Restrictions
**********************

.. warning::

   The following restrictions apply to the current organometallic complex
   support. **Always verify auto-generated structures before submitting
   quantum chemistry calculations**, especially for complexes with multiple
   ring ligands, fused ring systems, or unusual bridging groups.

Coordinate Accuracy for Ring Ligands
   The rigid-body repositioning algorithm places rings at ideal centroid
   distances (2.0 Å for η5-Cp, 1.75 Å for η6-arene) and orients them
   perpendicular to the metal–centroid axis. These are reasonable starting
   geometries, but they do not account for:

   - ring tilting (common in bent-sandwich complexes such as TiCp₂Me₂)
   - metal–ring distance variation with oxidation state or spin state
   - non-eclipsed ring orientations (staggered vs. eclipsed Cp rings in
     sandwich complexes)

   A **DFT geometry optimisation** must always be performed before using the
   structure for energy analysis.

Fused Ring Systems (Indenyl, Fluorenyl)
   Fused ring systems (e.g. indenyl = Cp fused to benzene, fluorenyl = Cp
   fused to two benzene rings) are moved as rigid bodies. The internal
   geometry of the fused system comes from ETKDG and is generally correct,
   but the O–C bond lengths in ansa bridges are approximate (~1.4 Å after
   MMFF) and may require further DFT refinement.

η5 Coordination Representation
   Cp and Cp\* η5 coordination is represented with a **single metal–carbon
   σ-bond** to one anchor ring carbon. This connectivity is a structural
   approximation that allows RDKit to build a valid molecular graph and
   generate 3D coordinates. The bond order has no electronic-structure
   meaning.

η6 Arene Coordination
   η6 metal–arene coordination is represented with a single metal–carbon
   anchor bond, with the remaining ring carbons having no explicit bond to
   the metal. The 3D positioning (metal above ring centroid) is geometrically
   correct, but no formal M–C bonds exist for the other five carbons.

Multi-Hapto Ligands Beyond Cp/Benzene
   Higher-order hapticity ligands (η7-cycloheptatrienyl,
   η8-cyclooctatetraene, etc.) and non-carbon η-donors are not explicitly
   handled and may produce errors or incomplete structures.

Force-Field Optimization
   The MMFF94 force field does not have parameters for most transition
   metals. MMFF optimisation is attempted but silently skipped on failure.
   The rigid-body repositioned geometry is kept in that case.

Charge and Multiplicity
   Charge and multiplicity of organometallic complexes are **not** inferred
   from the ChemDraw file. You must always specify them explicitly with
   ``-c`` and ``-m``:

   .. code:: bash

      # Titanocene dimethyl: charge 0, singlet (d0 Ti(IV))
      chemsmart sub -s server gaussian -p project -f ti_cpx.cdxml -c 0 -m 1 opt

      # Iron(II) complex with overall charge 2+ and singlet multiplicity
      chemsmart sub -s server gaussian -p project -f fecpx.cdxml -c 2 -m 1 opt

   An incorrect charge or multiplicity will cause the quantum chemistry
   calculation to fail or to describe the wrong electronic state.

Multi-Metal and Unusual ChemDraw Layouts
   The fragment-combination step uses the heuristic that a small
   metal-containing fragment followed immediately by aromatic ring fragments
   should be merged. This may not work correctly for:

   - very unusual ChemDraw drawing layouts
   - multi-metal (e.g. dinuclear) systems
   - complexes where the metal and ring ligands are widely separated on the
     ChemDraw page

   If the extracted structure is clearly wrong (e.g. disconnected fragments,
   wrong atom count), re-draw the complex in ChemDraw with a more standard
   layout and try again.

**********
 Examples
**********

Ferrocene Derivative (Cp₂Fe)
============================

Draw the complex in ChemDraw (or use an existing ``.cdxml`` file) and run:

.. code:: bash

   chemsmart sub -s server gaussian -p project -f ferrocene.cdxml -c 0 -m 1 opt B3LYP/def2-SVP

Half-Sandwich Complex
=====================

For a half-sandwich complex such as [CpFe(CO)₂Cl]:

.. code:: bash

   # [CpFe(CO)₂Cl]: charge 0, closed-shell Fe(II), singlet
   chemsmart sub -s server gaussian -p project -f half_sandwich.cdxml -c 0 -m 1 opt B3LYP/def2-SVP

.. tip::

   For open-shell transition-metal complexes, always verify the multiplicity.
   A d⁵ Fe(III) centre in a weak-field environment typically has multiplicity
   6 (high-spin), whereas in a strong-field environment it may be 2
   (low-spin).

**********
 See Also
**********

- :doc:`molecule-input-formats` – all supported input formats
- :doc:`gaussian-cli-options` – available Gaussian calculation options
- :doc:`orca-cli-options` – available ORCA calculation options