
Examples of Fingerprints and Descriptors

Fingerprints and descriptors are abstract representations of certain structural features of a molecule. A descriptor may represent a structural key within a molecule. This might be as simple as a count of a particular atom type (S, N, halogen, or sp3 atoms), the presence of a particular ring system (e.g. phenyl, pyridyl, naphthyl) or functional group (e.g. amide, ester, amine), or a calculated property such as hydrogen bond donor count, polar surface area, or LogP. Fingerprints are more abstract than structural keys but have the advantage of being more general, since they do not represent pre-defined patterns.

Unlike a structural key with its pre-defined patterns, the patterns for a molecule’s fingerprint are generated from the molecule itself. The fingerprinting algorithm examines the molecule and generates the fingerprint based on a set of rules.

Fingerprints

Path-based fingerprints: FP2 is a path-based fingerprint that indexes small-molecule fragments based on linear segments of up to 7 atoms. The molecular structure is analysed to identify linear fragments of 1 to 7 atoms. Single-atom fragments of C, N, and O are ignored. A fragment is terminated when the atoms form a ring. For each fragment, the atoms, the bonding, and whether it constitutes a complete ring are recorded and saved in a set, so that there is only one of each fragment type. Chemically identical versions (i.e. ones with the atoms listed in reverse order, or rings listed starting at different atoms) are identified and only a single canonical fragment is retained. Each remaining fragment is assigned a hash number from 0 to 1020, which is used to set a bit in a 1024-bit vector.
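The path-based procedure can be illustrated with a minimal Python sketch. This is not the Open Babel implementation: the molecule format (a list of element symbols plus an adjacency dictionary), the function names, and the use of CRC32 as the hash are all assumptions made for illustration, and the sketch ignores bond orders and does not record ring closures.

```python
import zlib

# Toy molecule: the heavy atoms of ethanol (C-C-O).
# This input format is a hypothetical stand-in for a toolkit's molecule object.
ETHANOL_ATOMS = ["C", "C", "O"]
ETHANOL_BONDS = {0: [1], 1: [0, 2], 2: [1]}

MAX_PATH_ATOMS = 7
N_BITS = 1024
N_HASH_VALUES = 1021  # hash numbers run from 0 to 1020


def linear_fragments(atoms, bonds, max_atoms=MAX_PATH_ATOMS):
    """Return the set of canonical linear fragments of 1..max_atoms atoms."""
    fragments = set()

    def extend(path):
        symbols = [atoms[i] for i in path]
        # Single-atom fragments of C, N and O are ignored.
        if not (len(path) == 1 and symbols[0] in {"C", "N", "O"}):
            forward = "-".join(symbols)
            backward = "-".join(reversed(symbols))
            # Keep a single canonical form of a fragment and its reverse.
            fragments.add(min(forward, backward))
        if len(path) == max_atoms:
            return
        for neighbour in bonds[path[-1]]:
            if neighbour not in path:  # stop instead of walking round a ring
                extend(path + [neighbour])

    for start in range(len(atoms)):
        extend([start])
    return fragments


def fold_to_bits(fragments, n_bits=N_BITS):
    """Hash each fragment to 0..1020 and set that bit (CRC32 is an assumption)."""
    bits = [0] * n_bits
    for fragment in fragments:
        index = zlib.crc32(fragment.encode("utf-8")) % N_HASH_VALUES
        bits[index] = 1
    return bits


fragments = linear_fragments(ETHANOL_ATOMS, ETHANOL_BONDS)
fingerprint = fold_to_bits(fragments)
print(sorted(fragments))
print(sum(fingerprint), "bits set out of", len(fingerprint))
```

Because the fragments are collected in a set before hashing, each fragment type sets its bit once, however often it occurs in the molecule.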

Atom Pairs: each pair of atoms is encoded together with the shortest path between them, for all pairs of atoms (ref)
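As a rough illustration of the atom-pair idea, the sketch below counts (atom type, distance, atom type) triples, using a breadth-first search to find the shortest path in bonds. Typing atoms by element symbol alone and the input format are simplifying assumptions, and the function names are hypothetical.

```python
from collections import Counter, deque

# Toy molecule (hypothetical input format): the heavy atoms of ethanol, C-C-O.
ATOMS = ["C", "C", "O"]
BONDS = {0: [1], 1: [0, 2], 2: [1]}


def shortest_path_lengths(bonds, start):
    """Breadth-first search: number of bonds from start to each reachable atom."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbour in bonds[current]:
            if neighbour not in distance:
                distance[neighbour] = distance[current] + 1
                queue.append(neighbour)
    return distance


def atom_pairs(atoms, bonds):
    """Count (type, distance, type) triples over all unordered atom pairs."""
    pairs = Counter()
    for i in range(len(atoms)):
        distance = shortest_path_lengths(bonds, i)
        for j, d in distance.items():
            if j > i:  # visit each unordered pair once
                first, second = sorted((atoms[i], atoms[j]))
                pairs[(first, d, second)] += 1
    return pairs


print(atom_pairs(ATOMS, BONDS))
```

Sorting the two atom types makes the triple independent of the order in which the pair is visited.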

Topological Torsions (TT): each TT consists of four consecutively bonded non-hydrogen atoms (ref)

Extended Connectivity Fingerprints (ECFPs, or “Circular Fingerprints”) DOI offer a number of advantages over other schemes. There is an excellent description of ECFP in a recent blog post.

There are also fingerprints based on SMARTS patterns. These generate a topological fingerprint for a molecule using a series of pre-defined structural patterns, for example FP4.

Some examples of descriptors that can be calculated using MOE

a_aro Number of aromatic atoms.
a_count Number of atoms (including implicit hydrogens).
This is calculated as the sum of (1 + h_i) over all non-trivial atoms i.
a_heavy Number of heavy atoms #{Z_i | Z_i > 1}.
a_ICM Atom information content (mean).
This is the entropy of the element distribution in the molecule (including implicit hydrogens but not lone pair pseudo-atoms).
Let n_i be the number of occurrences of atomic number i in the molecule.
Let p_i = n_i / n where n is the sum of the n_i.
The value of a_ICM is the negative of the sum over all i of p_i log p_i.
a_IC Atom information content (total).
This is calculated to be a_ICM times n.
a_nH Number of hydrogen atoms (including implicit hydrogens).
This is calculated as the sum of h_i over all non-trivial atoms i plus the number of non-trivial hydrogen atoms.
a_nB Number of boron atoms: #{Z_i | Z_i = 5}.
a_nC Number of carbon atoms: #{Z_i | Z_i = 6}.
a_nN Number of nitrogen atoms: #{Z_i | Z_i = 7}.
a_nO Number of oxygen atoms: #{Z_i | Z_i = 8}.
a_nF Number of fluorine atoms: #{Z_i | Z_i = 9}.
a_nP Number of phosphorus atoms: #{Z_i | Z_i = 15}.
a_nS Number of sulfur atoms: #{Z_i | Z_i = 16}.
a_nCl Number of chlorine atoms: #{Z_i | Z_i = 17}.
a_nBr Number of bromine atoms: #{Z_i | Z_i = 35}.
a_nI Number of iodine atoms: #{Z_i | Z_i = 53}.
b_1rotN Number of rotatable single bonds.
Conjugated single bonds are not included (e.g. ester and peptide bonds).
b_ar Number of aromatic bonds.
b_count Number of bonds (including implicit hydrogens).
This is calculated as the sum of (d_i/2 + h_i) over all non-trivial atoms i.
b_double Number of double bonds.
Aromatic bonds are not considered to be double bonds.
b_heavy Number of bonds between heavy atoms.
b_rotN Number of rotatable bonds.
A bond is rotatable if it has order 1, is not in a ring, and has at least two heavy neighbors.
b_single Number of single bonds (including implicit hydrogens).
Aromatic bonds are not considered to be single bonds.
b_triple Number of triple bonds.
Aromatic bonds are not considered to be triple bonds.
chiral The number of chiral centers.
lip_acc The number of O and N atoms.
lip_don The number of OH and NH atoms.
lip_druglike One if and only if lip_violation < 2 otherwise zero.
lip_violation The number of violations of Lipinski’s Rule of Five.
nmol The number of molecules (connected components).
opr_brigid The number of rigid bonds from [Oprea 2000].
opr_leadlike One if and only if opr_violation < 2 otherwise zero.
opr_nring The number of ring bonds from [Oprea 2000].
opr_nrot The number of rotatable bonds from [Oprea 2000].
opr_violation The number of violations of Oprea’s lead-like test [Oprea 2000].
rings The number of rings.

Pharmacophore type counts

a_acc Number of hydrogen bond acceptor atoms (not counting acidic atoms but counting atoms that are both hydrogen bond donors and acceptors such as -OH).
a_acid Number of acidic atoms.
a_base Number of basic atoms.
a_don Number of hydrogen bond donor atoms (not counting basic atoms but counting atoms that are both hydrogen bond donors and acceptors such as -OH).
a_hyd Number of hydrophobic atoms.
vsa_acc Approximation to the sum of VDW surface areas (Å²) of pure hydrogen bond acceptors (not counting acidic atoms and atoms that are both hydrogen bond donors and acceptors such as -OH).
vsa_acid Approximation to the sum of VDW surface areas of acidic atoms (Å²).
vsa_base Approximation to the sum of VDW surface areas of basic atoms (Å²).
vsa_don Approximation to the sum of VDW surface areas of pure hydrogen bond donors (not counting basic atoms and atoms that are both hydrogen bond donors and acceptors such as -OH) (Å²).
vsa_hyd Approximation to the sum of VDW surface areas of hydrophobic atoms (Å²).
vsa_other Approximation to the sum of VDW surface areas (Å²) of atoms typed as “other”.
vsa_pol Approximation to the sum of VDW surface areas (Å²) of polar atoms (atoms that are both hydrogen bond donors and acceptors), such as -OH.

Charge based

ASA+ Water accessible surface area of all atoms with positive partial charge (strictly greater than 0).
ASA- Water accessible surface area of all atoms with negative partial charge (strictly less than 0).
ASA_H Water accessible surface area of all hydrophobic (|q_i| < 0.2) atoms.
ASA_P Water accessible surface area of all polar (|q_i| >= 0.2) atoms.
DASA Absolute value of the difference between ASA+ and ASA-.
CASA+ Positive charge weighted surface area, ASA+ times max { q_i > 0 } [Stanton 1990].
CASA- Negative charge weighted surface area, ASA- times max { q_i < 0 } [Stanton 1990].
DCASA Absolute value of the difference between CASA+ and CASA- [Stanton 1990].
dipole Dipole moment calculated from the partial charges of the molecule.
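
To make the a_ICM and a_IC definitions in the list above concrete, here is a minimal Python sketch that computes both from a list of atomic numbers. It is not MOE code: the function name is hypothetical, hydrogens must be listed explicitly, and the natural logarithm is an assumption, since the definition above does not state the base.

```python
import math
from collections import Counter


def atom_information_content(atomic_numbers):
    """Return (a_ICM, a_IC) for a list of atomic numbers, hydrogens included."""
    counts = Counter(atomic_numbers)  # occurrences of each atomic number
    n = sum(counts.values())          # total number of atoms
    # Entropy of the element distribution; natural log is an assumption.
    a_icm = -sum((c / n) * math.log(c / n) for c in counts.values())
    return a_icm, a_icm * n


# Ethanol, C2H6O: 2 carbons, 6 hydrogens, 1 oxygen.
ethanol = [6] * 2 + [1] * 6 + [8]
a_icm, a_ic = atom_information_content(ethanol)
print(f"a_ICM = {a_icm:.3f}, a_IC = {a_ic:.3f}")
```

A molecule made of a single element gives a_ICM = 0, and the value rises as the element distribution becomes more even.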

Examples available using ChemAxon cxcalc

atomcount, composition, dotdisconnectedformula, dotdisconnectedisotopeformula, elemanal, elementalanalysistable, exactmass, formula, icomposition, iformula, isotopecomposition, isotopeformula, mass

Charge atomicpolarizability, atompol, averagemolecularpolarizability, averagepol, avgpol, axxpol, ayypol, azzpol, charge, formalcharge, ioncharge, molecularpolarizability, molpol, oen, orbitalelectronegativity, pol, polarizability, tholepolarizability, tpol, tpolarizability

Conformation conformers, hasvalidconformer, leconformer, lowestenergyconformer, moldyn, moleculardynamics

Geometry aliphaticatom, aliphaticatomcount, aliphaticbondcount, aliphaticringcount, aliphaticringcountofsize, angle, aromaticatom, aromaticatomcount, aromaticbondcount, aromaticringcount, aromaticringcountofsize, asa, asymmetricatom, asymmetricatomcount, asymmetricatoms, balabanindex, bondcount, bondtype, carboaliphaticringcount, carboaromaticringcount, carboringcount, chainatom, chainatomcount, chainbond, chainbondcount, chiralcenter, chiralcentercount, chiralcenters, connected, connectedgraph, cyclomaticnumber, dihedral, distance, distancedegree, dreidingenergy, eccentricity, fragmentcount, fusedaliphaticringcount, fusedaromaticringcount, fusedringcount, hararyindex, heteroaliphaticringcount, heteroaromaticringcount, heteroringcount, hindrance, hyperwienerindex, largestatomringsize, largestringsize, largestringsystemsize, maximalprojectionarea, maximalprojectionradius, maximalprojectionsize, minimalprojectionarea, minimalprojectionradius, minimalprojectionsize, molecularsurfacearea, msa, plattindex, polarsurfacearea, psa, randicindex, ringatom, ringatomcount, ringbond, ringbondcount, ringcount, ringcountofatom, ringcountofsize, ringsystemcount, ringsystemcountofsize, rotatablebond, rotatablebondcount, shortestpath, smallestatomringsize, smallestringsize, smallestringsystemsize, stereodoublebondcount, stericeffectindex, sterichindrance, szegedindex, topanal, topologyanalysistable, vdwsa, volume, wateraccessiblesurfacearea, wienerindex, wienerpolarity

Isomers canonicaltautomer, dominanttautomerdistribution, doublebondstereoisomercount, doublebondstereoisomers, generictautomer, majortautomer, moststabletautomer, stereoisomercount, stereoisomers, tautomercount, tautomers, tetrahedralstereoisomercount, tetrahedralstereoisomers

Markush Enumerations enumerationcount, enumerations, markushenumerationcount, markushenumerations, randommarkushenumerations

Partitioning logd, logp

Protonation averagemicrospeciescharge, chargedistribution, isoelectricpoint, majormicrospecies, majorms, microspeciesdistribution, msdistr, pi, pka

Other acc, acceptor, acceptorcount, acceptormultiplicity, acceptorsitecount, acceptortable, accsitecount, aromaticelectrophilicityorder, aromaticnucleophilicityorder, canonicalresonant, chargedensity, don, donor, donorcount, donormultiplicity, donorsitecount, donortable, donsitecount, electrondensity, electrophilicityorder, electrophiliclocalizationenergy, frameworks, hbda, hbonddonoracceptor, hmochargedensity, hmoelectrondensity, hmoelectrophilicityorder, hmoelectrophiliclocalizationenergy, hmohuckel, hmohuckeleigenvalue, hmohuckeleigenvector, hmohuckelorbitals, hmohuckeltable, hmolocalizationenergy, hmonucleophilicityorder, hmonucleophiliclocalizationenergy, hmopienergy, huckel, huckeleigenvalue, huckeleigenvector, huckelorbitals, huckeltable, localizationenergy, msacc, msdon, nucleophilicityorder, nucleophiliclocalizationenergy, pichargedensity, pienergy, refractivity, resonantcount, resonants, totalchargedensity

Examples from OpenBabel

name [Name]
formula [Formula]
mol_weight [Molecular Weight]
exact_mass [Isotopic Mass]
canonical_SMILES [String]
num_atoms [Number]
num_bonds [Number]
num_residues [Number]
sequence [Residue Sequence]
num_rings [Number of Rings (by SSSR)]
logP [Number (octanol-water partition)]
PSA [Number (topological polar surface area)]
MR [Number (molar refractivity)]

Examples from MayaChemTools

MolecularWeight, ExactMass, HeavyAtoms, Rings, AromaticRings, van der Waals MolecularVolume [ Ref 93 ], RotatableBonds, HydrogenBondDonors, HydrogenBondAcceptors, LogP and Molar Refractivity (SLogP and SMR), Topological Polar Surface Area (TPSA), Fraction of SP3 carbons (Fsp3Carbons) and SP3 carbons (Sp3Carbons), MolecularComplexity 

Examples from Filter-it

element rules
topological property rules
ATOMS
CARBONS
HETERO_ATOMS
HETERO_CARBON_RATIO
HALIDES
HALIDE_FRACTION
BONDS
ROTATABLE_BONDS
RIGID_BONDS
FLEXIBILITY
CHIRAL_CENTERS
HBOND_ACCEPTORS
HBOND_DONORS
LIPINSKI_ACCEPTORS
LIPINSKI_DONORS
FORMAL_CHARGES
TOTAL_FORMAL_CHARGE
RINGS
ATOMS_IN_SMALLEST_RING
ATOMS_IN_LARGEST_RING
RING_FRACTION
AROMATIC_RINGS
ATOMS_IN_SMALLEST_AROMATIC_RING
ATOMS_IN_LARGEST_AROMATIC_RING
AROMATIC_RING_FRACTION
AROMATIC_OVER_TOTAL_RING_FRACTION
NONAROMATIC_RINGS
ATOMS_IN_SMALLEST_NONAROMATIC_RING
ATOMS_IN_LARGEST_NONAROMATIC_RING
NONAROMATIC_RING_FRACTION
RINGSYSTEMS
ATOMS_IN_SMALLEST_RINGSYSTEM
ATOMS_IN_LARGEST_RINGSYSTEM
RINGSYSTEM_FRACTION
RINGS_IN_SMALLEST_RINGSYSTEM
RINGS_IN_LARGEST_RINGSYSTEM
SIDECHAINS
ATOMS_IN_SMALLEST_SIDECHAIN
ATOMS_IN_LARGEST_SIDECHAIN
SIDECHAIN_FRACTION
CORES
ATOMS_IN_CORE
CORE_FRACTION
BRIDGES
ATOMS_IN_SMALLEST_BRIDGE
ATOMS_IN_LARGEST_BRIDGE
BRIDGE_FRACTION
physical property rules
MOLWT
LOGP
LOGS
TPSA
ANDREWS_ENERGY
LIGAND_EFFICIENCY
fragment rules
Andrews Energy
Ligand Efficiency
ADMET_SCORE
LIPINSKI_VIOLATIONS
ABSORPTION

PaDEL

PaDEL-Descriptor is software for calculating molecular descriptors and fingerprints (DOI). The software currently calculates 797 descriptors (663 1D and 2D descriptors, and 134 3D descriptors) and 10 types of fingerprints.

Descriptor class  Descriptor type                             Number of descriptors  Calculation speed (mol/s)
1D, 2D            ALOGP                                       3                      1084
1D, 2D            APol                                        1                      24,738
1D, 2D            Aromatic atoms count                        1                      16,878
1D, 2D            Aromatic bonds count                        1                      16,336
1D, 2D            Atom count                                  13                     2127
1D, 2D            Autocorrelation (charge)                    5                      6215
1D, 2D            Autocorrelation (mass)                      5                      777
1D, 2D            Autocorrelation (polarizability)            5                      741
1D, 2D            BCUT                                        6                      653
1D, 2D            Bond count                                  5                      6014
1D, 2D            BPol                                        1                      23,060
1D, 2D            Carbon types                                9                      20,327
1D, 2D            Chi chain                                   10                     310
1D, 2D            Chi cluster                                 8                      439
1D, 2D            Chi path                                    16                     310
1D, 2D            Chi path cluster                            6                      347
1D, 2D            Eccentric connectivity index                1                      11,611
1D, 2D            Atom type electrotopological state          482                    289
1D, 2D            Fragment complexity                         1                      27,400
1D, 2D            Hbond acceptor count                        1                      16,126
1D, 2D            Hbond donor count                           1                      16,384
1D, 2D            Kappa shape indices                         3                      2413
1D, 2D            Largest chain                               1                      10,088
1D, 2D            Largest Pi system                           1                      13,804
1D, 2D            Longest aliphatic chain                     1                      10,233
1D, 2D            Mannhold LogP                               1                      16,328
1D, 2D            McGowan volume                              1                      546
1D, 2D            Molecular distance edge                     19                     2194
1D, 2D            Molecular linear free energy relation       6                      300
1D, 2D            Petitjean number                            1                      10,131
1D, 2D            Ring count                                  34                     1757
1D, 2D            Rotatable bonds count                       1                      11,471
1D, 2D            Rule of five                                1                      807
1D, 2D            Topological polar surface area              1                      3535
1D, 2D            Vertex adjacency information (magnitude)    1                      26,160
1D, 2D            Weight                                      1                      24,532
1D, 2D            Weighted path                               5                      513
1D, 2D            Wiener numbers                              2                      10,755
1D, 2D            XlogP                                       1                      910
1D, 2D            Zagreb index                                1                      23,593
3D                Charged partial surface area                29                     309
3D                Gravitational index                         9                      10,012
3D                Length over breadth                         2                      8459
3D                Moment of inertia                           7                      9894
3D                Petitjean shape index                       2                      8213
3D                WHIM (atomic masses)                        17                     8229
3D                WHIM (atomic polarizabilities)              17                     8150
3D                WHIM (Mulliken atomic electronegativities)  17                     8290
3D                WHIM (unit weights)                         17                     8396
3D                WHIM (van der Waals volumes)                17                     8072
Fingerprint       CDK fingerprint                             1024                   203
Fingerprint       CDK extended fingerprint                    1024                   189
Fingerprint       CDK graph only fingerprint                  1024                   223
Fingerprint       Estate fingerprint                          79                     276
Fingerprint       MACCS fingerprint                           166                    170
Fingerprint       Pubchem fingerprint                         881                    56
Fingerprint       Substructure fingerprint                    307                    116
Fingerprint       Substructure fingerprint count              307                    113
Fingerprint       Klekota-Roth fingerprint                    4860                   12
Fingerprint       Klekota-Roth fingerprint count              4860                   12

Nathan Brown has written an excellent book on In silico Medicinal Chemistry. https://doi.org/10.1039/9781782622604