Chemical Identification

CAS

One of the best pieces of evidence that chemistry is a sloppy field is the existence of CAS numbers. CAS stands for "Chemical Abstracts Service". The problem with CAS numbers is that the mapping between chemical and number is established and controlled by a proprietary, non-public database run by CAS, a division of the American Chemical Society. This makes it impossible for poorly funded groups and individuals to work with large quantities of these numbers.

Even though most chemists use these numbers most of the time, I believe it is wrong to do so and they should be avoided categorically. It is especially insidious when important government safety regulations refer to chemicals exclusively by this closed system designed for insiders.

SMILES

The correct way to refer to chemicals is by their SMILES string.

Here is a thorough reference from Molsoft. Molsoft's ICM software can generate models from SMILES strings, for example:

build smiles 'O=C(OCCC)c1cc(O)c(O)c(O)c1'

The Simplified Molecular Input Line Entry System (SMILES) is a way to represent the graph of a chemical structure, i.e. what atoms are present and how they connect to each other. The SMILES string is obtained by listing the nodes encountered in a depth-first tree traversal after the graph is turned into a spanning tree by breaking any cycles. Where cycles have been broken, numbers (ring-closure labels) mark both ends of each break. Parentheses indicate points of branching on the tree. Hydrogens are usually implicit: they are not written, but inferred from each atom's normal valence.
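To make the depth-first idea concrete, here is a minimal Python sketch that writes SMILES from a molecule given in a hypothetical format of my own: a list of element symbols plus (atom, atom, bond order) tuples. It assumes one connected molecule of organic-subset atoms, bond orders 1 to 3, no aromaticity, charges, or stereochemistry, and at most 9 ring closures. Real toolkits (RDKit, Open Babel, ICM) also pick a canonical atom order so one molecule always gives one string; this sketch does not.

```python
# Hypothetical input format: atoms is a list of element symbols, bonds is a
# list of (atom_index, atom_index, bond_order) tuples with orders 1-3.
BOND_SYMBOL = {1: "", 2: "=", 3: "#"}


def to_smiles(atoms, bonds):
    neighbors = {i: [] for i in range(len(atoms))}
    for i, j, order in bonds:
        neighbors[i].append((j, order))
        neighbors[j].append((i, order))

    # Pass 1: depth-first search from atom 0. Edges to unvisited atoms form
    # the spanning tree; edges back to already visited atoms are broken cycles.
    children = {i: [] for i in range(len(atoms))}
    ring_bonds = []
    visited, seen_edges = set(), set()

    def dfs(u):
        visited.add(u)
        for v, order in neighbors[u]:
            edge = frozenset((u, v))
            if edge in seen_edges:
                continue
            seen_edges.add(edge)
            if v in visited:
                ring_bonds.append((v, u, order))  # v is an ancestor, written first
            else:
                children[u].append((v, order))
                dfs(v)

    dfs(0)

    # Label each broken cycle with a digit; the bond symbol goes on the first end.
    ring_marks = {i: "" for i in range(len(atoms))}
    for digit, (first, second, order) in enumerate(ring_bonds, start=1):
        if digit > 9:
            raise ValueError("this sketch only supports 9 ring closures")
        ring_marks[first] += BOND_SYMBOL[order] + str(digit)
        ring_marks[second] += str(digit)

    # Pass 2: preorder walk; every child except the last is a parenthesized branch.
    def write(u):
        out = atoms[u] + ring_marks[u]
        kids = children[u]
        for k, (v, order) in enumerate(kids):
            branch = BOND_SYMBOL[order] + write(v)
            out += branch if k == len(kids) - 1 else "(" + branch + ")"
        return out

    return write(0)
```

For example, `to_smiles(["C", "C", "O"], [(0, 1, 1), (1, 2, 1)])` gives `"CCO"` (ethanol), and a six-carbon ring of single bonds gives `"C1CCCCC1"`.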

  • [] - Bracket atoms, which get no implicit hydrogens, so [O] is an oxygen atom alone while water is simply O (or [OH2]). Brackets also hold explicit hydrogen counts, charges, and isotopes, e.g. [NH4+]. Only B, C, N, O, P, S, F, Cl, Br, and I (the "organic subset") can be used without brackets.

  • () - Branches.

  • = - Double bond.

  • # - Triple bond.

  • $ - Quadruple bond.

  • % - Introduces a two-digit ring-closure label (e.g. %10), used when more than 9 ring closures are open at the same time.

  • / - Directional bond for double-bond (cis/trans) stereochemistry; F/C=C/F is trans.

  • \ - The opposite direction; F/C=C\F is cis.

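  • @ - Tetrahedral stereochemistry: @ or @@ means the remaining neighbors are listed anticlockwise or clockwise, viewed from the first neighbor.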

Isotopes are written with the mass number inside brackets, e.g. [14C] for carbon-14. (A lowercase c would mean an aromatic carbon.)
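The symbols above are easy to pick apart with a regular expression. A minimal Python sketch, assuming only the organic subset, lowercase aromatic atoms, and the bond and branch symbols listed here (no ":" aromatic bonds, "*" wildcards, or reaction syntax). It tokenizes but does not validate, so unbalanced parentheses or unclosed rings pass through.

```python
import re

# Alternatives are tried left to right, so two-letter symbols (Cl, Br) must
# come before their one-letter prefixes (C, B).
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]"       # bracket atom, e.g. [O], [14C], [NH4+], [C@@H]
    r"|Cl|Br"            # two-letter organic-subset atoms
    r"|[BCNOPSFI]"       # one-letter organic-subset atoms
    r"|[bcnops]"         # aromatic (lowercase) atoms
    r"|%\d\d"            # two-digit ring-closure label, e.g. %10
    r"|\d"               # one-digit ring-closure label
    r"|[()=#$/\\.-])"    # branches, bond symbols, and the "." separator
)


def tokenize(smiles):
    tokens = SMILES_TOKEN.findall(smiles)
    # findall silently skips characters it cannot match, so check coverage.
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognized characters in {smiles!r}")
    return tokens


def count_atoms(smiles):
    # Each written atom is a bracket atom or starts with a letter;
    # implicit hydrogens are not tokens, so they are not counted.
    return sum(1 for t in tokenize(smiles) if t[0] == "[" or t[0].isalpha())
```

Applied to the ICM example string 'O=C(OCCC)c1cc(O)c(O)c(O)c1' (propyl gallate), `count_atoms` finds the 15 non-hydrogen atoms of C10H12O5.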

SDF

SMILES is rarely insufficient to convey what is needed about a molecule, but there are more verbose formats which are extremely common, mainly because they can also carry atom coordinates. Although SDF is an embarrassingly terrible data format, it is nearly universal in biochemistry.

SDF stands for "Structure Data File" and is an extension of the "molfile" format. The relationship is described in this Wikipedia article. Here is a proper definition of the format.

Here are the main points of an SDF file.

benzene                                 <= Molecule name.
ACD/Labs0812062058                      <= Program name, then date/time (MMDDYYHHmm).
                                        <= Comment. Required. Often blank.
 6  6  0  0  0  0  0  0  0  0  1 V2000  <= 6 atoms, 6 bonds, format version V2000 (or V3000).
   1.9050   -0.7932    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   1.9050   -2.1232    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   0.7531   -0.1282    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   0.7531   -2.7882    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  -0.3987   -0.7932    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  -0.3987   -2.1232    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
 2  1  1  0  0  0  0                    ^= Atom block (above). Enumerates atoms.
 3  1  2  0  0  0  0                    <= The bond block.
 4  2  2  0  0  0  0                       Describes atom relationships.
 5  3  1  0  0  0  0
 6  4  1  0  0  0  0
 6  5  2  0  0  0  0
M  STY  1   1 SUP                       <= Properties block. (S group type)
M  SLB  1   1   1                          (S group label) No properties required.
M  SAL   1  4  26  27  28  29              (S group atom list) Examples only.
M  SBL   1  1   1                          (S group bond list)
M  SMT   1 CF3                             (S group subscript)
M  SBV   1   1    0.0000   -0.8250         (Super atom bond and vector)
M  END                                  <= End of the M section block
>  <Vendor>                             <= Data header.
The Big Energy Corp.                    <= Data for that property.
                                        <= Separate with blank line.
>  <Molecular Weight>                   <= As many properties as you need.
78.1118
$$$$                                    <= Record separator.
The fields of the atom block.
  • X, Y, Z coordinates

  • Periodic table chemical symbol

  • 1- Mass difference (maybe unnecessary)

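  • 2- Charge code (0 = uncharged, 1 = +3, 2 = +2, 3 = +1, 4 = doublet radical, 5 = -1, 6 = -2, 7 = -3; maybe superfluous, since M  CHG lines override it)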

  • 3- Stereo parity (often ignored)

  • 4- Hydrogen count plus one (a query property; 0 means no condition)

  • 5- "Stereo care" flag (wastes 3 bytes for one mostly useless bit)

  • 6- Valence or number of bonds to this atom (redundant, optional)

  • 7- H0 designator (redundant, ignored)

  • 8- Not used!

  • 9- Not used!

  • 10- "Atom mapping number" (no more information)

  • 11- "Inversion retention flag" (0,1, or 2)

  • 12- "Exact change flag" (1 bit 0 or 1)

The fields of the bond block.
  • First atom’s position in atom block.

  • Second atom’s position in atom block.

  • Type (1=single,2=double,3=triple,4=aromatic,5=1or2,6=1or4,7=2or4,8=any)

  • Stereo (single bonds: 0=not stereo, 1=up, 4=either, 6=down, with the pointy end at the first atom; double bonds: 3=cis or trans, i.e. either).

  • Not used!

  • Bond topology (0=either,1=ring,2=chain)

  • Reacting center status (0=unmarked,1=center,-1=not,2=no change, 4=bond made/broken,8=bond order changes, 12=4&8,5=4&1,9=8&1,13=12&1)
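A minimal Python sketch of reading one record like the benzene example above: the counts line, the atom block, the bond block, and the "> <Field>" data items. It is not a full parser. It skips the properties block entirely, refuses V3000, and splits columns on whitespace, which breaks for large molecules whose fixed-width fields run together; an established toolkit is the safer choice for real files.

```python
import re


def parse_sdf_record(text):
    """Parse one V2000 SDF record into name, atoms, bonds, and data items."""
    lines = text.splitlines()
    name = lines[0].strip()
    counts = lines[3]
    if "V2000" not in counts:
        raise ValueError("this sketch only reads V2000 molfiles")
    # Simplification: whitespace splitting. Real V2000 columns are fixed width
    # and can run together (e.g. "100101") in large molecules.
    n_atoms, n_bonds = (int(f) for f in counts.split()[:2])

    atoms = []
    for line in lines[4:4 + n_atoms]:
        x, y, z, symbol = line.split()[:4]
        atoms.append((symbol, float(x), float(y), float(z)))

    bonds = []
    first_bond = 4 + n_atoms
    for line in lines[first_bond:first_bond + n_bonds]:
        atom1, atom2, bond_type = (int(f) for f in line.split()[:3])
        bonds.append((atom1, atom2, bond_type))  # atom numbers are 1-based

    # Skip the properties block, then read "> <Field>" data items.
    i = first_bond + n_bonds
    while i < len(lines) and not lines[i].startswith("M  END"):
        i += 1
    i += 1
    data = {}
    while i < len(lines) and not lines[i].startswith("$$$$"):
        header = re.match(r">.*?<([^>]+)>", lines[i])
        i += 1
        if header:
            values = []
            while i < len(lines) and lines[i].strip() and not lines[i].startswith("$$$$"):
                values.append(lines[i].rstrip())
                i += 1
            data[header.group(1)] = "\n".join(values)
    return {"name": name, "atoms": atoms, "bonds": bonds, "data": data}
```

On the benzene record this should give six carbons, six bonds (three of type 2), and the Vendor and Molecular Weight data items as strings.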

Acidity

An acid is a molecule or ion capable of donating a proton. A proton is the same thing as a hydrogen ion, H+.

pH is approximately the negative base-10 logarithm of the molar concentration of hydrogen ions, measured in moles per liter. At 25 °C, a pH below 7 is acidic and a pH above 7 is basic.

pH is usually said to stand for "power of hydrogen".
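Because the scale is logarithmic, each pH unit is a factor of 10 in hydrogen-ion concentration. A minimal Python sketch of the definition, assuming an ideal dilute solution (concentration standing in for activity) and the 25 °C neutral point of 7:

```python
import math


def ph_from_h(h_molar):
    """pH from hydrogen-ion concentration in mol/L (ideal-solution approximation)."""
    return -math.log10(h_molar)


def h_from_ph(ph):
    """Hydrogen-ion concentration in mol/L for a given pH."""
    return 10 ** (-ph)


def classify(ph, neutral=7.0):
    """Acidic, neutral, or basic relative to the neutral point (7 at 25 °C)."""
    if ph < neutral:
        return "acidic"
    if ph > neutral:
        return "basic"
    return "neutral"
```

For example, `h_from_ph(2.2) / h_from_ph(4.5)` is roughly 200, so by the typical values below, vinegar has about 200 times the hydrogen-ion concentration of tomatoes.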

The proper pH for a swimming pool is about 7.35, the pH of the human eye. This is slightly basic.

Acid
  • Release H+ ions in aqueous solution

  • Lower pH than 7

  • Turn blue litmus paper red

Base
  • Release hydroxide ions in aqueous solution

  • Strong bases often end in "hydroxide", e.g. lithium hydroxide, sodium hydroxide, potassium hydroxide, and calcium hydroxide

  • Turn red litmus paper blue

  • Includes soaps

  • Higher pH than 7

Table 1. Typical pH

pH     Substance             Note
11.0   Ammonia               Very basic
10.5   Milk of magnesia
8.3    Baking soda
7.4    Human blood
7.0    Water                 Neutral
6.6    Milk (cow's?)
5.5    Human skin
4.5    Tomatoes
4.4    Sour milk
3.0    Apples
2.2    Vinegar
1.0    Battery acid
0.0    Hydrochloric acid     Very acidic

Salts

When acids and bases react, they form water and a salt.

Inorganic

Phosphate

A phosphate is a salt of phosphoric acid. It is characterized by a group based on a phosphorus atom surrounded by 4 oxygen atoms in a tetrahedral arrangement. In the fully deprotonated phosphate ion, three of the oxygens are negatively charged and the fourth has a double bond to the phosphorus. Adding phosphate groups (phosphorylation) and removing them (dephosphorylation) are key strategies living things use to manage energy (e.g. via ATP) and to switch proteins on and off.

Phosphates in solution usually either pick up hydrogens or stay negatively charged. They can also pair with other cations such as sodium, producing mono-, di-, and trisodium phosphate (https://en.wikipedia.org/wiki/Sodium_phosphates).

Organic

Hydroxyl

Hydroxyl = a chemical unit (-OH) consisting of a hydrogen attached to an oxygen which is attached to something else. Free amino acids have one in their carboxyl group (-COOH); it leaves as part of a water molecule when the amino acid is joined into a protein by a peptide bond. Serine, threonine, and tyrosine also have hydroxyl groups on their side chains.

Free radicals are a related topic: a neutral OH that is not attached to anything (the hydroxyl radical) is a highly reactive free radical.

Alcohols

Alcohol = If an -OH hydroxyl is bonded to a saturated carbon (a carbon with only single bonds), then the molecule is, by virtue of the -OH, an alcohol. Alcohol names tend to end in -ol: ethanol (human intoxicant), ethylene glycol (antifreeze), diethylene glycol (infamous contaminant), methanol (fuel), menthol (analgesic).

Drinking alcohol is ethanol, CH3CH2OH (SMILES: CCO): an acyclic chain of two singly bonded carbons plus a hydroxyl, with all the proper hydrogens.
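"All the proper hydrogens" is exactly what SMILES means by implicit hydrogens: each atom is topped up to its normal valence. A minimal Python sketch that derives a molecular formula that way, assuming a hypothetical atoms/bonds input (element symbols plus (i, j, bond order) tuples), neutral carbon-containing molecules, and one default valence per element, so it gets hypervalent sulfur or phosphorus wrong:

```python
from collections import Counter

# Assumed default valences (lowest normal valence) for a few elements.
VALENCE = {"C": 4, "N": 3, "O": 2, "S": 2, "F": 1, "Cl": 1, "Br": 1, "I": 1}


def molecular_formula(atoms, bonds):
    """Hill-order formula for a neutral, carbon-containing molecule.

    atoms: list of element symbols; bonds: list of (i, j, bond_order).
    Hydrogens are implicit: each atom gets enough to reach its valence.
    """
    used = [0] * len(atoms)
    for i, j, order in bonds:
        used[i] += order
        used[j] += order
    counts = Counter(atoms)
    counts["H"] += sum(VALENCE[a] - u for a, u in zip(atoms, used))
    # Hill order: carbon, then hydrogen, then the rest alphabetically.
    hill = ["C", "H"] + sorted(e for e in counts if e not in ("C", "H"))
    return "".join(e + (str(counts[e]) if counts[e] > 1 else "") for e in hill if counts[e])
```

For ethanol, `molecular_formula(["C", "C", "O"], [(0, 1, 1), (1, 2, 1)])` fills in 3 + 2 + 1 hydrogens and returns `"C2H6O"`.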

Esters

Esters are derived from an acid (organic or inorganic) in which at least one -OH (hydroxyl) group is replaced by an -O-alkyl (alkoxy) group. Usually, esters are derived from a carboxylic acid and an alcohol.

The key feature of a carboxylic ester is a carbon with the following connections.

  • Double bond to an oxygen.

  • Single bond to some other moiety (often labeled R), or just a hydrogen.

  • Single bond to an oxygen which itself is bound to some other moiety (R'). If that oxygen instead just carries a -1 charge, the group is a carboxylate ion rather than an ester.
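The three connections listed above can be checked mechanically on a bond graph. A minimal Python sketch, using a hypothetical input format (element symbols plus (i, j, bond order) tuples, hydrogens implicit, no charges or aromaticity) and assuming R and R' are carbon-based:

```python
def ester_carbons(atoms, bonds):
    """Indices of carbons matching the ester pattern R-C(=O)-O-R'.

    atoms: list of element symbols; bonds: list of (i, j, bond_order).
    Hydrogens are implicit, so R = H (formate esters) is allowed.
    """
    neighbors = {i: [] for i in range(len(atoms))}
    for i, j, order in bonds:
        neighbors[i].append((j, order))
        neighbors[j].append((i, order))

    hits = []
    for c, symbol in enumerate(atoms):
        if symbol != "C":
            continue
        double_o = [n for n, order in neighbors[c] if atoms[n] == "O" and order == 2]
        single_o = [n for n, order in neighbors[c] if atoms[n] == "O" and order == 1]
        others = [n for n, _ in neighbors[c] if atoms[n] != "O"]
        # The single-bonded oxygen must carry a carbon (R'), not just a
        # hydrogen (carboxylic acid) or nothing at all (carboxylate).
        alkoxy = [o for o in single_o
                  if any(m != c and atoms[m] == "C" for m, _ in neighbors[o])]
        is_ester = (
            len(double_o) == 1
            and len(single_o) == 1
            and bool(alkoxy)
            and all(atoms[n] == "C" for n in others)  # R is carbon or implicit H
        )
        if is_ester:
            hits.append(c)
    return hits
```

Ethyl acetate (CC(=O)OCC) has one match at its carbonyl carbon, while acetic acid (CC(=O)O) has none because its single-bonded oxygen carries only a hydrogen.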

Some esters.

  • Glycerides - fatty acid esters of glycerol, main biological lipid class (animal fat and vegetable oil).

  • Fragrances and pheromones - often low molecular weight esters.

  • Phosphodiesters - backbone of DNA.

  • Nitrate esters - explosive (e.g. nitroglycerin).

  • Polyesters - plastics made of monomers linked with esters.

An esterase is a hydrolase enzyme that splits esters into an acid and an alcohol in a chemical reaction with water called hydrolysis. The one I’ve heard most about is acetylcholinesterase which catalyzes the breakdown of acetylcholine and of some other choline esters that function as neurotransmitters. Neurotransmitters are of interest to migraine pathology.

Polycyclic Aromatic Hydrocarbon

Lattices of fused rings of carbon and hydrogen (only). Like chicken wire. Neutral, nonpolar. Can be produced by incomplete combustion of organic matter. Can be found in tar and asphalt runoff. Also found in smog and marine oil spills.

The simplest is naphthalene, which has just two fused rings.

Many are believed to be carcinogenic.

Methyl

A methyl group (-CH3) is one carbon atom bonded to three hydrogen atoms. The fourth carbon bond can attach to some other part of the molecule. If that is just a fourth hydrogen, then the molecule is methane.

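Methylation and demethylation, adding a methyl group to a compound or removing one, are common. Biological methylation usually transfers the methyl group from a donor molecule such as S-adenosylmethionine.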

If the methyl group is attached to an -OH it becomes methanol.

Butyl

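A butyl group is a four-carbon substituent that comes in several configurations. A normal (n-) butyl group is an unbranched chain, something like: R/\/\ (R-CH2CH2CH2CH3), where R is the rest of the molecule with the 4 carbons tailing off it. In a tert-butyl group, R attaches to a central carbon that carries the other three carbons as methyl groups, -C(CH3)3, arranged tetrahedrally. (If R instead links to one of the outer carbons of that branched skeleton, the group is isobutyl, -CH2CH(CH3)2.)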

Saturation

Saturated compounds contain only single carbon-carbon bonds, so each carbon holds as many hydrogens as it can. Unsaturated molecules contain carbon-carbon double or triple bonds. Alkenes and alkynes are unsaturated.
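Saturation can be read straight off a molecular formula. An acyclic saturated hydrocarbon with c carbons has 2c + 2 hydrogens; each double bond or ring removes two hydrogens and each triple bond removes four. This standard "degree of unsaturation" bookkeeping can be sketched in Python, assuming a neutral molecule where halogens count like hydrogen and nitrogen adds one:

```python
def degree_of_unsaturation(c, h, n=0, halogens=0):
    """Rings plus pi bonds implied by a molecular formula CcHhNnXx.

    Oxygen and sulfur do not change the count, so they are not parameters.
    """
    value = (2 * c + 2 + n - h - halogens) / 2
    if value < 0 or value != int(value):
        raise ValueError("formula is not a valid neutral molecule")
    return int(value)
```

Ethane (C2H6) gives 0, ethylene (C2H4) gives 1 for its double bond, acetylene (C2H2) gives 2 for its triple bond, and cyclohexane (C6H12) gives 1 for its ring even though it is saturated.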

Alkane

An alkane consists of hydrogen and carbon atoms arranged in an acyclic tree structure in which all the carbon-carbon bonds are single. When all carbon bonds are single, the molecule is called a saturated hydrocarbon. The simplest alkane is methane which is just a carbon atom with 4 hydrogens (CH4).

Alkanes are sometimes called paraffins, but "paraffin" today often means something more specific, such as paraffin wax. If the backbone has more than about 17 carbons, the alkane is usually a solid called a wax. Alkanes are not very reactive and have relatively little biological activity.

Oil and natural gas contain alkanes (and a bunch of other sludge).

Table 2. Alkanes

Carbons   Name
1         Methane
2         Ethane
3         Propane
4         Butane
5         Pentane
6         Hexane
7         Heptane
8         Octane

There are linear and branched alkanes. There are also cyclic versions, called cycloalkanes, which are like regular alkanes except that the carbon chain closes into a loop.
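The names in Table 2 share a pattern (a stem for the carbon count plus "-ane"), and the formulas follow from saturation: CnH(2n+2) for the chain, CnH(2n) once the chain closes into a ring, since the two end carbons each give up a hydrogen to bond to each other. A minimal Python sketch covering only the eight stems in the table:

```python
# Stems for the first eight straight-chain alkanes (Table 2).
PREFIXES = ["Meth", "Eth", "Prop", "But", "Pent", "Hex", "Hept", "Oct"]


def alkane(n):
    """Name and formula of the n-carbon straight-chain alkane: CnH(2n+2)."""
    if not 1 <= n <= len(PREFIXES):
        raise ValueError("this sketch only knows 1 to 8 carbons")
    return PREFIXES[n - 1] + "ane", f"C{n if n > 1 else ''}H{2 * n + 2}"


def cycloalkane(n):
    """Closing the ring costs two hydrogens: CnH(2n). Needs at least 3 carbons."""
    if not 3 <= n <= len(PREFIXES):
        raise ValueError("this sketch only knows rings of 3 to 8 carbons")
    return "Cyclo" + PREFIXES[n - 1].lower() + "ane", f"C{n}H{2 * n}"
```

For example, `alkane(7)` returns `("Heptane", "C7H16")` and `cycloalkane(6)` returns `("Cyclohexane", "C6H12")`.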

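Table 3. Cycloalkane bond angles (C-C-C angle if the ring were flat)

Carbons   Name           Angle (degrees)
3         Cyclopropane   60
4         Cyclobutane    90
5         Cyclopentane   108
6         Cyclohexane    120

Real rings with 4 or more carbons pucker out of the plane; cyclohexane's chair form has angles close to the tetrahedral 109.5 degrees.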
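The angles in Table 3 are the interior angles of a regular polygon, 180(n - 2)/n degrees, i.e. what a perfectly flat ring would force. Comparing them with the roughly 109.5 degree tetrahedral angle that a saturated carbon prefers shows why small rings are strained. A minimal Python sketch:

```python
TETRAHEDRAL = 109.5  # approximate ideal sp3 bond angle in degrees


def flat_ring_angle(n):
    """Interior angle of a regular n-gon: the C-C-C angle of a flat n-carbon ring."""
    return 180.0 * (n - 2) / n


for n in range(3, 7):
    angle = flat_ring_angle(n)
    print(f"{n} carbons: {angle:.0f} degrees, {angle - TETRAHEDRAL:+.1f} from tetrahedral")
```

This prints deviations of -49.5, -19.5, -1.5, and +10.5 degrees for 3 through 6 carbons. Cyclopropane, which has no way to pucker, is badly strained, while a flat cyclohexane would overshoot the tetrahedral angle; puckering into the chair form resolves that.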

Alkene

An unsaturated hydrocarbon that contains at least one carbon-carbon double bond. The simplest is a pair of double-bonded carbons each with 2 hydrogens: C2H4, called ethylene or ethene. Alkenes are sometimes called olefins (also spelled olefines).

Propylene and isobutylene are alkenes. Vinyl (a.k.a. ethenyl, -CH=CH2) is the corresponding group: ethylene minus one hydrogen.

Ethyl

An ethyl group is a pair of singly bonded carbon atoms attached to the rest of a molecule, something like: R-CH2CH3. (The two-carbon group with a double bond is vinyl, -CH=CH2.)

Alkyne

An unsaturated hydrocarbon containing at least one carbon-carbon triple bond. The simplest exemplar is ethyne, a.k.a. acetylene which is just two carbons bound together with a triple bond and a hydrogen on each, C2H2.

Biological Chemistry

Cytosol

The fluid inside cells is called cytosol. Rough concentrations are also here. Cytoplasm is the intracellular material inside the cell membrane, including the organelles; cytosol is the cytoplasm minus the organelles.

Composition of cytosol
Knowable things
  • water concentration

  • pH

  • Atomic proportions?

  • ?

Things I can think of that may be floating around

Cytosol References

  • Macromolecular crowding: obvious but underappreciated 2001 source similar

Chemicals

UV Filters

  • Benzophenone

  • Para-amino Benzoic Acid (PABA)

  • Zinc Oxide

  • Titanium Dioxide

Antioxidants

Note the cautionary story of Linus Pauling, who wrongly popularized the antioxidant mania, especially vitamin C and vitamin E.

  • Tocopherols (Vitamin E) - "A potential confounding factor is the form of Vitamin E used in these studies. As explained earlier, synthetic, racemic mixtures of Vitamin E isomers are not bioequivalent to natural, non-racemic mixtures, yet are widely used academically and commercially."

  • Ascorbic Acid (Vitamin C)

  • Erythorbic Acid

  • Butylated Hydroxyanisole

  • Butylated Hydroxytoluene

  • Sodium Citrate

  • Lecithin

  • Propyl Gallate - Estrogen antagonist? Carcinogenesis?

Antimicrobials

  • Sodium Benzoate

  • Benzoic Acid

  • Potassium Sorbate

  • Sorbic Acid

  • Natamycin

  • Triclosan

  • Triclocarban

  • Hexachlorophene

  • Acetic Acid

  • Sodium Chloride

  • Calcium Propionate

  • Imidazolidinyl Urea

  • Methylchloroisothiazolinone

  • Lactic Acid

  • Sodium Nitrate

  • Sodium Nitrite

  • DMDM Hydantoin

  • Glycols

  • 2-bromo-2-nitropropane-1,3-diol

Sugars and Sweeteners

  • Glucose

  • Fructose

  • Sucrose

  • Galactose

  • Mannose

  • Dextrose

  • HFCS

  • Corn Syrup

  • Honey

  • Aspartame

  • Saccharin

  • Neotame

  • Sucralose

  • Acesulfame Potassium

  • Invert sugar

  • Xylitol

  • Tagatose

  • Maltitol

  • Maltose

  • Trehalose

  • Lactose

  • Hydrogenated Starch Hydrolysate

Buffers

  • Sodium Citrate

  • Aminomethyl Propanol (AMP)

  • Tetrasodium Pyrophosphate

  • Phosphoric Acid

Chelating/Sequestering Agents

  • Ethylene Diamine Tetra Acetic Acid (EDTA)

  • Diethylene Triamine Pentaacetic Acid

  • Phosphoric Acid (again?)

  • Tetrasodium Etidronate

  • Sodium Carbonate

Alcohols

  • Ethanol

  • Stearyl Alcohol

  • Cetyl Alcohol

  • Glycerin

  • Menthol

Waxes

  • Esters

Flavorings

  • Acetic Acid

  • Citric Acid

  • Lactic Acid

  • Stearic Acid

  • Phosphoric Acid

  • Fumaric Acid

  • Tartaric Acid

  • Methyl Vanillin

  • Ethyl Vanillin

  • Denatonium Benzoate

  • Vanilla

  • Monosodium Glutamate

Salts

  • Potassium Chloride

Fats

  • Tristearin

  • Trilinolein

  • Olestra

  • Salatrim

  • Guar Gum

  • Locust Bean Gum

  • Xanthan Gum

Colors

  • Annatto

  • Beta-carotene

  • Carmine

  • Saffron

  • Turmeric

  • Titanium Dioxide

  • Allura Red

  • Indigo

  • Caseinate

  • Ferrous Gluconate

  • D&C Green No. 5

  • D&C Red No. 33

  • D&C Violet No. 2

  • FD&C Yellow No. 5

  • Ext. D&C Violet No. 2

Moisture Control

  • Glycerin

  • Sorbitol

  • Sodium PCA

  • Mannitol

  • Propylene Glycol

  • Butylene Glycol

Emulsifiers

  • Lecithin

  • Phosphoric Acid

  • Sorbitan Monostearate

  • Polysorbate 80

  • Glycerol Monostearate (Mono- and Diglycerides)

Stabilizers and Thickeners

  • Sodium Caseinate

  • Calcium Caseinate

  • Polyethylene Glycol (PEG)

  • Polypropylene Glycol (PPG)

  • Lecithin

  • Methylcellulose

  • Sodium Carboxymethylcellulose

  • Xylenesulfonates

  • Agar

  • Gelatin

  • Pectin

  • Alginates

  • Starch and Modified Starch

  • Carrageenan

  • Guar Gum

  • Locust Bean Gum

  • Brominated Vegetable Oil

  • Gum Arabic

  • Xanthan Gum

Dough Conditioners and Whipping Agents

  • Sodium Stearoyl Lactylate

  • Calcium Stearoyl Lactylate

  • Sodium Stearoyl Fumarate

  • Potassium Bromate

  • Tetrasodium Pyrophosphate

  • Fumaric Acid

Stimulants

  • Caffeine

  • Theobromine

  • Ephedrine

Medicinals

  • Salicylic Acid

  • Sulfur

  • Resorcinol

  • Sodium Bicarbonate

  • Hydroquinone

  • Potassium Nitrate

  • Benzocaine

  • Tramadol

  • Acetylsalicylic Acid (Aspirin)

  • Acetaminophen

  • Ibuprofen

  • Naproxen

  • Allantoin

  • Menthol

  • Methyl Salicylate, Ethyl Salicylate, Glycol Salicylate

  • Camphor

  • Methyl Nicotinate, Benzyl Nicotinate

  • Capsaicin

Bleaching

  • Sodium Hypochlorite

  • Calcium Hypochlorite

  • Hydrogen Peroxide

  • Benzoyl Peroxide

  • Borax

  • Sodium Perborate

  • Sodium Carbonate Peroxide

Surfactants

  • Ammonium Lauryl Sulfate

  • Sodium Lauryl Sarcosinate

  • Lauryl Glucoside

  • Cocamidopropyl Betaine

  • Sodium Dodecylbenzenesulfonate

  • Sodium Isethionate

Foam Stabilizers

  • Cocamide MEA, Cocamide DEA, Cocamide TEA

  • Tetrasodium Pyrophosphate

Conditioners

  • Cetyl Alcohol

  • Cetrimonium Chloride

  • Silicones

  • Panthenol

Propellants

  • Nitrous Oxide

  • Isobutane

  • Dimethyl Ether

Polymers and Glue

  • Vinyl Acetate

  • Vinyl Alcohol

  • Methacrylate

Abrasives

  • Hydrated Silica

  • Stannous Fluoride

  • Sodium Fluoride

  • Sodium Monofluorophosphate

Unknown (Found In Perfume)

  • Ethylhexyl Methoxycinnamate

  • Limonene - Besides being a fragrance (oranges), used both as a flavoring and as an insecticide.

  • Butyl Methoxydibenzoylmethane

  • Ethylhexyl Salicylate

  • Linalool

  • Hexyl Cinnamal

  • Citronellol

  • Hydroxycitronellal - Perfume odorant; not much known.

  • Coumarin

  • Butylphenyl Methylpropional

  • Alpha-Isomethyl Ionone

  • Citral

  • Geraniol

  • Isoeugenol

  • Benzyl Benzoate

  • Tromethamine (TRIS)

  • Benzyl Alcohol

  • Tetramethylhydroxypiperidinol Citrate

  • Benzyl Salicylate

  • Hydroxyisohexyl 3-Cyclohexene Carboxaldehyde

  • Farnesol

  • Cinnamyl Alcohol

  • Anthranilate

Rocket Fuel

Titan II rockets used a hypergolic (mix and it ignites) fuel. It was a 50-50 blend of hydrazine and unsymmetrical dimethyl hydrazine (brand name: Aerozine 50). The oxidizer was nitrogen tetroxide.

Drugs and Pharmacology

  • agonist - ligand that binds to the main (orthosteric) binding site and causes the protein’s effect to occur.

  • antagonist - ligand that binds to the main binding site and causes the protein’s effect not to occur. A.k.a. blockers. They occupy the binding site but in a way that does not trigger the protein’s action. Calcium channel blockers are an example. Question: are antagonists more likely to be smaller than agonists?

  • inverse agonist - ligand that binds to the main binding site and causes the opposite of what the normal agonist would do, typically by reducing the protein’s baseline (constitutive) activity.

  • allosteric modulator - Allosteric modulators bind to a site distinct from the orthosteric agonist binding site. They usually induce a conformational change within the protein structure and thereby indirectly influence (modulate) the effects of an agonist or inverse agonist at the target protein.

  • orthosteric binding site - the main binding site for the main agonist.

  • positive allosteric modulator (PAM) - a.k.a. allosteric enhancer, induces an amplification of the orthosteric agonist’s effect, either by enhancing the binding affinity or the functional efficacy of the orthosteric agonist for the target protein.

  • negative allosteric modulator (NAM) - attenuates the effects of the orthosteric ligand, but is inactive in the absence of the orthosteric ligand.

  • silent allosteric modulator (SAM) - occupies the allosteric binding site but is functionally neutral.