
I have a periodic table which I coded in Java, and I'm working on creating simple compounds (binary compounds only) and returning information about them, but I'm having trouble turning a chemical formula into its IUPAC name. I have it set up so that a person can click on one element in the periodic table and then on another, and a window pops up displaying the possible compounds that can be formed from those two elements. These compounds are displayed as chemical formulas such as CO2 or CH4. I want to convert the formula of the compound that someone picked into its IUPAC name so it can be displayed with the other information.

I attempted to use the Chemistry Development Kit (CDK), but I honestly don't have much of an idea of how to use it, or whether it is even applicable to my issue. It makes me put in the bond types manually, which seems like a ton more work than I bargained for.

For example: CO2 would output Carbon Dioxide and CH4 would output Methane.

bimbob
  • This question might work better on the chemistry Stack Exchange site. That being said, the rules for mapping e.g. `CH4` to `methane`, `C2H6` to `ethane`, etc., are not so straightforward. You might need to use a database here to store all the mappings. – Tim Biegeleisen Jan 01 '19 at 01:39
  • I essentially have a chemical formula with only two different elements (the quantity of each is known) and just want to convert it to its IUPAC name. I kind of wanted to avoid a database, considering how large it would have to be, but if there are no other solutions I guess I could use one. – bimbob Jan 01 '19 at 01:56

3 Answers


> It makes me put in the bond types manually, which seems like a ton more work than I should be bargaining for.

This is unavoidable. IUPAC names are based on the structural properties of a compound, not its formula. Most nontrivial chemical formulae will have numerous possible structural isomers -- for instance, C5H12 is the formula for n-pentane, methylbutane, and 2,2-dimethylpropane. There's no way to pick one of these names without knowing the structure first.

As an alternative, you may want to consider writing a tool for converting from a structural formula (e.g., in the SMILES format) to IUPAC names. This isn't a trivial task either, but it's at least feasible to do algorithmically.
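To make the isomer point concrete, here is a minimal sketch that lists the three C5H12 isomers by their SMILES strings and shows that all of them collapse to the same formula. The class and method names are hypothetical, and the formula helper assumes an acyclic saturated hydrocarbon written only with `C`, parentheses, and implicit single bonds; it is not a general SMILES parser.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class IsomerDemo {
    // SMILES string -> IUPAC name for the three structural isomers of C5H12.
    static final Map<String, String> C5H12_ISOMERS = new LinkedHashMap<>();
    static {
        C5H12_ISOMERS.put("CCCCC", "pentane");
        C5H12_ISOMERS.put("CC(C)CC", "2-methylbutane");
        C5H12_ISOMERS.put("CC(C)(C)C", "2,2-dimethylpropane");
    }

    // Assumption: the SMILES contains only aliphatic carbon atoms and branch
    // parentheses, so every 'C' character is one carbon atom.
    static int carbonCount(String smiles) {
        int n = 0;
        for (char ch : smiles.toCharArray()) {
            if (ch == 'C') {
                n++;
            }
        }
        return n;
    }

    // An acyclic alkane with n carbons has 2n + 2 hydrogens.
    static String formula(String smiles) {
        int n = carbonCount(smiles);
        return "C" + n + "H" + (2 * n + 2);
    }

    public static void main(String[] args) {
        // Every line starts with the same formula but ends with a different name.
        for (Map.Entry<String, String> e : C5H12_ISOMERS.entrySet()) {
            System.out.println(formula(e.getKey()) + "  " + e.getKey() + "  " + e.getValue());
        }
    }
}
```

Going from SMILES to the formula loses the branching information, which is why the reverse direction (formula to a single name) is not well defined.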

  • I knew I would use my chemistry degrees for something eventually +1. – Tim Biegeleisen Jan 01 '19 at 02:13
  • I looked at SMILES earlier, but wouldn't I run into the same issue? The only real information I would be starting with is two elements and their quantities. I'm fairly certain that as long as I'm outputting a correct name for the given formula, my (high school) chemistry teacher won't care. I just want one name that corresponds to the formula. But I don't have the bonds and their locations, so I think the SMILES format would also be annoying? – bimbob Jan 01 '19 at 02:16
  • SMILES _does_ tell you where the bonds are. Adjacent atoms are single-bonded by default; special characters like `=` and `#` are used to denote double or triple bonds. –  Jan 01 '19 at 02:37

It really sounds like you need a database:

create table compound ( 
    first_chemical VARCHAR,
    first_amount INT,
    second_chemical VARCHAR,
    second_amount INT,
    name VARCHAR
)

and use it like

INSERT INTO compound VALUES('H', 2, 'O', 1, 'Water')

Then you can do something like

SELECT * FROM compound WHERE first_chemical = ? AND second_chemical = ?
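If a real database feels like overkill for a school project, the same idea can be sketched in memory. The following is an illustration only: `CompoundTable`, `add`, and `lookup` are hypothetical names, the rows mirror the columns of the `compound` table above, and the names still have to be entered by hand.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CompoundTable {
    // Key: the two element symbols in sorted order (e.g. "H|O"), so the
    // click order of the two elements does not matter.
    private final Map<String, List<String>> byElements = new HashMap<>();

    private static String key(String a, String b) {
        return a.compareTo(b) <= 0 ? a + "|" + b : b + "|" + a;
    }

    // Writes "H" + 2 as "H2" and omits the count when it is 1.
    private static String part(String symbol, int amount) {
        return symbol + (amount > 1 ? String.valueOf(amount) : "");
    }

    // Same fields as one row of the compound table.
    public void add(String first, int firstAmount, String second, int secondAmount, String name) {
        String entry = part(first, firstAmount) + part(second, secondAmount) + " = " + name;
        byElements.computeIfAbsent(key(first, second), k -> new ArrayList<>()).add(entry);
    }

    // Returns every stored compound made of exactly these two elements.
    public List<String> lookup(String a, String b) {
        return byElements.getOrDefault(key(a, b), Collections.emptyList());
    }

    public static void main(String[] args) {
        CompoundTable table = new CompoundTable();
        table.add("H", 2, "O", 1, "Water");
        table.add("H", 2, "O", 2, "Hydrogen peroxide");
        table.add("C", 1, "O", 2, "Carbon dioxide");
        System.out.println(table.lookup("O", "H"));
    }
}
```

A hash map lookup like this is constant time on average, so the size of the table is unlikely to matter for a list of binary compounds.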
corsiKa
  • It's a complete non-solution. IUPAC is a systematic naming system. Its _entire purpose_ is to make it possible to name chemicals based on a set of (complex) rules, rather than looking those names up in a table -- which is what you're essentially suggesting here. –  Jan 01 '19 at 01:43
  • @duskwuff First, I highly disagree with you that it's a "complete non-solution" - second, a database's entire purpose is to perform exactly this kind of lookup. So unless IUPAC provides a Java binding library (I haven't been able to find one), I don't really see what alternative there is. – corsiKa Jan 01 '19 at 01:46
  • I really didn't want to create a massive database full of IUPAC names; it seemed inefficient to me, because then I would have to search it every time a compound is created. – bimbob Jan 01 '19 at 01:54
  • @bimbob yes you would have to search it, however using indexes you'll find that to be very, very fast. In fact, if your "massive" database only has 10,000 or so compounds, even without indexes it will be very, very fast. – corsiKa Jan 01 '19 at 01:55
  • @duskwuff I have studied chemistry in the past, and the thing is, there are so many rules for IUPAC naming that the prefixes, suffixes, etc., themselves are probably numerous enough that we should store them in a database. Agreed that we don't need to store every IUPAC name, but the rules, yes, they probably belong in a database. – Tim Biegeleisen Jan 01 '19 at 01:57
  • @corsiKa 10,000 compounds is nothing. Many chemical databases run to the hundreds of millions of compounds. –  Jan 01 '19 at 01:57
  • I guess it wouldn't be as slow as I imagined, but I just quickly tried to find an existing list of binary compounds (formula and IUPAC name) and couldn't find anything. Where do you suggest I search? – bimbob Jan 01 '19 at 01:59
  • @bimbob Most such lists are commercial and cost substantial amounts of money. A quick search shows an online API called ChemSpider that draws on several databases; perhaps one of its sources has a free version of its database? – corsiKa Jan 01 '19 at 02:06
  • @TimBiegeleisen Perhaps! But that database would need a much different structure than the one proposed here, and even then the problem of converting chemical to structural formulae remains unaddressed. –  Jan 01 '19 at 02:12
  • @duskwuff Of course it does. That isn't the problem OP has. OP wants to click on two elements and have the potential binary compounds show up. You simply cannot create a formula from two elements. Clicking on O and H would show OH, H2O, H2O2, etc. The structure I propose is exactly suited to that purpose, and yes it must be populated manually (or extracted from someone else who populated it manually.) That fundamental problem doesn't change with IUPAC magic, because IUPAC would call H2O Dihydrogen Oxide, and most people would expect water instead. – corsiKa Jan 01 '19 at 04:36
  • @corsiKa That seems inconsistent with the use case described in the last line of the question: "For example: CO2 would output Carbon Dioxide". –  Jan 01 '19 at 04:39
  • @duskwuff It absolutely does not. The systematically generated version of CH4 is not Methane. It is the preferred name. The systematically generated name is Carbane. – corsiKa Jan 01 '19 at 04:46
  • @corsiKa You may be confusing that with another special case. "Meth-" is used consistently for all single-carbon chains: methane, methanol, methyl groups… –  Jan 01 '19 at 04:55

It's going to take a (really) long time to program an API that implements IUPAC nomenclature. There is, however, a way of obtaining a chemical name for a formula that does not require you to spend a lifetime on it. This is a really dirty workaround, but it does work: you can use the jsoup library to run a chemical name search on the EndMemo site (the URL is in the code below). The method sends an HTTP POST request, parses the results, and returns a string array with the names found. It's really messy, and probably considered sinful by most programmers, but it works.

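import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// Searches endmemo.com by formula and returns the compound names listed in the results.
public static String[] searchIUPACName(String chemicalFormula) throws IOException {
    Document doc = Jsoup.connect("http://www.endmemo.com/chem/chemsearch.php")
            .data("Search", "Search").data("name", chemicalFormula).data("sel", "f").post();
    // getElementById returns null if the results container is missing from the page
    Element note = doc.getElementById("note");
    if (note == null)
        return new String[] { "No results" };
    Elements elements = note.getElementsByClass("cmline");
    if (elements.isEmpty())
        return new String[] { "No results" };
    // The first "cmline" entry is skipped; the rest each hold one linked name
    String[] names = new String[elements.size() - 1];
    for (int i = 1; i < elements.size(); i++) {
        names[i - 1] = elements.get(i).getElementsByClass("cmname").get(0).getElementsByTag("a").get(0).text();
    }
    return names;
}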

However, like duskwuff said, IUPAC names are based on the structural properties of a compound, not its formula. So you can get a chemical name, but it's not necessarily the proper IUPAC name.
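For the narrow case in the question (binary compounds of two nonmetals), an offline alternative is the prefix convention taught in school chemistry: a multiplying prefix plus the first element's name, then a prefix plus the second element's root and "-ide". The sketch below is only that convention, under stated assumptions: `BinaryNamer` is a hypothetical class, its element table is deliberately tiny, the caller must pass the elements in the order they appear in the formula, and formal IUPAC recommendations may differ in details such as vowel elision. It produces "carbon dioxide" for CO2, but retained names such as "methane" for CH4 would still need a lookup table.

```java
import java.util.HashMap;
import java.util.Map;

public class BinaryNamer {
    // Index = atom count; index 0 is unused.
    private static final String[] PREFIXES =
        { "", "mono", "di", "tri", "tetra", "penta", "hexa", "hepta", "octa", "nona", "deca" };

    // symbol -> { element name, root used before "-ide" }; a few nonmetals only.
    private static final Map<String, String[]> ELEMENTS = new HashMap<>();
    static {
        ELEMENTS.put("C", new String[] { "carbon", "carb" });
        ELEMENTS.put("N", new String[] { "nitrogen", "nitr" });
        ELEMENTS.put("O", new String[] { "oxygen", "ox" });
        ELEMENTS.put("P", new String[] { "phosphorus", "phosph" });
        ELEMENTS.put("S", new String[] { "sulfur", "sulf" });
        ELEMENTS.put("F", new String[] { "fluorine", "fluor" });
        ELEMENTS.put("Cl", new String[] { "chlorine", "chlor" });
    }

    public static String name(String first, int firstCount, String second, int secondCount) {
        String[] a = ELEMENTS.get(first);
        String[] b = ELEMENTS.get(second);
        if (a == null || b == null || firstCount < 1 || firstCount > 10
                || secondCount < 1 || secondCount > 10) {
            throw new IllegalArgumentException("Unsupported compound for this sketch");
        }
        // "mono" is conventionally omitted on the first element.
        String head = (firstCount == 1 ? "" : PREFIXES[firstCount]) + a[0];
        String prefix = PREFIXES[secondCount];
        String tail = b[1] + "ide";
        // Textbook elision: mono+oxide -> monoxide, penta+oxide -> pentoxide.
        if (tail.startsWith("o") && (prefix.endsWith("a") || prefix.endsWith("o"))) {
            prefix = prefix.substring(0, prefix.length() - 1);
        }
        return head + " " + prefix + tail;
    }

    public static void main(String[] args) {
        System.out.println(name("C", 1, "O", 2));
        System.out.println(name("N", 2, "O", 5));
        System.out.println(name("P", 1, "Cl", 5));
    }
}
```

Hydrogen and the metals are left out on purpose: hydrides mostly go by retained names (water, methane, ammonia), and ionic compounds follow different rules.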

Cardinal System
  • I guess I'll try this out, but it definitely seems like something any other programmer would frown upon lol. My chemistry teacher doesn't program and has no idea about coding in general, so I'm sure she won't care. (This is a creativity project, where you pick how you want to explain some aspect of chemistry.) – bimbob Jan 01 '19 at 03:03
  • @bimbob just make sure that you have internet access at your school. If you do not have internet access, then the program will not be able to send the HTTP request. – Cardinal System Jan 01 '19 at 16:57