1

I'm new to Python and Pandas, so I would be really glad if someone could help me in this matter. My question is the following:

If I have a .txt file with a set of reactions as strings (R1, R2...). Each reaction has compounds (A,B,C,D...) with their respective stoichiometric coefficients (1, 2, 3...) such as:

R1: A + 2B + C <=> D

R2: A + B <=> C

How can I create a data frame in python in the format of a stoichiometric matrix (compounds as rows X reactions as columns) like this:

  R1 R2
A -1 -1 
B -2 -1
C -1  1
D  1  0

Observation: Compounds on the left side of the equation should have negative stoichiometric values while the ones on the right should be positive

Thanks =D

Microdot
  • 37
  • 6
  • 1
    How are your reaction data stored? As strings in a txt file, or is there already some kind of structure to it? Also, compound C in R2 should be +1, right? – WolfgangK Apr 18 '18 at 11:08
  • Thanks @WolfgangK. The reactions are stored as strings in a txt file and I've just corrected the C coefficient in R2. – Microdot Apr 18 '18 at 12:28
  • 1
    This may or may not be applicable to what you are doing but you might check out [cobrapy](https://github.com/opencobra/cobrapy). It's a nice way to create and manage a stoichiometric matrix for a reaction network. It also has solvers and algorithms for analysis. – noslenkwah Apr 18 '18 at 15:35

1 Answers1

1

Try this:

import pandas as pd
import re  # regular expressions

def coeff_comp(s):
    # Separate stoichiometric coefficient and compound
    result = re.search('(?P<coeff>\d*)(?P<comp>.*)', s)
    coeff = result.group('coeff')
    comp = result.group('comp')
    if not coeff:
        coeff = '1'                          # coefficient=1 if it is missing
    return comp, int(coeff)

equations = ['R1: A + 2B + C <=> D', 'R2: A + B <=> C']  # some test data
reactions_dict = {}                          # results dictionary

for equation in equations:
    compounds = {}                           # dict -> compound: coeff 
    eq = equation.replace(' ', '')  
    r_id, reaction = eq.split(':')           # separate id from chem reaction
    lhs, rhs = reaction.split('<=>')         # split left and right hand side
    reagents = lhs.split('+')                # get list of reagents
    products = rhs.split('+')                # get list of products
    for reagent in reagents:
        comp, coeff = coeff_comp(reagent)
        compounds[comp] = - coeff            # negative on lhs
    for product in products:
        comp, coeff = coeff_comp(product)
        compounds[comp] = coeff              # positive on rhs
    reactions_dict[r_id] = compounds         

# insert dict into DataFrame, replace NaN with 0, let values be int
df = pd.DataFrame(reactions_dict).fillna(value=0).astype(int)

The output looks like

   R1  R2
A  -1  -1
B  -2  -1
C  -1   1
D   1   0
WolfgangK
  • 953
  • 11
  • 18