
I have a set of 4 simultaneous equations:

0.059z = x
0.06w = y
z+w = 8093
x+y = 422

All the solutions I've found so far seem to be for equations that have all the variables present in each equation, then convert to matrices and use the solve function.

Is there an easier way to do this in R or Python using the equations in their original form?

Also, how can I ensure that only positive numbers are returned in the solution?

Hope this makes sense...many thanks for the help

CDJB
Keno
    have you seen `sympy`? it's a nice python module for doing symbolic computation of this sort – Sam Mason Nov 28 '19 at 13:40
  • Thanks for this. I tried it but I get a result with negative values, which doesn't work for the problem I'm trying to solve. – Keno Nov 28 '19 at 14:07
  • The problem: reviewing a medical study that gives summary data and I'm trying to calculate the original data. In this example: Patients <65 years vs >65 years old, each with infection vs no infection. Total patients = 8093. Total infections = 422. 5.9% of the <65 year old group are infected. 6% of >65 year group is infected. – Keno Nov 28 '19 at 14:13
  • sympy is for doing arbitrary symbolic computation, if you're getting negative values back then you've not specified your constraints correctly. maybe edit the question to show where you've got to – Sam Mason Nov 28 '19 at 14:23
  • But really, the easiest and fastest solution is to write this as a matrix – user8408080 Nov 28 '19 at 14:30
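The matrix approach mentioned in the last comment is straightforward in Python; a minimal sketch, assuming numpy is available. Rewrite each equation with all variables on the left-hand side, so the system becomes A·v = b with v = (x, y, z, w):

```python
import numpy as np

# coefficient matrix for (x, y, z, w), one row per equation:
#   x - 0.059*z         = 0
#   y         - 0.06*w  = 0
#           z +      w  = 8093
#   x + y               = 422
A = np.array([
    [1, 0, -0.059, 0],
    [0, 1, 0, -0.06],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
])
b = np.array([0, 0, 8093, 422])

sol = np.linalg.solve(A, b)  # (x, y, z, w)
```

As the answers below show, the unique solution contains negative values, so the positivity requirement cannot be satisfied with these numbers.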

3 Answers


You can use sympy for this:

from sympy import symbols, linsolve, Eq
x,y,z,w = symbols('x y z w')
linsolve([Eq(0.059*z, x), Eq(0.06*w, y), Eq(z+w, 8093), Eq(x+y, 422)], (x, y, z, w))

Output:

{(3751.22, -3329.22, 63580.0, -55487.0)}

Regarding your comments about negative values: there is only one solution to this system of equations, and it has negative values for y and w. If there were more than one solution, sympy would return all of them, and you could then filter the solutions down to only the positive ones.
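For completeness, a sketch of that filtering step (here it leaves nothing, since the single solution has negative components):

```python
from sympy import symbols, linsolve, Eq

x, y, z, w = symbols('x y z w')
sols = linsolve([Eq(0.059*z, x), Eq(0.06*w, y),
                 Eq(z + w, 8093), Eq(x + y, 422)], (x, y, z, w))

# keep only solutions where every component is positive
positive = [s for s in sols if all(v > 0 for v in s)]
```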

CDJB

In R, you might try it like below:

library(rootSolve)
library(zeallot)

model <- function(v){
  # unpack v into x, y, z, w using zeallot's %<-% operator
  c(x,y,z,w) %<-% v
  # each equation rearranged to the form f(...) = 0
  return(c(0.059*z-x, 0.06*w-y, z+w-8093, x+y-422))
}
res <- multiroot(f = model, start = c(0,0,0,0))

then you can get the solution res as

> res
[1]   3751.22  -3329.22  63580.00 -55487.00
ThomasIsCoding

there are a few things going on here. first, as CDJB notes, if there were any positive solutions then sympy would find them. I searched for those numbers and found this paper, which suggests you should be using 7088 instead of 8093. we can do a quick sanity check:

def pct(value):
    return f"{value:.1%}"

print(pct(422 / 8093))  # ~5.2%
print(pct(422 / 7088))  # ~6.0%

confirming that you're going to struggle averaging ~5.9% and ~6.0% towards ~5.2%, and explaining the negative solutions in the other answers. further, these are presumably counts so all your variables also need to be whole numbers.
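the impossibility can be checked directly: for any non-negative split, the overall rate is a weighted average of the two group rates, so it must lie between them, yet 422/8093 sits below both:

```python
# the overall rate is a weighted average of the two group rates,
# so it can never fall below the smaller of them
overall = 422 / 8093                  # ~0.0522
assert overall < min(0.059, 0.060)    # no non-negative split can work
```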

once the correct denominator is used, I'd comment that there are many solutions (11645 by my count), e.g.:

cases = [1, 421]
pop = [17, 7071]

rates = [pct(c / p) for c, p in zip(cases, pop)]

gives the appropriate output, as does:

cases = [2, 420]
pop = [34, 7054]

this is because the data was rounded to two decimal places. you probably also don't want to use either of the above, they're just the first two valid solutions I got.

we can define a Python function to enumerate all solutions:

from math import floor, ceil

def solutions(pop, cases, rate1, rate2, err):
    # enumerate all integer splits (c1, c2, pop1, pop2) whose
    # rounded rates match the published percentages
    target = (pct(rate1), pct(rate2))
    for pop1 in range(1, pop):
        pop2 = pop - pop1
        # case counts for group 1 within the rounding error of rate1
        c1_lo = ceil(pop1 * (rate1 - err))
        c1_hi = floor(pop1 * (rate1 + err))

        for c1 in range(c1_lo, c1_hi + 1):
            c2 = cases - c1
            if (pct(c1 / pop1), pct(c2 / pop2)) == target:
                yield c1, c2, pop1, pop2

all_sols = list(solutions(7088, 422, 0.059, 0.060, 0.0005))

which is where I got my count of 11645 above from.

not sure what to suggest with this, but you could maybe do a bootstrap to see how much your statistic varies with different solutions. another option would be to do a Bayesian analysis which would let you put priors over the population sizes and hence cut this down a lot.
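a minimal sketch of that bootstrap idea, assuming `all_sols` holds the tuples produced by `solutions()` above (the two example solutions from earlier stand in here):

```python
import random

# stand-in for the full list from solutions(); in practice use all_sols
sols = [(1, 421, 17, 7071), (2, 420, 34, 7054)]

random.seed(0)
# resample candidate solutions and watch how a statistic of interest
# varies, e.g. the difference in infection rates between the groups
draws = [random.choice(sols) for _ in range(1000)]
diffs = [c2 / p2 - c1 / p1 for c1, c2, p1, p2 in draws]
spread = max(diffs) - min(diffs)
```

if the statistic barely moves across resamples, the rounding ambiguity doesn't matter much for your analysis; if it swings widely, you can't recover the original data reliably.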

Sam Mason
  • Thanks so much for this - can't believe you tracked down the original article just with the limited data I gave here! It also occurred to me that it's impossible to get an average of 5.2% from 2 groups with rates above 5.2%. The 7088 in the report is for the multivariate analysis - my original question is for the univariate analysis. There must be an error in the reporting of the study, as the univariate table shows the 5.2% (based on the 8093). Thanks for your help with this – Keno Dec 02 '19 at 14:38
  • no probs; google makes that easy! didn't notice that 5.2% at the top of table 1a. at a guess they've biased things by keeping all the SSI records while excluding some non-SSI records because of missing values – Sam Mason Dec 02 '19 at 15:16