You're on the right track with those equations. You just need to consider each of the four possible policies in turn: (slow, slow), (fast, slow), (slow, fast), (fast, fast), where the first entry of each pair is the action taken in the cool state and the second is the action taken in the warm state.
Consider (slow, fast):
From a) you have already seen J*(cool) = 40.
J*(warm) = 10 + 0.9 * (0.875 * J*(warm) + 0.125 * J*(off))
J*(warm) = 10 + 0.9 * (0.875 * J*(warm) + 0.125 * 0)
J*(warm) = 47.06
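As a quick sanity check (a minimal Python sketch, using the rewards and transition probabilities assumed in the equations above), each of these single-unknown equations has the form J = r + gamma * p * J, which rearranges to J = r / (1 - gamma * p):

```python
gamma = 0.9

# Slow in cool: reward 4, stay in cool with probability 1.
J_cool = 4 / (1 - gamma * 1.0)      # 40.0

# Fast in warm: reward 10, stay in warm with probability 0.875;
# the remaining 0.125 goes to the terminal "off" state, which is worth 0.
J_warm = 10 / (1 - gamma * 0.875)   # about 47.06

print(J_cool, J_warm)
```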
For (slow, slow):
Again, J*(cool) is independent of your action in the warm state (taking the slow action in cool means you never leave cool), so J*(cool) = 40.
J*(warm) = 4 + 0.9 * (0.5 * J*(cool) + 0.5 * J*(warm))
J*(warm) = 4 + 0.9 * (0.5 * 40 + 0.5 * J*(warm))
J*(warm) = 40
And for (fast, fast):
This time the value of being in the warm state is independent of the cool action (the fast action in warm never takes you back to cool), so J*(warm) = 47.06, from above.
J*(cool) = 10 + 0.9 * (0.25 * J*(cool) + 0.75 * J*(warm))
J*(cool) = 10 + 0.9 * (0.25 * J*(cool) + 0.75 * 47.06)
J*(cool) = 53.89
Lastly (fast, slow):
This is the hardest case, but we have two equations in two unknowns, so we can solve them simultaneously.
J*(cool) = 10 + 0.9 * (0.25 * J*(cool) + 0.75 * J*(warm))
J*(warm) = 4 + 0.9 * (0.5 * J*(cool) + 0.5 * J*(warm))
J*(warm) = (4 + 0.45 * J*(cool))/0.55
J*(cool) = 10 + 0.9 * (0.25 * J*(cool) + 0.75 * (4 + 0.45 * J*(cool))/0.55)
J*(cool) = 66.94
J*(warm) = 62.04
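If you would rather not do the algebra by hand, here is a minimal Python sketch (the rewards and transition probabilities are the ones assumed in the equations above) that evaluates each policy by solving the 2-by-2 linear system J = r + 0.9 * P * J directly; it reproduces the four sets of numbers:

```python
import numpy as np

gamma = 0.9

# (reward, [P(next=cool), P(next=warm)]) for each (state, action); any remaining
# probability mass (the 0.125 for fast in warm) goes to the terminal "off" state, value 0.
model = {
    ("cool", "slow"): (4,  [1.0,  0.0]),
    ("cool", "fast"): (10, [0.25, 0.75]),
    ("warm", "slow"): (4,  [0.5,  0.5]),
    ("warm", "fast"): (10, [0.0,  0.875]),
}

def evaluate(policy):
    """Solve (I - gamma * P) J = r for a fixed policy = (action in cool, action in warm)."""
    r = np.array([model[("cool", policy[0])][0], model[("warm", policy[1])][0]], dtype=float)
    P = np.array([model[("cool", policy[0])][1], model[("warm", policy[1])][1]], dtype=float)
    return np.linalg.solve(np.eye(2) - gamma * P, r)

for policy in [("slow", "slow"), ("slow", "fast"), ("fast", "fast"), ("fast", "slow")]:
    J_cool, J_warm = evaluate(policy)
    print(policy, round(J_cool, 2), round(J_warm, 2))
```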
As we can see, the highest value that can be obtained starting in the warm state is 62.04, and the highest value starting in cool is 66.94. Both occur under the policy (fast, slow), i.e. fast in cool and slow in warm, so this is the optimal policy.
As it turns out, it is not possible for a policy to be optimal if you start in one state but not optimal if you start in another. It is also worth noting that for discounted infinite-horizon MDPs like this one, you can prove that the optimal policy is always stationary, that is, if it is optimal to take the slow action in the cool state at time 1, it is optimal to take the slow action there at every time step.
Finally, in practice the number of states and actions is much larger than in this question, so enumerating every policy is not feasible, and more advanced techniques such as value iteration or policy iteration (both forms of dynamic programming) are typically required.
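To give a flavour of what that looks like, here is a minimal value iteration sketch in Python for this same MDP (again using the transition probabilities assumed above). It repeatedly applies the Bellman optimality update J(s) <- max over a of [r(s, a) + 0.9 * sum over s' of P(s' | s, a) * J(s')] and then reads off the greedy policy, which comes out as fast in cool and slow in warm:

```python
gamma = 0.9
states = ["cool", "warm"]
acts = ["slow", "fast"]

# (reward, [P(next=cool), P(next=warm)]) for each (state, action); remaining
# probability mass goes to the terminal "off" state, which is worth 0.
model = {
    ("cool", "slow"): (4,  [1.0,  0.0]),
    ("cool", "fast"): (10, [0.25, 0.75]),
    ("warm", "slow"): (4,  [0.5,  0.5]),
    ("warm", "fast"): (10, [0.0,  0.875]),
}

def q(s, a, J):
    """One-step lookahead value of taking action a in state s, given value estimate J."""
    r, probs = model[(s, a)]
    return r + gamma * sum(p * J[s2] for p, s2 in zip(probs, states))

# Value iteration: apply the Bellman optimality update until the values stop changing.
J = {s: 0.0 for s in states}
while True:
    J_new = {s: max(q(s, a, J) for a in acts) for s in states}
    delta = max(abs(J_new[s] - J[s]) for s in states)
    J = J_new
    if delta < 1e-9:
        break

# Greedy policy with respect to the converged values.
policy = {s: max(acts, key=lambda a: q(s, a, J)) for s in states}
print(J)       # roughly {'cool': 66.94, 'warm': 62.04}
print(policy)  # {'cool': 'fast', 'warm': 'slow'}
```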