You write:
the model doesn't give satisfactory results.
But what you mean is that the model isn't giving you the results you expect or want. That is, you want the model to pick out the periods the NBER has labeled as "Recessions", but the Markov switching model simply finds the parameters that maximize the likelihood function for the data.
(The rest of the post shows results that are taken from this Jupyter notebook: http://nbviewer.jupyter.org/gist/ChadFulton/a5d24d32ba3b7b2e381e43a232342f1f)
(I'll also note that I double-checked these results using EViews, and it agrees with statsmodels' output almost exactly.)
The raw dataset is the growth rate (log difference * 100) of real GNP. The Hamilton dataset and the one from the Federal Reserve Economic Data (FRED) database are shown here, with grey bars indicating NBER-dated recessions:

In this case, the model is an AR(4) on the growth rate of real GNP, with a regime-specific intercept; the model allows two regimes. The idea is that "recessions" should correspond to a low (or negative) average growth rate and expansions should correspond to a higher average growth rate.
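As a sketch, and assuming the mean-switching form from Hamilton (1989) in which the AR terms act on deviations from the regime-specific level, the model can be written as follows. The symbols $y_t$, $S_t$, $\mu$, $\phi_i$ and $\sigma^2$ are notation introduced here for illustration:

```latex
y_t = \mu_{S_t} + \sum_{i=1}^{4} \phi_i \left( y_{t-i} - \mu_{S_{t-i}} \right) + \varepsilon_t,
\qquad \varepsilon_t \sim N(0, \sigma^2),
\qquad S_t \in \{0, 1\}
```

In the output tables, $\mu_0$ and $\mu_1$ would correspond to the two `const` rows, $\phi_1, \dots, \phi_4$ to `ar.L1` through `ar.L4`, $\sigma^2$ to `sigma2`, and $P(S_t = 0 \mid S_{t-1} = j)$ to `p[j->0]`.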
Model 1: Hamilton's dataset: Maximum likelihood estimation of parameters
From the model applied to Hamilton's (1989) dataset, we get the following estimated parameters:
                         Markov Switching Model Results
================================================================================
Dep. Variable:               Hamilton     No. Observations:                  131
Model:           MarkovAutoregression     Log Likelihood                -181.263
Date:                Sun, 02 Apr 2017     AIC                            380.527
Time:                        19:52:31     BIC                            406.404
Sample:                    04-01-1951     HQIC                           391.042
                         - 10-01-1984
Covariance Type:               approx
                             Regime 0 parameters
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const         -0.3588      0.265     -1.356      0.175      -0.877       0.160
                             Regime 1 parameters
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const          1.1635      0.075     15.614      0.000       1.017       1.310
                           Non-switching parameters
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
sigma2         0.5914      0.103      5.761      0.000       0.390       0.793
ar.L1          0.0135      0.120      0.112      0.911      -0.222       0.249
ar.L2         -0.0575      0.138     -0.418      0.676      -0.327       0.212
ar.L3         -0.2470      0.107     -2.310      0.021      -0.457      -0.037
ar.L4         -0.2129      0.111     -1.926      0.054      -0.430       0.004
                         Regime transition parameters
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
p[0->0]        0.7547      0.097      7.819      0.000       0.565       0.944
p[1->0]        0.0959      0.038      2.542      0.011       0.022       0.170
==============================================================================
and the time series of the probability of operating in regime 0 (which here corresponds to a negative growth rate, i.e. a recession) looks like:

Model 2: Updated dataset: Maximum likelihood estimation of parameters
Now, as you saw, we can instead fit the model using the "updated" dataset (which looks pretty much like the original dataset), to get the following parameters and regime probabilities:
                         Markov Switching Model Results
================================================================================
Dep. Variable:                 GNPC96     No. Observations:                  131
Model:           MarkovAutoregression     Log Likelihood                -188.002
Date:                Sun, 02 Apr 2017     AIC                            394.005
Time:                        20:00:58     BIC                            419.882
Sample:                    04-01-1951     HQIC                           404.520
                         - 10-01-1984
Covariance Type:               approx
                             Regime 0 parameters
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const         -1.2475      3.470     -0.359      0.719      -8.049       5.554
                             Regime 1 parameters
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.9364      0.453      2.066      0.039       0.048       1.825
                           Non-switching parameters
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
sigma2         0.8509      0.561      1.516      0.130      -0.249       1.951
ar.L1          0.3437      0.189      1.821      0.069      -0.026       0.714
ar.L2          0.0919      0.143      0.645      0.519      -0.187       0.371
ar.L3         -0.0846      0.251     -0.337      0.736      -0.577       0.408
ar.L4         -0.1727      0.258     -0.669      0.503      -0.678       0.333
                         Regime transition parameters
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
p[0->0]        0.0002      1.705      0.000      1.000      -3.341       3.341
p[1->0]        0.0397      0.186      0.213      0.831      -0.326       0.405
==============================================================================

To understand what the model is doing, look at the intercepts in the two regimes. In Hamilton's model, the "low" regime has an intercept of -0.35, whereas with the updated data, the "low" regime has an intercept of -1.25.
What that tells us is that with the updated dataset, the model does a "better job" of fitting the data (in terms of a higher likelihood) by letting the "low" regime capture much deeper recessions. In particular, looking back at the GNP data series, it's apparent that it's using the "low" regime to fit the very low growth in the late 1950s and early 1980s.
In contrast, the fitted parameters from Hamilton's model allow the "low" regime to fit "moderately low" growth rates that cover a wider range of recessions.
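One way to see this difference in the two tables is to convert the estimated transition probabilities into expected regime durations. The sketch below assumes the standard result for a first-order Markov chain that a regime with staying probability p lasts 1 / (1 - p) periods on average. The function name `expected_duration` is only illustrative, and the probabilities are copied from the Model 1 and Model 2 tables.

```python
def expected_duration(p_stay):
    """Average number of consecutive periods spent in a regime
    whose probability of persisting for one more period is p_stay."""
    return 1.0 / (1.0 - p_stay)


# (p[0->0], p[1->0]) copied from the Model 1 and Model 2 summary tables
transition = {
    "Model 1 (Hamilton data)": (0.7547, 0.0959),
    "Model 2 (updated data)": (0.0002, 0.0397),
}

durations = {}
for name, (p00, p10) in transition.items():
    p11 = 1.0 - p10  # probability of staying in regime 1
    durations[name] = (expected_duration(p00), expected_duration(p11))
    low, high = durations[name]
    print(f"{name}: low regime ~{low:.1f} quarters, high regime ~{high:.1f} quarters")
```

Under these assumptions, the "low" regime in Model 1 lasts about four quarters on average, whereas in Model 2 it lasts about one quarter. That is consistent with Model 2 reserving the "low" regime for isolated quarters of very low growth, although the standard errors on the Model 2 transition probabilities are large.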
We can't compare these two models' outcomes using, e.g., the log-likelihood values, because they use different datasets. One thing we could try, though, is to use the parameters fitted on Hamilton's dataset with the updated GNP data. Doing that, we get the following result:
Model 3: Updated dataset using parameters estimated on Hamilton's dataset
                         Markov Switching Model Results
================================================================================
Dep. Variable:                 GNPC96     No. Observations:                  131
Model:           MarkovAutoregression     Log Likelihood                -191.807
Date:                Sun, 02 Apr 2017     AIC                            401.614
Time:                        19:52:52     BIC                            427.491
Sample:                    04-01-1951     HQIC                           412.129
                         - 10-01-1984
Covariance Type:                  opg
                             Regime 0 parameters
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const         -0.3588      0.185     -1.939      0.053      -0.722       0.004
                             Regime 1 parameters
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const          1.1635      0.083     13.967      0.000       1.000       1.327
                           Non-switching parameters
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
sigma2         0.5914      0.090      6.604      0.000       0.416       0.767
ar.L1          0.0135      0.100      0.134      0.893      -0.183       0.210
ar.L2         -0.0575      0.088     -0.651      0.515      -0.231       0.116
ar.L3         -0.2470      0.104     -2.384      0.017      -0.450      -0.044
ar.L4         -0.2129      0.084     -2.524      0.012      -0.378      -0.048
                         Regime transition parameters
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
p[0->0]        0.7547      0.100      7.563      0.000       0.559       0.950
p[1->0]        0.0959      0.051      1.872      0.061      -0.005       0.196
==============================================================================

This looks more like what you expected or wanted, and that's because, as I mentioned above, the "low" regime intercept of -0.35 makes the "low" regime a good fit for more time periods in the sample. But notice that the log-likelihood here is -191.8, whereas in Model 2 the log-likelihood was -188.0.
Thus even though this model looks more like what you wanted, it does not fit the data as well.
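The arithmetic behind that comparison can be sketched as follows, using only the numbers reported in the Model 2 and Model 3 tables. The parameter count `k = 9` is an assumption read off the table rows (two transition probabilities, two intercepts, one variance, four AR coefficients), and the check uses the usual definition AIC = 2k - 2 * log-likelihood.

```python
# Values copied from the Model 2 and Model 3 summary tables (same dataset).
llf_mle = -188.002    # Model 2: parameters estimated on the updated data
llf_fixed = -191.807  # Model 3: Hamilton-data parameters applied to the updated data
aic_mle, aic_fixed = 394.005, 401.614

k = 9  # assumed number of parameters, counted from the table rows


def aic(llf, k):
    """Akaike information criterion: 2k - 2 * log-likelihood."""
    return 2 * k - 2 * llf


gap = llf_mle - llf_fixed
print(f"log-likelihood gap (Model 2 - Model 3): {gap:.3f}")
print(f"recomputed AIC, Model 2: {aic(llf_mle, k):.3f} (reported {aic_mle})")
print(f"recomputed AIC, Model 3: {aic(llf_fixed, k):.3f} (reported {aic_fixed})")
```

Because both models have the same number of parameters and are evaluated on the same data, ranking them by AIC or BIC gives the same ordering as ranking them by log-likelihood.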
(Note again that you can't compare these log-likelihoods to the -181.3 from Model 1, because that is using a different dataset).