I'm attempting to use SimplePie to parse RSS feeds for a client (the client is an author for the Washington Post).

After reading through the documentation and using the example code as a reference, I was able to get the feeds parsed into the site, but now I'm encountering an issue where the apostrophe character isn't decoded ( ' is displayed as ').

I've attempted to resolve this issue using the suggested solutions in the SimplePie FAQ:

1. Verifying the site's meta tag
2. Using SimplePie's handle_content_type() function
3. Using PHP's built-in header() function to correct the HTTP headers

Unfortunately, none of these have resolved the problem for me.

Below is the code I'm using to parse the RSS feed:

<?php

require_once('php/autoloader.php');

$feedJB = new SimplePie();
$feedJB->set_feed_url('http://washingtontimes.dynamic.feedsportal.com/pf/637323/communities.washingtontimes.com/neighborhood/feeds/latest/status-update/');
$feedJB->init();
$feedJB->handle_content_type();

$feedRB = new SimplePie();
$feedRB->set_feed_url('http://washingtontimes.dynamic.feedsportal.com/pf/637323/communities.washingtontimes.com/neighborhood/feeds/latest/2nd-golden-era-advertising/');
$feedRB->init();
$feedRB->handle_content_type();

?>

This is the output code on the page:

<!-- Left -->
            <li class="left">
                <h3>Recent Posts</h3>
                <ul class="feed-list">
                    <?php foreach ($feedJB->get_items(0, 5) as $item): ?>
                    <li>
                        <strong><a href="<?php echo $item->get_permalink(); ?>"><?php echo $item->get_title(); ?></a></strong>
                        <small>Posted on <?php echo $item->get_date('j F Y'); ?></small>
                    </li>
                    <?php endforeach; ?>
                    <li><h4><a href="<?php echo $feedJB->get_permalink(); ?>">Read more articles by Jeff</a></h4></li>
                </ul>
            </li>
            <!-- /Left -->

            <!-- Right -->
            <li class="right">
                <h3>Recent Posts</h3>
                <ul class="feed-list">
                    <?php foreach ($feedRB->get_items(0, 5) as $item): ?>
                    <li>
                        <strong><a href="<?php echo $item->get_permalink(); ?>"><?php echo $item->get_title(); ?></a></strong>
                        <small>Posted on <?php echo $item->get_date('j F Y'); ?></small>
                    </li>
                    <?php endforeach; ?>
                    <li><h4><a href="<?php echo $feedRB->get_permalink(); ?>">Read more articles by Rob</a></h4></li>
                </ul>
            </li>
            <!-- /Right -->

I've tested this locally on my machine (a Mac Pro running Lion and MAMP) as well as on my web server (Linux running Apache 2.2.22 and PHP 5.2.17).

You can also view this for the time being by going to the following link: http://clients.josephmainwaring.com/statuscreative/#!columns.php

If anyone has suggestions to address the character encoding issue it would be greatly appreciated.

theaccordance

1 Answer

I've found that the Washington Post's feeds are all served as ISO-8859-1 even when they contain UTF-8 characters. I don't use SimplePie, but every time I fetch a feed, I run it through the following function, where $xml is the text of the feed, and $url is the feed's URL:

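function feed_fix_broken ( $xml, $url ) {
  // Drop any byte sequences that are not valid UTF-8.
  $xml = iconv('UTF-8', 'UTF-8//IGNORE', $xml );
  // Domains whose feeds declare an encoding that doesn't match their content.
  $broken = array ('washingtonpost.com' => 'ISO-8859-1');
  foreach ($broken as $domain => $encoding) {
    if (stristr($url, $domain)) {
      // Convert to the declared encoding, approximating unmappable characters.
      $xml = iconv( 'UTF-8', $encoding.'//TRANSLIT', $xml );
    }
  }
  return $xml;
}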

This transliterates UTF-8 encoded characters to their ISO-8859-1 counterparts, where possible, so the feed's content matches the encoding it declares.
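To make the effect of that conversion concrete, here is a minimal sketch of what happens at the byte level. The sample string is only an illustration (it is not taken from a real feed), and it assumes your platform's iconv accepts the //TRANSLIT suffix:

```php
<?php
// "Chávez" encoded as UTF-8: the "á" occupies two bytes (0xC3 0xA1).
$utf8 = "Ch\xC3\xA1vez";

// Convert to ISO-8859-1, where "á" is the single byte 0xE1.
// Assumption: the local iconv implementation supports //TRANSLIT.
$latin1 = iconv('UTF-8', 'ISO-8859-1//TRANSLIT', $utf8);

echo strlen($utf8), "\n";    // 7 (byte length of the UTF-8 string)
echo strlen($latin1), "\n";  // 6 (byte length after conversion)
echo bin2hex($latin1), "\n"; // 4368e176657a
```

If a parser then reads those bytes as ISO-8859-1, as the feed claims, the "á" comes out intact instead of as two stray characters.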

Notice that in FeedDemon, "Chávez" is screwy...

but I've got it right.

danmactough