I'm reading an RSS feed that contains a video URL. I'm using ROME but have not been able to extract it. Below is the RSS feed XML from which I'm trying to extract the URL.

I would appreciate any help.

The RSS feed:

<rss xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
    <channel>
        <title>Wochit - Top Stories</title>
        <link>http://www.wochit.com</link>
        <description>
            Features the latest breaking news videos from Wochit
        </description>
        <copyright>Copyright 2012, wochit.com</copyright>
        <language>en-us</language>
        <lastBuildDate>Mar 17, 2016 8:22:46 AM</lastBuildDate>
        <source>
            Rolling Stone Music, EOnline, The Guardian USnews, AP Entertainment, Mashable, Reuters Entertainment, AP US, The Hollywood Reporter, Reuters, The Daily Beast, Wired.com Tech, News 24, Buzzfeed US, MSNBC, CNN Entertainment, Geek.com, Reuters World, Rolling Stone News, The Hollywood Reporter - TV, Comic_book, CBS US News, Politico Picks, Fusion. net, Screen Rant
        </source>
        <item>
            <link>http://www.wochit.com/share-video/107249827</link>
            <guid>107249827</guid>
            <origin>
                <link>
                    http://www.cbsnews.com/videos/what-do-we-know-about-obamas-supreme-court-pick/
                </link>
                <title>Obama's Supreme Court Pick</title>
                <source>CBS US News</source>
            </origin>
            <pubDate>Mar 17, 2016 8:22:46 AM</pubDate>
            <media:description>
                President Obama has nominated Judge Merrick Garland as the nominee for the Supreme Court to replace the late Justice Antonin Scalia. Garland is a known and respected figure for environmental causes. He is considered a moderate as well. Obama demanded a fair hearing for Judge Garland and said that refusing to even consider his nomination would provoke “an endless cycle. The republican party and its senate members have vowed to disapprove of anyone the current president nominates.
            </media:description>
            <media:transcript>
                President Obama has nominated Judge Merrick Garland as the nominee for the Supreme Court to replace the late Justice Antonin Scalia. Garland is a known and respected figure for environmental causes. He is considered a moderate as well. Obama demanded a fair hearing for Judge Garland and said that refusing to even consider his nomination would provoke “an endless cycle. The republican party and its senate members have vowed to disapprove of anyone the current president nominates.
            </media:transcript>
            <media:backlink>
                http://api.wochit.com/api/linkback?VIDEO_GUID=107249827&amp;sc=f2c3306810e1c7658ebfbad2d70a2c92cc6dcdae515ff5605a4b9b38a1361bac
            </media:backlink>
            <media:source>CBS US News</media:source>
            <embedCodeSnippet>
                <![CDATA[
                <script language='javascript' type='text/javascript' src='http://api.wochit.com/api/wochitplayer.js?code=eyJ2ZXJzaW9uIjoiMS4wIiwicGFydG5lcklkIjoiZjJjMzMwNjgxMGUxYzc2NThlYmZiYWQyZDcwYTJjOTJjYzZkY2RhZTUxNWZmNTYwNWE0YjliMzhhMTM2MWJhYyIsInByb2dyYW1tZXJOYW1lIjoid29jaGl0IiwidHlwZSI6IkNMSVBfSUQiLCJkYXRhIjoiMTA3MjQ5ODI3IiwicHJlZmVycmVkVmlkZW9GYW1pbHkiOiJIRCJ9&pid=f2c3306810e1c7658ebfbad2d70a2c92cc6dcdae515ff5605a4b9b38a1361bac&progn=wochit&autostart=false&width=640&height=360' data-wochit-uid='b0pciod062'></script>
                ]]>
            </embedCodeSnippet>
            <embedCodeIframe>
                <![CDATA[
                <iframe src="http://api.wochit.com/api/player?code=eyJ2ZXJzaW9uIjoiMS4wIiwicGFydG5lcklkIjoiZjJjMzMwNjgxMGUxYzc2NThlYmZiYWQyZDcwYTJjOTJjYzZkY2RhZTUxNWZmNTYwNWE0YjliMzhhMTM2MWJhYyIsInByb2dyYW1tZXJOYW1lIjoid29jaGl0IiwidHlwZSI6IkNMSVBfSUQiLCJkYXRhIjoiMTA3MjQ5ODI3IiwicHJlZmVycmVkVmlkZW9GYW1pbHkiOiJIRCJ9&pid=f2c3306810e1c7658ebfbad2d70a2c92cc6dcdae515ff5605a4b9b38a1361bac&progn=wochit&autostart=false&width=640&height=360" frameBorder="0" style="overflow:hidden" scrolling="no" height="360" width="640"></iframe>
                ]]>
            </embedCodeIframe>
            <media:vidAssetPart>34</media:vidAssetPart>
            <media:text>
                Contact your |local office| for all commercial or promotional uses. Full editorial rights UK, US, Ireland, Canada (not Quebec). Restricted editorial rights for daily newspapers elsewhere, please call. A MAY 1, 2008, FILE PHOTO Broadcasters: NO ACCESS USA/NO ACCESS CNN Digital: FOR BROADCAST CLIENT USE ONLY/NO ACCESS INTERNET/MOBILE/WIRELESS . For Reuters customers only.
            </media:text>
            <media:category>Law &amp; Crime</media:category>
            <media:category>News</media:category>
            <description>
                President Obama has nominated Judge Merrick Garland as the nominee for the Supreme Court to replace the late Justice Antonin Scalia. Garland is a known and respected figure for environmental causes. He is considered a moderate as well. Obama demanded a fair hearing for Judge Garland and said that refusing to even consider his nomination would provoke “an endless cycle. The republican party and its senate members have vowed to disapprove of anyone the current president nominates.
            </description>
            <media:keywords>...</media:keywords>
            <media:thumbnail url="http://wochitprod3-a.akamaihd.net/artifacts/headlines/singlePlus/107249827/107249827-1280x720_1_Mar_17_2016_13_22_15_poster.jpg"/>
            <title>Obama's Supreme Court Pick</title>
            <media:content medium="VIDEO" channels="2" bitrate="3072.0" duration="43" expression="full" fileSize="14985855" framerate="0.0" height="720" lang="en" samplingrate="44100.0" type="video/mp4" width="1280" isDefault="true" url="http://wochitprod3-a.akamaihd.net/artifacts/headlines/singlePlus/107249827/107249827-1280x720_Mar_17_2016_13_22_15.MP4?sc=f2c3306810e1c7658ebfbad2d70a2c92cc6dcdae515ff5605a4b9b38a1361bac"/>
        </item>
    </channel>
</rss>

This code works fine so far:

SyndFeed feed = null;
try {
    logger.info("Building feed for Url: " + feedUrl);
    feed = new SyndFeedInput().build(reader);
} catch (Exception e2) {
    logger.error(e2);
} finally {
    try {
        if (reader != null) {
            reader.close();
        }
    } catch (IOException e) {
        logger.error(e);
    }
}

if (feed != null) {
    logger.info("Feed from " + feedUrl + " is not null, total of " + feed.getEntries().size() + " entries");
    String feedTitle = feed.getTitle();

    for (Object entryObj : feed.getEntries()) {
        SyndEntryImpl entry = (SyndEntryImpl) entryObj;
        String author = entry.getAuthor();

        String uri = entry.getUri();
        logger.info("reading object from feed " + uri);
        String description = "";
        if (entry.getDescription() != null) {
            description = entry.getDescription().getValue();
        }
        String title = entry.getTitle();
        Date publishedDate = entry.getPublishedDate();
        List<?> categoryList = entry.getCategories();

        String transcript = null;
        List<Element> foreignMarkup = (List<Element>) entry.getForeignMarkup();
        if (foreignMarkup != null && foreignMarkup.size() > 0) {
            for (Element element : foreignMarkup) {
                //get name
                String name = element.getName();
                if (name != null && name.equals("transcript")) {
                    transcript = element.getText();
                }
            }
        }
        String link = entry.getLink();
        Date currentTime = Calendar.getInstance().getTime();

        String categoryStr = "";

        for (Object category : categoryList) {
            SyndCategory sc = (SyndCategory) category;
            categoryStr += sc.getName() + ",";
        }

        //HERE IS THE INSERTION TO THE DB
    }

    logger.info("End reading from feed " + feedUrl);
}

Thanks!

janih
1 Answer

If <media:backlink> contains the URL in question, then extracting it is simply:

String backlink = null;
for (Element foreignMarkup : entry.getForeignMarkup()) {
    // Match on the Media RSS namespace as well as the local name.
    if ("http://search.yahoo.com/mrss/".equals(foreignMarkup.getNamespaceURI())
            && "backlink".equals(foreignMarkup.getName())) {
        // The feed surrounds the URL with whitespace, so trim it.
        backlink = foreignMarkup.getTextTrim();
    }
}
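
To see why the loop above checks the namespace and not just the element name, here is a minimal sketch that does the same lookup with the JDK's own DOM parser instead of ROME. This is a swapped-in technique for illustration only: the class name BacklinkSketch, the method firstBacklink and the example.com URLs are made up, and the sample feed is a cut-down stand-in for the one in the question.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of ROME: JDK-only lookup of media:backlink.
class BacklinkSketch {
    static final String MEDIA_NS = "http://search.yahoo.com/mrss/";

    // Returns the trimmed text of the first media:backlink element, or null if there is none.
    static String firstBacklink(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // needed so getElementsByTagNameNS can match
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        // Only elements in the Media RSS namespace match; a plain <backlink> is ignored.
        NodeList nodes = doc.getElementsByTagNameNS(MEDIA_NS, "backlink");
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<rss xmlns:media=\"http://search.yahoo.com/mrss/\" version=\"2.0\"><channel><item>"
                + "<backlink>http://example.com/other</backlink>"
                + "<media:backlink>\n   http://example.com/linkback?id=1&amp;sc=abc\n</media:backlink>"
                + "</item></channel></rss>";
        System.out.println(firstBacklink(xml)); // prints http://example.com/linkback?id=1&sc=abc
    }
}
```

The un-namespaced <backlink> in the sample is skipped, and the whitespace the feed puts around the URL is trimmed away.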

If you have rome-modules on your application's classpath, you can use the more flexible API of the Media RSS module:

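for (Module module : entry.getModules()) {
    // Only the Media RSS module exposes the <media:content> elements.
    if (module instanceof MediaEntryModule) {
        MediaEntryModule media = (MediaEntryModule) module;
        for (MediaContent mediaContent : media.getMediaContents()) {
            // The reference carries the url attribute of <media:content>.
            System.out.println(mediaContent.getReference());
        }
    }
}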
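
If you want to cross-check what the module returns, the url attribute of <media:content> can also be read without ROME. The following is only a sketch using the JDK's DOM parser. The names MediaContentSketch and defaultContentUrl and the example.com URLs are invented for illustration, and it assumes the video URL you want is the media:content marked isDefault="true", as in the feed from the question.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of ROME: JDK-only read of media:content/@url.
class MediaContentSketch {
    static final String MEDIA_NS = "http://search.yahoo.com/mrss/";

    // Returns the url of the media:content marked isDefault="true", else the
    // first media:content url, else null. Note: this scans the whole document,
    // not a single <item>.
    static String defaultContentUrl(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList contents = doc.getElementsByTagNameNS(MEDIA_NS, "content");
        String fallback = null;
        for (int i = 0; i < contents.getLength(); i++) {
            Element content = (Element) contents.item(i);
            String url = content.getAttribute("url"); // "" when the attribute is absent
            if (url.isEmpty()) {
                continue;
            }
            if ("true".equals(content.getAttribute("isDefault"))) {
                return url;
            }
            if (fallback == null) {
                fallback = url;
            }
        }
        return fallback;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<rss xmlns:media=\"http://search.yahoo.com/mrss/\" version=\"2.0\"><channel><item>"
                + "<media:content type=\"video/mp4\" url=\"http://example.com/low.mp4\"/>"
                + "<media:content type=\"video/mp4\" isDefault=\"true\" url=\"http://example.com/hd.mp4\"/>"
                + "</item></channel></rss>";
        System.out.println(defaultContentUrl(xml)); // prints http://example.com/hd.mp4
    }
}
```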
janih