How to get article titles from a news website?

Question

I want to get data from a news website that has no RSS feed. I only need the article titles, and I am using this code:

if let url = URL(string: "http://www.gulf-times.com/stories/c/192/0/Sport") {
    let task = URLSession.shared.dataTask(with: url) { data, response, error in
        // Stop if the request failed or returned no body.
        guard error == nil, let data = data else { return }

        // Decode the raw bytes of the page into a string.
        let urlContent = String(data: data, encoding: .ascii)

        print(urlContent ?? "could not decode the response")
    }
    // Tasks are created suspended, so start the request explicitly.
    task.resume()
}

The problem is that I am unable to extract the title values.

I'm not a Swift developer, but I think this is a general problem rather than a language-specific one. If you can show me the value that `urlContent` holds, I think a solution can be provided. — Vipin Kumar, Nov 20 '17 at 07:56
When I print the value of `urlContent`, I get this: https://justpaste.it/1dpot — iDeveloper, Nov 20 '17 at 08:01
Rather than going after the titles highlighted in your image, which would need a JavaScript parser, can you not use an HTML tool to extract the same titles from the HTML? They all seem to be embedded lower down in an element with class `bord-192`. Does that combination only apply to titles on that page? — Damien_The_Unbeliever, Nov 20 '17 at 08:15
@Damien_The_Unbeliever, is an HTML extractor the only thing that can help me? I ask because I have been checking the format of the content. — iDeveloper, Nov 20 '17 at 08:23
It looks like there are plenty of [options for HTML parsing in swift](https://stackoverflow.com/q/31080818/15498). I personally would prefer to use such a tool and write a concise XPath expression rather than starting (as you seem to here) by manually pulling apart mixed HTML and JavaScript. — Damien_The_Unbeliever, Nov 20 '17 at 11:10

Vipin Kumar · Accepted Answer · 2017-11-20T13:07:05.437

If you are comfortable with regular expressions, use the following pattern:

/title = "(.*)"/g

This will give you all the titles.

Edit: use it as shown below.

let matched = matches(for: "title = \"(.*)\"", in: contentOfPage)

The `matches` function:

func matches(for regex: String, in text: String) -> [String] {

    do {
        let regex = try NSRegularExpression(pattern: regex)
        let results = regex.matches(in: text,
                                    range: NSRange(text.startIndex..., in: text))
        return results.map {
            String(text[Range($0.range, in: text)!])
        }
    } catch let error {
        print("invalid regex: \(error.localizedDescription)")
        return []
    }
}

This gives the following result:

[  
   "title = \"Al-Khelaifi voted Asian tennis Chairman\"",
   "title = \"Dimitrov downs Goffin for ATP Tour Finals crown\"",
   "title = \"Coach Lehmann calls for Australia to get behind Ashes selection\"",
   "title = \"Federer expects great things from returning trio\"",
   "title = \"Whateley wins Air Maroc League second stage\"",
   "title = \"Qatar-based Frijns finishes strong as Oliphant wins\"",
   "title = \"Sutton faces tough road ahead to get Chinese on track\"",
   "title = \"Qatar, Japan sign deal to import, export race horses\"",
   "title = \"Islanders deny Lightning comeback for third straight win\"",
   "title = \"Curry leads Golden Warriors fightback after Sixers blitz\"",
   "title = \"Fleetwood claims European Order of Merit as Rose falters\"",
   "title = \"Challengers win thriller against City Exchange\""
]
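
Each entry in the result above still carries the `title = "..."` wrapper, because the whole match is returned rather than the capture group. A minimal sketch of one way to return only the title text follows. The helper name `captured(for:in:)` and the sample string are hypothetical and not part of this answer, and the lazy `(.*?)` is a small variation on the pattern that stops at the first closing quote.

```swift
import Foundation

// Hypothetical helper (not from the answer above): returns the text of the
// first capture group for every match of `pattern` in `text`.
func captured(for pattern: String, in text: String) -> [String] {
    guard let regex = try? NSRegularExpression(pattern: pattern) else {
        return []
    }
    let fullRange = NSRange(text.startIndex..., in: text)
    return regex.matches(in: text, range: fullRange).compactMap { match -> String? in
        // range(at: 1) is the first parenthesised group of the pattern.
        guard let range = Range(match.range(at: 1), in: text) else { return nil }
        return String(text[range])
    }
}

// Made-up sample standing in for the downloaded page content;
// the exact layout of the real page is an assumption.
let sample = """
var title = "Al-Khelaifi voted Asian tennis Chairman";
var brief = "";
var title = "Dimitrov downs Goffin for ATP Tour Finals crown";
"""

let titles = captured(for: "title = \"(.*?)\"", in: sample)
print(titles)
```

With the real page you would pass the downloaded string in place of `sample`.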

How did you create this pattern `"title = \"(.*)\""`? Can you please explain? — iDeveloper, Nov 22 '17 at 10:26
Here you can test your regex to match as per your need. The regex I have given matches the content that starts with `"title = \"` and ends with `"`. — Vipin Kumar, Nov 22 '17 at 10:40

score 0 · Answer 2 · answered Nov 20 '17 at 08:11

The response from the URL is a string, so you can just parse it into substrings.

For example, the title is the substring between `var title =` and `var brief =`.

String methods such as `split` or `components(separatedBy:)` can do this.
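
The substring approach could be sketched roughly as below. The helper `substrings(in:between:and:)` and the sample `page` string are hypothetical, and the sketch assumes each title sits on a `var title = "...";` line followed by a `var brief =` line, which may not match the real page exactly.

```swift
import Foundation

// Hypothetical helper: collects the text found between each `start` marker
// and the next `end` marker.
func substrings(in text: String, between start: String, and end: String) -> [String] {
    // Everything before the first `start` marker is not a title, so drop it.
    let pieces = text.components(separatedBy: start).dropFirst()
    return pieces.compactMap { piece -> String? in
        // Keep only the part before the closing marker; skip pieces without one.
        guard let endRange = piece.range(of: end) else { return nil }
        // Strip the surrounding spaces, quotes, semicolons and newlines.
        return piece[..<endRange.lowerBound]
            .trimmingCharacters(in: CharacterSet(charactersIn: " \";\n"))
    }
}

// Made-up sample; the real page layout may differ.
let page = """
var title = "Federer expects great things from returning trio";
var brief = "";
var title = "Whateley wins Air Maroc League second stage";
var brief = "";
"""

let found = substrings(in: page, between: "var title =", and: "var brief =")
print(found)
```

This relies on the two markers always appearing in that order, so it is more fragile than a regular expression or an HTML parser if the page changes.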

William Hu


I don't understand exactly. – iDeveloper Nov 20 '17 at 08:27
