I am trying to start a project based on web scraping. I already have the tools set up for the different formats: I use SwiftyJSON for JSON and hpple for raw HTML. My problem is that I am trying to set up a generic class for the content and a generic class for the fetcher of that content, since every operation goes like this:

1. Log in. If there is a username or password, supply it.
2. If the site has a captcha, display it and use the result.
3. Fetch the data using Alamofire.
4. Scrape the data, using either JSON or HTML.
5. Populate the content class.

I am wondering if there is a way to define some kind of protocol, enum, or generic template so that I can define those different functions for each class. If I can’t get this right, I will end up writing the same code over and over again. This is what I have come up with. I would appreciate any help setting this up properly.

enum Company:Int {
    case CNN
    case BBC
    case HN
    case SO 
    
    var captcha:Bool {
        switch self {
        case CNN:
            return false
        case BBC:
            return true
        case HN:
            return true
        case SO:
            return false
        }
    }
    var description:String {
        get {
            switch self {
            case CNN:
                return "CNN"
            case BBC:
                return "BBC"
            case HN:
                return "Hacker News"
            case SO:
                return "Stack Overflow"
            }
        }
    }
}

class Fetcher {
    var username:String?
    var password:String?
    var url:String
    var company:Company
    
    init(company: Company, url:String) {
        self.url = url
        self.company = company
    }
    
    init(company: Company, url:String,username:String,password:String) {
        self.url = url
        self.company = company
        self.username = username
        self.password = password
    }
    
    func login() {
        
        if username != nil {
           // login
        }
        if company.captcha {
            //show captcha
        }
    }
    
    func fetch(){
        
    }
    
    func populate() {
        
    }
}

class CNN: Fetcher {
    
    
}
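
The five steps listed above could be expressed as a protocol whose extension supplies the shared pipeline, so each site only writes what differs. This is a minimal sketch in current Swift syntax rather than the Swift 1.x used in the thread. All names (`Scraper`, `ScrapeError`, `Headline`, `DemoSite`, `run`) are hypothetical, and the network call is replaced by canned text so the sketch runs without Alamofire.

```swift
// Hypothetical names throughout; networking is stubbed so the sketch is self-contained.
enum ScrapeError: Error {
    case emptyPage
}

protocol Scraper {
    associatedtype Content

    var requiresCaptcha: Bool { get }
    var credentials: (username: String, password: String)? { get }

    func login(captchaAnswer: String?) throws
    func fetchRaw() throws -> String
    func parse(_ raw: String) throws -> [Content]
}

extension Scraper {
    // Sites without a login or captcha inherit these defaults.
    var requiresCaptcha: Bool { return false }
    var credentials: (username: String, password: String)? { return nil }
    func login(captchaAnswer: String?) throws {}

    // The shared pipeline (captcha, login, fetch, parse), written once for every site.
    func run(solveCaptcha: () -> String) throws -> [Content] {
        let answer: String? = requiresCaptcha ? solveCaptcha() : nil
        try login(captchaAnswer: answer)
        let raw = try fetchRaw()
        return try parse(raw)
    }
}

struct Headline: Equatable {
    let title: String
}

// A stand-in site: fetchRaw returns canned text instead of making a request.
struct DemoSite: Scraper {
    func fetchRaw() throws -> String {
        return "First story\nSecond story"
    }

    func parse(_ raw: String) throws -> [Headline] {
        let titles = raw.split(separator: "\n").map { Headline(title: String($0)) }
        if titles.isEmpty { throw ScrapeError.emptyPage }
        return titles
    }
}

// The captcha closure is never called here because DemoSite keeps the default.
let headlines = (try? DemoSite().run(solveCaptcha: { "" })) ?? []
```

A site that needs a captcha would override `requiresCaptcha` and `login(captchaAnswer:)`, and the caller decides how `solveCaptcha` presents the challenge to the user.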
Meanteacher

2 Answers

Here is what I did using Alamofire and AlamofireObjectMapper.

Step 1: Create model classes that conform to the Mappable protocol.

class StoreListingModal: Mappable {
    var store: [StoreModal]?
    var status: String?

    required init?(_ map: Map) {
    }

    func mapping(map: Map) {
        store <- map["result"]
        status <- map["status"]
    }
}

Step 2: Create a fetch request using the generic types:

func getDataFromNetwork<T:Mappable>(urlString: String, completion: (T?, NSError?) -> Void) {
    Alamofire.request(.GET, urlString).responseObject { (response: Response<T, NSError>) in
        guard response.result.isSuccess else{
            print("Error while fetching: \(response.result.error)")
            completion(nil, response.result.error)
            return
        }
        if let responseObject = response.result.value{
            print(responseObject)
            completion(responseObject, nil)
        }
    }
}

Step 3: Now all you need is to call this fetch function. This can be done like this:

self.getDataFromNetwork("your url string") { (userResponse:StoreListingModal?, error) in

    }

You will not only get your response object, but it will also be mapped to your model class.
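
For comparison, the same "generic over the model type" idea can be sketched with the standard library's `Decodable` in place of ObjectMapper's `Mappable`. This is a swapped-in technique, not what the answer above uses. `Store`, its `name` field, `StoreListing`, `decodeResponse`, and the sample JSON are all assumptions made for illustration, since the thread does not show `StoreModal`.

```swift
import Foundation

// Hypothetical models; Decodable stands in for Mappable here.
struct Store: Decodable, Equatable {
    let name: String
}

struct StoreListing: Decodable {
    let store: [Store]?
    let status: String?

    // The JSON key "result" feeds the `store` property, as in the mapping above.
    enum CodingKeys: String, CodingKey {
        case store = "result"
        case status
    }
}

// Generic over any Decodable model; the caller's type annotation picks T.
func decodeResponse<T: Decodable>(_ data: Data) -> Result<T, Error> {
    return Result(catching: { try JSONDecoder().decode(T.self, from: data) })
}

// Canned payload standing in for a network response body.
let sample = Data("{\"status\":\"ok\",\"result\":[{\"name\":\"Main Street\"}]}".utf8)
let decoded: Result<StoreListing, Error> = decodeResponse(sample)
```

As with `getDataFromNetwork`, the function body never mentions a concrete model, so adding a new site only requires a new `Decodable` type.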

Kunal

Okay, this was a fun exercise...

You really just need to build out your Company enumeration further to make your Fetcher more abstract. Here's an approach that only slightly modifies your own that should get you much closer to what you are trying to achieve. This is based on a previous reply of mine to a different question of yours.

Company

enum Company: Printable, URLRequestConvertible {
    case CNN, BBC, HN, SO

    var captcha: Bool {
        switch self {
        case CNN:
            return false
        case BBC:
            return true
        case HN:
            return true
        case SO:
            return false
        }
    }

    var credentials: (username: String, password: String)? {
        switch self {
        case CNN:
            return ("cnn_username", "cnn_password")
        case BBC:
            return nil
        case HN:
            return ("hn_username", "hn_password")
        default:
            return nil
        }
    }

    var description: String {
        switch self {
        case CNN:
            return "CNN"
        case BBC:
            return "BBC"
        case HN:
            return "Hacker News"
        case SO:
            return "Stack Overflow"
        }
    }

    var loginURLRequest: NSURLRequest {
        var URLString: String?

        switch self {
        case CNN:
            URLString = "cnn_login_url"
        case BBC:
            URLString = "bbc_login_url"
        case HN:
            URLString = "hn_login_url"
        case SO:
            URLString = "so_login_url"
        }

        return NSURLRequest(URL: NSURL(string: URLString!)!)
    }

    var URLRequest: NSURLRequest {
        var URLString: String?

        switch self {
        case CNN:
            URLString = "cnn_url"
        case BBC:
            URLString = "bbc_url"
        case HN:
            URLString = "hn_url"
        case SO:
            URLString = "so_url"
        }

        return NSURLRequest(URL: NSURL(string: URLString!)!)
    }
}

News

struct News {
    let title: String
    let content: String
    let date: NSDate
    let author: String
}

Fetcher

class Fetcher {

    typealias FetchNewsSuccessHandler = [News] -> Void
    typealias FetchNewsFailureHandler = (NSHTTPURLResponse?, AnyObject?, NSError?) -> Void

    // MARK: - Fetch News Methods

    class func fetchNewsFromCompany(company: Company, success: FetchNewsSuccessHandler, failure: FetchNewsFailureHandler) {
        login(
            company: company,
            success: { apiKey in
                Fetcher.fetch(
                    company: company,
                    apiKey: apiKey,
                    success: { news in
                        success(news)
                    },
                    failure: { response, json, error in
                        failure(response, json, error)
                    }
                )
            },
            failure: { response, json, error in
                failure(response, json, error)
            }
        )
    }

    // MARK: - Private - Helper Methods

    private class func login(
        #company: Company,
        success: (String) -> Void,
        failure: (NSHTTPURLResponse?, AnyObject?, NSError?) -> Void)
    {
        if company.captcha {
            // You'll need to figure this part out on your own. First off, I'm not really sure how you
            // would do it, and secondly, I think there may be legal implications of doing this.
        }

        let request = Alamofire.request(company.loginURLRequest)

        if let credentials = company.credentials {
            request.authenticate(username: credentials.username, password: credentials.password)
        }

        request.responseJSON { _, response, json, error in
            if let error = error {
                failure(response, json, error)
            } else {
                // NOTE: You'll need to parse here...I would suggest using SwiftyJSON
                let apiKey = "12345678"
                success(apiKey)
            }
        }
    }

    private class func fetch(
        #company: Company,
        apiKey: String,
        success: FetchNewsSuccessHandler,
        failure: FetchNewsFailureHandler)
    {
        let request = Alamofire.request(company.URLRequest)
        request.responseJSON { _, response, json, error in
            if let error = error {
                failure(response, json, error)
            } else {
                // NOTE: You'll need to parse here...I would suggest using SwiftyJSON
                let news = [News]()
                success(news)
            }
        }
    }
}

Example ViewController Calling Fetcher

class SomeViewController: UIViewController {

    override func viewDidLoad() {
        super.viewDidLoad()

        Fetcher.fetchNewsFromCompany(
            Company.CNN,
            success: { newsList in
                for news in newsList {
                    println("\(news.title) - \(news.date)")
                }
            },
            failure: { response, data, error in
                println("\(response) \(error)")
            }
        )
    }
}

By allowing the Company object to flow through your Fetcher, you should never have to track state for a company in your Fetcher. It can all be stored directly inside the Enum.
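
To make that point concrete, here is a small sketch in current Swift syntax, where `Printable` has become `CustomStringConvertible` and enum cases are conventionally lowercase. The `loginPlan(for:)` helper is hypothetical. It holds no state of its own because everything it needs arrives with the `Company` value.

```swift
// Current-Swift restatement of the enum idea; captcha values are taken from the answer.
enum Company: CaseIterable, CustomStringConvertible {
    case cnn, bbc, hn, so

    var captcha: Bool {
        switch self {
        case .bbc, .hn: return true
        case .cnn, .so: return false
        }
    }

    var description: String {
        switch self {
        case .cnn: return "CNN"
        case .bbc: return "BBC"
        case .hn: return "Hacker News"
        case .so: return "Stack Overflow"
        }
    }
}

// A stateless helper (hypothetical): it reads its configuration from the enum value.
func loginPlan(for company: Company) -> String {
    return company.captcha
        ? "\(company): show captcha, then log in"
        : "\(company): log in directly"
}

// One plan per company, in declaration order.
let plans = Company.allCases.map(loginPlan(for:))
```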

Hope that helps. Cheers.

cnoon
  • Thanks for the thorough explanation. I think the captcha mechanism should be a separate function, because if there is a captcha, there must be a first request to the captcha URL to show it to the user. After the user enters the details, we can fetch the result. Or should I present another view controller with the captcha? If it is not too much to ask, could you edit your code to return with success and failure blocks and also add one example of how to use this class? Thanks again very much. I progressed a lot thanks to your valuable help. – Meanteacher Mar 12 '15 at 16:56
  • Hi @Meanteacher, I updated the answer to include `SomeViewController`, which shows how to actually put the `Fetcher` to use. I also updated the `Fetcher` captcha information. I think there may be legal implications to what you're doing, so I am going to opt out of that portion of your question. Architecturally speaking, I think your approach of pulling the captcha handling out of the `login` logic is sound. – cnoon Mar 14 '15 at 05:13
  • Thanks for your edited answer. The thing is, each web site can have a complicated data structure, as you know. Having just a single URL, username, etc. won't cut it. I created a singleton(?) like the one below: `class var shared: SomeSite { struct Static { static let instance: SomeSite = SomeSite() } return Static.instance }` so that in my class functions I can refer to the shared instance and use the same session variables whenever I need to connect to that web site. So for each web site I have a switch statement and I call its type methods. Ugly, but it works. – Meanteacher Mar 14 '15 at 14:22
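
The shared-instance pattern from the last comment can be written more compactly in later Swift versions, where a `static let` is initialized lazily and only once. The sketch below assumes a hypothetical `cookies` dictionary and `store(cookie:named:)` method as stand-ins for whatever session state a site needs. Only the `SomeSite` name comes from the comment.

```swift
// Hypothetical per-site session holder; `cookies` stands in for real session state.
final class SomeSite {
    // Created on first access and reused afterwards.
    static let shared = SomeSite()

    // Private initializer so callers must go through `shared`.
    private init() {}

    private(set) var cookies: [String: String] = [:]

    func store(cookie value: String, named name: String) {
        cookies[name] = value
    }
}

SomeSite.shared.store(cookie: "abc123", named: "session")

// Both accesses refer to the same object, so session state persists between calls.
let sameInstance = SomeSite.shared === SomeSite.shared
```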