I am making the request below to YouTube using URLRequest and URLSession. Most of the response looks fine, but some of the returned script elements contain what appear to be escaped characters: { } [ ] " = ' show up as \\x7b \\x7d \\x5b \\x5d \\x22 \\x3d \\x27

        import Foundation

        // Build the request; fail fast if the URL string is malformed.
        let url = URL(string: "https://www.youtube.com/channel/UCPHWVzGcW-iozudjp8U984g/")
        guard let requestUrl = url else { fatalError("Invalid URL") }
        var request = URLRequest(url: requestUrl)
        request.httpMethod = "GET"
        let task = URLSession.shared.dataTask(with: request) { data, response, error in
            // On a transport error there is no body to decode, so stop here.
            if let error = error {
                print("Error took place \(error)")
                return
            }

            // Decode the body as UTF-8 and print it.
            if let data = data, let dataString = String(data: data, encoding: .utf8) {
                print("Response data string:\n \(dataString)")
            }
        }
        task.resume()

I made the same request in Java using okhttp3 and did not see any escapes left in these script elements. They also look fine when I inspect the page source in multiple browsers.

I tried to remove the escapes with replacingOccurrences, which works, but the JSON is still malformed, so I must be missing some other encoding in the response. Is there a built-in way to decode these escapes, or to get URLSession to return the content without them?

Here is a sample:

<script nonce=\"koFDr1miSKW8U9aJTnGQVw\">var ytInitialData = \'\\x7b\\x22responseContext\\x22:\\x7b\\x22serviceTrackingParams\\x22:\\x5b\\x7b\\x22service\\x22:\\x22GFEEDBACK\\x22,\\x22params\\x22:\\x5b\\x7b\\x22key\\x22:\\x22browse_id\\x22,\\x22value\\x22:\\x22UCPHWVzGcW-iozudjp8U984g\\x22\\x7d,\\x7b\\x22key\\x22:\\x22logged_in\\x22,\\x22value\\x22:\\x220\\x22\\x7d,\\x7b\\x22key\\x22:\\x22e\\x22,\\x22value\\x22:\\x2224022617,24023962,24014268,24022308,23968386,24022875,24025790,24025869,23857948,24006666,24022914,23923339,23976696,23983296,23944779,23744176,23990877,24021968,24021668,23966208,24011119,23891346,24006795,24023271,24001373,23934970,23987676,23897180,23891344,23804281,23974595,24016478,24007246,24012654,24024964,1714255,24002010,23946420,23997485,23884386,24019883,23882502,23918597,24012117,23969934,24014440\\x22\\x7d\\x5d\\x7d,\\x7b\\x22service\\x22:\\x22CSI\\x22,\\x22params\\x22:\\x5b\\x7b\\x22key\\x22:\\x22c\\x22,\\x22value\\x22:\\x22MWEB\\x22\\x7d,\\x7b\\x22key\\x22:\\x22cver\\x22,\\x22value\\x22:\\x222.20210406.03.00\\x22\\x7d,\\x7b\\x22key\\x22:\\x22yt_li\\x22,\\x2"..
  • I don't see any JSON response, only HTML. Show how you are getting the initial data from your HTML response. – Leo Dabus Apr 10 '21 at 12:06
  • @LeoDabus The JSON is embedded in a few of the script elements inside the HTML. I noticed that although I was requesting www.youtube.com, the response was being served from m.youtube.com. I was able to resolve the issue by changing my user agent to a desktop Mozilla user agent string. – Paul Thompson Apr 10 '21 at 13:10
  • How to resolve this issue? – Huy Nguyen Oct 09 '21 at 04:31

1 Answer

While inspecting the response further, I realized that although I was making the request to www.youtube.com, the response was coming from m.youtube.com. I changed the user agent header to a desktop Mozilla string by adding the following:

request.setValue("Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:60.0) Gecko/20100101 Firefox/60.0", forHTTPHeaderField:"user-agent")

That resolved most of my encoding issues. The remaining problem was that all of my quotation marks were now HTML-encoded as &quot;. I replaced all of those with " using replacingOccurrences, and the JSON then had a valid format.
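
For anyone who still ends up with the mobile-style payload, the cleanup described above can also be done in one pass rather than one replacingOccurrences call per character. The following is only a rough sketch: decodeHexEscapes and decodeBasicHTMLEntities are hypothetical helper names, not Foundation APIs, and the code assumes the raw text contains a single backslash before each x (the doubled backslashes in the question look like an artifact of how the string was printed). It handles only \xNN escapes and a handful of common HTML entities, so other JavaScript escapes would need extra handling.

```swift
import Foundation

/// Returns the numeric value of an ASCII hex digit, or nil for any other byte.
func hexValue(_ byte: UInt8) -> UInt8? {
    switch byte {
    case 48...57: return byte - 48        // "0"..."9"
    case 97...102: return byte - 97 + 10  // "a"..."f"
    case 65...70: return byte - 65 + 10   // "A"..."F"
    default: return nil
    }
}

/// Decodes JavaScript-style `\xNN` escapes (for example `\x7b` becomes `{`).
/// Works on raw bytes so that escaped multi-byte UTF-8 sequences are rebuilt correctly.
func decodeHexEscapes(_ input: String) -> String {
    let source = Array(input.utf8)
    var output = [UInt8]()
    output.reserveCapacity(source.count)
    var i = 0
    while i < source.count {
        // Match a backslash (0x5C), an "x" (0x78), and two hex digits.
        if source[i] == 0x5C,
           i + 3 < source.count,
           source[i + 1] == 0x78,
           let high = hexValue(source[i + 2]),
           let low = hexValue(source[i + 3]) {
            output.append(high * 16 + low)
            i += 4
        } else {
            // Anything else is copied through unchanged.
            output.append(source[i])
            i += 1
        }
    }
    return String(decoding: output, as: UTF8.self)
}

/// Replaces a few common HTML entities. `&amp;` goes last so that text such as
/// `&amp;quot;` is not decoded twice.
func decodeBasicHTMLEntities(_ input: String) -> String {
    let entities: [(String, String)] = [
        ("&quot;", "\""), ("&#39;", "'"), ("&lt;", "<"), ("&gt;", ">"), ("&amp;", "&")
    ]
    return entities.reduce(input) { $0.replacingOccurrences(of: $1.0, with: $1.1) }
}

// Usage with a shortened fragment in the style of the sample from the question.
let raw = #"\x7b\x22key\x22:\x22browse_id\x22\x7d"#
let decoded = decodeHexEscapes(raw)
print(decoded) // prints {"key":"browse_id"}
```

The decoded string can then be passed to JSONSerialization or JSONDecoder as usual. Setting a desktop user agent, as described above, is still the simpler fix, since it avoids the escaped payload in the first place.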