I'm experimenting with making multiple requests to various URLs for some web scraping, and after the first request, the second often fails. I can't figure out why.
I make two simple requests to different sites, and the second request comes back with a Google response (Google's 404 page) instead of the site I asked for. If I start the server and hit only Yahoo, the request returns as expected. The same behavior occurs if my first request hits Wikipedia and subsequent requests go somewhere else.
Can someone explain what's happening?
Thanks.
deps: {:httpoison, "~> 1.5"}
First, I start HTTPoison (as per the docs):
iex(1)> HTTPoison.start
{:ok, []}
Next, I make a request to get Google's homepage:
iex(2)> HTTPoison.get "https://www.google.com"
{:ok,
%HTTPoison.Response{
body: "<!doctype html><html itemscope=\"\" itemtype=\"http://schema.org/WebPage\" lang=\"en\"><head><meta content=\"Search the world's information, including webpages, images, videos and more. Google has many special features to help you find exactly what you're looking for.\" name=\"description\"><meta content=\"noodp\" name=\"robots\"><meta content=\"text/html; charset=UTF-8\" http-equiv=\"Content-Type\"><meta content=\"/logos/doodles/2019/celebrating-earl-scruggs-5680695065182208.3-law.gif\" itemprop=\"image\"><meta content=\"Celebrating Earl Scruggs\" property=\"twitter:title\"><meta content=\"Celebrating Earl Scruggs! #GoogleDoodle\" property=\"twitter:description\"><meta content=\"Celebrating Earl Scruggs! #GoogleDoodle\" property=\"og:description\"><meta content=\"summary_large_image\" property=\"twitter:card\"><meta content=\"@GoogleDoodles\" property=\"twitter:site\"><meta content=\"https://www.google.com/logos/doodles/2019/celebrating-earl-scruggs-5680695065182208-2xa.gif\" property=\"twitter:image\"><meta content=\"https://www.google.com/logos/doodles/2019/celebrating-earl-scruggs-5680695065182208-2xa.gif\" property=\"og:image\"><meta content=\"1000\" property=\"og:image:width\"><meta content=\"400\" property=\"og:image:height\"><meta content=\"https://www.google.com/logos/doodles/2019/celebrating-earl-scruggs-5680695065182208-2xa.gif\" property=\"og:url\"><meta content=\"video.other\" property=\"og:type\"><title>Google</title><script 
nonce=\"j0aPHCuRPUlftRzX2g6tTQ==\">(function(){window.google={kEI:'D1A4XKnVOo-6_wTgtpOgDA',kEXPI:'0,1353747,57,50,1907,1017,625,781,698,527,731,325,1124,349,30,1227,806,95,546,352,2335328,167,32,68,329226,1294,12383,4855,32692,2074,13173,867,10761,1402,6381,854,2481,2,2,6801,364,1165,7,2147,1262,4243,224,1017,1195,266,3742,1365,575,835,284,2,579,727,2069,363,58,2,1,3,933,364,4324,3397,302,658,610,291,482,2115,135,1407,1413,1529,395,525,621,5,2,2,1963,528,2067,182,283,2838,298,670,1044,1,468,1344,386,743,268,81,7,1,2,27,461,620,29,983,6,406,458,466,2,1379,769,536,428,267,2552,1739,313,876,412,2,554,2368,2,264,381,286,948,11,1209,38,363,557,270,303,145,155,499,285,433,42,1322,99,342,43,47,1080,543,1826,367,789,270,603,661,431,49,626,265,217,779,1531,35,2,4,2,670,44,226,1292,3,237,9,12,408,349,167,82,247,879,238,410,529,187,508,105,1,1496,5,12,620,464,87,99,25,178,283,278,6,38,53,290,390,37,117,9,81,345,103,17,112,7,203,173,81,2,83,340,14,617,604,58,351,614,175,97,1,1,2,177,803,60,264,88,5968727,2554,233,22,5997346,90,2800095,4,1572,549,332,445,1,2,80,1,900,583,4,309,1,8,1,2,2132,1,1,1,1,1,414,1,748,141,59,726,3,7,443,3,117,1,2,140,226,23,53,22306694',authuser:0,kscs:'c9c918f0_D1A4XKnVOo-6_wTgtpOgDA',kGL:'US'};google.kHL='en';})();google.time=function(){return(new Date).getTime()};(function(){google.lc=[];google.li=0;google.getEI=function(a){for(var b;a&&(!a.getAttribute||!(b=a.getAttribute(\"eid\")));)a=a.parentNode;return b||google.kEI};google.getLEI=function(a){for(var b=null;a&&(!a.getAttribute||!(b=a.getAttribute(\"leid\")));)a=a.parentNode;return b};google.https=function(){return\"https:\"==window.location.protocol};google.ml=function(){return null};google.log=function(a,b,e,c,g){if(a=google.logUrl(a,b,e,c,g)){b=new Image;var d=google.lc,f=google.li;d[f]=b;b.onerror=b.onload=b.onabort=function(){delete d[f]};google.vel&&google.vel.lu&&google.vel.lu(a);b.src=a;google.li=f+1}};google.logUrl=function(a,b,e,c,g){var 
d=\"\",f=google.ls||\"\";e||-1!=b.search(\"&ei=\")||(d=\"&ei=\"+google.getEI(c),-1==b.search(\"&lei=\")&&(c=google.getLEI(c))&&(d+=\"&lei=\"+c));c=\"\";!e&&google.cshid&&-1==b.search(\"&cshid=\")&&\"slh\"!=a&&(c=\"&cshid=\"+google.cshid);a=e||\"/\"+(g||\"gen_204\")+\"?atyp=i&ct=\"+a+\"&cad=\"+b+d+f+\"&zx=\"+google.time()+c;/^http:/i.test(a)&&google.https()&&(google.ml(Error(\"a\"),!1,{src:a,glmm:1}),a=\"\");return a};}).call(this);(function(){google.y={};google.x=function(a,b){if(a)var c=a.id;else{do c=Math.random();while(google.y[c])}google.y[c]=[a,b];return!1};google.lm=[];google.plm=function(a){google.lm.push.apply(google.lm,a)};google.lq=[];google.load=function(a,b,c){google.lq.push([[a],b,c])};google.loadAll=function(a,b){google.lq.push([a,b])};}).call(this);google.f={};</scri" <> ...,
headers: [
{"Date", "Fri, 11 Jan 2019 08:13:03 GMT"},
{"Expires", "-1"},
{"Cache-Control", "private, max-age=0"},
{"Content-Type", "text/html; charset=ISO-8859-1"},
{"P3P", "CP=\"This is not a P3P policy! See g.co/p3phelp for more info.\""},
{"Server", "gws"},
{"X-XSS-Protection", "1; mode=block"},
{"X-Frame-Options", "SAMEORIGIN"},
{"Set-Cookie",
"1P_JAR=2019-01-11-08; expires=Sun, 10-Feb-2019 08:13:03 GMT; path=/; domain=.google.com"},
{"Set-Cookie",
"NID=154=eRdDgOkW7gEdW7vRAPVM1Q7p3GKbBPOSH3yr07CL414Lmx740Jtk9WTPtl9RbGzWJ4QCetWtoQIjSbv_F-ML6Bs6_I9tt91ED_TD8ZKQrenqMr9ykhB7oBd8XoN7W5TqWNTy5jdlEjPFjwkAL42qTrjgGR2MJ5_jTphwwzVCKS8; expires=Sat, 13-Jul-2019 08:13:03 GMT; path=/; domain=.google.com; HttpOnly"},
{"Alt-Svc", "quic=\":443\"; ma=2592000; v=\"44,43,39,35\""},
{"Accept-Ranges", "none"},
{"Vary", "Accept-Encoding"},
{"Transfer-Encoding", "chunked"}
],
request: %HTTPoison.Request{
body: "",
headers: [],
method: :get,
options: [],
params: %{},
url: "https://www.google.com"
},
request_url: "https://www.google.com",
status_code: 200
}}
Lastly, I make a request to get Yahoo's homepage, but the response is Google's 404 page with status code 404:
iex(3)> HTTPoison.get "https://www.yahoo.com"
{:ok,
%HTTPoison.Response{
body: "<!DOCTYPE html>\n<html lang=en>\n <meta charset=utf-8>\n <meta name=viewport content=\"initial-scale=1, minimum-scale=1, width=device-width\">\n <title>Error 404 (Not Found)!!1</title>\n <style>\n *{margin:0;padding:0}html,code{font:15px/22px arial,sans-serif}html{background:#fff;color:#222;padding:15px}body{margin:7% auto 0;max-width:390px;min-height:180px;padding:30px 0 15px}* > body{background:url(//www.google.com/images/errors/robot.png) 100% 5px no-repeat;padding-right:205px}p{margin:11px 0 22px;overflow:hidden}ins{color:#777;text-decoration:none}a img{border:0}@media screen and (max-width:772px){body{background:none;margin-top:0;max-width:none;padding-right:0}}#logo{background:url(//www.google.com/images/branding/googlelogo/1x/googlelogo_color_150x54dp.png) no-repeat;margin-left:-5px}@media only screen and (min-resolution:192dpi){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat 0% 0%/100% 100%;-moz-border-image:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) 0}}@media only screen and (-webkit-min-device-pixel-ratio:2){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat;-webkit-background-size:100% 100%}}#logo{display:inline-block;height:54px;width:150px}\n </style>\n <a href=//www.google.com/><span id=logo aria-label=Google></span></a>\n <p><b>404.</b> <ins>That’s an error.</ins>\n <p>The requested URL <code>/</code> was not found on this server. <ins>That’s all we know.</ins>\n",
headers: [
{"Content-Type", "text/html; charset=UTF-8"},
{"Referrer-Policy", "no-referrer"},
{"Content-Length", "1561"},
{"Date", "Fri, 11 Jan 2019 08:13:27 GMT"},
{"Alt-Svc", "quic=\":443\"; ma=2592000; v=\"44,43,39,35\""}
],
request: %HTTPoison.Request{
body: "",
headers: [],
method: :get,
options: [],
params: %{},
url: "https://www.yahoo.com"
},
request_url: "https://www.yahoo.com",
status_code: 404
}}
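To help narrow this down, here is a minimal diagnostic sketch of the same two requests. It rests on an unconfirmed assumption: that the wrong response comes from a pooled connection being reused for a different host. Passing `hackney: [pool: false]` makes hackney skip its default pool for these calls. The `opts` variable and the "Server" header check are only there for illustration; this is a way to test the hypothesis, not a confirmed fix.

```elixir
# Diagnostic sketch only. Assumption: the stray Google response is caused by
# a pooled connection being reused for a different host.
{:ok, _apps} = HTTPoison.start()

# Forwarded to hackney; `pool: false` bypasses the default connection pool,
# so each request opens its own connection.
opts = [hackney: [pool: false]]

for url <- ["https://www.google.com", "https://www.yahoo.com"] do
  case HTTPoison.get(url, [], opts) do
    {:ok, %HTTPoison.Response{status_code: code, headers: headers}} ->
      # The "Server" header (if present) is a rough hint of who answered.
      server =
        for {name, value} <- headers, String.downcase(name) == "server", do: value

      IO.inspect({url, code, server})

    {:error, %HTTPoison.Error{reason: reason}} ->
      IO.inspect({url, :error, reason})
  end
end
```

If the second request stops returning Google's page with the pool disabled, that would point at connection reuse rather than at anything in the request itself.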