Language -> C++11 or C++98 {NOT C}
OS -> Linux embedded system
Restriction -> No use of any 3rd party library.
Overview -> Establish a connection with a website.
I have a Linux embedded system, and I am not allowed to download any libraries such as Poco, libcurl, or Boost to establish a connection with a website and extract information. So I am wondering if someone can direct me on how to establish a connection purely with raw sockets in C++ (not C) and retrieve information from a page.

Parsing the information and retrieving the exact data is not a challenge for me; my main problem is how to establish a connection over the HTTP protocol. If I am right, to connect to a website I need the HTTP protocol rather than TCP/IP.
Could someone please point me in the right direction? Thanks.

Filburt
samprat
  • HTTP is a layer on top of TCP/IP. The TCP/IP layer is for transport. The HTTP layer is for how the data is interpreted. – Martin York May 14 '16 at 19:53
  • Why not C? If you are talking raw sockets, the best API is there. – Martin York May 14 '16 at 19:53
  • You'll need to use Linux socket-related syscalls and do your own HTTP header parsing. – jcai May 14 '16 at 19:57
  • Is this your homework? – n. m. could be an AI May 14 '16 at 20:06
  • @n.m No, mate. It's a work task; I have been active on this forum for many years, so I can't be a student. Also, even if it were homework, there is no harm in asking or seeking advice. Had someone asked for code without trying, then that would be wrong. – samprat May 14 '16 at 20:19
  • There is nothing wrong with homeworks, however artificial restrictions like "NOT C" and "not allowed to download any libraries" make little sense outside of typical homework environments. – n. m. could be an AI May 14 '16 at 20:22
  • @n.m, well yeah, you're right, your points make sense. But I assure you it's not homework. Also, just a small hint from Loki and Arcinde made me think in a different direction, and I think I can solve the issue. Earlier I googled and all I was getting was "use libcurl" etc., but their bit of advice helped me and I am sure I will resolve my issues. – samprat May 14 '16 at 20:36
  • I'm not out to disqualify the question or blame you for whatever, I'm just trying to understand your exact situation. – n. m. could be an AI May 14 '16 at 20:41
  • Tutorials on RAW sockets for `C` will work just fine in `C++` (most likely). Do you actually mean RAW sockets or do you just mean sockets rather than a library? – Galik May 14 '16 at 20:46
  • http://stackoverflow.com/questions/23842394/c-c-http-client-library-for-embedded-projects – Rajeev Kumar May 15 '16 at 07:27

1 Answer

You can speak HTTP over an ordinary TCP (stream) socket. Since you didn't provide code, I can't provide code either. If you already know how to connect to a server and send and receive data, it should be easy. Just follow the steps below. Let's assume you want to connect to www.cnn.com.

1. Convert the domain name of the website to an IP address.

2. Connect to that IP address on port 80.

3. Send the string GET / HTTP/1.1\r\nHost: www.cnn.com\r\nConnection: close\r\n\r\n

4. Read from the socket/server. If the server is available, it will respond with the page or html code on that webpage.

5. Close socket connection.
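
The five steps above can be sketched with nothing but the POSIX socket calls that ship with Linux. This is a minimal sketch, not a full client: the function names (send_all, read_all, connect_to, http_get) are made up for illustration, error handling is reduced to a bool, and it assumes plain HTTP on port 80 with Connection: close, so that the server closing the connection marks the end of the response.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <netdb.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Step 3 helper: send() may accept fewer bytes than requested, so loop.
bool send_all(int fd, const std::string& data) {
    std::size_t sent = 0;
    while (sent < data.size()) {
        ssize_t n = send(fd, data.data() + sent, data.size() - sent, MSG_NOSIGNAL);
        if (n <= 0) return false;
        sent += static_cast<std::size_t>(n);
    }
    return true;
}

// Step 4 helper: read until the peer closes the connection (or an error
// occurs); whatever arrived up to that point is returned.
std::string read_all(int fd) {
    std::string out;
    char buf[4096];
    ssize_t n;
    while ((n = recv(fd, buf, sizeof buf, 0)) > 0) {
        out.append(buf, static_cast<std::size_t>(n));
    }
    return out;
}

// Steps 1-2: resolve the name and connect. Returns a socket fd, or -1.
int connect_to(const std::string& host, const std::string& port) {
    addrinfo hints;
    std::memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;      // accept IPv4 or IPv6
    hints.ai_socktype = SOCK_STREAM;  // TCP

    addrinfo* res = 0;
    if (getaddrinfo(host.c_str(), port.c_str(), &hints, &res) != 0) return -1;

    int fd = -1;
    for (addrinfo* p = res; p != 0; p = p->ai_next) {
        fd = socket(p->ai_family, p->ai_socktype, p->ai_protocol);
        if (fd < 0) continue;
        if (connect(fd, p->ai_addr, p->ai_addrlen) == 0) break;  // connected
        close(fd);
        fd = -1;
    }
    freeaddrinfo(res);
    return fd;
}

// Steps 1-5 together. On success, `response` holds the raw reply.
bool http_get(const std::string& host, const std::string& path,
              std::string& response) {
    assert(!path.empty() && path[0] == '/');  // request paths start with '/'
    int fd = connect_to(host, "80");
    if (fd < 0) return false;
    const std::string request = "GET " + path + " HTTP/1.1\r\nHost: " + host +
                                "\r\nConnection: close\r\n\r\n";
    bool ok = send_all(fd, request);
    if (ok) response = read_all(fd);
    close(fd);  // step 5
    return ok;
}
```

Calling http_get("www.cnn.com", "/", response) would fill response with the status line, the headers, a blank line, and then the body. Many sites now answer plain-HTTP requests with a redirect to https, so the body may not be the page itself.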

Note that some websites will not respond or will even block you if you don't provide the User-Agent/Web browser name you are using.

To fix this, in Step 3, add a User-Agent: MyBrowserName\r\n header to the string. You can impersonate a real browser. You must put \r\n after each header.

For example, the User-Agent string of the Chrome browser I am using is Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.97 Safari/537.36.

Your new string to send in Step 3 should look something like this: GET / HTTP/1.1\r\nHost: www.cnn.com\r\nConnection: close\r\nUser-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.97 Safari/537.36\r\n\r\n. Notice that there is a \r\n after each header, and that the last header is followed by \r\n\r\n instead of a single \r\n; the extra empty line tells the server the headers are finished.
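
Building that string by hand is error-prone, so it can help to assemble it in one place. A small sketch, assuming a hypothetical helper named build_get_request (the name and the parameter order are mine, not from any library):

```cpp
#include <cassert>
#include <string>

// Build a GET request. Every header line ends with \r\n, and one extra
// \r\n (an empty line) marks the end of the header block.
std::string build_get_request(const std::string& host,
                              const std::string& path,
                              const std::string& user_agent) {
    assert(!path.empty() && path[0] == '/');  // request paths start with '/'
    std::string req;
    req += "GET " + path + " HTTP/1.1\r\n";
    req += "Host: " + host + "\r\n";
    req += "Connection: close\r\n";
    if (!user_agent.empty()) {
        req += "User-Agent: " + user_agent + "\r\n";  // optional header
    }
    req += "\r\n";  // blank line terminates the headers
    return req;
}
```

With an empty user_agent this produces exactly the string from Step 3; with a non-empty one it adds the User-Agent line before the final blank line.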

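Other useful headers are Connection: Keep-Alive\r\n, Accept-Language: en-us\r\n, and Accept-Encoding: gzip, deflate\r\n. Use Connection: Keep-Alive in place of Connection: close, not alongside it, and send Accept-Encoding only if you are able to decompress the response yourself.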
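
One caveat with Connection: Keep-Alive is that the server leaves the connection open, so you can no longer treat "the socket was closed" as the end of the response. You then have to find the blank line that ends the headers and read the body length from the Content-Length header (when the server sends one; chunked responses are not covered here). A rough sketch with two hypothetical helpers, split_response and content_length:

```cpp
#include <cassert>
#include <cctype>
#include <cstdlib>
#include <string>

// Split a raw response at the first blank line (\r\n\r\n).
// Returns false if the header block has not fully arrived yet.
bool split_response(const std::string& raw, std::string& head, std::string& body) {
    const std::string::size_type pos = raw.find("\r\n\r\n");
    if (pos == std::string::npos) return false;
    head = raw.substr(0, pos);   // status line + headers
    body = raw.substr(pos + 4);  // everything after the blank line
    return true;
}

// Lower-case a copy of s; header names are case-insensitive.
static std::string to_lower(std::string s) {
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        s[i] = static_cast<char>(std::tolower(static_cast<unsigned char>(s[i])));
    }
    return s;
}

// Return the Content-Length value, or -1 if the header is absent.
// A malformed (non-numeric) value yields 0, as strtol does.
long content_length(const std::string& head) {
    const std::string key = "\r\ncontent-length:";
    const std::string lower = to_lower(head);
    const std::string::size_type pos = lower.find(key);
    if (pos == std::string::npos) return -1;
    return std::strtol(lower.c_str() + pos + key.size(), 0, 10);
}
```

The idea is to keep calling recv() until split_response succeeds, then keep reading until the body holds content_length(head) bytes.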

Replace port 80 with 443 if the website uses https instead of http. Things get complicated from there, because you have to implement the SSL/TLS protocol yourself.

Suppose you want to access a page in another directory instead of the home page, and the URL is http://www.cnn.com/2016/05/13/health/healthy-eating-quiz/index.html

The string to send should look like this:

GET /2016/05/13/health/healthy-eating-quiz/index.html HTTP/1.1\r\nHost: www.cnn.com\r\nConnection: close\r\n\r\n
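
In other words, the host part of the URL goes into the Host header and everything from the first / onward goes after GET. A minimal sketch of that split, assuming a hypothetical split_http_url helper that only understands plain http:// URLs without a port or user info:

```cpp
#include <cassert>
#include <string>

struct UrlParts {
    std::string host;  // goes into the Host header
    std::string path;  // goes after GET
};

// Split "http://host/path" into host and path.
// Returns false for anything that does not start with http://.
bool split_http_url(const std::string& url, UrlParts& out) {
    const std::string scheme = "http://";
    if (url.compare(0, scheme.size(), scheme) != 0) return false;

    const std::string::size_type host_begin = scheme.size();
    const std::string::size_type slash = url.find('/', host_begin);
    if (slash == std::string::npos) {
        out.host = url.substr(host_begin);
        out.path = "/";  // no path given: request the home page
    } else {
        out.host = url.substr(host_begin, slash - host_begin);
        out.path = url.substr(slash);
    }
    return !out.host.empty();
}
```

For the URL above this gives host www.cnn.com and path /2016/05/13/health/healthy-eating-quiz/index.html.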

If you are using a proxy, you connect to the proxy instead of the website and put the whole URL after the GET command:

GET http://www.cnn.com/2016/05/13/health/healthy-eating-quiz/index.html HTTP/1.1\r\nHost: www.cnn.com\r\nConnection: close\r\n\r\n

Programmer