web server: how does this request appear

Question

I'm building a web server with Python and Tornado. It provides a search service covering all the restaurants in a country. The logic is simple: the user types a keyword and submits it on the web page, and the server replies with results. In short, it is like a mini Google.

I also keep a simple log.

In the log, most requests look like this:

[I 170625 19:23:12 web:2063] 200 GET /images/icon-language.png (116.31.83.132) 0.88ms
[I 170625 19:23:12 web:2063] 200 GET /index?type=Sight&key=Bol%20content (116.31.83.132) 10.05ms
[I 170625 19:30:30 web:2063] 304 GET / (116.31.83.132) 0.87ms
[I 170625 19:30:44 web:2063] 200 GET / (116.31.83.132) 0.78ms
[W 170625 19:30:51 web:2063] 405 POST / (116.31.83.132) 1.20ms
[W 170625 19:31:00 web:2063] 405 POST / (116.31.83.132) 0.63ms
[I 170625 19:31:22 web:2063] 200 POST /index (116.31.83.132) 0.89ms
[I 170625 19:31:42 web:2063] 200 GET /index (116.31.83.132) 0.62ms
[I 170625 19:31:49 web:2063] 200 GET / (116.31.83.132) 0.78ms
[W 170625 19:31:57 web:2063] 404 GET /abce (116.31.83.132) 0.65ms

But to my surprise, there are a few requests like the one below:

[W 170625 18:43:41 web:2063] 404 GET http://baidu.com/ (106.2.125.215) 0.60ms

I can't understand how this kind of request is generated. For example, if the address of my web server is www.example.com and I send a GET request to it, the logged path should look like /abcd. But this request doesn't start with /. How come?

Is this some kind of XSS (cross-site scripting)? It seems that someone was trying to make a cross-origin request through my web server. If so, I plan to filter out any user keyword containing <script>. Am I right?

score 1 · Answer 1 · answered Jun 26 '17 at 11:55

What you see might be scans for open proxies, i.e. someone is checking whether they can misuse your server to browse other sites. It has nothing to do with Python specifically.

This is usually done with tools that issue the GET request directly. That's a common practice for advertisements.
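To illustrate why the logged target does not start with /: a client talking straight to a server normally sends the path only (origin-form), while a client that believes it is talking to a proxy sends the full URL (absolute-form) in the request line. The following is a minimal sketch using only the standard library; the helper names `is_absolute_form` and `build_request` are hypothetical and not part of Tornado.

```python
from urllib.parse import urlsplit


def is_absolute_form(request_target):
    """Return True if the request target is a full URL (proxy-style request)."""
    parts = urlsplit(request_target)
    return bool(parts.scheme and parts.netloc)


def build_request(request_target, host):
    """Build the raw bytes of a bare GET request, as a scanning tool might."""
    lines = [
        "GET %s HTTP/1.1" % request_target,
        "Host: %s" % host,
        "Connection: close",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


# A normal browser request to your site: the path only.
print(is_absolute_form("/index?type=Sight&key=Bol%20content"))
# The request from the log: a full URL, as sent to a proxy.
print(is_absolute_form("http://baidu.com/"))
print(build_request("http://baidu.com/", "baidu.com"))
```

Sending the bytes from `build_request` to any server's port produces a log line like the one in the question, whether or not that server is a proxy.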

If you install something like OSSEC, you'll see many more scanners probing your website for different things all day long.

What you can do is set up some firewall rules. That won't stop the requests, but at least they won't make it as far as the server logs. Then again, if your main issue is log bloat and it is your own server, you could always exclude Baidu from logging. (I wouldn't personally do this; I'm just pointing out that it's an option.) But remember that search engines never get bored and go away.

Dalen · Accepted Answer · 2017-06-26T12:09:17.760

It seems to me that somebody confused your server with baidu.com. Or your server has some connection with them and the request bounced to you because of poorly configured DNS or the like. It is also possible that a programmer mistyped the IP address for baidu.com and reached your server instead.

I hope you know what HTTP requests look like, and that for a professional web server, accepting any call made to its IP isn't enough. You have to look at the "Host" HTTP header too. I don't know whether Tornado does this by default. But if you drop the connection whenever the Host header isn't your website's hostname, no such mix-ups occur.
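A rough sketch of such a Host check, written as plain functions so it does not depend on any framework. The names `normalize_host`, `host_allowed` and `ALLOWED_HOSTS` are hypothetical, and www.example.com stands in for your real hostname. In a Tornado handler the header value should be available as `self.request.host`, so a check like this could run in `prepare()` and answer with an error status when it fails.

```python
# Hypothetical set of hostnames this server is willing to answer for.
ALLOWED_HOSTS = {"www.example.com", "example.com"}


def normalize_host(host_header):
    """Lowercase the Host header value and strip an optional port."""
    host = host_header.strip().lower()
    if host.startswith("["):
        # IPv6 literal such as [::1]:8888; keep the bracketed part only.
        end = host.find("]")
        return host[: end + 1] if end != -1 else host
    return host.split(":", 1)[0]


def host_allowed(host_header, allowed_hosts=ALLOWED_HOSTS):
    """Return True only if the Host header names one of our own hosts."""
    if not host_header:
        return False
    return normalize_host(host_header) in allowed_hosts


print(host_allowed("www.example.com:8888"))
print(host_allowed("baidu.com"))
```

Comparing against an explicit set, rather than a pattern, keeps the check easy to reason about; requests naming any other host are simply treated as not meant for this server.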

And you are wrong about XSS. <script> has nothing to do with the server side of the HTTP protocol and has no direct effect on it whatsoever. Do not confuse HTML and JS with HTTP. All they have in common is that HTTP is most often used to transfer HTML pages and JS scripts.

Oh, BTW, it would be wise to include the "User-Agent" HTTP header in your log. You can also find out, to some degree, who is reaching you by looking up the client IP with whois and similar services.
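One possible way to get the User-Agent into the log is sketched below. It assumes Tornado's `log_function` application setting, which is called with the RequestHandler at the end of each request; the functions `format_access_line` and `log_request` and the logger name "access" are hypothetical. The code only touches attributes of the handler, so it does not import Tornado itself.

```python
import logging

# Hypothetical logger name; any configured logger would do.
access_log = logging.getLogger("access")


def format_access_line(handler):
    """Format one access-log line, including the User-Agent header."""
    request = handler.request
    user_agent = request.headers.get("User-Agent", "-")
    return '%d %s %s (%s) "%s" %.2fms' % (
        handler.get_status(),
        request.method,
        request.uri,
        request.remote_ip,
        user_agent,
        1000.0 * request.request_time(),  # request_time() is in seconds
    )


def log_request(handler):
    """Intended to be passed as log_function=log_request to the Application."""
    access_log.info(format_access_line(handler))
```

With this in place, each line would carry the client's self-reported User-Agent next to the IP address. Keep in mind that the header is set by the client and can be anything.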
