I'm using Netty 4.1 as the NIO socket server for an MMORPG. It ran perfectly for years, but recently we have been suffering from DDoS attacks. I have been fighting them for a long time, but I've run out of ideas for how to improve things. The attacker floods the server with new connections from thousands of IPs all over the world. It's difficult to block this at the network level because the attack traffic looks very similar to normal players. The attacks are not very big compared to attacks on HTTP servers, but they are big enough to crash our game.
How I'm using Netty:
    public void startServer() {
        bossGroup = new NioEventLoopGroup(1);
        workerGroup = new NioEventLoopGroup();
        try {
            int timeout = (Settings.SOCKET_TIMEOUT * 1000);
            bootstrap = new ServerBootstrap();
            int bufferSize = 65536;
            bootstrap.group(bossGroup, workerGroup)
                    .channel(NioServerSocketChannel.class)
                    .childOption(ChannelOption.SO_KEEPALIVE, true)
                    .childOption(ChannelOption.SO_TIMEOUT, timeout)
                    .childOption(ChannelOption.SO_RCVBUF, bufferSize)
                    .childOption(ChannelOption.SO_SNDBUF, bufferSize)
                    .handler(new LoggingHandler(LogLevel.INFO))
                    .childHandler(new CustomInitalizer(sslCtx));
            ChannelFuture bind = bootstrap.bind(DrServerAdmin.port);
            bossChannel = bind.sync();
        } catch (InterruptedException e) {
            e.printStackTrace();
        } finally {
            bossGroup.shutdownGracefully();
            workerGroup.shutdownGracefully();
        }
    }
Initializer:
    public class CustomInitalizer extends ChannelInitializer<SocketChannel> {
        public static DefaultEventExecutorGroup normalGroup = new DefaultEventExecutorGroup(16);
        public static DefaultEventExecutorGroup loginGroup = new DefaultEventExecutorGroup(8);
        public static DefaultEventExecutorGroup commandsGroup = new DefaultEventExecutorGroup(4);
        private final SslContext sslCtx;

        public CustomInitalizer(SslContext sslCtx) {
            this.sslCtx = sslCtx;
        }

        @Override
        public void initChannel(SocketChannel ch) throws Exception {
            ChannelPipeline pipeline = ch.pipeline();
            if (sslCtx != null) {
                pipeline.addLast(sslCtx.newHandler(ch.alloc()));
            }
            pipeline.addLast(new CustomFirewall()); // it is AbstractRemoteAddressFilter<InetSocketAddress>
            int limit = 32768;
            pipeline.addLast(new DelimiterBasedFrameDecoder(limit, Delimiters.nulDelimiter()));
            pipeline.addLast("decoder", new StringDecoder(CharsetUtil.UTF_8));
            pipeline.addLast("encoder", new StringEncoder(CharsetUtil.UTF_8));
            pipeline.addLast(new CustomReadTimeoutHandler(Settings.SOCKET_TIMEOUT));
            int id = DrServerNetty.getDrServer().getIdClient();
            CustomHandler normalHandler = new CustomHandler();
            FlashClientNetty client = new FlashClientNetty(normalHandler, id);
            normalHandler.setClient(client);
            pipeline.addLast(normalGroup, "normalHandler", normalHandler);
            CustomLoginHandler loginHandler = new CustomLoginHandler(client);
            pipeline.addLast(loginGroup, "loginHandler", loginHandler);
            CustomCommandsHandler commandsHandler = new CustomCommandsHandler(loginHandler.client);
            pipeline.addLast(commandsGroup, "commandsHandler", commandsHandler);
        }
    }
I'm using 5 groups:
- bootstrap bossGroup - for new connections
- bootstrap workerGroup - for delivering messages
- normalGroup - for most messages
- loginGroup - for heavy login process
- commandsGroup - for some heavy logic
I'm monitoring the number of new connections and messages, so I can immediately tell when an attack is under way. During an attack I stop accepting new connections by returning false from the custom firewall (an AbstractRemoteAddressFilter):
    protected boolean accept(ChannelHandlerContext ctx, InetSocketAddress remoteAddress) throws Exception {
        // Reject every new connection while an attack is detected.
        return !ddosDetected();
    }
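A minimal sketch of how a check like ddosDetected() could be implemented, assuming a simple fixed-window counter of new connections. ConnectionRateDetector, its thresholds, and the injected clock are hypothetical names for illustration; this is not the actual server code and not part of Netty.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/** Hypothetical fixed-window counter of new connections (illustration only). */
final class ConnectionRateDetector {
    private final int maxPerWindow;      // assumed threshold, tune per server
    private final long windowMillis;     // assumed window length
    private final LongSupplier clock;    // injectable so the logic can be tested
    private final AtomicLong windowStart;
    private final AtomicInteger count = new AtomicInteger();

    ConnectionRateDetector(int maxPerWindow, long windowMillis, LongSupplier clock) {
        this.maxPerWindow = maxPerWindow;
        this.windowMillis = windowMillis;
        this.clock = clock;
        this.windowStart = new AtomicLong(clock.getAsLong());
    }

    /** Call once per incoming connection, for example at the top of accept(). */
    void recordConnection() {
        rollWindowIfExpired();
        count.incrementAndGet();
    }

    /** True while the current window has seen more connections than allowed. */
    boolean ddosDetected() {
        rollWindowIfExpired();
        return count.get() > maxPerWindow;
    }

    // Starts a new window once the old one has expired. The reset is only
    // approximately atomic: a concurrent increment may be lost, which is
    // acceptable for a coarse rate estimate.
    private void rollWindowIfExpired() {
        long now = clock.getAsLong();
        long start = windowStart.get();
        if (now - start >= windowMillis && windowStart.compareAndSet(start, now)) {
            count.set(0);
        }
    }
}
```

In production the clock would typically be `System::currentTimeMillis`. Note that with this simple scheme the flag clears at every window boundary and is raised again only once the threshold is exceeded in the new window.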
But even though I'm dropping new connections right away, my worker group is getting overloaded. The pending tasks for the worker group keep growing (all other groups are fine), which makes communication slower and slower for normal players until they finally get kicked by socket timeouts. I'm not sure why this happens. During normal server usage, the busiest groups are loginGroup and normalGroup. At the network level the server is fine: it's using just ~10% of its bandwidth limit. CPU and RAM usage also aren't very high during the attack. But after a few minutes of such an attack, all my players are kicked out of the game and are not able to connect anymore.
Is there a better way to instantly drop all incoming connections and protect the users who are already connected?
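To make "pending tasks are growing" concrete, here is a hedged sketch of a monitor that sums queue lengths and raises an alarm after several consecutive increases. BacklogMonitor and its parameters are hypothetical; the queue lengths are supplied as plain `IntSupplier`s so the logic does not depend on Netty.

```java
import java.util.List;
import java.util.function.IntSupplier;

/** Hypothetical helper: flags a backlog that grew for several samples in a row. */
final class BacklogMonitor {
    private final List<IntSupplier> queues;   // one supplier per executor queue
    private final int growthSamplesToAlarm;   // assumed threshold
    private int lastTotal;
    private int consecutiveGrowth;

    BacklogMonitor(List<IntSupplier> queues, int growthSamplesToAlarm) {
        this.queues = queues;
        this.growthSamplesToAlarm = growthSamplesToAlarm;
    }

    /**
     * Call periodically from a single thread (for example a scheduled task).
     * Returns true once the summed backlog has grown on enough consecutive samples.
     */
    boolean sample() {
        int total = 0;
        for (IntSupplier q : queues) {
            total += q.getAsInt();
        }
        // Any sample that does not grow resets the streak.
        consecutiveGrowth = total > lastTotal ? consecutiveGrowth + 1 : 0;
        lastTotal = total;
        return consecutiveGrowth >= growthSamplesToAlarm;
    }
}
```

In Netty 4.1, each supplier could wrap `SingleThreadEventExecutor.pendingTasks()` for the executors obtained by iterating over the worker group; that wiring is left out here so the sketch stays free of dependencies.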