5

Imagine a firewall on which the system administrator has blocked many subnets, perhaps all the subnets of a specific country.

For example:

192.168.2.0 / 255.255.255.0
223.201.0.0 / 255.255.0.0
223.202.0.0 / 255.254.0.0
223.208.0.0 / 255.252.0.0
....

To determine whether an IP address has been blocked, the firewall may use the algorithm below.

func blocked(ip)
    foreach subnet in blocked_subnets
        if in_subnet(subnet, ip)
            return true
    return false

But this algorithm is too slow: its time complexity is O(n) in the number of blocked subnets. If the route table contains too many entries, the network will become almost unusable.
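For reference, the linear scan above could be written roughly as follows in Python, using the standard `ipaddress` module. The names `BLOCKED_SUBNETS` and `blocked` are only illustrative, and the list holds just the four example entries from the question.

```python
import ipaddress

# Example entries from the question, written as address/netmask pairs.
BLOCKED_SUBNETS = [
    ipaddress.ip_network("192.168.2.0/255.255.255.0"),
    ipaddress.ip_network("223.201.0.0/255.255.0.0"),
    ipaddress.ip_network("223.202.0.0/255.254.0.0"),
    ipaddress.ip_network("223.208.0.0/255.252.0.0"),
]

def blocked(ip, subnets=BLOCKED_SUBNETS):
    """Return True if ip lies in any subnet; tests the entries one by one."""
    address = ipaddress.ip_address(ip)
    return any(address in subnet for subnet in subnets)
```

Each call may touch every entry in the list, which is the O(n) cost the question is asking how to avoid.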

Is there a more efficient way to match IP addresses against a huge number of route entries? I guess it is based on some kind of tree or graph (a trie?). I have read something about longest prefix match and tries, but didn't get the point.

比尔盖子

5 Answers

14

All you really need is a trie with four levels. Each non-leaf node contains an array of up to 256 child nodes. Each node also contains a subnet mask. So, given your example:

192.168.2.0 / 255.255.255.0
223.201.0.0 / 255.255.0.0
223.202.0.0 / 255.254.0.0
223.208.0.0 / 255.252.0.0

Your tree would look something like that below. The two numbers for each node are the IP segment followed by the subnet mask.

             root
         /           \
     192,255             223,255
       |           -------------------------
     168,255       |           |           |
       |          201,255    202,255    208,255
      2,255

When you get an IP address, you break it into segments. You search for the first segment at the root level. For speed, you'll probably want to use an array at the root level so that you can do a direct lookup.

Say the first segment of the IP address is 223. You'd grab the node from root[223], and now you're working with just that one subtree. You probably don't want a full array at the other levels, unless your data is really dense. A dictionary of some kind for the subsequent levels is probably what you'll want. If the next segment is 201, you look up 201 in the dictionary for the 223 node, and now your possible list of candidates is just 64K items (i.e. all IP addresses that are 223.201.x.x). You can do the same thing with the other two levels. The result is that you can resolve an IP address in just four lookups: one lookup in an array, and three dictionary lookups.

This structure is also very easy to maintain. Inserting a new address or range requires at most four lookups and adds. Same with deleting. Updates can be done in-place, without having to rebuild the entire tree. You just have to make sure that you're not trying to read while you're updating, and you're not trying to do concurrent updates. But any number of readers can be accessing the thing concurrently.
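A minimal sketch of such a four-level trie in Python might look like the following. Several things here are assumptions rather than part of the answer: the names `OctetTrie`, `BLOCKED`, `insert` and `blocked` are hypothetical, a dictionary is used at every level (including the root, where the answer suggests an array), and a mask that does not end on an octet boundary is handled by expanding the prefix into several sibling keys (so 223.202.0.0/15 is stored under both 202 and 203) instead of storing a mask in each node.

```python
import ipaddress

BLOCKED = object()  # sentinel marking "everything below this key is blocked"

class OctetTrie:
    """Trie with one level per address octet, at most four levels deep."""

    def __init__(self):
        self.root = {}

    def insert(self, cidr):
        net = ipaddress.ip_network(cidr)
        octets = net.network_address.packed        # the four address bytes
        depth = max(net.prefixlen - 1, 0) // 8     # level where the prefix ends
        span = 1 << (8 * (depth + 1) - net.prefixlen)  # sibling keys covered
        node = self.root
        for octet in octets[:depth]:
            child = node.get(octet)
            if child is BLOCKED:                   # a shorter prefix already covers this
                return
            if child is None:
                child = node[octet] = {}
            node = child
        first = octets[depth]
        for key in range(first, first + span):     # e.g. 202 and 203 for a /15
            node[key] = BLOCKED

    def blocked(self, ip):
        node = self.root
        for octet in ipaddress.ip_address(ip).packed:
            child = node.get(octet)
            if child is None:
                return False
            if child is BLOCKED:
                return True
            node = child
        return False
```

A lookup still costs at most four dictionary probes. The price of the expansion is that a single entry such as a /9 occupies 128 keys at its level, and deletion (not shown) would have to undo the expansion.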

Jim Mischel
  • This answer seems incorrect. For the second level, the mask is not always 255. For example: `223.208.0.0 / 255.252.0.0` – shawn Apr 28 '22 at 09:37
6

Using a hash map or a trie makes it hard to deal with CIDR IP ranges, where the mask does not necessarily fall on an 8-bit boundary (for example, 192.168.1.0/28).

An efficient way of doing this is binary search, given that all these IP ranges don't overlap:

  1. Convert the range A.B.C.D/X into a 32-bit integer representing the starting IP address, plus an integer giving the number of addresses in the range. For example, 192.168.1.0/24 converts to (3232235776, 256).

  2. Add these ranges to a list or array, and sort it by starting IP address.

  3. To match an incoming IP address against the list, do a binary search for the range that would contain it.
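The three steps above can be sketched in Python with the standard `bisect` and `ipaddress` modules. The function names `build_ranges` and `blocked` are made up for this example, and the code assumes, as the answer does, that the ranges do not overlap.

```python
import bisect
import ipaddress

def build_ranges(cidrs):
    """Steps 1 and 2: return sorted range starts and the matching range sizes."""
    pairs = sorted(
        (int(net.network_address), net.num_addresses)
        for net in map(ipaddress.ip_network, cidrs)
    )
    starts = [start for start, _ in pairs]
    sizes = [size for _, size in pairs]
    return starts, sizes

def blocked(ip, starts, sizes):
    """Step 3: binary search for the last range starting at or before ip."""
    addr = int(ipaddress.ip_address(ip))
    i = bisect.bisect_right(starts, addr) - 1
    return i >= 0 and addr < starts[i] + sizes[i]
```

For example, `build_ranges(["192.168.1.0/24"])` yields the start 3232235776 and the size 256 mentioned in step 1. Keeping the starts in their own list lets `bisect_right` search it directly.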

Su Excelle
  • Brilliant! But won't this work as well even if the ranges overlap? For example, say we had three identical /8, /16, /24 ranges, represented as (64M, 16777216), (64M, 65536), (64M, 256). In the binary search array, would just sort the identical tuples by 2nd member, thus obtaining a longest-prefix-matching effect when walking overlapping ranges from left to right. Locating the "start" of the overlapping range in the binary array may take a bit of extra time, but the lookup should still be fast. – Liviu Chircu Apr 14 '22 at 09:55
  • Please disregard the above: you are right, the algo does not work for overlapping ranges. Example: 10.0.0.0/8 and 10.1.0.0/16: both match 10.1.200.200, but we cannot accurately represent them in the binary search array, such that the /16 always has priority over the /8. – Liviu Chircu Apr 14 '22 at 15:00
1

Use red-black or AVL trees to store the blocked IPs, one tree per subnet. Since an IP address is basically a sequence of 4 numbers, you can write a custom comparator in your programming language of choice and store the addresses in a red-black or AVL tree.

Comparator :-

Use the 4/6 parts of each IP to decide whether one address is greater or less than the other, based on the first part that differs.

example :-

10.0.1.1 and 10.0.0.1

Here ip1 > ip2, because the third part is the first one that differs and it is greater in ip1.
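One possible way to write such a comparator in Python, for dotted IPv4 strings only, is shown below. The names `ip_key` and `compare_ips` are hypothetical. The sketch relies on Python comparing tuples element by element, which is exactly the "first part that differs" rule.

```python
def ip_key(ip):
    """Split a dotted IPv4 string into a tuple of integers."""
    return tuple(int(part) for part in ip.split("."))

def compare_ips(ip1, ip2):
    """Return 1, 0 or -1 as ip1 is greater than, equal to, or less than ip2."""
    a, b = ip_key(ip1), ip_key(ip2)
    return (a > b) - (a < b)
```

Comparing the parts as integers matters: as plain strings, "9.0.0.0" would sort after "10.0.0.0".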

Time Complexity :-

As a red-black tree is a balanced BST, insertion, deletion and search each take O(log n). Searching for an IP in each of the k subnet trees therefore takes O(k log n) in total.

Optimization :- If the number of subnets is large, use a single red-black tree with a composite key, compared in the same way as above.

Key = (subnet_no,ip)

You can compare keys in the same way as above, which gives O(log S), where S is the total number of IP entries across all subnets.

Vikram Bhat
1

This may be a simple one, but as no one said anything about memory constraints, you may use a look-up table. A LUT with 2^32 entries is not impossible even in practice, and it reduces the problem to a single table lookup regardless of the rules. (The same can be used for routing as well.) If you want it fast, it takes 2^32 octets (4 GiB); if you have a bit more time, a bitwise table takes 2^32 bits, i.e. 512 MiB. Even the bitwise table can be made fast, but high-level programming languages may then produce suboptimal results.
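As a rough illustration of the bitwise variant, here is a small Python sketch. The helper names `new_table`, `block_range` and `is_blocked` are invented for this example, and addresses are passed as plain integers. The `address_bits` parameter exists only so the idea can be tried on a toy address space; the default of 32 allocates the full 512 MiB.

```python
def new_table(address_bits=32):
    """Allocate one bit per address; 32 bits means 2**29 bytes (512 MiB)."""
    return bytearray(max(1, (1 << address_bits) // 8))

def block_range(table, start, count):
    """Set the bit of every address in [start, start + count)."""
    for addr in range(start, start + count):
        table[addr >> 3] |= 1 << (addr & 7)

def is_blocked(table, addr):
    """Single lookup: select the byte, then test the bit inside it."""
    return bool(table[addr >> 3] & (1 << (addr & 7)))
```

With a 16-bit toy space, `new_table(16)` takes 8 KiB. For real IPv4 the integer address could come from `int(ipaddress.ip_address("5.1.2.3"))`, and filling a large subnet bit by bit in pure Python would be slow, which matches the remark about high-level languages.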

Of course, the question of "fast" is always a bit tricky. Do you want fast in practice or in theory? If in practice, on which platform? Even the LUT method may be slow if your system swaps the table out to disk, and, depending on the cache architecture, more complicated methods may be faster even than RAM-based LUTs because they fit into the processor cache. A cache miss may cost several hundred CPU cycles, and rather complicated operations can be done in that time.

The problem with the LUT approach (in addition to the memory use) is the cost of rule deletions. As the table results from a bitwise OR of all rules, there is no simple way to remove a rule. So, in that case it must be determined where there are no overlapping rules with the rule to be deleted, and then those areas have to be zeroed out. This is probably best done bit-by-bit with the structures outlined in the other answers.

DrV
-2

Recall that an IP address is basically a 32-bit number.

You can canonicalize each subnet to its normal form, and store all the normal forms in a hash table.

At run time, canonicalize the given address (easy to do), and check whether the hash table contains this entry. If it does, block; otherwise, permit.

For example, say you want to block the subnet 5.*.*.*. This is the network with the leading bits 00000101, so add the address 5.0.0.0, or 00000101 - 00000000 - 00000000 - 00000000, to your hash table.
When a specific address arrives, for example 5.1.2.3, canonicalize it back to 5.0.0.0 and check whether it is in the table.

The query time is O(1) on average using a hash table.
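The answer does not say how the lookup knows how many bits to strip when the blocked subnets have different prefix lengths. One way to fill that gap, which is an assumption of this sketch and not something the answer states, is to keep one hash set per prefix length and probe each length that is in use. The class name `PrefixHash` and its methods are hypothetical.

```python
import ipaddress

class PrefixHash:
    """Hash sets of canonical network addresses, grouped by prefix length."""

    def __init__(self):
        self.by_length = {}  # prefix length -> set of network addresses (ints)

    def add(self, cidr):
        net = ipaddress.ip_network(cidr)
        networks = self.by_length.setdefault(net.prefixlen, set())
        networks.add(int(net.network_address))

    def blocked(self, ip):
        addr = int(ipaddress.ip_address(ip))
        for length, networks in self.by_length.items():
            # Zero the host bits for this prefix length, then probe the set.
            mask = (0xFFFFFFFF << (32 - length)) & 0xFFFFFFFF
            if (addr & mask) in networks:
                return True
        return False
```

Under this scheme a lookup costs one hash probe per distinct prefix length, so at most 33 probes for IPv4, independent of the number of subnets.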

amit
  • Thanks. But what about the subnets with non-255 / non 8-based netmasks? For example 255.254.0.0 (/15). – 比尔盖子 Jun 16 '14 at 10:24
  • @比尔盖子 In this case, place `255.254.0.0` in your table, and once a specific IP address comes in (for example `255.254.5.6`), strip its 17 least significant bits, and you will get back `255.254.0.0`, which you can check in your hash table. Note that using the IP structure, you always know which bits need to be stripped. – amit Jun 16 '14 at 10:32
  • 3
    Knowing how many bits wide your canonicalized addresses need to be is going to add an excessive amount of complexity to this solution. You can get a much less complex solution by using a trie type data structure. – Chuck Wolber Oct 11 '17 at 22:19