0

I have a CSV file that is a little over a half million lines long by ten columns wide.

I need to take out common IP addresses that have two specific protocols (UDP and IGMP) and filter out all other IP addresses that only have one associated protocol. Here is an example of the data in my table:

enter image description here

So, this query would run and output:

enter image description here

It returns the IP/server that has BOTH UDP and IGMP and NOT the server that only has UDP. How can I do this? This needs to be run against 510,000+ lines with about 11,000 unique IPs.

Remi Guan
  • 21,506
  • 17
  • 64
  • 87
falkon114
  • 11
  • 7

3 Answers3

1

One method is to use window functions:

select t.*
from (select t.*,
             sum(case when protocol = 'UDP' then 1 else 0 end) over (partition by sourceIP) as num_udp,
             sum(case when protocol = 'IGMP' then 1 else 0 end) over (partition by sourceIP) as num_igmp
      from table t
     ) t
where num_udp > 0 and num_igmp > 0;

EDIT:

If you only want exactly those protocols:

select t.*
from (select t.*,
             sum(case when protocol = 'UDP' then 1 else 0 end) over (partition by sourceIP) as num_udp,
             sum(case when protocol = 'IGMP' then 1 else 0 end) over (partition by sourceIP) as num_igmp,
             sum(case when protocol not in ('UDP', 'IGMP') then 1 else 0 end) over (partition by sourceIP) as num_other
      from table t
     ) t
where num_udp > 0 and num_igmp > 0 and num_other = 0;

If you just want to filter the protocols, then add where protocol in ('UDP', 'IGMP').

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Thank you! This is almost exactly what I need, but my results still have some TCP connections included. Any idea? Let me clarify a bit too. This sheet has UDP, TCP, ICMP, IGMP, and PIM type protocols, we're trying to strip all those out and keep only addresses with communication on both UDP and IGMP inclusively. – falkon114 Dec 05 '15 at 01:29
0

select * from table where SourceIP in (select DestIP from table where Protocol = 'IGMP')

This assumes the if a IGMP record exist then a UDP would also have to exist. (I't really a cheap answer)

Chuck
  • 1,001
  • 1
  • 13
  • 19
0

I believe Gordon has already given a good answer using the window functions ... Another alternative could be ...

Select * from [tablename]
where SourceIP in
(

Select t2.SourceIP,count(SourceIP) from 
    (
      Select SourceIP,Protocol,count(*) from [tablename] where Protocol in ('UDP','IGMP') group by SourceIP,Protocol
    ) as t2 
group by SourceIP having count(SourceIP)=2

)

Note: I haven't run this query yet...

Abhinav
  • 36
  • 7