i'm working on CDR (Call Details Record) data for Telecom company in a report with a billions of rows, I ask for a SQL query that let me analyzing a subscriber’s frequency of calls made to specific MSISDNs, to guide which subscribers should be offered family deals or identify how many peers in their network have competitor services.
my table have a huge amount of details, i got the useful ones that help me in getting this insight, the table contains:
- PRI_IDENTITY (The subscriber calling number)
- RECIPIENT_NUMBER (The called number)
- Timestamp (The call date/time)
- Call_Duration (in seconds)
- CALLED_HOME_NETWORK_CODE (The Recipient number network code)
SELECT
PRI_IDENTITY, RECIPIENT_NUMBER,
TO_timestamp(CUST_LOCAL_START_DATE,'yyyyMMddHH24miss')::Date AS Call_Date,
ACTUAL_USAGE AS Call_Duration,
CALLED_HOME_NETWORK_CODE
FROM TBL.CDR
the sample data
The output that I need to figure out is contain below:
- Year
- Month
- PRI_IDENTITY
- CALLED_HOME_NETWORK_CODE
- Frequency 5 (The count of RECIPIENT_NUMBERS whom the PRI_IDENTITY called 5 times on the day)
- Frequency 10 (The count of RECIPIENT_NUMBERS whom the PRI_IDENTITY called 10 times on the day)
- Frequency 15 (The count of RECIPIENT_NUMBERS whom the PRI_IDENTITY called 15 times on the day)
- Frequency 20 (The count of RECIPIENT_NUMBERS whom the PRI_IDENTITY called 20 times on the day)
- Frequency More than 20 (The count of RECIPIENT_NUMBERS whom the PRI_IDENTITY called more than 20 times on the day)
HINT: the PRI_IDENTITY and RECIPIENT_NUMBERS on the sample data on the image is full in the real data and due to privacy I couldn't share the whole info online