1

My question deals with selecting items from a series that were not present in a previous observation. I am trying to adapt a technique from SQL: selecting rows where column value changed from previous row.

I have an application that includes a table that stores a list of apps installed on devices. Devices submit this data daily, and the full list is saved each day. I have been exporting new records each day for import into a reporting system but the data is growing very large because my reporting dataset has to retain a much larger time range that the actual application. And the way the app stores this data is not optimal for reporting. I would like to export only records that indicate that a new app has been installed on a device since the last time I performed a data export. The report_ids of last export and for the most recent report from the device are known. The data structure is not ideal. Essentially I am trying to convert a time series of data into a change log.

I don't own the source application so I can't change the database's data structures, triggers, etc.

*** Please see fiddle for data structure and sample data: http://sqlfiddle.com/#!9/ec6040/17

SQL:

-- Sample data: 
CREATE TABLE last_exported_reports
  (report_id int, mobile_device_id int);
INSERT INTO last_exported_reports 
  (mobile_device_id, report_id)
VALUES 
  (1, 1), (2, 6), (3, 7);

CREATE TABLE mobile_devices_denormalized
  (mobile_device_id int, last_report_id int);
INSERT INTO mobile_devices_denormalized 
  (mobile_device_id, last_report_id)
VALUES 
  (1, 4), (2, 6), (3, 8);

CREATE TABLE reports
  (`report_id` int, `mobile_device_id` int, `date_entered_epoch` bigint(32));
INSERT INTO reports
  (`report_id`, `mobile_device_id`, `date_entered_epoch`)
VALUES
  (1, 1, 1529981397691),
  (2, 2, 1529981397692),
  (3, 3, 1529981397693),
  (4, 1, 1529981397694),
  (5, 2, 1529981397695),
  (6, 2, 1529981397696),
  (7, 3, 1529981397697),
  (8, 3, 1529981397698);

CREATE TABLE mobile_device_installed_applications
    (`report_id` int, `identifier` varchar(8), `application_short_version` varchar(5));
INSERT INTO mobile_device_installed_applications
    (`report_id`, `identifier`, `application_short_version`)
VALUES
    (1, 'Chrome', 'c1.1'),   -- device 1
    (1, 'Word',   'w2.1'),   -- device 1
    (2, 'Skype',  's1.0'),   -- device 2
    (3, 'Excel',  'e3.0'),   -- device 3    
    (4, 'Chrome', 'c2.1u'),  -- device 1
    (4, 'Word',   'w2.1n'),  -- device 1
    (4, 'Excel', 'w2.0'),    -- device 1
    (5, 'Skype',  's1.0'),   -- device 2
    (6, 'Skype',  's1.9'),   -- device 2
    (7, 'Excel',  'e3.0'),   -- device 3
    (8, 'Excel',  'e3.0');   -- device 3


SELECT 
  mdd.mobile_device_id AS md_id, mdia.report_id, mdia.identifier, mdia.application_short_version AS ver
FROM 
  -- List of all devices
  mobile_devices_denormalized mdd
  INNER JOIN
    -- Add in apps
    mobile_device_installed_applications mdia
  ON
    -- Only if they are from the last report from a device
    mdia.report_id = mdd.last_report_id
  AND
    -- And only if the latest report has not already been exported
    mdia.report_id NOT IN (select report_id FROM last_exported_reports)
  AND
    -- And only if the app in the new report was not in the last exported report
    NOT EXISTS (
      SELECT * 
      FROM mobile_device_installed_applications exported
      WHERE exported.identifier = mdia.identifier
      AND  exported.report_id = mdd.last_report_id
    )
;

If I run the above without the NOT EXISTS clause, it works fine in that I get all app records for new reports as expected. When I add in the final step, the subquery to get rid of repetative apps, I get no results.

RESULT IF RUN WITHOUT THE SUBQUERY CLAUSE:

md_id    report_id    identifier    ver
1        4            Chrome        c2.1u
1        4            Word          w2.1n
1        4            Excel         w2.0
3        8            Excel         e3.0
  • This is correct for device 1 because report 4 is most recent and is not the previously exported report (4 != 1)
  • This is correct for device 2 because report 6 is most recent but it was already exported so no records are exported.
  • This is correct for device 3 because report 8 is most recent and is not the previously exported report (8 != 7)

RESULT WHEN I ADD IN THE FINAL SUBQUERY CLAUSE:

No records.

EXPECTED RESULT:

md_id    report_id    identifier  ver
1        4            Excel       w2.0

The above record should have been printed because it is in the most recent report (report_id=4) for device #1 and the app was not on the device the last time it was exported (report_id=1).

Dark Knight
  • 6,116
  • 1
  • 15
  • 37
Lindsey
  • 47
  • 8
  • What if an app is uninstalled? What should be expected output? – Dark Knight Aug 26 '19 at 06:35
  • I think the output should be `1 4 Excel w2.0`. As Word & Chrome was already present in report 1 (except different version for Word) but Excel was the new application installed in report 4. I have a query for that let me know if I am right. – Dark Knight Aug 26 '19 at 06:49
  • @JitendraYadav, good question! Once I understand how to evaluate the current status based on the previous, I think I will be able to adapt the answer to deal with deletes as well as version upgrades. But I didn't want to over-complicate the question. It's already complicated enough. – Lindsey Aug 26 '19 at 12:44
  • 1
    Let me know if you are still looking for an answer. The answer will exclude the first comment case scenario and returns output like the second comment. – Dark Knight Aug 26 '19 at 12:49
  • You are right @jitendra. I corrected the question. Thank you. – Lindsey Aug 26 '19 at 18:36

1 Answers1

4
SELECT a.* 
FROM (
    SELECT  
        mdd.mobile_device_id, 
        mdia.report_id, 
        mdia.identifier, 
        mdia.application_short_version AS ver
    FROM mobile_devices_denormalized mdd
    INNER JOIN mobile_device_installed_applications mdia ON mdia.report_id = mdd.last_report_id
    ) AS a
LEFT JOIN (
    SELECT  
        ler.mobile_device_id, 
        mdia.report_id, 
        mdia.identifier, 
        mdia.application_short_version AS ver
    FROM last_exported_reports ler
    INNER JOIN mobile_device_installed_applications mdia ON mdia.report_id = ler.report_id
    ) AS b ON a.mobile_device_id = b.mobile_device_id  AND a.identifier = b.identifier 
WHERE b.report_id IS NULL;

Logic: Match new report app list with app list from last exported report for the same device and select which are not matching from new report.

Dark Knight
  • 6,116
  • 1
  • 15
  • 37