Preamble (you can skip this, this is just my justification)
I have created an application which uses sqlite as its database backend, and the schema works (and performs) very well during general application use.
Now I am attempting to build a reporting system for it, and I have built an excel xll which creates query tables from an unnamed DSN. Because of this, I have to do all of my reporting in sql only (I.e. I can do nothing programatically). This works very well for everything except for one query...
/// SKIP HERE....
My database contains a list of Features which have an id, distance and an indicator for whether the feature is a marker or not. The id is not necessarily in the same order as the distance (a feature with id 10 might have distance 100 and the feature with id 11 might have distance 90).
So the item basically looks like:
Feature { int id, int distance, bool is_marker }
What I am trying to do is find the next and previous feature that are also markers.
/// Edit
My first attempt used:
select
* /* I want all the data from this feature */
(select MAX(f2.distance) - f1.distance
from feature as f2
where f2.is_marker && f2.distance < f1.distance) /* and the distance to the previous marker */
from feature as f2
second attempt (this one works, it just takes WAAAY too long for 100,000 features, somewhere around 9 days...):
select
*, /* I want all the data from this feature */
(select f1.distance - MAX(f2.distance)
from feature as f2
where f2.distance AND f2.distance< f1.distance) /* and the distance to the previous marker */
from feature as f1
This query does return what I want, and performs adequately for small databases, but I have to support MUCH larger databases as well.
(Some of the databases have less than 1000 features, but the one I am working on right now has >90,000 features. The query for 1000 features takes <1s, but the one for 90,000 features takes 20 hours. This is because it does not grow linearly resulting in an 80x decrease in performance: 20*60*60/(90,000/1000) = 8000)
The backend database uses sqlite, and I am using the sqliteodbc connector to hookup excel.
If I was to do this in code, I would do this:
var features = featureRepository.GetAll();
var featuresWithMarkerDistance = new List<FeatureWithMarkerDistance>();
var previousMarker = null;
for(var index = 0; index < features.Length; index++) {
var currentFeature = features[index];
featuresWithMarkerDistance.Add(
new FeaturesWithMarkerDistance(currentFeature,
feature.distance - previousMarker.distance));
if(feature.is_marker) {
previousMarker = feature;
}
}
// FeatureWithMarkerDistance { int id, int distance, bool is_marker, int marker_distance }
// EDITS:
Here is a concrete example:
(The underlying table)
feature_id is_marker distance
1 false 100
2 false 90
3 false 101
4 true 50
5 false 5
6 true 85
7 false 150
8 false 75
(There is an index on distance)
The results I want:
feature_id is_marker distance distance_to_closest_previous_marker
1 false 100 15
2 false 90 5
3 false 101 16
4 true 50 null
5 false 5 null
6 true 85 35
7 false 150 65
8 false 75 25
So if I was getting the previous marker for feature_id 1, the distance for feature_id 1 is 100, and the closest marker is feature_id 6 at distance 85. To get the distance to closest previous marker, I take (100 - 85) = 15. I need to get this value for every single feature to include in a report. (This must be done in a single sql query because I'm using an odbc connector with excel). The above query does fetch what I want, it just performs very badly because in the where clause it has to search over the entire database for every feature.
What I would like to do is this: (unless there is a more performant way)
select *
/* I want all the data from this feature */
/* previous = */ (select MAX(f2.distance) - f1.distance
from feature as f2
where f2.is_marker && f2.distance >= previous && f2.distance < f1.distance)
/* and the distance to the previous marker */
from feature as f2
So the basic theory is that I would store the previous marker value and only look at that value and beyond when I'm looking for the next marker.
Sorry about the confusion originally (I forgot to put the MAX() in originally)