Environment:
- Java 8; specifically, Oracle JDK 1.8u25;
- h2 as a SQL backend;
- jooq for querying.
The table/database I'm querying from:
private static final String H2_URI_PREFIX = "jdbc:h2:";
private static final String H2_URI_POSTFIX
= ";LOG=0;LOCK_MODE=0;UNDO_LOG=0;CACHE_SIZE=131072";
private static final String H2_USERNAME = "sa";
private static final String H2_PASSWORD = "";
private static final List<String> H2_DDL = Arrays.asList(
"create table matchers ("
+ " id integer not null,"
+ " class_name varchar(255) not null,"
+ " matcher_type varchar(30) not null,"
+ " name varchar(1024) not null"
+ ");",
"create table nodes ("
+ " id integer not null,"
+ " parent_id integer not null,"
+ " level integer not null,"
+ " success integer not null,"
+ " matcher_id integer not null,"
+ " start_index integer not null,"
+ " end_index integer not null,"
+ " time long not null"
+ ");",
"alter table matchers add primary key(id);",
"alter table nodes add primary key(id);",
"alter table nodes add foreign key (matcher_id)"
+ " references matchers(id)"
);
// ...
private void doDdl(final DSLContext jooq)
{
H2_DDL.forEach(jooq::execute);
jooq.createIndex("nodes_parent_id").on(NODES, NODES.PARENT_ID)
.execute();
jooq.createIndex("nodes_start_index").on(NODES, NODES.START_INDEX)
.execute();
jooq.createIndex("nodes_end_index").on(NODES, NODES.END_INDEX)
.execute();
}
Even though I show the full DDL code here (note that NODES
and MATCHERS
are generated by jooq's code generation package), only the nodes
/NODES
table is of interest.
One row in the nodes
table represents a match event; of interest here are the start_index
, end_index
and level
columns. It is guaranteed that start_index
is less than or equal to end_index
; as to the level
column, it is the depth in the matcher tree, and depths start at 0; that is, for some matcher c
in matcher path /a/b/c
, c
's level
will be 2.
Now, the result I want to obtain is the following:
Given a line range (10, 25 or 50), return a map where keys are the line number and values are the maximum depth of the parse tree for this line; should only be considered matchers which are currently active for this line
A line is materialized by an interval [start, end)
(start
inclusive, end
exclusive). A matcher is considered active for a given line if both of the following statements are true:
- its start index is strictly less than the line's end index; and
- its end index is greater than, or equal to, the line's start index.
Now, how I solve this query:
- I create a virtual column consisting of a series of SQL
case
statements, one per line, checking whether a matcher is active for a given line; this column is calledline
; - I do
select line, max(level)
and group byline
, with the added condition that the end index should be greater than or equal to the first line's start index, and the start index should be strictly less than the last line's end index.
The code:
@Override
public Map<Integer, Integer> getDepthMap(final int startLine,
final int wantedLines)
throws GrappaDebuggerException
{
loadInputBuffer();
final List<IndexRange> ranges
= IntStream.range(startLine, startLine + wantedLines)
.mapToObj(inputBuffer::getLineRange)
.collect(Collectors.toList());
final int startIndex = ranges.get(0).start;
final int endIndex = ranges.get(ranges.size() - 1).end;
final Condition indexCondition = NODES.START_INDEX.lt(endIndex)
.and(NODES.END_INDEX.ge(startIndex));
final Field<Integer> lineField = getLineField(startLine, ranges);
final Map<Integer, Integer> ret = new HashMap<>();
jooq.select(lineField, DSL.max(NODES.LEVEL))
.from(NODES)
.where(indexCondition)
.groupBy(lineField)
.forEach(r -> ret.put(r.value1(), r.value2() + 1));
IntStream.range(startLine, startLine + wantedLines)
.forEach(line -> ret.putIfAbsent(line, 0));
return ret;
}
private Field<Integer> getLineField(final int startLine,
final List<IndexRange> ranges)
{
CaseConditionStep<Integer> step = DSL.decode()
.when(activeThisRange(ranges.get(0)), startLine);
final int size = ranges.size();
for (int i = 1; i < size; i++)
step = step.when(activeThisRange(ranges.get(i)), startLine + i);
return step.as("line");
}
private static Condition activeThisRange(final IndexRange range)
{
return NODES.START_INDEX.lt(range.end)
.and(NODES.END_INDEX.ge(range.start));
}
This request takes approximately 15 seconds on the busiest parts of a 22 million table entry if I query for 25 lines (that is, lines n to n + 24 for some n), but is there any way to improve it?
Note that changing the SQL engine is NOT an option; this is a GUI app for which databases are "forgettable"; and I don't want to require that a "full fledged" RDBMS engine be installed!