2

I have two tables:

  • persons
  • persons_fts.

Here are the definitions of the tables:

CREATE TABLE persons(name TEXT PRIMARY KEY NOT NULL, details TEXT);

CREATE VIRTUAL TABLE persons_fts USING FTS4(name TEXT NOT NULL, details TEXT, context=persons);

I want to do a full-text-search with a query on the persons_fts table, and rank the results on the basis of relevance. After looking at the official docs on how to do this, I ended with the following query:

SELECT *
FROM persons
JOIN persons_fts ON persons.name = persons_fts.name
WHERE persons_fts MATCH :query
ORDER BY rank(matchinfo(persons_fts)) DESC;

Besides the extra join, this query is the exact same as the one outlined in the official docs. However, I get an error when I try to execute it:

Error retrieving data from the table: Wrong number of arguments to function rank() (code 1 SQLITE_ERROR)

What am I doing wrong?

Please note that using FTS5 is not an option for me.

harold_admin
  • 1,117
  • 2
  • 10
  • 20
  • `rank()` is a window function that takes no arguments. Not sure what you're trying to do with it here... – Shawn Aug 30 '20 at 15:05
  • Ah, I see what you're looking at in the fts3 documentation. Read the comments in that query carefully. – Shawn Aug 30 '20 at 15:14
  • @Shawn If you could point me to the docs for FTS4, that would be great. As far as i know, the page I linked in the question is the documentation for both FTS3 and FTS4. – harold_admin Aug 30 '20 at 16:58
  • That is the documentation, yes. – Shawn Aug 30 '20 at 20:51
  • Thanks @Shawn. I didn't realize that `rank` didn't ship with SQLite, and was instead a user defined function. I have added an answer below which accomplishes the same thing as the example in the official docs. – harold_admin Aug 31 '20 at 04:58

1 Answers1

4

The linked SQLite documentation in the question clarifies what the rank function does in the comments above the query it is used in:

If the application supplies an SQLite user function called "rank" that interprets the blob of data returned by matchinfo and returns a numeric relevancy based on it, then the following SQL may be used to return the titles of the 10 most relevant documents in the dataset for a users query.

rank is expected to be a user supplied function. It does not ship with SQLite.

Here's an implementation of the rank function in Kotlin which calculates the relevance score on the basis of the data supplied by matchinfo using the default "pcx" argument:

fun rank(matchInfo: IntArray): Double {
  val numPhrases = matchInfo[0]
  val numColumns = matchInfo[1]

  var score = 0.0
  for (phrase in 0 until numPhrases) {
    val offset = 2 + phrase * numColumns * 3
    for (column in 0 until numColumns) {
      val numHitsInRow = matchInfo[offset + 3 * column]
      val numHitsInAllRows = matchInfo[offset + 3 * column + 1]
      if (numHitsInAllRows > 0) {
        score += numHitsInRow.toDouble() / numHitsInAllRows.toDouble()
      }
    }
  }

  return score
}

To understand how this code works, you should go through the rankfunc example given in the official docs.

Since our rank function is a Kotlin function, it can not be used directly by SQLite. Instead, we will need to first retrieve the matchinfo blob from the database, and then pass it to our rank function.

Here's an example on how to do it with Room:

@Dao
interface PersonsDao {
  
  @Query("""
    SELECT *, matchinfo(persons_fts, 'pcx') as mi
    FROM persons
    JOIN persons_fts ON persons.name = persons_fts.name
    WHERE persons_fts MATCH :query
  """)
  suspend fun search(query: String): List<PersonWithMatchInfo>
}

data class PersonWithMatchInfo(
  @Embedded
  val person: Person
  @ColumnInfo(name = "mi")
  val matchInfo: ByteArray
)

The retrieved ByteArray contains a sequence of numbers representing the match info, in which each number is represented by 4 bytes. The first byte is the actual value, and the next three bytes are zero. Therefore we need to remove the redundant zeroes before passing this ByteArray to rank. This can be done with a simple method:

fun ByteArray.skip(skipSize: Int): IntArray {
  val cleanedArr = IntArray(this.size / skipSize)
  var pointer = 0
  for (i in this.indices step skipSize) {
    cleanedArr[pointer] = this[i].toInt()
    pointer++
  }

  return cleanedArr
}

This setup can be used like this:

suspend fun searchWithRanks(query: String): List<Person> {
  return personDao.search(query)
        .sortedByDescending { result -> rank(result.matchInfo.skip(4)) }
        .map { result -> result.person }
}
harold_admin
  • 1,117
  • 2
  • 10
  • 20
  • Many thanks for the answer! I believe it could be improved in 2 ways: 1) by using Kotlin standard lib SLICE(indices step 4) function instead of custom SKIP(4) extension; 2) by adding list of columns weights and multiplying score by them. – ivkil Sep 27 '20 at 11:02