I am trying to implement rank/dense rank functionality in TypeScript on a large set of data. Is there a library function or an easy way to implement this in Foundry TypeScript?
-
I don't think there is anything natively. It is possible you'll have to unpack the results and calculate it yourself – fmsf May 12 '22 at 15:56
-
Are you looking to rank all rows on a property of the data, or perform an aggregation that groups the data and then ranks groups by their count? – domdomegg Jul 06 '22 at 14:39
1 Answer
If you want to get the rank or a dense rank for objects in TypeScript, you could implement a rank function either for an object set or all objects of a particular type like this:
import { Function, FunctionsMap, Integer, OntologyObject } from "@foundry/functions-api";
import { Objects, ExampleDataFlight, ObjectSet } from "@foundry/ontology-api";

export class MyFunctions {
    @Function()
    public async rankSetOfFlights(flightSet: ObjectSet<ExampleDataFlight>): Promise<FunctionsMap<ExampleDataFlight, Integer>> {
        const flights = await flightSet.allAsync();
        return rank(flights, compareFlight);
    }

    @Function()
    public async rankAllFlights(): Promise<FunctionsMap<ExampleDataFlight, Integer>> {
        const flights = await Objects.search().exampleDataFlight().allAsync();
        return rank(flights, compareFlight);
    }
}

// A comparison function, as per https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/sort
const compareFlight = (a: ExampleDataFlight, b: ExampleDataFlight): number =>
    (a.date ?? Infinity).valueOf() - (b.date ?? Infinity).valueOf();

/**
 * Creates a FunctionsMap from an object to its (sparse) rank or dense rank, for a given comparison function.
 *
 * Example call 1:
 * rank(
 *     [{ value: 10 }, { value: 15 }, { value: 15 }, { value: 20 }],
 *     (a, b) => a.value - b.value,
 *     'sparse',
 * )
 *
 * Example output 1:
 * Map<[
 *     { value: 10 } -> 1,
 *     { value: 15 } -> 2,
 *     { value: 15 } -> 2,
 *     { value: 20 } -> 4,
 * ]>
 *
 * Example call 2:
 * rank(
 *     [{ value: 10 }, { value: 15 }, { value: 15 }, { value: 20 }],
 *     (a, b) => a.value - b.value,
 *     'dense',
 * )
 *
 * Example output 2:
 * Map<[
 *     { value: 10 } -> 1,
 *     { value: 15 } -> 2,
 *     { value: 15 } -> 2,
 *     { value: 20 } -> 3,
 * ]>
 */
const rank = <T extends OntologyObject>(objs: T[], compareFn: (a: T, b: T) => number, how: 'sparse' | 'dense' = 'sparse'): FunctionsMap<T, Integer> => {
    const map = new FunctionsMap<T, Integer>();
    if (objs.length === 0) return map;

    // Sort a copy of the objects, so we can iterate through them in order without mutating the input
    const sortedObjs = [...objs].sort(compareFn);

    // Iterate through the sorted objects, keeping track of the current rank
    let currentRank = 1;
    sortedObjs.forEach((obj, i) => {
        // Increase the rank when the current object is greater than the previous one
        if (i >= 1 && compareFn(obj, sortedObjs[i - 1]) > 0) {
            // Sparse rank jumps to the 1-based position; dense rank increases by one
            if (how === 'sparse') currentRank = i + 1;
            if (how === 'dense') currentRank++;
        }
        // Set the rank for the object in the map
        map.set(obj, currentRank);
    });
    return map;
};

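To sanity-check the ranking logic outside Foundry, here is a minimal sketch of the same idea on plain values. It assumes nothing from the Foundry APIs: `rankValues` and `RankMode` are hypothetical names introduced for illustration, and the result is a plain array aligned with the input order rather than a `FunctionsMap`.

```typescript
type RankMode = "sparse" | "dense";

// Hypothetical helper (not a Foundry API): returns ranks aligned with the input order.
function rankValues<T>(
    items: readonly T[],
    compareFn: (a: T, b: T) => number,
    how: RankMode = "sparse",
): number[] {
    // Sort indices rather than the items themselves, so the input is left untouched
    const order = items.map((_, i) => i).sort((i, j) => compareFn(items[i], items[j]));
    const ranks: number[] = items.map(() => 0);

    let currentRank = 1;
    order.forEach((itemIndex, pos) => {
        // Only move the rank on when this item is strictly greater than the previous one
        if (pos >= 1 && compareFn(items[itemIndex], items[order[pos - 1]]) > 0) {
            // Sparse rank is the 1-based sorted position; dense rank just counts distinct values
            currentRank = how === "sparse" ? pos + 1 : currentRank + 1;
        }
        ranks[itemIndex] = currentRank;
    });
    return ranks;
}

const values = [20, 15, 10, 15];
const sparseRanks = rankValues(values, (a, b) => a - b, "sparse"); // [4, 2, 1, 2]
const denseRanks = rankValues(values, (a, b) => a - b, "dense");   // [3, 2, 1, 2]
```

Tied values share a rank in both modes; the only difference is whether the next distinct value skips ahead (sparse) or continues from the previous rank (dense).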
This is likely to work well for smaller datasets, and currently Foundry will limit you to running it on 100,000 objects in most cases. You can try filtering your object set (e.g. in Quiver or Workshop) before passing it to the function to help with this.
You mentioned in your question that this is for a large set of data. For larger datasets it's probably best to use the built-in Spark rank and dense rank functions in a transform, for example in Code Repositories. To do this, a transform like this might help:
from pyspark.sql import functions as F
from pyspark.sql.window import Window as W
from transforms.api import transform_df, Input, Output


@transform_df(
    Output("/path/to/flights_ranked"),
    source_df=Input("/path/to/flights"),
)
def compute(source_df):
    return (
        source_df
        # (you can also use .partitionBy() on the window definition)
        .withColumn("rank", F.rank().over(W.orderBy("date")))
        .withColumn("dense_rank", F.dense_rank().over(W.orderBy("date")))
    )
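To illustrate what partitioning the window means, here is a rough TypeScript sketch that ranks rows separately within each group. It is only an illustration of the semantics, not how Spark implements it: `FlightRow`, `Ranks` and `rankWithinPartitions` are hypothetical names, and the `carrier` column is an assumed example field.

```typescript
// Hypothetical row shape, for illustration only (ISO date strings sort chronologically)
interface FlightRow { carrier: string; date: string }
interface Ranks { rank: number; denseRank: number }

// Hypothetical helper: ranks rows within each partition, like a partitioned window
function rankWithinPartitions<T>(
    rows: readonly T[],
    partitionKey: (row: T) => string,
    compareFn: (a: T, b: T) => number,
): Map<T, Ranks> {
    // Group the rows by their partition key
    const groups = new Map<string, T[]>();
    rows.forEach((row) => {
        const key = partitionKey(row);
        const group = groups.get(key);
        if (group) group.push(row);
        else groups.set(key, [row]);
    });

    // Rank each group independently, restarting both ranks at 1
    const result = new Map<T, Ranks>();
    groups.forEach((group) => {
        const sorted = group.slice().sort(compareFn);
        let rank = 1;
        let denseRank = 1;
        sorted.forEach((row, i) => {
            if (i >= 1 && compareFn(row, sorted[i - 1]) > 0) {
                rank = i + 1;
                denseRank += 1;
            }
            result.set(row, { rank, denseRank });
        });
    });
    return result;
}

const flights: FlightRow[] = [
    { carrier: "AA", date: "2022-05-02" },
    { carrier: "AA", date: "2022-05-01" },
    { carrier: "AA", date: "2022-05-01" },
    { carrier: "BA", date: "2022-05-03" },
    { carrier: "AA", date: "2022-05-03" },
];
const byCarrier = rankWithinPartitions(
    flights,
    (f) => f.carrier,
    (a, b) => (a.date < b.date ? -1 : a.date > b.date ? 1 : 0),
);
```

Here the single "BA" flight gets rank 1 in its own partition, even though its date is later than most of the "AA" flights.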

domdomegg