0

I have the following query (it finds the row with the most similar name and selects it plus the value and rating):

SELECT
  DISTINCT FIRST_VALUE(value) over (partition by name order by distance asc) value,
  rating,
  name
FROM (
  SELECT
    value,
    rating,
    LEAST(
      UTL_MATCH.EDIT_DISTANCE('pineapple',   upper(name)),
      UTL_MATCH.EDIT_DISTANCE('yellowapple', upper(name)),
      UTL_MATCH.EDIT_DISTANCE('greenapple',  upper(name)),
    ) as apple,
    LEAST(
      UTL_MATCH.EDIT_DISTANCE('carbage',  upper(name)),
      UTL_MATCH.EDIT_DISTANCE('cabbagge', upper(name)),
      UTL_MATCH.EDIT_DISTANCE('bagge',    upper(name)),
    ) as cabbage,
    LEAST(
      UTL_MATCH.EDIT_DISTANCE('chakn',  upper(name)),
      UTL_MATCH.EDIT_DISTANCE('chkn',   upper(name)),
      UTL_MATCH.EDIT_DISTANCE('chikee', upper(name)),
    ) as chicken
    -- more of the above could follow...
  FROM food_table
) UNPIVOT (distance FOR name in (apple as 'apple', cabbage as 'cabbage', chicken as 'chicken'));

It turns my table from this:

category subcategory name value rating
fruit red apple 4g 4/10
vegetable green cabbage 4g 3/10
meat white chicken 49g 8/10
chemical black cocacola 12g 10/10
chemical blue pepsi 12g 10/10

to this:

name value rating
apple 4g 4/10
cabbage 4g 3/10
chicken 49g 8/10

The subquery basically returns something like this (before the unpivoting):

value rating apple cabbage chicken
4g 4/10 3* 8 7
4g 3/10 8 4* 6
49g 8/10 5 9 4*
12g 10/10 7* 12 10
12g 10/10 11 7 6*
  • I match the most similar strings to what I have in the table.
  • If there is more than one least per variable given, pick the one with the least distance (for example chicken over pepsi and apple over cocacola).
  • I unpivot and partition by order so I can find the most similar and eliminate based on that (remove pepsi and cocacola).

I managed to do the following to only get the subquery, but I need to add the outer query now:

public Stream<Projection> stream(
    Stream<Pair<String, Stream<String>>> similarStringGroups,
    Class<Projection> projectionClass,
    Class<T> entityClass,
    String termField
) {
    CriteriaBuilder criteriaBuilder = entityManager.getCriteriaBuilder();
    CriteriaQuery<Projection> query = criteriaBuilder.createQuery(projectionClass);
    Root<T> root                    = query.from(entityClass);
    RecordComponent[] recordComponents = projectionClass.getRecordComponents();

    Stream<Expression<Integer>> leastExpressions = similarStringGroups.map(
        similarStringGroup ->
            criteriaBuilder.function(
                "LEAST",
                Integer.class,
                similarStringGroup.getSecond().map(
                    similarString -> criteriaBuilder.function(
                        "UTL_MATCH.EDIT_DISTANCE",
                        Integer.class,
                        criteriaBuilder.literal(similarString.toUpperCase()),
                        criteriaBuilder.upper(root.get(termField))
                    )
                ).toArray(Expression[]::new)
            )
    );

    return entityManager.createQuery(criteriaQuery.multiselect(Stream.concat(leastExpressions, Stream.of(root.get(recordComponents[4].getName()), root.get(recordComponents[5].getName()))).toArray(Selection[]::new)))
}

Just to add to the above method that I wrote:

  • entityClass is just the Food.class.
  • termField is name in the SQL query.
  • projectionClass is temporarily set to public record TemroraryRecord(Integer apple, Integer cabbage, Integer chicken, String value, String rating) but it will eventually be public record NameValueRating(String name, String value, String rating) for the outer query. I can't have a class for the subquery because the number of result columns is variable.
  • similarStringGroups is basically the following:
Stream.of(
    Pair.of("apple",   Stream.of("pineapple", "yellowapple", "greenapple")),
    Pair.of("cabbage", Stream.of("carbage",   "cabbagge",    "bagge")),
    Pair.of("chicken", Stream.of("chakn",     "chkn",        "chikee"))
)
yaserso
  • 2,638
  • 5
  • 41
  • 73

0 Answers0