
I'm using Memgraph Lab for a project about music genres, and I imported a dataset structured like this: it contains 2k users, each defined by an id and a list of the genres they love. The edges represent mutual friendships between users, and each user's genres are listed in the order the user added them. First I wanted to count how many users list each genre, and I managed to do that for a single genre with this query:

MATCH (n)
WITH n, "Pop" AS genre
WHERE genre IN n.genres
RETURN genre, count(n);
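The query above handles one genre at a time. To count every genre in one pass, the logic is just "for each user, add one to each genre they list". Below is a minimal plain-Python sketch of that, assuming each user is a dict with a `genres` list; the `count_genres` function and the sample `users` are hypothetical and only illustrate the counting. (In Cypher, something like `MATCH (n) UNWIND n.genres AS genre RETURN genre, count(DISTINCT n);` should produce the same per-genre counts.)

```python
from collections import Counter

def count_genres(users):
    """Return how many users list each genre (each user counted once per genre)."""
    counts = Counter()
    for user in users:
        # set() mirrors `genre IN n.genres`: a duplicate entry in one user's
        # list does not count that user twice.
        counts.update(set(user.get("genres", [])))
    return counts

# Hypothetical sample data, not taken from the real dataset.
users = [
    {"id": 1, "genres": ["Pop", "Rock"]},
    {"id": 2, "genres": ["Rock", "Jazz", "Pop"]},
    {"id": 3, "genres": ["Pop"]},
]
print(count_genres(users))  # Counter({'Pop': 3, 'Rock': 2, 'Jazz': 1})
```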

My issue is this: if we assume that users picked their genres in order of preference, I want to create a query or a query module that tells us, for each genre, in what percentage of cases it appears in the top n places. I'm stuck on how to write that.
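"Percentage in the top n places" can be read in more than one way. One reading is: out of all users, what share rank a given genre within their first n picks. Here is a minimal plain-Python sketch of that reading, assuming the same dict-per-user shape as above; `top_n_reach` and the sample data are hypothetical names for illustration, not part of any Memgraph API.

```python
from collections import defaultdict

def top_n_reach(users, n):
    """Percentage of ALL users who rank each genre within their first n picks."""
    total_users = len(users)
    hits = defaultdict(int)
    for user in users:
        # Slice to the first n entries; set() counts a user at most once per genre.
        for genre in set(user.get("genres", [])[:n]):
            hits[genre] += 1
    # Genres that never reach the top n are simply absent from the result.
    return {genre: 100.0 * count / total_users for genre, count in hits.items()}

# Hypothetical sample data, not taken from the real dataset.
users = [
    {"id": 1, "genres": ["Pop", "Rock"]},
    {"id": 2, "genres": ["Rock", "Jazz", "Pop"]},
    {"id": 3, "genres": ["Pop"]},
]
print(top_n_reach(users, 1))
```

With `n = 1`, Pop is first for two of the three sample users (about 66.7%) and Rock for one (about 33.3%); Jazz is never first, so it does not appear in the result.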

mattrixxxx

1 Answer


I'm not sure how to do this with a single query, but you can make it easier for yourself by creating a query module and implementing all of the logic there. Something like this should work:

# Procedures are called from Cypher as <module_name>.<procedure>, where
# <module_name> is the file name, e.g.:
#   CALL <module_name>.in_top_n_percentage(3) YIELD genre, percentage, size;
import mgp
from collections import defaultdict


@mgp.read_proc
def genre_count(context: mgp.ProcCtx,
                genre: str) -> mgp.Record(genre=str, count=int):
    # Number of users whose `genres` list contains the given genre.
    count = sum(1 for v in context.graph.vertices
                if genre in v.properties.get('genres', []))
    return mgp.Record(genre=genre, count=count)


@mgp.read_proc
def in_top_n_percentage(context: mgp.ProcCtx,
                        n: int) -> mgp.Record(genre=str,
                                              percentage=float,
                                              size=int):
    # For every genre: how often it appears at all (size) and what fraction
    # of those appearances fall within the first n positions (percentage, 0..1).
    genre_stats = defaultdict(lambda: {'total_count': 0, 'in_top_n_count': 0})

    for v in context.graph.vertices:
        # Vertices without a `genres` property are skipped instead of raising.
        for index, genre in enumerate(v.properties.get('genres', [])):
            genre_stats[genre]['total_count'] += 1
            genre_stats[genre]['in_top_n_count'] += int(index < n)

    return [
        mgp.Record(genre=genre,
                   percentage=counts['in_top_n_count'] / counts['total_count'],
                   size=counts['total_count'])
        for genre, counts in genre_stats.items()
    ]
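Because `mgp` is only available inside Memgraph, it can help to check the counting logic on plain Python data first. This is a sketch that mirrors `in_top_n_percentage` with the `mgp` vertices swapped for plain lists of genres; `top_n_percentages` and the sample lists are hypothetical, and as in the module, `percentage` is a fraction between 0 and 1 of a genre's occurrences found at a 0-based index below n.

```python
from collections import defaultdict

def top_n_percentages(genre_lists, n):
    """Plain-Python mirror of in_top_n_percentage.

    Returns (genre, fraction_in_top_n, total_occurrences) tuples.
    """
    stats = defaultdict(lambda: {"total": 0, "in_top_n": 0})
    for genres in genre_lists:
        for index, genre in enumerate(genres):
            stats[genre]["total"] += 1
            stats[genre]["in_top_n"] += int(index < n)
    return [(genre, c["in_top_n"] / c["total"], c["total"])
            for genre, c in stats.items()]

# Hypothetical genre lists, one per user, in order of preference.
sample = [["Pop", "Rock"], ["Rock", "Jazz", "Pop"], ["Pop"]]
for genre, fraction, size in sorted(top_n_percentages(sample, 2)):
    print(f"{genre}: {fraction:.0%} of {size} in top 2")
```

On this sample it prints Jazz at 100% of 1, Pop at 67% of 3 (its third occurrence is in position 3), and Rock at 100% of 2.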
MPesi