Efficient class iteration in Python

Question

I have a Python class called StudentGrades that looks something like this:

class StudentGrades:
    def __init__(self, scores):
        self.scores = scores

    def average(self):
        return sum(self.scores) / len(self.scores)

    def check_grade(self, threshold=0.7):
        avg = self.average()
        if avg >= threshold:
            return "Accepted"
        return "Rejected"

Then I have to use the check_grade method several times, so I'd use it like this:

jhon_grade = StudentGrades(scores=[0.8, 0.9])
ana_grade = StudentGrades(scores=[0.6, 0.2])

print(jhon_grade.check_grade()) # Accepted
print(ana_grade.check_grade()) # Rejected
print(jhon_grade.check_grade(0.9)) # Rejected
print(ana_grade.check_grade(0.9)) # Rejected

Is there a way to modify this class so that it calculates Jhon's and Ana's grades without initializing the class for each student separately (or making the caller write a for loop or list comprehension), and checks several thresholds at the same time? Something like:

students = StudentGrades(scores = [[0.8, 0.9], [0.6, 0.2]])
grades = students.check_grades(threshold = [0.7, 0.9])
# returns
# [['Accepted', 'Rejected'],  # Jhon (result of threshold 1, 2)
#  ['Rejected', 'Rejected']]  # Ana (result of threshold 1, 2)

EDIT: I imagine something like inheriting from a class with my base methods and then using multiprocessing for each calculation, but I'm not sure how to set it up.

Thanks in advance!

score 0 · Answer 1 · answered Apr 05 '21 at 22:31


Use a list comprehension.

grades = [StudentGrades(scores = s).check_grade() for s in [[0.8, 0.9], [0.6, 0.2]]]


Barmar


nbrix · Answer 2 · 2021-04-05T22:34:28.213


Try this:

students = [[0.8, 0.9], [0.6, 0.2]]
grades = [StudentGrades(scores).check_grade() for scores in students]

Output:

['Accepted', 'Rejected']

edited Apr 05 '21 at 22:34

answered Apr 05 '21 at 22:31


Thanks for your answer, but that is what I'm trying to avoid: making the "user" write for loops. In the example I only used one parameter, but my real class needs to iterate over a grid built from a large set of parameters – Rodrigo A Apr 05 '21 at 22:33

score 0 · Answer 3 · answered Apr 05 '21 at 22:58

Try this:

class StudentGrades:
    def __init__(self, scores: list, thresholds: list):
        self.scores = scores
        self.thresholds = thresholds

    def average(self, score):
        return sum(score)/len(score)

    def check_grade(self, scores, threshold=0.7):
        avg = self.average(scores)
        if avg >= threshold:
            return "Accepted"
        return "Rejected"

    def result(self):
        return [[self.check_grade(score, threshold) for threshold in self.thresholds] for score in self.scores]
        # you could replace the return for a yield here too


students = StudentGrades(scores=[[0.8, 0.9], [0.6, 0.2]], thresholds=[0.7, 0.9])
grades = students.result()
# returns [['Accepted', 'Rejected'], ['Rejected', 'Rejected']]

It's not elegant (to me), but it uses similar code to what you've already written and returns what you want. All I have done is add a method, result, which returns a list of lists with the checked grade for each student at each threshold. When you initialise the class you pass in two lists: the scores and the thresholds.

```scores=[[random.randrange(0,100)/100 for __ in range(100)] for _ in range(6000)];thresholds=[random.randrange(0,100)/100 for _ in range(6000)];students=StudentGrades(scores=scores, thresholds=thresholds)``` (after `import random`) works well for a lot of scores! – GAP2002, Apr 05 '21 at 23:25

Alexander L. Hayes · Answer 4 · 2021-04-06T00:05:14.857

I read two questions here:

How do I handle multiple students and thresholds?
How do I make this fast?

I'll focus on an answer to "How do I make this fast" since three answers have already mentioned how to handle multiple students.

Here are two steps:

Profile your code to find where most time is spent. Anecdotally: there are usually a small handful of tight loops where your program spends the majority of its runtime. I've skipped this step here, but for more complicated examples this is needed.
Benchmark snippets and optimize, even rewriting in a lower-level language as needed.
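As a rough illustration of the profiling step, here is a minimal sketch using the standard-library cProfile and pstats modules on the pure-Python version. The helper name profile_run and the synthetic data are assumptions made for this example, not part of the original code:

```python
import cProfile
import io
import pstats


class StudentGrades:
    def __init__(self, scores):
        self.scores = scores

    def average(self):
        return sum(self.scores) / len(self.scores)

    def check_grade(self, threshold=0.7):
        return self.average() >= threshold


def run_student_grades(data):
    return [StudentGrades(scores).check_grade() for scores in data]


def profile_run(data, top=5):
    """Hypothetical helper: profile run_student_grades and return the report text."""
    profiler = cProfile.Profile()
    profiler.enable()
    run_student_grades(data)
    profiler.disable()

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time so the most expensive call chains come first.
    stats.sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()


if __name__ == "__main__":
    # Synthetic data (an assumption): 1000 students, 100 scores each.
    data = [[0.5] * 100 for _ in range(1000)]
    print(profile_run(data))
```

The report lists call counts and cumulative time per function, which is usually enough to tell whether the time goes into object creation, the averaging, or something else entirely.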

Here's a version of your code + @nbrix's answer. I've modified the check_grade method to return True or False to be consistent with the numpy version shown next.

class StudentGrades:
    def __init__(self, scores):
        self.scores = scores

    def average(self):
        return sum(self.scores) / len(self.scores)

    def check_grade(self, threshold=0.7):
        avg = self.average()
        if avg >= threshold:
            return True
        return False

def run_student_grades(data):
    return [StudentGrades(scores).check_grade() for scores in data]

And here is a function I've written using numpy, which calculates the mean and whether the mean is at least the 0.7 threshold (using >= to match check_grade):

import numpy as np

def run_numpy_student_grades(data):
    # Mean over each row (one row per student), compared against the threshold.
    return np.mean(data, axis=1) >= 0.7
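Before comparing speed, it may be worth checking that the two implementations agree. The sketch below is one way to do that. The threshold parameter on both runner functions, the use of >= in both, and the uniform random test data are assumptions added for this example:

```python
import numpy as np


class StudentGrades:
    def __init__(self, scores):
        self.scores = scores

    def average(self):
        return sum(self.scores) / len(self.scores)

    def check_grade(self, threshold=0.7):
        return self.average() >= threshold


def run_student_grades(data, threshold=0.7):
    return [StudentGrades(scores).check_grade(threshold) for scores in data]


def run_numpy_student_grades(data, threshold=0.7):
    return np.mean(data, axis=1) >= threshold


# The two-student example from the question.
small = [[0.8, 0.9], [0.6, 0.2]]
assert run_student_grades(small) == run_numpy_student_grades(small).tolist()

# A larger random check; threshold 0.5 so both outcomes actually occur.
rng = np.random.default_rng(42)
big = rng.uniform(0.0, 1.0, size=(1000, 100))
python_result = run_student_grades(big.tolist(), threshold=0.5)
numpy_result = run_numpy_student_grades(big, threshold=0.5)
print(np.array_equal(python_result, numpy_result))
```

The two versions sum floats in a different order, so in principle they could disagree for an average that lands within rounding error of the threshold. For most data that should not matter, but it is a reason to compare results rather than assume they match.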

For small inputs (two students, two assignments) the absolute difference is negligible (a few microseconds), and numpy is in fact about 1.7x slower:

- benchmark 'Small Input: Two Students, Two Assignments': 2 tests -
Name (time in us)                  Mean            Median          
-------------------------------------------------------------------
test_pure_python_small_input     4.9058 (1.0)      4.9330 (1.0)    
test_numpy_small_input           8.5494 (1.74)     8.5580 (1.73)   
-------------------------------------------------------------------

For big inputs (here: 1000 students, each with 100 assignments) the difference between these is substantial: the numpy version is ~250x faster than the Python version that initializes objects and does list comprehension over them.

------ benchmark 'Big Input: 1000 Students, 100 Assignments': 2 tests -----
Name (time in us)                     Mean                 Median          
---------------------------------------------------------------------------
test_numpy_big_input               55.5528 (1.0)          56.1480 (1.0)    
test_pure_python_big_input     13,675.3789 (246.17)   13,865.2100 (246.94) 
---------------------------------------------------------------------------

Which version is the better choice in practice depends on your data and other outside factors, e.g. how many students and assignments you will realistically be working with.
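For a quick look at where the crossover falls for your own data sizes, without setting up pytest, a rough sketch with the standard-library timeit module could look like the following. The function names, the shapes tried, and the uniform random data are assumptions for illustration. Note that the pure-Python function here skips the class and works on plain lists, so its timings will not match the table above:

```python
import timeit

import numpy as np


def run_pure_python(data, threshold=0.7):
    # Simplified stand-in for the class-based version: plain lists, no objects.
    return [sum(scores) / len(scores) >= threshold for scores in data]


def run_numpy(data, threshold=0.7):
    return np.mean(data, axis=1) >= threshold


def compare(n_students, n_assignments, repeats=20):
    """Return the mean seconds per call for (pure Python, numpy)."""
    rng = np.random.default_rng(0)
    data = rng.uniform(0.0, 1.0, size=(n_students, n_assignments))
    as_lists = data.tolist()
    python_time = timeit.timeit(lambda: run_pure_python(as_lists), number=repeats) / repeats
    numpy_time = timeit.timeit(lambda: run_numpy(data), number=repeats) / repeats
    return python_time, numpy_time


if __name__ == "__main__":
    for shape in [(2, 2), (100, 10), (1000, 100)]:
        py_t, np_t = compare(*shape)
        print(f"{shape}: pure Python {py_t * 1e6:.1f} us, numpy {np_t * 1e6:.1f} us")
```

The numbers will vary by machine and Python version, so treat them as a guide to the trend rather than as absolute figures.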

Here is the benchmark code, assuming the run_* functions above are saved in demo_plain.py and demo_numpy.py:

# File: `benchmark.py`
# Install: `pip install pytest pytest-benchmark numpy`
# Run with: `pytest benchmark.py`

import pytest
from demo_plain import run_student_grades
from demo_numpy import run_numpy_student_grades
import numpy as np
from numpy.random import default_rng

rng = default_rng(42)
two_students_two_assignments = np.array([[0.8, 0.9], [0.6, 0.2]])
thousand_students_hundred_assignments = rng.standard_normal(size=(1000, 100))


@pytest.mark.benchmark(group="Small Input: Two Students, Two Assignments")
def test_pure_python_small_input(benchmark):
    result = benchmark(run_student_grades, two_students_two_assignments)

@pytest.mark.benchmark(group="Small Input: Two Students, Two Assignments")
def test_numpy_small_input(benchmark):
    result = benchmark(run_numpy_student_grades, two_students_two_assignments)

@pytest.mark.benchmark(group="Big Input: 1000 Students, 100 Assignments")
def test_pure_python_big_input(benchmark):
    result = benchmark(run_student_grades, thousand_students_hundred_assignments)

@pytest.mark.benchmark(group="Big Input: 1000 Students, 100 Assignments")
def test_numpy_big_input(benchmark):
    result = benchmark(run_numpy_student_grades, thousand_students_hundred_assignments)

Separately, here's a version that handles multiple thresholds:

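def run_numpy_student_grades_thresholds(data, thresholds):
    # Mean per student, reshaped to a column so it broadcasts against the thresholds.
    _avg = np.mean(data, axis=1)[:, np.newaxis]
    # Result has one row per student and one column per threshold.
    return _avg >= np.asarray(thresholds)

print(run_numpy_student_grades_thresholds([[0.8, 0.9], [0.6, 0.2]], [0.7, 0.9]))
# [[ True False]
#  [False False]]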
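To get closer to the interface asked for in the question, the vectorized comparison could be wrapped in a class. This is only a sketch under some assumptions: the class name StudentGradesBatch is hypothetical, and it assumes every student has the same number of scores so the data forms a rectangular array:

```python
import numpy as np


class StudentGradesBatch:
    """Hypothetical batch variant: one row of scores per student."""

    def __init__(self, scores):
        # Assumes rectangular data: the same number of scores for every student.
        self.scores = np.asarray(scores, dtype=float)

    def average(self):
        # One average per student (row).
        return self.scores.mean(axis=1)

    def check_grades(self, thresholds=(0.7,)):
        thresholds = np.asarray(thresholds, dtype=float)
        # (n_students, 1) against (n_thresholds,) broadcasts to
        # (n_students, n_thresholds).
        passed = self.average()[:, np.newaxis] >= thresholds
        return np.where(passed, "Accepted", "Rejected")


students = StudentGradesBatch(scores=[[0.8, 0.9], [0.6, 0.2]])
grades = students.check_grades(thresholds=[0.7, 0.9])
print(grades.tolist())
# [['Accepted', 'Rejected'], ['Rejected', 'Rejected']]
```

Each row corresponds to a student and each column to a threshold, so the caller never writes a loop. If students have different numbers of scores, the rectangular-array assumption breaks and a different layout (or a per-student loop) would be needed.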
