How do I select one or more random rows from a table using SQLAlchemy?
9 Answers
This is very much a database-specific issue.
I know that PostgreSQL, SQLite, MySQL, and Oracle have the ability to order by a random function, so you can use this in SQLAlchemy:
from sqlalchemy.sql.expression import func, select
select.order_by(func.random()) # for PostgreSQL, SQLite
select.order_by(func.rand()) # for MySQL
select.order_by('dbms_random.value') # For Oracle
Next, you need to limit the query to the number of records you need (for example, using `.limit()`).
Bear in mind that, at least in PostgreSQL, selecting a random record this way has severe performance issues; here is a good article about it.
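For example, a minimal sketch putting the two steps together (assuming a declarative model named MyModel and an existing session - both are placeholder names, not from the answer above):
from sqlalchemy.sql.expression import func
# Fetch 5 random rows. func.random() works on PostgreSQL and SQLite;
# use func.rand() instead on MySQL.
random_rows = session.query(MyModel).order_by(func.random()).limit(5).all()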

- +1. The same as Postgres works for SQLite: `select.order_by(func.random()).limit(n)` – mechanical_meat Apr 24 '10 at 07:11
- You can use order_by('dbms_random.value') in Oracle. – Buttons840 May 21 '12 at 19:53
- I've added the import as best as I can tell without SQLAlchemy installed - can someone confirm it's right? Also edited bernie's and Buttons840's suggestions into the answer. – dbr Sep 22 '12 at 20:32
- If you are using declarative models: `session.query(MyModel).order_by(func.rand()).first` – trinth Jun 04 '13 at 17:46
- Thanks @trinth, it worked when I added parentheses to the end: `session.query(MyModel).order_by(func.rand()).first()` – Kent Munthe Caspersen Dec 07 '15 at 15:05
- Why `.rand` and `.random`? Shouldn't SQLAlchemy take care of such DB-specific idiosyncrasies for us? – rr- Apr 03 '16 at 11:37
- Since SQLAlchemy v0.4, `func.random()` is a generic function that compiles to the database's random implementation. – RazerM Nov 24 '16 at 14:02
- It appears that func.random() does not work with Oracle; suggest using `select.order_by(sqlalchemy.literal_column('dbms_random.value'))` to avoid warnings. – user1445240 Apr 16 '18 at 19:35
Here are four different variations, ordered from slowest to fastest, with timeit results at the bottom:
import random

from sqlalchemy.sql import func
from sqlalchemy.orm import load_only

def simple_random():
    return random.choice(model_name.query.all())

def load_only_random():
    return random.choice(model_name.query.options(load_only('id')).all())

def order_by_random():
    return model_name.query.order_by(func.random()).first()

def optimized_random():
    return model_name.query.options(load_only('id')).offset(
        func.floor(
            func.random() *
            db.session.query(func.count(model_name.id))
        )
    ).limit(1).all()
timeit results for 10,000 runs on my MacBook against a PostgreSQL table with 300 rows:
simple_random():    90.09954111799925
load_only_random(): 65.94714171699889
order_by_random():  23.17819356000109
optimized_random(): 19.87806927999918
You can easily see that using func.random() is far faster than returning all results to Python's random.choice().
Additionally, as the size of the table increases, the performance of order_by_random() will degrade significantly, because an ORDER BY requires a full table scan, whereas the COUNT in optimized_random() can use an index.
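If you prefer not to push the offset arithmetic into SQL, a hedged two-query variant of the same idea (assuming the same Flask-SQLAlchemy-style model_name.query used above) is to run the COUNT separately and compute the random offset in Python:
import random
from sqlalchemy.orm import load_only

def two_step_random():
    # Sketch only: count once, then jump to a random offset.
    row_count = model_name.query.count()
    if row_count == 0:
        return None
    return (
        model_name.query.options(load_only('id'))
        .offset(random.randrange(row_count))
        .first()
    )
This costs an extra round trip, but it keeps the query trivially portable.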

- What about picking samples, like what `random.sample()` does? What is the optimized way here? – hamidfzm Nov 10 '16 at 15:19
- Open a new question and link to it and I'll take a stab at answering. If possible, specify the underlying flavor of SQL, as that influences the answer as well. – Jeff Widman Nov 11 '16 at 06:12
If you are using the ORM, the table is not big (or you have its row count cached), and you want it to be database independent, the really simple approach is:
import random
rand = random.randrange(0, session.query(Table).count())
row = session.query(Table)[rand]
This is cheating slightly, but that's why you use an ORM.
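A short sketch of the "cached row count" variant mentioned above (session and Table as in the snippet; how and when you refresh the cached count is up to you):
import random

# Compute the count once (or periodically) instead of on every pick.
cached_row_count = session.query(Table).count()

def random_row():
    return session.query(Table)[random.randrange(cached_row_count)]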

There is a simple way to pull a random row that IS database independent. Just use `.offset()`; there is no need to pull all rows:
import random
query = DBSession.query(Table)
rowCount = int(query.count())
randomRow = query.offset(int(rowCount*random.random())).first()
Where Table is your table (or you could put any query there). If you want a few rows, then you can just run this multiple times, and make sure that each row is not identical to the previous.
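If you want several distinct rows, a minimal sketch under the same assumptions (DBSession and Table as above) is to sample distinct offsets up front rather than looping and checking for duplicates:
import random

def random_rows(k):
    query = DBSession.query(Table)
    row_count = query.count()
    # Distinct offsets guarantee distinct rows without re-checking.
    offsets = random.sample(range(row_count), min(k, row_count))
    return [query.offset(o).first() for o in offsets]
This issues one query per row, which is fine for small k.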

- Update - at around 10 million rows in MySQL this actually started to get a little slow; I guess you could optimize it. – GuySoft Apr 16 '13 at 18:31
- Now up at 11 million rows on Oracle.... not so good anymore :-) Linear degradation, but still... I have to find something else. – Mario Jul 16 '13 at 10:16
- I haven't done any workbench tests, but I changed the query to this: query.offset(random.randint(0, rowCount - 1)).limit(1).first() and it's working fine. But my database is pretty small compared to yours. – Jayme Tosi Neto Jun 27 '14 at 17:43
- @Jayme: you could use `query.offset(random.randrange(rowCount)).limit(1).first()`. – jfs Sep 11 '16 at 09:05
- @Jayme also, is there a reason to use `.limit(1)` before `.first()`? It seems redundant. Perhaps `query.offset(random.randrange(row_count)).first()` is enough. – jfs Sep 12 '16 at 19:16
- @J.F.Sebastian, I don't remember whether the SQL generated was better, or whether I used it just to avoid handling the IndexError exception in case of no results when accessing "query[0]". But the docs say that first already applies a limit of one. Seems you are right again ;) – Jayme Tosi Neto Sep 13 '16 at 19:25
- @J.F.Sebastian found it here: http://docs.sqlalchemy.org/en/latest/orm/query.html#sqlalchemy.orm.query.Query.first I guess the limit was kept because of another piece of code I used to get pages of random items, which I reduced to get just one. Well, anyway, thanks! – Jayme Tosi Neto Sep 13 '16 at 19:26
Some SQL DBMSs, namely Microsoft SQL Server, DB2, and PostgreSQL, have implemented the SQL:2003 TABLESAMPLE clause. Support was added to SQLAlchemy in version 1.1. It allows returning a sample of a table using different sampling methods – the standard requires SYSTEM and BERNOULLI, which return a desired approximate percentage of a table.
In SQLAlchemy, FromClause.tablesample() and tablesample() are used to produce a TableSample construct:
# Approx. 1%, using SYSTEM method
sample1 = mytable.tablesample(1)
# Approx. 1%, using BERNOULLI method
sample2 = mytable.tablesample(func.bernoulli(1))
There's a slight gotcha when used with mapped classes: the produced TableSample object must be aliased in order to be used to query model objects:
sample = aliased(MyModel, tablesample(MyModel, 1))
res = session.query(sample).all()
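For example, a rough sketch (assuming MyModel and session as above) that takes a single row from an approximately 1% BERNOULLI sample instead of ordering the whole table randomly:
from sqlalchemy.orm import aliased
from sqlalchemy.sql.expression import func, tablesample

# Sample roughly 1% of the table and take one row from the sample.
# Note: on small tables the sample can be empty, in which case this returns None.
sample = aliased(MyModel, tablesample(MyModel, func.bernoulli(1)))
random_row = session.query(sample).first()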
Since many of the answers contain performance benchmarks, I'll include some simple tests here as well. Using a simple table in PostgreSQL with about a million rows and a single integer column, select (approx.) 1% sample:
In [24]: %%timeit
...: foo.select().\
...: order_by(func.random()).\
...: limit(select([func.round(func.count() * 0.01)]).
...: select_from(foo).
...: as_scalar()).\
...: execute().\
...: fetchall()
...:
307 ms ± 5.72 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [25]: %timeit foo.tablesample(1).select().execute().fetchall()
6.36 ms ± 188 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [26]: %timeit foo.tablesample(func.bernoulli(1)).select().execute().fetchall()
19.8 ms ± 381 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Before rushing to use the SYSTEM sampling method, one should know that it samples pages, not individual tuples, so it might not be suitable for small tables, for example, and may not produce results that are as random if the table is clustered.
If using a dialect that does not allow passing the sample percentage / number of rows and seed as parameters, and a driver that does not inline values, then either pass the values as literal SQL text if they are static, or inline them using a custom SQLA compiler extension:
from sqlalchemy.ext.compiler import compiles
from sqlalchemy.sql import TableSample
@compiles(TableSample)
def visit_tablesample(tablesample, self, asfrom=False, **kw):
    """Compile `TableSample` with values inlined."""
    kw_literal_binds = {**kw, "literal_binds": True}
    text = "%s TABLESAMPLE %s" % (
        self.visit_alias(tablesample, asfrom=True, **kw),
        tablesample._get_method()._compiler_dispatch(self, **kw_literal_binds),
    )
    if tablesample.seed is not None:
        text += " REPEATABLE (%s)" % (
            tablesample.seed._compiler_dispatch(self, **kw_literal_binds)
        )
    return text
from sqlalchemy import table, literal, text
# Static percentage
print(table("tbl").tablesample(text("5 PERCENT")))
# Compiler inlined values
print(table("tbl").tablesample(5, seed=literal(42)))

- Has anyone been able to implement this with an Azure SQL db? I'm trying but I get ``` ProgrammingError: (pyodbc.ProgrammingError) ('42000', '[42000] [Microsoft][ODBC Driver 13 for SQL Server][SQL Server]Variables are not allowed in the TABLESAMPLE or REPEATABLE clauses. (497) (SQLExecDirectW); [42000] [Microsoft][ODBC Driver 13 for SQL Server][SQL Server]Statement(s) could not be prepared. (8180)') ``` – Robert Franklin Oct 12 '21 at 10:16
- Sounds like an interesting edge case that the implementation does not cover; by the looks of it, passing the percentage or seed as a parameter is not supported. The Azure dialect would need to inline the argument in the query compiler. I can produce an example of that later today. – Ilja Everilä Oct 12 '21 at 12:17
This is my function to select one or more random rows from a table:
from sqlalchemy.sql.expression import func

def random_find_rows(sample_num):
    if not sample_num:
        return []

    session = DBSession()
    return session.query(Table).order_by(func.random()).limit(sample_num).all()

This is the solution I use:
from random import randint

rows_query = session.query(Table)                    # get all rows
if rows_query.count() > 0:                           # make sure there's at least 1 row
    rand_index = randint(0, rows_query.count() - 1)  # pick a random index into the rows
    rand_row = rows_query.all()[rand_index]          # use the random index to get a random row

- This would be incredibly slow on big tables. You would be grabbing every single row and then slicing it up. – Matthew Oct 17 '18 at 18:04
- Wow, yeah, this is not great. If there is a query to get the table record count, that would be a better approach. This was done on a web app with a small DB; I'm no longer working with that company, so I can't do much about it. – ChickenFeet Oct 18 '18 at 07:34
This solution selects a single random row. It requires that the primary key is named id (which it should be, if it isn't already):
import random
max_model_id = YourModel.query.order_by(YourModel.id.desc())[0].id
random_id = random.randrange(0,max_model_id)
random_row = YourModel.query.get(random_id)
print(random_row)
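A hedged tweak (my addition, not part of the original answer): if ids start at 1 or have gaps left by deleted rows, .get() can return None, so you may want to retry until a row is actually found:
import random

# Assumes the table is not empty and that max_model_id comes from the snippet above.
random_row = None
while random_row is None:
    random_row = YourModel.query.get(random.randrange(1, max_model_id + 1))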

There are a couple of ways to do it through SQL, depending on which database is being used.
(I think SQLAlchemy can use all of these anyway.)
mysql:
SELECT column FROM table
ORDER BY RAND()
LIMIT 1
PostgreSQL:
SELECT column FROM table
ORDER BY RANDOM()
LIMIT 1
MSSQL:
SELECT TOP 1 column FROM table
ORDER BY NEWID()
IBM DB2:
SELECT column, RAND() as IDX
FROM table
ORDER BY IDX FETCH FIRST 1 ROWS ONLY
Oracle:
SELECT column FROM
(SELECT column FROM table
ORDER BY dbms_random.value)
WHERE rownum = 1
However, I don't know of any standard way.
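If you do want to run one of these raw statements through SQLAlchemy, here is a small sketch using the text() construct (the PostgreSQL variant is shown; the engine URL, table, and column names are placeholders):
from sqlalchemy import create_engine, text

engine = create_engine("postgresql://user:pass@localhost/dbname")  # placeholder URL
with engine.connect() as conn:
    row = conn.execute(
        text("SELECT column FROM table ORDER BY RANDOM() LIMIT 1")
    ).fetchone()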

- Yeah. I know how to do it in SQL (I posted that answer in http://beta.stackoverflow.com/questions/19412/how-to-request-a-random-row-in-sql#19568) but was searching for a SQLAlchemy-specific solution. – cnu Sep 13 '08 at 20:14