0

In my database I have citations between objects as a ManyToMany field. Basically, every object can cite any other object.

In Postgres, this has created an intermediate table. The table has about 12 million rows, each looks roughly like:

 id | source_id | target_id 
----+-----------+-----------
 81 |    798429 |    767013
 80 |    798429 |    102557

Two questions:

  1. What's the most Django-tastic way to select this table?

  2. Is there a way to iterate over this table without pulling the entire thing into memory? I'm not sure Postgres or my server will be pleased if I do a simple select * from TABLE_FOO.

mlissner
  • 17,359
  • 18
  • 106
  • 169
  • For the select, it depends on what you want to query. Any examples? – Shang Wang Aug 03 '15 at 17:41
  • @ShangWang, I want to select the whole intermediate table. – mlissner Aug 03 '15 at 17:43
  • You can use through table for many to many relation and iterate over there. https://docs.djangoproject.com/en/1.8/topics/db/models/#extra-fields-on-many-to-many-relationships – Geo Jacob Aug 03 '15 at 17:46
  • I have used mysql for something called `Server side cursor` to deal with iterating large data set. I've never tried postgres in django, but I'm sure they support something similar. – Shang Wang Aug 03 '15 at 17:48
  • 1
    Here's a tool you can try, https://github.com/niwinz/djorm-ext-core. – Shang Wang Aug 03 '15 at 17:55

1 Answers1

2

The solution I found to the first question was to grab the through table and then to use values_list to get a flattened result.

So, from my example, this becomes:

through_table = AcademicPaper.papers_cited.through
all_citations = through_table.objects.values('source_id', 'target_id')

Doing that runs the very basic SQL that I'd expect:

print all_citations.query
SELECT 'source_id', 'target_id' FROM my_through_table;

And it returns flattened ValueList objects, which are fairly small and I can work with very easily. Even in my table with 12M objects, I was actually able to do this and put it all in memory without the server freaking out too much.

So this solved both my problems, though I think the advice in the comments about cursors looks very sound.

mlissner
  • 17,359
  • 18
  • 106
  • 169