
I'm trying to create a database from existing CSV files that are about 20,000 columns wide and 700 rows deep. In Grails I would like the 20,000-column domain to belongsTo another, simpler domain (about 200 columns). But on compilation I get:

java.lang.RuntimeException: Class file too large!

That's understandable, because it's way too much data. My question is: what is the best approach to handle this problem in Grails? Should I simply break up the big table into separate domains, or look for a different table format?

I'm specifically worried about:

1) Search time: parsing search methods and then delegating to sub-domains.

2) Importing the data from the huge CSV file into the domains.
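If the big table is broken up into separate domains, as the question suggests, one simple scheme is to split the header into fixed-size column groups, one per child table. This is a minimal sketch, not a Grails API: the class name, the placeholder column names and the 500-column group size are all assumptions. It also shows the trade-off behind worry 1): delegating a lookup to the right sub-domain is cheap (integer division on the column's position), but any search that touches columns in different groups has to query several tables.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: split a very wide CSV header into fixed-size column groups, one group
// per hypothetical child table/domain. The group size is an assumption.
public class ColumnPartitioner {

    // Returns consecutive groups of at most chunkSize column names each.
    static List<List<String>> partition(List<String> columns, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<String>> groups = new ArrayList<>();
        for (int start = 0; start < columns.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, columns.size());
            groups.add(new ArrayList<>(columns.subList(start, end)));
        }
        return groups;
    }

    // Which group (child table) holds a column, given its position in the header.
    static int groupOf(int columnIndex, int chunkSize) {
        return columnIndex / chunkSize;
    }

    public static void main(String[] args) {
        // Placeholder header col0..col19999 standing in for the real CSV header.
        List<String> header = new ArrayList<>();
        for (int i = 0; i < 20_000; i++) {
            header.add("col" + i);
        }
        List<List<String>> groups = partition(header, 500);
        System.out.println(groups.size() + " groups; col12345 lives in group "
                + groupOf(12_345, 500));
    }
}
```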

janDro
  • Did the RuntimeException come from a domain class with 20,000 persistent properties, 200 persistent properties, or something else? – Jeff Scott Brown Oct 09 '14 at 16:41
  • It just came up when I ran grails run-app after adding 20,000 persistent properties to a domain. But I don't think it even gets to the point where it's a Grails issue; it just can't compile the class because it's too big. The stack trace is similar to the one in this question: http://stackoverflow.com/questions/17758510/groovy-configslurper-gives-class-file-too-large-runtimeexception – janDro Oct 09 '14 at 16:43
  • Did you have a script or something generate the source code for that domain class that has 20,000 properties or define all of those properties by hand? – Jeff Scott Brown Oct 09 '14 at 16:45
  • By hand. Had to extract the column names from the csv file then add types...it was awful :) – janDro Oct 09 '14 at 16:47
  • What year did you start extracting the names? – Burt Beckwith Oct 09 '14 at 16:55
  • Your tenacity is impressive. – toniedzwiedz Oct 09 '14 at 19:14
  • I am down voting this question because I simply do not believe that you hand wrote a class that has 20,000 properties. I am not sure why you might make that up, but there is just no way that anyone would ever actually do that. – Jeff Scott Brown Oct 10 '14 at 03:50
  • Just for clarity: I copy-pasted them in chunks from the CSV file, then used a Sublime regex to replace commas with newlines and add types to each column. But either way it was stupid. – janDro Oct 13 '14 at 17:55

1 Answer


When you crash into a JVM size limit like this, take it as a big hint that your approach is way off. As I mentioned in another question earlier this week, we shouldn't even know what these limits are, much less be anywhere near hitting them.
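A rough back-of-the-envelope estimate shows why 20,000 properties lands at that limit. The class file format stores constant_pool_count as an unsigned 16-bit value, so a single class can have at most 65,534 usable constant pool entries, and ASM (the bytecode library Groovy compiles with) reports "Class file too large!", as far as I know, when that pool overflows. The sketch below assumes a deliberately low figure of three distinct name strings per property (field, getter, setter); real Groovy/Grails output adds more per property (descriptors, member references, generated metadata), so treat the result as a lower bound rather than an exact count.

```java
// Lower-bound estimate of constant pool usage for a class with many properties.
// Assumption: each property adds at least three distinct CONSTANT_Utf8 entries
// (field name, getter name, setter name); actual compiler output adds more.
public class ConstantPoolEstimate {

    // constant_pool_count is a u2 and entry 0 is unused, so 65,535 - 1 usable slots.
    static final int MAX_POOL_ENTRIES = 65_534;

    // Hypothetical per-property minimum used only for this estimate.
    static final int MIN_ENTRIES_PER_PROPERTY = 3;

    static long lowerBound(int properties) {
        return (long) properties * MIN_ENTRIES_PER_PROPERTY;
    }

    public static void main(String[] args) {
        long estimate = lowerBound(20_000);
        System.out.printf("At least %,d of %,d constant pool entries used (%.0f%%)%n",
                estimate, MAX_POOL_ENTRIES, 100.0 * estimate / MAX_POOL_ENTRIES);
    }
}
```

Even this optimistic count uses over 90% of the pool before any other constants are added, which is consistent with the compiler giving up.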

I don't see much benefit in applying something like GORM, or even an OO approach in general, to this much data. It's not an object in any realistic, usable sense; it's a massive bunch of data. Even if it did work, you'd need to access everything programmatically anyway, since managing it by hand would take a crazy amount of code. Do you really plan on creating one or more instances of these beasts and passing them around as method args?

You'll need to look at this from a big data perspective, not an ORM perspective.

Burt Beckwith
  • Hmm, OK, I see. So what if I rearrange my domain so that each column name becomes a generic attribute with a value (of the type I was originally going to assign), and then use an id + foreign key to sort them? Would that work? Basically make the domains waayyy smaller and end up with 20,000*600 rows? BTW, so awesome to get a question answered by THE Burt Beckwith :) – janDro Oct 09 '14 at 17:06
  • There are way too many follow-up questions to ask, and those would turn this into a discussion, but SO isn't designed for that. It's best at addressing focused questions and providing one or more possible focused answers. You need to think about the structure of the data and how it'll be used, and research tools. This might be better in a NoSQL DB, but there are a lot of options there. You also need to look at recently developed tools that can help with analysis and data management. In both cases these won't be easy decisions, and you shouldn't lock into one approach. – Burt Beckwith Oct 09 '14 at 17:20
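The generic attribute/value layout proposed in the first comment on the answer (often called entity-attribute-value, or a "long" format) amounts to reshaping each wide CSV row into one record per column, keyed by the row it came from. This is a minimal sketch under stated assumptions: the class and method names are hypothetical, values are kept as strings rather than the typed values the comment mentions, the CSV is split naively on commas (no quoted fields), and the record syntax needs Java 16+. With the question's 700 rows, 20,000 columns become 14,000,000 records, which is the kind of volume the answer suggests evaluating with data-oriented tools rather than an ORM.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: reshape wide CSV rows into (rowId, attribute, value) records.
// Assumption: naive comma splitting, i.e. no quoted fields containing commas.
public class WideToLong {

    // One cell of the original table; rowId plays the role of the foreign key.
    record Cell(long rowId, String attribute, String value) {}

    static List<Cell> reshape(String headerLine, List<String> dataLines) {
        String[] header = headerLine.split(",", -1); // -1 keeps trailing empty fields
        List<Cell> cells = new ArrayList<>();
        long rowId = 0;
        for (String line : dataLines) {
            rowId++;
            String[] values = line.split(",", -1);
            if (values.length != header.length) {
                throw new IllegalArgumentException("Row " + rowId + " has " + values.length
                        + " fields, expected " + header.length);
            }
            for (int i = 0; i < header.length; i++) {
                cells.add(new Cell(rowId, header[i], values[i]));
            }
        }
        return cells;
    }

    public static void main(String[] args) {
        List<Cell> cells = reshape("a,b,c", List.of("1,2,3", "4,5,6"));
        cells.forEach(System.out::println);
        // 700 rows x 20,000 columns would give 14,000,000 cells.
        System.out.println("cells for 700 x 20,000: " + 700L * 20_000L);
    }
}
```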