
I have a text file that contains more than 500 million lines, structured as follows:

54517. lat:53.533459; lon:8.8005426; path:c:\brem_5.xml;
54518. lat:53.037579; lon:8.800404; path:c:\brem_5.xml;
54519. lat:53.03358275; lon:8.610994; path:c:\brem_5.xml;
54520. lat:53.027389; lon:8.797809; path:c:\brem_6.xml;
54521. lat:53.043866; lon:8.7971675; path:c:\brem_7.xml;
54522. lat:53.0311901; lon:8.794269; path:c:\brem_7.xml;
....
....
....

I am writing a method that, given a "lat" and a "lon", should return the corresponding path. I thought about dividing the huge file into sections "sec0, sec1, sec2, sec3, ..., secn" and creating a thread for each section that searches for that "lat" and "lon"; when one thread returns the path, the other threads are killed.

My question is: is my approach valid, and what is the optimal solution for such a problem?
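
For illustration, the search described above could be sketched roughly as follows. This is not the asker's code: it swaps the hand-managed threads for a Java parallel stream, where `findAny()` stops the search once any worker finds a match. The class name `ParallelScan`, the method `findPath`, and the regular expression are assumptions made for this sketch, and coordinates are compared as text exactly as they appear in the file.

```java
import java.util.List;
import java.util.Optional;
import java.util.regex.Pattern;
import java.util.stream.Stream;

public class ParallelScan {
    // Assumed line layout: "<n>. lat:<lat>; lon:<lon>; path:<path>;"
    private static final Pattern LINE =
            Pattern.compile("lat:([^;]+);\\s*lon:([^;]+);\\s*path:([^;]+);");

    // Returns the path of any line whose lat and lon match the given text exactly.
    static Optional<String> findPath(Stream<String> lines, String lat, String lon) {
        return lines.parallel()
                .map(LINE::matcher)
                // find() must succeed before the groups are read
                .filter(m -> m.find() && m.group(1).equals(lat) && m.group(2).equals(lon))
                .map(m -> m.group(3))
                .findAny(); // short-circuits: remaining work is abandoned after a hit
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
                "54517. lat:53.533459; lon:8.8005426; path:c:\\brem_5.xml;",
                "54520. lat:53.027389; lon:8.797809; path:c:\\brem_6.xml;");
        System.out.println(findPath(sample.stream(), "53.027389", "8.797809").orElse("not found"));
    }
}
```

For the real file, the stream would come from something like `Files.lines(...)`. Every query still reads the file from the start, so this only pays off if the lookup is done once or very rarely.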

Lawa Fazil
rmaik
  • Databases deal with this kind of task so much better. – biziclop Aug 07 '15 at 15:29
  • @biziclop What kind of database? Would you please recommend which DB I should use? – rmaik Aug 07 '15 at 15:30
  • Relational databases are usually the natural choice for tabular data, but your data is so simple that any kind of database really. – biziclop Aug 07 '15 at 15:32
  • @biziclop Please bear with me: can I use SQLite with Java, or do you recommend something else? Thanks. – rmaik Aug 07 '15 at 15:32
  • You could try SQL/SQLite since all the fields are the same (or so it seems). If the fields are different in different parts of your file then MongoDB would be a better choice. To answer your original question, parsing a file with multiple threads is valid, so long as you know where each thread's section begins and ends and you do not modify the file. – Jeremy Fisher Aug 07 '15 at 15:35
  • So, you are given a text file with `(key,value)` pairs `((lat,long), path)`. Read it once and input the data then query when needed. – Dan Aug 07 '15 at 15:35
  • @JeremyFisher Yes, I will try the DB solution, but my concern is which is faster: using a DB or dividing the file among threads? – rmaik Aug 07 '15 at 15:37
  • @Dan Do you mean using a DB? – rmaik Aug 07 '15 at 15:40
  • If you are going to use a DB, you will have to parse the whole file anyway. If you are only going to find one path from the file, then a multithreaded approach would be better than storing into a DB. Since I believe you'll probably query the file more than once, reading the file into a DB is the better choice. The two approaches need not be mutually exclusive, since you could also read the file with a multithreaded implementation and save the information to the database, but the time saved in this case only goes towards populating the database faster. – Jeremy Fisher Aug 07 '15 at 15:41
  • @JeremyFisher Do you think I can store the whole file in the DB instead of saving only its path? I mean, can I insert a whole file into a DB table? – rmaik Aug 07 '15 at 15:47
  • Of course. As Dan mentioned, you can parse each line, split the line by `;`, and save the `lat, lon` as a key and `path` as a value in the DB. If you like this `Key, value` approach then MongoDB is a good choice for this. However, as other users mentioned, your data seems tabular enough to just use SQL(ite). – Jeremy Fisher Aug 07 '15 at 15:51
  • @rmaik if you have enough memory, you could do something like make a dictionary/map/hash. Probably would not be best but it could be interesting to try. – Dan Aug 07 '15 at 16:28
  • @Dan Can you please clarify what you mean by dictionary/map/hash? – rmaik Aug 08 '15 at 11:46
  • @JeremyFisher I learned how to use SQLite with Java and created a test table in a test database, but what I want to know is how I am supposed to find the created tables and the ".db" database file on the hard drive. I searched my hard drive for them but could not find them. Kind regards, – rmaik Aug 09 '15 at 15:27
  • Look under the Java docs for built-in object types. http://docs.oracle.com/javase/7/docs/api/java/util/Map.html - This provides a listing of objects that use the `Map` interface at the bottom. It might be a bit much for 500 million lines/entries. – Dan Aug 10 '15 at 17:08
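
As a rough sketch of the dictionary/map/hash idea from the comments above: read the file once, store each path under a key built from its coordinates, then answer lookups from memory. The class `InMemoryIndex`, its method names, and the "lat,lon" key format are assumptions made for this example. As already noted in the comments, 500 million entries may well not fit in memory.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class InMemoryIndex {
    // Assumed line layout: "<n>. lat:<lat>; lon:<lon>; path:<path>;"
    private static final Pattern LINE =
            Pattern.compile("lat:([^;]+);\\s*lon:([^;]+);\\s*path:([^;]+);");

    // Key is the text "lat,lon" exactly as written in the file; value is the path.
    private final Map<String, String> pathByCoordinate = new HashMap<>();

    // Parses one line and indexes it; returns false for lines that do not match.
    // If the same coordinates occur twice, the later line overwrites the earlier one.
    boolean add(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.find()) {
            return false;
        }
        pathByCoordinate.put(m.group(1) + "," + m.group(2), m.group(3));
        return true;
    }

    // Returns the stored path, or null if the coordinates were never indexed.
    String lookup(String lat, String lon) {
        return pathByCoordinate.get(lat + "," + lon);
    }

    public static void main(String[] args) {
        InMemoryIndex index = new InMemoryIndex();
        index.add("54521. lat:53.043866; lon:8.7971675; path:c:\\brem_7.xml;");
        System.out.println(index.lookup("53.043866", "8.7971675"));
    }
}
```

Keeping the coordinates as strings avoids floating-point equality problems, at the cost of requiring the query to use exactly the same digits as the file.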

1 Answer


I'd suggest MySQL. Create a table with the columns ID, Lat, Long, Path. Write a script to insert all the data. Then query the data with something like `select path from table where lat = x and long = y`.
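
A minimal JDBC sketch of this table-and-query idea might look like the following. It is an illustration under assumptions, not a tested setup: the table name `locations`, the column name `lon`, the column sizes, and the index name are invented here (`TABLE` and `LONG` are reserved words in MySQL, so they are avoided as identifiers), coordinates are stored as text so that lookups are exact matches, and the caller is assumed to supply an open `Connection` from whichever JDBC driver is in use.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class PathStore {
    // All identifiers below are hypothetical names chosen for this sketch.
    static final String CREATE_SQL =
            "CREATE TABLE locations (id INTEGER PRIMARY KEY, "
            + "lat VARCHAR(32), lon VARCHAR(32), path VARCHAR(255))";
    static final String INDEX_SQL = "CREATE INDEX idx_lat_lon ON locations (lat, lon)";
    static final String INSERT_SQL =
            "INSERT INTO locations (id, lat, lon, path) VALUES (?, ?, ?, ?)";
    static final String SELECT_SQL = "SELECT path FROM locations WHERE lat = ? AND lon = ?";

    // Run once: creates the table plus an index so lookups do not scan every row.
    static void createSchema(Connection conn) throws SQLException {
        try (Statement st = conn.createStatement()) {
            st.executeUpdate(CREATE_SQL);
            st.executeUpdate(INDEX_SQL);
        }
    }

    // Inserts one parsed line; a bulk load would normally batch many rows per round trip.
    static void insert(Connection conn, long id, String lat, String lon, String path)
            throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            ps.setLong(1, id);
            ps.setString(2, lat);
            ps.setString(3, lon);
            ps.setString(4, path);
            ps.executeUpdate();
        }
    }

    // Returns the path for the given coordinates, or null if no row matches.
    static String findPath(Connection conn, String lat, String lon) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(SELECT_SQL)) {
            ps.setString(1, lat);
            ps.setString(2, lon);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getString(1) : null;
            }
        }
    }
}
```

The index on `(lat, lon)` is what makes repeated queries cheap compared with rescanning the text file; without it the database would still have to examine every row.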

j.con
  • Can I insert the whole file instead of inserting its path? Something like: Create a table; ID, Lat, Long, File.xml – rmaik Aug 08 '15 at 11:49