Are there any good programs for dealing with reading large CSV files? Some of the datafiles I deal with are in the 1 GB range. They have too many lines for Excel to even deal with. Using Access can be a little slow, as you have to actually import them into a database to work with them directly. Is there a program that can open large CSV files and give you a simple spreadsheet layout to help you easily and quickly scan through the data?
- Yes, there is. You can use [OpenRefine][1] (or Google Refine). OpenRefine is like a spreadsheet on steroids. The file size that you can manipulate depends on your computer's memory. [1]: http://openrefine.org – Estevão Lucas Oct 05 '15 at 21:52
7 Answers
I've found reCSVeditor to be a great program for editing large CSV files. It's ideal for stripping out unnecessary columns. I've used it on 1,000,000-record files quite easily.

- +1 reCSVeditor worked for me with a nearly 2 GB file of >2,000,000 rows – Stuart Allen Jul 07 '13 at 09:03
- Hey, I downloaded the zip but I can't figure out how to use it. Can you please guide me? – aasthetic Jun 02 '14 at 10:20
- @richi_18007 Unzip the reCSVeditor archive's contents, then run the installer – Bruce Martin Jun 26 '14 at 04:54
MySQL can import CSV files very quickly into tables using the LOAD DATA INFILE command. It can also read from CSV files directly, bypassing any import procedure, by using the CSV storage engine.

Importing into native tables with LOAD DATA INFILE has a start-up cost, but after that you can INSERT/UPDATE much faster, as well as index fields. Using the CSV storage engine is almost instantaneous at first, but only sequential scans will be fast.

Update: This article (scroll down to the section titled Instant Data Loads) talks about using both approaches to loading CSV data into MySQL, and gives examples.
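As a rough illustration of the LOAD DATA LOCAL INFILE approach (not part of the original answer), here is a minimal Python sketch assuming the mysql-connector-python package; the credentials, database, table name, and file path are all placeholders, and the server must allow local infile loading:

import mysql.connector

# Placeholder credentials and database; adjust to your setup.
conn = mysql.connector.connect(
    host="localhost",
    user="me",
    password="secret",
    database="mydb",
    allow_local_infile=True,   # needed for LOAD DATA LOCAL INFILE
)
cur = conn.cursor()
cur.execute("""
    LOAD DATA LOCAL INFILE '/path/to/huge.csv'
    INTO TABLE big_table      -- hypothetical table whose columns match the CSV
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
    IGNORE 1 LINES            -- skip the header row
""")
conn.commit()
cur.close()
conn.close()

Once the data is in a native table, you can add indexes and query it far faster than scanning the raw file.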

- I worked with Real Estate MLS data sets that consisted of 15-30 MB CSV files. Without MySQL's LOAD DATA INFILE, each feed would have taken an hour or more to process, but using MySQL and raw tables I cut processing down to 5-6 minutes for even the larger data sets. – David Sep 18 '08 at 21:35
vEdit is great for this. I routinely open 100+ MB files with it (I know you said up to one gig; I think they advertise on their site that it can handle twice that). It has regex support and loads of other features. Seventy dollars is cheap for the amount you can do with it.

GVim can handle files that large for free, if you are not attached to a true spreadsheet view with static field sizes.

vEdit is great, but don't forget that you can always go back to basics: check out Cygwin and start grepping.
Helpful commands (see the Python sketch after this list):
- grep
- head
- tail
- of course, Perl!
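If you would rather stay in a scripting language, the head/tail idea from the list above can be sketched in a few lines of Python (an illustrative sketch, not from the answer; the file name is a placeholder):

from collections import deque
from itertools import islice

path = 'huge.csv'  # placeholder path to the large CSV

with open(path, newline='') as f:
    first_rows = list(islice(f, 5))   # roughly `head -n 5`
with open(path, newline='') as f:
    last_rows = deque(f, maxlen=5)    # roughly `tail -n 5`; streams the file once

print(''.join(first_rows))
print(''.join(last_rows))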

It depends on what you actually want to do with the data. Given a large text file like that, you typically only want a smaller subset of the data at any one time, so don't overlook tools like grep for pulling out just the pieces you want to work with.
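As a rough Python equivalent of that workflow (an illustrative sketch, not part of the answer; the file names, column index, and search term are placeholders), you can stream the large file and write only the matching rows to a much smaller CSV:

import csv

# Stream the big file and keep only rows of interest in a smaller output file.
with open('huge.csv', newline='') as src, open('subset.csv', 'w', newline='') as dst:
    reader = csv.reader(src)
    writer = csv.writer(dst)
    writer.writerow(next(reader))        # copy the header row
    for row in reader:
        if 'SEARCH_TERM' in row[0]:      # placeholder filter, like grep on the first column
            writer.writerow(row)

The resulting subset.csv is usually small enough to open in Excel or any normal spreadsheet.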

If you can fit the data into memory and you like Python, then I recommend checking out the UniTable portion of Augustus. (Disclaimer: Augustus is open source (GPLv2), but I work for the company that writes it.)
It's not very well documented, but this should help you get going.
from augustus.kernel.unitable import *
a = UniTable().from_csv_file('filename')   # loads the whole CSV into memory
b = a.subtbl(a['key'] == some_value)       # creates a subtable of matching rows
It won't directly give you an Excel-like interface, but with a little bit of work you can get many statistics out quickly.
