
I have a large CSV file to edit (tens of thousands of lines), and the task is to edit it in a user-friendly and fast way, so I would like to use a PyQt QTableView to hold it. Some filtering is also required; basically I need to do the equivalent of "update price = 200 where name = 'Jack'".

I've come up with a few options but wonder if we could combine their advantages.

Update: the code is on a workstation that cannot touch the internet, so I'll write pseudocode instead, sorry.

1. Import into a local SQLite database

Importing takes more than a minute to an on-disk database and less than 10 seconds to an in-memory one. With a QSqlTableModel it's relatively fast, and adding a filter is easy; rows are only rendered as the view is scrolled, which is good.

# use PyQt and SQLite (in-memory is much faster than on disk)
from PyQt5.QtSql import QSqlDatabase, QSqlQuery

db = QSqlDatabase.addDatabase("QSQLITE")
db.setDatabaseName(":memory:")
db.open()

query = QSqlQuery()
query.exec("create table prices (id text, name text, price text)")  # hypothetical table/column names
query.prepare("insert into prices values (?, ?, ?)")
for line in open("path/to/csv/file"):
  id_, name, price = line.rstrip("\n").split(",")
  query.addBindValue(id_)
  query.addBindValue(name)
  query.addBindValue(price)
  query.exec()
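
One tweak not in the original post that might cut the minute-long import to disk: wrap the insert loop in a single transaction, so SQLite does not sync to disk once per row. A minimal sketch, reusing the db connection from above:

db.transaction()
# ... run the insert loop above ...
db.commit()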

from PyQt5.QtSql import QSqlTableModel
from PyQt5.QtWidgets import QTableView

self.view = QTableView()
self.model = QSqlTableModel()
self.model.setTable("prices")      # table created by the import step
self.model.select()                # actually fetch the rows
self.view.setModel(self.model)
# some line edit gathers the filter input
self.model.setFilter("id = 1003")  # a WHERE clause without the keyword

2. Just parse the CSV into a model and display it. I referenced pyqt - populating QTableWidget with csv data, specifically the answer by user1006989.

Just put each cell into a QStandardItem and the Model/View part works fine, but loading large files is too slow (about 20 seconds here), and I don't know how to implement a filter (if we just skip rows while loading into the model, how can I write the data back?). A proxy-model sketch follows below.
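
For the filter in option 2, a QSortFilterProxyModel might help (my assumption, not something from the referenced answer): the proxy hides rows in the view only, so the full data set stays in the source model and can be written back untouched. A minimal sketch, with a hypothetical column layout:

from PyQt5.QtCore import QSortFilterProxyModel
from PyQt5.QtGui import QStandardItemModel
from PyQt5.QtWidgets import QTableView

model = QStandardItemModel()        # filled from the csv as in the referenced answer
proxy = QSortFilterProxyModel()
proxy.setSourceModel(model)
proxy.setFilterKeyColumn(1)         # hypothetical: the "name" column
proxy.setFilterFixedString("Jack")  # only matching rows stay visible

view = QTableView()
view.setModel(proxy)                # edits pass through the proxy to the source model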

3. Command-line replace

I've implemented both options 1 and 2; they are not very quick but might be acceptable. Here I wonder if composing a Perl-like regex replacement could help (we need to see the original value first).

The proposed workflow is:

regex search ==A> display / populate in table / model ==B> accept edit ==C> prepare a regex replace

gather the pattern with a PyQt line edit
grep $pattern large_csv > small_csv
cp small_csv small_csv.bk
populate small_csv into the model and display it in the table view
commit changes
diff small_csv small_csv.bk and prepare a
perl -i -pe 's/old_line/new_line/' large_csv

That gives a solution where I do not have to load the full content into a database or populate it into a widget, which should be even faster while still being equipped with a filter, but I have a problem with step ==C>.

If the filter returns 4 records and I edit one of them, do I just back up the filter result, diff it afterwards, and then prepare a whole-line replacement? (Each line is unique; there is a primary-key field inside that won't change.) A sketch of that idea follows below.
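
Here is a sketch of what ==C> could look like, assuming the first comma-separated field is the unchanging primary key (the file names are the hypothetical ones from the workflow above):

def key(line):
  return line.split(",", 1)[0]   # assumed: first field is the primary key

# map primary key -> edited line, taken from the diff of the two small files
changed = {}
for old, new in zip(open("small_csv.bk"), open("small_csv")):
  if old != new:
    changed[key(old)] = new

# one streaming pass over the large file, swapping only edited lines
with open("large_csv") as src, open("large_csv.tmp", "w") as dst:
  for line in src:
    dst.write(changed.get(key(line), line))
# afterwards, replace large_csv with large_csv.tmp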

Hope someone can review my thoughts or give some advice.

Thanks a lot for your time :)

  • That prose is TLDR. Where's your code? – decltype_auto Dec 10 '15 at 15:08
  • Sorry, it's on another offline workstation; I just left some pseudocode (sorry for the bad regex or grammar). Hope you can get what I did and what I intend to do. Thanks again for your attention – Jack Wu Dec 10 '15 at 15:27

1 Answer


Finally I did it the 3rd way, but 'grep files into a smaller result' was also done in Python, because it's Windows.

# use a regex to simulate grep
# write the matches to a new, smaller file
# populate it into the table widget model
# save the changed model back into the file

# compare the two files; when a mismatch is found, rewrite the original
# large file once per changed line, so there seems to have to be an
#   m (original lines) * n (changed lines) loop
# accumulate changes until done
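
A minimal version of the "regex to simulate grep" step might look like this (the pattern and file names are placeholders):

import re

pattern = re.compile("Jack")   # hypothetical pattern from the line edit
with open("large_csv") as src, open("small_csv", "w") as dst:
  for line in src:
    if pattern.search(line):
      dst.write(line)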

It's faster: less than 1 second to grep and less than 2 seconds for each changed line. I still do not have a good way to do multiple replacements in the file in one pass.

Which means in the current version, if

# before : after
# ... lines not grepped
# A > A
# B > B1
# C > C
# D > D1
# E > E1
# ... lines not grepped

I have to loop over the whole large file for B > B1, even though there is only one match, and then again for D1 and E1; there should be a more complex regex replace. A sketch of one follows below.
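
One way to get that combined replace in a single pass (a sketch, with toy stand-ins for whole lines): join every changed line into one alternation, then substitute via a dict lookup:

import re

# map original line (without newline) -> edited line, e.g. from the diff
changes = {"B": "B1", "D": "D1", "E": "E1"}
pattern = re.compile("^(?:%s)$" % "|".join(re.escape(k) for k in changes))

with open("large_csv") as src, open("large_csv.tmp", "w") as dst:
  for line in src:
    new = pattern.sub(lambda m: changes[m.group(0)], line.rstrip("\n"))
    dst.write(new + "\n")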
