import re
import sys
import mmap

# Input model file and output file, taken from the command line.
shakes = open(sys.argv[1], 'r')
love = open(sys.argv[2], "w")
#moreLove = open(sys.argv[3], "w")
#HardLove = open(sys.argv[4], "w")

# Raw strings keep the regex escapes (\*, \.) from being read as string escapes.
node = re.compile(r'\*NODE[a-zA-Z, \r\n\t0-9\.-]+')
element3 = r'\*ELEMENT, TYPE=S3RS[a-zA-Z, \r\n\t0-9\.=;_-]+'
element4 = r'\*ELEMENT, TYPE=S4RS[a-zA-Z, \r\n\t0-9\.=;_-]+'

# Map the whole file read-only, then decode it into one string for findall.
m = mmap.mmap(shakes.fileno(), 0, access=mmap.ACCESS_READ)

line = node.findall(m.read().decode('utf-8'))
#for item in line:
#  love.write(item)
#print(m.read())
print(line)

This is the code with which I am trying to apply the regex to the complete file. Whenever I test it on smaller files (< 1 MB) it works fine, but on large files it does not work and returns empty lists. Below is a sample of the data I am trying to parse; normally it involves 3M rows of such data.

*Assembly, name=Assembly
**  
*Instance, name=vessel-1, part=vessel_bot
*Node
      1,   24.8572464,     213.8125,   53.1415176
      2,   41.4983292,     213.8125,   41.4983292
      3,   44.4593391,     213.8125,   44.4593391
      4,   28.0079861,     213.8125,   56.2922592
      5,   24.8572464,     233.8125,   53.1415176
      6,   28.0079861,     233.8125,   56.2922592
      7,   48.2778168,     233.8125,   61.0057411
      8,    46.156498,     233.8125,   61.0057411
      9,   53.5811195,     223.3125,   53.5811195
     10,    54.641777,     224.8125,    54.641777
     11,   49.6920319,     233.8125,   62.4199524
     12,   56.0559921,     224.8125,   56.0559921
     13,   50.7526894,     233.8125,   61.3592911
     14,   56.0559921,     226.3125,   56.0559921
     15,   41.4983292,     226.3125,   41.4983292
     16,   35.8528366,     233.8125,   46.4594383
     17,   37.5893517,     233.8125,   52.4385948
     18,   45.8735542,     223.3125,   45.8735542
     19,   44.4593391,     221.3125,   44.4593391
     20,   35.0599136,     233.8125,   52.1926079
     21,   34.0794373,     233.8125,    44.686039
     22,   31.5089321,     233.8125,   44.4683838
     23,   38.5373192,     243.3125,   38.5373192
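
As a quick check against the sample above, the following sketch runs the posted `*NODE` pattern on a few of its lines, once as written and once with `re.IGNORECASE`. The `relaxed` variant and the names `sample`, `strict_hits` and `relaxed_hits` are illustrative assumptions, not part of the original script, and this only shows how the pattern behaves on this excerpt rather than on a full model file.

```python
import re

# A few lines copied from the sample data above.
sample = (
    "*Instance, name=vessel-1, part=vessel_bot\n"
    "*Node\n"
    "      1,   24.8572464,     213.8125,   53.1415176\n"
    "      2,   41.4983292,     213.8125,   41.4983292\n"
)

# Character class taken from the pattern in the question.
body = r"[a-zA-Z, \r\n\t0-9.-]+"

# Case-sensitive, as posted: the keyword must be spelled exactly "*NODE".
strict = re.compile(r"\*NODE" + body)

# Hypothetical variant: same pattern, but the keyword case is ignored.
relaxed = re.compile(r"\*NODE" + body, re.IGNORECASE)

strict_hits = strict.findall(sample)
relaxed_hits = relaxed.findall(sample)

print(len(strict_hits), len(relaxed_hits))
```

Note that the character class contains no `=`, so on a header such as `*Node, nset=...` either variant would stop matching at the `=` sign.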
  • I think you're trying to get too fancy. There's no need to read the whole file into memory. Just open the file and iterate line by line (for line in f). – Jonathon Reinhart Nov 18 '15 at 14:12
  • Your sample just has `*Node`, but you're trying to match `*NODE` as case sensitive. – SuperBiasedMan Nov 18 '15 at 14:15
  • Also you never use `element3` or `element4`, are they supposed to do anything? – SuperBiasedMan Nov 18 '15 at 14:16
  • I am using them, but they are just the next steps of the code. @JonathonReinhart I can't iterate line by line, as there are different chunks in the F.E. model file which have to be searched at once instead of line by line. – Dexter Abeer Nov 18 '15 at 14:23
  • As a side note, mmap()ping a file and then reading all of it into physical memory anyway is rather pointless. You should read the data into an in-memory array instead – three million rows with 4 numbers each will use about 50MB of memory. – Sven Marnach Nov 18 '15 at 14:23
  • @SvenMarnach I managed to resolve that issue, but the result returned to me by findall is [b'*Node, nset'] – Dexter Abeer Nov 18 '15 at 14:34
  • Closing as "can't be reproduced" due to the OP's additional input (as an answer). – Jonathon Reinhart Nov 18 '15 at 15:58
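
The line-by-line approach suggested in the comments can still handle data that comes in chunks, by grouping rows under the most recent `*Keyword` header. Below is a minimal sketch of that idea. The helper names `iter_blocks` and `node_blocks` are hypothetical, and the code assumes that `**` lines are comments, that keyword case should be ignored, and that every node row is an integer id followed by floats, as in the sample data.

```python
import io


def iter_blocks(lines):
    """Yield (header, rows) pairs, one per '*Keyword' block."""
    header, rows = None, []
    for raw in lines:
        line = raw.strip()
        # Assumption: blank lines and '**' comment lines carry no data.
        if not line or line.startswith("**"):
            continue
        if line.startswith("*"):
            # A new keyword line closes the previous block.
            if header is not None:
                yield header, rows
            header, rows = line, []
        elif header is not None:
            rows.append([field.strip() for field in line.split(",")])
    if header is not None:
        yield header, rows


def node_blocks(lines):
    """Yield only the node blocks, with rows converted to numbers."""
    for header, rows in iter_blocks(lines):
        keyword = header.split(",")[0].strip().upper()  # '*Node' and '*NODE' both match
        if keyword == "*NODE":
            yield header, [(int(r[0]), *map(float, r[1:])) for r in rows]


# In-memory stand-in for the model file; a real run would pass an open file object.
sample = io.StringIO(
    "*Assembly, name=Assembly\n"
    "**  \n"
    "*Instance, name=vessel-1, part=vessel_bot\n"
    "*Node\n"
    "      1,   24.8572464,     213.8125,   53.1415176\n"
    "      2,   41.4983292,     213.8125,   41.4983292\n"
)

blocks = list(node_blocks(sample))
for header, rows in blocks:
    print(header, len(rows))
```

Only one block is held in memory at a time, so the file itself never has to be read or mapped as a whole. Element blocks could be picked out the same way by comparing against `*ELEMENT`, though their rows may need different parsing.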

0 Answers