I have about 30 files, each around 300 MB. Each file contains some information I'm interested in, such as usernames. I want to extract the usernames with a regex and then find the most common ones. Here's my code:
import os
import re
from collections import Counter

rList = []
for fname in os.listdir("."):
    with open(fname, 'r') as f:
        for line in f:
            m = re.search('PATTERN TO FIND USERNAME', line)
            if m:
                rList.append(m.group())
c = Counter(rList)
print c.most_common(10)
Now as you can see, I append every username I find to a list and then call Counter() once at the end. This takes several minutes to finish. I've tried removing the c = Counter(rList)
and instead calling c.update()
every time I finish reading a file, but that won't make any difference, will it?
So, is this the best practice? Are there any ways to improve the performance? Thanks!