Usually I'm able to find answers to my dilemmas pretty quickly on this site, but this problem may need a more specific touch:
I have a ~50-million-character unicode string that I download from a Tektronix oscilloscope. Just getting it assigned is a pain in the a** for memory (sys.getsizeof() reports ~100 MB).
The problem is that I need to treat this as a CSV so I can grab 10,000 of the 10 million comma-separated values at a time (that chunk size is fixed). I've tried two approaches:
1) The split(",") method. Using this, the RAM usage of the Python kernel spikes another ~300 MB, but the processing itself is very fast... except when I loop it ~100 times in one routine: somewhere between iterations 40 and 50 the kernel raises a MemoryError.
2) My own script that, after downloading the absurdly long string, scans for commas until it has counted 10,000 and stops, converting the values between the commas to floats and populating a numpy array. This is fairly memory-efficient (from before importing to after running the script, usage only grows by ~150 MB). However, it is MUCH slower and usually crashes the kernel shortly after the 100x loop completes.
Below is the code used to process this string; if you PM me, I can send you a copy for experimenting (though it's probably easier to just generate one).
Code 1 (using the split() method):
import numpy as np

PPStrace = PPSinst.query('CURV?')   # PPSinst is the VISA handle to the scope
PPStrace = PPStrace.split(',')      # this is where RAM spikes ~300 MB
for iii in range(len(PPStrace)):    # rescale raw values; yoff/ymult/yzero come from the scope
    PPStrace[iii] = (float(PPStrace[iii]) - yoff)*ymult + yzero
maxes = np.empty(shape=(0,0))
iters = int(samples/1000)           # samples is 10 million, so iters is 10,000
for i in range(1000):               # max of each 10,000-sample block goes into maxes
    print i
    maxes = np.append(maxes, max(PPStrace[i*iters:(i+1)*iters]))
PPS = 100*np.std(maxes)/np.mean(maxes)
print PPS, "% PPS Noise"
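For what it's worth, I suspect the per-element algebra and the block maxes could be vectorized once the split list is handed to numpy. A rough sketch, using the same yoff/ymult/yzero, iters, and PPStrace as above and assuming the split list holds at least 1000*iters values:

import numpy as np

trace = np.array(PPStrace, dtype=float)    # one conversion pass over the split list
trace = (trace - yoff)*ymult + yzero       # vectorized rescaling
# max of each 10,000-sample block, no Python loop
maxes = trace[:1000*iters].reshape(1000, iters).max(axis=1)
PPS = 100*np.std(maxes)/np.mean(maxes)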
Code 2 (self-written comma walker):
import numpy as np

PPStrace = PPSinst.query('CURV?')
walkerR = 1
walkerL = 0
length = len(PPStrace)
maxes = np.empty(shape=(0,0))
iters = int(samples/1000)    # samples is 10 million, so iters is 10,000
for i in range(1000):
    sample = []              # fresh 10k-sample list
    commas = 0
    while commas < iters:    # keep pulling values until 10,000 commas are consumed
        while PPStrace[walkerR] != ",":    # walk right to the next comma
            walkerR += 1
            if walkerR == length:
                break
        # convert the substring between the commas and rescale it
        sample.append((float(PPStrace[walkerL:walkerR]) - yoff)*ymult + yzero)
        walkerL = walkerR + 1
        walkerR += 1
        commas += 1
    maxes = np.append(maxes, max(sample))
PPS = 100*np.std(maxes)/np.mean(maxes)
print PPS, "% PPS Noise"
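If it helps to see the walker idea more compactly, the same scan could lean on str.find instead of advancing one character at a time. A sketch under the same assumptions (yoff/ymult/yzero, samples, and PPStrace as above):

import numpy as np

iters = int(samples/1000)
maxes = np.empty(0)
left = 0
for i in range(1000):
    sample = []
    for _ in range(iters):
        right = PPStrace.find(',', left)   # C-level scan for the next comma
        if right == -1:                    # last value has no trailing comma
            right = len(PPStrace)
        sample.append((float(PPStrace[left:right]) - yoff)*ymult + yzero)
        left = right + 1
    maxes = np.append(maxes, max(sample))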
I also tried a pandas DataFrame, wrapping the string in StringIO for the CSV conversion. That thing gets a MemoryError just trying to read it into a frame.
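For reference, the pandas attempt looked roughly like this (reconstructed from memory, so details may differ; I suspect part of the problem is that CURV? returns everything on one line, so pandas sees a single row with 10 million columns):

import pandas as pd
from io import StringIO    # handles the unicode string directly

# PPStrace is the raw unicode string from the scope
df = pd.read_csv(StringIO(PPStrace), header=None)    # MemoryError here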
I am thinking the solution may be to load this into a SQL table and then pull values back out in 10,000-sample chunks (which is the intended purpose of the script). But I would love to avoid that!
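In case it clarifies what I mean, a minimal sketch of that route with sqlite3 (file, table, and variable names are just placeholders; vals stands for whatever iterable of parsed floats I end up with):

import sqlite3

conn = sqlite3.connect('pps_trace.db')    # placeholder filename
cur = conn.cursor()
cur.execute('CREATE TABLE IF NOT EXISTS trace (val REAL)')
# stream the parsed floats in without materializing a second copy
cur.executemany('INSERT INTO trace VALUES (?)', ((v,) for v in vals))
conn.commit()
# then pull back one 10,000-sample chunk per iteration
for i in range(1000):
    cur.execute('SELECT val FROM trace LIMIT 10000 OFFSET ?', (i*10000,))
    sample = [row[0] for row in cur.fetchall()]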
Thanks for all your help guys!