-1

I have the following code, which inserts a delimiter of ~||~ after every semicolon or every 500 characters. It works, but it removes the semicolons when it finds them. I looked on here and found an answer, but I can't get it to work in my code.

chunk_len = 100
split_char = ';'
delim = ("~||~")
d = ";"
f = open(filename, "r")
text = f.read()
f.close()
lines = text.split(';')
for lines_idx, line in enumerate(lines):
    length = len(line)
    if length > chunk_len:
        chunks = [line[idx:idx+chunk_len]for idx in range(0,length,chunk_len)]
        lines[lines_idx] = delim.join(chunks)
new_text = delim.join(lines)
f = open(outputfile, 'w')
f.write(new_text)
f.close()

I found this solution on here, but I couldn't find a way to incorporate it into my code. Sorry for the duplicated question.

d = ">"
for line in all_lines:
    s =  [e+d for e in line.split(d) if e != ""]
user3754031
  • `"Doesn't work"` means nothing to us. What do you mean by "can't get it to work"? – Joran Beasley Jul 24 '14 at 16:23
  • Sorry. I've tried two different ways. One time it kept the semicolons, but wasn't doing the chunks of every 100 characters. The other time it still removed the semicolon. Sorry for being unclear. – user3754031 Jul 24 '14 at 16:26
  • possible duplicate of [tokenize a string keeping delimiters in Python](http://stackoverflow.com/questions/1820336/tokenize-a-string-keeping-delimiters-in-python) – skrrgwasme Jul 24 '14 at 16:28

3 Answers

1

If I'm understanding your question correctly, what you're really trying to do is insert your own delimiter after every semicolon, and every 500 characters. Try doing this in two steps:

with open(filename, "r") as fi: # read in file using "with" statement
    text = fi.read()

block_size = 500            # sets how many characters separate new_delim
old_delim = ";"             # character we are adding the new delimiter to
new_delim = "~||~"          # this will be inserted every block_size characters
del_length = len(new_delim) # store length to prevent repeated calculations

for i in range(len(text) // block_size):
    # calculate next index where the new delimiter should be inserted,
    # accounting for the delimiters already inserted before it
    index = i*block_size + i*del_length + block_size

    # construct new string with new delimiter at the given index
    text = "{0}{1}{2}".format(text[:index], new_delim, text[index:])

replacement_delim = old_delim + new_delim # old_delim will be replaced with this

with open(outputfile, 'w') as fo:
    # write out new string with new delimiter appended to each semicolon
    fo.write(text.replace(old_delim, replacement_delim))

If semicolons happen to occur at a multiple of 500 characters, you may end up with two of your special delimiters next to each other. Also, if you have exactly a multiple of block_size characters in your string, you will have your delimiter at the end of the string.

Also, this may not be the best approach if you are reading in very long files: the for loop creates a whole new string every time the delimiter is inserted.

This approach makes the split method's treatment of delimiters a moot point.
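
If rebuilding the string on every insertion is a concern, one alternative is to collect the pieces in a list and join them once at the end. The following is only a sketch, not a drop-in replacement for the code above: `insert_delims` is a hypothetical helper name, and it assumes the behaviour of the question's code, where the character count restarts after each semicolon, rather than counting blocks across the whole text.

```python
def insert_delims(text, block_size=500, old_delim=";", new_delim="~||~"):
    """Hypothetical helper: put new_delim after each old_delim and after
    every block_size characters within a semicolon-terminated segment."""
    pieces = []
    segments = text.split(old_delim)
    last = len(segments) - 1
    for i, segment in enumerate(segments):
        if i < last:
            segment += old_delim  # restore the delimiter that split() dropped
        # cut the segment into blocks of at most block_size characters;
        # an empty segment contributes no pieces at all
        for start in range(0, len(segment), block_size):
            pieces.append(segment[start:start + block_size])
    # a single join at the end instead of one new string per insertion
    return new_delim.join(pieces)


print(insert_delims("ab;cdefg;h", block_size=3))  # ab;~||~cde~||~fg;~||~h
```

Because every collected piece is non-empty, this version never produces two delimiters next to each other, and it does not emit a delimiter at the very end of the text.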

skrrgwasme
  • Alright. I'll try this method. I have to change it slightly because I'm running 2.4.2 and don't have the "with open" statement, but surprisingly I know how to change that. Thank you. – user3754031 Jul 24 '14 at 18:13
0

change

lines = text.split(';')

to

lines = list(filter(None, re.split('([^;]+;)', text)))

(with import re at the top of the script), and that should keep the semicolon on each piece. Alternatively, just add it back in later, as in the other answer.
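
To see why the capturing group matters, here is a small illustration on a made-up input string (the sample text is an assumption, not taken from the question's file). When the pattern contains a group, `re.split()` also returns the text matched by that group, so the semicolons survive; the empty strings it leaves between adjacent matches then need to be dropped.

```python
import re

text = "a;b;c"  # made-up sample input

# the capturing group makes re.split() return each matched "chunk;" piece
raw = re.split('([^;]+;)', text)

# raw also holds empty strings between adjacent matches; filter them out
lines = [piece for piece in raw if piece]

print(raw)    # ['', 'a;', '', 'b;', 'c']
print(lines)  # ['a;', 'b;', 'c']
```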

Joran Beasley
-2

split() splits a string and removes the delimiter, so you just need to add it back in. I did that in your loop below with: line = line + d

chunk_len = 100
split_char = ';'
delim = ("~||~")
d = ";"
f = open(filename, "r")
text = f.read()
f.close()
lines = text.split(';')
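for lines_idx, line in enumerate(lines):
    if lines_idx < len(lines) - 1:  # the last piece had no semicolon after it
        line = line + d  #NEW LINE ADDED HERE
    length = len(line)
    # always write the line back, so short lines keep their semicolon too
    chunks = [line[idx:idx+chunk_len] for idx in range(0, length, chunk_len)]
    lines[lines_idx] = delim.join(chunks)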
new_text = delim.join(lines)
f = open(outputfile, 'w')
f.write(new_text)
f.close()
pgreen2
  • This worked! I knew it was something simple to make me look stupid. Thank you! I would upvote but I'm too noob. :( – user3754031 Jul 24 '14 at 16:34
  • @user3754031, Since this is your question, you should be able to accept the answer. You click the check box below the votes. – pgreen2 Jul 24 '14 at 16:36