I have a long string (multiple paragraphs) which I need to split into a list of line strings. The determination of what makes a "line" is based on:

  • The number of characters in the line is less than or equal to X (where X is a fixed number of columns per line)
  • OR, there is a newline in the original string (which forces a new "line" to be created).

I know I can do this algorithmically, but I was wondering whether Python has something that handles this case. It's essentially word-wrapping a string.

And, by the way, the output lines must be broken on word boundaries, not character boundaries.

Here's an example of input and output:

Input:

"Within eight hours of Wilson's outburst, his Democratic opponent, former-Marine Rob Miller, had received nearly 3,000 individual contributions raising approximately $100,000, the Democratic Congressional Campaign Committee said.

Wilson, a conservative Republican who promotes a strong national defense and reining in the size of government, won a special election to the House in 2001, succeeding the late Rep. Floyd Spence, R-S.C. Wilson had worked on Spence's staff on Capitol Hill and also had served as an intern for Sen. Strom Thurmond, R-S.C."

Output:

"Within eight hours of Wilson's outburst, his"
"Democratic opponent, former-Marine Rob Miller,"
" had received nearly 3,000 individual "
"contributions raising approximately $100,000,"
" the Democratic Congressional Campaign Committee"
" said."
""
"Wilson, a conservative Republican who promotes a "
"strong national defense and reining in the size "
"of government, won a special election to the House"
" in 2001, succeeding the late Rep. Floyd Spence, "
"R-S.C. Wilson had worked on Spence's staff on "
"Capitol Hill and also had served as an intern"
" for Sen. Strom Thurmond, R-S.C."
Karim

2 Answers

EDIT

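What you are looking for is textwrap, but on its own it is only part of the solution. By default, wrap() treats newlines as ordinary whitespace, so to keep the original line breaks you need to split the text into lines first and wrap each one separately: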

from textwrap import wrap

# Wrap each original line on its own so existing newlines are preserved.
wrapped = '\n'.join('\n'.join(wrap(block, width=50)) for block in text.splitlines())

>>> print(wrapped)

Within eight hours of Wilson's outburst, his
Democratic opponent, former-Marine Rob Miller, had
received nearly 3,000 individual contributions
raising approximately $100,000, the Democratic
Congressional Campaign Committee said.

Wilson, a conservative Republican who promotes a
strong national defense and reining in the size of
government, won a special election to the House in
2001, succeeding the late Rep. Floyd Spence,
R-S.C. Wilson had worked on Spence's staff on
Capitol Hill and also had served as an intern for
Sen. Strom Thurmond, R-S.C.
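
Since the question asks for a list of line strings rather than one joined string, the same idea can be sketched as a small helper. The name wrap_lines and the sample text are hypothetical, chosen only for illustration. Note that wrap() returns an empty list for an empty string, so blank lines are kept explicitly:

```python
from textwrap import wrap

def wrap_lines(text, width=50):
    """Return a list of wrapped lines, keeping the original line breaks."""
    lines = []
    for block in text.splitlines():
        # wrap() returns [] for an empty string, so keep blank lines explicitly.
        lines.extend(wrap(block, width=width) or [""])
    return lines

sample = "alpha beta gamma delta\n\nepsilon"
print(wrap_lines(sample, width=11))
# ['alpha beta', 'gamma delta', '', 'epsilon']
```

Each returned string is at most width characters long, provided no single word is longer than width.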
Nadia Alramli

You probably want to use the textwrap module in the standard library:

http://docs.python.org/library/textwrap.html
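
As a rough illustration of the module's two main entry points (the sample text and width are made up for this sketch), wrap() returns a list of lines and fill() returns a single newline-joined string. With the default settings, a newline in the input is treated like any other whitespace, so it does not force a line break:

```python
import textwrap

paragraph = "one two three\nfour five six"

# With the defaults, the newline is replaced by a space before wrapping.
wrapped = textwrap.wrap(paragraph, width=20)
print(wrapped)
# ['one two three four', 'five six']

# fill() is shorthand for "\n".join(wrap(...)).
filled = textwrap.fill(paragraph, width=20)
print(filled)
```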

Paul McMillan