5

I'm looking for a solution to remove/turn off the 2 spaces between columns that df.to_string creates automatically.

Example:

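from pandas import DataFrame

# Build the one-row frame directly; append() with a dict needs ignore_index=True
# and was removed in pandas 2.0
df = DataFrame({'a': ['12345'], 'b': ['12345']})
df.to_string(index=False, header=False)
'12345  12345'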

For clarity, the result is: '12345..12345' where the dots represent actual spaces.

I already checked the pandas.set_option and DataFrame.to_string documentation and did not find an option for this.

EDIT: The above example is oversimplified. I am working with an existing df that has spaces all over the place, and the output text files are consumed by another black-box program that reads a fixed character width for each column on every line. I've already figured out how to reformat the columns with formatters and make sure my columns are not cut off by the pandas defaults, so I am 90% there (minus these automatic spaces). FYI here are some good links on to_string() formatting and data-truncation:

Appreciate the help!

smci
PydPiper
  • Well the stupid but simple solution is to replace all occurrences of two or more spaces with a single space... after a brief look at the code, I'm not seeing an obvious way to do it – Noah Aug 26 '18 at 22:33
  • You can't turn them off, and they are not a regular 2 spaces. They are created because of justification (which will be more apparent if you insert more rows with numbers of different lengths). Why do you need to do that anyway? If you need a concatenation of columns, then you should do that before printing. – Qusai Alothman Aug 26 '18 at 22:39
  • thanks for the feedback @Noah and Qusai. The example is overly simplified for clarity. I have a large df that I am working with that is of all different length columns with plenty of random spaces everywhere. A new task that came out of this project was to write out text files that is to be consumed by another program. I cannot touch the other program and it has a specific width of characters that it 'reads' for each column. Since I already have a df and saw there was a to_string option I figured I would give it a shot, but I ran into this issue – PydPiper Aug 26 '18 at 22:46
  • @PydPiper You could try to do the conversion with `re.sub` manually. – a_guest Aug 26 '18 at 23:00

3 Answers

4

You can use the pd.Series.str.cat method, which accepts a sep keyword argument. By default sep is set to '' so there is no separation between values. Here are the docs: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.str.cat.html

You can also use pd.Series.str.strip to remove any leading or trailing whitespace from each value. Here are the docs: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.str.strip.html

Here's an example based on what you have:

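import pandas as pd

df = pd.DataFrame({'a': ['12345'], 'b': ['12345']})
# Take the first row, strip each value, and join the values with a single space
df.iloc[0].fillna('').str.strip().str.cat(sep=' ')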

Note that fillna('') is required if there are any empty values.
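The example above joins only the first row. As a minimal sketch of the same idea applied to every row, str.cat can also take another Series and concatenate element-wise. The two-row sample data and the variable names cleaned and joined are made up for illustration:

```python
import pandas as pd

# Illustrative data: one value with stray spaces and one missing value
df = pd.DataFrame({'a': ['12345', ' 12 '], 'b': ['12345', None]})

# Replace missing values, then strip leading/trailing whitespace per column
cleaned = df.fillna('').apply(lambda col: col.str.strip())

# Concatenate column 'a' with column 'b' row by row, with no separator
joined = cleaned['a'].str.cat(cleaned['b'], sep='')
print(joined.tolist())
```

Note that stripping changes the width of each value, so this only fits cases where fixed column widths are not required.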

Henry Woody
1

Even though this post is old, in case someone else lands here like I did:

df.to_string(header=False, index=False).strip().replace(' ', '')
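Be aware that replacing ' ' removes every space, including spaces inside the values themselves. The comments on the question suggest re.sub instead; a rough sketch of that idea, applied to the string from the question, might look like this (the variable names are illustrative):

```python
import re

text = '12345  12345'  # the to_string() output shown in the question

# Collapse each run of two or more spaces into a single space
collapsed = re.sub(r' {2,}', ' ', text)

# Or drop each run of two or more spaces entirely
removed = re.sub(r' {2,}', '', text)

print(repr(collapsed), repr(removed))
```

This still cannot tell padding added by to_string apart from runs of spaces that belong to a value, so it is probably not safe when the data itself contains multiple consecutive spaces.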

import random
Lucia
0

I also had the same problem. There is a justify option in to_string() which is supposed to help in this case. But I ended up doing it the old way:

[row['a'] + row['b'] for _, row in df.iterrows()]
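Since the question is really about a fixed character width per column, another option is to skip to_string and pad each value by hand before joining with no separator. This is only a sketch: the helper name fixed_width_lines, the widths, and the sample rows are hypothetical, and left-justified padding is an assumption about what the consuming program expects.

```python
def fixed_width_lines(rows, widths):
    """Pad each value to its column width and join the cells with no separator."""
    lines = []
    for row in rows:
        cells = []
        for value, width in zip(row, widths):
            text = str(value)
            if len(text) > width:
                # Fail loudly rather than silently shifting later columns
                raise ValueError(f"{text!r} is wider than {width} characters")
            cells.append(text.ljust(width))  # left-justify, pad with spaces
        lines.append(''.join(cells))
    return lines


# With a DataFrame, rows could come from df.itertuples(index=False, name=None)
rows = [('12345', '12345'), ('1 2', 'ab')]
lines = fixed_width_lines(rows, widths=[5, 5])
for line in lines:
    print(repr(line))
```

Spaces inside a value (such as '1 2') are preserved, and every output line has the same total length.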