
I'm writing a simple Sublime Text plugin that trims extra, unnecessary spaces between words without touching the leading spaces, so that Python indentation is not messed up.

I have:

[spaces*******are********here]if****not***regions***and**default_to_all:

and want to get:

[spaces***are***still****here]if not regions and default_to_all:

Thinking about

regions = view.find_all('\w\s{2,}\w')
view.erase(edit, region)

but it cuts out the first and the last letter too.

    Can you add an example of a string that my answer won't catch? I'm slightly confused about what you need exactly. – Lord Elrond Jan 12 '20 at 22:06

2 Answers


If I understand correctly, this should work:

>>> r = re.compile(r'( *[\S]*)(?: +)(\n)?')
>>> s = '       if   not regions    and  default_to_all:\n     foo'
>>> r.sub(' ', s)
   if not regions and default_to_all:
 foo
Lord Elrond

Not matching leading spaces implies that you want to match multiple spaces that follow a non-space character (and replace them with a single space), so you can replace (?<=\S) +(?=\S) with a single space " ".

Explanation:

(?<=\S) +(?=\S)
(?<=              Positive look-behind, which means preceded by...
    \S                non-space character
      )           end of look-behind group
        +         one or more spaces
         (?=      Positive look-ahead, which means followed by...
            \S        non-space character
              )   end of look-ahead group

That should be straightforward to understand. You may need to tweak it a bit to handle trailing spaces, though.
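As a rough illustration, the substitution described above could be sketched in plain Python with the standard re module. The names INNER_SPACES and collapse_inner_spaces are made up for this example, and the sketch works on an ordinary string rather than through the Sublime Text API.

```python
import re

# One or more spaces that are both preceded and followed by a
# non-space character. Leading spaces never have a non-space
# character before them, so they are not matched.
INNER_SPACES = re.compile(r'(?<=\S) +(?=\S)')


def collapse_inner_spaces(text):
    """Collapse runs of spaces between words into a single space."""
    return INNER_SPACES.sub(' ', text)


line = '    if   not regions    and  default_to_all:'
print(collapse_inner_spaces(line))
```

Because only the spaces are part of the match, the characters on either side are left alone. Trailing spaces are not matched by this pattern, since the look-ahead requires a non-space character after them.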

See "regular expressions 101" for more information.

However, a side note regarding your intention: this is not a reliable way to reformat code. Apart from leading spaces, there are many other cases where multiple spaces are significant. The most obvious one is spaces inside a string literal.

Adrian Shum
  • Thank you! But how can I avoid deleting the first character? Please see the picture below. – Alexander Paul Wansiedler Aug 20 '20 at 20:24
  • don't understand what you meant by deleting first character. Create a regex101 example to demonstrate your problem – Adrian Shum Aug 21 '20 at 04:38
  • Please check this out: https://regex101.com/r/5d1cCY/1 Thank you! – Alexander Paul Wansiedler Aug 21 '20 at 08:54
  • @AlexanderPaulWansiedler don't understand what you are trying to do. It is simply doing what you are looking for: matching "a non-hash character followed by bunch of spaces" – Adrian Shum Aug 21 '20 at 15:31
  • I want only the spaces to be removed. As you see, sometimes a non space character BEFORE the spaces is being removed too. How can I avoid this? – Alexander Paul Wansiedler Aug 22 '20 at 08:40
  • @AlexanderPaulWansiedler it is just different from what you originally asked for in your original question, and the regex you are using is different too. Please be clear on what you are trying to do. We are not supposed to keep guessing your need. – Adrian Shum Aug 24 '20 at 03:00
  • Please read the first sentence in my question. I specifically asked about deleting only the spaces. Current regexp ```[^#](?<=\S)( {2,})(?=\S)``` also deletes randomly nonspace character before the spaces. Please check the link I posted. There you can see what I mean. – Alexander Paul Wansiedler Aug 24 '20 at 14:28
  • As I said before: the example you provided is different from what you originally asked, and regex you used in the example is different from the answer. – Adrian Shum Aug 24 '20 at 14:35
  • Please tell me should I create a new post? – Alexander Paul Wansiedler Aug 24 '20 at 17:42
  • You may. However I would recommend you to really understand the original answer and understand how it works. If you really understand this answer, and if my guess on what you were doing is correct, the solution just require a slight tweak. – Adrian Shum Aug 25 '20 at 06:15
  • I read the documentation, but I still cant figure out how to prevent a random cutting of some letters =( – Alexander Paul Wansiedler Aug 25 '20 at 18:22
  • There is no random cutting. Your regex means matching “a non-# char, followed by spaces.” That non-# char is part of the match hence it will be replaced. The whole reason of using lookahead and look behind group in my answer is to avoid something to be part of match. – Adrian Shum Aug 25 '20 at 23:06
  • Thank you! Tell please, how can I prevent that non-# char from being cut with spaces? – Alexander Paul Wansiedler Aug 26 '20 at 16:28
  • That's why I kept saying you should try to understand the original answer. Try to understand the effect of lookaround groups in regex – Adrian Shum Aug 31 '20 at 01:48
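The point made in the comments can be sketched as follows. The first pattern is the one quoted in the comments; the second is only a guess at the "slight tweak", assuming the intent is to skip spaces that directly follow a # character. The sample line and the variable names are invented for this illustration.

```python
import re

line = 'x = 1    # note'

# Pattern from the comments: [^#] is outside any lookaround, so the
# character it matches is part of the match and gets replaced too.
consuming = re.compile(r'[^#](?<=\S)( {2,})(?=\S)')

# Assumed tweak: move the "not a #" condition into the look-behind,
# so only the spaces themselves are matched.
lookaround = re.compile(r'(?<=[^#\s])( {2,})(?=\S)')

with_consuming = consuming.sub(' ', line)    # the '1' is lost
with_lookaround = lookaround.sub(' ', line)  # the '1' is kept

print(repr(with_consuming))
print(repr(with_lookaround))
```

The only difference between the two patterns is whether the non-# character sits inside a lookaround group, which is what decides whether it is consumed by the replacement.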