
I have a file with a lot of lines of text like so:
(text)

There are many instances where this occurs:
In textfile1.txt:
logging.info(text text text(text text) text text)

If I run:
$ sed -i '/(/,/)/d' textfile
it returns ' text text)'

There are also many instances like so:
(text
text text(text text)
text text)

I want to remove everything between the first opening parenthesis and its respective closing parenthesis (including both parentheses). It doesn't matter what method is used; I'm open to any. Is this possible?

I tried writing my own Python script, but I didn't get remotely close. I believe it would be easier to look for a different solution than to fix my broken program.

I've seen some other posts like:
Find text between Opening parenthesis closing [closed]
match opening parenthesis to the corresponding closing parenthesis
regex: parse opening closing parenthesis with other parenthesis in between

But I am really bad with regex and they all use regex and I don't know

Wiktor Stribiżew

5 Answers

1

You can make GNU sed loop until all pairs are removed:

sed -z ':a;s/([^()]*)//;ta' textfile1.txt

EDIT: I added -z, as suggested by Ed.
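
For readers who prefer Python, the same loop can be sketched with the standard `re` module. This is only an illustration of the idea, not part of the sed answer: the function name `remove_parenthesized` and the sample text are made up here. Like the sed command, it deletes innermost `(...)` pairs repeatedly until a pass removes nothing.

```python
import re

def remove_parenthesized(text):
    """Repeatedly delete innermost (...) pairs until none remain."""
    # [^()]* cannot cross another parenthesis, so each match is an innermost pair.
    innermost = re.compile(r"\([^()]*\)")
    while True:
        text, count = innermost.subn("", text)
        if count == 0:
            return text

# Sample input modeled on the question (hypothetical content).
sample = (
    "logging.info(text text text(text text) text text)\n"
    "(text\n"
    "text text(text text)\n"
    "text text)\n"
)
cleaned = remove_parenthesized(sample)
print(repr(cleaned))
```

Because the whole text is held in one string, a group that spans several lines is removed as well, which is roughly what `-z` achieves for sed.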

Walter A
1

Just use awk, e.g. given this input:

$ cat file
(text)
logging.info(text text text(text text) text text)
foo (text
text text(text text)
text text) bar

then using GNU awk for multi-char RS:

$ awk -v RS='^$' -v ORS= '{ while( gsub(/\([^()]*)/,"") ); } 1' file

logging.info
foo  bar

otherwise using any awk in any shell on every Unix box:

$ cat tst.awk
{ rec = rec $0 ORS }
END {
    while ( gsub(/\([^()]*)/,"",rec) );
    printf "%s", rec
}

$ awk -f tst.awk file

logging.info
foo  bar
Ed Morton
  • Hello, this is a good solution, but I've identified a flaw. In my tests, it breaks if an unmatched parenthesis appears inside a set of matching parentheses, such as the following in textfile.txt: ```text("text ( text text")``` or ```text(text text The character for a right parenthesis looks like ")". text text text)``` – Mermaid Man Aug 24 '21 at 13:00
  • Then edit your question to provide sample input that includes that case. All we have to go on when writing a script is what you tell us that script has to be able to handle. – Ed Morton Aug 24 '21 at 13:20
1

With sed:

sed ':a
s/([^()]*)//g;t a
/(/!b
$b
N;b a' file

Remove balanced (...) until none are left. If a ( is still present and the last line of input hasn't been read: add the next line to the pattern space, and repeat.
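
The line-buffering behaviour described above can be sketched in Python as follows. This is a rough analogue under stated assumptions, not a translation of the sed script: the name `strip_groups_streaming` and the sample lines are hypothetical. Lines are joined only while an unmatched `(` is still pending.

```python
import re

# Matches one innermost "(...)" pair; [^()] also matches newlines.
INNERMOST = re.compile(r"\([^()]*\)")

def strip_groups_streaming(lines):
    """Yield text with balanced (...) groups removed, joining lines only when needed."""
    buffer = ""
    for line in lines:
        buffer += line
        # Remove innermost pairs until a pass finds none.
        while True:
            buffer, count = INNERMOST.subn("", buffer)
            if count == 0:
                break
        # A remaining "(" may be closed by a later line, so keep buffering.
        if "(" not in buffer:
            yield buffer
            buffer = ""
    # End of input: emit what is left, including any unmatched "(".
    if buffer:
        yield buffer

sample = ["(text)\n", "foo (text\n", "text text(text text)\n", "text text) bar\n"]
result = "".join(strip_groups_streaming(sample))
print(repr(result))
```

An unmatched `(` early in the input makes the buffer grow until the end of the input.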

0
def a(test_str):
    # Copy only the characters that lie outside every [...] and (...) group.
    ret = ''
    skip1c = 0  # current nesting depth of square brackets
    skip2c = 0  # current nesting depth of parentheses
    for i in test_str:
        if i == '[':
            skip1c += 1
        elif i == '(':
            skip2c += 1
        elif i == ']' and skip1c > 0:
            skip1c -= 1
        elif i == ')' and skip2c > 0:
            skip2c -= 1
        elif skip1c == 0 and skip2c == 0:
            ret += i
    return ret

x = "ewq[a [(b] ([c))]] This is a sentence. (once a day) [twice a day]"
x = a(x)
print(x)
print(repr(x))
  • It would help to add the final result to your answer and choose variable names that are more descriptive. Also, unless you specify this is for Python 2, use `print()` with parentheses. – Jan Wilamowski Aug 23 '21 at 06:26
0

You can use a function such as find_parens() posted here to find indices of matching parentheses in the string. This function will also raise an exception if the string has unbalanced parentheses.

For a string

s = "This is a (little (really small)) test (?)." 

calling find_parens(s) gives:

{18: 31, 10: 32, 39: 41}
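
The linked function is not reproduced in this thread, so here is a minimal stack-based sketch that matches the behaviour described above (a dictionary of indices, and an exception on unbalanced input). The body and the choice of `ValueError` are assumptions and may differ from the linked code.

```python
def find_parens(s):
    """Map the index of each "(" to the index of its matching ")"."""
    pairs = {}
    stack = []  # indices of "(" characters that are still open
    for i, ch in enumerate(s):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at index {i}")
            # The most recently opened "(" is the one this ")" closes.
            pairs[stack.pop()] = i
    if stack:
        raise ValueError(f"unmatched '(' at index {stack[-1]}")
    return pairs

s = "This is a (little (really small)) test (?)."
pairs = find_parens(s)
print(pairs)
```

Inner pairs are recorded before the pair that encloses them, because a pair is stored when its closing parenthesis is reached.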

The indices are organized into a dictionary that maps each opening index to its closing index. We can convert it to a list of tuples sorted by the opening index:

sorted(list(find_parens(s).items()), key=lambda x: x[0])

This gives:

[(10, 32), (18, 31), (39, 41)]

The first tuple gives the index of the first opening parenthesis in the string and its matching closing parenthesis:

bounds = sorted(list(find_parens(s).items()), key= lambda x: x[0])[0]
print(bounds)

This gives:

(10, 32)

It remains to remove the part of the string defined by these indices:

out = s[:bounds[0]] + s[bounds[1]+1:]
print(out)

We get:

'This is a  test (?).'
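
If only the first group has to go, the steps above can also be folded into one pass with a depth counter, without building the full index dictionary. This is a sketch under assumptions: `remove_first_group` is a made-up name, and leaving the text unchanged when the first `(` is never closed is a design choice made here, not something the thread specifies.

```python
def remove_first_group(text):
    """Remove the first "(" through its matching ")", inclusive."""
    start = text.find("(")
    if start == -1:
        return text  # no parenthesis at all
    depth = 0
    for i in range(start, len(text)):
        if text[i] == "(":
            depth += 1
        elif text[i] == ")":
            depth -= 1
            if depth == 0:
                # Back at depth zero: this ")" closes the first "(".
                return text[:start] + text[i + 1:]
    return text  # assumption: leave input unchanged if the group never closes

out = remove_first_group("This is a (little (really small)) test (?).")
print(out)
```

Unlike `find_parens()`, this version does not raise on unbalanced input.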
bb1