How to get the text without space from the bs4 output

Question

I want to scrape this page and get the content without whitespace.

```
import requests
from bs4 import BeautifulSoup
def getdat(url):
    r = requests.get(url)
    return r.text
newsurl = "https://www.msei.in/downloads/equity-reports/fii-dii-activities"
data = getdat(newsurl)
soup = BeautifulSoup(data, 'html.parser')
result = soup.findAll('tr')
for i in result:
    print(i.text)
```

The output of the code is-

My requirement is to get the text without the blank space. How can I do that?

Does this answer your question? [How to remove whitespace in BeautifulSoup](https://stackoverflow.com/questions/4270742/how-to-remove-whitespace-in-beautifulsoup) — Russ J, Mar 20 '21 at 04:20
No, I have already tried those solutions, but my problem is not solved — Swagoto, Mar 20 '21 at 04:44

Cresht · Answer 1 · 2021-03-20T04:56:42.443

2

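Using regular expressions you can remove ALL of the whitespace easily (or less if you want to, with a little more effort). The code below first replaces each run of newlines with a NUL placeholder, then deletes all remaining whitespace, and finally turns each run of placeholders back into a single newline, so the values stay on separate lines.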

import requests
import re

from bs4 import BeautifulSoup
def getdat(url):
    r = requests.get(url)
    return r.text
newsurl = "https://www.msei.in/downloads/equity-reports/fii-dii-activities"
data = getdat(newsurl)
soup = BeautifulSoup(data, 'html.parser')
result = soup.findAll('tr')

ans = [re.sub(r"\u0000+", "\n", re.sub(r"\s+", "", re.sub(r"\n+", "\u0000", x.text))).strip() for x in result]

for i in ans:
    print(i)
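If removing every whitespace character makes the values run together, a gentler variant is to collapse each run of whitespace to a single space. This is only a minimal sketch: `row_text` is a hypothetical stand-in for what `x.text` might return for one row, and its spacing is made up rather than copied from the page.

```python
import re

# Hypothetical stand-in for the text of one <tr>; the real page's spacing may differ.
row_text = "\n  FII/FPI \n\n   19-Mar-2021 \n  24,193.67\n   22,775.24 \n 1,418.43\n"

# Collapse every run of whitespace (spaces, tabs, newlines) to one space,
# then trim the ends so the values stay distinguishable.
collapsed = re.sub(r"\s+", " ", row_text).strip()
print(collapsed)
```

Replacing `" "` with another separator, such as `","`, would give a delimited line instead.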

edited Mar 20 '21 at 04:56

answered Mar 20 '21 at 04:24

Cresht


I have already tried this, but it removes all the spaces and the strings are not distinguishable. – Swagoto Mar 20 '21 at 04:35
I edited to make the output more legible, hopefully this works better for you. – Cresht Mar 20 '21 at 04:57

score 1 · Answer 2 · answered Mar 20 '21 at 04:19

1

You do know that the .strip function removes whitespace from both ends of a string?

for i in result:
    txt = i.text.strip()
    if txt:
        print(txt)
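One likely reason this can appear to change nothing: `.strip()` only trims the two ends of the string, while the text of a table row usually carries newlines and indentation between the cells as well. A small sketch with a made-up string (hypothetical, not the page's actual markup) shows the difference:

```python
# Hypothetical row text: whitespace at the ends and between the cells.
row_text = "\n  DII \n\n\n   19-Mar-2021 \n\n   559.62 \n"

stripped = row_text.strip()

# The ends are clean, but the inner newlines and spaces survive.
print(repr(stripped))

# Splitting on whitespace removes the inner runs as well.
cells = stripped.split()
print(cells)
```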

answered Mar 20 '21 at 04:19

Tim Roberts


I actually thought this approach would work, but to my surprise, it does not. I believe this is a special case with the HTML. – Jacob Lee Mar 20 '21 at 04:21
I have tried this, but the output is still the same. – Swagoto Mar 20 '21 at 04:25

score 1 · Answer 3 · answered Mar 20 '21 at 04:27

I tried using the approach which Tim Roberts mentioned, but, to my surprise, it did not work. Here's what I came up with:

import bs4
import requests

res = requests.get("https://www.msei.in/downloads/equity-reports/fii-dii-activities")
soup = bs4.BeautifulSoup(res.text, features="html.parser")

elems = soup.select("tr")
for e in elems:
    print(e.getText().split())

I found that calling the split() method was the easiest way to get a clean list of strings, with no whitespace.

['Category', 'Date', 'Buy', 'Value', 'Sell', 'Value', 'Net', 'Value']
['FII/FPI', '19-Mar-2021', '24,193.67', '22,775.24', '1,418.43']
['As', 'on', '19', 'Mar,', '2021']
['Category', 'Date', 'Buy', 'Value', 'Sell', 'Value', 'Net', 'Value']
['DII', '19-Mar-2021', '7,503.70', '6,944.08', '559.62']
['As', 'on', '19', 'Mar,', '2021']
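One side effect of `split()` is that headers which appear to be multi-word, such as `Buy Value`, are broken into separate items. If the cells should stay intact, Beautiful Soup's `stripped_strings` generator, or `get_text()` with a separator and `strip=True`, may be worth trying. A minimal sketch on an inline snippet follows; the `html` string is an assumption that mimics a table, not the page's real markup.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real table.
html = """
<table>
  <tr>
    <th> Category </th>
    <th> Buy Value </th>
  </tr>
  <tr>
    <td> DII </td>
    <td> 7,503.70 </td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

rows = []
for tr in soup.find_all("tr"):
    # stripped_strings yields each text node with surrounding whitespace
    # removed and skips nodes that contain only whitespace.
    rows.append(list(tr.stripped_strings))
print(rows)

# get_text with a separator and strip=True gives one string per row instead.
lines = [tr.get_text(" | ", strip=True) for tr in soup.find_all("tr")]
print(lines)
```

Both forms keep the space inside a cell and drop only the whitespace around it.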

Thank you so much. This code is perfect. – Swagoto Mar 20 '21 at 06:25

score 0 · Answer 4 · answered Mar 20 '21 at 04:25

If you want to remove all leading and trailing whitespace from the printed text, you would do:

print(i.text.strip())

If you want to remove all whitespace everywhere in the text then you would do something like:

import re
...
removedWhiteSpaceText = re.sub(r'\s+', '', i.text)
print(removedWhiteSpaceText)
...

score 0 · Answer 5 · answered Mar 20 '21 at 04:27

0

Try using replace when printing the output.

for i in result:
    print(i.text.replace(" ",""))
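Note that `replace(" ", "")` touches only the space character, so newlines and tabs remain. One alternative is `"".join(text.split())`, which drops every kind of whitespace. The sketch below uses a made-up sample string for illustration:

```python
# Hypothetical row text with spaces, a tab and newlines.
row_text = " DII \n\t19-Mar-2021 \n 559.62 "

# Only the literal space characters are removed here.
only_spaces_removed = row_text.replace(" ", "")
print(repr(only_spaces_removed))

# split() with no argument splits on any run of whitespace; joining with ""
# removes all of it, at the cost of gluing the values together.
all_removed = "".join(row_text.split())
print(all_removed)
```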

answered Mar 20 '21 at 04:27

John Holmes


This removes the literal spaces, but does not remove newlines. OP wanted the content without *any* whitespace, which includes newlines. – Jacob Lee Mar 20 '21 at 04:30
