
How can I save array output as a CSV file? I've tried the csv module, but it did not give me the right output. I want the output to look like the picture below.

output1.html

<div class="side-article txt-article">
    <p><strong></strong> <a href="http://batam.tribunnews.com/tag/polres/" title="Polres"></a> <a href="http://batam.tribunnews.com/tag/bintan/" title="Bintan"></a></p>
    <p><br></p>
    <p><a href="http://batam.tribunnews.com/tag/polres/" title="Polres"></a></p>
    <p><a href="http://batam.tribunnews.com/tag" title="Polres"></a> <a href="http://batam.tribunnews.com/tag/bintan/" title="Bintan"></a></p>
    <br>

I have this code:

import csv
from bs4 import BeautifulSoup
from HTMLParser import HTMLParser

with open('output1.html', 'r') as f:
    html = f.read()
soup = BeautifulSoup(html.strip(), 'html.parser')

for line in html.strip().split('\n'):
    link_words = 0

    line_soup = BeautifulSoup(line.strip(), 'html.parser')
    for link in line_soup.findAll('a'):
        link_words += len(link.text.split())

    # naive way to get words count
    words_count = len(line_soup.text.split())- link_words

    number_tag_p = len(line_soup.find_all('p'))
    number_tag_br = len(line_soup.find_all('br'))
    number_tag_break = number_tag_br + number_tag_p

    #for line in html.strip().split('\n'):
    number_of_starttags = 0
    number_of_endtags = 0


    # create a subclass and override the handler methods
    class MyHTMLParser(HTMLParser):
        def handle_starttag(self, tag, attrs):
            global number_of_starttags
            number_of_starttags += 1

        def handle_endtag(self, tag):
            global number_of_endtags
            number_of_endtags += 1

    # instantiate the parser and feed it some HTML
    parser = MyHTMLParser()
    parser.feed(line.lstrip())
    number_tag = number_of_starttags + number_of_endtags
    #print(number_of_starttags + number_of_endtags)
    CTTD = words_count + link_words + number_tag_break


    # lines with no words get CTTD = 0 (an assignment, not the comparison ==)
    if (words_count + link_words) == 0:
        CTTD = 0

    print ('TC : {0} LTC : {1} TG : {2} P : {3} CTTD : {4}'
           .format(words_count, link_words, number_tag, number_tag_break, CTTD))



res = ('TC : {0} LTC : {1} TG : {2} P : {3} CTTD : {4}'
           .format(words_count, link_words, number_tag, number_tag_break, CTTD))
csvfile = "./output1.csv"

#Assuming res is a flat list
with open(csvfile, "wb") as output:
    writer = csv.writer(output, lineterminator='\n')
    for val in res:
        writer.writerow([val])

#Assuming res is a list of lists
with open(csvfile, "wb") as output:
    writer = csv.writer(output, lineterminator='\n')
    writer.writerows(res)

The output of the algorithm:

TC : 0 LTC : 0 TG : 0 P : 0 CTTD : 0
TC : 0 LTC : 0 TG : 0 P : 0 CTTD : 0
TC : 0 LTC : 0 TG : 1 P : 0 CTTD : 0
TC : 0 LTC : 0 TG : 1 P : 0 CTTD : 0
TC : 15 LTC : 0 TG : 2 P : 0 CTTD : 15

The output CSV:

[image: the resulting CSV]

How can I save the printed output to CSV? Is there a Python library that can do this?

I expected the output to be:

[image: the expected table]

Thank you.

Kim Hyesung
  • In what way did the `csv` module fail? It's the tool for the job. Is it an issue with whitespace or other decorative flourish in the output? No tool that I know of will output the exact table you present, because that table is rendered by a GUI and isn't in a file at all. Were you to simply save a terse comma-separated CSV and then import it into a spreadsheet, you'd get what you show. – tdelaney Nov 12 '16 at 15:16
  • Possible duplicate of [Writing arrays to a csv in columns](http://stackoverflow.com/questions/33268347/writing-arrays-to-a-csv-in-columns) – wuno Nov 12 '16 at 15:40
  • @tdelaney I updated my output. Thanks. – Kim Hyesung Nov 12 '16 at 15:41
  • Where did you get HTMLParser? – Nikaido Nov 12 '16 at 16:09
  • For anyone using Python 3.x, you can get HTMLParser with: `from html.parser import HTMLParser` – Nikaido Nov 12 '16 at 17:00
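Following up on the Python 3 import: the start/end tag counting done with `global` counters in the code above can also be written as an `html.parser.HTMLParser` subclass that keeps its counts on the instance, so no globals need resetting for each line. A minimal sketch; the names `TagCounter` and `count_tags` are hypothetical, not from the question:

```python
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Count start and end tags; the counts live on the instance, not in globals."""

    def __init__(self):
        super().__init__()
        self.start_tags = 0
        self.end_tags = 0

    def handle_starttag(self, tag, attrs):
        self.start_tags += 1

    def handle_endtag(self, tag):
        self.end_tags += 1


def count_tags(line):
    """Return the number of start tags plus end tags in one line of HTML."""
    parser = TagCounter()  # a fresh parser per line, so counts start at 0
    parser.feed(line.lstrip())
    parser.close()
    return parser.start_tags + parser.end_tags


print(count_tags('<p><a href="#" title="Polres"></a></p>'))  # prints 4
```

This would stand in for the `number_of_starttags + number_of_endtags` computation inside the loop.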

2 Answers


Maybe this is what you want:

import csv
from bs4 import BeautifulSoup
from html.parser import HTMLParser

class MyHTMLParser(HTMLParser):
    def handle_starttag(self, tag, attrs):
        global number_of_starttags
        number_of_starttags += 1

    def handle_endtag(self, tag):
        global number_of_endtags
        number_of_endtags += 1

with open('output1.html', 'r') as f:
    html = f.read()
soup = BeautifulSoup(html.strip(), 'html.parser')

ress = []
for line in html.strip().split('\n'):
    link_words = 0

    line_soup = BeautifulSoup(line.strip(), 'html.parser')
    for link in line_soup.findAll('a'):
        link_words += len(link.text.split())

    words_count = len(line_soup.text.split())- link_words
    number_tag_p = len(line_soup.find_all('p'))
    number_tag_br = len(line_soup.find_all('br'))
    number_tag_break = number_tag_br + number_tag_p

    number_of_starttags = 0
    number_of_endtags = 0

    parser = MyHTMLParser()
    parser.feed(line.lstrip())
    number_tag = number_of_starttags + number_of_endtags
    CTTD = words_count + link_words + number_tag_break


    if (words_count + link_words) == 0:
        CTTD == 0
    res = [words_count, link_words, number_tag, number_tag_break, CTTD]
    ress.append(res)

csvfile = "./output.csv"
firstline = ["TC", "LTC", "TG", "P", "CTTD"]
with open(csvfile, "w") as output:
    writer = csv.writer(output, lineterminator='\n')
    writer.writerow(firstline)
    for val in ress:
        writer.writerow(val)
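One thing to watch in the code above: `CTTD == 0` inside the `if` is a comparison whose result is discarded, not an assignment, so CTTD is never reset. That is why rows such as `0,0,8,1,1` in the output below still get CTTD 1. If the intent is CTTD = 0 for lines with no words, a small helper would do it (the name `compute_cttd` is hypothetical):

```python
def compute_cttd(words_count, link_words, number_tag_break):
    """CTTD as computed in the loop, but 0 for lines with no words at all.

    `CTTD == 0` only compares; returning 0 here (or writing `CTTD = 0`)
    is what actually sets the value.
    """
    if words_count + link_words == 0:
        return 0
    return words_count + link_words + number_tag_break


print(compute_cttd(0, 0, 1))   # prints 0
print(compute_cttd(15, 0, 0))  # prints 15
```

In the loop, `CTTD = compute_cttd(words_count, link_words, number_tag_break)` would replace the `CTTD = ...` line and the `if` block.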

Anyway, my output is different from yours. I get this CSV:

TC,LTC,TG,P,CTTD
0,0,1,0,0
0,0,8,1,1
0,0,3,2,2
0,0,4,1,1
0,0,6,1,1
0,0,1,1,1

You got only the last line of values from the `for` loop because your `.format(...)` call is outside the loop, so it runs once, after the last iteration.
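The question's CSV also came out wrong for a second reason: `res` there is one formatted string, and `for val in res` iterates over its characters, so each character lands on its own row. A minimal sketch contrasting that with writing one list per row, using an in-memory `io.StringIO` buffer in place of the output file for illustration:

```python
import csv
import io

# One formatted string, like `res` in the question's code after the loop ends.
res = 'TC : 15 LTC : 0 TG : 2 P : 0 CTTD : 15'

buf = io.StringIO()  # in-memory stand-in for the output file
writer = csv.writer(buf, lineterminator='\n')
for val in res:  # iterating over a str yields one character at a time
    writer.writerow([val])
char_rows = buf.getvalue().splitlines()
print(char_rows[:2])  # prints ['T', 'C']

# One list per row instead: each value becomes its own cell.
buf = io.StringIO()
writer = csv.writer(buf, lineterminator='\n')
writer.writerow(["TC", "LTC", "TG", "P", "CTTD"])
writer.writerow([15, 0, 2, 0, 15])
csv_text = buf.getvalue()
print(csv_text)  # TC,LTC,TG,P,CTTD then 15,0,2,0,15
```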

Nikaido

`writerow` takes a list of elements, which become the values of the cells in one row.

So when writing to a CSV, it is advisable to build your header as a list and all your values as a list of lists:

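import csv

csvfile = "./output.csv"
header = ["TC", "LTC", "TG", "P", "CTTD"]
# one inner list per row, with one value per header column
val = [[1, 2, 3, 4, 5], [2, 3, 4, 5, 6]]
with open(csvfile, "w") as output:
    writer = csv.writer(output, lineterminator='\n')
    writer.writerow(header)
    for v in val:
        writer.writerow(v)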
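The same standard-library module also has `csv.DictWriter`, which writes the header and then rows keyed by column name, so a value cannot silently end up under the wrong column; `writerows` writes all rows in one call. A sketch with example values taken from the question's printed output, again writing to an `io.StringIO` buffer in place of a real file:

```python
import csv
import io

header = ["TC", "LTC", "TG", "P", "CTTD"]
# Example rows: one dict per HTML line, keyed by column name.
rows = [
    {"TC": 0, "LTC": 0, "TG": 1, "P": 0, "CTTD": 0},
    {"TC": 15, "LTC": 0, "TG": 2, "P": 0, "CTTD": 15},
]

# For a real file in Python 3: open("output.csv", "w", newline="")
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=header, lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
csv_text = buf.getvalue()
print(csv_text)
```

A key missing from a row dict is written as an empty cell (the `restval` default), while an unexpected key raises `ValueError`.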
Bhavani Ravi