Edit: Here is the code I am trying to use:

from bs4 import BeautifulSoup
import re
# Matches lecture durations of the form MM:SS, e.g. "03:15"
m = re.compile(r"^\d\d:\d\d$")
with open("C:\\Temp\\LearnPythonTheCompletePythonProgrammingCourse_Udemy.htm", 'r') as f:
    readfile = f.read()
soup = BeautifulSoup(readfile, "html.parser")

ci_details = soup.findAll("span",{"class":"ci-details"})

timeList = []
for detail in ci_details:
    for span in detail.findAll("span"):
        if m.match(span.text):
            timeList.append(span.text)


print (timeList)

for i in timeList:
    time1=timeList[0]
    print(time1)

Edit: Looking this over, I realized that I am telling Python to print time1 (always the first entry) once for every item in timeList. How do I iterate over timeList?

I want to use dstudeba's code to take each entry in the list, convert it to raw seconds, and add them up. Once that is done, I will convert the total to h:m:s. Where did I go wrong with my for loop?

Dave
  • Why download the time instead of reading the system time? – Steve Robillard Dec 17 '15 at 19:56
  • It looks like the time is the length of the lecture, judging by the HTML – wpercy Dec 17 '15 at 19:56
  • @wilbur that's what I thought, but that raises the question of why do this with screen scraping instead of accessing the source via a DB or API. – Steve Robillard Dec 17 '15 at 19:57
  • Have you tried [this?](http://www.crummy.com/software/BeautifulSoup/bs4/doc/) – Bob Dylan Dec 17 '15 at 19:57
  • @SteveRobillard maybe there is no api? I'm not sure - sometimes doing things with bs4 can be easier than working with a convoluted api. – wpercy Dec 17 '15 at 19:59
  • @wilbur i don't disagree just trying to help the OP come up with the best solution – Steve Robillard Dec 17 '15 at 20:01
  • Possible duplicate: [can we use xpath with BeautifulSoup?](http://stackoverflow.com/questions/11465555/can-we-use-xpath-with-beautifulsoup). The accepted answer suggests you use `lxml` and its `xpath` functionality. I think that would be the best solution for you. – tdelaney Dec 17 '15 at 20:02
  • You stated that your current code is able to extract the time from the snippet. Does that code already make use of BeautifulSoup or is that the question? – DJanssens Dec 17 '15 at 20:09
  • Steve: Wilbur is correct. I get some credit at work for learning done away from work. Again, I am a total newbie to programming, so I would not know how to put this in a DB except as a huge glob. Nor do I know of any APIs or even where to look. – Dave Dec 17 '15 at 21:49
  • tdelaney: I will check into your suggestion. – Dave Dec 17 '15 at 21:49
  • DJanssens: I am using BeautifulSoup already. My question is now how to loop over the web page to get the lecture times. HTH? – Dave Dec 17 '15 at 21:50
  • Use `find_all` to get all the `span` tags that match `` and then do a `for` loop over those, calling `.text` on each tag to get the time value. – dstudeba Dec 18 '15 at 01:28
  • dstudeba: I have that piece already, thanks. My issue is that I am not sure how to do the for loop: timeList = [[span for span in detail.findAll("span") if m.match(span.text) ] for detail in ci_details] ..... Should I do something like "for i in timeList" and then parse all the times found into a list? Then process that list for the time conversion? – Dave Dec 18 '15 at 15:22
  • See my previous reply to dstudeba: the suggested solution does not work for me, as I need help determining which kind of loop to use and how to implement it. – Dave Dec 20 '15 at 14:58

1 Answer

It would be easier if you showed your code, but you should be able to figure it out from this:

totalTime = 0
spans = soup.find_all('span', {"class": "ci-details"})
for span in spans:
    rawTime = span.text
    # DavesTimeFunction is a placeholder for your own conversion to seconds
    processedTime = DavesTimeFunction(rawTime)
    totalTime += processedTime

print("The total time is: " + str(totalTime))
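
As a rough sketch of how the placeholder conversion step might be filled in, assuming every entry is an "MM:SS" string like the ones the question's regex matches: the names `to_seconds`, `format_hms`, and `time_list` are hypothetical, and the sample values are made up for illustration rather than taken from the real page.

```python
def to_seconds(mmss):
    """Convert an "MM:SS" string into a number of seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def format_hms(total_seconds):
    """Format a number of seconds as h:mm:ss."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return "{}:{:02d}:{:02d}".format(hours, minutes, seconds)


# Hypothetical sample data standing in for the scraped timeList
time_list = ["03:15", "12:40", "59:59"]

total = 0
for t in time_list:
    # Use the loop variable t, not time_list[0], so each entry is visited once
    total += to_seconds(t)

print(format_hms(total))
```

The key difference from the loop in the question is that the body uses the loop variable itself; indexing `timeList[0]` inside the loop repeats the first entry once per item.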
dstudeba