I need to extract several fields from news articles, and I have automated most of them except the published date. Currently, I manually visit each website, inspect the HTML tag surrounding the published date, write a jQuery selector for it, and implement the same selector in pyquery. I want to remove this manual step as well and write a generic web scraper for news websites such as the NY Times. The closest I can think of is writing a lot of regexes that match date and time formats in the article's DOM, but I can't figure out how to differentiate between the actual published date and any other date that may appear in the article body itself. I researched this and found that both Google and DuckDuckGo show a timestamp for the article in their search results, so it must be possible to implement this.
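To illustrate the regex idea and where it breaks down, here is a minimal sketch. The sample markup, the `byline-date` class, and the `find_date_candidates` helper are all made up for illustration, and the pattern only covers ISO-style `YYYY-MM-DD` dates; real pages would need many more patterns.

```python
import re

# Hypothetical sample markup; real article pages vary widely.
SAMPLE_HTML = """
<article>
  <span class="byline-date">Published: 2015-03-14</span>
  <p>The treaty was originally signed on 2001-09-30 and amended on 2010-01-05.</p>
</article>
"""

# Matches ISO-style dates only (YYYY-MM-DD); other formats need more patterns.
ISO_DATE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")


def find_date_candidates(html):
    """Return every ISO-style date string found in the markup, in document order."""
    return [match.group(0) for match in ISO_DATE.finditer(html)]


candidates = find_date_candidates(SAMPLE_HTML)
print(candidates)  # ['2015-03-14', '2001-09-30', '2010-01-05']
```

The regex finds all three dates, but nothing in the match itself says that the first one is the published date and the other two belong to the article body. That ambiguity is exactly what I don't know how to resolve in a generic way.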
Edit: The wording of my question was not very clear, so to restate it: is there a way to scrape the published date from any news article automatically, i.e. a generic crawler that can extract the published date from blog posts or news articles?