I have been a frequent reader of the tech news aggregator site Hacker News. It is basically a collection of links and discussions about tech startup news, or tech news in general that would appeal to the startup hacker.
When someone wants to showcase their weekend project, they usually post a "Show HN" post on Hacker News. As a way of increasing their visibility, I have filtered all the Hacker News posts for "Show HN", screen-grabbed each webpage, and voila, showhn.blogspot.com is born.
I have automated the collection of data via a Google Apps Script running in a Google Spreadsheet. It uses the hnsearch API to grab the data.
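The heart of the script is just filtering fetched items for the "Show HN" prefix. Here is a minimal sketch of that filtering step in plain JavaScript; the sample data and the field names (`title`, `url`) are assumptions for illustration, not the exact hnsearch response schema, and in the real spreadsheet script the items would come from an HTTP fetch rather than a hard-coded array:

```javascript
// Sketch of the "Show HN" filter. The item shape here is assumed;
// the actual Apps Script fetches results from the hnsearch API and
// writes matching rows into the spreadsheet.
function filterShowHN(items) {
  return items.filter(function (item) {
    // Match titles starting with "Show HN", case-insensitively.
    return /^show hn\b/i.test(item.title);
  });
}

// Hypothetical sample data standing in for an API response.
var sample = [
  { title: "Show HN: My weekend project", url: "http://example.com/a" },
  { title: "Some other tech news", url: "http://example.com/b" },
  { title: "show hn: lowercase variant", url: "http://example.com/c" }
];

var hits = filterShowHN(sample);
hits.forEach(function (item) {
  console.log(item.title + " -> " + item.url);
});
```

The same filter logic drops straight into an Apps Script, where the loop would append each hit as a row in the sheet instead of logging it.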
The hnsearch API works directly with the Hacker News database. There are a couple of other Hacker News APIs that instead perform HTML scrapes of the Hacker News website and store the results in their own internal databases, exposing an API to access the scraped data.
The screen capture is performed using a Chrome screen capture plugin, specifically Google's own Screen Capture extension for Chrome.
At the moment, I am treating this as a learning tool. I am especially keen on getting a handle on how data can be obtained from websites, and how that data can be manipulated. So far it has been quite an enlightening experience.