Modern Day Screen Scraping


Bill Humphries is writing about creating RSS feeds by screen scraping. He's using curl to get the page, tidy to clean up the HTML, and an XSLT stylesheet to convert the result into RSS. Because the page in his example makes good use of CSS, the elements he wants are already labeled in the markup, so he can use XPath to easily grab the right nodes in the HTML doc. Very different from the Perl screen scrapers we were writing four years ago.
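
To make the idea concrete, here is a minimal sketch of the last step in Python. It is not Bill's code: he uses an XSLT stylesheet, while this swaps in the standard library's ElementTree, which supports only a small subset of XPath. The sample markup, the class names `entry` and `summary`, the URLs, and the function name `xhtml_to_rss` are all made up for illustration. It also assumes the page has already been fetched and tidied into well-formed XHTML without a namespace declaration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, already-tidied XHTML. In the pipeline described above this
# would be the output of curl piped through tidy. The class names are
# assumptions made for this example.
SAMPLE_XHTML = """<html>
  <body>
    <div class="entry">
      <h2><a href="http://example.com/posts/1">First post</a></h2>
      <p class="summary">Summary of the first post.</p>
    </div>
    <div class="entry">
      <h2><a href="http://example.com/posts/2">Second post</a></h2>
      <p class="summary">Summary of the second post.</p>
    </div>
    <div class="sidebar"><p>Not part of the feed.</p></div>
  </body>
</html>"""


def xhtml_to_rss(xhtml, feed_title, feed_link):
    """Build an RSS 2.0 document from the div.entry blocks in the page."""
    page = ET.fromstring(xhtml)

    rss = ET.Element("rss", {"version": "2.0"})
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = feed_link
    ET.SubElement(channel, "description").text = "Scraped from " + feed_link

    # The CSS class doubles as a hook for the XPath predicate.
    for entry in page.findall(".//div[@class='entry']"):
        anchor = entry.find("./h2/a")
        summary = entry.find("./p[@class='summary']")
        if anchor is None:
            continue  # skip entries without a headline link

        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = anchor.text
        ET.SubElement(item, "link").text = anchor.get("href")
        if summary is not None:
            ET.SubElement(item, "description").text = summary.text

    return ET.tostring(rss, encoding="unicode")


rss_xml = xhtml_to_rss(SAMPLE_XHTML, "Example feed", "http://example.com/")
```

The sidebar div never reaches the feed because the XPath predicate only matches the `entry` class. That is the whole trick: when the page's author has labeled the markup for styling, the scraper gets its selectors for free.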


Last modified: Thu Oct 10 12:47:20 2019.