Web scraping

Besides my qualifying exam and regular lab work, I have been getting interested in web scraping lately.

In both of my favorite scripting languages, R and Python (also the only ones I am fairly competent in), I am trying out what seem to be the most popular packages: rvest and BeautifulSoup.

To get all the titles from the CNN US news feed for the day:

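#!/usr/bin/env python3

from urllib.request import urlopen
from bs4 import BeautifulSoup

url = 'http://rss.cnn.com/rss/cnn_us.rss'
# Download the RSS feed
html = urlopen(url)
# Name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(html, 'html.parser')
# Print the text of every <title> tag in the feed
for line in soup.find_all('title'):
    print(line.get_text())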
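
Since an RSS feed is just XML, the same title extraction can also be tried offline with the standard library's xml.etree.ElementTree instead of BeautifulSoup. This is only a rough sketch: the sample feed string and the helper name extract_titles are made up for illustration and are not taken from CNN's actual feed.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a downloaded RSS document (not real CNN data).
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item><title>First headline</title></item>
    <item><title>Second headline</title></item>
  </channel>
</rss>"""


def extract_titles(rss_text):
    """Return the text of every <title> element, in document order."""
    root = ET.fromstring(rss_text)
    # iter() walks the whole tree, so the channel title comes first,
    # followed by the title of each item.
    return [el.text.strip() for el in root.iter('title') if el.text]


if __name__ == '__main__':
    for title in extract_titles(SAMPLE_RSS):
        print(title)
```

Keeping the parsing in a small function makes it easy to swap the sample string for the text of a real download later.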

Still exploring...

This work is licensed under a Creative Commons Attribution 4.0 International License.

Douglas C. Wu
