Parsing HTML with BeautifulSoup
You can find documentation on BeautifulSoup here:
http://www.crummy.com/software/BeautifulSoup/bs4/doc/
Hint -- Read both "Part 1" and "Part 2" before starting work. That
might enable you to avoid some re-factoring.
Part 1
Read an HTML file and parse it with BeautifulSoup. Then do each of
the following:
- Print the title of the document.
- Print the text of each "<p>" element in the document.
- For each "<a>" element in the document, print (1) the value of its
"href" attribute and (2) the text in the element.
- Walk the document tree. Write a recursive function to do the
walk. For each element (node) in the document, print out the name
(tag) and the attributes.
- Add a footer to the document. For example, add an "<hr/>" element
and a "<p>some text</p>" element. Save the modified document to a
file.
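The first three tasks above (title, paragraph text, links) can be
sketched as follows. This is a minimal sketch, not a reference
solution. It assumes the "bs4" package is installed and uses the
built-in "html.parser" parser. SAMPLE_HTML is a made-up stand-in for
the input file, and the helper names are hypothetical.

```python
from bs4 import BeautifulSoup

# Made-up sample standing in for the contents of the input file.
SAMPLE_HTML = """<html><head><title>Sample page</title></head>
<body>
<p>First paragraph.</p>
<p>Second, with a <a href="http://example.com/">link</a>.</p>
</body></html>"""


def get_title(soup):
    """Return the document title text, or None if there is no <title>."""
    return soup.title.get_text() if soup.title else None


def get_paragraph_texts(soup):
    """Return the text of each <p> element, in document order."""
    return [p.get_text() for p in soup.find_all("p")]


def get_links(soup):
    """Return an (href, text) pair for each <a> element."""
    # a.get("href") returns None when the attribute is missing.
    return [(a.get("href"), a.get_text()) for a in soup.find_all("a")]


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
print(get_title(soup))
for text in get_paragraph_texts(soup):
    print(text)
for href, text in get_links(soup):
    print(href, text)
```

To work from a file instead of the inline sample, open the file and
pass the file object to BeautifulSoup in place of SAMPLE_HTML.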
You can use this HTML file (html_beautifulsoup.html) or any
other HTML file for input data.
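The tree walk and the footer task from Part 1 could look roughly like
this. It is a sketch under assumptions: the function names (walk,
add_footer, process_file) are hypothetical, the input is assumed to be
UTF-8 and to contain a "<body>" element, and the output path is
whatever you choose.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag


def walk(node, depth=0):
    """Recursively print the name and attributes of each element."""
    for child in node.children:
        if isinstance(child, Tag):  # skip text nodes and comments
            print("  " * depth + child.name, child.attrs)
            walk(child, depth + 1)


def add_footer(soup, text):
    """Append an <hr/> and a <p>text</p> to the end of <body>.

    Assumes the document has a <body> element.
    """
    hr = soup.new_tag("hr")
    para = soup.new_tag("p")
    para.string = text
    soup.body.append(hr)
    soup.body.append(para)


def process_file(in_path, out_path):
    """Parse in_path, walk it, add a footer, and save to out_path."""
    with open(in_path, encoding="utf-8") as infile:
        soup = BeautifulSoup(infile, "html.parser")
    walk(soup)
    add_footer(soup, "some text")
    with open(out_path, "w", encoding="utf-8") as outfile:
        outfile.write(str(soup))
    return soup
```

For example, process_file("html_beautifulsoup.html", "modified.html")
would write the modified document to "modified.html" (a made-up output
name). Writing str(soup) keeps the markup compact; soup.prettify() is
an alternative if indented output is preferred.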
Part 2
Use the "urllib" module from the Python standard library
("urllib.request" in Python 3) to download a Web page, then do each of
the above tasks.
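One way to approach the download step, assuming Python 3 and its
"urllib.request" submodule. The function name fetch_soup and the
10-second timeout are illustrative choices, not part of the exercise.

```python
from urllib.request import urlopen

from bs4 import BeautifulSoup


def fetch_soup(url, timeout=10):
    """Download url and return the parsed document."""
    with urlopen(url, timeout=timeout) as response:
        raw = response.read()  # bytes; BeautifulSoup detects the encoding
    return BeautifulSoup(raw, "html.parser")
```

A call such as fetch_soup("http://example.com/") (a placeholder URL)
returns the same kind of object that parsing a local file does, so
functions written for Part 1 that accept a parsed document can be
reused unchanged. That is the re-factoring the hint refers to.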