Tuesday, August 2, 2011

Parse HTML To Extract Certain Values With Java

Problem:

I needed to parse an HTML page and extract some values from it.  I tried JTidy, dom4j, JDOM, and the JDK's built-in parser.  They gave me errors and refused to work, and they were overly complicated for what I wanted to do.  I wanted something simple that would parse some HTML and let me get values out of it by querying the elements.

Solution:

I found what I was looking for:  Jsoup!  http://jsoup.org/   It's exactly what I wanted.  It can even connect to a web site and give me the parsed Document back:

Document doc = Jsoup.connect("some url here").get();
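Here is a rough sketch of the "queries on the elements" part, since that is what I was really after. It assumes the jsoup jar is on the classpath. The class name JsoupExample, the method extractLinks, the sample HTML, and the example.com base URL are all made up for illustration. It parses a string instead of connecting to a site, so it runs without a network.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

// Hypothetical example class; only the org.jsoup calls come from the library.
public class JsoupExample {

    // Collects the absolute URL of every <a href="..."> in the given HTML.
    // baseUri is what jsoup uses to resolve relative links.
    static List<String> extractLinks(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri);
        List<String> links = new ArrayList<>();
        for (Element a : doc.select("a[href]")) {   // CSS-style selector
            links.add(a.absUrl("href"));
        }
        return links;
    }

    public static void main(String[] args) {
        // Made-up sample page, just to have something to query.
        String html = "<html><head><title>Demo</title></head>"
                + "<body><p class=\"price\">42</p>"
                + "<a href=\"/about\">About</a></body></html>";

        Document doc = Jsoup.parse(html, "http://example.com/");

        // Page title and the text of the first element matching a selector.
        System.out.println(doc.title());
        System.out.println(doc.select("p.price").first().text());

        // All links on the page, resolved against the base URL.
        System.out.println(extractLinks(html, "http://example.com/"));
    }
}
```

The same select calls should work on a Document that came from Jsoup.connect(...).get(), since both give back the same Document type.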

I must say that I really like it!


