Quick Look – Html Agility Pack
Software projects often come down to transforming data from one format to another.
A common situation involves extracting some data from HTML. This can happen when transferring data from a legacy system, and in that case screen scraping is often the most practical approach.
There are a variety of options available when screen scraping, but a really good one is the Html Agility Pack.
The Html Agility Pack is an open-source project that makes parsing and interacting with HTML easier. To install it, add the NuGet package with the .NET CLI:
dotnet add package HtmlAgilityPack
To download a web page, you can use the library's HtmlWeb class, whose Load method fetches a URL and returns a parsed HtmlDocument.
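A minimal sketch of the download step follows, assuming the HtmlAgilityPack package from the install step is referenced. The PageLoader class and its method names are hypothetical; HtmlWeb.Load and HtmlDocument.LoadHtml are the library's own calls. The second method covers the case where you already have the HTML as a string, such as pages saved out of a legacy system.

```csharp
using HtmlAgilityPack;

// Hypothetical helper class, named for illustration only.
public static class PageLoader
{
    // Downloads the page at `url` and parses it. HtmlWeb issues the HTTP GET
    // and hands the response body to the HTML parser in one call.
    public static HtmlDocument Download(string url)
    {
        var web = new HtmlWeb();
        return web.Load(url);
    }

    // Parses HTML you already have as a string (for example, pages exported
    // from a legacy system). No network access is involved.
    public static HtmlDocument FromString(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        return doc;
    }
}
```

For example, PageLoader.Download("https://example.com/") returns a document whose DocumentNode you can then query. Keeping the string-based loader separate also makes the parsing code easy to test without network access.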
Once a page is downloaded, the Html Agility Pack provides plenty of ways to parse the content. For example, an XPath query passed to SelectNodes can find all td elements that have a particular class.
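The sketch below shows one way to do that. The CellExtractor class, its GetCellText method, and the class name "price" are hypothetical and chosen only for illustration. A plain //td[@class='price'] query only matches when the attribute is exactly "price"; the concat/normalize-space form used here also matches cells that carry several classes, such as class="price total".

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper class, named for illustration only.
public static class CellExtractor
{
    // Returns the trimmed, entity-decoded text of every <td> whose class list
    // contains `className` as a whole word, so "price total" matches "price"
    // but "priceless" does not.
    // Assumes `className` contains no quote characters, since it is spliced
    // directly into the XPath string.
    public static List<string> GetCellText(HtmlDocument doc, string className)
    {
        string xpath = "//td[contains(concat(' ', normalize-space(@class), ' '), ' "
                       + className + " ')]";
        HtmlNodeCollection cells = doc.DocumentNode.SelectNodes(xpath);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        if (cells == null)
            return new List<string>();

        // DeEntitize turns entities such as &amp; into plain characters.
        return cells
            .Select(td => HtmlEntity.DeEntitize(td.InnerText).Trim())
            .ToList();
    }
}
```

Because the helper takes an HtmlDocument, it works the same whether the page came from HtmlWeb.Load or from a string passed to LoadHtml.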
Using Html Agility Pack is pretty straightforward. Give it a try the next time you need to parse some HTML.