Docstrings

Harbest.read_html (Function)

Returns a parsed HTML document from a URL.

read_html(url::String)

Input:

  • url::String

Output:

HTMLDocument

Harbest.html_elements (Function)

Returns the HTML elements that match the element you indicate.

html_elements(html, string)

Input:

  • html – An HTMLDocument, an HTMLElement, or a Vector{HTMLNode}
  • string – The element to find in the HTML. It can be a String or a Vector{String}; in the latter case, the function is applied once per entry, in sequence.

Output:

The HTML reduced to the element that you indicated
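
As a rough illustration of the two forms of the string argument, here is a minimal sketch that uses only the signatures documented on this page. The URL and the element names are placeholders chosen for the example, not part of the package, and the exact return type is not stated above, so none is asserted here.

```julia
using Harbest

# Placeholder URL, used only for illustration.
doc = read_html("https://example.com")

# A single String: look for one kind of element.
paragraphs = html_elements(doc, "p")

# A Vector{String}: per the description above, the function is
# applied in sequence, once for each entry.
links_in_paragraphs = html_elements(doc, ["p", "a"])
```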

Harbest.html_attrs (Function)

Returns an attribute of the given HTML.

html_attrs(html, string)

Input:

  • html – An HTMLDocument, an HTMLElement, or a Vector{HTMLNode}
  • string::String (optional) – The attribute to return. If it is not provided, the function tries to return a list of the available attributes.

Output:

The indicated attribute, or a list of the available attributes
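
The sketch below shows both ways of calling html_attrs described above, with and without the optional attribute name. It assumes that the result of html_elements is an accepted input for html_attrs, which the page implies but does not state outright. The URL and the "href" attribute are placeholders for the example.

```julia
using Harbest

# Placeholder URL, used only for illustration.
doc = read_html("https://example.com")

# Assumption: the elements found here are a valid input for html_attrs.
links = html_elements(doc, "a")

# With an attribute name: ask for that attribute.
hrefs = html_attrs(links, "href")

# Without an attribute name: the function tries to return
# a list of the available attributes.
available = html_attrs(links)
```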

Harbest.html_table (Function)

Converts HTML into a DataFrame. This works only when the HTML contains a clearly structured table.

html_table(table_html)

Input:

  • table_html – Vector{HTMLNode}

Output:

A DataFrame
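
A possible way to obtain the table_html input is to select the table with html_elements first. This is a sketch under assumptions: the URL is hypothetical and stands for any page with a single, well-formed table, and it assumes that html_elements returns the Vector{HTMLNode} that html_table expects.

```julia
using Harbest

# Hypothetical URL standing in for a page with one clear HTML table.
doc = read_html("https://example.com/page-with-table")

# Assumption: this selection yields the Vector{HTMLNode}
# that html_table takes as input.
table_html = html_elements(doc, "table")

# Convert the selected table into a DataFrame.
df = html_table(table_html)
```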

Harbest.html_text3 (Function)

Returns the text of the given HTML.

html_text3(html)

Input:

  • html – HTMLDocument, HTMLElement or Vector{HTMLNode}

Output:

A single String or a Vector{String} depending on the input

If you need to keep whitespace and other details, you can use html_text or html_text2 instead.
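
To illustrate how the output depends on the input, the following sketch calls html_text3 on a whole document and on a selection of elements. The URL and the "p" element are placeholders, and which input produces a single String versus a Vector{String} is inferred from the description above rather than confirmed.

```julia
using Harbest

# Placeholder URL, used only for illustration.
doc = read_html("https://example.com")

# Text of the whole document.
page_text = html_text3(doc)

# Text of a selection of elements; per the description above,
# the result may be a Vector{String} for this kind of input.
paragraph_texts = html_text3(html_elements(doc, "p"))
```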
