
-debug-parser  Let html2text report on the tokens being shifted, rules being applied, etc., while scanning the HTML document. This option is for diagnostic purposes.

-debug-scanner  Let html2text report on each lexical token scanned, while scanning the HTML document. This option is for diagnostic purposes.

-nobs  By default, html2text renders underlined letters with sequences like "underscore-backspace-character" and boldface letters like "character-backspace-character".
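
The backspace sequences mentioned for underlined and boldface letters follow the classic overstrike convention used by terminal pagers. The following is a minimal Python sketch of that convention, not html2text's actual code; the function names are hypothetical, and applying the sequence to every character (including spaces) is an assumption.

```python
import re

BACKSPACE = "\b"  # ASCII 0x08


def underline(text: str) -> str:
    """Render each character as underscore-backspace-character."""
    return "".join("_" + BACKSPACE + ch for ch in text)


def bold(text: str) -> str:
    """Render each character as character-backspace-character."""
    return "".join(ch + BACKSPACE + ch for ch in text)


def strip_overstrike(text: str) -> str:
    """Drop every 'character-backspace' pair, leaving plain text."""
    # \x08 is the backspace character; r"\b" would mean a word boundary.
    return re.sub(r".\x08", "", text, flags=re.DOTALL)


if __name__ == "__main__":
    print(repr(underline("Hi")))
    print(repr(bold("Hi")))
    print(strip_overstrike(bold("Hi")))
```

Stripping the pairs recovers the plain text, which is useful when the output is sent to a program that does not interpret backspaces.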
-check  This option is for diagnostic purposes: The HTML document is only parsed and not processed otherwise. In this mode of operation, html2text will report on parse errors and scan errors, which it does not do in other modes of operation. Note that parse and scan errors are not fatal for html2text, but may cause misinterpretation of the HTML code and/or portions of the document being swallowed.

-ascii  When this option is specified, plain ASCII is used for the output instead of the default ISO 8859-1. To find out how non-ASCII characters are rendered, refer to the file "ascii.substitutes".
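
The difference between the default ISO 8859-1 output and plain ASCII output can be illustrated with a short Python sketch. The substitution table below is purely hypothetical and much smaller than anything a real converter would need; the substitutes html2text actually uses are the ones listed in "ascii.substitutes". The function name and the "?" fallback are also assumptions.

```python
# Hypothetical substitutes, for illustration only; the real table is
# the one in the file "ascii.substitutes".
SUBSTITUTES = {
    "\u00e4": "ae",   # a with diaeresis
    "\u00f6": "oe",   # o with diaeresis
    "\u00fc": "ue",   # u with diaeresis
    "\u00df": "ss",   # sharp s
    "\u00e9": "e",    # e with acute accent
    "\u00a9": "(C)",  # copyright sign
}


def to_output_bytes(text: str, ascii_only: bool = False) -> bytes:
    """Encode text as ISO 8859-1, or as plain ASCII with substitutes."""
    if not ascii_only:
        # Characters outside ISO 8859-1 become "?" in this sketch.
        return text.encode("iso-8859-1", errors="replace")
    pieces = []
    for ch in text:
        if ord(ch) < 128:
            pieces.append(ch)
        else:
            # Unknown non-ASCII characters fall back to "?" (assumption).
            pieces.append(SUBSTITUTES.get(ch, "?"))
    return "".join(pieces).encode("ascii")


if __name__ == "__main__":
    word = "Gr\u00fc\u00dfe"
    print(to_output_bytes(word))
    print(to_output_bytes(word, ascii_only=True))
```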

html2text reads HTML documents from the input-urls, formats each of them into a stream of plain text characters, and writes the result to standard output (or into output-file, if the -o command line option is used).

Documents that are specified by a URL (RFC 1738) that begins with "http:" are retrieved with the Hypertext Transfer Protocol (RFC 1945). URLs that begin with "file:" and URLs that do not contain a colon specify local files.

If no input-urls are specified on the command line, html2text reads from standard input. A dash as the input-url is an alternate way to specify standard input.

html2text understands all HTML 3.2 constructs, but can render only part of them due to the limitations of the text output format. The program attempts to provide good substitutes for the elements it cannot render. html2text parses HTML 4 input, too, but not always as successfully. It also accepts syntactically incorrect input, and attempts to interpret it "reasonably".

The way html2text formats the HTML documents is controlled by formatting properties read from an RC file. html2text attempts to read $HOME/.html2textrc (or the file specified by the -rcfile command line option); if that file cannot be read, html2text attempts to read a fallback RC file. If no RC file can be read (or if the RC file does not override all formatting properties), then "reasonable" defaults are assumed. The RC file format is described in the html2textrc(5) manual page.

By default, html2text uses ISO 8859-1 for the output.
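
The rules for interpreting input-urls can be summarized in a few lines of Python. This is a sketch of the rules as stated in this page, not html2text's own implementation: the function names and the returned labels are hypothetical, the prefix match is assumed to be case-sensitive, and treating any other URL scheme as "unsupported" is an assumption.

```python
def classify_input(input_url: str) -> str:
    """Return how an input-url would be read, per the rules above."""
    if input_url == "-":
        return "stdin"        # a dash means standard input
    if input_url.startswith("http:"):
        return "http"         # retrieved with HTTP
    if input_url.startswith("file:") or ":" not in input_url:
        return "local-file"   # "file:" URLs and colon-free names
    return "unsupported"      # assumption: anything else is rejected


def resolve_inputs(input_urls: list[str]) -> list[tuple[str, str]]:
    """Pair each input-url with its source kind; default to stdin."""
    if not input_urls:
        return [("-", "stdin")]
    return [(url, classify_input(url)) for url in input_urls]


if __name__ == "__main__":
    for item in resolve_inputs(["index.html", "http://example.com/", "-"]):
        print(item)
```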
