Wednesday, March 4, 2020

What is lxml?

lxml is a parser for XML and HTML. It binds the C libraries libxml2 and libxslt to Python and should work with Python versions 2.7 through 3.8.
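
One way to see this binding is to ask lxml which versions of the two C libraries it is using. This is only a minimal sketch that assumes lxml is already installed. The helper name library_versions is my own and not part of lxml. If lxml cannot be imported, it returns None instead of failing.

```python
def library_versions():
    """Return the lxml, libxml2 and libxslt versions, or None if lxml is missing."""
    try:
        from lxml import etree
    except ImportError:
        return None  # lxml is not installed for this interpreter
    return {
        "lxml": etree.LXML_VERSION,        # version tuple of lxml itself
        "libxml2": etree.LIBXML_VERSION,   # version tuple of the libxml2 in use
        "libxslt": etree.LIBXSLT_VERSION,  # version tuple of the libxslt in use
    }


if __name__ == "__main__":
    print(library_versions())
```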

I find lxml better than ElementTree, which I used with Python 3.7:

https://hodentek.blogspot.com/2018/07/parsing-xml-with-python-37.html

You can download the complete documentation as a PDF from here:
https://lxml.de/lxmldoc-4.5.0.pdf

Information on HTML parsing may be available from the lxml site here:
https://lxml.de/index.html#download

As a parser, it builds a data structure (a parse tree) from the given XML. There are many programs that parse XML, but this one is Python-specific.
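
To make the idea of a parse tree concrete, here is a small sketch that parses an XML string and walks the resulting tree. The sample XML and the function name titles_by_id are made up for this example. The import falls back to the standard library's ElementTree when lxml is not installed, since both offer the calls used here.

```python
try:
    from lxml import etree  # preferred, if lxml is installed
except ImportError:
    import xml.etree.ElementTree as etree  # standard-library fallback

# Hypothetical sample document, made up for this example.
SAMPLE = (
    "<library>"
    "<book id='1'><title>Dune</title></book>"
    "<book id='2'><title>Emma</title></book>"
    "</library>"
)


def titles_by_id(xml_text):
    """Parse the XML and map each book's id attribute to its title text."""
    root = etree.fromstring(xml_text)  # root element of the parse tree
    return {book.get("id"): book.findtext("title") for book in root.iter("book")}


if __name__ == "__main__":
    print(titles_by_id(SAMPLE))
```

The root element is the top of the tree, and each book element is a child node with its own attributes and children.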

How do you install it?

Get lxml from here:

http://pypi.python.org/pypi/lxml/ 

To verify the download, use the following public key:

https://lxml.de/3.8/pubkey.asc 

I have Python 3.8. Note that 3.8.0 below is an lxml release number, not a Python version. The lxml 3.8.0 source is here:

https://lxml.de/files/lxml-3.8.0.tgz
Documentation here:
https://lxml.de/3.8/lxmldoc-3.8.0.pdf

If you are using Windows 10 and already have Python 3.8, use the following steps:


Step 1: Verify you can run 'pip' as shown here:
-------
Microsoft Windows [Version 10.0.18362.657]
(c) 2019 Microsoft Corporation. All rights reserved.

C:\WINDOWS\system32>pip

Usage:
  C:\Users\Owner\AppData\Local\Microsoft\WindowsApps\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\python.exe -m pip <command> [options]

Commands:
  install                     Install packages.
  download                    Download packages.
  uninstall                   Uninstall packages.
  freeze                      Output installed packages in requirements format.
  list                        List installed packages.
  show                        Show information about installed packages.
  check                       Verify installed packages have compatible dependencies.
  config                      Manage local and global configuration.
  search                      Search PyPI for packages.
  wheel                       Build wheels from your requirements.
  hash                        Compute hashes of package archives.
  completion                  A helper command used for command completion.
  debug                       Show information useful for debugging.
  help                        Show help for commands.

General Options:
  -h, --help                  Show help.
  --isolated                  Run pip in an isolated mode, ignoring environment variables and user configuration.
  -v, --verbose               Give more output. Option is additive, and can be used up to 3 times.
  -V, --version               Show version and exit.
  -q, --quiet                 Give less output. Option is additive, and can be used up to 3 times (corresponding to
                              WARNING, ERROR, and CRITICAL logging levels).
  --log <path>                Path to a verbose appending log.
  --proxy <proxy>             Specify a proxy in the form [user:passwd@]proxy.server:port.
  --retries <retries>         Maximum number of retries each connection should attempt (default 5 times).
  --timeout <sec>             Set the socket timeout (default 15 seconds).
  --exists-action <action>    Default action when a path already exists: (s)witch, (i)gnore, (w)ipe, (b)ackup,
                              (a)bort.
  --trusted-host <hostname>   Mark this host as trusted, even though it does not have valid or any HTTPS.
  --cert <path>               Path to alternate CA bundle.
  --client-cert <path>        Path to SSL client certificate, a single file containing the private key and the
                              certificate in PEM format.
  --cache-dir <dir>           Store the cache data in <dir>.
  --no-cache-dir              Disable the cache.
  --disable-pip-version-check
                              Don't periodically check PyPI to determine whether a new version of pip is available for
                              download. Implied with --no-index.
  --no-color                  Suppress colored output
---------------------
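
Notice that the Usage line in the output above points at a Python 3.7 executable. If you have more than one Python installed, it helps to check which interpreter pip belongs to before installing anything. The following is a rough sketch of that check from Python itself. The function name pip_version_line is my own, and it returns None when pip cannot be run.

```python
import subprocess
import sys


def pip_version_line():
    """Return the output of 'python -m pip --version' for this interpreter, or None."""
    try:
        result = subprocess.run(
            [sys.executable, "-m", "pip", "--version"],
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None  # pip is missing or could not be started
    return result.stdout.strip()


if __name__ == "__main__":
    print(pip_version_line())
```

The line it prints includes the pip version, the folder pip is installed in, and the Python version it is tied to.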

Step 2: Use this command to install lxml: