GSoC status update, week 0.5
Common-lisp.net's Trac site has been down for the last few days, which made the work a bit harder. Nevertheless, the first real code has been written, and identifier normalization seems to work. I introduced another dependency, Puri – a portable version of Allegro's net.uri. Identifiers will be represented internally by Puri URI objects; for XRI, if I get to it (XRI support is optional), I will probably subclass URI so that only minimal API changes are needed.
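For readers unfamiliar with the normalization step, here is a rough sketch of what it involves. It is written in Python rather than Common Lisp and is not the code from the commit; the function name `normalize_identifier` and the returned `(kind, identifier)` pair are made up for illustration. It follows my reading of the OpenID 2.0 rules: strip an `xri://` prefix, treat input that starts with an XRI global context symbol as an XRI, and otherwise treat it as a URL, adding `http://` when no scheme is given and dropping any fragment.

```python
from urllib.parse import urlsplit, urlunsplit

# Leading characters that mark an XRI rather than a URL.
XRI_LEADING_CHARACTERS = "=@+$!("


def normalize_identifier(user_input):
    """Return a (kind, identifier) pair, where kind is "xri" or "url".

    Hypothetical helper for illustration only.
    """
    identifier = user_input.strip()

    # An explicit "xri://" prefix is removed before classification.
    if identifier.lower().startswith("xri://"):
        identifier = identifier[len("xri://"):]

    if identifier and identifier[0] in XRI_LEADING_CHARACTERS:
        return ("xri", identifier)

    # Anything else is a URL; default to http when no scheme is given.
    if not identifier.lower().startswith(("http://", "https://")):
        identifier = "http://" + identifier

    parts = urlsplit(identifier)
    # Simplification: lowercases the whole authority, including any userinfo.
    authority = parts.netloc.lower()
    path = parts.path or "/"  # an empty path becomes "/"
    # The fragment is dropped by passing an empty string.
    url = urlunsplit((parts.scheme.lower(), authority, path, parts.query, ""))
    return ("url", url)


if __name__ == "__main__":
    for example in ("example.com", "HTTPS://Example.COM/User#me", "xri://=example"):
        print(example, "->", normalize_identifier(example))
```

Returning the kind alongside the identifier mirrors the idea of keeping XRIs and URLs behind one interface, which is roughly what subclassing the URI class would achieve on the Lisp side.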
I am now pondering how to do HTML parsing for HTML-based discovery. One way would be to use CL-HTML-Parse, or pxmlutils – a port of Allegro's xmlutils. Both would add dependencies to the library, and pxmlutils would also drag along the ACL-compat Allegro portability layer. Another way would be to add HTML parsing to XMLS. This may, however, be tricky to do right, because HTML is not as strictly structured as XML and there are many corner cases (unquoted tag attributes, optional close tags, and so on). At the moment I don't know how other parsers handle malformed HTML and HTML's corner cases. If there is no reliable parser, an interface to a tool like HTML Tidy will be needed – either through FFI or by running an external program.
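To make the corner cases concrete, here is a small sketch of what HTML-based discovery needs from a parser. It is in Python, using the standard library's tolerant `html.parser`, and is only an illustration of the task, not a candidate for the Lisp implementation. The class name `OpenIDLinkFinder` and the function `discover_links` are hypothetical. It assumes, per my reading of the OpenID specifications, that the interesting data lives in `<link>` elements inside `<head>` whose `rel` contains `openid2.provider`, `openid2.local_id`, `openid.server` or `openid.delegate`.

```python
from html.parser import HTMLParser

# rel values used by OpenID 2.0 and 1.1 HTML-based discovery.
OPENID_RELS = {
    "openid2.provider",
    "openid2.local_id",
    "openid.server",
    "openid.delegate",
}


class OpenIDLinkFinder(HTMLParser):
    """Collect OpenID <link> elements found inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.links = {}

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased from HTMLParser.
        if tag == "head":
            self.in_head = True
        elif tag == "body":
            # <body> implies the end of <head>, even if </head> is missing.
            self.in_head = False
        elif tag == "link" and self.in_head:
            attributes = dict(attrs)
            href = attributes.get("href")
            # rel may hold several space-separated values.
            rels = (attributes.get("rel") or "").lower().split()
            for rel in rels:
                if rel in OPENID_RELS and href:
                    # Keep the first occurrence of each rel value.
                    self.links.setdefault(rel, href)

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def discover_links(html_text):
    """Return a dict mapping OpenID rel values to their href."""
    finder = OpenIDLinkFinder()
    finder.feed(html_text)
    finder.close()
    return finder.links


SAMPLE_PAGE = """<HTML><HEAD>
<LINK REL="openid2.provider openid.server" HREF="https://op.example.com/auth">
<link rel=openid2.local_id href=https://me.example.com/id>
</HEAD><BODY><p>unclosed paragraph
<link rel="openid2.provider" href="https://other.example.com/">
"""

if __name__ == "__main__":
    print(discover_links(SAMPLE_PAGE))
```

The sample page deliberately mixes uppercase tags, unquoted attributes, an unclosed paragraph and a stray `<link>` in the body, which is the kind of input a strict XML parser would reject. Whichever Lisp parser is chosen would need to cope with at least these cases.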
Work done this half-week
- Reviewed list of OpenID servers and libraries from openid.net;
- Installed and tested local OpenID server on my workstation (used SimpleID);
- Wrote code and tests for identifier normalization (commit info).
Problems
Minor only: reviewing the available servers and installing one took more time than I expected, because much of the information I needed to make a choice (such as the supported OpenID version or required libraries) was well hidden inside the software. I also spent a few hours during the SimpleID installation on what turned out to be a documentation error.
Plans for next half-week
- Research and choose an HTML parsing strategy;
- Implement HTML-based discovery;
- Write tests for discovery and HTML parsing.
