New Message: Re: Frontier address limitations

Thu Jul 21 13:23:50 CDT 2005

A new message was posted:

Address: http://frontier.userland.com/discuss/msgReader$14162

By: Robert Cassidy (rmcassid at uci.edu)

Ah, super.

So, basically, if I'm cautious about my verb and variable usage, I should be able to work around it. Cool. I'll probably have to rewrite some of the xml verbs, but it'll be doable. So long as I can store the long address and be able to follow it, I can make it work.

--

XPath allows you to find and manipulate specific xml content in an xml tree. For lack of a better description, what I'm working on is effectively the read subset of SQL (SELECT, JOIN, etc.) for XML. XPath is actually quite complex. Frontier could certainly have a full-tilt XPath parser, but it is by no means a small undertaking, nor do I think it's necessary. Simple parsing of an XPath expression, along with a fleshed-out set of xml verbs, makes moving data in, out and around Frontier so much easier that xml tables are almost exclusively replacing standard Frontier tables in my tools.

BTW, I think this is a good thing for Userland to assess. RDBMSs like MySQL are horribly bad at dealing with xml, and most languages have no native database manipulation ability, so they have to rely on XPath, which is good but often VERY slow (no indexing, etc.) and not as feature-complete as SQL. Frontier handles xml in a rather good way, still allows the usual table verbs to be used on an xml table, and, with an XPath subset, can offer a fairly familiar xml environment with more of the power of a proper database.

Anyhow, the script I'm building works similarly to xml.getAddressList in that you pass it a base address and a path and it returns a list of matching addresses. This is unlike an XPath processor, which would return the content of the match, sort it, and do a mess of other stuff, but a list of addresses is very useful within Frontier and Frontier can take up all the other stuff anyway. The path is formatted just as a normal XPath. So, using your custom prefs wizard as an example, if you wanted the list of countries that you allow users to choose from, you would pass in "/wizard/panel/item[@title='Country']/option", which will find all option elements inside of the item element with attribute title=Country, within any panel within the wizard. If you wanted to restrict it to just the preferences panel, change it to "/wizard/panel[@title='Preferences']/item[@title='Country']/option". My current script can handle any number of predicates (the match expressions within the []s) and will handle the following operators: = != > < >= <=. It can also do multiple predicates per node, so "/wizard/panel/item[@title='Country']/option[@value='US'][@title='United States']" will only match options with a value attribute of US *and* a title attribute of United States.
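
To make the matching rules above concrete, here is a minimal sketch in Python rather than UserTalk. It is not the actual script: the names parse_path, compare, matches and find_all are made up for illustration, it returns element objects where the Frontier version returns addresses, and the sample wizard data is invented. It assumes = and != compare strings while the ordering operators compare numbers, as XPath 1.0 does.

```python
import operator
import re
import xml.etree.ElementTree as ET

# The six comparison operators listed in the post.
OPS = {"=": operator.eq, "!=": operator.ne, ">": operator.gt,
       "<": operator.lt, ">=": operator.ge, "<=": operator.le}
# One predicate such as [@title='Country']; two-character operators are tried first.
PRED = re.compile(r"\[@(\w+)\s*(!=|>=|<=|=|>|<)\s*'([^']*)'\]")
# One step such as /item[@title='Country'][@x>'3'] (values may not contain ']').
STEP = re.compile(r"/(\w+)((?:\[[^\]]*\])*)")


def parse_path(path):
    """Turn a path into [(name, [(attr, op, value), ...]), ...]."""
    return [(name, PRED.findall(preds)) for name, preds in STEP.findall(path)]


def compare(actual, op, expected):
    """= and != compare strings; ordering operators compare as numbers."""
    if op in ("=", "!="):
        return OPS[op](actual, expected)
    try:
        return OPS[op](float(actual), float(expected))
    except ValueError:
        return False  # non-numeric operands never satisfy an ordering test


def matches(elem, name, preds):
    """True if the element has the right name and passes every predicate."""
    if elem.tag != name:
        return False
    return all(attr in elem.attrib and compare(elem.attrib[attr], op, value)
               for attr, op, value in preds)


def find_all(root, path):
    """Walk the tree one path step at a time and return a flat list of matches."""
    steps = parse_path(path)
    current = [root] if steps and matches(root, *steps[0]) else []
    for name, preds in steps[1:]:
        current = [child for elem in current for child in elem
                   if matches(child, name, preds)]
    return current


# Invented sample data standing in for the prefs wizard table.
DOC = """
<wizard>
  <panel title="Preferences">
    <item title="Country">
      <option value="US" title="United States"/>
      <option value="CA" title="Canada"/>
    </item>
  </panel>
  <panel title="Shipping">
    <item title="Country">
      <option value="MX" title="Mexico"/>
    </item>
  </panel>
</wizard>
"""
root = ET.fromstring(DOC)
hits = find_all(root, "/wizard/panel/item[@title='Country']/option")
print([o.get("value") for o in hits])
```

Because each step only filters the children of the previous step's matches, the caller gets back one flat list no matter how deep the path goes.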

It's VERY rudimentary compared to a full XPath processor, but even with what I have, you can quickly slice and dice a deep xml table without having to write a bazillion getAddressList/for structures. In fact, you don't need to write your code dependent on the depth of the data at all since you'll get a nice flat address list in return that you can then walk over and deal with.

I don't yet support the very common // shortcut, which is short for /descendant-or-self::node()/ and basically allows you to shorten the path above to "//item[@title='Country']/option", demonstrating that you don't care where in the table the match is - it could be in "/wizard/panel/" or in "/wizard/prefs/panel/" and it'd return both. Very handy, actually, but much more of a PITA to code since you can't rule out ANY paths when you walk the tree until you hit a leaf - and it's slow as shit. I avoid relying on it as much as possible, so I've put that code off. :-)
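
For illustration, a rough Python sketch of why // is expensive: every element in the tree has to be visited before the match can be tested. The helper names and the sample data are invented for the example. The last line uses the limited XPath support in Python's xml.etree.ElementTree, which spells the shortcut as ".//" when called on an element.

```python
import xml.etree.ElementTree as ET

# Invented data: the same item appears at two different depths.
DOC = """
<wizard>
  <panel title="Preferences">
    <item title="Country"><option value="US"/><option value="CA"/></item>
  </panel>
  <prefs>
    <panel title="Shipping">
      <item title="Country"><option value="MX"/></item>
    </panel>
  </prefs>
</wizard>
"""
root = ET.fromstring(DOC)


def descendants_or_self(elem):
    """Yield elem and every element below it, in document order."""
    yield elem
    for child in elem:
        yield from descendants_or_self(child)


def country_options(root):
    """Hand-rolled equivalent of //item[@title='Country']/option."""
    hits = []
    for elem in descendants_or_self(root):  # no branch can be skipped
        if elem.tag == "item" and elem.get("title") == "Country":
            hits.extend(child for child in elem if child.tag == "option")
    return hits


manual = [o.get("value") for o in country_options(root)]
builtin = [o.get("value") for o in root.findall(".//item[@title='Country']/option")]
print(manual, builtin)
```

Both approaches find the options under "/wizard/panel/" and under "/wizard/prefs/panel/", at the cost of touching the whole tree.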

I haven't added current and parent expressions (. and ..), though those are pretty easy - just haven't had need for them yet.

I also haven't added the position predicates: "/option[1]" for the first option element, "/option[last()-2]" for the third-to-last element, "/option[position()<4]" for the first 3 option elements. These aren't too interesting to me because one of the reasons I moved to using xml is to get away from positional selection of data (csv, etc.), but they should be dead simple to do. (What's an easy way to get your current row number in a table, btw - the inverse of table.goto?)
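
As a sketch of how simple the position predicates could be, here is a small Python function (select_by_position is a made-up name) that applies one such predicate to an ordered list of siblings. It assumes the predicate text has already been cut out of the brackets, and in a real walker it would be applied separately to each parent's children, since XPath positions are counted per parent.

```python
import operator
import re

OPS = {"=": operator.eq, "!=": operator.ne, ">": operator.gt,
       "<": operator.lt, ">=": operator.ge, "<=": operator.le}


def select_by_position(items, predicate):
    """Apply one XPath-style position predicate to a list of siblings."""
    n = len(items)

    m = re.fullmatch(r"(\d+)", predicate)
    if m:  # "1" selects the first item (positions are 1-based)
        i = int(m.group(1))
        return items[i - 1:i] if i >= 1 else []

    m = re.fullmatch(r"last\(\)(?:\s*-\s*(\d+))?", predicate)
    if m:  # "last()" or "last()-2"
        i = n - int(m.group(1) or 0)
        return items[i - 1:i] if i >= 1 else []

    m = re.fullmatch(r"position\(\)\s*(<=|>=|!=|<|>|=)\s*(\d+)", predicate)
    if m:  # "position()<4"
        op, k = OPS[m.group(1)], int(m.group(2))
        return [item for pos, item in enumerate(items, start=1) if op(pos, k)]

    raise ValueError("unsupported position predicate: " + predicate)


options = ["US", "CA", "MX", "UK", "FR"]
print(select_by_position(options, "1"))
print(select_by_position(options, "last()-2"))
print(select_by_position(options, "position()<4"))
```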

I haven't done wildcards either, but again, they're dead-simple - everything matches. Finally, I don't yet support the or (|) operator, so that you can do "/wizard/panel/item[@title='Country']/option[@value='US'] | /wizard/panel/item[@title='Country']/option[@value='CA']" to return both US and Canada. This is mostly because of how I parse the path, but that can be fixed. Since there are no dependencies between the expressions, the simple but slow solution is to just run two queries and join the results, removing dups. The faster solution would be to check both paths simultaneously as you walk the tree, but that can get ugly with all the recursion going on (makes my head hurt just thinking about it).
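
The simple-but-slow version of | might look like the following Python sketch. The function name union and the sample data are invented, and Python's xml.etree.ElementTree findall stands in for the path matcher, so the paths are written relative to the root element. Note that the merged list comes back in query order rather than document order.

```python
import xml.etree.ElementTree as ET


def union(root, paths):
    """Run each query separately, then merge the results and drop duplicates."""
    seen = set()
    merged = []
    for path in paths:
        for elem in root.findall(path):
            if id(elem) not in seen:  # the same element may match several paths
                seen.add(id(elem))
                merged.append(elem)
    return merged


# Invented sample data.
DOC = """
<wizard>
  <panel title="Preferences">
    <item title="Country">
      <option value="US"/><option value="CA"/><option value="MX"/>
    </item>
  </panel>
</wizard>
"""
root = ET.fromstring(DOC)
both = union(root, [
    "./panel/item[@title='Country']/option[@value='US']",
    "./panel/item[@title='Country']/option[@value='CA']",
])
print([o.get("value") for o in both])
```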

Finally, I have no support for axes. Axes allow you to select elements based on position in the tree rather than name, so instead of "/wizard/panel/item/option[@value='US']" you can do "child::*/child::*/child::option[@value='US']". They overlap completely with the operators above, so it's just a problem of parsing the path.

The only *really* hard things, IMO, are properly handling the // operator and consistently parsing the path expression to cover all of the various and nutty ways that a path can be constructed, including nested predicates, boolean operators and so on. These aren't trivial things, mind you, but with all in place, you should be able to very effectively drill down into an xml table with a single call. I'd then fall back on Frontier to handle everything after that - sorting, accessing the data itself, and so on.

--

The plist support sounds handy.

If I could pass a request along: I also have a suite of scripts that uses the site structure path to follow a parallel path on the server filesystem and show the files there on the current page. It makes it dead simple to manage file attachments since you just use the Finder to move your files around, etc. The problem is that something, I think it must be fileloop, truncates filenames to 31 characters - I think it's using the Mac OS call that returns the 31 character safe name for pre-OS X. Any chance we can get that to grab a full filename for Frontier/OS X?

This is a Manila site.. http://manila.userland.com/.