More issues with Unicode and XML

Bill Humphries whump at apple.com
Wed Sep 25 16:34:04 PDT 2002


Consider the following XML table stored within Frontier:

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Picture-2.gif
Type: image/gif
Size: 24738 bytes
Desc: not available
Url : http://lists.userland.com/pipermail/frontier-users/attachments/20020925/075498bf/Picture-2-0002.gif
-------------- next part --------------


Sorry for the image, but that's the only decent way to show the table 
in situ.

When I call xml.decompile () on the table I get:

<?xml version="1.0"?>
<page changed="false" name="unicode2" new="false" title="Page Title">
	<container access="employee" changed="false" new="false" 
title="Section Name">
		<para changed="false" new="false">Le droit au cong&#233; 
s'appr&#233;cie au cours d'une p&#233;riode dite (ann&#233;e de 
r&#233;f&#233;rence), qui d&#233;bute le 1er Juin de chaque ann&#233;e et 
se termine le 31 Mai de l'ann&#233;e suivante.</para>
		<para changed="false" new="false">
			Le droit au cong&amp;#233; s'appr&amp;#233;cie au cours d'une 
p&amp;#233;riode dite (ann&amp;#233;e de r&amp;#233;f&amp;#233;rence), qui
			<br></br>
			d&amp;#233;bute le 1er Juin de chaque ann&amp;#233;e et se 
termine le 31 Mai de l'ann&amp;#233;e suivante.
			</para>
		</container>
	</page>


Notice that the first para element's entities are written out correctly.

However, the entities in the second paragraph, the one with the embedded 
break child, are not serialized correctly: character references of the 
form &#nnn; come out double-escaped as &amp;#nnn;.

Aside from painful post-processing with sed/awk, what can be done? Is 
there a patch for xml.decompile ()?
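
For reference, the post-processing fallback doesn't have to be sed/awk. 
Here is a minimal sketch in Python. It is only a workaround, not a fix for 
xml.decompile (). The function name fix_double_escaped is made up for 
illustration, and it assumes that no text in the document legitimately 
contains a literal "&amp;#nnn;" sequence.

```python
import re

# Matches a numeric character reference whose leading ampersand was escaped
# a second time, e.g. "&amp;#233;" (decimal) or "&amp;#xE9;" (hex).
# Assumption: the document never needs a literal "&amp;#nnn;" in its text.
_DOUBLE_ESCAPED = re.compile(r"&amp;(#(?:[0-9]+|[xX][0-9A-Fa-f]+);)")


def fix_double_escaped(xml_text):
    """Turn '&amp;#233;' back into '&#233;' and leave everything else alone."""
    return _DOUBLE_ESCAPED.sub(r"&\1", xml_text)


if __name__ == "__main__":
    sample = "d&amp;#233;bute le 1er Juin de chaque ann&amp;#233;e"
    print(fix_double_escaped(sample))
```

An ordinary escaped ampersand such as "a &amp; b" is left untouched, 
because the pattern only matches when "&amp;" is immediately followed by 
"#", digits (or "x" plus hex digits), and a semicolon.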

-- whump

