A Php userland Dom Contest

posted: April 6th, 2005 · by: Sven

in: Programming · 7 comments

Now that we’ve seen that php5’s dom extension really lacks support for php’s own serialize()/unserialize(), let’s have a quick look at some php implementations of XML Dom. Possibly we can speed up our templating experiment by using a serialize()able XML implementation, so that we’re able to cache our components and resurrect them in a ready-made, fully usable state?
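To make the caching problem concrete, here is a minimal sketch of the obvious workaround: wrap the native document and serialize its markup instead of the object. The class name CachableDocument and its getDocument() method are made up for illustration, and the __serialize()/__unserialize() hooks assume php 7.4 or later, which is much newer than the php5 discussed here. Note that waking up still re-parses the XML, so this caches the markup rather than a ready-made tree.

```php
<?php
// Hypothetical wrapper (not from the article): serializes a DOMDocument
// as its XML string, because the native object itself cannot be cached.
final class CachableDocument
{
    /** @var DOMDocument */
    private $doc;

    public function __construct(DOMDocument $doc)
    {
        $this->doc = $doc;
    }

    public function getDocument()
    {
        return $this->doc;
    }

    // Called by serialize(): store the markup instead of the native object.
    public function __serialize(): array
    {
        return ['xml' => $this->doc->saveXML()];
    }

    // Called by unserialize(): re-parse the stored markup into a new document.
    public function __unserialize(array $data): void
    {
        $this->doc = new DOMDocument();
        $this->doc->loadXML($data['xml']);
    }
}

$doc = new DOMDocument();
$doc->loadXML('<p>What about the <i>execution time</i>?</p>');

$cached   = serialize(new CachableDocument($doc));
$restored = unserialize($cached)->getDocument();
```

The restored document is a fresh parse of the cached string, which is exactly the cost a serialize()able userland tree would hopefully avoid.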

We’ve done an internal, purely subjective mini-con/test with some php Dom implementations that were spit out by Google. It has been a veeery superficial search (so for sure there are more implementations out there) and it has been an even more superficial test setup (so for sure there are some tweaks to tune up performance for one or another implementation).

Probably, that’s interesting. So here are the results.

The competitors

After a quick search with Google we’ve come up with 5 implementations that promised to provide the behaviour we need and don’t rely on php’s native dom/xml support.

Here they are (including our first impression of the code):

  • ActiveLink – a nice implementation with very clean code, having (at least for an xml dom) a somewhat unusual tree api that differentiates between “leafs” and “branches”.
  • DomIt – a promising, feature-rich and well documented implementation
  • MyXml – a threefold library (containing myDom, myXPath and myXslt), at first glance it looks mostly compliant with php’s own dom api
  • MiniXml – mentioned somewhere in the php Dom manual, seems to be a php port of CPAN’s XML::Mini Perl implementation (or probably a parallel branch).
  • PhpDomXml – a very small and basic implementation, 13kb are all we need?

All of these are for php4 :(, as far as we can see. So we turn off error_reporting for E_NOTICE and E_STRICT to run them under php5.

Test Code

We’ve used the following code to test the native php5 DomDocument version. That’s a very simple html snippet, an iteration over creating, parsing and outputting it ten times and a simple output of the results.

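<?php
// Hide E_NOTICE so the same script also runs the php4-era libraries quietly.
error_reporting(E_ALL ^ E_NOTICE);
$iterations = 10;
$start = getMicrotime();
// Heredoc instead of a single-quoted string: the apostrophes in "Let's"
// would otherwise terminate the string early.
$source = <<<HTML
<html>
        <style>
                body { font: 12px normal verdana, arial, serif }
                h2.test { color: #c00000; }
        </style>
        <title>Let's check some dom xml implementations</title>
        <body>
                <h2 class="test">Let's check some
                        dom xml implementations</h2>
                <p><b>This is the native php5 DomDocument
                        implementation.</b></p>
                <p>What about the <i>execution time</i>?</p>
        </body>
</html>
HTML;
// Create, parse and serialize the document $iterations times.
for ($i = 0; $i < $iterations; $i++) {
        $doc = new DomDocument();
        $doc->loadXML($source);
        $result = $doc->saveXml();
}
// Show the last result twice: escaped as source, then rendered by the browser.
echo '<pre>' . htmlentities($result) . '</pre>';
echo $result;
// Note: the measured time includes the two echo statements above.
$time = (getMicrotime() - $start);
echo "$iterations Iterations done.<br/>";
echo "We needed $time seconds.<br/>";
echo "The average execution time was: " . ($time/$iterations);
// Returns the current Unix timestamp in seconds as a float, including microseconds.
function getMicrotime () {
        $microtime = explode(" ", microtime());
        return $microtime[0] + $microtime[1];
}
?>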

We like it our own way …

Of course, for every php implementation we needed to include the libraries and use some different function calls because they all have their own api.

ActiveLink seems to want the following:

$doc = new XMLDocument();
$doc->parseFromString($source);
$result = $doc->getXMLString(true);

But DomIt prefers:

$doc = new DOMIT_Document();
$doc->parseXML($source);
$result = $doc->toNormalizedString();

Different from that, MyXml will be happy with:

$doc = new Document();
$doc->parse($source);
$doc->setOption('indent', true);
$result = $doc->toString();

And MiniXml likes to get called with:

$doc = new MiniXMLDoc();
$doc->fromString($source);
$result = $doc->toString();

Not enough, for PhpDomXml we have to use:

$doc = new XML();
$doc->parseXML($source);
$result = $doc->toString();

Funny, hm? There’s not one api really aligned to php’s own domxml api.
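Since the variants above differ only in their three library calls, the timing loop can be kept in one place and handed the round trip as a callback. The following is a rough sketch, not the script we actually ran: benchmarkRoundTrip() and nativeRoundTrip() are hypothetical names, and unlike the test code it times only parsing and serializing, not the echo output.

```php
<?php
// Hypothetical helper (not part of the original test script): calls
// $roundTrip($source) $iterations times and reports the timings.
function benchmarkRoundTrip($roundTrip, $source, $iterations = 10)
{
    $result = null;
    $start  = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $result = call_user_func($roundTrip, $source);
    }
    $total = microtime(true) - $start;

    return array(
        'result'  => $result,
        'total'   => $total,
        'average' => $total / $iterations,
    );
}

// The native php5 round trip, using the same three calls as the test code.
function nativeRoundTrip($source)
{
    $doc = new DOMDocument();
    $doc->loadXML($source);
    return $doc->saveXML();
}

$stats = benchmarkRoundTrip('nativeRoundTrip', '<p>What about the <i>execution time</i>?</p>');
echo "The average execution time was: " . $stats['average'] . "\n";
```

Each userland library would then get its own small callback wrapping the calls listed above, and the loop itself stays untouched.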

Ok, now for the interesting part ;)

The results – by speed …

This ran on a Windows XP, Apache 2, php 5.0.3 system with a somewhat outdated but usable Athlon 1800+.

              avg exec time   output format   comment
php5 Dom      0.0001s         great           fastest, of course
PhpDomXml     0.0022s         awful           missing text!
ActiveLink    0.0061s         not the best    strange api
MyXml         0.0088s         looks ok?
DomIt         0.0140s         ok
MiniXml       0.5075s         extra spaces!
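To put the averages above in relation to the native extension, the slowdown factors can be worked out directly. This is just arithmetic on the figures from the table; the variable names are ours for illustration.

```php
<?php
// Average execution times in seconds, copied from the results table.
$averages = array(
    'php5 Dom'   => 0.0001,
    'PhpDomXml'  => 0.0022,
    'ActiveLink' => 0.0061,
    'MyXml'      => 0.0088,
    'DomIt'      => 0.0140,
    'MiniXml'    => 0.5075,
);

// Slowdown factor of each implementation relative to the native extension,
// rounded to the nearest whole number.
$baseline = $averages['php5 Dom'];
$slowdown = array();
foreach ($averages as $name => $seconds) {
    $slowdown[$name] = (int) round($seconds / $baseline);
}

foreach ($slowdown as $name => $factor) {
    echo str_pad($name, 12) . $factor . "x\n";
}
```

With these numbers ActiveLink comes out at a factor of 61 and MiniXml at 5075, given the rounded averages and a single run on one machine.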

Native php DomDocument – fastest, of course

Here are some screenshots to show the output formatting. php5 DomDocument, unbeaten in terms of performance as well as output formatting (just included for comparison), looks like this:

screenshot 1

PhpDomXml – fast and unusable

Next, in terms of speed, comes the very lean and feature-poor PhpDomXml. But apart from its poor output formatting, it seems to be unusable, because it eats up some of the cdata text from the html:

screenshot 2

ActiveLink – fast and strange

ActiveLink is fast and has an output format that’s probably acceptable. But – as mentioned – ActiveLink uses a strange tree/branches/leaf api. We can’t call anything like getChildren() or first/nextChild because of that.

screenshot 3

MyXml – ok to go?

MyXml adds some extra whitespace to the second p tag. That’s bad for a template engine because of design problems with IE’s whitespace handling. Besides that, it looks very nice and is still comparably fast (enough?):

screenshot 4

DomIt – somewhat slow, but good

With DomIt, there’s an extra indentation in the styles section, but no extra whitespace inside the tags:

screenshot 5

MiniXml – unusably slow

Argh! What’s that?

MiniXml needs half a second to parse 11 tags, format and print them out again?

We can’t believe that … probably there’s some extra option turned on by default, so that our drive c:/ gets crawled looking for a tmp directory? Or probably an e-mail is sent to the developers? Or some rating gets added to their freshmeat project? ;)

We didn’t have the time to investigate further what’s going on here.

But MiniXml (like MyXml, but even worse) does something very bad besides eating up our cpu. It adds an extra space to the left and right of each and every piece of cdata content in our tags. A web designer who knows about IE’s whitespace bug will strongly insist that this is a clear no-go for using it with html templating stuff …

screenshot 6
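Both output problems we ran into (swallowed text and padded text) can be checked mechanically instead of by eyeballing screenshots. As a sketch only: the helper names textNodes() and preservesText() are invented here, and the check leans on the native php5 Dom and DOMXPath to compare the non-blank text nodes of the source with those of a library’s output. Whitespace-only nodes are skipped, so plain re-indentation between tags does not count as a difference.

```php
<?php
// Hypothetical check (not from the article): returns the non-blank
// text nodes of an XML string, in document order.
function textNodes($xml)
{
    $doc = new DOMDocument();
    $doc->loadXML($xml);
    $xpath = new DOMXPath($doc);

    $texts = array();
    // normalize-space() is empty for whitespace-only nodes, which filters them out.
    foreach ($xpath->query('//text()[normalize-space()]') as $node) {
        $texts[] = $node->nodeValue;
    }
    return $texts;
}

// True if $output carries exactly the same text nodes as $source.
function preservesText($source, $output)
{
    return textNodes($source) === textNodes($output);
}

$source     = '<body><p>What about the <i>execution time</i>?</p></body>';
$reindented = "<body>\n    <p>What about the <i>execution time</i>?</p>\n</body>";
$padded     = '<body><p> What about the <i> execution time </i> ? </p></body>';
$eaten      = '<body><p>What about the <i></i>?</p></body>';

$report = array(
    'reindented' => preservesText($source, $reindented),
    'padded'     => preservesText($source, $padded),
    'eaten'      => preservesText($source, $eaten),
);
```

Here the re-indented variant passes, while the padded and the eaten variants fail. The sample strings are constructed by hand to mimic the behaviour described above; they are not actual library output.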

Summary

Without having had a closer look at PhpDomXml’s features: it won’t make it, given its hunger for our cdata text.

ActiveLink is fast but not really an option (is it?) because of its api – unless we’d create an extra wrapper to get around that.

MyXml would be the one to go with, if it weren’t for that extra whitespace. Probably there’s a switch to turn that off? Or an e-mail to the developers could help?

DomIt looks great, clean, and feature-complete. But it’s somewhat too slow to be used in a template engine. (Is it?)

We can’t believe that what we’ve seen so far is MiniXml’s intended behaviour. But given how short on time we were, there hasn’t been an opportunity to get it to run faster.

php5’s DomDocument

ActiveLink, probably the fastest usable library (in our case), needs 0.0061s to parse and output a template. That’s roughly 60 times slower than php5’s native Dom implementation, which takes 0.0001s for the same job. Of course it’s faster, it’s written in C. Furthermore, php5’s Dom is the only one of all of them that generates output exactly the way we’d expect it.

But sad to say, we can’t use it for our case. It can’t be un/serialize()d.


7 Comments

  1. Jacob Carstens said July 26th, 2006 at 04:46 PM  

    I am using minixml to generate xml and was annoyed by the extra whitespace. I removed it by making some changes to the source, and thought I'd post them here in case someone has the same problem:

    In element.inc.php

    1) remove the space at line 1120 so it reads: $retString .= ">";

    2) remove the space at line 1156 so it reads: $retString .= "</$elementName>\n";

    Alternatively, call toString like this: $doc->toString(MINIXML_NOWHITESPACES). That has the side effect of removing line breaks and indentation as well, though...

  2. Robert de Wilde said November 4th, 2008 at 11:08 AM  

    Thank you for sharing, good results I can use. Maybe a suggestion: test a big XML file (1GB+) and monitor more than just speed! Thank you so far anyway!

