Blogger adding an extra clear: both div to posts

Lots of people have noticed recently that Blogger has started publishing posts with a <div style="clear: both;"> tag before the posting text. This broke my layout and gave me invalidly nested tags, since I was including the opening and closing <p> tags in the template. The extra <div> tag meant that the browser closed the previous block element (the <p>) and then, after the <div> was closed, put the first paragraph of content straight after it. This meant my posting class wasn’t being applied, so the first paragraph of every post didn’t look right. It also broke the already broken HTML on the site.

So, since it was annoying for me, I’ve written a partial fix in JavaScript. Paste the code below into your Blogger template just before the closing </body> tag. This is important, as it needs to come in the code after all the content.

var allDivs = document.getElementsByTagName("div");
for(var i=0;i<allDivs.length;i++){
 var curDiv=allDivs[i];
 if(curDiv.parentNode.className=='postingBlock'){
  var block=curDiv.parentNode;
  var html=block.innerHTML;
  // strip out the extra clear:both div (the exact pattern may need
  // adjusting to match the markup Blogger inserts)
  html=html.replace(/(<\/p>)?<div style="clear: ?both;?"><\/div>/gi,'');
  block.innerHTML=html;
 }
}

I’ve tested it in IE6 and Firefox so far, and I’m not really bothered about the rest at the moment, although it should work in IE5+.

PHP 5 Static class variable inheritance

PHP 5 doesn’t seem to attempt to implement any kind of inheritance for static properties. This can mean having to duplicate code within subclasses that use static members. As an example, we would like to have a parent class (this is using the Singleton pattern) such as this:

class DataTable
{
 protected static $instance;

 // protected so instances can only be created via getInstance()
 protected function __construct()
 {
 }

 public static function getInstance()
 {
  if(!isset(self::$instance))
  {
   self::$instance = new self();
  }
  return self::$instance;
 }
}

And then to create a subclass to inherit from it:

class PageTable extends DataTable
{
}

And then creating an instance of PageTable could be done with a line of PHP such as this:

$pageTable = PageTable::getInstance();

Which would then, in theory, return an instance of PageTable.

Then we might want another subclass such as:

class ContentTable extends DataTable
{
}

And so be able to create an instance of ContentTable with:

$contentTable = ContentTable::getInstance();

However, what really happens is this: PageTable::getInstance() is called and, since PageTable itself doesn’t have a method defined as getInstance(), the parent method is run. Inside that method self:: is resolved against the class in which the code is defined, so self::$instance refers to DataTable::$instance, and new self() actually creates an instance of DataTable, not PageTable. Even declaring an $instance property local to PageTable has the same result, as self::$instance still refers to the parent class (which is where the method is run from). Thus when ContentTable::getInstance() is called, the value of self::$instance is detected as being set and the instance created by the first call is returned. This is obviously not the intention.
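
To see the problem concretely, assuming the classes as defined above, the following check shows that both calls hand back the same object (and that it isn’t even a PageTable):

$pageTable = PageTable::getInstance();
$contentTable = ContentTable::getInstance();

var_dump(get_class($pageTable)); // string(9) "DataTable"
var_dump($pageTable === $contentTable); // bool(true) - one shared instance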

The only way around this is to duplicate code: each subclass needs its own $instance property as well as its own copy of the getInstance() method.
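
For PageTable, that duplication looks something like this (and the same boilerplate has to be repeated in ContentTable and every other subclass):

class PageTable extends DataTable
{
 protected static $instance;

 public static function getInstance()
 {
  // self:: now resolves to PageTable, so this subclass
  // gets its own $instance
  if(!isset(self::$instance))
  {
   self::$instance = new self();
  }
  return self::$instance;
 }
}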

As an aside, the same is also true of class constants, which are also referred to with self::. From this we can infer that self:: always refers to the class in which the current code is defined, not the class through which a method is called.
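
A minimal illustration of the same behaviour with class constants (the class and constant names here are just for demonstration):

class ParentTable
{
 const NAME = 'parent';

 public static function whoAmI()
 {
  return self::NAME;
 }
}

class ChildTable extends ParentTable
{
 const NAME = 'child';
}

echo ChildTable::whoAmI(); // prints 'parent', not 'child'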

Technical solutions to web site visitor tracking

There are two main technical approaches to the problem of identifying an individual when they visit a web site. This assumes they are not willing to put their hand up and identify themselves by logging in, which may not always happen depending on the type of site or service on offer. So, assuming we can’t persuade them to identify themselves so readily, the two methods we need to think about for getting this information could broadly be classed as what I’d call ‘intrusive’ and ‘non-intrusive’.

Intrusive methods have a lasting effect on the visitor’s computer, usually through depositing a file of some kind. The most common, and the most familiar, method for doing this is the browser cookie. There are a few other techniques, such as the recent uses of Flash Shared Objects, but only a handful, each with their own problems. These have the advantage that we can be certain of the identity of the computer that is visiting the site (although still not 100% sure that it’s the same individual using it) as only that computer will have that particular file on it. It doesn’t matter if they change ISP or otherwise play around with their computer’s settings, as long as they use the same browser – and changing browser is not too common an occurrence.

The disadvantage of these techniques is that some people don’t like the idea of a web site, let alone an ad banner server, putting something on their computer without them being asked about it. This has generated a market for tools which block or remove these intrusive files, generally sold as anti-spyware. There are also only a limited number of techniques we can exploit to get the file onto the user’s system and we’re restricted by what the browser, or other common plug-in (Flash and Java), manufacturers decide to give us. As is likely to happen with Flash Shared Objects, these are all easy to block either through the use of bespoke software or simply the user changing their browser preferences to explicitly deny certain technologies.

Non-intrusive methods are what we’re left with if users decide to actively combat the intrusive methods, and this is an area that no-one seems to have really cracked. Historically, this is how web analytics started: tracking people by IP address. It hasn’t really come a lot further, and most analytics vendors talk about using the combination of IP address and user agent if it’s not possible to use a cookie. (The user agent is the way a web browser identifies itself to a web server, e.g. ‘Microsoft Internet Explorer version x.xxxxxx’.) Although there are a huge number of sub-versions of each browser, that doesn’t really solve the problem of changing IP addresses between sessions (as with dial-up users), changing IP address mid-session (as with AOL users and other proxies) or mass installations of the same browser behind the same proxy server (as with some corporate setups). There are techniques to improve accuracy, but none reach certainty by any means.
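
As a rough sketch of that combination (in PHP on the server side, reading what the browser volunteers; hashing the two values into a single key is just one obvious way to combine them):

// combine IP address and user agent into a crude visitor key
$ip = $_SERVER['REMOTE_ADDR'];
$userAgent = $_SERVER['HTTP_USER_AGENT'];
$visitorKey = md5($ip . '|' . $userAgent);

As described above, this key is fragile: two users behind the same corporate proxy with identical browser installations will share a key, while one dial-up user will generate a new key every session.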

All of this information, such as IP address and user agent, is ‘volunteered’ by the web browser as a user surfs, and with the addition of some client-side JavaScript code we can find out a lot more, such as screen size, colour depth and system time, among others. I call these methods non-intrusive because although we are running some code on the user’s machine (as part of the page they are visiting), we’re not leaving any kind of mark on the user’s system, only reading information.

The aim of the non-intrusive methods is to identify a user by generating some kind of unique fingerprint based on what we can find out. As far as I’m aware, there is no sure-fire way of doing this, but it’s where a lot of the lateral thinking is going. There isn’t really anywhere else to go with the intrusive methods unless the browser and plug-in manufacturers come up with something new. (And then, following that, seeing if it becomes regularly blocked in the same way as cookies are now.) Areas that we’ve been looking at include what we can find out about the visitor’s browser setup (which, again, will often be identical across corporate installations) but also looking at what they do and what they’ve seen, and whether we can generate a fingerprint that way. These ideas often fall foul of the fact that we’re not dealing with a controlled set of variables but rather with side-effects, i.e. the user may change something about their system not because they don’t want to be tracked, but for some other completely innocuous reason. Examples of this might be making use of the browsing history and cache (which regularly expires and is often cleared out by users, hence losing all that data) or system time (which changes over time so can’t be relied on to any real degree of accuracy).

The change in browser technology and the shift in market share are also playing a part in making it difficult to produce new solutions. As an example, research may be conducted into the way files are downloaded from the server to try and gauge whether a particular visitor has seen a page or combination of files before, but the download methods of the main browsers differ enough to cause problems here. Firefox is easily configurable to open many more parallel requests than either IE or its own default setting. Given that a new version of IE is on the horizon, any solution which made use of any kind of browsing technology side-effect could well become obsolete, or possibly even give completely incorrect results.

These techniques do have some usefulness when it comes to tracking a user within a single session, and indeed we can do that very accurately without cookies. What is much harder is to identify repeat visits weeks or months after someone has first seen a piece of content. It is these kinds of timeframes within which people are more likely to clear out their cookies but even with this deletion the cookie (or other intrusive method) is still much more accurate than any of the non-intrusive ‘guesswork’ methods.

That doesn’t mean we’re going to give up on trying to find a better way to measure. But unless either a) there is a revolution in browsing technology and the general consumer is willing to accept being identified (I’m not holding my breath – as a web user I’d have issues with giving up that kind of anonymity) or b) we persuade people to identify themselves voluntarily, we are for the moment only working towards greater statistical accuracy rather than certainty, especially in the most valuable area of repeat visits.

Using importNode and appendChild with PHP 5 DOM

importNode is one of the DOM functions in PHP 5 that I struggled with for a while. What I wanted to do was to take an XML node from one document and insert it into another, and somehow that wasn’t particularly easy. I tried using ‘appendChild’ but kept getting ‘wrong document’ error messages. Once it was working it seemed obvious, but the example in the documentation wasn’t entirely clear to me.

As an outline, the steps that need to be gone through are:

  1. Import the node you want into the destination document. This is done by calling the importNode method and storing the result in a variable (the crucial step). At this stage, the node is in the document, but won’t appear anywhere if you print the document out, which may seem odd.
  2. Append the stored node to the destination document in the place you want it.

The thing that threw me is that the importNode method doesn’t so much import the node as make a copy of it in the destination document – so the original is actually left untouched. This seems to be standard across XML DOM implementations in other languages such as C# and JavaScript.

The code, then, is as follows. This is for taking a complete document and moving it into a new DOMDocument object. The existing XML is assumed to be loaded into $oldXML.

$xml = new DOMDocument();
// copy the old document's root element (and all its children) into
// the context of the new document - this does not place it in the tree
$xmlContent = $xml->importNode($oldXML->documentElement, true);
// now actually attach the copied node to the new document
$xml->appendChild($xmlContent);

It is important to pass ‘true’ as the second parameter to importNode, as this tells the method to import all children as well as the selected node. The node that $xmlContent is appended to can be any DOMElement. Note that importNode is always a method of the DOMDocument, as it is the document as a whole that the new node is being imported into, not a specific node.
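
The copied node doesn’t have to go at the document root, either. To place it inside a specific element, something like this sketch would do it (the ‘labels’ and ‘title’ element names are hypothetical, assuming both exist in their respective documents):

// copy a title node out of the old document...
$titleNode = $xml->importNode($oldXML->getElementsByTagName("title")->item(0), true);
// ...and append it inside an existing labels element of the new one
$xml->getElementsByTagName("labels")->item(0)->appendChild($titleNode);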

importNode in the PHP 5 DOM documentation
appendChild in the PHP 5 DOM documentation

Using removeChild with the PHP 5 DOM

The documentation for PHP 5’s DOM functions isn’t at its most helpful yet, so I thought an example of how to use ‘removeChild’ wouldn’t go amiss.

Assuming first that you have some DOMDocument XML in a variable called $xml, that may look something like this:


<labels>
 <title>My title</title>
</labels>

The first thing to do is to get a handle on the node you want to remove. We’re going to remove the node named ‘title’:

$node=$xml->getElementsByTagName("title")->item(0);

Then, simply remove $node from its parent:

$xml->getElementsByTagName("labels")->item(0)->removeChild($node);

Not too much to it at all.
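
As a design note, you don’t even need to look the parent up by name: every node carries a reference to its own parent. A minimal self-contained sketch, assuming the same XML as above:

$xml = new DOMDocument();
$xml->loadXML('<labels><title>My title</title></labels>');

$node = $xml->getElementsByTagName("title")->item(0);
// every node knows its parent, so no second lookup is needed
$node->parentNode->removeChild($node);

echo $xml->saveXML(); // the labels element is now empty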

PHP DOM removeChild documentation

The effects of browser settings on web analytics

Modern web browsers give the user some degree of control over their privacy and security settings. These can have a major impact on attempts to measure how web sites are used, either by preventing web analytics products from registering the user at all or by altering the way data is captured such that the information is unreliable.

The two most common browsers at present are Microsoft’s Internet Explorer (IE) version 6 and Mozilla Firefox. IE currently has by far the larger share, but Firefox is growing, at around 10% at present, and so is becoming statistically significant.

Each of these allows control over browser settings to similar effect, although how the settings are configured differs in each case. It’s fair to say that Firefox allows slightly more control, whereas IE presents the controls in a more ‘user friendly’ way. However, many of the main features are the same in both.

Taking the options available to the user in turn, below is a summary of the effect that these may have on web analytics software.

Blocking 3rd party cookies

If tracking users, even recognising them within a single session, requires 3rd party cookies to be enabled, then blocking those cookies will make a single session appear as multiple sessions in any reports.

Blocking third party cookies is easy in either browser. Indeed, unless the site setting them supplies a compact privacy policy (P3P), third party cookies are blocked by the default installation settings of Internet Explorer 6.

Blocking 1st party cookies

1st party cookies are much less likely to be blocked than 3rd party cookies. However, if they are blocked, the effect on the ability to track visitors is the same as with 3rd party cookies.

JavaScript disabled

If JavaScript is disabled then the data that can be gathered is greatly reduced. Any form of tracking is likely to rely on 3rd party cookies or, at best, 1st party cookies. If JavaScript is disabled and cookies are blocked then it may be impossible to track a user’s journey reliably.

Although few home users block JavaScript – and in fact how to disable JavaScript is not entirely obvious in IE – disabling it is often the default setting on corporate computer installations to lower security risks.

Blocked 3rd party images

Most web analytics products communicate data to the tracking server by dynamically writing an image tag into the page being tracked. This image is generally hosted on the analytics company’s server and so counts as 3rd party content. If 3rd party images are blocked (as is simple to do in Mozilla Firefox) then no data at all will be received by the tracking server.

IE does not give users a way to block 3rd party images, but in Firefox the option is presented on the same preferences page as the cookie controls.

Blocked domain

Even more severe than blocking 3rd party images, it is possible to block an entire domain, meaning that any requests to it are stopped. In this case, even if the analytics information is passed to the tracking server by a method other than an image, the server will not receive it.

The blocking of domains is commonly achieved by editing a file known as the ‘hosts’ file, which lives in a particular folder within the Microsoft Windows operating system. Ready-made hosts files containing the domains of common ad servers and analytics companies are readily available for download from the internet.
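
For illustration, the entries in such a hosts file simply point each unwanted domain at the local machine, so requests to it never leave the computer (the domains below are placeholders, not real tracking servers):

127.0.0.1 tracking.example.com
127.0.0.1 ads.example.net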

Reliance on cookies in web site user tracking

In the field of web analytics, one of the permanently hot topics is how to track a user’s activity on a web site accurately with the information the available technology can provide. There are two main aims:

  • Tracking a user from their entry point into a site until they leave (known as a user’s ‘session’)
  • Recognising the same user when they return to the site at some time in the future (to recognise them as a ‘visitor’)

This has been brought to the forefront again recently by Eric Peterson’s study for Jupiter, which indicates that a high proportion of users (in the US, where the study was performed) regularly delete cookies from their computers.

Cookies are the principal, and still the most reliable, form of tracking technology used by the major web analytics companies. They are used not only to track sessions, but also to recognise repeat visits by the same user.

If cookies are being deleted as regularly as the Jupiter report suggests then this would have the following implications for any web analytics that relies on cookie-based tracking:

  • If a user deletes a tracking cookie whilst in the middle of a session then their visit will not be tracked correctly, i.e. it will be tracked as two (or more) visits rather than one. Also, any conversions may be attributed to the wrong source.
  • If a user deletes a tracking cookie between sessions then they will not be recognised when they return to the site at a later date.

In terms of what these would mean for site performance reports, the following should illustrate why this is such an important issue for website owners:

  • Drop-off points will be incorrectly highlighted. Although this may not be statistically significant it will make it more difficult to identify trends.
  • The number of sessions reported will be higher than it is in reality
  • Conversion statistics may appear lower than they really are. One session may be incorrectly identified as two (or more), of which only one will include the conversion
  • Conversions may then be assigned to the wrong source, possibly leading to incorrect decisions to abort or continue campaigns
  • Numbers of distinct visitors will appear higher than they really are
  • Repeat visits will appear lower, which may lead to unfounded doubts about the ‘stickiness’ of the site’s content.

The industry as a whole is trying to find ways to raise accuracy levels by other means but this brief look at the implications of cookie deletion when using a cookie-reliant tracking product shows what an important issue this is.

Jupiter study about cookie deletion rates

Nielsen Corroborates Jupiter’s Cookie Deletion Report

ClickZ commentary on the Jupiter report

Deleting page elements using removeChild in JavaScript

Here’s a handy(?) script to delete pieces of any page you’re on when you click on them:

javascript:document.onclick=new Function('e','document.all?src=event.srcElement:src=e.target;
src.parentNode.removeChild(src);return false');void(0);

Go to any page on the web, paste it into the address bar of your browser and then start clicking around. It works okay in IE6 and Firefox – I haven’t tried it in anything else but it should be okay in IE5 (PC), and might work in Safari.

It uses JavaScript to access the DOM and then tells every element you click on to delete itself from its parent. Generally, a bit suicidal.

Try it out by clicking on this link and then clicking around on the page.