TRUNCATE TABLE on MySQL InnoDB databases

Having come up against the extremely poor performance of TRUNCATE on MySQL InnoDB tables, where it turns out to be no faster than DELETE (see the previous post, MySQL Truncate slow performance problems), I thought I’d better come up with a solution that didn’t mean leaving a table to clear for an hour.

The solution is to use a combination of SHOW CREATE TABLE and DROP TABLE. Dropping a table is very quick indeed, so as long as you have the CREATE code to hand it’s a simple matter to empty a table. The main thing to watch out for with InnoDB tables is foreign key constraints, which are easily disabled while the table is dropped and re-created.

Some sample code to use this from within PHP is shown below:

function truncateTable($tableName)
{
   //Grab the code to create the table
   $sql = "show create table " . $tableName;
   $dataSet = DataHandler::loggedDbQuery($sql);
   $result = $dataSet->fetchRow();
   $createSQL = $result["Create Table"] . ";
        SET FOREIGN_KEY_CHECKS=1;";
   //Drop the table. We have to disable foreign key
   //checks, which means running the whole thing
   //from the command line
   $sql = "SET FOREIGN_KEY_CHECKS=0;
      drop table " . $tableName . ";".
      $createSQL;
   DataHandler::multipleDbQueries($sql);
}

This relies on two static methods of my own. DataHandler::loggedDbQuery runs a standard (single) SQL query from within PHP; it works with PearDB, which is where the fetchRow() method comes from. DataHandler::multipleDbQueries($sql) is the multi-statement query function I developed and wrote about in Multiple SQL queries using MySQL and PHP.
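As a rough sketch, the statement list can also be assembled in a small helper that does not touch the database, which makes it easy to inspect before running it. The function names quoteIdentifier() and buildRecreateSql() are made up for this illustration, and the CREATE statement is assumed to be the text returned by SHOW CREATE TABLE.

```php
<?php
// Hypothetical helper: backtick-quote a table name, doubling any
// backticks it contains, so odd names cannot break the statement.
function quoteIdentifier($name)
{
    return '`' . str_replace('`', '``', $name) . '`';
}

// Hypothetical helper: build the drop-and-recreate script from the
// table name and the output of SHOW CREATE TABLE.
function buildRecreateSql($tableName, $createStatement)
{
    return implode("\n", array(
        'SET FOREIGN_KEY_CHECKS=0;',
        'DROP TABLE ' . quoteIdentifier($tableName) . ';',
        // Normalise the CREATE statement to end in exactly one semicolon
        rtrim($createStatement, "; \n") . ';',
        'SET FOREIGN_KEY_CHECKS=1;',
    ));
}

echo buildRecreateSql('orders', 'CREATE TABLE `orders` (`id` int)') . "\n";
```

The resulting string can then be handed to whatever multi-statement runner is in use.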

Multiple SQL queries using MySQL and PHP

Something that I’ve had problems with using MySQL/PHP is the limitation of only being able to run one SQL statement per call. With something such as Microsoft SQL Server it’s possible to write several statements and run them all in a single database call. Until stored procedures (in MySQL 5) are available (i.e. when the database engine seems ready for a live environment) I’ve put together the following static method ‘hack’ using PHP’s exec() function (assuming you’re using PHP 5’s object syntax; otherwise just paste the code into a regular function):

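class MySQLInterface
{
   public static function multipleDbQueries($sql)
   {
      //Write the statements to a temporary file. Open
      //with "w" so that each call starts from an empty
      //file rather than re-running earlier queries
      $file = fopen(TEMP_CSV_LOCATION .
         "temp_query.sql","w");
      fwrite($file,$sql);
      fclose($file);
      //Feed the file to the mysql command line client
      exec("mysql -u " . DB_USERNAME .
         " --password='" . DB_PASSWORD .
         "' " . DB_NAME . " < " .
         TEMP_CSV_LOCATION .
         "temp_query.sql");
   }
}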

What this allows you to do is to pass in a SQL string and have it executed as if it were being run from the command line. This is especially useful if you need to disable foreign keys for some reason, e.g.:

$sql = "SET FOREIGN_KEY_CHECKS=0;
   drop table oldTable;
   SET FOREIGN_KEY_CHECKS=1;";

MySQLInterface::multipleDbQueries($sql);

Downsides of this are:

  • Requires command line access to mysql
  • Liable to SQL injection

Since I’m working within an internal system I have control over both of these, and the code seems to work particularly well.
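If the credentials or paths could ever contain spaces or quotes, one cautious option is to build the command line with PHP’s escapeshellarg(). The sketch below assumes a Unix-like shell, and buildMysqlCommand() is a name made up for this example. It only protects the shell command itself; it does nothing about injection inside the SQL file.

```php
<?php
// Hypothetical helper: assemble the mysql command line with every
// variable part passed through escapeshellarg().
function buildMysqlCommand($username, $password, $database, $sqlFile)
{
    return 'mysql -u ' . escapeshellarg($username)
        . ' --password=' . escapeshellarg($password)
        . ' ' . escapeshellarg($database)
        . ' < ' . escapeshellarg($sqlFile);
}

// Example values only; none of these come from a real system.
echo buildMysqlCommand('report_user', 'secret', 'reports', '/tmp/temp_query.sql') . "\n";
```

The returned string would take the place of the hand-concatenated argument to exec().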

Optimising a MySQL query with packed keys

I’ve been learning more about MySQL lately and particularly optimising
SQL queries on large tables. Large, in this case, being at the moment
hundreds of thousands of rows but soon to be millions. One of the
problems I’ve had is that MySQL sometimes decides not to use an index
even when a handy one seems to have been created for it. The root of
this appears to be that with B-tree indexes, if there are a large number of records with similar-looking values, the MySQL engine may decide that using the index is just as much effort as searching the whole table.

The answer appears to be adding PACK_KEYS = 1 to the end of a create
table, or running the SQL command ALTER TABLE MyTable PACK_KEYS = 1 once
the table has been created. In effect, this takes account of the
similarity of adjacent keys. In our case we have a large column of field
type bigint(21) where the starting digits of the index are timestamp
generated. So, at present, we end up with a few tens of thousands of rows
all starting with 108xx. Enabling packed keys means not only that the
index is smaller as MySQL only needs to store the differences between
keys (plus an extra byte to keep track of where the similarity starts)
but also that the index is actually of some use i.e. doesn’t become a
large, flat structure.
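To make the idea of storing only the differences between adjacent keys concrete, here is a toy sketch of prefix compression over a sorted list of keys. It is an illustration of the general technique only, not MySQL’s actual index format, and packKeys() and unpackKeys() are names invented for the example.

```php
<?php
// Toy illustration: for each key store the number of leading characters
// shared with the previous key, plus the suffix that differs.
function packKeys(array $sortedKeys)
{
    $packed = array();
    $previous = '';
    foreach ($sortedKeys as $key) {
        $key = (string) $key;
        $shared = 0;
        $max = min(strlen($key), strlen($previous));
        while ($shared < $max && $key[$shared] === $previous[$shared]) {
            $shared++;
        }
        $packed[] = array($shared, (string) substr($key, $shared));
        $previous = $key;
    }
    return $packed;
}

// Rebuild the full keys by reusing the shared prefix of the previous key.
function unpackKeys(array $packed)
{
    $keys = array();
    $previous = '';
    foreach ($packed as $entry) {
        list($shared, $suffix) = $entry;
        $previous = (string) substr($previous, 0, $shared) . $suffix;
        $keys[] = $previous;
    }
    return $keys;
}

$keys = array('10812345', '10812399', '10899001');
print_r(packKeys($keys));
```

With keys that all start with the same digits, most entries shrink to a small count and a short suffix.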

One downside of using packed keys is that inserts are slower, but given
that the system we are building is inserting each row once and then (in
theory) never touching it again, that’s a small price to pay. The other major drawback, however, is that packed keys only work on MyISAM tables at present and not InnoDB. That makes them not much use to me, as the large inserts we occasionally have to do would end up with MyISAM locking the table for perhaps an hour or more.

Pack keys reference in the MySQL manual

PHP 5 class constants and subclasses

Another one of those ‘I wish PHP 5 did this…’ moments has occurred to
me with class constants. The addition of constants is good but the
problem is when it comes to subclassing. The code on the PHP 5 site:

class Foo {
   const constant = "constant";
}

echo "Foo::constant = " . Foo::constant . "\n";

is fine. Of course, what you really want to have is a method to give
you the constant in case you want to change the workings later:

class Foo {
   const constant = "constant";

   public function getConstant(){
      return self::constant;
   }
}
$foo = new Foo();
echo '$foo->getConstant() = ' . $foo->getConstant() . "\n";

and this works too. The problem comes if you decide a subclass
needs a different constant value, so we add

class Bar extends Foo {
   const constant = "bar constant";
}

If we then call

$bar = new Bar();
$bar->getConstant();

then the value returned is “constant”, i.e. the value of the
constant in the parent class. This is because Bar has no handler for
getConstant() so it uses its parent’s. That’s fine, but the method now
runs in the parent context, so self:: refers to Foo and Foo::constant is
returned. The way to get round this (that I have found) is to put a copy
of the getConstant() method in each of the subclasses. This kind of
defeats the purpose of inheritance in this case.

class Bar extends Foo {
   const constant = "bar constant";

   public function getConstant(){
      return self::constant;
   }
}

Now we call

$bar = new Bar();
$bar->getConstant();

and the correct value is returned. Of course, the other way round this is
not to use constants at all but to put the value in a private variable,
but then what use are constants?
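For what it’s worth, PHP 5.3 and later offer late static binding, which looks like it removes the need to copy the method into each subclass. The sketch below assumes one of those versions: static:: is resolved against the class the method was called on rather than the class where it was written.

```php
<?php
class Foo {
    const constant = "constant";

    public function getConstant() {
        // static:: refers to the class of the calling object (PHP 5.3+),
        // whereas self:: would always refer to Foo here
        return static::constant;
    }
}

class Bar extends Foo {
    const constant = "bar constant";
}

$foo = new Foo();
$bar = new Bar();
echo $foo->getConstant() . "\n"; // prints "constant"
echo $bar->getConstant() . "\n"; // prints "bar constant"
```

Bar inherits getConstant() unchanged and still reports its own value.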

MySQL Truncate slow performance problems

I was having problems with a MySQL TRUNCATE taking a long time on a very
large table (with foreign keys). I had thought that TRUNCATE ran more
quickly than DELETE but, according to the MySQL manual, that isn’t the
case with InnoDB tables. In this case there is no difference between
TRUNCATE and DELETE and it’s recommended to drop the table and then
re-create it. That sounds like a very high risk operation to me, but
given that the TRUNCATE statement is taking an hour and a half to run
then it looks like I’ve got some code to write.

TRUNCATE in the MySQL documentation

Hiring an ASP.Net developer

My company is looking for an ASP.Net developer, so here seems as good a
place as any to put an ad. Job description below:

London based Internet development company seeks permanent ASP.NET
developer with an absolute minimum of 1.5 years production experience.

Desktop application programming and web services experience are a must
as is extensive VB.NET and SQL Server 2000 knowledge. ‘Can-do’ attitude,
a problem ownership mindset, excellent common sense and the ability to
communicate clearly to internal and external clients are also
pre-requisites.

Exposure to SOAP and programming languages such as C#, XSL and PHP would
be a distinct advantage.

This is an excellent opportunity to join an ambitious, young and
profitable business at the beginning of an exciting growth phase.

No agencies.

Email: asp dot net at exponetic.com

PHP 5 garbage collection

The object-oriented features of PHP 5 are a really positive step forward
for the language. One of the biggest improvements, to my mind, is that
you no longer have to choose to pass things around by reference with a
liberal smattering of ‘&’ symbols: references are now the default way of
passing objects.

One problem I have come across, though, is that the reference counting
feature of PHP’s garbage collection
(http://www.zend.com/zend/art/ref-count.php) means that objects with
mutual references are not deleted even when I thought the object was out
of existence. E.g:

class ParentObject
{
  protected $childObject;

  function __construct()
  {
    $this->childObject = new ChildObject($this);
  }
}

class ChildObject
{
  protected $parentObject;

  //Pass in a reference to the parent
  //object and store it internally
  function __construct($parentObject)
  {
    $this->parentObject = $parentObject;
  }
}

Then if I call $foo = new ParentObject(); then it automatically creates
a child object with a reference to the parent. The parent also keeps a
reference to its child. If I then unset($foo); the two objects are still
referencing each other and so are not deleted. The only way I’ve found
to clear this is to create a new method (which I call destroy()) to
delete references to the child. Calling destroy() on the parent first
calls destroy() on its child, which dereferences the parent, and then
the parent dereferences the child. So the classes are now:

class ParentObject
{
  protected $childObject;

  function __construct()
  {
    $this->childObject = new ChildObject($this);
  }

  public function destroy()
  {
    $this->childObject->destroy();
    unset($this->childObject);
  }
}

class ChildObject
{
  protected $parentObject;

  //Pass in a reference to the parent object
  //and store it internally
  function __construct($parentObject)
  {
    $this->parentObject = $parentObject;
  }

  public function destroy()
  {
    unset($this->parentObject);
  }
}

And I have to call

$foo->destroy();
unset($foo);

To clear the thing out completely.

This can cause a number of problems which I won’t go into in detail here
(they occur in more complex design patterns), but suffice to say that
there are a number of occasions where I don’t necessarily want to
destroy a child at the same time as a parent, or vice-versa. E.g. a
child references multiple parents. The end result is that I’m writing
code to deal with garbage collection where it is having a big effect on
memory and just leaving it out where it doesn’t seem to make as much
difference. This suffices for a known set of data but doesn’t feel very
satisfactory in terms of future-proofing.

I would appreciate it if anyone else has a better way of doing things.
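One possibility on later versions: PHP 5.3 added a cycle collector, which can be triggered by hand with gc_collect_cycles(). The sketch below assumes such a version and uses a made-up Counter class purely to show when the destructors run; the two object classes mirror the ones above without any destroy() method.

```php
<?php
// Hypothetical helper class, used only to count destructor calls.
class Counter { public static $destroyed = 0; }

class ParentObject
{
    protected $childObject;
    function __construct() { $this->childObject = new ChildObject($this); }
    function __destruct() { Counter::$destroyed++; }
}

class ChildObject
{
    protected $parentObject;
    function __construct($parentObject) { $this->parentObject = $parentObject; }
    function __destruct() { Counter::$destroyed++; }
}

gc_enable();
$foo = new ParentObject();
unset($foo);

// The mutual references keep both reference counts above zero,
// so nothing has been destroyed yet.
$before = Counter::$destroyed;

// Ask the cycle collector to find and free unreachable cycles.
gc_collect_cycles();
$after = Counter::$destroyed;

echo "destroyed before: $before, after: $after\n";
```

The collector also runs on its own once enough candidate objects have built up, so the explicit call is mainly useful where memory needs to be released at a known point.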