Harmony Backup - algorithmic improvements

I spent most of my ANZAC day modifying and testing Harmony Backup's code. I wanted to increase performance by reducing disk access.

Without question, the most "expensive" operation a program can perform is accessing secondary storage, like a hard disk drive. You can improve performance by using a RAID array, but even then you're only reducing the problem slightly. Because of the mechanical nature of hard drives (think about how a record player reads a vinyl record and you'll have a good mental picture of how a hard drive works), they're always going to be the bottleneck if your application is disk-intensive.

By changing the algorithm that controls the order and nature of Harmony Backup's disk access I've significantly improved the program execution time and replaced two very disk-heavy steps with one less-intensive step.

Traditionally, if you have two variables that you want to swap you'll use a third temporary variable. Consider:

VarA = 1;
VarB = 2;
VarTemp = VarA;
VarA = VarB;
VarB = VarTemp;

The backup schedule is often every 2 hours (resulting in 12 backups). Previously I'd delete the oldest backup, move all the remaining backups down one, and copy the newest backup to use as the base for the current active backup. This involved a delete and a copy, both of which are very disk-intensive.

Now, instead of deleting the oldest backup, I treat it like the temporary variable in the swap above: I move it to a temporary directory, move all the other backups down one, move the temporary backup into place as the base for the current backup, and sync it with the most recent backup. This involves two moves (which have very low overhead, as only the pointer to the data is changed, not the physical data) and a sync (which is significantly less intensive than a delete or a copy, since only the files that differ need to be touched).
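To make the rotation concrete, here is a minimal sketch in Python. It is not Harmony Backup's actual code: the directory layout (backup.0 as the active backup, higher numbers being older), the backup.tmp name, and the functions rotate_backups and sync_tree are all hypothetical and exist only for illustration. It also assumes every backup lives on the same filesystem, because a rename is only a cheap pointer update in that case.

```python
import filecmp
import os
import shutil
from pathlib import Path


def sync_tree(src: Path, dst: Path) -> None:
    """Make dst mirror src, touching only entries that differ (hypothetical helper)."""
    dst.mkdir(exist_ok=True)
    src_names = {p.name for p in src.iterdir()}

    # Remove anything in dst that no longer exists in src.
    for entry in list(dst.iterdir()):
        if entry.name not in src_names:
            if entry.is_dir() and not entry.is_symlink():
                shutil.rmtree(entry)
            else:
                entry.unlink()

    # Copy new or changed files; recurse into directories.
    for entry in src.iterdir():
        target = dst / entry.name
        if entry.is_dir():
            if target.exists() and not target.is_dir():
                target.unlink()
            sync_tree(entry, target)
        else:
            if target.is_dir():
                shutil.rmtree(target)
            # shallow=True compares size and mtime first, so unchanged
            # files are skipped without reading their contents.
            if not target.exists() or not filecmp.cmp(entry, target, shallow=True):
                shutil.copy2(entry, target)


def rotate_backups(root: Path, keep: int) -> None:
    """Rotate backup.0 .. backup.{keep-1} without a full delete or copy."""
    temp = root / "backup.tmp"  # assumed name for the temporary slot
    oldest = root / f"backup.{keep - 1}"

    # Move 1: park the oldest backup instead of deleting it.
    if oldest.exists():
        os.rename(oldest, temp)
    else:
        temp.mkdir()

    # Shift every remaining backup down one slot (renames only).
    for i in range(keep - 2, -1, -1):
        src = root / f"backup.{i}"
        if src.exists():
            os.rename(src, root / f"backup.{i + 1}")

    # Move 2: the parked backup becomes the base for the active backup.
    os.rename(temp, root / "backup.0")

    # Sync: bring the recycled base up to date with the most recent backup.
    newest = root / "backup.1"
    if newest.exists():
        sync_tree(newest, root / "backup.0")
```

Calling rotate_backups(Path("/some/backup/root"), keep=12) would match the 12-backup schedule described above. The saving comes from the recycled directory already holding most of the files, so the sync only rewrites what changed since the oldest backup was taken.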

My initial testing shows that I've squeezed out a 200% speed increase when dealing with systems of approximately 6 GB, and I estimate that I'll see improvements of up to 800-1000% for larger systems.

Up to a 10x speed improvement and significantly less strain on our disk array. Sounds like a good afternoon to me.

Update: 1:03AM, Thursday 26th April
It appears that the time taken to "start" a backup of hotrod is down from an hour and a half to about 11 minutes, which makes it roughly 8 times faster (about an 88% reduction). The actual data backup still takes the same time, but it's now being initialized much, much faster.
