Include directories
When we develop large web applications we often have an “include” directory which contains several files, each with their own functions. To include any of these files, we normally have an includes.php which contains something like this:
<?php
include('includes/one.php');
include('includes/two.php');
?>
This means that to add a new file, you first have to place it in the includes directory, and then modify the main include file to reference it. Wouldn’t it be easier if includes.php automatically read all the files in the includes directory and pulled them in?
Yes.
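<?php
$dir = "./includes/";
if (is_dir($dir)) {
    if ($dh = opendir($dir)) {
        // Walk every entry in the includes directory.
        while (($file = readdir($dh)) !== false) {
            // Only regular files whose names end in ".php".
            if ((filetype($dir . $file) == "file") && (preg_match('/\.php$/', $file))) {
                // Build the statement as a string, then have PHP run it.
                $code = "require_once('$dir$file');";
                eval($code);
            }
        }
        closedir($dh);
    }
}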
The two interesting bits are the eval() line, which interprets the $code variable as PHP and processes it, and the preg_match() call, which uses a regular expression to make sure we don’t include any files that don’t end in “.php”. This means that “somefile.php.swp”, a vim swap file, won’t be included just because it has “php” in its name.
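To make the filename check concrete, here is a minimal sketch that tests a few names in isolation. The helper name is_php_file() is made up for illustration, and it uses a pattern with an escaped dot, /\.php$/, so that a name like “notphp” is rejected too.

```php
<?php
// Hypothetical helper (not part of the post's code): true only for names ending in ".php".
function is_php_file(string $name): bool
{
    // \. matches a literal dot; $ anchors the match to the end of the name.
    return preg_match('/\.php$/', $name) === 1;
}

var_dump(is_php_file('one.php'));          // bool(true)
var_dump(is_php_file('somefile.php.swp')); // bool(false)
var_dump(is_php_file('notphp'));           // bool(false)
```

Without the escaped dot, a pattern such as /php$/ would also accept “notphp”, which is probably not what you want.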
This code will be reused in quite a lot of our development I expect. It’s not Earth-shattering, but it’s really useful. Hopefully you’ll agree.
I admit I don’t know the first thing about PHP, but it’s interpreted, isn’t it?
I know with ASP that including extra stuff can really hurt (depending obviously on what the extra stuff does), especially if you’re serving 500k+ people a day.
In the ideal world (the one I like :P…) includes are just functions, or preferably classes, so that including an unnecessary file doesn’t come along with a pile of constants, declarations and perhaps a couple of instantiated objects ready for use.
Unfortunately the people I work with don’t like the ideal world so this sort of practice (including unnecessary files) would not be nice to the web server.
So I guess what I am asking is: what you’re proposing might be quite cool, but does it perform? Also, unless you’re caching, you’re opening a handle to the file system, working through it to find files that match, and then actually pumping that code into the interpreter each time.
Just a question, if people know me they know I’m more than curious about most things :P
The hit that the web server takes for including another file is next to nothing. Granted, we’re not serving out adult content in large volumes; it might become an issue if that was what we were delivering.
These web applications often run on dedicated servers onsite with a client, which makes it even less of a problem.
Prime is running at 10% CPU load now, even with all the work he’s doing, so we have a lot to play with.
I didn’t really read the code (cos I’m lazy), but would it be possible to have some PHP embedded in a filename (similar to the problems you have when you don’t use parameterized SQL calls)?
And yes, I do realise this is nit-picking…
You mean if a file was called:
`rm ~ -rf`;.php
Yeah, that would work.
Solution: don’t eval the code. You could call include($dir . $file) directly. I made $code a variable so that you could perform other actions on the file easily.
Also, UNIX file permissions would prevent www-data (the Apache user) from removing anything it shouldn’t, if configured correctly.
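The eval-free approach suggested above might look something like the following. This is only a sketch: the function name include_all() and its return value are made up for illustration, and it assumes the included files only define functions or classes, since anything pulled in from inside a function gets that function’s variable scope.

```php
<?php
// Hypothetical helper: require every regular *.php file in $dir without eval().
// Returns the names that were loaded, which is handy for debugging.
function include_all(string $dir): array
{
    $loaded = [];
    if (!is_dir($dir)) {
        return $loaded;
    }
    // scandir() returns the entries sorted alphabetically.
    foreach (scandir($dir) as $name) {
        $path = rtrim($dir, '/') . '/' . $name;
        // Skip directories, swap files and anything not ending in ".php".
        if (is_file($path) && preg_match('/\.php$/', $name) === 1) {
            // The path is passed as a value, so the filename is never parsed as code.
            require_once $path;
            $loaded[] = $name;
        }
    }
    return $loaded;
}

// Does nothing if there is no includes directory next to this script.
include_all(__DIR__ . '/includes');
```

Because the filename is only ever used as a path, an oddly named file can at worst fail to load.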
Random PHP related comments…
I hope your includes directory is not under public_html. If one day there is a bug in Apache that lets you download files unprocessed (like the one IIS had a few years ago) your database.php will be exposed.
Useless fact: include() in PHP gets converted into a single instruction in the Zend VM. This is why include() runs so fast. I am not sure if require() gets the same treatment, but I would assume so. Parsing the included code is another matter tho :)
We are using the commercial ionCube PHP cache at my new work, and it makes a huge difference to CPU load and site response times. Mind you, the site is complex underneath, so even small amounts of traffic hit the CPU pretty hard. There are 12 FreeBSD/Apache web servers in one cluster and 6 FreeBSD/MySQL database servers in another, and every single one of them sits above 90% CPU usage.
Our includes are normally under public_html, but MySQL is firewalled off, so a username and password wouldn’t be *too* useful, unless they also cracked SSH, and iptables (we only allow SSH from known IPs).
It’s not so much that we need to consider this stuff; we usually cache most things anyway, especially file system access like XML and configs. However, our www server can hit 90%+ during a meaty peak period of about 8-12 hours straight.
Usually your www and SQL servers would be on a VPN between each other, so the only way a user could get SQL access is to gain some sort of access to your www server. If they do, I’d say you have plenty more issues than if they get your SQL username and password :P
I guess another question is: isn’t it easier to have one big include if you’re including multiple files? Performance-wise, I would guess you might like to edit individual includes for readability, then have a script merge them so that your code includes the one file rather than traversing the file system each time (or on each cache timeout). Even after a year of commercial coding I’m still resisting the urge from my workmates to stop considering stuff we don’t have to :P
Unlike yours, though, our servers are mostly at the very, very brink of needing replacing :P
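For what it’s worth, the merge-script idea above could be sketched roughly like this. Everything here is an assumption for illustration: the function name merge_includes(), the banner comments, and above all the premise that each include is pure PHP that starts with an opening tag and has no inline HTML, namespaces or declare() statements.

```php
<?php
// Hypothetical build step: concatenate every *.php file in $dir into $target.
// Returns the number of files merged.
function merge_includes(string $dir, string $target): int
{
    $merged = "<?php\n";
    $count = 0;
    // glob() returns matches sorted alphabetically; fall back to [] on failure.
    foreach (glob(rtrim($dir, '/') . '/*.php') ?: [] as $path) {
        $code = file_get_contents($path);
        // Drop the leading "<?php" and an optional trailing "?>".
        $code = preg_replace('/^\s*<\?php/', '', $code);
        $code = preg_replace('/\?>\s*$/', '', $code);
        // Banner comment so the merged file stays readable.
        $merged .= '// --- ' . basename($path) . " ---\n" . trim($code) . "\n";
        $count++;
    }
    file_put_contents($target, $merged);
    return $count;
}
```

You would run this when deploying, for example merge_includes('./includes', './includes.merged.php'), and have the application require the merged file. Keep the target outside the includes directory, otherwise the next run will merge the merged file into itself.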