It always sucks to write a "sorry for not posting" blog post and so here is the plan for the next few months. PINQ development is temporarily on hold. It's at a point where I like the ideas in it but where the implementation of the packages system is overly complex. My plans are to take a bunch of stuff out of the database and model/gateway areas and move them permanently into the PQL-specific stuff. This makes a lot of sense, especially since I think it would be worthwhile to make PQL as module-like as possible so that I can release it alone for those interested in it.
On the personal development side of things, I have been buying up a huge number of books which I intend to read over the next few/many months. The books that I have queued up to read align nicely with my long-term goals. I just finished reading The C Programming Language (a.k.a. K&R) and am now getting into ANSI Common Lisp. After ANSI CL I plan on reading through Practical Common Lisp (which if you are a longtime reader then you will know I've previously read parts of this book). My main motivation for getting into Common Lisp is that I want to be able to read The Art of the Metaobject Protocol and know exactly what's going on in it.
That's about all there is on the Lisp side of things. I then get into the more systems programmy things that I'm into: language design, compilers, garbage collection, etc. Not too long ago I started reading the dragon book (Compilers: Principles, Techniques, and Tools) and have paused reading of that. It's a dense book so I want to ease into it a bit more. I intend to do that by first reading Programming Language Pragmatics and Compiler Design in C. After those I think I will be ready to finish off the dragon book. Finally, I'm going to jump into Virtual Machines and Garbage Collection: Algorithms for Automatic Dynamic Memory Management
Throughout all of this, I've got side reading of The Pragmatic Programmer and Beautiful Code to read into.
This is my really long term reading schedule. I can imagine reading these books could take me well over a year-- especially now that my third year of university is about to start. School is an increasingly important part of my life, as is my job as a residence advisor at my school and so all of this together will likely limit my blogging ability. However, expect a few random posts now and then, especially around statutory holidays.
Finally, if anyone is ever looking for PHP help, I'm a regular on the SitePoint forums so either create a thread or send me a private message. My username is: Peter Goodman.
See you all on the other side!
I was a bit bored so I whipped up this little diagram in OmniGraffle of the control flow within the PINQ framework and an application. Most things will seem familiar to programmers who regularly use MVC frameworks. It's actually slightly more complex than what is going on in there; however, I was limited to using 20 object with the demo version of OmniGraffle and so I was rather limited on what I could show.
The one interesting bit in here is the yield resource exception. As I've written in previous posts, it changes control from one controller to another.
One amazing feature about Python is that it is self-documenting. All classes and functions can have documentation strings and they all have access to them using the __doc__ property. Most Python doc strings follow the conventions in PEP 257 and so when printed out they are usually quite nice.
I decided that I wanted a feature like Python's help() function in PINQ and so I made one using PHP's Reflection API. If you remember (and read my articles) I wrote an earlier blog post about manually rebuilding classes using the Reflector classes.
First, here's an example of the output from my help() function on the Dictionary class in PINQ:
class Dictionary
| An array-like class for mapping keys to values.
|
| Author: Peter Goodman
|
| Constructor:
|
| Dictionary([array $default_values]) <==> dict($default_values)
|
| Public Methods:
|
| offsetExists(...)
| $d->offsetExists($key) <==> isset($d[$key])
|
| Check if an entry in this dictionary exists for a given key.
|
| offsetGet(...)
| $d->offsetGet($key) <==> $d[$key]
|
| Get the value (entry) for a specific key in the dictionary. If the key
| does not exist in the dictionary then this function will return NULL.
|
| offsetSet(...)
| $d->offsetSet($key, $val) <==> $d[$key] = $val
| $d->offsetSet(NULL, array $val) <==> $d[] = $val
|
| Map a key in the dictionary to a specific value.
|
| Note: When specifying the key as NULL, val must be an array.
|
| offsetUnset(...)
| $d->offsetUnset($key) <==> unset($d[$key])
|
| Remove a (key,value) entry in the dictionary.
|
| toArray(...)
| $d->toArray(void) -> array
|
| Return the (key,value) pairs in the dictionary as an array mapping keys
| to values.
|
| Class Descendants:
|
| InnerRecord
| Loader
| ConfigLoader
| PackageLoader
| ModelDictionary
| PinqRouteParser
| ReadOnlyDictionary
| StackOfDictionaries
| View
| PinqView
As you can see, it gets the PHP Doc Block comments for classes and methods and displays them nicely. It also shows the class constants. I decided not to show class properties because I usually don't do doc-block style commenting on them and so I didn't need them. Another thing this class does is function the extending/implementing classes for an interface/class and shows it as a tree of class descendants. This follows nicely from the idea of self-documenting code as it allows the programmer to discover new and important classes.
I'm currently in the process of re-documenting PINQ's source code to be more Pythonesque as I rather like the format and it suits the help() function well. This, of course, does not mean that my help() function (and its associated functions) do not work for non-PINQ classes because they do.
You can check out/download the code from PINQ's Google Code project repository at http://code.google.com/p/pinq-php/source/browse/trunk/system/functions/reflection.php. The help function works for classes, objects, functions, and class/object methods (using two arguments instead of one).
Recently I've implemented a very nice feature in PINQ: the ability to yield control to a different controller. The way it works is by throwing a specific exception and the catch for that exception is within a do-while loop. Inside that loop is the code that parses a route, instantiate a controller and dispatch to an action. The catch changes the route to parse and tells the loop to iterate one more time, thus allowing the programmer to change the controller/action being executed.
This feature's main use case is error handling. With a proper error controller set up, such as the one below:
<?php
class ErrorController extends PinqController {
public $layout_file = 'error';
public function ANY_403() { set_http_status(403); }
public function ANY_404() { set_http_status(404); }
public function ANY_405() { set_http_status(405); }
public function ANY_500() { set_http_status(500); }
}
I can then do: yield('/error/404') for example, or more easily do yield(ERROR_404);. This feature allows me to move the presentation logic for errors into controllers that are under the control of the programmer (ie: in the application directory) and also not make common errors not something that needs to be handled separately of the rest of the program.
Another use case is extending the class that causes this change in control flow. For example, my DatabaseException class now causes an error 500 when thrown as usual. Here is the code to do that:
<?php
class DatabaseException extends YieldControlException {
public function __construct($message) {
parent::__construct(ERROR_500);
$this->message = $message;
}
}
I think this is an extremely sweet solution to the problem of gracefully handling certain kinds of errors.
As I've been testing out lot's of things for my framework, I've come across a few neat little hacks. The first 1.5 hacks involves PHP's list() construct, and possible ArrayAccess as well.
<?php
list($foo->bar, $foo) = array(new Bar, new Foo);
Look closely at the order of variables in the list(). It seems that list actually sets the variables from right to left. It also indexes into the array numerically, which means that if instead of putting an array as the rvalue of list, you can put an object that implements ArrayAccess.
The second fun hack uses PHP's ArrayAccess interface (as before) but to overload PHP's array push syntaxt. Consider this valid piece of code that in PINQ merges an array into a dictionary:
<?php
$dict = new Dictionary;
$dict[] = array('foo' => 'bar');
This hack works using ArrayAccess::offsetSet, and recognizing when a NULL key is passed and handling it accordingly.
Obligatory Plug
The short answer is that you don't need to (in PINQ, using PQL). You can if you want to but PINQ will automatically do it for you.
The Problem
Alright, that's enough plugging of PINQ. I'm very excited to say that I just solved this problem and I'm going to explain how I did it. But first, I will set up an example if you're not quite clear of the problem I'm talking about. Consider the following two simplified SQL table defininitions:
CREATE TABLE posts (
id INT,
user_id INT,
name VARCHAR,
body TEXT
)
CREATE TABLE users (
id INT,
name VARCHAR
)
If I do a database query and select from both the posts and users tables at the same time (linking the two, of course), then their 'id' and 'name' columns will conflict. PHP's mysql_fetch_assoc will return roughly the following array:
array(
[id] => 1,
[user_id] => 1,
[name] => 'user name',
[body] => 'the body text of the post'
)
Those are terrible results! We've ended up losing lots of valuable post information because user information has overwritten it. We could solve the problem by returning a numerically indexed array, but then we would lose the convenience of indexing into the result by column names (or aliases for that matter). Alright, so how can we solve this problem?
Solving the Problem
The Normal Way
The obvious and normal way of doing this is manually aliasing the conflicting columns of one of the tables in the SELECT statement. This is a fine solution and isn't too bad. But, it takes the convenience out of doing a simple SELECT posts.*, users.* FROM ... which is honestly, super convenient.
How PINQ does it, and how it benefits you
The PINQ solution is to actually do roughly the same thing. First, PINQ figures out what the conflicting columns are going to be. This can be done with a series of calls to php's array_intersect_key and array_merge, that is, assuming we know all of the columns for a given table (which we assume and which we do in the models). The conflicts array ends up being this:
array(
[id] => 1,
[name] => 1,
)
We start by expanding any SELECT *'s into all the columns that need to be selected from and how they are going to be aliased. We then go over all of the column name to alias mappings and check if any of the aliases exist in the conflicts array using a simple isset(). If they do, we prefix the aliases with the model name. Here's what we end up with:
array(
[posts_id] => 5,
[user_id] => 1,
[posts_name] => 'my first post!!!1!',
[body] => 'the body text of the post'
[users_id] => 1,
[users_name] => 'user name',
)
That's all good and fine, but it's not a complete solution. If we were to stop here then we would need to know when columns conflict and index them with their new prefixes. That is undesirable behavior. So lets look at the result and see what we notice. Immediately we can see that the results come out in the order that we selected them (remember posts.* then users.*). If we assume that when we select the columns that we group them by their tables then we can count on there being this nice order to the result array (a fair assumption using PQL in PINQ). We've observerd that there is this order to the results, but that still doesn't get us any closer to getting rid of those prefixes we added to those columns.
What if we injected some dummy columns into the query to tell us where the columns for one table end and another begin? Further, we need to know when NOT to do this whole separating process. For example, if in PINQ we use straight SQL in a query then none of our assumptions will hold water and so we might end up mangling the results in an unfortunate way. Remember, the goal is to lose no data!
The solution is clear: we need sentinel columns in there to tell us:
- When to perform the splitting up and processing of a query
- Where to split the row data into different models
- How to identify the prefixes on the unambiguous columns so that we can remove them
We can do step 1 with one sentinel, and steps 2 and 3 at the same time by including the model name (which is the prefix on ambiguous columns) in the sentinel to show where a table begins/ends. Here is our desired result array:
array(
[__pql__] = 1
[__posts] = 1
[posts_id] => 5,
[user_id] => 1,
[posts_name] => 'my first post!!!1!',
[body] => 'the body text of the post'
[__users] = 1
[users_id] => 1,
[users_name] => 'user name',
)
Hey, we can actually do that in SQL in a very simple way.
SELECT 1 AS __pql__, 1 AS __posts, posts.*, 1 AS __users, users.* FROM posts, users WHERE posts.user_id=users.id
That is in essence what is done in PQL (except that the posts.* and users.* are expanded to be unambiguous and prefixed). Now that we have the desired row data, all that is left is to separate it into the different contained models and remove any unwanted prefixes.
Why and what are the benefits?
This seems like a feat of over-engineering. We've solved a minor problem with complicated solution that requires extra processing of records from the database. What are the advantages? First, lets look at how the traditional ORM works. We would first get a row from the posts model, and then call a relationship on it, such as $post->users. Calling that relationship on each record means one extra query per record. One of the goals of PQL is to shift the work of calling these relationships on objects (and thus doing queries) to constructing the relationships in the database queries then sorting out the data when its needed (note: my intention is to also support the traditional ORM style of calling on relationships). Still, the question "why?" remains. I was prompted to this idea by seeing these relationships being called in large loops. The number of database queries piles up like crazy and then things start slowing down.
An Example of PQL
So, here is an example of a pql query that does all of the above (from within a controller method):
$db = $this->import('db.my_db');
$posts = $db->findAll(
from('posts')->select(ALL)->
from('users)->select(ALL)->link('posts', 'users')
);
foreach($posts as $post) {
echo $post['id']; // this is the ambiguous id which resolves to the user id
echo $post->users['id']; // this is the user id from within the columns
echo $post->posts['id']; // this is the post id
}
As you can see here, my earlier assumption that the select columns would be in order comes true with PQL. The reason is because PQL removes ambiguity from queries by having the programmer select columns from a specific model.
EDIT: I have already come up with a simpler method that I didn't recognize earlier. Instead of calculating the conflicts I prefix every column and then chop off the prefixes later. Regardless, most of the above article still holds and I feel as though others might find the process interesting.
So, originally PINQ stood for [P]HP [IN]tegrated [Q]uery; however, I liked the name so much that I've named my new framework PINQ and now the LINQ-like stuff in is called PQL. So, here are the basic ideas I'm pushing in the (in development) framework. First, a non-traditional ORM. Everything ORM-ish is actually based around PQL. The ORM and PQL are also not limited to databases! Also, multiple database connections are *stupidly easy* to manage. In fact there is just so many things that I'm excited about that you really need to look at the code. I've been commenting the hell out of what I'm doing. If anything, I suggest any onlookers of the code take a look at the following directories/files:
- /system/model/
- /system/pql/*
- /system/packages/route-parser.php
- /system/other/ (the linear-state files, this is just a temporary directory btw)
The assumption that this framework is MVC (based on some of the class names and whatnot) is more apparent than real. I am not trying to adhere to any MVC rules, I'm just writing it the way it comes to mind and makes sense. In fact, I'm thinking of renaming my Controller class to something that's actually representative of what it is (as it doesn't really control anything).
Anyhow, the license I'm probably going with is MIT, and you can find the code (as I work on it!) at its google code project page: http://code.google.com/p/pinq-php/. If you do look into it, I suggest you look into it as a curiosity, not as something to deploy with (at least now). It's in heavy development and I sometimes introduce funny bugs. Regardless, there are some very neat pieces of code in it. Enjoy!
In my free time I've been playing around with a few neat ideas. The main one is LINQ inspired querying for PHP. Suffice to say, I have made some very solid progress at this domain-specific language and can now make SQL queries that have complex joins in them. Here an example with some foreshadowing to where I'm going with all of this: http://codepad.org/OEVsQmwt.
Update: My Current progress on PHP LINQ: http://codepad.org/qr9SvC2Y.
It's summer again. Summer means coding PHP at work and having time to my own things. Speaking of which, here's a fun little hack I just came up with: http://codepad.org/aNW13yC9.
<?php
error_reporting(E_ALL | E_STRICT);
/**
* Class that becomes a super global variable.
* @author Peter Goodman
*/
final class SuperGlobal implements ArrayAccess {
public $globals = array();
public function __construct(&$globals) {
$this->globals = &$globals;
}
public function offsetGet($key) {
return isset($this->globals[$key]) ? $this->globals[$key] : NULL;
}
public function offsetSet($key, $val) {
echo 'hiii';
$this->globals[$key] = $val;
}
public function offsetUnset($key) {
unset($this->globals[$key]);
}
public function offsetExists($key) {
return isset($this->globals[$key]);
}
}
// create a new $GLOBALS array, populated with our new superglobals
$super_globals = array('_SERVER' => new SuperGlobal($_SERVER),
'_GET' => new SuperGlobal($_GET),
'_POST' => new SuperGlobal($_POST),
'_REQUEST' => new SuperGlobal($_REQUEST),
// ...
);
// overwrite the $GLOBALS array, then extract the new super globals by
// reference into the current scope, overwriting the shorthand to the
// normal superglobals.
$GLOBALS = &new SuperGlobal($super_globals);
extract($super_globals, EXTR_OVERWRITE | EXTR_REFS);
// function to test if the overwriting of the superglobals worked as
// expected
function test_scoping() {
print_r($GLOBALS);
print_r($_SERVER);
}
test_scoping();
More to come soon.
- PHP
by Peter Goodman on May 28, 2008 @ 8:47pm
Many people stay away from nested sets. The whole idea of a node having a left and right identifier to determine nesting confuses people. That or people dislike it because they feel that it de-normalizes their database. In this article I am not going to go into how to implement the mptt algorithm; you can find my own full implementation of mptt in my code folder. Instead, I am going to show how to determine the depth of a node at runtime (in PHP) with only the knowledge of the left and right ids of a node. A quick note: my solution is an iterator and uses neither recursion nor lookbacks or lookaheads.
I would expect that this would be the usual solution to the problem. Most people tend to put the heavy lifting in SQL by doing a COUNT of every node with a left id that is less than the current one and a right id that is greater than the current one (imagine the roots coming off of the trunk of a tree, this starts at the base of a root and looks up to the trunk). That is a sufficient solution; however, it is expensive for a large tree. The solution itself also requires that the depth be calculated for every node in advance of getting the nodes--regardless of whether or not we will use/display them.
So, we know that the data we are working with is a tree. Trees can be represented in many different ways in code. One such way is with a stack. If you remember, I used a stack to parse HTML as the pushing/popping function of a stack cleanly represents entering a new branch and then leaving one. Our approach is known: we will use a stack. But, how will a stack be applied to a result set of nodes?
First, the ordering of the nodes is crucial. Our end goal might be to make a threaded look for the tree (hence the usefulness of knowing the depth) and getting the nodes out in their threading order is simple: they need to be in ascending order of their left ID. If you do not understand why then consult this image on the MySQL website and look closely at the left/right ids. So, we can easily fulfill the first requirement: the nodes need to be in their threading order.
Second, it can be easily shown that for any node on the tree, the formula: (right_id - left_id - 1) / 2 will yield the total number of nodes below it. The total part is important! If the nodes were laid out in a flat array, then given our first assumption--that the nodes are in their threading order--then we can further assume that the result of this formula means that the following X number of nodes in the array after the current node (X being the total number of children of a given node) will be at least one level deeper than our current node.
Okay, so we have an inkling of how this will work. We get to a node, and we know that the next X nodes are children. That's simple enough; however, for nesting above one level, the thinking process gets more confusing: how do we keep track of the number of children there is and that we've already looped over if every parent of a given node includes the number of the children of a given node. Unfortunately, this last sentence might be poorly worded, so I will try to illustrate the problem differently. Consider the following tree:
+----- child a (1) +----- child b (0)
root (3) ---|
+----- child c (0)
array(
'root',
'child a',
'child b',
'child c',
)
Beside each node I have the total number of children that the node has in braces. After that I have a simplified version of the array of the nodes as they would be available from a result set. So, when we get to root we know that the next three nodes are children (child a, child b, child c). When we get to child a we know that the next one node is a child (child b). At child a, we know that we've consumed 1/3 of the children of root. When we're at child b, we know that we've consumed 1/1 children of child b, but how are we to say that we've now consumed 2/3 of the children of root? Finally, when we get to child c, we have consumed 3/3 of the children of root, and can now say that we are back to the same level as root is on.
So, we know that as we go we need to consume nodes. Given that we know how many children a node has, as we iterate over those children, we can reduce that counter by 1 for each node visited so that when the counter hits zero, we know that we're done looking at children and we're back to the parent nodes depth. The only problem is introduced with more than one levels of nesting. So we consume nodes and that affects the counter on the parent node, but it also needs to affect the counters on all parent nodes way up in the tree. Updating the counters of every parent node each time a node is consumed would be a waste of resources. Instead, it can easily be done when we're done looking at all the children of a given node. If we store the number of children twice--once acting as a counter that will be consumed, the other acting as a reference for how many total children were consumed--then we can accurately update parent counters as we go along.
The final solution (in pseudo code) is that at every node we encounter, we push pair($counter, $counter) onto the stack. As we encounter child nodes, we consume a node by reducing top($stack)[0] by one, and leave top($stack)[1] alone. By definition, count($stack) is the current depth that we are on. When top($stack)[0] < = 0, we do $temp = pop($stack) and then account for all the nodes we just consumed by using the second counter: top($stack)[0] -= $temp[1]. We clean up by constantly looking is top($stack)[0] <= 0 and popping those parts off.
That pseudo-code version is lacking but illustrates all of what actually needs to be done. For some, this might have originally seemed like a difficult problem (from the programming side, not from the SQL side). However, the solution is quite simple when we look at the problem for what it is: a tree, and not explicitly what we want: the depth of a node. Go check out the solution to how to find the depth of nodes for a nested set tree in my code folder (mostly comments, includes example at the bottom of the file).
<?php
error_reporting(E_STRICT);
/**
* Given a flat array of nodes in a mptt (nested set) tree, in ascending order
* of their left id, figure out what level each node is on. The only things
* required to figure out what level a node is on is its left and right ids.
* This iterator uses a stack to keep track of levels.
*
* @author Peter Goodman
* @copyright Copyright (c) 2008, Peter Goodman. All rights reserved.
*/
class NestedSetIterator extends IteratorIterator {
const LEFT_ID = 'left',
RIGHT_ID = 'right',
LEVEL = '_level';
private $nodes,
$level,
$stack;
/* Constructor, take in an array or iterator. If an array is passed, turn
it into an iterator.
*/
public function __construct(&$nodes = NULL, $num = NULL) {
if(!($nodes instanceof Iterator) && is_array($nodes))
$nodes =& new ArrayIterator($nodes);
parent::__construct($nodes);
$this->rewind();
}
/* Push an item onto the stack. The stack holds arrays of pairs that have
the number of children of a given node. Pair[0] is reduced when an item
is consumed and pair[1] is kept to reduce the parent stack elements
pair[0] when the current top stack element is popped.
*/
private function push(&$node) {
if(!is_array($node) && !($node instanceof ArrayAccess))
throw new Exception("Node is not an array nor has array access.");
$num = ($node[self::RIGHT_ID] - $node[self::LEFT_ID] - 1) / 2;
$this->stack[] = array($num, $num);
$this->level++;
}
/* Pop a pair off the stack and update the number of consumed nodes of the
new top node.
*/
private function pop() {
$level = array_pop($this->stack);
$this->level--;
/* We have just popped a level off of the stack. Consume works by
decreasing the number of nodes on the top of the stack, but each
stack elem knows the *total* number of nodes below it, and so
consuming nodes on the top of the stack doesn't account for lower
elements (parent nodes). Therefor we decrease the current top of
stack (after the pop) by how many total nodes there were originally
in the node elem we just popped off.
*/
$this->consume($level[1]);
}
/* Reduce the number of nodes consumed for the top pair of the stack. */
private function consume($num = 1) {
if(isset($this->stack[$this->level]))
$this->stack[$this->level][0] -= $num;
}
/* Clean up after stack elements that have no more nodes to consume. */
private function cleanup() {
/* Move down the stack elems until we hit a level that still has nodes
that haven't been consumed.
*/
for($i = $this->level; isset($this->stack[$i]); $i--) {
if($this->stack[$i][0] > 0)
break;
$this->pop();
}
}
/*
* Iterator methods
*/
/* Get the current node */
public function ¤t() {
$node = parent::current();
/* Use cached version of level so that it doesn't need to go through
the normal consume/push/pop phase.
*/
if(isset($node[self::LEVEL]))
return $node;
/* Cache doesn't exist, consume the node and push a new element onto
the stack.
*/
$this->consume();
$this->push(&$node);
/* Store the level in an unique name */
$node[self::LEVEL] = $this->level;
return $node;
}
/* Move to the next node. */
public function next() {
$this->cleanup();
parent::next();
}
/* Reset everything. */
public function rewind() {
$this->stack = array();
$this->level = -1;
parent::rewind();
}
}