Apache PageKit

Most of the services on WSO's site were written by students in a framework called Apache PageKit. Applications written for it take advantage of WSO's log-in system. Current examples are the WSO Facebook, the old blogs, and PhotoShare. If you have an idea for a new web service and want to know what it takes to make it happen, you've come to the right page.

If you want to try some of this stuff out, you should first get your hands on a copy of the code.

This article will explain the basics of PageKit and WSO's implementation of it.

Learning the Languages

You'll need to know Perl and SQL to do anything with the back-end, and HTML to do anything with the front-end. One good way to learn these is to look over the existing code (follow the steps in How to hack on the WSO site). Here are some additional resources and references:


For Perl, a great place to get started is Brent Yorgey's Crash Course in Perl, a 30-page guide to Perl 5. You'll also want a more comprehensive reference of some sort. Programming Perl ("The Camel Book") is regarded as the Perl Bible. You can borrow it from Schow [1], buy it on Amazon [2], or see if it's one of the Books You're Welcome to Borrow.


For SQL, note that the WSO site runs PostgreSQL 8.0, a powerful open-source database. You might get started with the PostgreSQL tutorial [3], and at some point you will definitely want to become familiar with its language guide [4].


For HTML and CSS, good references are the Cascading Style Cheatsheet and the Blooberry Guide to HTML.

MVCC: Your New Best Friend

All PageKit applications are divided into three parts: the Model, the View, and the Content.

The Model is the left-brain. It is Perl code that handles incoming requests and interacts with the database. If your application is located at wso/hotornot, then its model will be called hotornot.pm. hotornot.pm will have a Perl subroutine corresponding to every page and form action. The Model sends its data to the View.

The View is the right-brain. It formats the data into the HTML templates to produce the pretty, data-rich pages that you see. Each *.tmpl file corresponds to a single web page. A typical template file will look pretty much like an HTML file, but you will see these additional tags:

       <MODEL_* xxx>
       <CONTENT_* xxx>
       <PKIT_COMPONENT xxx IMAGE="xxx.gif" TITLE="WSO/Hot or Not" etc.>

We'll go over those in a minute.

Finally, the Content is a bullet lodged into the back of PageKit's brain, the kind you can't pull out or else you'll bleed to death. The Content makes me hate PageKit. A Content file is required for every web page. Now in theory, Content is supposed to be XML files with information that doesn't change much, like a policy or a static list, which is interpolated into the View templates. But XML is probably the least human-friendly data encoding language I know, so in reality everybody just sticks their "Content" straight into the View templates. But even if a page does not use anything from the XML file, it still has to have an XML file or else the site will break. If you learn that principle, if you write it on the ceiling above your bed and memorize it, you will save yourself much suffering.

Ok, let's talk a little bit more in-depth about all of these pieces. Well, not the Content, I've said everything I want to say about that.

The Code

Like I said, each page or action corresponds to a subroutine. PageKit subroutines always begin like this:

  sub mysubroutine {
    my $model = shift;

$model is a variable that's going to give us useful stuff, including:

  $model->dbh (the database handle)
  $model->pkit_user (the current user's ID)

In addition, it will give us access to form variables and URL parameters. For example, if someone requests a page that ends in ?id=544, then we can get access to the value of id with this variable:

  $model->input( 'id' )

Same thing applies to submitted forms. If a form element is called 'name', then $model->input( 'name' ) will give us what the user entered.
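Whatever comes back from input is just a string the user typed or pasted into a URL, so it's worth checking before you use it. Here is a minimal sketch of that idea. FakeModel is a made-up stand-in for the real PageKit model object (so the example runs without Apache), and requested_id is a hypothetical helper name, not something from the WSO code.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the PageKit model object; not part of PageKit.
package FakeModel;

sub new {
    my ($class, %params) = @_;
    return bless { params => {%params} }, $class;
}

# Mimics $model->input('name'): returns the request parameter, or undef.
sub input {
    my ($self, $name) = @_;
    return $self->{params}{$name};
}

package main;

# Returns the 'id' parameter only if it is a plain non-negative integer.
sub requested_id {
    my $model = shift;
    my $id = $model->input('id');
    return unless defined $id && $id =~ /\A\d+\z/;
    return $id;
}

my $model = FakeModel->new( id => '544' );
print requested_id($model), "\n";   # prints 544
```

In a real page subroutine you would call $model->input on the $model that PageKit hands you and skip FakeModel entirely.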

Usually, you'll use these variables to do something useful, like yank something from the database. Your database interaction should take place in Lib. Lib is a separate folder, located at wsonet/site/Model/WSOKit/Lib in the code tree, and by convention it does all of the heavy lifting so that other PageKit applications can take advantage of its subroutines.
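To make the Lib convention concrete, here is a rough sketch of what a Lib-style helper could look like. The package name, the sub name, and the hotties table are all assumptions made up for illustration; check the real Lib folder for the actual naming. The only real API used is DBI's selectall_arrayref, called with a ? placeholder so the value is bound rather than pasted into the SQL. StubDbh is a fake handle that lets the sketch run without a database.

```perl
use strict;
use warnings;

# Hypothetical Lib module; the package, sub, and table names are illustrative.
package WSOKit::Lib::HotOrNot;

# Returns an arrayref of rows (one hashref per row) at or above $min hotness.
# $dbh is any DBI-style handle, such as the one you get from $model->dbh.
sub hotties_above {
    my ($dbh, $min) = @_;
    return $dbh->selectall_arrayref(
        'SELECT id, name, hotness FROM hotties WHERE hotness >= ? ORDER BY hotness',
        { Slice => {} },   # one hashref per row
        $min,              # bound to the ? placeholder
    );
}

# Stand-in handle so the sketch runs without a database.
# It records its arguments and returns one canned row.
package StubDbh;

sub new { return bless { calls => [] }, shift }

sub selectall_arrayref {
    my ($self, $sql, $attr, @bind) = @_;
    push @{ $self->{calls} }, { sql => $sql, bind => [@bind] };
    return [ { id => 2, name => 'Robin', hotness => 9 } ];
}

package main;

my $dbh  = StubDbh->new;
my $rows = WSOKit::Lib::HotOrNot::hotties_above($dbh, 8);
print "$rows->[0]{name}\n";   # prints Robin
```

Keeping the SQL in a helper like this means a page subroutine only has to pass $model->dbh along and hand the result to output.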

Then, once you've got what you need from the database (or maybe you inserted something), you probably want to send some sort of data back to the user. This can be a message or just database info. To do this, you'll write something like:

  $model->output( message => "Information submitted" );

and then the template will have access to a variable called message.

If you're displaying a single record, you'll want to call "output" for every piece of information that's being requested, like name and phone number and dating status and all that. Life is a little easier because you can pack them all into one method call like this:

  $model->output( name   => "Jing",
                  status => "negotiable",
                  number => "wouldn't you like to know" );

Now suppose we want a bunch of information from a bunch of records. It would be a huge pain to call output on every attribute of every record. Instead, we can call output on a reference to an array of hash references (loosely called "pseudo-hashes" in this article). It sounds complicated, but look at some code and you'll see how to do it. A simple example:

  sub hotties {
    my $model = shift;
    my $hotties = $model->dbh->selectall_arrayref(
                    "SELECT id, name, hotness FROM hotties ORDER BY hotness",
                    { Slice => {} } ); # Slice makes each row a hash reference
                                       # rather than an array reference
    $model->output( hotties => $hotties );
  }

selectall_arrayref is a sweet function that gives us exactly the data structure we want: an arrayref of hash references (our "pseudo-hashes"). That just means it's an array of records, each of which has attributes keyed by column name. This is perfect for viewing a bunch of anything: search results, current ride offers, whatever.
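If the shape of that data structure is still fuzzy, this sketch builds one by hand and pokes at it. The rows are made up, and record_count and column are hypothetical helpers written just for this example; the point is only to show how you index into an array of records.

```perl
use strict;
use warnings;

# Hand-built stand-in for what selectall_arrayref returns with { Slice => {} }:
# a reference to an array whose elements are hash references keyed by column.
# The rows themselves are invented for illustration.
my $hotties = [
    { id => 1, name => 'Alex',  hotness => 7 },
    { id => 2, name => 'Robin', hotness => 9 },
];

# Returns the number of records in the result.
sub record_count {
    my ($rows) = @_;
    return scalar @$rows;
}

# Returns one column from every record, in order.
sub column {
    my ($rows, $key) = @_;
    return map { $_->{$key} } @$rows;
}

print ref($hotties), "\n";                         # ARRAY
print ref($hotties->[0]), "\n";                    # HASH
print $hotties->[1]{name}, "\n";                   # Robin
print record_count($hotties), "\n";                # 2
print join(', ', column($hotties, 'name')), "\n";  # Alex, Robin
```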

Note: some unenlightened methods in the code base don't use selectall_arrayref. They use like three method calls followed by a while loop. Don't be deceived, because all that stuff isn't necessary unless you need to do further processing on the data before you send it to the View.

OK, we're ready to stick our data into the View.

The Templates

These are in the View. If you're looking at a copy of the code, they're in wsonet/site/View/Default. We'll go over the special template tags one by one.


<MODEL_VAR xxx>

This tag pulls the variable xxx from the Model. So if your Model code contains this:

$model->output( name => "Buckwheat" );

and your View template contains this:

My name is <MODEL_VAR name>.

The final HTML will look like this:

My name is Buckwheat.

As you can see, this tag's behavior is pretty straightforward.

<MODEL_IF xxx>

This tag checks to see whether the variable xxx has a value, and manipulates the template accordingly. Typically, a MODEL_IF statement will have three tags:

       <MODEL_IF xxx>
       <MODEL_ELSE>
       </MODEL_IF>

Here's how they work: If xxx is a variable defined by the Model that Perl evaluates to true, then all of the HTML between <MODEL_IF> and <MODEL_ELSE> will be inserted into the template. Otherwise ("else"), all of the code between <MODEL_ELSE> and </MODEL_IF> will be inserted into the template. If <MODEL_ELSE> is absent and xxx is false, then no code will be inserted into the template.
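"Evaluates to true" trips people up, so here is a small sketch of Perl's own truth rules, on the assumption (taken from the paragraph above) that MODEL_IF simply follows them. The is_true helper and the list of cases are made up for the example. Note in particular that the string "0" is false but "0.0" is true.

```perl
use strict;
use warnings;

# Values a Model might pass to output(), each paired with a label for printing.
my @cases = (
    [ 'undef'        => undef ],
    [ 'empty string' => ''    ],
    [ 'number 0'     => 0     ],
    [ 'string "0"'   => '0'   ],
    [ 'string "0.0"' => '0.0' ],
    [ 'string "no"'  => 'no'  ],
    [ 'number 1'     => 1     ],
);

# Returns 1 or 0 according to Perl's own boolean rules.
sub is_true {
    my ($value) = @_;
    return $value ? 1 : 0;
}

for my $case (@cases) {
    my ($label, $value) = @$case;
    printf "%-14s %s\n", $label, is_true($value) ? 'true branch' : 'else branch';
}
```

So if you want a flag like is_ninja to switch a section off, output 0 or an empty string rather than a word like "no".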

A powerful feature of <MODEL_IF> statements is that they can be nested. For example, let's say your Model looks like this:

 $model->output( name => "Jan" );
 $model->output( is_ninja => 1 );
 $model->output( advice => "The better the code, the sparser the documentation." );

Suppose you want to display Jan's info only if he has a name, and suppose that you want to display Jan's advice if and only if he is, in fact, a ninja. In your template you will write:

 <MODEL_IF name>
 Name: <MODEL_VAR name>
  <MODEL_IF is_ninja>
   <br />Ninja advice: <MODEL_VAR advice>
  </MODEL_IF>
 </MODEL_IF>

The final HTML looks like:

 Name: Jan
 <br />Ninja advice: The better the code, the sparser the documentation.

It can be easy to confuse yourself when lots of MODEL_IF's are tucked inside one another, like those Russian dolls. You will find that your life will be made easier with generous spacing, clean indentation, and, yes, the occasional comment.


<MODEL_LOOP xxx>

This is the part where we use the array of records we talked about earlier. The stuff between the <MODEL_LOOP> and </MODEL_LOOP> tags will be inserted once for each item in the array that it corresponds to. The cool part is that we can stick <MODEL_VAR> tags inside <MODEL_LOOP>, and they will take the values for the corresponding keys in the current record. So to continue our example, if we're given a "hotties" arrayref, we can write something like:

 <MODEL_LOOP hotties>
 <tr><td><MODEL_VAR name></td><td><MODEL_VAR hotness></td></tr>
 </MODEL_LOOP>

And that will give us a row in a table for each record. Neat, huh.
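If it helps to see the loop as plain Perl, here is a rough emulation of what the expansion does. This is not how PageKit implements it; expand_loop and the {key} marker syntax are invented for this sketch, and the records are made up. It just shows the body being repeated once per record, with each variable looked up in the current record.

```perl
use strict;
use warnings;

# Expands a row template once per record, replacing {key} markers with values.
# The {key} marker syntax is invented for this sketch; it is not PageKit syntax.
sub expand_loop {
    my ($records, $row_template) = @_;
    my @rows;
    for my $record (@$records) {
        my $row = $row_template;
        # Each marker is looked up in the current record only.
        $row =~ s/\{(\w+)\}/defined $record->{$1} ? $record->{$1} : ''/ge;
        push @rows, $row;
    }
    return join "\n", @rows;
}

my $hotties = [
    { name => 'Alex',  hotness => 7 },
    { name => 'Robin', hotness => 9 },
];

print expand_loop( $hotties, '<tr><td>{name}</td><td>{hotness}</td></tr>' ), "\n";
```

An empty array produces no rows at all, which is the behavior you would want for an empty result set.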


<CONTENT_* xxx>

There are <CONTENT_VAR>, <CONTENT_IF>, and <CONTENT_LOOP> tags that behave similarly to the model tags, except they pull their data from the XML files. They're not used that often, but if you need them you can poke around the code to see examples of their use.


<PKIT_COMPONENT xxx>

This tag lets you include another template smack-dab in the middle of the current one. If the included template has a Model subroutine of its own, that gets called too.

Nota Bene

The tags are evaluated in a fixed order, and the <CONTENT_*> tags are filled in before the <MODEL_*> tags.

This means that you cannot use information from the Model to pull out specific data from the Content. If you ever think you absolutely must, you are probably better off putting it all into the Model.

Final notes

That should be enough to get you started. Look around the existing code to see how it's done. The official PageKit documentation blows, but here it is: [5]. Make friends with the PostgreSQL documentation. Keep your database calls to a minimum. If you're making complicated database queries that take a long time, learn how to cache data. And for Christ's sake, comment your code.