Getting Started

I have a domain name and an idea for a web application. Neither is new; I have had both for a few months now. Since I have very little time to devote to personal projects, and abundant initiative has not been my strong point as of late, the idea has not moved past that starting point. That is the main reason I started this blog: to keep the initiative going, document progress and share my experience in developing a web application.

Purpose
My day-to-day job involves working with, supporting and administering large enterprise applications. I rarely get the opportunity to work with the technology that I most admire and enjoy.

I’m referring to Open Source Software (OSS).

I am also a manager, and increasingly my day takes me further away from working directly with technology. This project is my attempt to dive back in.

My rules for this project are simple: it can only use OSS (with “cloud” solutions on the periphery), and as few of the tools I use daily at my existing job as possible.

Caveat
I will be keeping both the site and idea private for now. This is not to prevent others from using it (as it is not that original) but to build suspense and provide a way out if efforts come to nought.

Technology
Enough preamble, on to business. This is a web application (I know, original, huh?). It will require a domain name, hosting, a language/framework (backend and frontend), a database platform and version control.

As I am a regular reader of Hacker News, I’ve tried to select technologies that are both established and bleeding edge. Some may change as development progresses but, for now, this is what I’ve selected:

Hosting
I evaluated a bunch of hosting options, from Google App Engine (GAE) and Amazon AWS/EC2 to Slicehost and Linode, and eventually settled on Linode.

GAE and AWS are both excellent options and may be worth pursuing in the unlikely event that scalability becomes an issue later on, but the control and cost of VPS hosting options like Slicehost and Linode were the tipping point. Although Slicehost and Linode have similar pricing models, Linode was a few bucks cheaper per month, which, given the nature of the project, was important.

On Linode, I plan on using an Ubuntu LTS VPS and writing a Linode StackScript to both ease and standardize server deployment.

Language/Framework
This is a loaded topic and one that I will expand on in later posts. As each selection also involves additional plugins and/or packages, I will most likely write a separate post for each. For now, here is a brief rundown:

Backend – Python/Django
Front-end – HTML5/CoffeeScript/jQuery

Database Platform
I use relational database management systems all day, every day. Boring! This decision rested solely on my rule to avoid the technology I use daily and to experiment with new and different tools. It may not work out, but I’m going to start with MongoDB, using MongoEngine for Python/Django support.

More on this in a later post.

Version Control
This one is a no-brainer. All the cool kids are using Git, so I will jump right in.

That’s my starting point. Tomorrow, I will begin working on the StackScript to deploy my Linode. My post will contain a breakdown of all of the tools I plan on deploying on the server and links to source material that helped guide the way.

Hosting on Linode

In yesterday’s post, I listed a number of requirements for getting a web application up and running. The first is finding a hosting provider.

As I mentioned, I compared a number of providers such as Google App Engine and Amazon Web Services to Virtual Private Server (VPS) providers such as Slicehost and Linode.

Due to cost, the ability to configure the server any way I like and some fairly strong recommendations from other sources, I settled on Linode. I originally signed up with them over 7 months ago for this project and, through fits and starts, have redeployed a number of VPSs with them since. Their dashboard is excellent, the StackScript functionality is great, and their uptime has been fantastic.

I should mention that I’ve chosen to house my VPS in their London hosting facility. I made this decision because users were reporting issues in Linode’s US facilities that affected uptime.

Deployment
Linode offers a number of hosting plans. For the sake of price, I have opted for the cheapest one, the Linode 512. It costs $19.95/mth, and its hardware, storage and bandwidth allocations are more than adequate for development purposes.

Linode (and, I’m sure, others) lets you customize server deployments through StackScripts. Once the plan you selected has established the hardware specifications, you can choose a particular Linux distribution. For this project, I am going to use an Ubuntu LTS release, which is currently Ubuntu 10.04 LTS 64-bit.

StackScript
Once my StackScript is done, I will hopefully remember to update this post with a link to the published version.

Because the root user is created as you deploy your VPS, the StackScript runs as root. From a Unix security standpoint, it is highly recommended that you create an alternate user and disable SSH login for root: first add the alternate user to the sudoers file, then disable root login. More on that later.

It is helpful to include the first published StackScript, the StackScript Bash Library. It contains a number of useful functions that help set up your VPS, and I will refer to them throughout this post.

source  # StackScript Bash Library
source  # Excellent helper script by nigma

system_update
goodstuff

For the installations below to complete, the universe repository must be enabled in apt’s sources list:

sed -i 's/^#\(.*deb.*\) universe/\1 universe/' /etc/apt/sources.list
apt-get update  # refresh the package lists so universe packages can be found
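As a sanity check on what the sed expression does, here is a minimal Python sketch of the same substitution run against a sample sources.list. The sample lines are hypothetical; real mirrors and release names will differ.

```python
import re

# Same idea as the sed command: a line that starts with "#", contains "deb",
# and later contains " universe" has its leading "#" removed.
# MULTILINE makes ^ match at every line start; "." never crosses a newline.
UNIVERSE_LINE = re.compile(r"^#(.*deb.*) universe", re.MULTILINE)

def enable_universe(sources: str) -> str:
    """Return sources.list text with commented-out universe entries enabled."""
    return UNIVERSE_LINE.sub(r"\1 universe", sources)

# Hypothetical sample; real entries will differ.
sample = (
    "deb http://archive.ubuntu.com/ubuntu lucid main restricted\n"
    "#deb http://archive.ubuntu.com/ubuntu lucid universe\n"
    "# deb-src http://archive.ubuntu.com/ubuntu lucid universe\n"
)
print(enable_universe(sample))
```

Lines that are already active, or that never mention universe, pass through untouched, which is why the command is safe to run more than once.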

Let’s start with some essential Linux tools. Since the StackScript runs as root, you will notice that sudo is conspicuously absent from the apt-get commands. In fact, it has to be installed first!

apt-get -y install build-essential gcc sudo screen htop locate logcheck logcheck-database logwatch

For security, the excellent fail2ban (which watches log files and bans IP addresses that show malicious behaviour, such as repeated failed logins) and the Uncomplicated Firewall (ufw, which simplifies firewall configuration) provide a strong foundation for securing a Linux server. For a more in-depth source on securing an Ubuntu server, check out this article.

apt-get -y install fail2ban ufw

Since Python is the language of choice, a number of additional packages and dependencies are required both for Django and other packages that will be installed. For instance, the software-properties dependency is a requirement for git. I am also favouring the pip package manager over easy_install as it will automatically handle dependency installation.

apt-get -y install python python-dev python-setuptools python-software-properties python-pip
...
pip install --upgrade pip
pip install --upgrade virtualenv

Mercurial is required for the Django non-relational database release (django-nonrel) so add it for the hg commands to work.

apt-get -y install mercurial meld

To compartmentalize the Django environment, I am going to install all of the Python packages (except pip and virtualenv themselves) in a virtualenv. The benefit of this approach is that it makes testing different versions easier, as each Django project can contain its own package versions.

Memcached caches data in memory to reduce the load that dynamic web applications place on the database. I am going to install it (and its Python client) using apt and pip rather than the manual download and extraction process.

apt-get -y install memcached
...
pip install python-memcached
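The usage pattern memcached enables is usually called cache-aside: check the cache first, and only hit the database on a miss. Below is a minimal sketch of that pattern. DictCache is a hypothetical in-memory stand-in that only mimics the get/set shape of a memcached client, and load_profile is a placeholder for a real query.

```python
import time

class DictCache:
    """Hypothetical in-process stand-in for a memcached client (get/set only)."""

    def __init__(self):
        self._store = {}  # key -> (value, absolute expiry time, or 0 for "never")

    def get(self, key):
        value, expires = self._store.get(key, (None, 0))
        if expires and expires < time.time():
            del self._store[key]  # expired: behave like a cache miss
            return None
        return value

    def set(self, key, value, ttl=0):
        self._store[key] = (value, time.time() + ttl if ttl else 0)


def cached(cache, key, loader, ttl=300):
    """Cache-aside: return the cached value, or load it and cache it on a miss."""
    value = cache.get(key)
    if value is None:
        value = loader()
        cache.set(key, value, ttl)
    return value


calls = []

def load_profile():
    """Placeholder for an expensive database query."""
    calls.append("db hit")
    return {"name": "example"}

cache = DictCache()
first = cached(cache, "profile:1", load_profile)
second = cached(cache, "profile:1", load_profile)  # served from the cache
```

One design consequence worth noting: because a miss is signalled by None, a loader that legitimately returns None will hit the database every time.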

Rather than stick with the standard Apache/mod-wsgi route for hosting Django, I thought I would check out Nginx and Green Unicorn (gunicorn) as an alternative.

apt-get -y install nginx
...
pip install gunicorn

I have set up Nginx and gunicorn using the information available in this post. The only addition I made was for reloading project files when they have been modified.

This post recommended adding a pid file argument to the gunicorn_django call in the script. So, this is the full shell script that I am using:

#!/bin/bash
set -e
cd /path/to/project
source ../bin/activate
exec /path/to/bin/gunicorn_django --workers 2 --log-level=debug --log-file=/var/log/gunicorn/site.log --pid=/tmp/gunicorn.pid --daemon
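If the flag list grows, gunicorn can also read its settings from a Python configuration file passed with -c. The sketch below assumes a gunicorn release that supports Python config files; the setting names are those used by current gunicorn releases (older releases may name some of them differently), and the values are the placeholders from the script above.

```python
# gunicorn.conf.py: hypothetical config file mirroring the flags in the script.
# Each module-level name is a gunicorn setting.
workers = 2                              # --workers 2
loglevel = "debug"                       # --log-level=debug
errorlog = "/var/log/gunicorn/site.log"  # --log-file=/var/log/gunicorn/site.log
pidfile = "/tmp/gunicorn.pid"            # --pid=/tmp/gunicorn.pid
daemon = True                            # --daemon
```

The exec line should then shrink to something like /path/to/bin/gunicorn_django -c /path/to/gunicorn.conf.py.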

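The --daemon argument detaches gunicorn into the background, and the --pid argument writes the master process ID to a file so that you can reload the application with the following command. (If Upstart supervises this script, note that Upstart normally expects the process it starts to stay in the foreground, so you may need to drop --daemon or declare expect daemon in the job.)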

sudo kill -HUP `cat /tmp/gunicorn.pid`
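The same reload can be done from Python, which is handy inside a deployment script such as a Fabric task. A minimal sketch: reload_gunicorn is a hypothetical helper name, and the pid file path is the one passed to --pid above. gunicorn's master process treats SIGHUP as a signal to reload its configuration, start new workers and gracefully shut down the old ones, which is what picks up modified project files.

```python
import os
import signal

def reload_gunicorn(pid_path="/tmp/gunicorn.pid"):
    """Send SIGHUP to the gunicorn master whose PID is stored in pid_path.

    Hypothetical helper, equivalent to: kill -HUP `cat /tmp/gunicorn.pid`
    Returns the PID that was signalled.
    """
    with open(pid_path) as pid_file:
        pid = int(pid_file.read().strip())
    os.kill(pid, signal.SIGHUP)  # raises ProcessLookupError if the process is gone
    return pid
```

Run it as the same user that owns the gunicorn process (or as root), just like the kill command.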

Version control is essential to any project. As I mentioned in my first post, I am opting for Git as it is what all the cool kids are using these days. Instructions for installing Git on Ubuntu are available in the official documentation.

apt-get -y install git-core gitosis

Now onto Django. As in the memcached example, a version for Django is specified as part of the StackScript deployment. This is to ensure that any future servers will always have the same version.

Since I am using the django-nonrel release, I am following these instructions on the installation of django-nonrel, djangotoolbox and the mongodb-engine packages.

# Each install runs in a subshell, so the next clone starts in the original directory
hg clone http://bitbucket.org/wkornewald/django-nonrel
(cd django-nonrel && python setup.py install)

hg clone http://bitbucket.org/wkornewald/djangotoolbox
(cd djangotoolbox && python setup.py install)

git clone https://github.com/django-mongodb-engine/mongodb-engine
(cd mongodb-engine && python setup.py install)
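A cd in a shell script persists for every command that follows, so each clone should start from the same base directory rather than inside the previous checkout. The Python sketch below makes that explicit: every clone runs from a shared base directory and every install runs inside its own checkout. The repository list is copied from the commands above; install_steps, run and the dry_run flag are hypothetical names, and dry-run mode only prints the commands.

```python
import os
import subprocess

# (tool, repository URL, checkout directory), copied from the commands above.
REPOS = [
    ("hg", "http://bitbucket.org/wkornewald/django-nonrel", "django-nonrel"),
    ("hg", "http://bitbucket.org/wkornewald/djangotoolbox", "djangotoolbox"),
    ("git", "https://github.com/django-mongodb-engine/mongodb-engine", "mongodb-engine"),
]

def install_steps(base_dir):
    """Build (argv, cwd) pairs: clones run in base_dir, installs in each checkout."""
    steps = []
    for tool, url, name in REPOS:
        steps.append(([tool, "clone", url, name], base_dir))
        steps.append((["python", "setup.py", "install"], os.path.join(base_dir, name)))
    return steps

def run(steps, dry_run=True):
    """Print the steps, or execute them and stop at the first failure."""
    for argv, cwd in steps:
        if dry_run:
            print(cwd, "$", " ".join(argv))
        else:
            subprocess.check_call(argv, cwd=cwd)  # raises CalledProcessError on failure

run(install_steps("/tmp/build"))
```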

One of the reasons I selected Django over other Python web frameworks (such as Flask, Bottle etc.) is the abundance of packages available to extend its core functionality. After walking through excellent resources such as Django Packages, I have opted to add the following, as they will hopefully speed up the development process.

pip install south
pip install geopy
pip install django-social-auth
pip install django-easy-maps
pip install django-annoying
pip install PyCrypto
pip install Fabric

I ran into issues with django-debug-toolbar on pages not defined in the urls.py file. This was addressed in the latest version (0.9.0-dev at the time of writing) so rather than use pip, I cloned it from git.

git clone https://github.com/django-debug-toolbar/django-debug-toolbar
cd django-debug-toolbar && python setup.py install

I will elaborate on these specific packages in a later post.

Now on to the database components. As I said in my first post, I am going to try to use MongoDB with Django. This is probably one of the riskiest decisions I have made for this project, as I have not been able to find many success stories about making Django and MongoDB work well together. I am particularly concerned about potentially losing the Django admin, as it is one of the most compelling reasons for choosing Django in the first place.

apt-get -y install mongodb

Clean up
I have focused mainly on the packages that I am installing as part of the VPS build. However, as I stated earlier, there are a number of other steps that are essential when building a server.

Here are some suggestions (once I publish the StackScript, it will contain code to do most of the following):

  • Set up bash as the default shell
  • Set a hostname, proper timezone and locales on the server
  • Create an additional user and add them to the sudoers file
  • Disable root login via SSH
  • Use the Uncomplicated Firewall (ufw) to only allow the absolutely essential ports (80, 443, 22) and make sure the firewall is enabled
  • Finally, always update and patch your essential server components!
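Two of the items above lend themselves to a quick sketch. The Python below builds the ufw commands for the three essential ports and rewrites the PermitRootLogin directive in sshd_config text. It is an illustration under assumptions (standard ufw command syntax and OpenSSH's PermitRootLogin option), not the eventual StackScript. After writing the edited file back to /etc/ssh/sshd_config, the SSH daemon must be restarted for the change to take effect, and it is worth confirming that the new sudo user can log in before doing so.

```python
import re

ESSENTIAL_PORTS = (22, 80, 443)  # SSH, HTTP, HTTPS

def ufw_commands(ports=ESSENTIAL_PORTS):
    """Return ufw invocations that block inbound traffic except the given TCP ports."""
    commands = [
        ["ufw", "default", "deny", "incoming"],
        ["ufw", "default", "allow", "outgoing"],
    ]
    commands += [["ufw", "allow", f"{port}/tcp"] for port in ports]
    commands.append(["ufw", "--force", "enable"])  # --force skips the y/n prompt
    return commands

# Matches an active or commented-out PermitRootLogin line; [ \t] (rather than \s)
# keeps the match from spilling onto the next line.
_ROOT_LOGIN = re.compile(r"^#?[ \t]*PermitRootLogin[ \t]+.*$", re.MULTILINE)

def disable_root_login(sshd_config: str) -> str:
    """Replace any PermitRootLogin line with 'PermitRootLogin no', or append it."""
    if _ROOT_LOGIN.search(sshd_config):
        return _ROOT_LOGIN.sub("PermitRootLogin no", sshd_config)
    return sshd_config.rstrip("\n") + "\nPermitRootLogin no\n"

for command in ufw_commands():
    print(" ".join(command))
```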

In my next post I am going to dive into the Nginx, gunicorn, Django and memcached configuration before starting to work on the project.

Using Git and Setting up Django

Now that the server is set up and Nginx is handling inbound requests and proxying them to gunicorn/Django, it is high time to set up the first Git repository and start configuring your Django environment.

Git
I have been using SVN for years and possess a passing familiarity with CVS, but I decided to give Git a shot for this particular project. As I mentioned earlier, all the cool kids are using it, and I find myself browsing through OSS repositories on GitHub all the time.

It is fairly straightforward to get going. Once I set up a paid GitHub account (the $7/mth plan), I followed the instructions on how to create an SSH key, set up Git on the server and create my first repository. Once this was done, I committed the root of the Django project folder (with a README) to the repository.

Now onto configuring Django.

django-admin.py
I noticed fairly early on that the django-admin.py command was not working correctly. I did some searching around and found the following guide which had a helpful suggestion about adding the DJANGO_SETTINGS_MODULE environment variable to the activate script for virtualenv.

I ran the following commands in the project folder (outside of the virtualenv, replace myproject with your project name) to add this environment variable:

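# Link the project into the virtualenv's site-packages so it is importable
ln -s "$(pwd)" "../lib/python2.6/site-packages/$(basename "$(pwd)")"
# Set the variable for the current shell, then persist it in the activate script
export DJANGO_SETTINGS_MODULE=myproject.settings
echo "export DJANGO_SETTINGS_MODULE=myproject.settings" >> ../bin/activate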
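Appending with >> adds a duplicate line every time the step is rerun. A minimal Python sketch of an idempotent version follows; add_settings_export is a hypothetical helper, and the activate path and project name are the placeholders used above.

```python
from pathlib import Path

def add_settings_export(activate_path, project="myproject"):
    """Append the DJANGO_SETTINGS_MODULE export to an activate script, only once.

    Returns True if the line was added, False if it was already present.
    """
    line = f"export DJANGO_SETTINGS_MODULE={project}.settings"
    path = Path(activate_path)
    text = path.read_text() if path.exists() else ""
    if line in text.splitlines():
        return False
    with path.open("a") as activate:
        if text and not text.endswith("\n"):
            activate.write("\n")  # keep the new export on its own line
        activate.write(line + "\n")
    return True

# Usage from the project folder: add_settings_export("../bin/activate")
```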

Database
As I said in my first post, I am going to try and use MongoDB as the backend database for Django using django-nonrel, djangotoolbox and mongodb-engine.

Using this helpful tutorial as a guide,

Memcached
Using the helpful information provided in this post, I have added the following to the Django settings.py file:

CACHE_BACKEND = 'memcached://127.0.0.1:11211/'

And in the middleware classes section (at the top and bottom):

'django.middleware.cache.UpdateCacheMiddleware',
...
'django.middleware.cache.FetchFromCacheMiddleware',
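CACHE_BACKEND is the setting format for the Django release current at the time of writing. Django 1.3 deprecated it in favour of a CACHES dictionary, and later releases removed the string form. A sketch of the equivalent dictionary, assuming the python-memcached client installed earlier:

```python
# settings.py fragment for Django 1.3 and later; the equivalent of
# CACHE_BACKEND = 'memcached://127.0.0.1:11211/'
CACHES = {
    "default": {
        # Memcached backend built on the python-memcached client.
        "BACKEND": "django.core.cache.backends.memcached.MemcachedCache",
        "LOCATION": "127.0.0.1:11211",
    }
}
```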

South
Update: It looks like South does not work with either MongoDB or django-nonrel and may not be required anyway. As soon as I add it to the INSTALLED_APPS section Django will not start.

South adds schema migrations to Django, which (as of this writing) does not ship with a migration framework of its own. Again, with MongoDB, I’m not sure how this particular component will work.

According to the installation guide, you just need to add ‘south’ to the INSTALLED_APPS section (at the very bottom).

django-debug-toolbar
Once I installed the 0.9.0-dev release, everything went fine. There is an issue with 0.8.5 wherein it displays an error on pages that are not defined in the urls.py file.

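The django-debug-toolbar is an excellent tool for Django developers. It displays a panel of debugging information (such as SQL queries, settings, request variables and templates) in the browser, but only for requests coming from the IP addresses you specify, which helps in debugging issues during development.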

You need to add the following to the middleware section in your settings.py file:

'debug_toolbar.middleware.DebugToolbarMiddleware',

Just make sure that the memcached FetchFromCacheMiddleware class is left at the bottom of the list.

Then specify an INTERNAL_IPS directive that includes the IP addresses for all of the machines that you are using for development.

Finally, add ‘debug_toolbar’ to the INSTALLED_APPS section.
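Putting the debug toolbar steps together with the memcached middleware, the relevant parts of settings.py might look like the sketch below. The IP addresses are placeholders for your own development machines, and unrelated middleware and apps are trimmed to keep it short.

```python
# settings.py fragment: only entries relevant to the debug toolbar are shown.
DEBUG = True  # the toolbar's default check also requires DEBUG

MIDDLEWARE_CLASSES = (
    # Request middleware runs top to bottom and response middleware bottom to
    # top, so the two cache classes have to bracket everything else.
    "django.middleware.cache.UpdateCacheMiddleware",
    "django.middleware.common.CommonMiddleware",
    "debug_toolbar.middleware.DebugToolbarMiddleware",
    "django.middleware.cache.FetchFromCacheMiddleware",
)

# The toolbar is only shown to requests from these addresses.
# Placeholders: replace with the IPs of your development machines.
INTERNAL_IPS = ("127.0.0.1", "192.0.2.10")

INSTALLED_APPS = (
    "django.contrib.auth",
    "django.contrib.contenttypes",
    "debug_toolbar",
)
```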

django-social-auth
The django-social-auth plugin enables authentication via third-party providers such as Facebook, Google, Twitter etc.

‘social_auth’ needs to be added to the INSTALLED_APPS section in settings.py.

AUTHENTICATION_BACKENDS need to be specified. I added all of the examples in the install documentation:

AUTHENTICATION_BACKENDS = (
    'social_auth.backends.twitter.TwitterBackend',
    'social_auth.backends.facebook.FacebookBackend',
    'social_auth.backends.google.GoogleOAuthBackend',
    'social_auth.backends.google.GoogleOAuth2Backend',
    'social_auth.backends.google.GoogleBackend',
    'social_auth.backends.yahoo.YahooBackend',
    'social_auth.backends.contrib.linkedin.LinkedinBackend',
    #'social_auth.backends.contrib.LiveJournalBackend',
    'social_auth.backends.contrib.orkut.OrkutBackend',
    'social_auth.backends.contrib.FoursquareBackend',
    'social_auth.backends.OpenIDBackend',
    'django.contrib.auth.backends.ModelBackend',
)

Watch out for issues in the documentation. I had to leave the LiveJournalBackend line commented out and remove the orkut reference from the FoursquareBackend line.
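Django tries each entry in AUTHENTICATION_BACKENDS in order and uses the first one that returns a user, which is why ModelBackend sits at the end as the fallback for ordinary username/password logins. The sketch below models that loop with plain Python classes; the backend classes, credential names and user dictionaries are hypothetical stand-ins, not the social_auth classes or Django's own code.

```python
class TokenBackend:
    """Hypothetical stand-in for an OAuth-style backend: accepts only a token."""

    def authenticate(self, token=None):
        return {"username": "tw_user"} if token == "valid-token" else None


class PasswordBackend:
    """Hypothetical stand-in for ModelBackend: accepts username/password."""

    USERS = {"alice": "s3cret"}

    def authenticate(self, username=None, password=None):
        if username is not None and self.USERS.get(username) == password:
            return {"username": username}
        return None


BACKENDS = [TokenBackend(), PasswordBackend()]


def authenticate(**credentials):
    """Return the first user any backend accepts, tagged with that backend's name."""
    for backend in BACKENDS:
        try:
            user = backend.authenticate(**credentials)
        except TypeError:
            # This backend does not accept these credential names; try the next one.
            continue
        if user is not None:
            user["backend"] = type(backend).__name__
            return user
    return None
```

Ordering therefore matters only for credentials that more than one backend understands; backends that cannot handle a given set of credentials are simply skipped.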

Again, using the documentation, I have added the following to the settings.py file as well (with the keys specified):

TWITTER_CONSUMER_KEY         = ''
TWITTER_CONSUMER_SECRET      = ''
FACEBOOK_APP_ID              = ''
FACEBOOK_API_SECRET          = ''
LINKEDIN_CONSUMER_KEY        = ''
LINKEDIN_CONSUMER_SECRET     = ''
ORKUT_CONSUMER_KEY           = ''
ORKUT_CONSUMER_SECRET        = ''
GOOGLE_CONSUMER_KEY          = ''
GOOGLE_CONSUMER_SECRET       = ''
GOOGLE_OAUTH2_CLIENT_KEY     = ''
GOOGLE_OAUTH2_CLIENT_SECRET  = ''
FOURSQUARE_CONSUMER_KEY      = ''
FOURSQUARE_CONSUMER_SECRET   = ''

LOGIN_URL          = '/login-form/'
LOGIN_REDIRECT_URL = '/logged-in/'
LOGIN_ERROR_URL    = '/login-error/'

SOCIAL_AUTH_ERROR_KEY = 'social_errors'
SOCIAL_AUTH_COMPLETE_URL_NAME  = 'complete'
SOCIAL_AUTH_ASSOCIATE_URL_NAME = 'associate_complete'
SOCIAL_AUTH_DEFAULT_USERNAME = 'new_social_auth_user'
SOCIAL_AUTH_EXTRA_DATA = False
SOCIAL_AUTH_EXPIRATION = 'expires'

Finally, add the following to the urls.py file:

url(r'', include('social_auth.urls')),

The Social_Auth section in the Django admin will fail with the following message when trying to access Associations, Nonces or User social auths.

Exception Type:	TemplateSyntaxError
Exception Value:
Caught DatabaseError while rendering: This query is not supported by the database.
Exception Location:	build/bdist.linux-i686/egg/djangotoolbox/db/basecompiler.py in check_query, line 272

Not surprising as it is most likely attempting a join behind the scenes. I have forked the social_auth repository on github for fun to see if I can figure it out. For now, I will leave it enabled to see if authentication works anyway.

django-easy-maps
django-easy-maps makes deploying Google Maps much easier in Django projects. As I plan on using them in this web application, I thought I would check out this plugin.

According to the documentation you need to add ‘easy_maps’ to INSTALLED_APPS and then specify your Google key in EASY_MAPS_GOOGLE_KEY in settings.py.

Admin, views.py and urls.py
Since we are using MongoDB, the SITE_ID in settings.py must be an ObjectID string or else the following error is issued:

AutoField (default primary key) values must be strings representing an ObjectId on MongoDB (got u'1' instead). Please make sure your SITE_ID contains a valid ObjectId string.

I added the following to the INSTALLED_APPS section:

'django_mongodb_engine',
'djangotoolbox',

Adding djangotoolbox here is not just recommended; it is required if you plan on creating or editing users in the Django admin panel. The following command is supposed to return the ObjectId value:

python manage.py tellsiteid

Unfortunately, this did not work for me, so I opened the MongoDB shell and found the ObjectId with the following (dbname is the database name specified in settings.py):

/usr/bin/mongo dbname
db.django_site.find()

This returns an ObjectId, which you can add to your settings.py file by commenting out the existing SITE_ID value and replacing it with:

SITE_ID = u'OBJECTIDVALUE'
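A MongoDB ObjectId is 12 bytes, conventionally written as 24 hexadecimal characters, so a quick sanity check before pasting the value into settings.py is easy to sketch. The helper below is hypothetical (pymongo's bson package offers ObjectId.is_valid if you prefer a library check), and the sample ID is made up.

```python
import re

# An ObjectId is 12 bytes, conventionally written as 24 hexadecimal characters.
_OBJECT_ID = re.compile(r"[0-9a-fA-F]{24}")

def is_object_id(value):
    """Return True if value is a string that looks like a MongoDB ObjectId."""
    return isinstance(value, str) and _OBJECT_ID.fullmatch(value) is not None

print(is_object_id("4e2a8c1f9b1d3a0c2f000001"))  # made-up, well-formed value
print(is_object_id("1"))  # the default SITE_ID behind the error above
```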

Now, you will need to enable the admin module, create your first views.py file and make a few more changes to the urls.py file to have a working site.

To enable the admin module uncomment the following in settings.py:

'django.contrib.admin',

And uncomment the following in urls.py:

from django.contrib import admin
admin.autodiscover()
...
url(r'^admin/', include(admin.site.urls)),

Now, create a views.py file in the root of your project folder and add the following:

from django import forms
from django.shortcuts import render_to_response
from django.http import HttpResponse
from django.http import HttpResponseRedirect
from django.http import Http404
from django.core.mail import send_mail
from django.contrib import auth
from django.contrib.auth.decorators import login_required
from django.template import RequestContext
from django.core.urlresolvers import reverse

def home(request):
    return render_to_response('home.html', locals(), context_instance=RequestContext(request))

render_to_response is a Django shortcut that renders a template (home.html) with the context variables from locals(), using RequestContext(request) as the context instance so that context processors also run. This is why I included the import statements above.

Now, where does it find home.html? You need to specify a templates folder in settings.py in the TEMPLATE_DIRS section. Make sure you use a fully qualified path (without trailing space). In this case, I created a templates folder under the project path (in the same folder as urls.py, views.py and settings.py). Create a home.html file in that folder.
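Hard-coding an absolute path works, but a common alternative is to derive it from the location of settings.py, so the project can move between machines without editing the setting. A minimal sketch; template_dirs is a hypothetical helper, and in settings.py itself you would usually inline the two os.path calls.

```python
import os

def template_dirs(settings_file):
    """Build a TEMPLATE_DIRS tuple from the location of settings.py.

    Assumes the templates folder sits next to settings.py, as described above.
    """
    project_root = os.path.dirname(os.path.abspath(settings_file))
    return (os.path.join(project_root, "templates"),)

# In settings.py this would be used as: TEMPLATE_DIRS = template_dirs(__file__)
print(template_dirs("/path/to/project/settings.py"))
```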

That’s it! Reload gunicorn using the following command:

kill -HUP `cat /tmp/gunicorn.pid`

You should now see a functioning page!

Commit and syncdb
Finally, commit the recent changes to the Git repository and synchronize the Django database.

git add -A
git commit -m 'Enter comment here'

To run the syncdb command, you need to be in the virtualenv so that the path is set up correctly. From the project folder:

source ../bin/activate
python manage.py syncdb