Underscore.js extend vs. jQuery extend

April 2, 2013

Or more succinctly: _.extend vs. $.extend.

They both do the same thing: copy the keys from a series of objects into the first object. In other words, they extend the first object with the properties of the other objects, like this:

    var a = { k0 : 0 },
        b = { k1 : 1, k2 : 2 },
        c = { k3 : 3, k4 : 4 };
    _.extend(a, b, c);

After the extend, a will be:

    {
       k0 : 0,
       k1 : 1,
       k2 : 2,
       k3 : 3,
       k4 : 4
    }

It can be used to extend objects, and it is commonly used for “inheritance”: extending one object’s prototype with the properties and functions of another object. Here’s an example from backbone.js, which extends the Model’s prototype:

   _.extend(Model.prototype, Events, {
             //more implementation here...
   });
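
The mixin pattern can be tried without either library loaded. The following is a minimal sketch, not Backbone's source: extend is a hand-rolled stand-in for _.extend, and the Events mixin, Model constructor and set method are hypothetical names made up for the illustration.

```javascript
// Minimal stand-in for _.extend: shallow-copies enumerable keys onto target.
function extend(target) {
    for (var i = 1; i < arguments.length; i++) {
        var source = arguments[i];
        for (var key in source) {
            target[key] = source[key];
        }
    }
    return target;
}

// Hypothetical mixin with a tiny event API (not Backbone's Events).
var Events = {
    on: function (name, callback) {
        this._handlers = this._handlers || {};
        (this._handlers[name] = this._handlers[name] || []).push(callback);
    },
    trigger: function (name, payload) {
        var handlers = (this._handlers && this._handlers[name]) || [];
        for (var i = 0; i < handlers.length; i++) {
            handlers[i].call(this, payload);
        }
    }
};

function Model(attributes) {
    this.attributes = attributes || {};
}

// Mix the event methods plus the model's own methods into the prototype.
extend(Model.prototype, Events, {
    set: function (key, value) {
        this.attributes[key] = value;
        this.trigger("change", key);
    }
});

var model = new Model();
var changed = [];
model.on("change", function (key) { changed.push(key); });
model.set("title", "hello");
console.log(changed);
```

Every Model instance picks up on, trigger and set through the prototype, so nothing is copied per instance.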

One major difference between underscore.js extend and jQuery extend is in the way they deal with sub-objects or nested objects. jQuery can do a deep copy of sub-objects when you pass true as the first argument, as in $.extend(true, target, source), while underscore.js always does a shallow copy (and so does $.extend without that flag). You need to keep this in mind, as problems caused by shallow copies can be hard to troubleshoot.

Have a look at this code:

    var a = {}, b = {},
        c = { k1 : 1,
              kobj : { k2 : 2 }
            };
    _.extend(a, c);
    _.extend(b, c);
    a.kobj.k2 = 10;
    console.log("a.kobj.k2:", a.kobj.k2);
    console.log("c.kobj.k2:", c.kobj.k2);

Because _.extend does a shallow copy, a.kobj, b.kobj and c.kobj all point to the same object, so changing one affects the others. The code above shows 10 for both a.kobj.k2 and c.kobj.k2.

Replace _.extend(a, c) with $.extend(true, a, c) (and likewise for b) and the code does a deep copy instead: it displays a.kobj.k2: 10 and c.kobj.k2: 2.
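
To see the two behaviours side by side without loading either library, here is a minimal sketch. shallowExtend and deepExtend are hypothetical stand-ins written for this illustration; they are not the actual source of _.extend or $.extend, and the deep version only handles plain nested objects.

```javascript
// Hypothetical stand-ins for illustration only; not the libraries' source.
function shallowExtend(target) {
    for (var i = 1; i < arguments.length; i++) {
        var source = arguments[i];
        for (var key in source) {
            target[key] = source[key]; // nested objects are copied by reference
        }
    }
    return target;
}

function deepExtend(target) {
    for (var i = 1; i < arguments.length; i++) {
        var source = arguments[i];
        for (var key in source) {
            var value = source[key];
            if (value && typeof value === "object" && !Array.isArray(value)) {
                var existing = target[key];
                var base = (existing && typeof existing === "object") ? existing : {};
                // recurse so the target gets its own copy of the nested object
                target[key] = deepExtend(base, value);
            } else {
                target[key] = value;
            }
        }
    }
    return target;
}

var c = { k1 : 1, kobj : { k2 : 2 } };
var shallow = shallowExtend({}, c);
var deep = deepExtend({}, c);

deep.kobj.k2 = 20;     // only the deep copy changes
shallow.kobj.k2 = 10;  // c.kobj.k2 changes too, because the object is shared
console.log(c.kobj.k2, shallow.kobj.k2, deep.kobj.k2);
```

The shallow copy shares kobj with c, while the deep copy owns a separate kobj.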


A funny story – featuring UTC offset

March 28, 2013

I don’t normally write about mistakes I make.

You know mistakes! You’re supposed to accept them, learn from them and never talk about them, especially online where your customers can read about them. No! You’re supposed to project an image of self-confidence, invulnerability and super human coding abilities.

Well, this mistake is funny and relatively harmless. I just have to write about it.

I’ve written before about the implementation of our CRM sync service. With every refresh, that sync service sends two things to the server: the timestamp of the last sync (to only get the delta from there) and the UTC offset of the client (provided by the browser). This UTC offset is only reliable for that session and should not be reused as the user’s UTC offset in general, because:

  1. The browser’s UTC offset is a naive offset, not a timezone. In particular, it doesn’t know about daylight saving rules (although it does include daylight saving if it is in effect at the time of the request)
  2. The user might be travelling and using the service from a hotel in a totally different timezone

Anyways. We only use it in order to associate the dates and times with words like Today, Yesterday, etc. in the current session.
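
As a rough sketch of that use, the snippet below labels a timestamp relative to the client's current day, given an offset in hours. The function names dayLabel and clientUtcOffsetHours are made up for this example and are not taken from the sync service; the only browser API used is Date.prototype.getTimezoneOffset, which returns minutes and is positive when local time is behind UTC.

```javascript
// Hypothetical helper: the offset in hours for this session, e.g. 5.5 for UTC+5:30.
function clientUtcOffsetHours() {
    return -new Date().getTimezoneOffset() / 60;
}

// Hypothetical helper: label a UTC timestamp relative to "now" in the client's local day.
function dayLabel(timestampMs, nowMs, utcOffsetHours) {
    var msPerDay = 24 * 60 * 60 * 1000;
    var shift = utcOffsetHours * 60 * 60 * 1000;
    // Shift both instants into the client's wall-clock time, then compare whole days.
    var day = Math.floor((timestampMs + shift) / msPerDay);
    var today = Math.floor((nowMs + shift) / msPerDay);
    if (day === today) return "Today";
    if (day === today - 1) return "Yesterday";
    return new Date(timestampMs + shift).toISOString().slice(0, 10);
}

var now = Date.UTC(2013, 2, 28, 20, 0);     // 28 March 2013, 20:00 UTC
var synced = Date.UTC(2013, 2, 28, 17, 0);  // 28 March 2013, 17:00 UTC
var labelAtUtc = dayLabel(synced, now, 0);      // same UTC day
var labelAtIndia = dayLabel(synced, now, 5.5);  // local time is already 29 March
console.log(labelAtUtc, labelAtIndia);
```

The same pair of timestamps gets a different label depending on the offset, which is why the offset has to travel with each request.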

On the server, which uses Python by the way, I used to have this naive handling of the UTC offset:

    try:
        utcoffset = int( request.GET.get('utcoffset', 0) )
    except ValueError:
        utcoffset = 0

The defaulting to 0 should never happen, as this parameter is always sent and the UTC offset, populated in javascript, should always be available in the browser. The try/except was more of a “just in case”.

So one day, I decided that the try/except didn’t make sense, for the reasons highlighted above. Furthermore, I didn’t want the exception to be swallowed; I wanted to know about it. Django has a nice feature where you get an email every time an exception is thrown and not handled.

So I took that out, and my code now looked like this:

    utcoffset = int( request.GET.get('utcoffset', 0) )

Great, I thought. However, in my ignorance, I totally forgot about half-hour timezones. India, for example, has a 5.5 UTC offset (UTC+5:30), and there are half-hour timezones in Canada and Australia.

I caught this one pretty quickly when someone from India used the website and passed in a 5.5 offset. int() cannot parse a string like that, so that line started throwing and flooding me with emails for every failure. Now this sync service actually polls for updates, so you can imagine I got quite a few emails.

Luckily for me, the fix was straightforward:

    utcoffset = float( request.GET.get('utcoffset', 0) )

Using a float instead of an int.

I suppose the moral of the story is twofold:

  1. Never swallow exceptions as the code might end up doing something you have not intended (using UTC offset 0 for half-hour timezones) and you will not know about it to fix it
  2. Learn about timezones

django url rules and spaces

February 28, 2013

I’ve been caught out by the copy’n’paste programming style, where you just copy snippets of code, test them and off you go, without too much thinking. This style is so appealing. You’re essentially encapsulating complexity within that snippet, which you trust because you’re copying and pasting from a reference source or from a place in the code where the snippet “proved itself” to work.

The problem is of course that the new context you’re placing it in can be slightly different from the context where you’re copying from. Thus, bugs appear.

Recently I’ve been caught out by a simple django URL rule which I’ve copied from the django reference website. Here’s the rule:

(r'^operationstatus/(?P<status>\w+)$', 'myapp.backbone_view')

This does not match statuses with spaces in them, because \w only matches letters, digits and underscores. You get a 404.

My quick solution was to change the regex to:

(r'^operationstatus/(?P<status>.*)$', 'myapp.backbone_view')

which works as expected.
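
The difference between the two patterns is easy to check in isolation. The sketch below uses JavaScript regular expressions as stand-ins for the Python ones, with numbered groups instead of the named status group. The third pattern, wordsAndSpaces, is not from the rule above; it is an assumed narrower alternative, shown only for comparison.

```javascript
// JavaScript regexes stand in for the Python ones; groups are numbered, not named.
var wordOnly = /^operationstatus\/(\w+)$/;
var anything = /^operationstatus\/(.*)$/;
// Assumed alternative, not from the post: word characters and spaces only.
var wordsAndSpaces = /^operationstatus\/([\w ]+)$/;

var path = "operationstatus/in progress";

var strictMatch = wordOnly.exec(path);        // null: the space is not a \w character
var looseMatch = anything.exec(path);         // captures "in progress"
var narrowMatch = wordsAndSpaces.exec(path);  // captures "in progress"

// .* also accepts an empty status, which the other two patterns reject.
var emptyLoose = anything.exec("operationstatus/");
var emptyNarrow = wordsAndSpaces.exec("operationstatus/");

console.log(strictMatch, looseMatch && looseMatch[1], narrowMatch && narrowMatch[1]);
```

Note that .* is the most permissive choice: it matches anything after the prefix, including an empty string, so the view may need to validate the status itself.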

Paginate a Backbone.js collection

January 28, 2013

When you have too many results, you have to paginate; we all know that. With backbone.js there are different approaches, depending on whether you have all your data in the collection or you paginate “server-side”, that is, by calling .fetch() on the collection every time you move to a new page, so essentially only storing one page at a time.
You could hold multiple pages, for example store page 1, currentPage – 1, currentPage, currentPage + 1, and the last page, in order to optimize the most common operations: move first, move previous, move next, move last.

In this article, I’m going to tackle a simpler scenario, when all the data is in the collection (in memory). No server round trips will be needed. In a subsequent article, I will build on this, to implement something more advanced. So let’s get started.

My first step was to enhance the collection so it can iterate over the selected page. I’ve added a so called “partialEach”, like .each, but only iterating over the given page.

Backbone.Collection.prototype.partialEach = function(offset, maxItemsPerPage, iterator, context) {
	for (var l = this.length; maxItemsPerPage !== 0 && offset < l; offset++) {
		var model = this.at(offset);
		if( model ) {
			iterator.call(context, model, offset, this);
			maxItemsPerPage--; // one fewer slot left on this page
		}
	}
};

   offset          - the offset within the collection of the element to start from (index of the first element on the page)
   maxItemsPerPage - the number of items per page
   iterator        - the callback, the function to call for each item (this will be used to render the element, or build the DOM or the html string for rendering)
   context         - a context to be passed back to the callback

Now, why do this, instead of a simple for loop, from offset to offset + maxItemsPerPage?
Because a simple pagination is generally not good enough. What if the user wants to filter the results and you have to paginate the filtered results? In that case, the for loop (offset to offset + maxItemsPerPage) doesn’t work anymore, as not all the items within that range will be included in the filter.

To support filters, I have modified the function above like this:

Backbone.Collection.prototype.partialEach = function(offset, maxItemsPerPage, iterator, context) {
	for (var l = this.length; maxItemsPerPage !== 0 && offset < l; offset++) {
		var model = this.at(offset);
		if( model && this.filterFunc( model, offset, this ) ) {
			iterator.call(context, model, offset, this);
			maxItemsPerPage--; // only items that pass the filter use up a slot
		}
	}
};

A filterFunc is just a function that takes a model and returns true or false. It has to be set on the Backbone.Collection.prototype in a similar way, and it can then use a filterObject, which you set on each individual collection, holding the details of the actual search.
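
The same loop can be tried on a plain array, without Backbone. This is a minimal sketch under that assumption: the standalone partialEach below takes the items and the filter as arguments (a signature invented for this example) and returns the index of the last item it looked at, which is useful for the page bookkeeping later on.

```javascript
// Plain-array stand-in for the collection method.
// Returns the index of the last item visited (offset - 1 if nothing was visited).
function partialEach(items, offset, maxItemsPerPage, filterFunc, iterator) {
    var lastIndex = offset - 1;
    for (var l = items.length; maxItemsPerPage !== 0 && offset < l; offset++) {
        var item = items[offset];
        if (filterFunc(item, offset, items)) {
            iterator(item, offset, items);
            maxItemsPerPage--; // only matching items use up a slot on the page
        }
        lastIndex = offset;
    }
    return lastIndex;
}

var numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
var isEven = function (n) { return n % 2 === 0; };

var page = [];
var lastIndex = partialEach(numbers, 0, 2, isEven, function (n) { page.push(n); });
console.log(page, lastIndex);
```

With two items per page and an "even numbers" filter, the first page holds 2 and 4, and the scan stops at index 3, not at index 1 as a fixed-range loop would.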

Now, a view that wants the items for a particular page needs to calculate the offset.
So how can a view calculate it?

For the scenario where there are no filters, it’s quite easy:

    offset = pageNumber * itemsPerPage;

with pageNumber starting from 0 to totalPages – 1.

But when you have a filter, it is not that straightforward. For this scenario, I will introduce the concept of a pageCache.
A pageCache will store, for each page, the first index of the items on that page. This is the index where the search (filtering) should start from; it doesn’t mean that the item at that index will pass the filter.

So, for the first page, pageCache will have:

   pageCache = { 0 : 0 }

The first page (page number 0) starts from index 0. This will be true for all filters.
Rather than calculate all the others, we will be lazy here, for performance reasons, and only populate the pageCache as the user is searching.
Once we have the pageCache, we can calculate the offset as follows:

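	getPageOffset: function(){
		if( !this.filterObject ) { // no filter set (the exact check is assumed here)
			return this.pageNumber * this.itemsPerPage;
		} else {
			if( !this.pageCache ) { // first call with a filter: build the cache
				this.pageCache = { 0 : 0 };
				return 0;
			} else {
				return this.pageCache[this.pageNumber];
			}
		}
	}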

This function is optimized for the scenario where the view/collection has no filter.
Otherwise it gets the offset from the pageCache, building the pageCache first if it doesn’t exist.
With this offset, the view can then call into the partialEach to get the items it wants.

As for the lazy update, once the view iterates over the items, it keeps track of the lastIndex for each page, and then updates the pageCache:

	updatePageCache: function(pageNo, lastIndex){
		this.pageCache[pageNo + 1] = lastIndex + 1;
	}

        lastIndex is the last index on the page that has just been displayed, therefore the iteration on next page (pageNo + 1) will start from lastIndex + 1
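
Putting the pieces together, here is a small self-contained sketch of the lazy pageCache working over a plain array. createPager, getPage and next are names invented for this example, and the sketch always goes through the pageCache, so it skips the no-filter shortcut.

```javascript
// Hypothetical pager over a plain array; always paginates through the pageCache.
function createPager(items, itemsPerPage, filterFunc) {
    return {
        pageNumber: 0,
        pageCache: { 0: 0 }, // page 0 always starts scanning at index 0
        getPage: function () {
            var offset = this.pageCache[this.pageNumber];
            var page = [];
            var lastIndex = offset - 1;
            for (var i = offset; i < items.length && page.length < itemsPerPage; i++) {
                if (filterFunc(items[i])) {
                    page.push(items[i]);
                }
                lastIndex = i;
            }
            // Lazily record where the next page should start scanning.
            this.pageCache[this.pageNumber + 1] = lastIndex + 1;
            return page;
        },
        next: function () {
            this.pageNumber++;
            return this.getPage();
        }
    };
}

var numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
var pager = createPager(numbers, 2, function (n) { return n % 2 === 0; });
var first = pager.getPage();
var second = pager.next();
console.log(first, second, pager.pageCache);
```

Because the cache is filled one page at a time, this sketch only supports moving forward page by page; jumping straight to an uncached page would find no entry in pageCache.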

This is just a rough implementation for you to get an idea. There are more exercises left for the user:

  • pageCache needs to be reset when the filter changes
  • determining whether there are more items that match the filter is not implemented (this needs to be done to know whether to show the Next button or not)
  • how do you deal with new items being inserted in the collection? The pagination technique above recovers on the second pass only (this might be sufficient)
  • how do you optimize operations like Move Last, which would require to iterate the whole list if there is a filter on and the pageCache is not populated


October 19, 2012

I’ve played a bit (a lot!) with backbone.js recently and it’s a great little framework, I love it. It is so easy to write javascript and end up with spaghetti that backbone.js, although very simple, helps quite a bit. It helps by giving you a (sort of) MVC structure to your code (a backbone!) and a REST-ful API to persist the app to the server.

While I love it, again, it’s very simplistic and recently I discovered one nasty bug – which I tried to convince the developers of backbone.js that it’s a problem, without much success: https://github.com/documentcloud/backbone/issues/1640

The issue is the collections keep a key/value object/dictionary mapping ids to models. It is called _byId. It is used by .get(id) on a collection, but it’s also used for internal features – like detection of duplicate models when you do a collection.add(). It’s fair to say it’s there to optimize the lookup of a model by id in the collection – a very common operation. Conceptually, it can be argued that duplicating the id-to-model relationship data is a recipe for disaster (duplicated in _byId and within the model), but I am not a purist myself, so I don’t have a problem with that.

In backbone.js, the id is supposed to represent the id of the object on the server. So an id is a unique identifier across all sessions and across clients & server, and it does mean your model is persisted on the server. Contrast this with a cid, which is a client id: it is always populated for models and is just there for the convenience of being able to refer to objects while they’re not persisted (while they don’t have an id).

Now the problem with _byId is the way it gets updated. When a model is saved, the request goes to the server (via ajax / REST api) and the server persists the model and returns the id. Upon receiving the id, backbone.js automatically updates the model with the id. It also uses a trigger/event on the model to update the collection’s _byId. This is still not a problem.

What is a problem is that the user can turn off all events, by doing a save with { silent : true }. No events will be triggered and the _byId collection will not be updated.

Now this is a classic example of an internal private data structure (an optimization in this case), _byId, relying on an external public feature (events) which the user of the API can turn on and off. This is a big problem because it affects the consistency of the internal data, and because the error is not detected early: the point of failure is far removed from the root cause. The failures you get with this are failures to find models within the collection, and failures to detect duplicates and prevent them from being added to the collection. Needless to say, it is time consuming to troubleshoot problems like these, and this is exactly what I found.

In the end, due to this problem not being accepted as a problem, I had to fix it on my side. And what’s worse is that I had to put it on the client side and not in backbone.js – because I wanted to avoid branching off and having problems every time I want to upgrade to a new version. So, I had to update the _byId mapping myself, a very ugly hack and one that is bound to fail if _byId semantics change.
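
The failure mode and the workaround can be reproduced with a toy model and collection. This is a sketch only: TinyModel and TinyCollection are invented for the example and are not Backbone's source. They mimic just the one behaviour described above, an id index that is updated through an event which a silent save suppresses.

```javascript
// Toy collection, NOT Backbone's source: an id index kept in sync via an event.
function TinyCollection() {
    this.models = [];
    this._byId = {};
}
TinyCollection.prototype.add = function (model) {
    var self = this;
    this.models.push(model);
    if (model.id != null) this._byId[model.id] = model;
    // The index is only updated when the model announces its new id.
    model.onIdChange = function () { self._byId[model.id] = model; };
};
TinyCollection.prototype.get = function (id) {
    return this._byId[id];
};

function TinyModel() {
    this.id = null;
    this.onIdChange = null;
}
// Pretend the server assigned newId; options.silent suppresses the event.
TinyModel.prototype.save = function (newId, options) {
    this.id = newId;
    if (!(options && options.silent) && this.onIdChange) this.onIdChange();
};

var collection = new TinyCollection();
var loud = new TinyModel();
var quiet = new TinyModel();
collection.add(loud);
collection.add(quiet);

loud.save(1);
quiet.save(2, { silent: true });

var foundLoud = collection.get(1);   // found: the event updated the index
var foundQuiet = collection.get(2);  // undefined: the index was never updated

// The workaround: patch the private index by hand after a silent save.
collection._byId[quiet.id] = quiet;
var foundAfterPatch = collection.get(2);
console.log(foundLoud === loud, foundQuiet, foundAfterPatch === quiet);
```

The model is in the collection the whole time, yet a lookup by id fails until the index is patched, which is what makes this kind of bug hard to trace back to the silent save.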

backbone.js folks, if you’re reading this, please reconsider and fix this issue 🙂

Install Django on Arvixe

March 31, 2012

Recently I had to spend a few good hours trying to figure out how to install Django on a shared host at Arvixe. I thought I’d document it for others and for my future reference, as I’m sure I’ll forget all this and don’t want to have to re-discover it.

It’s all done in a shell, so if you don’t have shell, you have to request it.

Step 1: Get python 2.7

mkdir temp
cd temp
wget http://www.python.org/ftp/python/2.7.2/Python-2.7.2.tgz
tar -xzf Python-2.7.2.tgz

Step 2: Compile and install Python

cd Python-2.7.2
./configure
make altinstall prefix=~ exec-prefix=~
cd ~/bin
ln -s python2.7 python

Add an alias to your .bashrc:
vi ~/.bashrc
add this line: alias python='~/bin/python'

Login again, and check by running python -V
You should see: Python 2.7.2

Step 3: Install Django

Download DjangoX.X.tar.gz into temp, then:
tar -xzf DjangoX.X.tar.gz
cd DjangoX.X
python setup.py install

At this point you can check it was installed correctly by starting python and running:
import django

Step 4: Install setuptools (dependency of MySQLdb at step 5)

Download setuptools-0.6c11-py2.7.egg from http://pypi.python.org/pypi/setuptools#downloads
Install with:
sh setuptools-0.6c11-py2.7.egg

Step 5: Install MySQLdb (mysql-python)

Download MySQL-python-1.2.3.tar.gz from http://sourceforge.net/projects/mysql-python/files/mysql-python/1.2.3/
tar -xzf MySQL-python-1.2.3.tar.gz
cd MySQL-python-1.2.3
python setup.py build
python setup.py install

Step 6: Create a MySQL database via the Arvixe cPanel

This will be something like: _

Change the settings.py in django:
'ENGINE': 'django.db.backends.mysql',
'NAME': _ (e.g. 'mikep_mysqldb')
'USER': your_arvixe_user,
'PASSWORD': your_arvixe_password,
'HOST': '',
'PORT': ''

Step 7: Setup the cgi script

I’ve used this cgi script: https://code.djangoproject.com/attachment/ticket/2407/django.cgi
Place it in ~/public_html/cgi-bin

cd ~/public_html/cgi-bin
wget https://code.djangoproject.com/raw-attachment/ticket/2407/django.cgi

There are 3 lines you have to change:
1: #!/home//bin/python <- this is the new python you've just installed
95: sys.path.append("/home//djangorepo")
96: os.environ['DJANGO_SETTINGS_MODULE'] = 'djangoapp.settings'

The above assumes that you have installed your django site in djangorepo and your django app is called djangoapp.
In other words, you have something like this:
/home//djangorepo/djangoapp (in this you have manage.py, settings.py, etc.)

This script needs to be executable! If it isn’t, you’ll get an HTTP 500, and since you can’t see the Apache error log, you’ll be stuck.

chmod 755 ~/public_html/cgi-bin/django.cgi

Step 8: Add the .htaccess

Place a file called .htaccess in ~/public_html

AddHandler cgi-script .cgi
RewriteEngine On
RewriteRule ^/(cgi-bin.*)$ /$1 [QSA,L,PT]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*)$ /cgi-bin/django.cgi/$1 [QSA,L]

That’s it! Point your browser to your url and enjoy!

Javascript OOP

March 18, 2012

Javascript does not have classes. It’s not an easy concept to grasp if you’re coming from OOP languages like Java, C++, C#, etc. Javascript works with objects.

var obj = {};

That’s an object.

Objects can be manipulated at any time, you can add properties, or functions (methods).

obj.name = "Object1";
obj.method = function() { alert(this.name); };

Objects can also be created with the new operator, which is something that might ring a few bells for the Java, C++, etc. developers:

var person = new Person("Alice");

Person is a function like the following:

var Person = function(name) {
    this.name = name;
    this.showName = function() { alert(this.name); };
};

We can create as many persons as we’d like with the method above. One drawback is that the function showName is created every time, even if it doesn’t change. Enter prototype.

Prototype is just an object that every javascript object gets a reference to. If a name (property, method) cannot be resolved in the object, the prototype is also searched. So to avoid the previously mentioned drawback, we can re-write Person like this:

var Person = function(name) {
    this.name = name;
};

Person.prototype.showName = function() { alert(this.name); };

You can add any methods like that and they all get inherited by every instance of Person you create with the new operator.
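
The difference between the two versions can be checked directly. In this sketch the constructors are renamed PersonOwn and PersonShared (names chosen for the example) so both can live side by side, and showName returns the name instead of calling alert so it runs outside a browser.

```javascript
// Version 1: a new showName function is created for every instance.
var PersonOwn = function (name) {
    this.name = name;
    this.showName = function () { return this.name; };
};

// Version 2: a single showName lives on the prototype and is shared.
var PersonShared = function (name) {
    this.name = name;
};
PersonShared.prototype.showName = function () { return this.name; };

var own1 = new PersonOwn("Alice"), own2 = new PersonOwn("Bob");
var shared1 = new PersonShared("Alice"), shared2 = new PersonShared("Bob");

var ownIsShared = own1.showName === own2.showName;           // false: two separate functions
var protoIsShared = shared1.showName === shared2.showName;   // true: one function on the prototype
var isOwnProperty = shared1.hasOwnProperty("showName");      // false: resolved via the prototype

console.log(ownIsShared, protoIsShared, isOwnProperty, shared2.showName());
```

Both versions behave the same from the caller's point of view; the prototype version simply avoids creating one function per instance.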

I’m back

March 4, 2012

I haven’t written in more than a year. I’ve been very busy in my new job. I’m mostly working in Python now and doing a bit of Java. On the side, I’m working on a website, a pet project which I’m writing in django with jquery on the client side. I’m planning it as a one page javascript app. The first thing that puzzled me when starting with jquery was this weird $ sign everywhere. What is it?

I was used to $ prepending variable names from shell programming.

I was also used to $ from regular expressions

I was also remembering $ from that sign we’re all chasing up in this rat race.

It turns out there’s no mystery at all. $ is just another function: javascript allows function names to start with $, _ or a letter, but not with a digit. So $ is simply a function with a short name, also known as jQuery. Using $ makes the code more succinct, especially if you’re trying to minimize the size of your javascript scripts.
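
A quick way to convince yourself is to define your own. The toy $ and _ below are invented for this example and have nothing to do with the real jQuery or underscore.js objects; they only show that both names are ordinary identifiers.

```javascript
// A toy $, only to show that the name is an ordinary identifier.
var $ = function (selector) {
    return { selector: selector };
};

// The same goes for _, which is just a variable name as well.
var _ = {
    identity: function (value) { return value; }
};

var result = $("#main");
console.log(typeof $, result.selector, _.identity(42));
```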

Outlook custom actions

February 9, 2011

Recently I wanted to write a small Outlook plugin to solve a problem I was having (mainly dealing with support). I looked at the various options of writing code that interacts with Outlook. And here’s a quick, non comprehensive list:
1. VBA – not too bad, but unfortunately not portable – it’s not possible to easily export, import, install or distribute this code (Outlook is not as advanced as Excel in this respect). So with VBA you can write your own macros, or import .bas modules or VBA code manually
2. Outlook plugins – this uses the Office Extensibility interface – which all the Office plugins share. This is pretty good, as it gives you access to the full object model
3. Custom actions – this uses the rules wizard (in the actions screen, you can select to perform a custom action). I was puzzled by this, and even more intrigued when I couldn’t find any documentation on this. It seems Microsoft have deprecated the custom actions. They still support them, but they have removed both the documentation and the header files.
They want to encourage developers to write Outlook plugins instead.

C# Excel plugin getting disabled

November 15, 2010

Excel disables a plugin (I’m referring to COM and Automation addins here) in two ways:

1. Hard disable – when the plugin causes the host (Excel) to crash

In this case, the COM addin will be in the disabled list, and the next time the user starts Excel they will get a message asking whether they still want that addin or not.
The way Excel does this is by putting the plugin on a black list before calling the OnConnection method (IDTExtensibility2). If the method crashes the host, next time Excel starts up, it finds the offending plugin in the black list.
If the method returns fine, Excel removes it from the black list.
It’s difficult to crash the host from .net, but not impossible. Most of the disables in .net are however…

2. Soft disable – when the plugin returns a failure HRESULT from OnConnection

In .net that means that an exception escapes from OnConnection. The .net framework converts this into an HRESULT when the method returns.
In order to fix that, you need to look into the following areas:
– catch every exception in OnConnection (duh!)
– look at any members part of your class implementing IDTExtensibility2 (any constructor there which throws will generate this condition)
– also look at any static members that can potentially throw
– look at any problems loading dependent assemblies needed by your assembly (this can also cause this)
– if you haven’t written a wrapper for your plugin, and your plugin runs as the mscoree.dll plugin, then check whether other similar plugins have been disabled (Excel disables mscoree, which disables all plugins running under it – check the MS website on how to write a wrapper and avoid this problem)