Archive for the ‘Programming’ Category

Outlook custom actions

February 9, 2011

Recently I wanted to write a small Outlook plugin to solve a problem I was having (mainly to do with support). I looked at the various options for writing code that interacts with Outlook. Here’s a quick, non-comprehensive list:
1. VBA – not too bad, but unfortunately not portable – there’s no easy way to export, import, install or distribute this code (Outlook is not as advanced as Excel in this respect). So with VBA you can write your own macros, or import .bas modules or VBA code manually
2. Outlook plugins – this uses the Office Extensibility interface, which all the Office plugins share. This is pretty good, as it gives you access to the full object model
3. Custom actions – this uses the rules wizard (in the actions screen, you can choose to perform a custom action). I was puzzled by this, and even more intrigued when I couldn’t find any documentation on it. It seems Microsoft have deprecated custom actions: they still support them, but they have removed both the documentation and the header files. They want to encourage developers to write Outlook plugins instead.


Bloomberg by Bloomberg

October 14, 2008

A while ago I finished reading Bloomberg by Bloomberg, a book about the history of Bloomberg narrated by Michael Bloomberg himself.
It’s a frustrating book; at times I felt I didn’t want to read it anymore. The main problem is Mike’s style, which ultimately reads like a hard sell of his company and himself. It’s also full of pride (well-deserved pride, but still frustrating to read in a book). People want real content, not marketing blah blah.
Anyway, he makes some good points about software development in financial services. One thing he mentions is the “agile” methodology they used at Bloomberg: shipping products on time even if they’re not perfect, even if features have to be cut, and from there on working in quick iterations.
That’s how the banking industry works in general: first because the business wants tangible results, and second because bonuses are paid annually.

I think it’s the model to be followed by all entrepreneurs. It was the same model followed by Microsoft.

OO in Javascript

October 10, 2008

MyObject = function(id)
{
    this.type = "object";
    this.id = id;
};

That’s a “class”. To create an instance of that, you do the usual:

  var myObject = new MyObject(42);

In order to emulate an inheritance hierarchy, Javascript has a special property called prototype. You can set the prototype of an object to any other object. Object, not class. That’s a bit unusual from an OO perspective.
So it’s not proper OO; it’s an emulation of OO. It relies on the way the Javascript engine resolves properties and methods.
When you write object.property1 or object.method1(), Javascript first looks in the object itself for a local property or method of that name. If there isn’t one, it checks the prototype, then the prototype’s prototype, and so on until either the name is resolved or the chain ends.
It takes a bit of getting used to, but it’s still a very useful way of reusing code.

Javascript eval

September 4, 2008

I’ve seen a strange thing today. Mind you, things are strange just because we don’t understand them.

I was doing an eval on a JSON object:

someStr = "{ \"property\":\"value\" }";
obj = eval(someStr);

Firefox’s Javascript engine threw an exception at this point: Syntax error: invalid label. What the heck?

I looked it up online and found the solution on a blog. Replace the eval expression with:

obj = eval("(" + someStr + ")");

This looks weird and unexpected. In Python you can easily do an eval(someStr) and that returns a dictionary. No problem. So why the brackets in Javascript?
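For comparison, the Python behaviour mentioned above can be checked directly (a quick sketch):

```python
# Python's eval parses a top-level "{" in an expression as a dict
# literal, so no wrapping parentheses are needed
some_str = '{ "property": "value" }'
obj = eval(some_str)
assert obj == {"property": "value"}
```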

I started looking up the answer, but everyone was just saying that the brackets are there to work around an ambiguity in the Javascript language. But what ambiguity?!

And then I found the first clue in MSDN:

Note The extra parentheses are used to make eval unconditionally treat the source input like an expression. This is especially important for objects. If you try to call eval with a string containing JSON text that defines an object, such as the string “{}” (meaning an empty object), then it simply returns undefined as the parsed result. The parentheses force the JavaScript parser to see the top-level curly braces as the literal notation for an Object instance rather than, say, curly braces defining a statement block. Incidentally, the same problem does not occur if the top-level item is an array, as in eval(“[1,2,3]”). For sake of uniformity, however, JSON text should always be surrounded with parentheses prior to calling eval so that there is no ambiguity about how to interpret the source.

So, if you don’t add the brackets, the Javascript engine thinks that the curly braces delimit a block of code, as in:

var a = 10;

as opposed to a JSON object. In other words, the brackets force an expression rather than a statement block. And that’s because:

( { var a = 10; } )

is invalid in Javascript, whereas surrounding an expression in brackets is OK.

Basic Monte Carlo

August 22, 2008

I’ve started looking at Monte Carlo methods recently, just for fun. Monte Carlo is a method that attempts to give a probabilistic solution to complex problems. Its premise is that by using random numbers, you can get an idea of what the solution looks like.
The most basic example that I’ve tried is calculating the value of PI. The story goes like this…

You take a circle circumscribed by a square, something like a dartboard. Imagine now that you throw darts at it randomly. Some of the darts will land within the circle; others will land in the square but outside the circle. The probabilities of these 2 events are proportional to the respective areas.
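In short, for a circle of radius r inscribed in a square of side 2r (the same ratio holds if you sample only the positive quadrant, as the code in this post does):

```latex
P(\text{dart lands inside the circle})
  = \frac{\text{area of circle}}{\text{area of square}}
  = \frac{\pi r^2}{(2r)^2}
  = \frac{\pi}{4}
\qquad\Rightarrow\qquad
\pi \approx 4 \cdot \frac{\mathit{hits\_within}}{\mathit{hits\_total}}
```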

Here’s a good explanation of the theory and the formula:

A quick and dirty implementation that I came up with in 10 minutes in Python:

import random
from math import sqrt

radius = 100
hits_total = 0
hits_within = 0
decimals = 4
good_enough_pi = 3.1415926535897932384
approximate_pi = 0

def calculate():
    global hits_total, hits_within
    x = random.randint(0, radius)
    y = random.randint(0, radius)
    hits_total += 1
    if sqrt(x*x + y*y) <= radius:
        hits_within += 1

while int((approximate_pi - good_enough_pi)
          * (10 ** decimals)) != 0:
    calculate()
    approximate_pi = 4 * (float(hits_within) / float(hits_total))

print "Calculating PI with %d decimals took %d tries" % (decimals, hits_total)
print "PI~=%s" % str(approximate_pi)

When running this algo, I’ve seen that perhaps 3 out of 4 times it completes really quickly for 3-4 decimals. The other times it takes ages and doesn’t converge. So in that sense it’s not reliable: it does converge eventually, but it can take millions of tries.
Not to mention trying to get more decimals.

Here I’m just comparing the approximate PI with a pre-calculated PI, but imagine applying Monte Carlo to problems where you can’t pre-calculate the answer (real problems, in other words). The problem is that it is difficult to know when your result is good enough and you can stop.

Maybe you can use another Monte Carlo method to determine the probability of a Monte Carlo method to be reliable and converge quickly. Applying Monte Carlo methods recursively…

As I said, basic stuff…

Python and accidental function overloading

August 21, 2008

Today I discovered one dark side of Python. If you accidentally define two functions with the same name, the last one wins, and the first one might as well never have existed. There is no overloading based on parameters (I was about to say types :-). Python will just call the second function (second as it appears in the source code). If the number of parameters doesn’t match, you will get a runtime error; but if the number of parameters matches, the second function is executed silently.
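A minimal reproduction of the behaviour (the function names here are made up for illustration):

```python
def do_operation(x):
    return x + 1          # first definition

def do_operation(x):
    return x * 2          # same name, same arity: silently replaces the first

assert do_operation(10) == 20   # only the second definition survives

def greet():
    return "hello"

def greet(name):          # replaces the zero-argument version
    return "hello, " + name

try:
    greet()               # arity no longer matches: fails only at runtime
except TypeError:
    pass
```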
My accidental overloading resulted from the following sequence of events:

1. I had a function DoOperation
2. At a later date I realized I needed to break it in two, where DoOperation still does the main processing, but DoOperationAck sends the result or acknowledgement to whoever is interested in it.
The only problem is that I had forgotten to rename, so I ended up with 2 DoOperation(s).

While this was a programming defect that I introduced, Python surely didn’t make it easy for me to catch it.


any variant?

August 11, 2008

Arrays in C and std::vector in C++ provide contiguous memory for elements of the same type (and implicitly the same size). These two properties are extremely important, and because of them one can index into an array/vector like this:

arr[4] = 10;

That means:
a. look up the element at index 4 – that’s at arr + 4 * size_of_the_element
b. set the found location to the new value 10

As such, the size of the element needs to be the same for all elements for this to work.
Needless to say, this indexing is an important optimization, as it’s a quick lookup operation.

However, the need to work with heterogeneous elements (different type/size) in the same array/vector arises many times. While the array/vector doesn’t support that natively, two solutions exist that allow us to do just that.

The first is using unions. Unions are C/C++ data structures that can hold different data types, with the size of the union being the size of its biggest member. Example:

union U
{
    char  c;
    int   i;
    void* ptr;
};

The size of the union will be max(sizeof(char), sizeof(int), sizeof(void*)). On 32-bit platforms that will be 4 bytes.
This approach allows you to set any member of the union individually, but internally they all refer to the same memory location.
Unions are used to implement the VARIANT type in Windows, as well as other variant types. I remember implementing a CVariant class about 6-7 years ago, when I wrote a GUI library/framework for WinCE.
The catch is that a union doesn’t tell you which member is currently ‘valid’, so type information needs to be stored in the variant class too (which loses some of that static-type compiler ‘intelligence’). You have to programmatically check the type and act on it accordingly.

The other approach is to use pointers. Pointers to objects have the same size irrespective of the type of the object they point to. However, these pointers still need to have the same static type in order to be used in containers. A standard C approach would be to use void*, but that’s just too error-prone and very much not C++.
In C++, an approach is to have a base class and a templated class that derives from it. The base class cannot be templated, as that would generate different types, which would make it impossible to use in an array/vector. For example:

class base
{
public:
    virtual ~base() {}
    virtual base* clone() const = 0;
};

template <class T>
class deriv : public base
{
public:
    deriv(const T& t) : t_(t) {}
    virtual deriv<T>* clone() const { return new deriv<T>(t_); }
    T t_;
};

When you construct such a pointer, you use the deriv class with the actual type; when you place it in the container, however, you place it as a base pointer. As such, your container will look like this:

std::vector<base*> container;

Alternatively, you can just use boost::any, which implements this scheme to give you any type. Unlike the union solution, this second solution allocates instances on the heap/free store, whereas the union can store instances on the stack or on the heap. The limitation that comes with the union is that it can only hold the types listed in it (in our example: char, int and void* only). boost::any, on the other hand, can host any type.


August 7, 2008

One thing that caught me out recently (and not just me, from what Google tells me) is Python’s treatment of empty data structures as false. A sort of NULL pointer in C. Coming from a C/C++ background, I love writing code like this:

def func(param):
    if param:
        print "param is not empty"
    else:
        print "param is empty"

param can be an empty dictionary {}, an empty list [], an empty tuple (), or an empty string "". I love this syntax.
Python treats these empty data structures as False.
However, Python also treats None like False. None is possibly closer to what NULL is in C.
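The behaviour is easy to check directly (a quick sketch):

```python
# every empty data structure is "falsy"...
for value in ({}, [], (), ""):
    assert not value

# ...and so is None, even though it's a different thing entirely
assert not None
assert [] is not None
```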

The only problem is when you write code where you need to distinguish between 3 states:
a. non-empty data structure
b. empty data structure
c. not found (or any other error)

Solving this sort of puzzle doesn’t map very well onto a boolean True/False representation. Below is some code that I wrote recently that got me into trouble. To preserve anonymity, I’ve changed the names of the main actors 🙂


mobile_number = person.get("mobile_number", None)
if mobile_number:
    print "Mobile number is:", mobile_number
else:
    print "No mobile number?!"


Assuming person is a dictionary, this code tries to get the value for the “mobile_number” key, or return None if the key doesn’t exist. Pretty decent, until you realize that you cannot distinguish between the case where the mobile_number key is missing and the case where mobile_number is the empty string.

Maybe in this case it’s not important to distinguish between the None value and the empty string. But you can surely imagine a case where an empty string would be a valid value, but a missing key would not.

The solution is to explicitly check for None as in the example below:


mobile_number = person.get("mobile_number", None)
if mobile_number is None:
    print "Mobile number is missing"
elif mobile_number:
    print "Mobile number is:", mobile_number
else:
    print "Mobile number is empty"


C++ Smart pointers

July 30, 2008

What are pointers? A way of indirectly referring to a memory location, together with a hint about the type, a hint that the compiler uses to restrict operations on that memory location. They can point to compiler-controlled memory (e.g. the stack) or programmer-controlled memory (dynamically allocated on the heap/free store).
The main benefit offered by dynamically allocated memory is that it allows programmers to control the lifetime of a certain memory location, be it an object, a POD type or raw memory. In contrast, the lifetime of automatic (stack-based) objects/memory is maintained by the compiler and is very clear (basically the innermost {} block).
With the benefit comes the main drawback: it’s the programmer’s responsibility to allocate and de-allocate. And since that’s not always trivial, in C++ (which doesn’t have garbage collection) smart people have invented smart pointers.

A smart pointer is composed of a part that’s dynamically allocated (the pointed-to object) and a part that lives on the stack, which gives it the ‘smartness’. When the part on the stack goes out of scope (controlled by the compiler), it drags the dynamic part with it. Its purpose is to avoid leaking the memory pointed to by the pointer in the ubiquitous scenario where the programmer doesn’t cover all the corner cases (and the corner cases increased dramatically with the introduction of exceptions in C++).

A C++ smart pointer looks something like:

template <class T>
class smartPtr
{
public:
    explicit smartPtr(T* ptr) : ptr_(ptr) {}  // store ptr, take ownership
    ~smartPtr() { delete ptr_; }  // deallocate (calls the object's destructor)
private:
    T* ptr_;
};

Taking ownership of the allocated memory is how the smart pointer operates. However, there are problems: what if two smart pointers try to take ownership of the same allocated memory? Well, that’s bad and it should not be allowed. There are two main cases in which this scenario can happen:

1. The programmer explicitly creates two smart pointers and initializes them with the same pointer.
This is obviously a programming error, and there’s not much we can do about it except educate programmers and fire the ones that fail to learn.

2. The compiler creates copies of the smartPtr (via the copy constructor or operator=).
The compiler creates such copies when the programmer explicitly invokes these two operations, or in some cases where it needs implicit copies (returning by value, passing parameters by value).
In this case, the smart pointer implementer CAN do something about it. There are 3 main options:

a. Disallow copy constructor/operator= for the smart pointers
This is the safest option. However, it’s also the most limiting and users will soon shun the implementation because of that.

b. Transfer ownership from the old smart pointer to the new one
As such, even if you have multiple copies, only one actually holds ownership at any time. This works nicely when returning values from functions, as you want to transfer ownership in that case anyway. It doesn’t work well when passing parameters to functions, as the programmer needs to remember that after passing the smart pointer by value, she can’t do anything else with it. It’s also impossible to use in STL containers, as the STL is allowed to make copies of the objects it holds; such a smart pointer would lose ownership of the pointed-to memory in those operations.

An example of this approach is std::auto_ptr, part of the STL.

c. Maintain a reference count
This strategy is much better: the smart pointer tracks how many pointers point to the allocated memory/object, and only deallocates when the last smart pointer goes out of scope. The only problem with this approach appears when you end up with circular dependencies between reference-counted objects: when there is such a cycle, none of the memory allocated in it is ever de-allocated. And such circular dependencies are not uncommon.

An example of this approach is boost::shared_ptr, part of Boost.
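The mechanics of strategy (c) can be sketched in a few lines of Python (Python just for brevity; the class and method names below are mine, not boost’s):

```python
class RefCounted(object):
    """Toy sketch of reference-counted ownership (strategy c)."""

    def __init__(self, resource, release, _shared_count=None):
        self._resource = resource
        self._release = release              # called when the last owner drops
        self._count = _shared_count or [1]   # count shared by all copies

    def copy(self):
        # the analogue of the C++ copy constructor: bump the shared count
        self._count[0] += 1
        return RefCounted(self._resource, self._release, self._count)

    def drop(self):
        # the analogue of the C++ destructor: release only on the last drop
        self._count[0] -= 1
        if self._count[0] == 0:
            self._release(self._resource)
```

Dropping the first handle leaves the resource alive; dropping the last one releases it. The circular-dependency problem is exactly what happens when the resource itself (transitively) holds a handle back to its owner: the counts never reach zero.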

A solution – part III (final)

July 22, 2008

So adding everything up…

We’re going to read the extents one by one into memory. As we read an extent, we’ll break it up into a set of non-overlapping extents, each holding a counter of how many overlapping extents contain it.
Having non-overlapping extents means that we are, in fact, ordering the extents. In the previous example, we have A < C < B < D. This ordering allows us to use data structures that order their elements, a binary tree for instance. Each ‘side’ of an extent becomes a node in the binary tree, and each node can store the number of extents covering the interval either to its right or to its left. To be able to use lower_bound, we’ll store the number of extents covering the interval to the left of the node’s value.

As such, in our example, we’ll have a binary tree like this:

      C
     /  \
    A    B
          \
           D

And the values they store are:
node_value(A) = 0 (there are no extents to the left of A, in other words in (-infinity, A))
node_value(C) = 1 (there is only one extent overlapping the [A, C) interval)
node_value(B) = 2 (there are 2 extents overlapping the [C, B) interval)
node_value(D) = 1 (there is 1 extent overlapping the [B, D) interval)

For our extents, we will have to build such a data structure. Now let’s assume we have a number x that we want to check against our extents, and that

A < C < x < B < D

The result in this case should be 2. Remember that our overlapping extents are [A, B) and [C, D) where A < C < B < D.
That means that x belongs in both [A, B) and [C, D) extent.
Let’s see how to get this result with our data structure. Let’s see what happens if we call lower_bound(x) on our data structure. From MSDN: lower_bound “Returns an iterator to the first element in a map with a key value that is equal to or greater than that of a specified key.”
So, calling lower_bound(x) will return an iterator pointing to B. If we get the value associated with node B, that is node_value(B), we get our result: 2.

I leave it as an exercise for you to write the code in your favourite language.
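For reference, here is one possible sketch in Python (all names are mine; a flat sorted list plus the standard bisect module stands in for the binary tree, and bisect_right, i.e. the first boundary strictly greater than the query, plays the role of lower_bound so that a query sitting exactly on a boundary falls into the half-open interval to its right):

```python
import bisect

class ExtentCounter(object):
    """keys is the sorted list of extent boundaries; counts[i] is the number
    of extents covering the interval immediately to the left of keys[i]."""

    def __init__(self):
        self.keys = []
        self.counts = []

    def _ensure_key(self, k):
        i = bisect.bisect_left(self.keys, k)
        if i < len(self.keys) and self.keys[i] == k:
            return
        # a new boundary splits an existing interval and inherits its coverage
        inherited = self.counts[i] if i < len(self.counts) else 0
        self.keys.insert(i, k)
        self.counts.insert(i, inherited)

    def insert(self, start, end):
        """Add the half-open extent [start, end)."""
        self._ensure_key(start)
        self._ensure_key(end)
        # every interval lying inside (start, end] gains one covering extent
        lo = bisect.bisect_right(self.keys, start)
        hi = bisect.bisect_right(self.keys, end)
        for i in range(lo, hi):
            self.counts[i] += 1

    def query(self, x):
        """How many extents contain x?"""
        i = bisect.bisect_right(self.keys, x)
        return self.counts[i] if i < len(self.counts) else 0
```

With the example above, say [A, B) = [1, 5) and [C, D) = [3, 7): after both inserts, a query anywhere in [3, 5) returns 2, a query in [1, 3) or [5, 7) returns 1, and anything outside returns 0.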

The lower_bound operation has logarithmic complexity, which is within our constraints. It might be possible to improve this solution by using hash tables, which have amortized constant complexity. However, there’s one problem: hash tables are not ordered the way binary trees are, so it is not possible to get from a node B to its immediate neighbours (C and D, for example). The solution presented here relies on that ordering, because whenever a new extent is inserted into the binary tree, it needs to update the ‘overlapping counters’ of all the nodes within the extent.

If you find a better solution, please let me know.