This post was written by Federico Pereiro, JavaScript Developer for Toptal.

Declarative programming is, currently, the dominant paradigm of an extensive and diverse set of domains such as databases, templating and configuration management.

In a nutshell, declarative programming consists of instructing a program on what needs to be done, instead of telling it how to do it. In practice, this approach entails providing a domain-specific language (DSL) for expressing what the user wants, and shielding them from the low-level constructs (loops, conditionals, assignments) that materialize the desired end state.

While this paradigm is a remarkable improvement over the imperative approach that it replaced, I contend that declarative programming has significant limitations, limitations that I explore in this article. Moreover, I propose a dual approach that captures the benefits of declarative programming while superseding its limitations.

CAVEATThis article emerged as the result of a multi-year personal struggle with declarative tools. Many of the claims I present here are not thoroughly proven, and some are even presented at face value. A proper critique of declarative programming would take considerable time, effort, and I would have to go back and use many of these tools; my heart is not in such an undertaking. The purpose of this article is to share a few thoughts with you, pulling no punches, and showing what worked for me. If you’ve struggled with declarative programming tools, you might find respite and alternatives. And if you enjoy the paradigm and its tools, don’t take me too seriously.

If declarative programming works well for you, I’m in no position to tell you otherwise.

You can love or hate declarative programming, but you cannot afford to ignore it.

The Merits Of Declarative Programming

Before we explore the limits of declarative programming, it is necessary to understand its merits.

Arguably the most successful declarative programming tool is the relational database (RDB). It might even be the first declarative tool. In any case, RDBs exhibit the two properties that I consider archetypical of declarative programming:

  • A domain specific language (DSL): the universal interface for relational databases is a DSL named Structured Query Language, most commonly known as SQL.
  • The DSL hides the lower level layer from the user: ever since Edgar F. Codd’s original paper on RDBs, it is plain that the power of this model is to dissociate the desired queries from the underlying loops, indexes and access paths that implement them.

Before RDBs, most database systems were accessed through imperative code, which is heavily dependent on low-level details such as the order of records, indexes and the physical paths to the data itself. Because these elements change over time, code often stops working because of some underlying change in the structure of the data. The resulting code is hard to write, hard to debug, hard to read and hard to maintain. I’ll go out on a limb and say that most of this code was, in all likelihood, long, full of proverbial rats’ nests of conditionals, repetition, and subtle, state-dependent bugs.

In the face of this, RDBs provided a tremendous productivity leap for systems developers. Now, instead of thousands of lines of imperative code, you had a clearly defined data schema, plus hundreds (or even just tens) of queries. As a result, applications had only to deal with an abstract, meaningful and lasting representation of data, and interface it through a powerful, yet simple query language. The RDB probably raised the productivity of programmers, and companies that employed them, by an order of magnitude.

What are the commonly listed advantages of declarative programming?

Proponents of declarative programming are quick to point out the advantages. However, even they admit it comes with trade-offs.
  1. Readability/usability: a DSL is usually closer to a natural language (like English) than to pseudocode, hence more readable and also easier to learn by non-programmers.
  2. Succinctness: much of the boilerplate is abstracted by the DSL, leaving fewer lines to do the same work.
  3. Reuse: it is easier to create code that can be used for different purposes; something that’s notoriously hard when using imperative constructs.
  4. Idempotence: you can work with end states and let the program figure it out for you. For example, through an upsert operation, you can either insert a row if it is not there, or modify it if it is already there, instead of writing code to deal with both cases.
  5. Error recovery: it is easy to specify a construct that will stop at the first error instead of having to add error listeners for every possible error. (If you’ve ever written three nested callbacks in node.js, you know what I mean.)
  6. Referential transparency: although this advantage is commonly associated with functional programming, it is actually valid for any approach that minimizes manual handling of state and reliance on side effects.
  7. Commutativity: the possibility of expressing an end state without having to specify the actual order in which it will be implemented.

While the above are all commonly cited advantages of declarative programming, I would like to condense them into two qualities, which will serve as guiding principles when I propose an alternative approach.

  1. A high-level layer tailored to a specific domain: declarative programming creates a high-level layer using the information of the domain to which it applies. It is clear that if we’re dealing with databases, we want a set of operations for dealing with data. Most of the seven advantages above stem from the creation of a high-level layer that is precisely tailored to a specific problem domain.
  2. Poka-yoke (fool-proofness): a domain-tailored high-level layer hides the imperative details of the implementation. This means that you commit far fewer errors because the low-level details of the system are simply not accessible. This limitation eliminates many classes of errors from your code.

Two Problems With Declarative Programming

In the following two sections, I will present the two main problems of declarative programming: separateness and lack of unfolding. Every critique needs its bogeyman, so I will use HTML templating systems as a concrete example of the shortcomings of declarative programming.

The Problem With DSLs: Separateness

Imagine that you need to write a web application with a non-trivial number of views. Hard coding these views into a set of HTML files is not an option because many components of these pages change.

The most straightforward solution, which is to generate HTML by concatenating strings, seems so horrible that you will quickly look for an alternative. The standard solution is to use a template system. Although there are different types of template systems, we will sidestep their differences for the purpose of this analysis. We can consider all of them to be similar in that the main mission of template systems is to provide an alternative to code that concatenates HTML strings using conditionals and loops, much like RDBs emerged as an alternative to code that looped through data records.

Let’s suppose you go with a standard templating system; you will encounter three sources of friction, which I will list in ascending order of importance. The first is that the template necessarily resides in a file separate from your code. Because the templating system uses a DSL, the syntax is different, so it cannot be in the same file. In simple projects, where file counts are low, the need to keep separate template files may double or treble the number of files.

I make an exception for Embedded Ruby templates (ERB), because those are integrated into Ruby source code. This is not the case for ERB-inspired tools written in other languages, whose templates must still be stored as separate files.

The second source of friction is that the DSL has its own syntax, one different from that of your programming language. Hence, modifying the DSL (let alone writing your own) is considerably harder. To go under the hood and change the tool, you need to learn about tokenizing and parsing, which is interesting and challenging, but hard. I happen to see this as a disadvantage.

How can one visualize a DSL? It’s not easy, but let’s just say a DSL is a clean, shiny layer on top of low-level constructs.

You may ask, “Why on earth would you want to modify your tool? If you are doing a standard project, a well-written standard tool should fit the bill.” Maybe yes, maybe no.

A DSL never has the full power of a programming language. If it did, it wouldn’t be a DSL anymore, but rather a full programming language.

But isn’t that the whole point of a DSL? To not have the full power of a programming language available, so that we can achieve abstraction and eliminate most sources of bugs? Maybe, yes. However, most DSLs start simple and then gradually incorporate a growing number of the facilities of a programming language until, in fact, it becomes one. Template systems are a perfect example. Let’s see the standard features of template systems and how they correlate to programming language facilities:

  • Replace text within a template: variable substitution.
  • Repetition of a template: loops.
  • Avoid printing a template if a condition is not met: conditionals.
  • Partials: subroutines.
  • Helpers: subroutines (the only difference from partials is that helpers can access the underlying programming language and let you out of the DSL straitjacket).

The strength of this argument (that a DSL is limited because it simultaneously covets and rejects the power of a programming language) is directly proportional to the extent to which the features of the DSL map directly onto the features of a programming language. In the case of SQL, the argument is weak because most of the things SQL offers are nothing like what you find in a normal programming language. At the other end of the spectrum, we find template systems where virtually every feature is making the DSL converge towards BASIC.

Let’s now step back and contemplate these three quintessential sources of friction, summed up by the concept of separateness. Because it is separate, a DSL needs to live in a separate file; it is harder to modify (and even harder to write your own); and it (often, but not always) requires you to add, one by one, the features you miss from a real programming language.

Separateness is an inherent problem of any DSL, no matter how well designed.

We now turn to a second problem of declarative tools, which is widespread but not inherent.

Another Problem: Lack Of Unfolding Leads To Complexity

If I had written this article a few months ago, this section would have been named Most Declarative Tools Are #@!$#@! Complex But I Don’t Know Why. In the process of writing this article, I found a better way of putting it: Most Declarative Tools Are Way More Complex Than They Need To Be. I will spend the rest of this section explaining why.

To analyze the complexity of a tool, I propose a measure called the complexity gap. The complexity gap is the difference between solving a given problem with the tool versus solving it at the lower level (presumably, plain imperative code) that the tool intends to replace. When the former solution is more complex than the latter, we are in the presence of a complexity gap. By more complex, I mean more lines of code, code that’s harder to read, harder to modify, and harder to maintain, but not necessarily all of these at the same time.

Please note that we’re not comparing the lower level solution against the best possible tool, but rather against no tool. This echoes the medical principle of “First, do no harm”.

Signs of a tool with a large complexity gap are:

  • Something that takes a few minutes to describe in rich detail in imperative terms will take hours to code using the tool, even when you know how to use the tool.
  • You feel you are constantly working around the tool rather than with the tool.
  • You are struggling to solve a straightforward problem that squarely belongs in the domain of the tool you are using, but the best Stack Overflow answer you find describes a workaround.
  • This very straightforward problem could be solved by a certain feature that does not exist in the tool, and there is a GitHub issue in the library featuring a long discussion of said feature with +1s interspersed.
  • A chronic, itching longing to ditch the tool and do the whole thing yourself inside a for loop.

I might have fallen prey to emotion here, since template systems are not that complex. But this comparatively small complexity gap is not a merit of their design; rather, it exists because their domain of applicability is quite simple (remember, we’re just generating HTML here). Whenever the same approach is used for a more complex domain (such as configuration management), the complexity gap may quickly turn your project into a quagmire.

That said, it is not necessarily unacceptable for a tool to be somewhat more complex than the lower level it intends to replace; if the tool yields code that is more readable, concise, and correct, it can be worth it. The issue is when the tool is several times more complex than the problem it replaces; that is flat-out unacceptable. Brian Kernighan famously stated that, “Controlling complexity is the essence of computer programming.” If a tool adds significant complexity to your project, why even use it?

The question is: why are some declarative tools so much more complex than they need to be? I think it would be a mistake to blame it on poor design. Such a general explanation, a blanket ad hominem attack on the authors of these tools, is not fair. There has to be a more accurate and enlightening explanation.

Origami time! A tool with a high-level interface to an abstract lower level has to unfold the higher level from the lower one.

My contention is that any tool that offers a high level interface to abstract a lower level must unfold this higher level from the lower one. The concept of unfolding comes from Christopher Alexander’s magnum opus, The Nature of Order – in particular Volume II. It is (hopelessly) beyond the scope of this article (not to mention my understanding) to summarize the implications of this monumental work for software design; I believe its impact will be huge in years to come. It is also beyond this article to provide a rigorous definition of unfolding processes. I will use here the concept in a heuristic way.

An unfolding process is one that, in a stepwise fashion, creates further structure without negating the existing one. At every step, each change (or differentiation, to use Alexander’s term) remains in harmony with any previous structure, where previous structure is, simply, a crystallized sequence of past changes.

Interestingly enough, Unix is a great example of the unfolding of a higher level from a lower one. In Unix, two complex features of the operating system, batch jobs and coroutines (pipes), are simply extensions of basic commands. Because of certain fundamental design decisions, such as making everything a stream of bytes, the shell being a userland program, and standard I/O files, Unix is able to provide these sophisticated features with minimal complexity.

To underline why these are excellent examples of unfolding, I would like to quote a few excerpts of a 1979 paper by Dennis Ritchie, one of the authors of Unix:

On batch jobs:

… the new process control scheme instantly rendered some very valuable features trivial to implement; for example detached processes (with &) and recursive use of the shell as a command. Most systems have to supply some sort of special batch job submission facility and a special command interpreter for files distinct from the one used interactively.

On coroutines:

The genius of the Unix pipeline is precisely that it is constructed from the very same commands used constantly in simplex fashion.

UNIX pioneers Dennis Ritchie and Ken Thompson created a powerful demonstration of unfolding in their OS. They also saved us from a dystopian all-Windows future.

This elegance and simplicity, I argue, comes from an unfolding process. Batch jobs and coroutines are unfolded from previous structures (commands run in a userland shell). I believe that because of the minimalist philosophy and limited resources of the team that created Unix, the system evolved stepwise, and as such, was able to incorporate advanced features without turning its back on the basic ones, because there weren’t enough resources to do otherwise.

In the absence of an unfolding process, the high level will be considerably more complex than necessary. In other words, the complexity of most declarative tools stems from the fact that their high level does not unfold from the low level they intend to replace.

This lack of unfoldance, if you forgive the neologism, is routinely justified by the necessity to shield the user from the lower level. This emphasis on poka-yoke (protecting the user from low-level errors) comes at the expense of a large complexity gap that is self-defeating, because the extra complexity will generate new classes of errors. To add insult to injury, these classes of errors have nothing to do with the problem domain but rather with the tool itself. We would not go too far if we described these errors as iatrogenic.

Declarative templating tools, at least when applied to the task of generating HTML views, are an archetypical case of a high level that turns its back on the low level it intends to replace. How so? Because generating any non-trivial view requires logic, and templating systems, especially logic-less ones, banish logic through the main door and then smuggle some of it back through the cat door.

Note: an even weaker justification for a large complexity gap is when a tool is marketed as magic, or as something that just works; the opaqueness of the low level is then supposed to be an asset, because a magic tool is always supposed to work without you understanding why or how. In my experience, the more magical a tool purports to be, the faster it transmutes my enthusiasm into frustration.

But what about the separation of concerns? Shouldn’t view and logic remain separate? The core mistake, here, is to put business logic and presentation logic in the same bag. Business logic certainly has no place in a template, but presentation logic exists nevertheless. Excluding logic from templates pushes presentation logic into the server where it is awkwardly accommodated. I owe the clear formulation of this point to Alexei Boronine, who makes an excellent case for it in this article.

My feeling is that roughly two thirds of the work of a template resides in its presentation logic, while the other third deals with generic issues such as concatenating strings, closing tags, escaping special characters, and so on. This is the two-faced, low-level nature of generating HTML views. Templating systems deal appropriately with the latter, but they don’t fare well with the former. Logic-less templates flat out turn their back on this problem, forcing you to solve it awkwardly. Other template systems suffer because they truly need to provide a non-trivial programming language so their users can actually write presentation logic.

To sum up, declarative templating tools suffer because:

  • If they were to unfold from their problem domain, they would have to provide ways to generate logical patterns;
  • A DSL that provides logic is not really a DSL, but a programming language. Note that other domains, like configuration management, also suffer from lack of “unfoldance.”

I would like to close the critique with an argument that is logically disconnected from the thread of this article, but deeply resonates with its emotional core: we have limited time to learn. Life is short, and on top of that, we need to work. In the face of our limitations, we need to spend our time learning things that will be useful and withstand time, even in the face of fast-changing technology. That is why I exhort you to use tools that don’t just provide a solution but actually shed a bright light on the domain of their applicability. RDBs teach you about data, and Unix teaches you about OS concepts, but with unsatisfactory tools that don’t unfold, I’ve always felt I was learning the intricacies of a sub-optimal solution while remaining in the dark about the nature of the problem it intends to solve.

The heuristic I suggest you consider is this: value tools that illuminate their problem domain, instead of tools that obscure their problem domain behind purported features.

The Twin Approach

To overcome the two problems of declarative programming, which I have presented here, I propose a twin approach:

  • Use a data structure domain specific language (dsDSL), to overcome separateness.
  • Create a high level that unfolds from the lower level, to overcome the complexity gap.

dsDSL

A data structure DSL (dsDSL) is a DSL that is built with the data structures of a programming language. The core idea is to use the basic data structures you have available, such as strings, numbers, arrays, objects and functions, and combine them to create abstractions to deal with a specific domain.

We want to keep the power of declaring structures or actions (high level) without having to specify the patterns that implement these constructs (low level). We want to overcome the separateness between the DSL and our programming language so that we are free to use the full power of a programming language whenever we need it. This is not only possible but straightforward through dsDSLs.

Had you asked me a year ago, I would have thought that the concept of a dsDSL was novel; then one day, I realized that JSON itself is a perfect example of this approach! A parsed JSON object consists of data structures that declaratively represent data entries in order to get the advantages of the DSL while also making it easy to parse and handle from within a programming language. (There might be other dsDSLs out there, but so far I haven’t come across any. If you know of one, I would really appreciate your mentioning it in the comments section.)

Like JSON, a dsDSL has the following attributes:

  1. It consists of a very small set of functions: JSON has two main functions, parse and stringify.
  2. Its functions most commonly receive complex and recursive arguments: a parsed JSON is an array, or object, which usually contains further arrays and objects inside.
  3. The inputs to these functions conform to very specific forms: JSON has an explicit and strictly enforced validation schema to tell valid from invalid structures.
  4. Both the inputs and the outputs of these functions can be contained and generated by a programming language without a separate syntax.
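To make these attributes concrete, here is a minimal round trip through JSON in JavaScript. The config object below is purely illustrative (it isn’t taken from the article); the point is that the declarative structure and the code that manipulates it live in the same language, with no separate syntax:

// A round trip through JavaScript's built-in dsDSL for data: plain arrays and
// objects go in, a declarative string comes out, and back again.
var config = {server: {port: 8080, routes: ['/', '/views']}};

var serialized = JSON.stringify (config);   // '{"server":{"port":8080,"routes":["/","/views"]}}'
var restored   = JSON.parse (serialized);   // a plain object again, ready to manipulate in code

console.log (restored.server.routes [1]);   // '/views'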

But dsDSLs go beyond JSON in many ways. Let’s create a dsDSL for generating HTML using JavaScript. Later I will touch on the issue of whether this approach may be extended to other languages (spoiler: it can definitely be done in Ruby and Python, but probably not in C).

HTML is a markup language composed of tags delimited by angle brackets (< and >). These tags may have optional attributes and contents. Attributes are simply a list of key/value pairs, and contents may be either text or other tags. Both attributes and contents are optional for any given tag. I’m simplifying somewhat, but this description is accurate enough for our purposes.

A straightforward way to represent an HTML tag in a dsDSL is by using an array with three elements:

  • Tag: a string.
  • Attributes: an object (of the plain, key/value type) or undefined (if no attributes are necessary).
  • Contents: a string (text), an array (another tag) or undefined (if there’s no contents).

For example, <a href="views">Index</a> can be written as ['a', {href: 'views'}, 'Index'].

If we want to embed this anchor element into a div with class links, we can write: ['div', {class: 'links'}, ['a', {href: 'views'}, 'Index']].

To list several HTML tags at the same level, we can wrap them in an array:

[
   ['h1', 'Hello!'],
   ['a', {href: 'views'}, 'Index']
]

The same principle may be applied to creating multiple tags within a tag:

['body', [
   ['h1', 'Hello!'],
   ['a', {href: 'views'}, 'Index']
]]

Of course, this dsDSL won’t get us far if we don’t generate HTML from it. We need a generate function which will take our dsDSL and yield a string with HTML. So if we run generate (['a', {href: 'views'}, 'Index']), we will get the string <a href="views">Index</a>.

The idea behind any DSL is to specify a few constructs with a specific structure which is then passed to a function. In this case, the structure that makes up the dsDSL is this array, which has one to three elements; these arrays have a specific structure. If generate thoroughly validates its input (and it is both easy and important to thoroughly validate input, since these validation rules are the precise analog of a DSL’s syntax), it will tell you exactly where you went wrong with your input. After a while, you’ll start to recognize what distinguishes a valid structure in a dsDSL, and this structure will be highly suggestive of the underlying thing it generates.
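To make this tangible, here is a minimal sketch of what such a generate function might look like. This is not the author’s actual implementation (his lith library, mentioned later in the article, does considerably more, including thorough validation and escaping); it only shows that unfolding the dsDSL into HTML is a short recursive walk over arrays:

// A minimal, illustrative generate function. It walks the dsDSL recursively
// and concatenates HTML strings; validation and escaping are left out for brevity.
function generate (input) {
   // undefined means "print nothing".
   if (input === undefined) return '';
   // Plain strings and numbers are text content.
   if (typeof input === 'string' || typeof input === 'number') return '' + input;
   // An empty array also means "print nothing".
   if (input.length === 0) return '';
   // An array whose first element is itself an array is a list of sibling elements.
   if (Array.isArray (input [0])) return input.map (generate).join ('');

   var tag      = input [0];
   var hasAttrs = typeof input [1] === 'object' && input [1] !== null && ! Array.isArray (input [1]);
   var attrs    = hasAttrs ? input [1] : {};
   var contents = hasAttrs ? input [2] : input [1];

   var attrText = Object.keys (attrs).map (function (key) {
      // Skip attributes whose value is undefined.
      return attrs [key] === undefined ? '' : ' ' + key + '="' + attrs [key] + '"';
   }).join ('');

   return '<' + tag + attrText + '>' + generate (contents) + '</' + tag + '>';
}

// generate (['a', {href: 'views'}, 'Index']) === '<a href="views">Index</a>'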

Now, what are the merits of a dsDSL in contraposition to a DSL?

  • A dsDSL is an integral part of your code. It leads to lower line counts, file counts, and an overall reduction of overhead.
  • dsDSLs are easy to parse (hence easier to implement and modify). Parsing is merely iterating through the elements of an array or object. Likewise, dsDSLs are comparatively easy to design because instead of creating a new syntax (that everybody will hate) you can stick with the syntax of your programming language (which everybody hates but at least they already know it).
  • A dsDSL has all the power of a programming language. This means that a dsDSL, when properly employed, has the advantage of both a high and a low level tool.

Now, the last claim is a strong one, so I’m going to spend the rest of this section supporting it. What do I mean by properly employed? To see this in action, let’s consider an example in which we want to construct a table to display the information from an array named DATA.

var DATA = [
   {id: 1, description: 'Product 1', price: 20,  onSale: true,  categories: ['a']},
   {id: 2, description: 'Product 2', price: 60,  onSale: false, categories: ['b']},
   {id: 3, description: 'Product 3', price: 120, onSale: false, categories: ['a', 'c']},
   {id: 4, description: 'Product 4', price: 45,  onSale: true,  categories: ['a', 'b']}
]

In a real application, DATA will be generated dynamically from a database query.

Moreover, we have a FILTER variable which, when initialized, will be an array with the categories we want to display.

We want our table to:

  • Display table headers.
  • For each product, show the fields: description, price and categories.
  • Don’t print the id field, but add it as an id attribute for each tr element.
  • Place a class onSale if the product is on sale.
  • Sort the products by descending price.
  • Filter certain products by category. If FILTER is an empty array, we will display all products. Otherwise, we will only display the products where the category of the product is contained within FILTER.

We can create the presentation logic that matches this requirement in ~20 lines of code:

function drawTable (DATA, FILTER) {
   var printableFields = ['description', 'price', 'categories'];
   DATA.sort (function (a, b) {return b.price - a.price});
   return ['table', [
      ['tr', dale.do (printableFields, function (field) {
         return ['th', field];
      })],
      dale.do (DATA, function (product) {
         var matches = (! FILTER || FILTER.length === 0) || dale.stop (product.categories, true, function (category) {
            return FILTER.indexOf (category) !== -1;
         });
         return matches === false ? [] : ['tr', {
            id: product.id,
            class: product.onSale ? 'onsale' : undefined
         }, dale.do (printableFields, function (field) {
            return ['td', product [field]];
         })];
      })
   ]];
}

I concede this is not a straightforward example; however, it represents a fairly simple view for an application implementing the four basic functions of persistent storage, also known as CRUD. Any non-trivial web application will have views that are more complex than this.

Let’s now see what this code is doing. First, it defines a function, drawTable, to contain the presentation logic of drawing the product table. This function receives DATA and FILTER as parameters, so it can be used for different data sets and filters. drawTable fulfills the double role of partial and helper.

   function drawTable (DATA, FILTER) {

The inner variable, printableFields, is the only place where you need to specify which fields are printable ones, avoiding repetition and inconsistencies in the face of changing requirements.

   var printableFields = ['description', 'price', 'categories'];

We then sort DATA by the price of its products, in descending order as the requirements specify. Notice that different and more complex sort criteria would be straightforward to implement, since we have the entire programming language at our disposal.

   DATA.sort (function (a, b) {return b.price - a.price});

Here we return an array literal, which contains table as its first element and its contents as the second. This is the dsDSL representation of the <table> we want to create.

   return ['table', [

We now create a row with the table headers. To create its contents, we use dale.do which is a function like Array.map, but one that also works for objects. We will iterate printableFields and generate table headers for each of them:

      ['tr', dale.do (printableFields, function (field) {
         return ['th', field];
      })],

Notice that we have just implemented iteration, the workhorse of HTML generation, and we didn’t need any DSL constructs; we only needed a function to iterate a data structure and return dsDSLs. A similar native or user-implemented function would have done the trick as well.
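For instance, the same header row could be produced with nothing but native JavaScript. This is merely an equivalent restatement of the fragment above, not how the article’s code is written:

// The same table header row, using only the native Array.prototype.map:
var printableFields = ['description', 'price', 'categories'];

var headerRow = ['tr', printableFields.map (function (field) {
   return ['th', field];
})];
// headerRow is ['tr', [['th', 'description'], ['th', 'price'], ['th', 'categories']]]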

Now we iterate through the products contained in DATA.

      dale.do (DATA, function (product) {

We check whether this product is left out by FILTER. If FILTER is empty, we will print the product. If FILTER is not empty, we will iterate through the categories of the product until we find one that is contained within FILTER. We do this using dale.stop.

         var matches = (! FILTER || FILTER.length === 0) || dale.stop (product.categories, true, function (category) {
            return FILTER.indexOf (category) !== -1;
         });

Notice the intricacy of the conditional; it is precisely tailored to our requirement and we have total freedom for expressing it because we are in a programming language rather than a DSL.

If matches is false, we return an empty array (so we don’t print this product). Otherwise, we return a <tr> with its proper id and class and we iterate through printableFields to, well, print the fields.


         return matches === false ? [] : ['tr', {
            id: product.id,
            class: product.onSale ? 'onsale' : undefined
         }, dale.do (printableFields, function (field) {
            return ['td', product [field]];

Of course we close everything that we opened. Isn’t syntax fun?

         })];
      })
   ]];
}

Now, how do we incorporate this table into a wider context? We write a function named drawAll that will invoke all the functions that generate the views. Apart from drawTable, we might also have drawHeader, drawFooter, and other comparable functions, all of which will return dsDSLs.

var drawAll = function () {
   return generate ([
      drawHeader (),
      drawTable (DATA, FILTER),
      drawFooter ()
   ]);
}

If you don’t like how the above code looks, nothing I say will convince you. This is a dsDSL at its best. You might as well stop reading the article (and drop a mean comment too because you’ve earned the right to do so if you’ve made it this far!). But seriously, if the code above doesn’t strike you as elegant, nothing else in this article will.

For those who are still with me, I would like to go back to the main claim of this section, which is that a dsDSL has the advantages of both the high and the low level:

  • The advantage of the low level resides in writing code whenever we want, getting out of the straitjacket of the DSL.
  • The advantage of the high level resides in using literals that represent what we want to declare and letting the functions of the tool convert that into the desired end state (in this case, a string with HTML).

But how is this truly different from purely imperative code? I think ultimately the elegance of the dsDSL approach boils down to the fact that code written in this way mostly consists of expressions, instead of statements. More precisely, code that uses a dsDSL is almost entirely composed of:

  • Literals that map to lower level structures.
  • Function invocations or lambdas within those literal structures that return structures of the same kind.

Code that consists mostly of expressions, and which encapsulates most statements within functions, is extremely succinct because all patterns of repetition can be easily abstracted. You can write arbitrary code as long as that code returns a literal that conforms to a very specific, non-arbitrary form.

A further characteristic of dsDSLs (which we don’t have time to explore here) is the possibility of using types to increase the richness and succinctness of the literal structures. I will expound on this issue in a future article.

Might it be possible to create dsDSLs beyond JavaScript, the One True Language? I think that it is, indeed, possible, as long as the language supports:

  • Literals for: arrays, objects (associative arrays), function invocations, and lambdas.
  • Runtime type detection
  • Polymorphism and dynamic return types

I think this means that dsDSLs are tenable in any modern dynamic language (i.e.: Ruby, Python, Perl, PHP), but probably not in C or Java.

Walk, Then Slide: How To Unfold The High From The Low

In this section I will attempt to show a way of unfolding a high-level tool from its domain. In a nutshell, the approach consists of the following steps:

  1. Take two to four problems that are representative instances of a problem domain. These problems should be real. Unfolding the high level from the low one is a problem of induction, so you need real data to come up with representative solutions.
  2. Solve the problems with no tool in the most straightforward way possible.
  3. Stand back, take a good look at your solutions, and notice the common patterns among them.
  4. Find the patterns of representation (high level).
  5. Find the patterns of generation (low level).
  6. Solve the same problems with your high level layer and verify that the solutions are indeed correct.
  7. If you feel that you can easily represent all the problems with your patterns of representation, and the generation patterns for each of these instances produce correct implementations, you’re done. Otherwise, go back to the drawing board.
  8. If new problems appear, solve them with the tool and modify it accordingly.
  9. The tool should converge asymptotically to a finished state, no matter how many problems it solves. In other words, the complexity of the tool should remain constant, rather than growing with the amount of problems it solves.

Now, what the hell are patterns of representation and patterns of generation? I’m glad you asked. The patterns of representation are the patterns in which you should be able to express a problem that belongs to the domain that concerns your tool. It is an alphabet of structures that allows you to write any pattern you might wish to express within its domain of applicability. In a DSL, these would be the production rules. Let’s go back to our dsDSL for generating HTML.

The humble HTML tag is a good example of patterns of representation. Let’s take a closer look at these basic patterns.

The patterns of representation for HTML are the following:

  • A single tag: ['TAG']
  • A single tag with attributes: ['TAG', {attribute1: value1, attribute2: value2, ...}]
  • A single tag with contents: ['TAG', 'CONTENTS']
  • A single tag with both attributes and contents: ['TAG', {attribute1: value1, ...}, 'CONTENTS']
  • A single tag with another tag inside: ['TAG1', ['TAG2', ...]]
  • A group of tags (standalone or inside another tag): [['TAG1', ...], ['TAG2', ...]]
  • Depending on a condition, place a tag or no tag: condition ? ['TAG', ...] : []
  • Depending on a condition, place an attribute or no attribute: ['TAG', {class: condition ? 'someClass': undefined}, ...]

These instances can be represented with the dsDSL notation we determined in the previous section. And this is all you need to represent any HTML you might need. More sophisticated patterns, such as conditional iteration through an object to generate a table, may be implemented with functions that return the patterns of representation above, and these patterns map directly to HTML tags.

If the patterns of representation are the structures you use to express what you want, the patterns of generation are the structures your tool will use to convert patterns of representation into the lower level structures. For HTML, these are the following:

  • Validate the input (this is actually a universal pattern of generation).
  • Open and close tags (but not the void tags, like <input>, which are self-closing).
  • Place attributes and contents, escaping special characters (but not the contents of the <style> and <script> tags).

Believe it or not, these are the patterns you need to create an unfolding dsDSL layer that generates HTML. Similar patterns can be found for generating CSS. In fact, lith does both, in ~250 lines of code.
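As a rough illustration of the second and third patterns of generation (and not of how lith actually implements them), void tags and escaping can each be captured in a few lines:

// Illustrative only; lith's real implementation differs, but the patterns themselves are this small.

// Void tags: elements that are opened but never closed.
var voidTags = ['area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input', 'link', 'meta', 'param', 'source', 'track', 'wbr'];

var isVoid = function (tag) {
   return voidTags.indexOf (tag) !== -1;
};

// Escape special characters in attributes and contents.
var escapeHTML = function (text) {
   return String (text)
      .replace (/&/g, '&amp;')
      .replace (/</g, '&lt;')
      .replace (/>/g, '&gt;')
      .replace (/"/g, '&quot;');
};

// isVoid ('input') === true; escapeHTML ('a < b') === 'a &lt; b'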

One last question remains to be answered: What do I mean by walk, then slide? When we deal with a problem domain, we want to use a tool that delivers us from the nasty details of that domain. In other words, we want to sweep the low level under the rug, the faster the better. The walk, then slide approach proposes exactly the opposite: spend some time on the low level. Embrace its quirks, and understand which are essential and which can be avoided in the face of a set of real, varied, and useful problems.

After walking in the low level for some time and solving useful problems, you will have a sufficiently deep understanding of their domain. The patterns of representation and generation will then arise naturally; they are wholly derived from the nature of the problem they intend to solve. You can then write code that employs them. If they work, you will be able to slide through problems where you recently had to walk through them. Sliding means many things; it implies speed, precision and lack of friction. Maybe more importantly, this quality can be felt; when solving problems with this tool, do you feel like you’re walking through the problem, or do you feel that you’re sliding through it?

Maybe the most important thing about an unfolded tool is not the fact that it frees us from having to deal with the low level. Rather, by capturing the empiric patterns of repetition in the low level, a good high level tool allows us to understand fully the domain of applicability.

An unfolded tool will not just solve a problem – it will enlighten you about the problem’s structure.

So, don’t run away from a worthy problem. First walk around it, then slide through it.

This post was written by Michael Karchevsky, C++ Developer for Toptal.

Competitions are a great way to level up machine learning skills. Not only do you get access to quality datasets, you are also given clear goals. This helps you focus on the important part: designing quality solutions for problems at hand.

A friend of mine and I recently took part in the N+1 fish, N+2 fish competition. This machine learning competition, with lots of image processing, requires you to process video clips of fish being identified, measured, and kept or thrown back into the sea.

an abstract image of machine learning used to identify and measure fish

In this article, I will walk you through how we approached the competition problem using standard image processing techniques and pre-trained neural network models. The performance of the submitted solutions was measured with a formula defined by the competition organizers. With our solution, we managed to secure 11th place.

For a brief introduction to machine learning, you can refer to this article.

About the Competition

We were provided with videos of one or many fish in each segment. These videos were captured on different boats fishing for ground fish in the Gulf of Maine.

The videos were collected from fixed-position cameras placed to look down on a ruler. A fish is placed on the ruler, the fisherman removes their hands from the ruler, and then the fisherman either discards or keeps the fish based on the species and size.

a sample video clip from the project

Performance Metric

There were three tasks that were important to this project. The ultimate goal was to create an algorithm that automatically generates annotations for the video files, where the annotations comprise:

  • The sequence of fish that appear
  • The species of each fish that appears in the video
  • The length of each fish that appears in the video

The organizers of the competition created an aggregated metric that gave a general sense of performance on all of these tasks. The metric was a simple weighted combination of an individual metric for each of the tasks. While there were certain weights, they recommended that we focus on a well-rounded algorithm that was able to contribute to all of the tasks!

You can learn more about how the overall performance metric is calculated from the performance metrics of each individual task from the official competition web page.

Designing a Machine Learning Solution

When working with machine learning projects dealing with pictures or videos, you will most likely be using convolutional neural networks. But, before we could use convolutional neural networks, we had to preprocess the frames and solve some other subtasks through different strategies.

For training, we used one NVIDIA 1080 Ti GPU. A good chunk of our time was lost trying to optimize our code in order to stay relevant in the competition. We did, however, end up spending less time where it would have mattered more.

Stage 0: Finding Out the Number of Unique Boats

With silhouette analysis, finding the number of boats became a fairly trivial task. The steps were as follows, and leveraged some very standard techniques:

  1. Get some random frames from each video.
  2. Calculate statistics and Speeded Up Robust Features (SURF) for each image.
  3. Using silhouette analysis for K-means clustering, we can find the number of boats.

SURF detects points of interest in an image and generates feature descriptions. This approach is really robust, even with various image transformations.

Once the features of the points of interest in the image are known, K-means clustering is performed, followed by silhouette analysis to determine an approximate number of boats in the images.

Stage 1: Identifying Repeated Frames

Although the dataset contained separate video files, each video seemed to have some overlaps with other videos in the dataset. This is possibly because the videos were split from one long video and thus ended up having a few common frames at the start or end of each video.

a graphic representation of common frames

To identify such frames and remove them as necessary, we used some quick hash functions on the frames.
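The article doesn’t show the code for this step, but as a rough sketch of the idea in Node.js (assuming the frames have already been exported as image files, which is an assumption of this sketch rather than how the competition pipeline actually worked), a hash per frame is enough to spot exact duplicates; a perceptual hash would also catch near-duplicates:

// Rough sketch, not the competition code: hash exported frame images and
// group paths that share a hash, i.e. frames repeated across videos.
var crypto = require ('crypto');
var fs     = require ('fs');

function frameHash (path) {
   return crypto.createHash ('md5').update (fs.readFileSync (path)).digest ('hex');
}

function findDuplicates (framePaths) {
   var seen = {}, duplicates = [];
   framePaths.forEach (function (path) {
      var hash = frameHash (path);
      if (seen [hash]) duplicates.push ([seen [hash], path]);
      else             seen [hash] = path;
   });
   return duplicates;
}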

Stage 2: Locating the Ruler

By applying some standard image processing methods, we located the position of the ruler and its orientation. We then rotated and cropped the image to position the ruler in a consistent manner across all frames. This also allowed us to reduce the frame size tenfold.

Detected ruler (plotted on the mean frame):

a visual depiction of the ruler detection process, machine learning

Cropped and rotated area:

a photograph of the cropped ruler

Stage 3: Determining the Sequence of the Fish

Implementing this stage to determine the sequence of the fish took the majority of my time during this competition. Training new convolutional neural networks seemed too expensive, so we decided to use pre-trained neural networks.

For this, we chose the following pre-trained convolutional neural networks:

  • VGG16
  • VGG19
  • ResNet50
  • Xception
  • InceptionV3

These neural network models are trained on the ImageNet dataset.

We extracted only the convolutional layers of the models and passed the competition dataset through them. At the output, we had a fairly compact array of features.

Then, we trained neural networks consisting only of fully connected dense layers on these features and predicted results for each pretrained model. After that, we averaged the predictions, and the results turned out quite poor.

We decided to replace this approach with long short-term memory (LSTM) neural networks for better prediction, where the input data was a sequence of five frames transformed with the pretrained models.

To merge the output of all the models, we used the geometric mean.

The fish detection pipeline was:

  1. Generate features with pretrained models.
  2. Predict the probability of fish appearance on a dense neural network.
  3. Generate LSTM features with pretrained models.
  4. Predict the probability of fish appearance on an LSTM neural network.
  5. Merge models using the geometric mean.

The result for one video looks something like this:

a sample fish detection result shown on a graph with frame index along the x-axis and probability along the y-axis

Stage 4: Identifying the Species of the Fish

After spending the majority of the contest duration implementing the previous stage, we tried to make up for lost time by working with the models from that stage to identify the species of the fish.

Our approach for that was roughly as follows:

  1. Add dense layers on top of the pretrained convolutional models (VGG16, VGG19, ResNet50, Xception, and InceptionV3), with the weights of the convolutional layers kept fixed.
  2. Train models with small image augmentation.
  3. Predict species with each model.
  4. Consolidate models by voting.

Stage 5: Detecting the Length of the Fish

To determine the length of the fish, we used two neural networks. One was trained to identify fish heads and the other was trained to identify fish tails. The length of each fish was approximated as the distance between the two points identified by the two networks.

photograph showing the distance between a fish head and tail

Complete Scheme

Here is what the overall scheme of the stages looked like:

flowchart depicting the complete scheme

The overall design was fairly simple as video frames were passed through the stages outlined above before combining the separate results.


A Word on Credit Card Hacking

If you know me, or have read my previous post, you know that I worked for a very interesting company before joining Toptal. At this company, our payment provider processed credit card transactions in the neighborhood of $500k per day. Part of my job was to make our provider PCI-DSS compliant—that is, compliant with the Payment Card Industry – Data Security Standard.

It’s safe to say that this wasn’t a job for the faint of heart. At this point, I’m pretty intimate with Credit Cards (CCs), Credit Card hacking and web security in general. After all, our job was to protect our users’ data, to prevent it from being hacked, stolen or misused.

You could imagine my surprise when I saw Bennett Haselton’s 2007 article on Slashdot: Why Are CC Numbers Still So Easy to Find?. In short, Haselton was able to find Credit Card numbers through Google, firstly by searching for a card’s first eight digits in “nnnn nnnn” format, and later using some advanced queries built on number ranges. For example, he could use “4060000000000000..4060999999999999” to find all the 16 digit Primary Account Numbers (PANs) from CHASE (whose cards all begin with 4060). By the way: here’s a full list of Issuer ID numbers.

At the time, I didn’t think much of it, as Google immediately began to filter the types of queries that Bennett was using. When you tried to Google a range like that, Google would serve up a page that said something along the lines of “You’re a bad person”.

This is Google’s response to those trying to figure out how to find credit card numbers online.

About six months ago, while reminiscing with an old friend, this credit card number hack came to mind again. Soon after, I discovered something alarming. Not terribly alarming, but certainly alarming—so I notified Google, and waited. After a month without a response, I notified them again to no avail.

With a minor tweak on Haselton’s old trick, I was able to Google Credit Card numbers, Social Security numbers, and any other sensitive information of interest.



Yesterday, some friends of mine (buhera.blog.hu and _2501) brought a more recent Slashdot post to my attention: Credit Card Numbers Still Google-able.

The article’s author, again Bennett Haselton (who wrote the original article back in 2007), claims that credit card numbers can still be Googled. You can’t use the number range query hack, but it still can be done. Instead of using simple ranges, you need to apply specific formatting to your query. Something like: “1234 5678” (notice the space in the middle). A lot of hits come up for this query, but very few are of actual interest. Among the hits are phone numbers, zip codes, and such. Not extremely alarming. But here comes the credit card hack twist.

The Methodology

I was curious if it was still possible to get credit card numbers online the way we could in 2007. Like any good engineer, I usually approach things using a properly constructed and intelligent plan that needs to be perfectly executed with the utmost precision. If you have tried that method, you might know that it can fail really hard—in which case your careful planning and effort go to waste.

In IT we have a tendency to over-intellectualize, even when it isn’t exactly warranted. I have seen my friends and colleagues completely break applications using seemingly random inputs. Their success rate was stunning and the effort they put into it was close to zero. That’s when I learned that to open a door, sometimes you just have to knock.

The Credit Card Hack

The previous paragraph was a cleverly disguised attempt to make me look like less of an idiot when I show off my “elite hacking skills”. Oops.

First, I tried several range-query-based approaches. Then, I looked at advanced queries and pretty much anything you might come up with in an hour or so. None of them yielded significant results.

And then I had a crazy idea.

What if there was a mismatch between the filtering engine and the actual back-end? What if the message I got from Google (“You are a bad person”) wasn’t from the back-end itself, but instead from a designated filtering engine Google had implemented to censor queries like mine?

It would make a lot of sense from an architectural perspective. And bugs like that are pretty common—we see them in ITSEC all the time, particularly in IDS/IPS solutions, but also in common software. There’s a filtering procedure that processes data and only gives it to the back-end if it thinks the data is acceptable/non-malicious. However, the back-end and the filtering server almost never parse the input in exactly the same way. Thus, a seemingly valid input can go through the filter and wreak havoc on the back-end, effectively bypassing the filter.

You can usually trigger this type of behavior by providing your input in various encodings. For example: instead of using decimal numbers (0-9), how about converting them to hexadecimal or octal or binary? Well, guess what…

Search for this and Google will tell you that you’re a bad person: “4060000000000000..4060999999999999”

Search for this and Google will be happy to oblige: “0xe6c8c69c9c000..0xe6d753e6ecfff”.

The only thing you need to do is to convert credit card numbers from decimal to hexadecimal. That’s it.
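The conversion itself is a one-liner in any language; in JavaScript, for example, the 4060 range from Haselton’s original example converts like this:

// Converting Haselton's 4060 PAN range from decimal to hexadecimal.
var low  = (4060000000000000).toString (16);   // 'e6c8c69c9c000'
var high = (4060999999999999).toString (16);   // 'e6d753e6ecfff'

console.log ('0x' + low + '..0x' + high);      // 0xe6c8c69c9c000..0xe6d753e6ecfff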

The results include…

  • Humongous CSV files filled with potentially sensitive information.

With this simple credit card hack, a major privacy invasion is waiting to happen.

  • Faulty e-commerce log files.

These faulty e-commerce log files make credit card lookup easy.

  • Sensitive information shared on hacker sites (and even Facebook).

How to hack credit cards is as simple as using hexadecimal.

It’s truly scary stuff.

I know this bug won’t inspire any security research, but there you have it. Google made this boo-boo and neglected to even write me back. Well, it happens. I don’t envy the security folks at the big G, though. They must have a lot of stuff to look out for. I’m posting about this credit card number hack here because:

  1. It’s relatively low impact.
  2. Anyone who’s interested and motivated will have figured this out by now.
  3. To quote Haselton, if the big players aren’t taking responsibility and acting on these exploits, then “the right thing to do is to shine a light on the problem and insist that they fix it as soon as possible”.

This trick can be used to look up phone numbers, SSNs, TFNs, and more. And, as Bennett wrote, these numbers are much, much harder to change than your credit card, for which you can simply call your bank and cancel the card.

Sample Queries

WARNING: Do NOT Google your own credit card number in full!

Look for any CC PAN starting with 4060: 4060000000000000..4060999999999999 → 0xe6c8c69c9c000..0xe6d753e6ecfff

Some Hungarian phone numbers from the provider ‘Telenor’? No problem: 36200000000..36209999999 → 0x86db02a00..0x86e48c07f

Look for SSNs. Thankfully, these don’t return many meaningful results: 100000000..999999999 → 0x5f5e100..0x3b9ac9ff

There are many, many more.

If you find anything very alarming, or if you’re curious about credit card hacking, please leave it in the comments or contact me by email at gergely@toptal.com or on Twitter at @synsecblog. Calling the police is usually futile in these cases, but it might be worth a try. The given merchant or the card provider is usually more keen to address the issue.

Where to Go From Here

Well, Google obviously has to fix this, possibly with the help of the big players like Visa and Mastercard. In fact, Haselton provides a number of interesting suggestions in the two articles linked above.

What you need to do, however (and why I’ve written this post), is spread the word. Credit Card fraud is a big industry, and simple awareness can save you from becoming a victim. Further, if you have an e-commerce site or handle any credit card processing, please make sure that you’re secure. PCI-DSS is a good guideline, but it is far from perfect. Plus, it is always a good idea to Google your site with the “site:mysite.com” advanced query, looking for sensitive numbers. There’s a very, very slim chance that you’ll find anything—but if you do, you must act on it immediately.

Also, a bit of friendly advice: You should never give out your credit card information to anyone. My advice would be to use PayPal or a similar service whenever possible. You can check out these links for further information:

And a few general tips: don’t download things you didn’t ask for, don’t open spam emails, and remember that your bank will never ask for your password.

By the way: If you think there’s no one stupid enough to fall for these credit card hacking techniques or give away their credit card information on the internet, have a look at @NeedADebitCard.

Stay safe people!

This article was written by Gergely Kalman and originally posted at Toptal.

About the Author: With a background in IT-Security, Gergely has worked as lead developer for an Alexa Top 50 website serving several million unique visitors each month. He is a diligent and motivated worker who likes to dive in and get things done.

Introduction

Before digging into Node.js, you might want to read up on the benefits of using JavaScript across the stack, which unifies the language and data format (JSON), allowing you to optimally reuse developer resources. As this is more a benefit of JavaScript than of Node.js specifically, we won’t discuss it much here. But it’s a key advantage to incorporating Node in your stack.

JavaScript’s rising popularity has brought with it a lot of changes, and the face of web development today is dramatically different. The things that we can do on the web nowadays with JavaScript running on the server, as well as in the browser, were hard to imagine just several years ago, or were encapsulated within sandboxed environments like Flash or Java Applets.

As Wikipedia states: “Node.js is a packaged compilation of Google’s V8 JavaScript engine, the libuv platform abstraction layer, and a core library, which is itself primarily written in JavaScript.” Beyond that, it’s worth noting that Ryan Dahl, the creator of Node.js, was aiming to create real-time websites with push capability, “inspired by applications like Gmail”. In Node.js, he gave developers a tool for working in the non-blocking, event-driven I/O paradigm.


In one sentence: Node.js shines in real-time web applications employing push technology over websockets. What is so revolutionary about that? Well, after over 20 years of the stateless web built on the request-response paradigm, we finally have web applications with real-time, two-way connections, where both the client and the server can initiate communication, allowing them to exchange data freely. This is in stark contrast to the typical web response paradigm, where the client always initiates communication. Additionally, it’s all based on the open web stack (HTML, CSS and JS) running over the standard port 80.

One might argue that we’ve had this for years in the form of Flash and Java Applets—but in reality, those were just sandboxed environments using the web as a transport protocol to be delivered to the client. Plus, they were run in isolation and often operated over non-standard ports, which may have required extra permissions and such.

With all of its advantages, Node.js now plays a critical role in the technology stack of many high-profile companies who depend on its unique benefits. The Node.js Foundation has consolidated all the best thinking around why enterprises should consider Node.js in a short presentation that can be found on the Node.js Foundation’s Case Studies page.

In this post, I’ll discuss not only how these advantages are accomplished, but also why you might want to use Node.js—and why not—using some of the classic web application models as examples.

How Does It Work?

The main idea of Node.js: use non-blocking, event-driven I/O to remain lightweight and efficient in the face of data-intensive real-time applications that run across distributed devices.

That’s a mouthful.


What it really means is that Node.js is not a silver-bullet new platform that will dominate the web development world. Instead, it’s a platform that fills a particular need. And understanding this is absolutely essential. You definitely don’t want to use Node.js for CPU-intensive operations; in fact, using it for heavy computation will annul nearly all of its advantages. Where Node really shines is in building fast, scalable network applications, as it’s capable of handling a huge number of simultaneous connections with high throughput, which equates to high scalability.

How it works under the hood is pretty interesting. Compared to traditional web-serving techniques, where each connection (request) spawns a new thread, taking up system RAM and eventually maxing out at the amount of RAM available, Node.js operates on a single thread, using non-blocking I/O calls, allowing it to support tens of thousands of concurrent connections (held in the event loop).
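To make the single-threaded, non-blocking model concrete, here is a minimal sketch of a Node.js server (the file name and port are illustrative, not from any particular project). While the file is being read, the one and only thread is free to accept and serve other connections instead of sitting idle:

const http = require('http');
const fs = require('fs');

http.createServer((req, res) => {
  // Non-blocking read: the callback runs when the data is ready,
  // and the event loop keeps serving other requests in the meantime.
  fs.readFile('./big-report.txt', (err, data) => {
    if (err) {
      res.writeHead(500);
      return res.end('read failed');
    }
    res.writeHead(200, { 'Content-Type': 'text/plain' });
    res.end(data);
  });
}).listen(8080);

A blocking call such as fs.readFileSync in the same place would stall every other connection until the read completed.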

Diagram of traditional vs. Node.js server thread

A quick calculation: assuming that each thread potentially has an accompanying 2 MB of memory with it, running on a system with 8 GB of RAM puts us at a theoretical maximum of roughly 4,000 concurrent connections (8 GB / 2 MB ≈ 4,000), plus the cost of context-switching between threads (the figures come from Michael Abernethy’s article “Just what is Node.js?”, published on IBM developerWorks in 2011; unfortunately, the article is no longer available). That’s the scenario you typically deal with in traditional web-serving techniques. By avoiding all that, Node.js achieves scalability levels of over 1M concurrent connections, and over 600k concurrent websockets connections.

There is, of course, the question of sharing a single thread between all client requests, and it is a potential pitfall of writing Node.js applications. Firstly, heavy computation could choke up Node’s single thread and cause problems for all clients (more on this later), as incoming requests would be blocked until said computation completes. Secondly, developers need to be really careful not to let an exception bubble up to the core (topmost) Node.js event loop, which would cause the Node.js instance to terminate (effectively crashing the program).

The technique used to avoid exceptions bubbling up to the surface is passing errors back to the caller as callback parameters (instead of throwing them, like in other environments). Even if some unhandled exception manages to bubble up, tools have been developed to monitor the Node.js process and perform the necessary recovery of a crashed instance (although you probably won’t be able to recover the current state of the user session), the most common being the forever module, or a different approach using external system tools such as upstart and monit.
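Here is a minimal sketch of that error-first callback convention (readConfig and config.json are hypothetical names, not part of any library):

const fs = require('fs');

// Errors are passed back to the caller instead of being thrown up to the event loop.
function readConfig(path, callback) {
  fs.readFile(path, 'utf8', (err, data) => {
    if (err) return callback(err);        // I/O failure: report it, don't throw
    let config;
    try {
      config = JSON.parse(data);
    } catch (parseErr) {
      return callback(parseErr);          // bad JSON: still report, don't throw
    }
    callback(null, config);
  });
}

// The caller decides how to handle the failure.
readConfig('./config.json', (err, config) => {
  if (err) return console.error('could not load config:', err.message);
  console.log('loaded config:', config);
});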

NPM: The Node Package Manager

When discussing Node.js, one thing that definitely should not be omitted is built-in support for package management using the NPM tool that comes by default with every Node.js installation. The idea of NPM modules is quite similar to that of Ruby Gems: a set of publicly available, reusable components, available through easy installation via an online repository, with version and dependency management.

A full list of packaged modules can be found on the npm website, or accessed using the npm CLI tool that automatically gets installed with Node.js. The module ecosystem is open to all, and anyone can publish their own module that will be listed in the npm repository. A brief introduction to npm can be found in a Beginner’s Guide, and details on publishing modules in the npm Publishing Tutorial.

Some of the most useful npm modules today are:

  • express – Express.js, a Sinatra-inspired web development framework for Node.js, and the de-facto standard for the majority of Node.js applications out there today.
  • hapi – a very modular and simple to use configuration-centric framework for building web and services applications
  • connect – Connect is an extensible HTTP server framework for Node.js, providing a collection of high performance “plugins” known as middleware; serves as a base foundation for Express.
  • socket.io and sockjs – Server-side component of the two most common websockets components out there today.
  • pug (formerly Jade) – One of the popular templating engines, inspired by HAML, a default in Express.js.
  • mongodb and mongojs – MongoDB wrappers to provide the API for MongoDB object databases in Node.js.
  • redis – Redis client library.
  • lodash (underscore, lazy.js) – The JavaScript utility belt. Underscore started the trend, but was eventually overtaken by lodash, mainly thanks to better performance and a modular implementation.
  • forever – Probably the most common utility for ensuring that a given node script runs continuously. Keeps your Node.js process up in production in the face of any unexpected failures.
  • bluebird – A full featured Promises/A+ implementation with exceptionally good performance
  • moment – A lightweight JavaScript date library for parsing, validating, manipulating, and formatting dates.

The list goes on. There are tons of really useful packages out there, available to all (no offense to those that I’ve omitted here).
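To give a feel for how little ceremony these modules require, here is a minimal sketch using express, the first module listed above (assuming it has been installed with npm install express; the port is illustrative):

const express = require('express');   // resolved from node_modules, installed via npm
const app = express();

app.get('/', (req, res) => {
  res.send('Hello from an npm module');
});

app.listen(3000, () => console.log('listening on port 3000'));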

Examples of Where Node.js Should Be Used

CHAT

Chat is the most typical real-time, multi-user application. From IRC (back in the day), through many proprietary and open protocols running on non-standard ports, to the ability to implement everything today in Node.js with websockets running over the standard port 80.

The chat application is really the sweet-spot example for Node.js: it’s a lightweight, high-traffic, data-intensive (but low processing/computation) application that runs across distributed devices. It’s also a great use case for learning, as it’s simple yet covers most of the paradigms you’ll ever use in a typical Node.js application.

Let’s try to depict how it works.

In the simplest example, we have a single chatroom on our website where people come and can exchange messages in one-to-many (actually all) fashion. For instance, say we have three people on the website all connected to our message board.

On the server-side, we have a simple Express.js application which implements two things: 1) a GET ‘/’ request handler which serves the webpage containing both a message board and a ‘Send’ button to initialize new message input, and 2) a websockets server that listens for new messages emitted by websocket clients.

On the client-side, we have an HTML page with a couple of handlers set up, one for the ‘Send’ button click event, which picks up the input message and sends it down the websocket, and another that listens for new incoming messages on the websockets client (i.e., messages sent by other users, which the server now wants the client to display).

When one of the clients posts a message, here’s what happens:

  1. Browser catches the ‘Send’ button click through a JavaScript handler, picks up the value from the input field (i.e., the message text), and emits a websocket message using the websocket client connected to our server (initialized when the web page loaded).
  2. Server-side component of the websocket connection receives the message and forwards it to all other connected clients using the broadcast method.
  3. All clients receive the new message as a push message via a websockets client-side component running within the web page. They then pick up the message content and update the web page in-place by appending the new message to the board.
Diagram of client and server websockets in a Node.js application
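As a rough sketch (not a production implementation), the server side of such a chatroom could look like this, using Express together with socket.io, one of the websocket modules listed earlier; the file name, port, and event name are illustrative:

const express = require('express');
const http = require('http');
const { Server } = require('socket.io');

const app = express();
const server = http.createServer(app);
const io = new Server(server);

// 1) Serve the page containing the message board and the 'Send' button.
app.get('/', (req, res) => {
  res.sendFile(__dirname + '/index.html');
});

// 2) Relay each incoming message to every other connected client.
io.on('connection', (socket) => {
  socket.on('chat message', (msg) => {
    socket.broadcast.emit('chat message', msg);
  });
});

server.listen(80);

The client side mirrors this: one handler emits a 'chat message' event when the Send button is clicked, and another appends incoming 'chat message' events to the board.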

This is the simplest example. For a more robust solution, you might use a simple cache based on the Redis store. Or, in an even more advanced solution, a message queue to handle the routing of messages to clients and a more robust delivery mechanism that can cover temporary connection losses or store messages for registered clients while they’re offline. But regardless of the improvements that you make, Node.js will still be operating under the same basic principles: reacting to events, handling many concurrent connections, and maintaining fluidity in the user experience.

API ON TOP OF AN OBJECT DB

Although Node.js really shines with real-time applications, it’s quite a natural fit for exposing the data from object DBs (e.g. MongoDB). Data stored in JSON allows Node.js to function without the impedance mismatch and data conversion.

For instance, if you’re using Rails, you would convert from JSON to binary models, then expose them back as JSON over the HTTP when the data is consumed by Backbone.js, Angular.js, etc., or even plain jQuery AJAX calls. With Node.js, you can simply expose your JSON objects with a REST API for the client to consume. Additionally, you don’t need to worry about converting between JSON and whatever else when reading or writing from your database (if you’re using MongoDB). In sum, you can avoid the need for multiple conversions by using a uniform data serialization format across the client, server, and database.
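Here is a rough sketch of that idea using Express and the official mongodb driver; the connection string, database, and collection names are illustrative:

const express = require('express');
const { MongoClient } = require('mongodb');

const app = express();
const client = new MongoClient('mongodb://localhost:27017');

app.get('/api/articles', async (req, res) => {
  // Documents go out exactly as stored: JSON in, JSON out, no conversion step.
  const articles = await client.db('blog').collection('articles')
    .find({}).limit(20).toArray();
  res.json(articles);
});

client.connect().then(() => app.listen(3000));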

QUEUED INPUTS

If you’re receiving a high amount of concurrent data, your database can become a bottleneck. As depicted above, Node.js can easily handle the concurrent connections themselves. But because database access is a blocking operation (in this case), we run into trouble. The solution is to acknowledge the client’s request before the data is actually written to the database.

With that approach, the system maintains its responsiveness under a heavy load, which is particularly useful when the client doesn’t need firm confirmation of a successful data write. Typical examples include: the logging or writing of user-tracking data, processed in batches and not used until a later time; as well as operations that don’t need to be reflected instantly (like updating a ‘Likes’ count on Facebook) where eventual consistency (so often used in the NoSQL world) is acceptable.

Data gets queued through some kind of caching or message queuing infrastructure (e.g., RabbitMQ, ZeroMQ) and digested by a separate database batch-write process, or by computation-intensive backend services written on a platform better suited for such tasks. Similar behavior can be implemented with other languages/frameworks, but not on the same hardware, with the same high, maintained throughput.

Diagram of a database batch-write in Node.js with message queuing

In short: with Node, you can push the database writes off to the side and deal with them later, proceeding as if they succeeded.
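A rough sketch of that pattern, using Express and RabbitMQ through the amqplib module (the endpoint and queue name are illustrative):

const express = require('express');
const amqp = require('amqplib');

async function main() {
  const conn = await amqp.connect('amqp://localhost');
  const channel = await conn.createChannel();
  await channel.assertQueue('tracking-events');

  const app = express();
  app.use(express.json());

  app.post('/track', (req, res) => {
    // Queue the event for a separate batch-writer process...
    channel.sendToQueue('tracking-events', Buffer.from(JSON.stringify(req.body)));
    // ...and acknowledge the client immediately, before anything touches the database.
    res.status(202).end();
  });

  app.listen(3000);
}

main();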

DATA STREAMING

In more traditional web platforms, HTTP requests and responses are treated as isolated events, when in fact they’re actually streams. This observation can be utilized in Node.js to build some cool features. For example, it’s possible to process files while they’re still being uploaded, as the data comes in through a stream and we can process it in an online fashion. This could be done for real-time audio or video encoding, and proxying between different data sources (see next section).
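For instance, here is a small sketch that computes a checksum of an upload while it is still arriving, chunk by chunk, instead of buffering the whole body first (the port is illustrative):

const http = require('http');
const crypto = require('crypto');

http.createServer((req, res) => {
  const hash = crypto.createHash('sha256');
  req.on('data', (chunk) => hash.update(chunk));    // the request itself is a readable stream
  req.on('end', () => res.end(hash.digest('hex'))); // reply once the upload finishes
}).listen(8080);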

PROXY

Node.js is easily employed as a server-side proxy where it can handle a large amount of simultaneous connections in a non-blocking manner. It’s especially useful for proxying different services with different response times, or collecting data from multiple source points.

An example: consider a server-side application communicating with third-party resources, pulling in data from different sources, or storing assets like images and videos to third-party cloud services.

Although dedicated proxy servers do exist, using Node instead might be helpful if your proxying infrastructure is non-existent or if you need a solution for local development. By this, I mean that you could build a client-side app with a Node.js development server for assets and proxying/stubbing API requests, while in production you’d handle such interactions with a dedicated proxy service (nginx, HAProxy, etc.).
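A sketch of such a development-time proxy using the http-proxy module (the target URLs and routing rule are illustrative):

const http = require('http');
const httpProxy = require('http-proxy');

const proxy = httpProxy.createProxyServer({});

http.createServer((req, res) => {
  // Route API calls to a (slow) third-party backend; serve everything else locally.
  const target = req.url.startsWith('/api')
    ? 'http://third-party.example.com'
    : 'http://localhost:3000';
  proxy.web(req, res, { target });
}).listen(8080);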

BROKERAGE – STOCK TRADER’S DASHBOARD

Let’s get back to the application level. Another example where desktop software dominates, but could be easily replaced with a real-time web solution is brokers’ trading software, used to track stocks prices, perform calculations/technical analysis, and create graphs/charts.

Switching to a real-time web-based solution would allow brokers to easily switch workstations or working places. Soon, we might start seeing them on the beach in Florida.. or Ibiza.. or Bali.

APPLICATION MONITORING DASHBOARD

Another common use-case in which Node-with-web-sockets fits perfectly: tracking website visitors and visualizing their interactions in real-time.

You could be gathering real-time stats from your users, or even taking it to the next level by introducing targeted interactions with your visitors, opening a communication channel when they reach a specific point in your funnel. (If you’re interested, this idea is already being productized by CANDDi.)

Imagine how you could improve your business if you knew what your visitors were doing in real-time—if you could visualize their interactions. With the real-time, two-way sockets of Node.js, now you can.

SYSTEM MONITORING DASHBOARD

Now, let’s visit the infrastructure side of things. Imagine, for example, an SaaS provider that wants to offer its users a service-monitoring page (e.g., GitHub’s status page). With the Node.js event-loop, we can create a powerful web-based dashboard that checks the services’ statuses in an asynchronous manner and pushes data to clients using websockets.

Both internal (intra-company) and public services’ statuses can be reported live and in real-time using this technology. Push that idea a little further and try to imagine a Network Operations Center (NOC) monitoring applications in a telecommunications operator, cloud/network/hosting provider, or some financial institution, all run on the open web stack backed by Node.js and websockets instead of Java and/or Java Applets.
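A small sketch of the idea, polling a couple of (purely illustrative) health endpoints and pushing the results to every connected dashboard over socket.io; it assumes Node 18+ for the built-in fetch:

const { Server } = require('socket.io');

const io = new Server(3000);
const services = ['https://api.example.com/health', 'https://db.example.com/health'];

async function checkService(url) {
  try {
    const res = await fetch(url);
    return { url, up: res.ok };
  } catch {
    return { url, up: false };
  }
}

setInterval(async () => {
  const statuses = await Promise.all(services.map(checkService));
  io.emit('status', statuses);   // push the latest statuses to every connected client
}, 5000);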

Note: Don’t try to build hard real-time systems in Node (i.e., systems requiring consistent response times). Erlang is probably a better choice for that class of application.

Where Node.js Can Be Used

SERVER-SIDE WEB APPLICATIONS

Node.js with Express.js can also be used to create classic web applications on the server-side. However, while possible, this request-response paradigm in which Node.js would be carrying around rendered HTML is not the most typical use-case. There are arguments to be made for and against this approach. Here are some facts to consider:

Pros:

  • If your application doesn’t have any CPU-intensive computation, you can build it in JavaScript top-to-bottom, even down to the database level, if you use a JSON-storage object DB like MongoDB. This eases development (including hiring) significantly.
  • Crawlers receive a fully-rendered HTML response, which is far more SEO-friendly than, say, a Single Page Application or a websockets app run on top of Node.js.

Cons:

  • Any CPU intensive computation will block Node.js responsiveness, so a threaded platform is a better approach. Alternatively, you could try scaling out the computation [*].
  • Using Node.js with a relational database is still quite a pain (see below for more detail). Do yourself a favour and pick up any other environment like Rails, Django, or ASP.Net MVC if you’re trying to perform relational operations.
[*] An alternative to these CPU intensive computations is to create a highly scalable MQ-backed environment with back-end processing to keep Node as a front-facing ‘clerk’ to handle client requests asynchronously.

Where Node.js Shouldn’t Be Used

SERVER-SIDE WEB APPLICATION W/ A RELATIONAL DB BEHIND

Comparing Node.js with Express.js against Ruby on Rails, for example, there is a clean decision in favour of the latter when it comes to relational data access.

Relational DB tools for Node.js are still in their early stages; they’re rather immature and not as pleasant to work with. On the other hand, Rails automagically provides data access setup right out of the box, together with DB schema migration support tools and other Gems (pun intended). Rails and its peer frameworks have mature and proven Active Record or Data Mapper data access layer implementations, which you’ll sorely miss if you try to replicate them in pure JavaScript.[*]

Still, if you’re really inclined to remain JS all-the-way (and ready to pull out some of your hair), keep an eye on Sequelize and Node ORM2—both are still immature, but they may eventually catch up.
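For the curious, here is a minimal sketch of what Sequelize usage looks like (the connection string and model are illustrative, and you would also need a database driver such as pg installed):

const { Sequelize, DataTypes } = require('sequelize');

const sequelize = new Sequelize('postgres://user:pass@localhost:5432/mydb');

const User = sequelize.define('User', {
  name: DataTypes.STRING,
  email: { type: DataTypes.STRING, unique: true },
});

async function main() {
  await sequelize.sync();                                        // create the table if needed
  await User.create({ name: 'Ada', email: 'ada@example.com' });
  const users = await User.findAll();
  console.log(users.map((u) => u.name));
}

main();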

[*] It’s possible and not uncommon to use Node solely as a front-end, while keeping your Rails back-end and its easy-access to a relational DB.

HEAVY SERVER-SIDE COMPUTATION/PROCESSING

When it comes to heavy computation, Node.js is not the best platform around. No, you definitely don’t want to build a Fibonacci computation server in Node.js. In general, any CPU intensive operation annuls all the throughput benefits Node offers with its event-driven, non-blocking I/O model because any incoming requests will be blocked while the thread is occupied with your number-crunching.

As stated previously, Node.js is single-threaded and uses only a single CPU core. When it comes to adding concurrency on a multi-core server, there is some work being done by the Node core team in the form of a cluster module [ref: http://nodejs.org/api/cluster.html]. You can also run several Node.js server instances pretty easily behind a reverse proxy via nginx.
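A minimal sketch of the cluster module in action, forking one worker per CPU core (cluster.isPrimary is spelled isMaster on older Node versions; the port is illustrative):

const cluster = require('cluster');
const http = require('http');
const os = require('os');

if (cluster.isPrimary) {
  // Fork one worker per core; each worker runs its own event loop.
  os.cpus().forEach(() => cluster.fork());
} else {
  http.createServer((req, res) => {
    res.end(`handled by worker ${process.pid}\n`);
  }).listen(8080);
}

Note that clustering helps you use all the cores for I/O-bound work; it still doesn’t make a single long-running computation any less blocking for the worker that runs it.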

With clustering, you should still offload all heavy computation to background processes written in a more appropriate environment, and have them communicate via a message queue server like RabbitMQ.

Even though your background processing might be run on the same server initially, such an approach has the potential for very high scalability. Those background processing services could easily be distributed out to separate worker servers without the need to reconfigure the load of the front-facing web servers.

Of course, you’d use the same approach on other platforms too, but with Node.js you get that high reqs/sec throughput we’ve talked about, as each request is a small task handled very quickly and efficiently.

Conclusion

We’ve discussed Node.js from theory to practice, beginning with its goals and ambitions, and ending with its sweet spots and pitfalls. When people run into problems with Node, it almost always boils down to the fact that blocking operations are the root of all evil—99% of Node misuses come as a direct consequence.


Remember: Node.js was never created to solve the compute scaling problem. It was created to solve the I/O scaling problem, which it does really well.

Why use Node.js? If your use case does not involve CPU-intensive operations or access to any blocking resources, you can exploit the benefits of Node.js and enjoy fast and scalable network applications. Welcome to the real-time web.

This post was written by Tomislav Capan, JavaScript Developer for Toptal.

Operations research, the science of using computers to make optimal decisions, has been used and applied in many areas of software and programming. To give a few examples, logistics companies use it to determine the optimal route to get from point A to point B, power companies use it to determine power production schedules, and manufacturing companies use it to find optimal staffing patterns for their factories.

Mixed-integer programming

Today, we will explore the power of operations research by walking through a hypothetical problem. Specifically, we will use mixed-integer programming (MIP) to determine the optimal staffing pattern for a hypothetical factory.

Operations Research Algorithms

Before we dive into our example problem, it is helpful to go over some basic concepts in operations research and understand a bit of the tools we will be using today.

Linear Programming Algorithms

Linear programming is an operations research technique used to determine the best outcome in a mathematical model where the objective and the constraints are expressed as a system of linear equations. An example linear programming model might look like this:

Maximize a + b (objective)
Subject to:
a <= 2 (constraint 1)
b <= 3 (constraint 2)

In our very simple example, we can see that the optimal outcome is 5, with a = 2 and b = 3. While this is a rather trivial example, you can probably imagine a linear programming model that utilizes thousands of variables and hundreds of constraints.

A good example might be an investment portfolio problem, where you might end up with something like the below, in pseudo-code:

Maximize <expected profit from all stock investments>
Subject to:
<investment in the technology sector must be between 10% - 20% of portfolio>
<investment in the consumer staples must not exceed investment in financials>
Etc.
...

Such a model would be rather complex and difficult to solve by hand or by trial and error. Operations research software uses several algorithms to solve these problems quickly. If you’re interested in the underlying algorithms, I recommend learning about the simplex method and the interior point method.

Integer Programming Algorithms

Integer programming is like linear programming with an additional allowance for some or all of the variables to be integer values. While this may not seem like a large improvement at first, it allows us to solve many problems that could have remained unsolved using linear programming alone.

One such problem is the knapsack problem, where we are given a set of items with assigned values and weights and are asked to find the highest value combination of items we can fit into our knapsack. A linear programming model will not be able to solve this because there is no way to express the idea that you can either put an item into your knapsack or not, but you cannot put part of an item into your knapsack—every variable is a continuous variable!

An example knapsack problem might have the following parameters:

Object                       Value (units of $10)   Size (generic units)
Camera                       5                      2
Mysterious figurine          7                      4
Huge bottle of apple cider   2                      7
French horn                  10                     10

And the MIP model will look like this:

Maximize 5a + 7b + 2c + 10d (objective: maximize value of items taken)
Subject to:
2a + 4b + 7c + 10d <= 15 (space constraint)

The optimal solution, in this case, is a=0, b=1, c=0, d=1, for a total value of 17.

The problem we will solve today will also require integer programming since an employee at a factory can either be scheduled for a shift or not—for the sake of simplicity, you cannot schedule an employee for 2/3 of a shift at this factory.

To make integer programming possible, several mathematical algorithms are used. If you are interested in the underlying algorithms, I recommend studying the cutting-planes algorithm and the branch-and-bound algorithm.

Example Problem: Scheduling

Problem Description

Today, we will explore the problem of staffing a factory. As the management of the factory, we will want to minimize labor costs, but we want to ensure sufficient coverage for every shift to meet production demand.

Suppose we have five shifts with the following staffing demands:

Shift 1 Shift 2 Shift 3 Shift 4 Shift 5
1 person 4 people 3 people 5 people 2 people

And, suppose we have the following workers:

Name         Availability   Cost per Shift ($)
Melisandre   1, 2, 5        20
Bran         2, 3, 4, 5     15
Cersei       3, 4           35
Daenerys     4, 5           35
Theon        2, 4, 5        10
Jon          1, 3, 5        25
Tyrion       2, 4, 5        30
Jaime        2, 3, 5        20
Arya         1, 2, 4        20

To keep the problem simple, let us assume for the moment that availability and cost are the sole concerns.

Our Tools

For today’s problem, we will use a piece of open source branch-and-cut software called CBC. We will interface with this software using PuLP, which is a popular operations research modeling library for Python; more information is available in the PuLP documentation.

PuLP allows us to construct MIP models in a very Pythonic fashion and integrates nicely with existing Python code. This is very useful as some of the most popular data manipulation and analysis tools are written in Python, and most commercial operations research systems require extensive data processing before and after the optimization.

To demonstrate the simplicity and elegance of PuLP, here is the knapsack problem from earlier, solved in PuLP:

import pulp as pl
# declare some variables
# each variable is a binary variable that is either 0 or 1
# 1 means the item will go into the knapsack
a = pl.LpVariable("a", 0, 1, pl.LpInteger)
b = pl.LpVariable("b", 0, 1, pl.LpInteger)
c = pl.LpVariable("c", 0, 1, pl.LpInteger)
d = pl.LpVariable("d", 0, 1, pl.LpInteger)
# define the problem
prob = pl.LpProblem("knapsack", pl.LpMaximize)
# objective function - maximize value of objects in knapsack
prob += 5 * a + 7 * b + 2 * c + 10 * d
# constraint - weight of objects cannot exceed 15
prob += 2 * a + 4 * b + 7 * c + 10 * d <= 15
status = prob.solve()  # solve using the default solver, which is cbc
print(pl.LpStatus[status])  # print the human-readable status
# print the values
print("a", pl.value(a))
print("b", pl.value(b))
print("c", pl.value(c))
print("d", pl.value(d))

Run this, and you should get the output:

Optimal
a 0.0
b 1.0
c 0.0
d 1.0

Now onto our scheduling problem!

Coding Our Solution

The coding of our solution is fairly straightforward. First, we will want to define our data:

import pulp as pl
import collections as cl
# data
shift_requirements = [1, 4, 3, 5, 2]
workers = {
    "Melisandre": {
        "availability": [0, 1, 4],
        "cost": 20
    },
    "Bran": {
        "availability": [1, 2, 3, 4],
        "cost": 15
    },
    "Cersei": {
        "availability": [2, 3],
        "cost": 35
    },
    "Daenerys": {
        "availability": [3, 4],
        "cost": 35
    },
    "Theon": {
        "availability": [1, 3, 4],
        "cost": 10
    },
    "Jon": {
        "availability": [0, 2, 4],
        "cost": 25
    },
    "Tyrion": {
        "availability": [1, 3, 4],
        "cost": 30
    },
    "Jaime": {
        "availability": [1, 2, 4],
        "cost": 20
    },
    "Arya": {
        "availability": [0, 1, 3],
        "cost": 20
    }
}

Next, we will want to define the model:

# define the model: we want to minimize the cost
prob = pl.LpProblem("scheduling", pl.LpMinimize)

# some model variables
cost = []
vars_by_shift = cl.defaultdict(list)

for worker, info in workers.items():
    for shift in info['availability']:
        worker_var = pl.LpVariable("%s_%s" % (worker, shift), 0, 1, pl.LpInteger)
        vars_by_shift[shift].append(worker_var)
        cost.append(worker_var * info['cost'])

# set the objective to be the sum of cost
prob += sum(cost)

# set the shift requirements
for shift, requirement in enumerate(shift_requirements):
    prob += sum(vars_by_shift[shift]) >= requirement

And now we just ask it to solve and print the results!

status = prob.solve()
print("Result:", pl.LpStatus[status])

results = []
for shift, vars in vars_by_shift.items():
    results.append({
        "shift": shift,
        "workers": [var.name for var in vars if var.varValue == 1],
    })

for result in sorted(results, key=lambda x: x['shift']):
    print("Shift:", result['shift'], 'workers:', ', '.join(result['workers']))

Once you run the code, you should see the following outputs:

Result: Optimal
Shift: 0 workers: Arya_0
Shift: 1 workers: Melisandre_1, Bran_1, Theon_1, Arya_1
Shift: 2 workers: Bran_2, Jon_2, Jaime_2
Shift: 3 workers: Bran_3, Daenerys_3, Theon_3, Tyrion_3, Arya_3
Shift: 4 workers: Bran_4, Theon_4

Crank Up the Difficulty a Bit: Additional Constraints

Even though the previous model was interesting and useful, it does not fully demonstrate the power of mixed-integer programming. We could also easily write a for loop to find the cheapest x workers for every shift, where x is the number of workers needed for a shift. To demonstrate how MIP can be used to solve a problem that would be challenging to solve in an imperative fashion, let us examine what would happen if we add a few extra constraints.

Additional Constraints

Suppose that, due to new labor regulations, no workers can be assigned to more than 2 shifts. To account for the increased work, we have recruited the help of Dothraki Staffing Group, who will supply up to 10 Dothraki workers for each shift at a rate of $45 per shift.

Suppose also that, due to some ongoing interpersonal conflicts outside of our factory, Cersei and Jaime are unable to work any shifts alongside either Daenerys or Jon. On top of that, Jaime and Cersei are unable to work any shifts together. Finally, Arya, who has particularly complex interpersonal relationships, is unable to work any shift with Jaime, Cersei, or Melisandre.

The addition of these two new constraints and resources by no means makes the problem unsolvable through imperative means, but it makes the solution much harder, as one will have to account for opportunity costs of scheduling a worker to a particular shift.

Solution

With mixed-integer programming, however, it is much easier. We simply need to amend our code like so:

Define the ban list and some constraints:

ban_list = {
    ("Daenerys", "Jaime"),
    ("Daenerys", "Cersei"),
    ("Jon", "Jaime"),
    ("Jon", "Cersei"),
    ("Arya", "Jaime"),
    ("Arya", "Cersei"),
    ("Arya", "Melisandre"),
    ("Jaime", "Cersei")
}

DOTHRAKI_MAX = 10
DOTHRAKI_COST = 45

Populate variables for implementing the ban and max shift constraints:

# extra container needed for the max shift constraint
vars_by_worker = cl.defaultdict(list)

for worker, info in workers.items():
    for shift in info['availability']:
        worker_var = pl.LpVariable("%s_%d" % (worker, shift), 0, 1, pl.LpInteger)
        # store some variable data so we can implement the ban constraint
        var_data = (worker,)
        vars_by_shift[shift].append((worker_var, var_data))
        # store vars by worker so we can implement the max shift constraint
        vars_by_worker[worker].append(worker_var)
        cost.append(worker_var * info['cost'])

Add the Dothraki variables:

dothrakis_by_shift = {}
for shift in range(len(shift_requirements)):
    dothraki_var = pl.LpVariable('dothraki_%d' % shift, 0, DOTHRAKI_MAX, pl.LpInteger)
    cost.append(dothraki_var * DOTHRAKI_COST)
    dothrakis_by_shift[shift] = dothraki_var

We will also need a slightly modified loop for computing shift and ban requirements:

# set the shift requirements
for shift, requirement in enumerate(shift_requirements):
    prob += sum([var[0] for var in vars_by_shift[shift]]) + dothrakis_by_shift[shift] >= requirement

# set the ban requirements
for shift, vars in vars_by_shift.items():
    for var1 in vars:
        for var2 in vars:
            worker_pair = var1[1][0], var2[1][0]
            if worker_pair in ban_list:
                prob += var1[0] + var2[0] <= 1

# set the labor standards:
for worker, vars in vars_by_worker.items():
    prob += sum(vars) <= 2

And finally, to print the results:

status = prob.solve()
print("Result:", pl.LpStatus[status])

results = []
for shift, vars in vars_by_shift.items():
    results.append({
        "shift": shift,
        "workers": [var[1][0] for var in vars if var[0].varValue == 1],
        "dothrakis": dothrakis_by_shift[shift].varValue
    })

for result in sorted(results, key=lambda x: x['shift']):
    print("Shift:", result['shift'], 'workers:', ', '.join(result['workers']), 'dothrakis hired:', int(result['dothrakis']))

And we should be good to go. Run the code and you should see output as below:

Result: Optimal
Shift: 0 workers: Arya dothrakis hired: 0
Shift: 1 workers: Melisandre, Theon, Tyrion, Jaime dothrakis hired: 0
Shift: 2 workers: Bran, Jon dothrakis hired: 1
Shift: 3 workers: Bran, Daenerys, Theon, Tyrion, Arya dothrakis hired: 0
Shift: 4 workers: Melisandre, Jaime dothrakis hired: 0

And there we have it, a result that respects the banned workers’ list, follows labor regulations, and uses Dothraki workers judiciously.

Conclusion

Today, we explored using mixed-integer programming to make better decisions. We discussed the underlying algorithms and techniques used in operations research and looked at an example problem that is representative of how mixed-integer programming is used in the real world.

I hope this article inspired you to learn more about operations research and made you think about how this technology can be applied to your projects. We’ve only really seen the tip of the iceberg when it comes to the exciting world of optimization algorithms and operations research.

You can find all the code related to this article on GitHub. If you would like to discuss more, share your comments below!

This post was written by Shanglung Wang, Python Developer for Toptal.

Inexperienced programmers often think that the Java automatic garbage collection completely frees them from worrying about memory management. This is a common misperception: while the garbage collector does its best, it’s entirely possible for even the best programmer to fall prey to crippling memory leaks. Let me explain.

A memory leak occurs when object references that are no longer needed are unnecessarily maintained. These leaks are bad. For one, they put unnecessary pressure on your machine as your programs consume more and more resources. To make things worse, detecting these leaks can be difficult: static analysis often struggles to precisely identify these redundant references, and existing leak detection tools track and report fine-grained information about individual objects, producing results that are hard to interpret and lack precision.

In other words, leaks are either too hard to identify, or identified in terms that are too specific to be useful.

There are actually four categories of memory issues with similar and overlapping symptoms, but varied causes and solutions:

  • Performance: usually associated with excessive object creation and deletion, long delays in garbage collection, excessive operating system page swapping, and more.
  • Resource constraints: occurs when there’s either too little memory available or your memory is too fragmented to allocate a large object—this can be native or, more commonly, Java heap-related.
  • Java heap leaks: the classic memory leak, in which Java objects are continuously created without being released. This is usually caused by latent object references.
  • Native memory leaks: associated with any continuously growing memory utilization that is outside the Java heap, such as allocations made by JNI code, drivers or even JVM allocations.

In this memory management tutorial, I’ll focus on Java heap leaks and outline an approach to detecting such leaks based on reports from Java VisualVM, a tool that provides a visual interface for analyzing Java technology-based applications while they’re running.

But before you can prevent and find memory leaks, you should understand how and why they occur. (Note: If you have a good handle on the intricacies of memory leaks, you can skip ahead.)

Memory Leaks: A Primer

For starters, think of memory leakage as a disease and Java’s OutOfMemoryError (OOM, for brevity) as a symptom. But as with any disease, not all OOMs necessarily imply memory leaks: an OOM can occur due to the generation of a large number of local variables or other such events. On the other hand, not all memory leaks necessarily manifest themselves as OOMs, especially in the case of desktop applications or client applications (which aren’t run for very long without restarts).

Think of memory leakage as a disease and the OutOfMemoryError as a symptom. But not all OutOfMemoryErrors imply memory leaks, and not all memory leaks manifest themselves as OutOfMemoryErrors.

Why are these leaks so bad? Among other things, leaking blocks of memory during program execution often degrades system performance over time, as allocated but unused blocks of memory will have to be swapped out once the system runs out of free physical memory. Eventually, a program may even exhaust its available virtual address space, leading to the OOM.

Deciphering the OutOfMemoryError

As mentioned above, the OOM is a common indication of a memory leak. Essentially, the error is thrown when there’s insufficient space to allocate a new object. Try as it might, the garbage collector can’t find the necessary space, and the heap can’t be expanded any further. Thus, an error emerges, along with a stack trace.

The first step in diagnosing your OOM is to determine what the error actually means. This sounds obvious, but the answer isn’t always so clear. For example: Is the OOM appearing because the Java heap is full, or because the native heap is full? To help you answer this question, let’s analyze a few of the possible error messages:

  • java.lang.OutOfMemoryError: Java heap space
  • java.lang.OutOfMemoryError: PermGen space
  • java.lang.OutOfMemoryError: Requested array size exceeds VM limit
  • java.lang.OutOfMemoryError: request <size> bytes for <reason>. Out of swap space?
  • java.lang.OutOfMemoryError: <reason> <stack trace> (Native method)

“Java heap space”

This error message doesn’t necessarily imply a memory leak. In fact, the problem can be as simple as a configuration issue.

For example, I was responsible for analyzing an application which was consistently producing this type of OutOfMemoryError. After some investigation, I figured out that the culprit was an array instantiation that was demanding too much memory; in this case, it wasn’t the application’s fault, but rather, the application server was relying on the default heap size, which was too small. I solved the problem by adjusting the JVM’s memory parameters.

In other cases, and for long-lived applications in particular, the message might be an indication that we’re unintentionally holding references to objects, preventing the garbage collector from cleaning them up. This is the Java language equivalent of a memory leak. (Note: APIs called by an application could also be unintentionally holding object references.)

Another potential source of these “Java heap space” OOMs arises with the use of finalizers. If a class has a finalize method, then objects of that type do not have their space reclaimed at garbage collection time. Instead, after garbage collection, the objects are queued for finalization, which occurs later. In the Sun implementation, finalizers are executed by a daemon thread. If the finalizer thread cannot keep up with the finalization queue, then the Java heap could fill up and an OOM could be thrown.

“PermGen space”

This error message indicates that the permanent generation is full. The permanent generation is the area of the heap that stores class and method objects. If an application loads a large number of classes, then the size of the permanent generation might need to be increased using the -XX:MaxPermSize option.

Interned java.lang.String objects are also stored in the permanent generation. The java.lang.String class maintains a pool of strings. When the intern method is invoked, the method checks the pool to see if an equivalent string is present. If so, it’s returned by the intern method; if not, the string is added to the pool. In more precise terms, the java.lang.String.intern method returns a string’s canonical representation; the result is a reference to the same class instance that would be returned if that string appeared as a literal. If an application interns a large number of strings, you might need to increase the size of the permanent generation.

Note: you can use the jmap -permgen command to print statistics related to the permanent generation, including information about interned String instances.

“Requested array size exceeds VM limit”

This error indicates that the application (or APIs used by that application) attempted to allocate an array that is larger than the heap size. For example, if an application attempts to allocate an array of 512MB but the maximum heap size is 256MB, then an OOM will be thrown with this error message. In most cases, the problem is either a configuration issue or a bug that results when an application attempts to allocate a massive array.

“Request <size> bytes for <reason>. Out of swap space?”

This message appears to be an OOM. However, the HotSpot VM throws this apparent exception when an allocation from the native heap fails and the native heap might be close to exhaustion. Included in the message are the size (in bytes) of the request that failed and the reason for the memory request. In most cases, the <reason> is the name of the source module that’s reporting an allocation failure.

If this type of OOM is thrown, you might need to use troubleshooting utilities on your operating system to diagnose the issue further. In some cases, the problem might not even be related to the application. For example, you might see this error if:

  • The operating system is configured with insufficient swap space.
  • Another process on the system is consuming all available memory resources.

It’s also possible that the application failed due to a native leak (for example, if some bit of application or library code is continuously allocating memory but fails to release it to the operating system).

“<reason> <stack trace> (Native method)”

If you see this error message and the top frame of your stack trace is a native method, then that native method has encountered an allocation failure. The difference between this message and the previous is that the Java memory allocation failure was detected in a JNI or native method rather than in Java VM code.

If this type of OOM is thrown, you might need to use utilities on the operating system to further diagnose the issue.

Application Crash Without OOM

Occasionally, an application might crash soon after an allocation failure from the native heap. This occurs if you’re running native code that doesn’t check for errors returned by memory allocation functions.

For example, malloc returns NULL if there is no memory available. If the return from malloc is not checked, then the application might crash when it attempts to access an invalid memory location. Depending on the circumstances, this type of issue can be difficult to locate.

In some cases, the information from the fatal error log or the crash dump will be sufficient. If the cause of a crash is determined to be a lack of error-handling in some memory allocations, then you must hunt down the reason for said allocation failure. As with any other native heap issue, the system might be configured with insufficient swap space, another process might be consuming all available memory resources, etc.

Diagnosing Leaks

In most cases, diagnosing memory leaks requires very detailed knowledge of the application in question. Warning: the process can be lengthy and iterative.

Our strategy for hunting down memory leaks will be relatively straightforward:

  1. Identify symptoms
  2. Enable verbose garbage collection
  3. Enable profiling
  4. Analyze the trace

1. Identify Symptoms

As discussed, in many cases, the Java process will eventually throw an OOM runtime exception, a clear indicator that your memory resources have been exhausted. In this case, you need to distinguish between normal memory exhaustion and a leak. Analyze the OOM’s message and try to find the culprit based on the discussion provided above.

Oftentimes, if a Java application requests more storage than the runtime heap offers, it can be due to poor design. For instance, if an application creates multiple copies of an image or loads a file into an array, it will run out of storage when the image or file is very large. This is a normal resource exhaustion. The application is working as designed (although this design is clearly boneheaded).

But if an application steadily increases its memory utilization while processing the same kind of data, you might have a memory leak.

2. Enable Verbose Garbage Collection

One of the quickest ways to ascertain whether you indeed have a memory leak is to enable verbose garbage collection. Memory constraint problems can usually be identified by examining patterns in the verbose GC output.

Specifically, the -verbosegc argument allows you to generate a trace each time the garbage collection (GC) process begins. That is, as memory is being garbage-collected, summary reports are printed to standard error, giving you a sense of how your memory is being managed.

Here’s some typical output generated with the -verbosegc option:

verbose garbage collection output

Each block (or stanza) in this GC trace file is numbered in increasing order. To make sense of this trace, you should look at successive Allocation Failure stanzas and look for freed memory (bytes and percentage) decreasing over time while total memory (here, 19725304) is increasing. These are typical signs of memory depletion.

3. Enable Profiling

Different JVMs offer different ways to generate trace files to reflect heap activity, which typically include detailed information about the type and size of objects. This is called profiling the heap.

4. Analyze the Trace

This post focuses on the trace generated by Java VisualVM. Traces can come in different formats, as they can be generated by different Java memory leak detection tools, but the idea behind them is always the same: find a block of objects in the heap that should not be there, and determine if these objects accumulate instead of releasing. Of particular interest are transient objects that are known to be allocated each time a certain event is triggered in the Java application. The presence of many object instances that ought to exist only in small quantities generally indicates an application bug.

Finally, solving memory leaks requires you to review your code thoroughly. Learning about the type of object leaking can be very helpful and considerably speed up debugging.

How Does Garbage Collection Work in the JVM?

Before we start our analysis of an application with a memory leak issue, let’s first look at how garbage collection works in the JVM.

The JVM uses a form of garbage collector called a tracing collector, which essentially operates by pausing the world around it, marking all root objects (objects referenced directly by running threads), and following their references, marking each object it sees along the way.

Java implements something called a generational garbage collector based upon the generational hypothesis, which states that the majority of objects that are created are quickly discarded, and objects that are not quickly collected are likely to be around for a while.

Based on this assumption, Java partitions objects into multiple generations. Here’s a visual interpretation:

Java partitions objects into multiple generations
  • Young Generation – This is where objects start out. It has two sub-generations:
    • Eden Space – Objects start out here. Most objects are created and destroyed in the Eden Space. Here, the GC does Minor GCs, which are optimized garbage collections. When a Minor GC is performed, any references to objects that are still needed are migrated to one of the survivor spaces (S0 or S1).
    • Survivor Space (S0 and S1) – Objects that survive Eden end up here. There are two of these, and only one is in use at any given time (unless we have a serious memory leak). One is designated as empty, and the other as live, alternating with every GC cycle.
  • Tenured Generation – Also known as the old generation (old space in Fig. 2), this space holds older objects with longer lifetimes (moved over from the survivor spaces, if they live for long enough). When this space is filled up, the GC does a Full GC, which costs more in terms of performance. If this space grows without bound, the JVM will throw an OutOfMemoryError - Java heap space.
  • Permanent Generation – A third generation closely related to the tenured generation, the permanent generation is special because it holds data required by the virtual machine to describe objects that do not have an equivalence at the Java language level. For example, objects describing classes and methods are stored in the permanent generation.

Java is smart enough to apply different garbage collection methods to each generation. The young generation is handled using a tracing, copying collector called the Parallel New Collector. This collector stops the world, but because the young generation is generally small, the pause is short.

For more information about the JVM generations and how they work in more detail, see the Memory Management in the Java HotSpot™ Virtual Machine documentation.

Detecting a Memory Leak

To find memory leaks and eliminate them, you need the proper memory leak tools. It’s time to detect and remove such a leak using the Java VisualVM.

Remotely Profiling the Heap with Java VisualVM

VisualVM is a tool that provides a visual interface for viewing detailed information about Java technology-based applications while they are running.

With VisualVM, you can view data related to local applications and those running on remote hosts. You can also capture data about JVM software instances and save the data to your local system.

In order to benefit from all of Java VisualVM’s features, you should run the Java Platform, Standard Edition (Java SE) version 6 or above.


Enabling Remote Connection for the JVM

In a production environment, it’s often difficult to access the actual machine on which our code will be running. Luckily, we can profile our Java application remotely.

First, we need to grant ourselves JVM access on the target machine. To do so, create a file called jstatd.all.policy with the following content:

grant codebase "file:${java.home}/../lib/tools.jar" {
    permission java.security.AllPermission;
};

Once the file has been created, we need to enable remote connections to the target VM using the jstatd (Virtual Machine jstat Daemon) tool, as follows:

jstatd -p <PORT_NUMBER> -J-Djava.security.policy=<PATH_TO_POLICY_FILE>

For example:

jstatd -p 1234 -J-Djava.security.policy=D:\jstatd.all.policy

With jstatd running on the target machine, we are able to connect to it and remotely profile the application with the memory leak issue.

Connecting to a Remote Host

In the client machine, open a prompt and type jvisualvm to open the VisualVM tool.

Next, we must add a remote host in VisualVM. As the target JVM is enabled to allow remote connections from another machine with J2SE 6 or greater, we start the Java VisualVM tool and connect to the remote host. If the connection with the remote host was successful, we will see the Java applications that are running in the target JVM, as seen here:

running in the target jvm

To run a memory profiler on the application, we just double-click its name in the side panel.

Now that we’re all set up with a memory analyzer, let’s investigate an application with a memory leak issue, which we’ll call MemLeak.

MemLeak

Of course, there are a number of ways to create memory leaks in Java. For simplicity we will define a class to be a key in a HashMap, but we will not define the equals() and hashCode() methods.

A HashMap is a hash table implementation of the Map interface, and as such it defines the basic concepts of key and value: each value is related to a unique key, so if the key for a given key-value pair is already present in the HashMap, its current value is replaced.

It’s mandatory that our key class provides a correct implementation of the equals() and hashCode() methods. Without them, there is no guarantee that two logically equal keys will be treated as the same key.

By not defining the equals() and hashCode() methods, we add what is logically the same key to the HashMap over and over; instead of replacing the existing entry as it should, the HashMap grows continuously, failing to identify these identical keys, until it eventually throws an OutOfMemoryError.

Here’s the MemLeak class:

package com.post.memory.leak;

import java.util.Map;

public class MemLeak {
    public final String key;

    public MemLeak(String key) {
        this.key = key;
    }

    public static void main(String args[]) {
        try {
            Map map = System.getProperties();
            for (;;) {
                map.put(new MemLeak("key"), "value");
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Note: the memory leak is not due to the infinite loop in the main method: the infinite loop can lead to resource exhaustion, but not a memory leak. If we had properly implemented the equals() and hashCode() methods, the code would run fine even with the infinite loop, as we would only have one element inside the HashMap.

(For those interested, here are some alternative means of (intentionally) generating leaks.)

Using Java VisualVM

With Java VisualVM, we can monitor the Java heap and identify whether its behavior is indicative of a memory leak.

Here’s a graphical representation of MemLeak’s Java heap just after initialization (recall our discussion of the various generations):

monitor memory leaks using java visualvm

After just 30 seconds, the Old Generation is almost full, indicating that, even with a Full GC, the Old Generation is ever-growing, a clear sign of a memory leak.

One means of detecting the cause of this leak is shown in the following image (click to zoom), generated using Java VisualVM with a heap dump. Here, we see that Hashtable$Entry objects account for 50% of the heap, while the second line points us to the MemLeak class. Thus, the memory leak is caused by a hash table used within the MemLeak class.

hash table memory leak

Finally, observe the Java Heap just after our OutOfMemoryError in which the Young and Old generations are completely full.

outofmemoryerror

Conclusion

Memory leaks are among the most difficult Java application problems to resolve, as the symptoms are varied and difficult to reproduce. Here, we’ve outlined a step-by-step approach to discovering memory leaks and identifying their sources. But above all, read your error messages closely and pay attention to your stack traces—not all leaks are as simple as they appear.

This article is originally published at Toptal