Protecting your CouchDB Views

If you work with a SQL or other RDBMS database you most likely have your schema backed up somewhere under source control.  Maybe it’s a bunch of SQL scripts, maybe it’s the classes from which you generated your Entity Framework schema, but you almost certainly have some way of restoring your DB schema into a new database (at least I hope that you do!).

couchdb

But what about CouchDB?

CouchDB, as anyone who has read the first sentence of a beginners guide will know, is a Non-Relational Database and so it does not have a schema.  All of the data is stored as arbitrary JSON documents which can (and do) contain data in a wide range of formats.

The problem is that whilst there is no schema to “restore” into a new database, there is another very important construct: views.

CouchDB Views

Views within CouchDB define how you query the data.  Sure, you can always fall back to basic ID-lookup to retrieve documents, but as soon as you want to do any form of complicated (i.e. useful) querying then you will mostly likely need to create a view.

Each view comprises 2 JavaScript functions: a map function and an optional reduce function.  I don’t want to go into a lot of detail on the map-reduce algorithm or how CouchDB views work under the covers (there are plenty of other resources out there) but the important thing here is that you have to write some code that will play a very significant role in how your application behaves and that should be in source control somewhere!

Storing Views in Source Control

In order to put our view code under source control we first need to get it into a format that can be saved to disk.  In CouchDB, views are stored in design documents and the design documents are stored as JSON, so we can get a serialized copy of the view definitions by just GETting the design document from couch:

curl http://localhost:5984/databaseName/_design/designDocumentName

Pass the output through pretty-print and you will see the contents of the design document in a JSON structure:

{
   "_id": "_design/designDocumentName",
   "_rev": "1-47b20721ccd032b984d3d46f61fa94a8",
   "views": {
       "viewName": {
           "map": "function (doc) {\r\n\t\t\t\tif (doc.value1 === 1) {\r\n\t\t\t\t\temit(\"one\", null);\r\n\t\t\t\t} else {\r\n\t\t\t\t\temit(\"other\", {\r\n\t\t\t\t\t\tother: doc.value1\r\n\t\t\t\t\t});\r\n\t\t\t\t}\r\n\t\t\t}"
        }
   },
   "filters": {
       "filterName": "function () {}"
   }
}

This is, at least, a serialized representation of the source for our view, and there are definitely some advantages to using this approach.  On the other hand, there are quite a few things wrong with using this structure in source control:

Unnecessary Data
The purpose of this exercise is to make sure that the view code is safely recoverable; whilst there is debatably some use in storing the ID, the revision (_rev) field refers to the state of the database and may vary between installations and shouldn’t be needed.

Functions as Strings
The biggest problem with this approach is that the map, reduce and filter functions are stored as strings.  You may be able to put up with this in simple examples, but as soon as they contain any complexity (or indentation, as seen above) they become completely unreadable.  Tabs and newlines are all concatenated into one huge several-hundred-character string, all stored on one line.  Whilst this is not a technical issue (you could still use these to restore the views) it makes any kind of change tracking impossible to understand – every change is on the same line!

As well as the readability issues we also lose the ability to perform any kind of analysis on the view code.  Whether that is static analysis (such as jsLint), unit testing or some-other-thing, we cannot run any of them against a string.

An Alternative Format

Instead of taking a dump of the design documents directly from CouchDB, I would recommend using an alternative format geared towards readability and testability.  You could be pretty creative in exactly how you wanted to lay this out (one file per design document, one file per view…) but I have found that the structure below seems to work quite well:

exports.designDocumentName = {
	views: {
		viewName: {
			map: function (doc) {
				//some obviously-fake logic for demo purposes
				if (doc.value1 === 1) {
					emit("one", null);
				} else {
					emit("other", {
						other: doc.value1
					});
				}
			}
		}
	},
	filters: {
		filterName: function () { }
	}
};

exports.secondDesignDocument = {
	//...
};

This has several advantages over the original format:

  • It is much easier to read!  You get syntax highlighting, proper indentation and the other wonderful features of your favourite code editor
  • There is no redundant information stored
  • jsLint/jsHint can easily be configured to validate the functions
  • By using the AMD exports object, the code is available to unit tests and other utilities (more on that below)

There is one significant disadvantage though: because I have pulled this structure out of thin air, CouchDB has no way of understanding it.  This means that whilst my view code is safe and sound under source control I have no way of restoring it.  At least with the original document-dump approach I could manually copy/paste the contents of each design document into the database!

So how can we deal with that?

Restoring Views

As I mentioned above, one of the advantages of attaching design documents as objects to the AMD exports object is that they can be exposed to node utilities very easily.  To demonstrate this I have created a simple tool that is able to create or update design documents from a file such as the one above in a single command: view-builder.

You can see the source for the command on GitHub or you can install it using NPM.

npm install -g view-builder

After installation you can run the tool like this:

view-builder --url http://localhost:5984/databasename  --defs ./view-definitions.js

This will go through the definitions and for each of the design documents…

  1. Download the latest version of the design document from the server
  2. Create a new design document if none already exists
  3. Compare each view and filter to identify any changes
  4. If changes are present, update the version on the server

The comparison is an important step in this workflow – updating a design document will cause CouchDB to rebuild all of the views within it; if you have a lot of data then this can be a very slow process!

Now we have a human-readable design document definition that can be source-controlled, unit tested and then automatically applied to any database to which we have access.  Not bad…

Other Approaches

Whilst this system works for me, I can’t imagine that I am the first person to have considered this problem.  How is everyone else protecting their views?  Any suggestions or improvements in the comments are always welcome!

Advertisements