Protecting your CouchDB Views

If you work with a SQL database or another RDBMS you most likely have your schema backed up somewhere under source control.  Maybe it’s a bunch of SQL scripts, maybe it’s the classes from which you generated your Entity Framework schema, but you almost certainly have some way of restoring your DB schema into a new database (at least I hope that you do!).

But what about CouchDB?

CouchDB, as anyone who has read the first sentence of a beginner’s guide will know, is a non-relational database and so it does not have a schema.  All of the data is stored as arbitrary JSON documents which can (and do) contain data in a wide range of formats.

The problem is that whilst there is no schema to “restore” into a new database, there is another very important construct: views.

CouchDB Views

Views within CouchDB define how you query the data.  Sure, you can always fall back to basic ID-lookup to retrieve documents, but as soon as you want to do any form of complicated (i.e. useful) querying then you will most likely need to create a view.

Each view comprises 2 JavaScript functions: a map function and an optional reduce function.  I don’t want to go into a lot of detail on the map-reduce algorithm or how CouchDB views work under the covers (there are plenty of other resources out there) but the important thing here is that you have to write some code that will play a very significant role in how your application behaves and that should be in source control somewhere!

Storing Views in Source Control

In order to put our view code under source control we first need to get it into a format that can be saved to disk.  In CouchDB, views are stored in design documents and the design documents are stored as JSON, so we can get a serialized copy of the view definitions by just GETting the design document from couch:

curl http://localhost:5984/databaseName/_design/designDocumentName

Pass the output through pretty-print and you will see the contents of the design document in a JSON structure:

{
   "_id": "_design/designDocumentName",
   "_rev": "1-47b20721ccd032b984d3d46f61fa94a8",
   "views": {
       "viewName": {
           "map": "function (doc) {\r\n\t\t\t\tif (doc.value1 === 1) {\r\n\t\t\t\t\temit(\"one\", null);\r\n\t\t\t\t} else {\r\n\t\t\t\t\temit(\"other\", {\r\n\t\t\t\t\t\tother: doc.value1\r\n\t\t\t\t\t});\r\n\t\t\t\t}\r\n\t\t\t}"
        }
   },
   "filters": {
       "filterName": "function () {}"
   }
}

This is, at least, a serialized representation of the source for our view, and there are definitely some advantages to using this approach.  On the other hand, there are quite a few things wrong with using this structure in source control:

Unnecessary Data
The purpose of this exercise is to make sure that the view code is safely recoverable; whilst there is debatably some use in storing the ID, the revision (_rev) field refers to the state of the database and may vary between installations and shouldn’t be needed.

Functions as Strings
The biggest problem with this approach is that the map, reduce and filter functions are stored as strings.  You may be able to put up with this in simple examples, but as soon as they contain any complexity (or indentation, as seen above) they become completely unreadable.  Tabs and newlines are all concatenated into one huge several-hundred-character string, all stored on one line.  Whilst this is not a technical issue (you could still use these to restore the views) it makes any kind of change tracking impossible to understand – every change is on the same line!

As well as the readability issues we also lose the ability to perform any kind of analysis on the view code.  Whether that is static analysis (such as jsLint), unit testing or some-other-thing, we cannot run any of them against a string.
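If you do decide to stay with the raw dump, the first of these problems at least is easy to address. The sketch below removes the revision before the document is written to disk; stripVolatileFields is a hypothetical helper name and the sample document is a shortened version of the dump above. It does nothing for the second problem: the functions are still strings.

```javascript
// Hypothetical helper: drop the field that only describes the state of
// one particular database, and keep everything else untouched.
function stripVolatileFields(designDoc) {
  const { _rev, ...rest } = designDoc;
  return rest;
}

// A shortened copy of the dumped design document
const dumped = {
  _id: "_design/designDocumentName",
  _rev: "1-47b20721ccd032b984d3d46f61fa94a8",
  views: {
    viewName: { map: "function (doc) { emit(doc._id, null); }" }
  }
};

const cleaned = stripVolatileFields(dumped);

// Pretty-print the result so that it is ready to be saved to a file
console.log(JSON.stringify(cleaned, null, 3));
```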

An Alternative Format

Instead of taking a dump of the design documents directly from CouchDB, I would recommend using an alternative format geared towards readability and testability.  You could be pretty creative in exactly how you wanted to lay this out (one file per design document, one file per view…) but I have found that the structure below seems to work quite well:

exports.designDocumentName = {
	views: {
		viewName: {
			map: function (doc) {
				//some obviously-fake logic for demo purposes
				if (doc.value1 === 1) {
					emit("one", null);
				} else {
					emit("other", {
						other: doc.value1
					});
				}
			}
		}
	},
	filters: {
		filterName: function () { }
	}
};

exports.secondDesignDocument = {
	//...
};

This has several advantages over the original format:

  • It is much easier to read!  You get syntax highlighting, proper indentation and the other wonderful features of your favourite code editor
  • There is no redundant information stored
  • JSLint/JSHint can easily be configured to validate the functions
  • By using the CommonJS exports object, the code is available to unit tests and other utilities (more on that below)
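To make the unit testing point concrete, here is a minimal sketch of how a map function in this format might be exercised outside CouchDB. The map function is copied from the definitions above so that the example stands alone; in a real project it would come from a require of the definitions file. runMap is a hypothetical helper, and it assumes that stubbing a global emit is an acceptable stand-in for the one CouchDB provides.

```javascript
// The view from the definitions above, repeated here so that the
// sketch is self-contained.
const viewName = {
  map: function (doc) {
    if (doc.value1 === 1) {
      emit("one", null);
    } else {
      emit("other", { other: doc.value1 });
    }
  }
};

// Hypothetical test helper: run a map function against one document
// and collect everything that it emits.
function runMap(map, doc) {
  const emitted = [];
  const previous = globalThis.emit;
  // CouchDB supplies emit as a global, so the test supplies a stub
  globalThis.emit = (key, value) => emitted.push({ key, value });
  try {
    map(doc);
  } finally {
    // Put back whatever was there before, even if the map throws
    globalThis.emit = previous;
  }
  return emitted;
}

const forOne = runMap(viewName.map, { value1: 1 });
const forOther = runMap(viewName.map, { value1: 7 });

console.log(forOne);
console.log(forOther);
```

The same helper can be called from whichever test framework you prefer; nothing here depends on a particular one.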

There is one significant disadvantage though: because I have pulled this structure out of thin air, CouchDB has no way of understanding it.  This means that whilst my view code is safe and sound under source control I have no way of restoring it.  At least with the original document-dump approach I could manually copy/paste the contents of each design document into the database!

So how can we deal with that?

Restoring Views

As I mentioned above, one of the advantages of attaching design documents as objects to the CommonJS exports object is that they can be exposed to Node utilities very easily.  To demonstrate this I have created a simple tool that is able to create or update design documents from a file such as the one above in a single command: view-builder.

You can see the source for the command on GitHub or you can install it using npm.

npm install -g view-builder

After installation you can run the tool like this:

view-builder --url http://localhost:5984/databasename --defs ./view-definitions.js

This will go through the definitions and for each of the design documents…

  1. Download the latest version of the design document from the server
  2. Create a new design document if none already exists
  3. Compare each view and filter to identify any changes
  4. If changes are present, update the version on the server

The comparison is an important step in this workflow – updating a design document will cause CouchDB to rebuild all of the views within it; if you have a lot of data then this can be a very slow process!
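The workflow above can be sketched in a few functions. This is an illustration written for this post and is not the source of view-builder: serialize, hasChanges and syncDesignDocument are hypothetical names, the HTTP client is passed in as a fetch-compatible function (the global fetch in recent versions of Node is assumed as the default), and only the views and filters sections are compared.

```javascript
// Turn every function in a definition into its source text, which is
// the form in which CouchDB stores it inside a design document.
function serialize(value) {
  if (typeof value === "function") {
    return value.toString();
  }
  if (value && typeof value === "object") {
    const out = {};
    for (const key of Object.keys(value)) {
      out[key] = serialize(value[key]);
    }
    return out;
  }
  return value;
}

// Rewrite nested objects as sorted key/value pairs so that the
// comparison does not depend on the order of the keys.
function canonical(value) {
  if (value && typeof value === "object" && !Array.isArray(value)) {
    return Object.keys(value).sort().map((key) => [key, canonical(value[key])]);
  }
  return value;
}

// Compare only the sections being managed; _id and _rev are ignored.
function hasChanges(existing, desired) {
  const pick = (doc) =>
    JSON.stringify(canonical({ views: doc.views || {}, filters: doc.filters || {} }));
  return pick(existing) !== pick(desired);
}

async function syncDesignDocument(dbUrl, name, definition, fetchImpl = fetch) {
  const url = `${dbUrl}/_design/${name}`;
  const desired = serialize(definition);

  // Steps 1 and 2: download the current version; a 404 means that the
  // design document has to be created
  const response = await fetchImpl(url);
  if (!response.ok && response.status !== 404) {
    throw new Error(`GET ${url} failed: ${response.status}`);
  }
  const existing = response.status === 404 ? null : await response.json();

  // Step 3: skip the write (and the view rebuild) when nothing changed
  if (existing && !hasChanges(existing, desired)) {
    return "unchanged";
  }

  // Step 4: PUT the new version; spreading the existing document keeps
  // its _rev, which CouchDB requires when updating
  const body = { ...(existing || {}), ...desired, _id: `_design/${name}` };
  const put = await fetchImpl(url, {
    method: "PUT",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body)
  });
  if (!put.ok) {
    throw new Error(`PUT ${url} failed: ${put.status}`);
  }
  return existing ? "updated" : "created";
}
```

Calling syncDesignDocument once for each key exported by the definitions file would cover the whole file. Taking the HTTP client as a parameter is a deliberate choice here: it allows the comparison logic to be tested without a running CouchDB instance.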

Now we have a human-readable design document definition that can be source-controlled, unit tested and then automatically applied to any database to which we have access.  Not bad…

Other Approaches

Whilst this system works for me, I can’t imagine that I am the first person to have considered this problem.  How is everyone else protecting their views?  Any suggestions or improvements in the comments are always welcome!

Deserializing Interface Properties using Json.Net

The Problem

Let’s say that you have a simple class structure such as this one:

public class Thing
{
	public string Name { get; set; }
}

public class ThingContainer
{
	public Thing TheThing { get; set; }
}

Here we have a class ThingContainer that has a single reference property of type Thing, and Json.Net will do a great job of serializing and deserializing instances of ThingContainer without any extra help:

static void Main(string[] args)
{
	var container = new ThingContainer
	{
		TheThing = new Thing { Name = "something" }
	};
	var serialized = JsonConvert.SerializeObject(container, Formatting.Indented);
	Console.WriteLine(serialized);
	// {
	//   "TheThing": {
	//      "Name": "something"
	//   }
	// }
	var deserialized = JsonConvert.DeserializeObject<ThingContainer>(serialized);
	Console.WriteLine(deserialized.TheThing.Name);
	// "something"
}

Unfortunately the real world is rarely that simple, and today you are writing a good model so you can’t go about using concrete types. Instead, you want to specify your properties as interfaces:

public interface IThing
{
	string Name { get; set; }
}

public class Thing : IThing
{
	public string Name { get; set; }
}

public class ThingContainer
{
	//notice that the property is now of an interface type...
	public IThing TheThing { get; set; }
}

After making these changes the serialization will still work as before, but when we try to deserialize the model we get the following error:

Could not create an instance of type JsonSerialization.IThing. Type is an interface or abstract class and cannot be instantated

This means that the JSON deserializer has seen that there is a property of type IThing but doesn’t know what type of object it should create to populate it.

Enter JsonConverter

The solution to this is to explicitly tell the deserializer what type it should be instantiating, and we do this using an attribute – specifically the JsonConverterAttribute.

The JsonConverterAttribute is part of Json.Net, and allows you to specify a custom converter to handle serialization and deserialization of an object by extending JsonConverter.

public class ThingContainer
{
	[JsonConverter(typeof(/*CustomConverterType*/))]
	public IThing TheThing { get; set; }
}

In this case we are going to write a custom implementation of JsonConverter that will behave exactly as the non-attributed property would, but using a specific type.

The code below shows the shell of the converter class and the methods we need to override. Notice that we are specifying a generic type parameter TConcrete on the class – this will set the desired concrete type for the property when we actually use the attribute later.

public class ConcreteTypeConverter<TConcrete> : JsonConverter
{
	public override bool CanConvert(Type objectType)
	{
		//determine whether or not this converter can create an instance of
		//the specified object type
	}

	public override object ReadJson(JsonReader reader, Type objectType, object existingValue, JsonSerializer serializer)
	{
		//deserialize an object from the specified reader
	}

	public override void WriteJson(JsonWriter writer, object value, JsonSerializer serializer)
	{
		//serialize the object value
	}
}

The roles of the missing methods are fairly self-explanatory, and seeing as we’re feeling lazy today we’ll pick off the easy ones first.

CanConvert

What we should be doing in CanConvert is determining whether or not the converter can create values of the specified objectType. What I am actually going to do here is just return true (i.e. “yes, we can create anything”). This will get us through the example and leaves the real implementation as an exercise for the reader…

WriteJson

The WriteJson method is responsible for serializing the object, and (as I mentioned above) the serialization of interface properties already works as expected. We can therefore just use the default serialization to fulfil our needs:

public override void WriteJson(JsonWriter writer, object value, JsonSerializer serializer)
{
	//use the default serialization - it works fine
	serializer.Serialize(writer, value);
}

ReadJson

The ReadJson method is where it starts to get interesting, as this is actually what is causing us the problem.

We need to re-implement this method so that it both instantiates and populates an instance of our concrete type.

Thankfully, Json.Net already knows how to populate the object – and any sub-objects, for that matter – so all we really need to do is get it to use the correct concrete type. We can do this by explicitly calling the Deserialize<T> overload on the serializer:

public override object ReadJson(JsonReader reader, Type objectType, object existingValue, JsonSerializer serializer)
{
	//explicitly specify the concrete type we want to create
	//that was set as a generic parameter on this class
	return serializer.Deserialize<TConcrete>(reader);
}

Usage and Final Code

Now that we have our custom converter we just need to specify it on the model using the attribute…

public class ThingContainer
{
	[JsonConverter(typeof(ConcreteTypeConverter<Thing>))]
	public IThing TheThing { get; set; }
}

…and we can deserialize without any errors!

The final code for the converter is below. There are quite a few extensions that could be made to this, but as a quick-fix to a problem that I come across often this will do the job nicely.

using System;
using Newtonsoft.Json;

public class ConcreteTypeConverter<TConcrete> : JsonConverter
{
	public override bool CanConvert(Type objectType)
	{
		//assume we can convert to anything for now
		return true;
	}

	public override object ReadJson(JsonReader reader, Type objectType, object existingValue, JsonSerializer serializer)
	{
		//explicitly specify the concrete type we want to create
		return serializer.Deserialize<TConcrete>(reader);
	}

	public override void WriteJson(JsonWriter writer, object value, JsonSerializer serializer)
	{
		//use the default serialization - it works fine
		serializer.Serialize(writer, value);
	}
}