The Open/Closed principle
As an introduction, I'll quickly recap the definition of the open/closed principle, the O in SOLID. (If you don't know what the SOLID principles are, take a quick look at the Wikipedia article on SOLID. Learn them, use them, thank me later.)
Officially, the open/closed principle states that software entities should be open for extension, but closed for modification. The idea is that code that is tried and tested shouldn't have to change just because you want to add some functionality to it.
For example: suppose you have an application that can write some data to file in XML, and some time later you also need to be able to write the same data in JSON format. If you adhered to the open/closed principle, you should be able to extend the program without having to change the existing code that writes your data to an XML file. You could, for example, write some sort of mapper that maps your data to JSON, or maybe even just a JSON writer of some kind.
If you didn't adhere to the open/closed principle, you'll probably need to modify your existing exporter with a bunch of if/else statements, and this could create a waterfall of changes or, worse, introduce bugs in code that previously worked perfectly. I'm sure your client won't be very happy when their XML files don't get created anymore just because they can now export to JSON.
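To make that concrete, here is a minimal sketch of what the export side of such an application might look like. The names IDataWriter, XmlDataWriter, JsonDataWriter and ReportData are made up for this illustration; they are not part of the assignment below.

public interface IDataWriter
{
    // ReportData is a placeholder for whatever the application exports.
    void Write(ReportData data, Stream output);
}

public class XmlDataWriter : IDataWriter
{
    public void Write(ReportData data, Stream output)
    {
        new XmlSerializer(typeof(ReportData)).Serialize(output, data);
    }
}

// Adding JSON support means adding this class; the tried and tested XmlDataWriter is never touched.
public class JsonDataWriter : IDataWriter
{
    public void Write(ReportData data, Stream output)
    {
        using (var writer = new StreamWriter(output))
            writer.Write(JsonConvert.SerializeObject(data));
    }
}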
The assignment
I've come up with an assignment that perfectly shows the need for this principle: a generic dispatcher. The client is going to send us data through a predefined interface. They have two types of data: Customers and Articles. They also use two different file formats: XML and JSON.
We can easily detect the data type and the format because the client told us that:
- Files containing an article will always start with "article"
- Files containing a customer will always start with "customer"
- Files containing XML will always have the ".xml" extension
- Files containing JSON will always have the ".json" extension
Our assignment is to parse each file, convert it to a domain object and then schedule a job that will do the actual processing. Our implementation will have an interface injected that we have to use to schedule the different jobs. Each type of data, Customer and Article, has a separate type of job that can process it. Each job works on a class derived from the JobData base class that contains the actual data to be processed.
So in summary:
- Article data should be passed to an IProcessArticleJob implementation together with a ProcessArticleJobData instance that contains the actual article
- Customer data should be passed to an IProcessCustomerJob implementation together with a ProcessCustomerJobData instance that contains the actual customer.
To simplify this, the developer responsible for writing the actual jobs has created a utility interface for us: IJobScheduler. It has a single method: void Schedule(Type jobType, JobData data).
So when we're dispatching a parsed file, we call the Schedule method and pass it the type of job we want to schedule, together with the corresponding JobData instance containing the actual data.
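Throughout the rest of this post I'll assume the following contracts and job data classes. They follow directly from the description above, but the exact definitions live with the client and the jobs developer:

public interface IFileProcessor
{
    void Process(string filename);
}

public interface IJobScheduler
{
    void Schedule(Type jobType, JobData data);
}

public abstract class JobData { }

public class ProcessArticleJobData : JobData
{
    public Article Article { get; set; }
}

public class ProcessCustomerJobData : JobData
{
    public Customer Customer { get; set; }
}

Article and Customer are simply the client's domain classes.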
A very simple, straightforward implementation of these specs could look like this:
public class DefaultFileProcessor : IFileProcessor
{
    private readonly IJobScheduler _jobScheduler;

    public DefaultFileProcessor(IJobScheduler jobScheduler)
    {
        _jobScheduler = jobScheduler;
    }

    public void Process(string filename)
    {
        // Dispose of the reader when we're done with the file.
        using (var contents = new StreamReader(filename))
        {
            object parsed = null;
            if (Path.GetFileNameWithoutExtension(filename) == "article")
            {
                switch (Path.GetExtension(filename))
                {
                    case ".xml":
                        var xmlSerializer = new XmlSerializer(typeof(Article));
                        parsed = (Article)xmlSerializer.Deserialize(contents);
                        break;
                    case ".json":
                        var contentsAsString = contents.ReadToEnd();
                        parsed = JsonConvert.DeserializeObject<Article>(contentsAsString);
                        break;
                    default:
                        throw new ArgumentException("Cannot parse file");
                }
                var jobData = new ProcessArticleJobData { Article = (Article)parsed };
                var jobType = typeof(IProcessArticleJob);
                _jobScheduler.Schedule(jobType, jobData);
            }
            else if (Path.GetFileNameWithoutExtension(filename) == "customer")
            {
                switch (Path.GetExtension(filename))
                {
                    case ".xml":
                        var xmlSerializer = new XmlSerializer(typeof(Customer));
                        parsed = (Customer)xmlSerializer.Deserialize(contents);
                        break;
                    case ".json":
                        var contentsAsString = contents.ReadToEnd();
                        parsed = JsonConvert.DeserializeObject<Customer>(contentsAsString);
                        break;
                    default:
                        throw new ArgumentException("Cannot parse file");
                }
                var jobData = new ProcessCustomerJobData { Customer = (Customer)parsed };
                var jobType = typeof(IProcessCustomerJob);
                _jobScheduler.Schedule(jobType, jobData);
            }
            else
            {
                throw new ArgumentException("Cannot parse file");
            }
        }
    }
}
Technically, there is nothing wrong with this implementation. It does what it has to do! But it is very tightly coupled, and it will quickly evolve into a monolithic monster as soon as some of the specs change (and the specs always change, right?).
Let's say the client sends us updated specs, and suddenly, they also want us to process quotes, orders and invoices. Additionally, we also have to support the CSV file format. How would that impact our code?
- We would have to add three more if-else blocks: one for quotes, one for orders and one for invoices.
- Each of the if-else blocks would have to contain a switch with three cases: XML, JSON and CSV
- And worst of all: we would have to change existing code. Code that is tried and tested and that is not impacted by the new spec whatsoever would now be at risk of breaking because of our change. If we get something wrong, processing article.xml files might suddenly stop working, just because we had to add support for order.csv files. That doesn't make us look like very talented developers, does it?
Let's try and refactor the existing code, and see if we can avoid all of this.
Refactoring to adhere to the Open/Closed principle
If we analyse the existing code, we can see that it actually has four different responsibilities before a job can be scheduled (so no, it doesn't really adhere to the S in SOLID either):
- Determine how the file should be parsed
- Parse the file into a domain object
- Create the correct JobData with that domain object
- Determine which job should be scheduled
So step one would be to isolate these four. Let's introduce four new abstractions:
- IParserResolver to determine how the file should be parsed
- IParser to do the actual parsing
- IJobDataFactory to create the correct JobData
- IJobTypeResolver to determine which job should be scheduled.
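Inferring their signatures from how they will be used below, these four interfaces can stay very small:

public interface IParserResolver
{
    IParser Resolve(string filename);
}

public interface IParser
{
    object Parse(TextReader contents);
}

public interface IJobDataFactory
{
    JobData CreateJobData(object parsed);
}

public interface IJobTypeResolver
{
    Type GetJobTypeForJobData(JobData jobData);
}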
Using these four abstractions, we could reduce the Process method to this:
public class SolidFileProcessor : IFileProcessor
{
    private readonly IJobScheduler _jobScheduler;
    private readonly IParserResolver _parserResolver;
    private readonly IJobDataFactory _jobDataFactory;
    private readonly IJobTypeResolver _jobTypeResolver;

    public SolidFileProcessor(IJobScheduler jobScheduler, IParserResolver parserResolver, IJobDataFactory jobDataFactory, IJobTypeResolver jobTypeResolver)
    {
        _jobScheduler = jobScheduler;
        _parserResolver = parserResolver;
        _jobDataFactory = jobDataFactory;
        _jobTypeResolver = jobTypeResolver;
    }

    public void Process(string filename)
    {
        using (var contents = new StreamReader(filename))
        {
            var parser = _parserResolver.Resolve(filename);
            var parsed = parser.Parse(contents);
            var jobData = _jobDataFactory.CreateJobData(parsed);
            var jobType = _jobTypeResolver.GetJobTypeForJobData(jobData);
            _jobScheduler.Schedule(jobType, jobData);
        }
    }
}
Now we can unit test this method and make sure it works.
Whenever new types of data or new file formats are added to the spec, no changes will be needed to this method! It is now closed for modification, yet it remains open for extension by providing different implementations of the abstractions we just introduced.
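As a sketch of what such a unit test could look like (I'm assuming xUnit and Moq here; any test framework and mocking library will do):

public class SolidFileProcessorTests
{
    [Fact]
    public void Process_SchedulesResolvedJobTypeWithCreatedJobData()
    {
        // Arrange: a real file on disk plus stubs for all four collaborators.
        var filename = Path.Combine(Path.GetTempPath(), "article.xml");
        File.WriteAllText(filename, "<Article />");

        var parsedArticle = new Article();
        var jobData = new ProcessArticleJobData { Article = parsedArticle };

        var parser = new Mock<IParser>();
        parser.Setup(p => p.Parse(It.IsAny<TextReader>())).Returns(parsedArticle);

        var parserResolver = new Mock<IParserResolver>();
        parserResolver.Setup(r => r.Resolve(filename)).Returns(parser.Object);

        var jobDataFactory = new Mock<IJobDataFactory>();
        jobDataFactory.Setup(f => f.CreateJobData(parsedArticle)).Returns(jobData);

        var jobTypeResolver = new Mock<IJobTypeResolver>();
        jobTypeResolver.Setup(r => r.GetJobTypeForJobData(jobData)).Returns(typeof(IProcessArticleJob));

        var jobScheduler = new Mock<IJobScheduler>();

        var processor = new SolidFileProcessor(jobScheduler.Object, parserResolver.Object, jobDataFactory.Object, jobTypeResolver.Object);

        // Act
        processor.Process(filename);

        // Assert: the resolved job type and the created job data end up at the scheduler.
        jobScheduler.Verify(s => s.Schedule(typeof(IProcessArticleJob), jobData), Times.Once());
    }
}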
Of course, we still need implementations of our four abstractions for everything to remain functional:
public class DefaultParserResolver : IParserResolver
{
    public IParser Resolve(string filename)
    {
        if (Path.GetFileNameWithoutExtension(filename) == "article")
        {
            switch (Path.GetExtension(filename))
            {
                case ".xml":
                    return new XmlArticleParser();
                case ".json":
                    return new JsonArticleParser();
                default:
                    throw new ArgumentException("Cannot parse file");
            }
        }
        else if (Path.GetFileNameWithoutExtension(filename) == "customer")
        {
            switch (Path.GetExtension(filename))
            {
                case ".xml":
                    return new XmlCustomerParser();
                case ".json":
                    return new JsonCustomerParser();
                default:
                    throw new ArgumentException("Cannot parse file");
            }
        }
        else
        {
            throw new ArgumentException("Cannot parse file");
        }
    }
}
public class JsonArticleParser : IParser
{
    public object Parse(TextReader contents)
    {
        var contentsAsString = contents.ReadToEnd();
        var parsed = JsonConvert.DeserializeObject<Article>(contentsAsString);
        if (parsed == null)
            throw new ArgumentException("Cannot parse file");
        return parsed;
    }
}

public class JsonCustomerParser : IParser
{
    public object Parse(TextReader contents)
    {
        var contentsAsString = contents.ReadToEnd();
        var parsed = JsonConvert.DeserializeObject<Customer>(contentsAsString);
        if (parsed == null)
            throw new ArgumentException("Cannot parse file");
        return parsed;
    }
}

public class XmlArticleParser : IParser
{
    public object Parse(TextReader contents)
    {
        var xmlSerializer = new XmlSerializer(typeof(Article));
        var parsed = (Article)xmlSerializer.Deserialize(contents);
        if (parsed == null)
            throw new ArgumentException("Could not parse file");
        return parsed;
    }
}

public class XmlCustomerParser : IParser
{
    public object Parse(TextReader contents)
    {
        var xmlSerializer = new XmlSerializer(typeof(Customer));
        var parsed = (Customer)xmlSerializer.Deserialize(contents);
        if (parsed == null)
            throw new ArgumentException("Could not parse file");
        return parsed;
    }
}
public class DefaultJobDataFactory : IJobDataFactory
{
    public JobData CreateJobData(object parsed)
    {
        if (parsed.GetType() == typeof(Article))
            return new ProcessArticleJobData { Article = (Article)parsed };
        if (parsed.GetType() == typeof(Customer))
            return new ProcessCustomerJobData { Customer = (Customer)parsed };
        throw new ArgumentException("Failed to create jobdata");
    }
}

public class DefaultJobTypeResolver : IJobTypeResolver
{
    public Type GetJobTypeForJobData(JobData jobData)
    {
        if (jobData is ProcessArticleJobData)
            return typeof(IProcessArticleJob);
        if (jobData is ProcessCustomerJobData)
            return typeof(IProcessCustomerJob);
        throw new ArgumentException("Cannot determine jobtype");
    }
}
The parsing of the files is completely Open/Closed now! New data types or new file formats simply mean adding new parsers. We will never have to change an existing parser unless the specification of exactly that data type and exactly that file format changes, in which case it is only normal that we have to make modifications.
So our code has definitely already improved, but we're not out of the woods yet. There are still some issues.
The other three abstractions still do not adhere to the Open/Closed principle. If we are to implement the new specs, we will still have to make modifications to the existing code of DefaultParserResolver, DefaultJobDataFactory and DefaultJobTypeResolver.
Let's see how we can refactor this further. We'll start with the ParserResolver.
To fix this, we need to move the knowledge of which parser can parse which file out of the resolver. We want to make sure that the way this is determined for a given data type and file format only changes when the specification of that file changes, so it's quite clear that this knowledge belongs with the parsers. We can perfectly well allow a parser to know not only how it can parse a file, but also whether it can parse a certain file. We can thus achieve our goal by extending the IParser interface with a CanParse method. The DefaultParserResolver will then need a registry of all available parsers, which it scans to retrieve the right one.
If we also adhere to the D in SOLID, then we will normally have some sort of IoC container (like Castle.Windsor or Autofac) that can inject all available parsers for us.
The reworked parser resolver would then look like this (the reworked parsers follow right after):
public class DefaultParserResolver : IParserResolver
{
    private readonly IList<IParser> _parsers;

    public DefaultParserResolver(IList<IParser> parsers)
    {
        _parsers = parsers;
    }

    public IParser Resolve(string filename)
    {
        var parser = _parsers.FirstOrDefault(x => x.CanParse(filename));
        if (parser == null)
            throw new ArgumentException("Cannot resolve parser for file");
        return parser;
    }
}
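The extended IParser interface and one of the reworked parsers could look like this; the other three parsers follow the same pattern, each checking its own filename prefix and extension:

public interface IParser
{
    bool CanParse(string filename);
    object Parse(TextReader contents);
}

public class JsonArticleParser : IParser
{
    public bool CanParse(string filename)
    {
        // Per the client's convention: article data, JSON format.
        return Path.GetFileNameWithoutExtension(filename) == "article"
            && Path.GetExtension(filename) == ".json";
    }

    public object Parse(TextReader contents)
    {
        var contentsAsString = contents.ReadToEnd();
        var parsed = JsonConvert.DeserializeObject<Article>(contentsAsString);
        if (parsed == null)
            throw new ArgumentException("Cannot parse file");
        return parsed;
    }
}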
Next up is the DefaultJobDataFactory. We need to solve two problems here:
- We need to know which type of JobData to create and
- We need to actually create it.
It is immediately clear that these two requirements are coupled: an entity that knows how to create a specific JobData instance will also know the source data type that the JobData holds. So we introduce yet another abstraction, IJobDataBuilder, and store that logic in there. Our DefaultJobDataFactory will then work on a registry of IJobDataBuilders, similar to how the DefaultParserResolver holds a registry of IParsers:
public interface IJobDataBuilder
{
    Type GetSourceDataType();
    JobData CreateJobData(object sourceData);
}

public class ProcessArticleJobDataBuilder : IJobDataBuilder
{
    public Type GetSourceDataType()
    {
        return typeof(Article);
    }

    public JobData CreateJobData(object sourceData)
    {
        if (sourceData.GetType() != GetSourceDataType())
            throw new ArgumentException(string.Format("Cannot create JobData from object of type {0}", sourceData.GetType().Name));
        return new ProcessArticleJobData { Article = (Article)sourceData };
    }
}

public class ProcessCustomerJobDataBuilder : IJobDataBuilder
{
    public Type GetSourceDataType()
    {
        return typeof(Customer);
    }

    public JobData CreateJobData(object sourceData)
    {
        if (sourceData.GetType() != GetSourceDataType())
            throw new ArgumentException(string.Format("Cannot create JobData from object of type {0}", sourceData.GetType().Name));
        return new ProcessCustomerJobData { Customer = (Customer)sourceData };
    }
}
public class DefaultJobDataFactory : IJobDataFactory
{
    private readonly IList<IJobDataBuilder> _builders;

    public DefaultJobDataFactory(IList<IJobDataBuilder> builders)
    {
        _builders = builders;
    }

    public JobData CreateJobData(object parsed)
    {
        var builder = _builders.FirstOrDefault(x => x.GetSourceDataType() == parsed.GetType());
        if (builder == null)
            throw new ArgumentException("Failed to create jobdata");
        return builder.CreateJobData(parsed);
    }
}
Almost there, only the DefaultJobTypeResolver is left! By now you will probably see a pattern emerging: we need to add a layer of abstraction and delegate the knowledge of which job type can process which JobData to a separate entity. At first, it looks like we could take the exact same approach as with the parsers and just add a CanWorkWithJobData method to the Type class, but Type is a built-in .NET class, so we can't really modify that, can we?
There are two possible solutions to this problem:
We could have each job derive from an IJob interface and put a CanWorkWithJobData method on that interface. Each job implementation would then be responsible for indicating the type of JobData it can work with.
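A quick sketch of what that first option might look like (ProcessArticleJob stands in for the jobs developer's actual implementation, which we haven't seen):

public interface IJob
{
    bool CanWorkWithJobData(JobData jobData);
}

// The existing job interfaces would derive from IJob,
// and each job implementation would answer the question itself:
public interface IProcessArticleJob : IJob { }

public class ProcessArticleJob : IProcessArticleJob
{
    public bool CanWorkWithJobData(JobData jobData)
    {
        return jobData is ProcessArticleJobData;
    }
}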
Or, we could take the same approach as with the DefaultJobDataFactory and put this knowledge in a separate class, similar to the JobDataBuilders. But since we're providing metadata about a job, rather than actually building something, we'll name the interface IJobTypeMetaData for clarity.
public interface IJobTypeMetaData
{
    bool CanWorkWithJobData(JobData jobData);
    Type GetJobType();
}

public class ProcessArticleJobTypeMetaData : IJobTypeMetaData
{
    public bool CanWorkWithJobData(JobData jobData)
    {
        return jobData.GetType() == typeof(ProcessArticleJobData);
    }

    public Type GetJobType()
    {
        return typeof(IProcessArticleJob);
    }
}

public class ProcessCustomerJobTypeMetaData : IJobTypeMetaData
{
    public bool CanWorkWithJobData(JobData jobData)
    {
        return jobData.GetType() == typeof(ProcessCustomerJobData);
    }

    public Type GetJobType()
    {
        return typeof(IProcessCustomerJob);
    }
}
public class DefaultJobTypeResolver : IJobTypeResolver
{
    private readonly IList<IJobTypeMetaData> _jobs;

    public DefaultJobTypeResolver(IList<IJobTypeMetaData> jobs)
    {
        _jobs = jobs;
    }

    public Type GetJobTypeForJobData(JobData jobData)
    {
        var metaData = _jobs.FirstOrDefault(x => x.CanWorkWithJobData(jobData));
        if (metaData == null)
            throw new ArgumentException("Cannot determine jobtype");
        return metaData.GetJobType();
    }
}
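To wire all of this together, the composition root only has to register every implementation. With an IoC container (I'm using Autofac here as an example, since it injects collections like IList<IParser> out of the box) that could look roughly like this:

var builder = new ContainerBuilder();

// Every parser, builder and metadata class is registered against its interface;
// the container will hand them to the resolvers and the factory as IList<T>.
builder.RegisterType<XmlArticleParser>().As<IParser>();
builder.RegisterType<JsonArticleParser>().As<IParser>();
builder.RegisterType<XmlCustomerParser>().As<IParser>();
builder.RegisterType<JsonCustomerParser>().As<IParser>();

builder.RegisterType<ProcessArticleJobDataBuilder>().As<IJobDataBuilder>();
builder.RegisterType<ProcessCustomerJobDataBuilder>().As<IJobDataBuilder>();

builder.RegisterType<ProcessArticleJobTypeMetaData>().As<IJobTypeMetaData>();
builder.RegisterType<ProcessCustomerJobTypeMetaData>().As<IJobTypeMetaData>();

builder.RegisterType<DefaultParserResolver>().As<IParserResolver>();
builder.RegisterType<DefaultJobDataFactory>().As<IJobDataFactory>();
builder.RegisterType<DefaultJobTypeResolver>().As<IJobTypeResolver>();
builder.RegisterType<SolidFileProcessor>().As<IFileProcessor>();

// The IJobScheduler implementation is provided by the jobs developer, for example:
// builder.RegisterType<TheirJobScheduler>().As<IJobScheduler>();

var container = builder.Build();
var fileProcessor = container.Resolve<IFileProcessor>();

Supporting a new data type or file format then comes down to writing the new classes and adding a few registration lines; nothing else changes.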
Et voilà, each and every part of our solution now fully adheres to the Open/Closed principle.
Conclusion
Like everything, the solution we have now has some advantages and, unfortunately, also some disadvantages. The most obvious advantage is that our code can cope with changes in specifications much more robustly. A new file format? Just add a new parser implementation. A new data type? Just add a new parser, JobDataBuilder and JobTypeMetaData. That's all there is to it. Never will we have to touch, or even come near, any code that processes an article.xml file when we want to add support for an order.csv file. That gives us, as developers, a huge increase in peace of mind. We don't have to worry about breaking existing functionality, because we will always be adding new code in new classes instead of modifying existing ones. That is the heart of "open for extension, closed for modification".
Another advantage is that our code has become much more testable. Each of our classes is very short and has very limited responsibilities, so it will be easier to test all possible paths without needing a huge arrange part for each test.
Some of the disadvantages of this approach include a potential class explosion: a new data type means a new parser for every file format plus a JobDataBuilder and a JobTypeMetaData, and a new file format means a new parser for every existing data type. We now have two data types and two file formats, which results in eight classes (XmlCustomerParser, JsonCustomerParser, XmlArticleParser, JsonArticleParser, ProcessCustomerJobDataBuilder, ProcessArticleJobDataBuilder, ProcessCustomerJobTypeMetaData and ProcessArticleJobTypeMetaData). Our new spec contains five data types and three file formats, which will result in twenty-five classes (fifteen parsers, five builders and five metadata classes).
Another disadvantage is performance: we now do a lot of look-ups. The original monolithic implementation resolved everything in roughly O(1) (constant) time, whereas our new implementation scans its registries in O(n) (linear) time, with n depending on the number of data types and file formats.
We also lose some of the type safety the compiler would normally give us. That last one can actually easily be regained by using generics (thanks to my colleague Wim Vergouwe for pointing that out ;-)), but I'll save that approach for another post.