Friday, January 29, 2016

The D in SOLID or how we thought we refactored to FE and BE servers in a single day

Warning

Although this post does touch on the SOLID principles and on dependency inversion in particular, it is in fact not really a post on software design and best practices, but rather a lessons learned kind of post. 

My goal here is to show that our job as software consultants, or even as software developers in general, is more than just knowing how to program. I'm not talking about general people skills or even basic project management skills. Those are no doubt traits that are good to have as a software developer, and I even believe that having them will make you a better developer, but ultimately those tasks could still be handed over to a non-technical person. Like I said: they are undoubtedly great skills for anyone, but strictly speaking they are not required of a software developer.

So what am I talking about then? Well, sometimes we need to provide information to others, usually our PM or a client, and that information is 100% non-technical. It has, in essence, nothing to do with software, but still: we are the only ones that can supply this information accurately! 
This type of information is almost never asked for explicitly and, to make matters worse, it is usually the type of information that, if you fail to provide it, may turn an otherwise healthy project into a budgeting and planning hell!

"So why doesn't anyone explicitly ask for that information then" I hear you think.
Well: because they don't know that they have to ask this 
It is our job as a developer to detect missing information in our PM or client's head and to fill that void. Seems hard? That's because it is hard but nevertheless it is an extremely important skill for every developer to have, especially if you're in the consulting business.

The main topic of this post is actually an example of where we failed to deliver this information.
Sit back, relax and learn (and laugh... a little... go ahead, I can take it)

The Problem At Hand

We had started out with the basics for a new project: setting up the environment, creating the solution, configuring MVC, you know, the works. We decided to go with a standard three-layer architecture.

  • Application layer was just the UI
  • Business Logic Layer contained our services, which implemented the business logic
  • Data Access Layer contained our repositories to handle database access.

After a couple of sprints, the client thought it would be a good idea to let us know that they're not really fond of single-tier applications. They insisted that we change the current deployment architecture to use separate front-end and back-end servers: a publicly accessible front-end server containing only the UI logic, and a back-end server housing all business logic and data access. That back-end server had to be completely shielded from outside access; only the front-end server was allowed to reach it.

The Proposed Solution

The original design

Luckily, we had been smart from the start. We had made interfaces for each and every logic class.
So the controller depended only on interfaces, the services depended only on interfaces, and so on.
This is what the simplified design looked like.



This is of course just an example for one of the services: managing installations. As you can see, the application layer is an MVC controller, InstallationController, which depends on an interface: IInstallationService. This is our business layer, implemented in DefaultInstallationService, which in turn depends on an interface for its data access: IInstallationRepository. Its implementation, DbInstallationRepository, is the concrete implementation of the data layer.
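
Since the diagram isn't reproduced here, here is a rough sketch in code of that dependency chain (the class and interface names come from the design above; the method names and the Installation type are just illustrative):

    // A rough sketch of the original three-layer design. Only the class and
    // interface names are from the original design; the rest is illustrative.
    public interface IInstallationService
    {
        IList<Installation> GetInstallations();
    }

    public interface IInstallationRepository
    {
        IList<Installation> GetAll();
    }

    // Application layer: an MVC controller that only knows the service interface.
    public class InstallationController : Controller
    {
        private readonly IInstallationService _installationService;

        public InstallationController(IInstallationService installationService)
        {
            _installationService = installationService;
        }

        public ActionResult Index()
        {
            return View(_installationService.GetInstallations());
        }
    }

    // Business layer: implements the service and only knows the repository interface.
    public class DefaultInstallationService : IInstallationService
    {
        private readonly IInstallationRepository _installationRepository;

        public DefaultInstallationService(IInstallationRepository installationRepository)
        {
            _installationRepository = installationRepository;
        }

        public IList<Installation> GetInstallations()
        {
            return _installationRepository.GetAll();
        }
    }

    // Data layer: the concrete, database-backed repository.
    public class DbInstallationRepository : IInstallationRepository
    {
        public IList<Installation> GetAll()
        {
            // query the database and map the rows to Installation objects
            throw new NotImplementedException();
        }
    }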

The reworked design

Here's how we designed the solution: we created a new project for the back-end, with the same architecture as the original project but without the application layer. So there were no more MVC controllers, and we had the services derive from ApiController so that our services, the business layer, would be accessible via WebApi. The data layer stayed the same.
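
In code, the back-end side would look roughly like this (a sketch; routing, hosting and dependency-injection details are omitted):

    // Back-end: the same business-layer service, but now derived from ApiController
    // so it is reachable over WebApi. The data layer underneath stays untouched.
    public class DefaultInstallationService : ApiController, IInstallationService
    {
        private readonly IInstallationRepository _installationRepository;

        public DefaultInstallationService(IInstallationRepository installationRepository)
        {
            _installationRepository = installationRepository;
        }

        [HttpGet]
        public IList<Installation> GetInstallations()
        {
            return _installationRepository.GetAll();
        }
    }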


Next, we stripped everything from the original project except the application layer.
We then made new implementations of all service interfaces. This time they were empty boxes, only capable of calling a remote service over WebApi.
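
Such a proxy might look roughly like this (a sketch using HttpClient; the base address, the route and the synchronous .Result calls are just for illustration):

    // Front-end: an "empty box" implementation of the same service interface.
    // It contains no business logic; it only forwards the call to the back-end.
    public class WebApiInstallationService : IInstallationService
    {
        private readonly HttpClient _client = new HttpClient
        {
            BaseAddress = new Uri("http://backend-server/")
        };

        public IList<Installation> GetInstallations()
        {
            // What used to be an in-process call is now a network round trip:
            // slower, and it can fail, so the caller suddenly has more to worry about.
            var response = _client.GetAsync("api/installations").Result;
            response.EnsureSuccessStatusCode();
            return response.Content.ReadAsAsync<IList<Installation>>().Result;
        }
    }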


All we had to do was deploy both back-end and front-end, connect them up and TADAAAAA, done.
Cool, right? This is truly dependency inversion in action!

The estimates

We now had our design ready so the next question was: "How long will it take to make this change?" To which we happily replied: one day.... (three developers, so technically: three days)
Let that sink in for a while.

At this point I encourage you to take another look at the designs above and reread the text.
Do you think you could do this all by yourself in three days? Be honest!
We believed we could. You know what? I still believe I could! Really!

Where we went wrong

So where did we go wrong? Well, we kind of forgot to look ahead, because the change doesn't stop at implementing our new design. It also has huge consequences for each and every decision in the rest of the project, as well as for all estimates that were already made in the past. Why? Well, consider these everyday tasks:

Calling a function in the business layer

In a regular application we just call the function, done. But now there's a lot more to it, because your function call becomes a network call! It will take longer, so you should think about whether you really want or need to do it. It can also fail: you'll have to handle that too. Next up is versioning, of both your domain objects and your methods; otherwise you'll need to take your entire system down until both front-end and back-end are updated.
You see, performing a simple function call is suddenly subject to a lot of thinking and possible concerns.

Lazy loading of one-to-many relations

In a regular application, you just decide whether you want to postpone the loading of related objects until they are used or not. Now, suddenly, your lazy loading is multi-layered! You might want to lazily load relations in your back-end, but what about the front-end? Will you also lazy-load relations there? If so, you'll need to provide a separate function for it!

Caching

What will you cache and where? You'll have to think about lifetimes on multiple levels. Maybe you'll even need to invalidate the back-end cache from the front-end, but then again, maybe not. Point is: you'll have to spend time thinking about these kinds of things.

Read-modify-write in a single transaction

Distributed transactions over webservice calls and database access, that ought to be fun.

Conclusion

So there it is. This whole post serves not only as a source of information from the trenches,
but also as a warning. The SOLID principles are great for designing and writing good software,
and I would avidly evangelize them every day, but they are just a base, just the foundation of good software. If you use them every day, you will write better software and your software will be better prepared for crazy changes or other obstacles that may arise. But don't think for a minute that they are all you need to save your ass. Don't think that, because your design allows you to easily handle a huge change, just like ours here, you're out of the woods. Being able to adapt easily is just the beginning; then comes the real hard work of bearing the consequences of your change throughout the rest of the implementation. So when you estimate a design change, don't forget, like we did, to think about how that change will impact the rest of the project. Sometimes the best decision is not to make the change at all, even if your design allows it. Luckily for us, a more experienced colleague spotted the flaws in our estimates and made us aware of the consequences the change would have on the rest of the project.

Tuesday, December 15, 2015

The O in SOLID or how to properly design a dispatcher

The Open/Close principle

As an introduction I'll quickly recap the definition of the open/close principle, the O in SOLID. (if you don't know what the SOLID principles are, take a quick look at the wikipedia article on SOLID. Learn them, use them, thank me later).

Officially, the open/close principle states that software entities should be open for extension, but closed for modification. The idea here is that code that is tried and tested shouldn't have to change just because you want to add some functionality to it.

For example: suppose you have an application that can write some data to a file in xml, and some time later you also need to add the possibility to write the same data in json format. If you adhered to the open/close principle, you should be able to simply extend the program without having to change the existing code that writes your data to an xml file. You could for example write some sort of mapper that maps your data to json, or maybe even just a json writer.
If you didn't adhere to the open/close principle, well, then you'll probably need to modify your existing exporter with a bunch of if/else statements, and this could potentially create a waterfall of changes or, worse, introduce bugs in code that previously worked perfectly. I'm sure your client won't be very happy when their xml files don't get created anymore just because they can now export to json.
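
To make that concrete, a minimal sketch of the open/close-friendly variant could look like this (all names here are made up for the example):

    // Adding json support just means adding a new implementation;
    // the existing, tested xml exporter is never touched.
    public interface IDataExporter
    {
        void Export(ReportData data, string filename);
    }

    public class XmlDataExporter : IDataExporter
    {
        public void Export(ReportData data, string filename)
        {
            using (var writer = new StreamWriter(filename))
            {
                new XmlSerializer(typeof(ReportData)).Serialize(writer, data);
            }
        }
    }

    public class JsonDataExporter : IDataExporter
    {
        public void Export(ReportData data, string filename)
        {
            File.WriteAllText(filename, JsonConvert.SerializeObject(data));
        }
    }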


The assignment

I've come up with an assignment that perfectly shows the need for this principle: a generic dispatcher. The client is going to send us data through a predefined interface. They have two types of data: Customers and Articles. They also use two different file formats: xml and json.
We can easily detect the datatype and the format because the client told us that:
  • Files containing an article will always start with "article"
  • Files containing a customer will always start with "customer"
  • Files containing xml will always have the ".xml" extension
  • Files containing json will always have the ".json" extension
Our assignment is to parse each file, convert it to a domain object and then schedule a job
that will do the actual processing. Our implementation will have an interface injected which we
have to use to schedule the different jobs. Each type of data, Customer and Article, has a separate type of job that can process it. Each job works on a class derived from the JobData class, which contains the actual data that should be processed.
So in summary:
  • Article data should be passed to an IProcessArticleJob implementation together with a ProcessArticleJobData instance that contains the actual article
  • Customer data should be passed to an IProcessCustomerJob implementation together with a ProcessCustomerJobData instance that contains the actual customer.
To simplify this, the developer responsible for writing the actual jobs has created a utility interface for us: IJobScheduler. It has a single method: void Schedule(Type jobType, JobData data).
So when we're dispatching a parsed file, we call the Schedule method and pass it the type of job we want to schedule, and we also pass it the corresponding jobdata containing the actual data.
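
The spec only gives us the Schedule signature; the surrounding types probably look something like this (my assumption, just to make the examples below self-contained):

    public interface IJobScheduler
    {
        void Schedule(Type jobType, JobData data);
    }

    public interface IFileProcessor
    {
        void Process(string filename);
    }

    // Each job works on a JobData subclass that carries the actual data.
    public abstract class JobData { }

    public class ProcessArticleJobData : JobData
    {
        public Article Article { get; set; }
    }

    public class ProcessCustomerJobData : JobData
    {
        public Customer Customer { get; set; }
    }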

A very simple straightforward implementation of these specs could look like this:


    public class DefaultFileProcessor : IFileProcessor
    {
        private readonly IJobScheduler _jobScheduler;

        public DefaultFileProcessor(IJobScheduler jobScheduler)
        {
            _jobScheduler = jobScheduler;
        }

        public void Process(string filename)
        {
            // 1. open file
            var contents = new StreamReader(filename);

            // 2. Parse file
            object parsed = null;
            if (Path.GetFileNameWithoutExtension(filename) == "article")
            {
                switch (Path.GetExtension(filename))
                {
                    case ".xml":
                        var xmlSerializer = new XmlSerializer(typeof(Article));
                        parsed = (Article)xmlSerializer.Deserialize(contents);
                        break;

                    case ".json":
                        var contentsAsString = contents.ReadToEnd();
                        parsed = JsonConvert.DeserializeObject<Article>(contentsAsString);
                        break;

                    default:
                        throw new ArgumentException("Cannot parse file");
                }


                // 3. schedule job
                var jobData = new ProcessArticleJobData { Article = (Article)parsed };
                var jobType = typeof(IProcessArticleJob);
                _jobScheduler.Schedule(jobType, jobData);
            }
            else if (Path.GetFileNameWithoutExtension(filename) == "customer")
            {
                switch (Path.GetExtension(filename))
                {
                    case ".xml":
                        var xmlSerializer = new XmlSerializer(typeof (Customer));
                        parsed = (Customer)xmlSerializer.Deserialize(contents);
                        break;

                    case ".json":
                        var contentsAsString = contents.ReadToEnd();
                        parsed = JsonConvert.DeserializeObject<Customer>(contentsAsString);
                        break;

                    default:
                        throw new ArgumentException("Cannot parse file");
                }

                // 3. schedule job
                var jobData = new ProcessCustomerJobData { Customer = (Customer)parsed };
                var jobType = typeof(IProcessCustomerJob);
                _jobScheduler.Schedule(jobType, jobData);
            }
            else
                throw new ArgumentException("Cannot parse file");
        }
    }

Technically, this is a fine implementation; really, there is nothing wrong with it. It does what it has to do! But it is very tightly coupled, and it will quickly evolve into a monolithic monster as soon as some of the specs change (and the specs always change, right?).

Let's say the client sends us updated specs, and suddenly, they also want us to process quotes, orders and invoices. Additionally, we also have to support the CSV file format. How would that impact our code?

  • We would have to add three more if-else blocks: one for quotes, one for orders and one for invoices.
  • Each of the if-else blocks would have to contain a switch with three cases: xml, json and csv
  • And worst of all: we would have to change existing code. Code that is tried and tested, and that is not impacted by the spec change whatsoever, would now be at risk of breaking because of our change. If we do something wrong, it is possible that processing article.xml files suddenly no longer works, just because we had to add support for order.csv files. That doesn't make us look like very talented developers, does it?
Let's try and refactor the existing code, and see if we can avoid all of this.

Refactoring to adhere to the Open/Close principle

If we analyse the existing code, we can see that it actually has four different responsibilities (so no, it doesn't really adhere to the S in SOLID either) before a job can be scheduled:
  1. Determine how the file should be parsed
  2. Parse the file into a domain object
  3. Create the correct JobData with that domain object
  4. Determine which job should be scheduled
So step one would be to isolate these four. Let's introduce four new abstractions:
  • IParserResolver to determine how the file should be parsed
  • IParser to do the actual parsing
  • IJobDataFactory to create the correct JobData
  • IJobTypeResolver to determine which job should be scheduled.
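The four interfaces themselves are tiny; their definitions, inferred from the implementations further down, look something like this:

    public interface IParserResolver
    {
        IParser Resolve(string filename);
    }

    public interface IParser
    {
        object Parse(TextReader contents);
    }

    public interface IJobDataFactory
    {
        JobData CreateJobData(object parsed);
    }

    public interface IJobTypeResolver
    {
        Type GetJobTypeForJobData(JobData jobData);
    }
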
Using these four abstractions, we could reduce the Process method to this:

    public class SolidFileProcessor : IFileProcessor
    {
        private readonly IJobScheduler _jobScheduler;
        private readonly IParserResolver _parserResolver;
        private readonly IJobDataFactory _jobDataFactory;
        private readonly IJobTypeResolver _jobTypeResolver;

        public SolidFileProcessor(IJobScheduler jobScheduler, IParserResolver parserResolver, IJobDataFactory jobDataFactory, IJobTypeResolver jobTypeResolver)
        {
            _jobScheduler = jobScheduler;
            _parserResolver = parserResolver;
            _jobDataFactory = jobDataFactory;
            _jobTypeResolver = jobTypeResolver;
        }

        public void Process(string filename)
        {
            var contents = new StreamReader(filename);

            var parser = _parserResolver.Resolve(filename);
            var parsed = parser.Parse(contents);
            var jobData = _jobDataFactory.CreateJobData(parsed);
            var jobType = _jobTypeResolver.GetJobTypeForJobData(jobData);

            _jobScheduler.Schedule(jobType, jobData);
        }
    }

Now we can unit test this method and make sure it works.
Whenever new types of data or new file formats are added to the spec, no changes will be needed to this method! It is now closed for modification, yet it remains open for extension by providing different implementations of the abstractions we just introduced.
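
As an illustration, such a test might look like this (a minimal sketch assuming NUnit and Moq; the temp file only exists because Process opens the file itself):

    [Test]
    public void Process_SchedulesResolvedJobTypeWithCreatedJobData()
    {
        var filename = Path.GetTempFileName();
        var article = new Article();
        var jobData = new ProcessArticleJobData { Article = article };

        var parser = new Mock<IParser>();
        parser.Setup(p => p.Parse(It.IsAny<TextReader>())).Returns(article);

        var parserResolver = new Mock<IParserResolver>();
        parserResolver.Setup(r => r.Resolve(filename)).Returns(parser.Object);

        var jobDataFactory = new Mock<IJobDataFactory>();
        jobDataFactory.Setup(f => f.CreateJobData(article)).Returns(jobData);

        var jobTypeResolver = new Mock<IJobTypeResolver>();
        jobTypeResolver.Setup(r => r.GetJobTypeForJobData(jobData)).Returns(typeof(IProcessArticleJob));

        var jobScheduler = new Mock<IJobScheduler>();

        var processor = new SolidFileProcessor(
            jobScheduler.Object, parserResolver.Object, jobDataFactory.Object, jobTypeResolver.Object);

        processor.Process(filename);

        // The processor should do nothing more than glue the four collaborators together.
        jobScheduler.Verify(s => s.Schedule(typeof(IProcessArticleJob), jobData));
    }
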
Of course, we still need implementations for our four abstractions to remain functional:

    public class DefaultParserResolver : IParserResolver
    {
        public IParser Resolve(string filename)
        {
            if (Path.GetFileNameWithoutExtension(filename) == "article")
            {
                switch (Path.GetExtension(filename))
                {
                    case ".xml":
                        return new XmlArticleParser();

                    case ".json":
                        return new JsonArticleParser();

                    default:
                        throw new ArgumentException("Cannot parse file");
                }
            }
            else if (Path.GetFileNameWithoutExtension(filename) == "customer")
            {
                switch (Path.GetExtension(filename))
                {
                    case ".xml":
                        return new XmlCustomerParser();

                    case ".json":
                        return new JsonCustomerParser();

                    default:
                        throw new ArgumentException("Cannot parse file");
                }
            }
            else
                throw new ArgumentException("Cannot parse file");
        }
    }

    public class JsonArticleParser : IParser
    {
        public object Parse(TextReader contents)
        {
            var contentsAsString = contents.ReadToEnd();
            var parsed = JsonConvert.DeserializeObject<Article>(contentsAsString);
            if (parsed == null)
                throw new ArgumentException("Cannot parse file");
            return parsed;
        }
    }

    public class JsonCustomerParser : IParser
    {
        public object Parse(TextReader contents)
        {
            var contentsAsString = contents.ReadToEnd();
            var parsed = JsonConvert.DeserializeObject<Customer>(contentsAsString);
            if (parsed == null)
                throw new ArgumentException("Cannot parse file");
            return parsed;
        }
    }

    public class XmlArticleParser : IParser
    {
        public object Parse(TextReader contents)
        {
            var xmlSerializer = new XmlSerializer(typeof (Article));
            var parsed = (Article) xmlSerializer.Deserialize(contents);
            if (parsed == null)
                throw new ArgumentException("Could not parse file");
            return parsed;
        }
    }

    public class XmlCustomerParser : IParser
    {
        public object Parse(TextReader contents)
        {
            var xmlSerializer = new XmlSerializer(typeof (Customer));
            var parsed = (Customer) xmlSerializer.Deserialize(contents);
            if (parsed == null)
                throw new ArgumentException("Could not parse file");
            return parsed;
        }
    }

    public class DefaultJobDataFactory : IJobDataFactory
    {
        public JobData CreateJobData(object parsed)
        {
            if (parsed.GetType() == typeof (Article))
                return new ProcessArticleJobData {Article = (Article) parsed};
            if (parsed.GetType() == typeof (Customer))
                return new ProcessCustomerJobData {Customer = (Customer) parsed};

            throw new ArgumentException("Failed to create jobdata");
        }
    }

    public class DefaultJobTypeResolver : IJobTypeResolver
    {
        public Type GetJobTypeForJobData(JobData jobData)
        {
            if (jobData is ProcessArticleJobData)
                return typeof (IProcessArticleJob);
            if (jobData is ProcessCustomerJobData)
                return typeof (IProcessCustomerJob);

            throw new ArgumentException("Cannot determine jobtype");
        }
    }

The parsing of the files is completely Open/Close now! New data types or new file formats just mean adding different parsers. We will never have to change an existing parser unless the specification of exactly that data type and exactly that file format changes, in which case it is only normal that we have to make modifications.
So our code has definitely already improved, but we're not out of the woods yet. There are still some issues.
The other three abstractions still do not adhere to the Open/Close principle. If we are to implement the new specs, we will still have to make modifications in the existing code of DefaultParserResolver, DefaultJobDataFactory and DefaultJobTypeResolver.
Let's see how we can refactor this further. We'll start with the ParserResolver.

If we want more abstract code, we need to move the knowledge of which parser can parse which file out of the resolver. We want the way this is determined for each data type and file format to change only when the specification of exactly that file changes, so it's quite clear that this knowledge belongs with the parsers. We can perfectly well allow a parser to know not only how it can parse a file, but also whether it can parse a certain file. We can thus achieve our goal by extending the IParser interface with a CanParse method. The DefaultParserResolver will then need a registry of all available parsers and scan that registry to retrieve the right one.
If we also adhere to the D in SOLID, then we will normally have some sort of IoC container (like Castle Windsor or Autofac) that can inject all available parsers for us.
The reworked parser resolver would then look like this, with an example of a reworked parser right after:

    public class DefaultParserResolver : IParserResolver
    {
        private readonly IList<IParser> _parsers;

        public DefaultParserResolver(IList<IParser> parsers)
        {
            _parsers = parsers;
        }

        public IParser Resolve(string filename)
        {
            var parser = _parsers.FirstOrDefault(x => x.CanParse(filename));
            if (parser == null)
                throw new ArgumentException("Cannot resolve parser for file");
            return parser;
        }
    }
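
And here is what one of the reworked parsers might look like (a sketch; the other parsers follow the same pattern):

    public interface IParser
    {
        bool CanParse(string filename);
        object Parse(TextReader contents);
    }

    public class JsonArticleParser : IParser
    {
        public bool CanParse(string filename)
        {
            return Path.GetFileNameWithoutExtension(filename) == "article"
                   && Path.GetExtension(filename) == ".json";
        }

        public object Parse(TextReader contents)
        {
            var contentsAsString = contents.ReadToEnd();
            var parsed = JsonConvert.DeserializeObject<Article>(contentsAsString);
            if (parsed == null)
                throw new ArgumentException("Cannot parse file");
            return parsed;
        }
    }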

Next up is the DefaultJobDataFactory. We need to solve two problems here:
  1. We need to know which type of JobData to create and
  2. We need to actually create it.
It is immediately clear that these two requirements are coupled: an entity that knows how to create a specific JobData instance will also know the source data type which that JobData holds! We should introduce yet another abstraction, IJobDataBuilder, and store that logic in there. Our DefaultJobDataFactory will then contain a registry of IJobDataBuilders to work with, similar to how the DefaultParserResolver holds a registry of IParsers:

    public interface IJobDataBuilder
    {
        Type GetSourceDataType();
        JobData CreateJobData(object sourceData);
    }

    public class ProcessArticleJobDataBuilder : IJobDataBuilder
    {
        public Type GetSourceDataType()
        {
            return typeof(Article);
        }

        public JobData CreateJobData(object sourceData)
        {
            if (sourceData.GetType() != GetSourceDataType())
                throw new ArgumentException(string.Format("Cannot create JobData from object of type {0}", sourceData.GetType().Name));

            return new ProcessArticleJobData { Article = (Article)sourceData };
        }
    }

    public class ProcessCustomerJobDataBuilder : IJobDataBuilder
    {
        public Type GetSourceDataType()
        {
            return typeof(Customer);
        }

        public JobData CreateJobData(object sourceData)
        {
            if (sourceData.GetType() != GetSourceDataType())
                throw new ArgumentException(string.Format("Cannot create JobData from object of type {0}", sourceData.GetType().Name));

            return new ProcessCustomerJobData { Customer = (Customer)sourceData };
        }
    }

    public class DefaultJobDataFactory : IJobDataFactory
    {
        private readonly IList<IJobDataBuilder> _jobDataTypes;

        public DefaultJobDataFactory(IList<IJobDataBuilder> jobDataTypes)
        {
            _jobDataTypes = jobDataTypes;
        }

        public JobData CreateJobData(object parsed)
        {
            var datatype = _jobDataTypes.FirstOrDefault(x => x.GetSourceDataType() == parsed.GetType());
            if (datatype == null)
                throw new ArgumentException("Failed to create jobdata");

            return datatype.CreateJobData(parsed);
        }
    }

Almost there, only the DefaultJobTypeResolver is left! By now you will probably see a pattern emerging. We need to add a layer of abstraction and delegate the knowledge of which job type can process which JobData to a separate entity. At first, it looks like we could take the exact same approach as with the parsers and just add a CanWorkWithJobData method to the Type class, but Type is a built-in .NET class, so we can't really modify that, can we?

There are two possible solutions to this problem:
We could have each job derive from an IJob interface and put the CanWorkWithJobData on that interface. Each job implementation would then be responsible for indicating the type of JobData it can work with.
Or, we could take the same approach as with the DefaultJobDataFactory and put this knowledge in a separate class, similar to the JobDataBuilders. But since we're providing metadata about a job, rather than actually building something, we'll name the interface IJobTypeMetaData for clarity.

Either of these two solutions will do, but since I'm an advocate of composition over inheritance, we'll use the second option.

    public interface IJobTypeMetaData
    {
        bool CanWorkWithJobData(JobData jobData);
        Type GetJobType();
    }

    public class ProcessArticleJobTypeMetaData : IJobTypeMetaData
    {
        public bool CanWorkWithJobData(JobData jobData)
        {
            return jobData.GetType() == typeof(ProcessArticleJobData);
        }

        public Type GetJobType()
        {
            return typeof(IProcessArticleJob);
        }
    }

    public class ProcessCustomerJobTypeMetaData : IJobTypeMetaData
    {
        public bool CanWorkWithJobData(JobData jobData)
        {
            return jobData.GetType() == typeof(ProcessCustomerJobData);
        }

        public Type GetJobType()
        {
            return typeof(IProcessCustomerJob);
        }
    }

    public class DefaultJobTypeResolver : IJobTypeResolver
    {
        private readonly IList<IJobTypeMetaData> _jobs;

        public DefaultJobTypeResolver(IList<IJobTypeMetaData> jobs)
        {
            _jobs = jobs;
        }

        public Type GetJobTypeForJobData(JobData jobData)
        {
            var taJob = _jobs.FirstOrDefault(x => x.CanWorkWithJobData(jobData));
            if (taJob == null)
                throw new ArgumentException("Cannot determine jobtype");

            return taJob.GetJobType();
        }
    }
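
To tie everything together, the composition root could wire it all up like this (a sketch with manual wiring; in a real project the IoC container would build these registries for us):

    // jobScheduler is the IJobScheduler implementation handed to us by the
    // developer responsible for the actual jobs.
    var parsers = new List<IParser>
    {
        new XmlArticleParser(),
        new JsonArticleParser(),
        new XmlCustomerParser(),
        new JsonCustomerParser()
    };

    var jobDataBuilders = new List<IJobDataBuilder>
    {
        new ProcessArticleJobDataBuilder(),
        new ProcessCustomerJobDataBuilder()
    };

    var jobTypeMetaData = new List<IJobTypeMetaData>
    {
        new ProcessArticleJobTypeMetaData(),
        new ProcessCustomerJobTypeMetaData()
    };

    IFileProcessor processor = new SolidFileProcessor(
        jobScheduler,
        new DefaultParserResolver(parsers),
        new DefaultJobDataFactory(jobDataBuilders),
        new DefaultJobTypeResolver(jobTypeMetaData));

    processor.Process("article.json");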

Et voilà, each and every part of our solution now fully adheres to the Open/Close principle.

Conclusion

Like everything, the solution we have now has some advantages and unfortunately also some disadvantages. The most obvious advantage is that our code will be able to cope with changes in specifications much more robustly: a new file format? Just add a new Parser implementation. A new data type? Just add a new Parser, JobDataBuilder and JobTypeMetaData. That's all there will be to it. Never ever will we have to touch, or even come near, any code that processes an article.xml file if we want to add support for an orders.csv file. That gives us, as developers, a huge increase in peace of mind. We don't have to worry that we'll break existing functionality, because we will always be adding new code in new classes instead of modifying existing ones. That is the heart of open for extension, closed for modification.
Another advantage is that our code has become much more testable. Each of our classes is very short and has very limited responsibilities. It will be easier to test all possible paths without needing a huge arrange part for each test.

Some of the disadvantages of this approach include a potential class explosion: a new data type means three new classes. A new file format means a new class per existing data type. We now have two data types and two file formats, which results in eight classes (XmlCustomerParser, JsonCustomerParser, XmlArticleParser, JsonArticleParser, ProcessCustomerJobDataBuilder, ProcessArticleJobDataBuilder, ProcessCustomerJobTypeMetaData and ProcessArticleJobTypeMetaData). Our new spec contains five data types and three file formats, which would result in twenty-five classes.
Another disadvantage is performance. We do a lot of look-ups, and that's just not free.
The original monolithic implementation had O(1) (constant) runtime, whereas our new implementation has O(n) (linear) runtime, depending on the number of data types and file formats.

We also lose some type safety that the compiler would normally give us. This last one can actually easily be regained by using generics (thanks to my colleague Wim Vergouwe for pointing that out ;-)), but I'll save that approach for another post.