Enterprise Architecture is about the ‘landscape’ of your organisation: a landscape of humans and IT and the behaviour of both. These landscapes have become very complex. That complexity, and the problems it brings, has led to many efforts at standardisation and rationalisation, often seen as a key aspect of Enterprise Architecture. Not too many brands of an operating system, for instance. Not six different operating systems, but two. Not five different workflow engines, but one. Not too many different networking or computing technologies. Less is better.
Gerben Wierda, Team Coordinator Architecture & Design, APG
Gerben will be speaking at IRM UK’s Enterprise Architecture & Business Process Management Conference Europe 21-24 October 2019, London where he will be presenting a half day workshop on ‘Setting up Effective Architecture Governance Without a Formal EA Framework‘
This article was previously published here.
Even when you are standardised, an additional complexity comes from the Lifecycles of IT. Putting IT in (any IT, even standardised) doesn’t mean you are done. For software this is the most obvious: new versions are released on a regular basis. There are many reasons for new versions. A very important one is bug fixes, both for the primary function and for aspects such as stability or security. Platforms that are used to run applications get updated all the time with security and other fixes. And then there is a constant stream of new functionality, which in itself also means new functionality that can have bugs.
Having old IT in your landscape is generally not a good thing. Old stuff is one of the most important sources of security problems. If your landscape still contains Flash, very old Java or dotNet versions, outdated operating systems, it is very likely not that secure. Old stuff often doesn’t work well with new stuff and complicated, brittle, and costly workarounds may be required.
Companies are aware that they have to manage the systems and make sure that they are up-to-date. This is Lifecycle Management (LCM). LCM also takes place with IT hardware, by the way. Hardware also gets old. It wears. It can fail. Hardware has a lifetime of a few to maybe ten years, generally three to five. And when you replace it, you generally cannot replace it with exactly the same as it is not for sale anymore. Technology has moved on. New versions and products are forced into your landscape, even if you do not have new demands (but generally, you have).
All these products often depend on each other. A patched operating system or application framework may suddenly result in an application not working properly anymore. A new version of a platform may offer new possibilities, but also lose old ones. You may want to update your Windows server from Windows Server 2012 to Windows Server 2016, or from RedHat Linux 6 to 7, because the old version is going out of support, but will your application that runs on it still work? Or you might want or need to update an application, but the new version requires a different platform version — which you do not support yet. Add to this that most of your landscape may not be built but bought (or rented) and the suppliers are in control over their LCM and what they support (and what not). Does your license allow you to use the software for ever if need be? Will you be using that old Windows Server 2003 that is no longer updated with security patches? Is there an old application that forces you to keep using it? The result is a frothing sea of components, all at a certain level of ‘up-to-dateness’ and all in constant change. This is a major source of why it is difficult to change your IT landscape and with that your enterprise landscape.
It is hard enough to plan a change as it is, but while you are making the change, a lot of other things change as well simultaneously. The landscape is volatile. In Chess and the Art of Enterprise Architecture this is likened to playing a chess game where — while you are making a move — hundreds of other players are making moves simultaneously on a very large board with overlapping squares. And you think you can manage that with a plan leading to some sort of ‘end state’? That’s not how it works in chess. And chess is simple compared to an enterprise.
Anyway, it is no wonder that Lifecycle Management is one of the trickiest parts of maintaining a healthy architecture or landscape, and that it is hard to take into account while having a lot of impact on your agility. If you want to move some application to the public cloud and it still uses dotNet version 2, you’re in trouble.
Strangely enough, architecture frameworks seem to pay little attention to what Lifecycle Management does to (managing) your landscape. If they talk about Lifecycle Management they talk about the lifecycle of your architecture artefacts, not about what lifecycle events do to your actual business landscape and how to manage that. Typically, the old-school frameworks are, if anything, pretty navel-gazing. To give one example: in some capability maturity models for old-school EA (such as FEAF) you are just doing navel-gazing in architecture artefacts until your maturity goes to the highest levels, where you finally get to have an actual effect and provide actual value. But I digress, as usual.
So, how can you get a grip on Lifecycle Management? I’ve seen in multiple organisations that architects were managing roadmaps (now we are using Windows 7, next year we will move to Windows 10). But it was difficult to get a grip on all the items, and the selection of what is part of the roadmap and what is not is pretty arbitrary (“Why do you mention JBoss versions but not tomcat versions or Hibernate versions?” “I don’t know, should I?”). So often I saw more generic rules, such as “We use two versions of anything: if the latest version is N, we will be using versions N-1 and N-2”. Or: “We will only be using software that has ‘standard support’ and only exceptionally use software in ‘extended support’”. Those generic principles generally did not solve the issue: the real world of dependencies was too complex to be governed by them. “I should update as a new version N+1 has arrived, but I can’t because…”
I have been proposing something called ‘Daylight Lifecycle Management’ to get a grip on this:
First, it is important to understand that what is in your landscape is in the end your choice. There are some legal limits (e.g. if you do not have the license to go on using something, you’re legally not allowed). But for the rest: your landscape (your architecture), your choice. Do you want to keep running that unsupported old piece of soft- or hardware? Nobody stops you. The vendor’s choices about support are (almost always) informative, not prescriptive, unless you decide that they are.
In the Daylight Lifecycle Management approach, in order to manage your own choices, for every product or system or platform or application or service you use, you define a set of your own periods when it can be used. For managing the use of systems and their full LCM the periods are:
Before Sunrise, the product (defined as a specific version or version range) is not to be found in your managed landscape. Not anywhere. Not in development. Not in testing. It is not there. If it is, it is not under LCM control. The only place you’ll find it is in the Lifecycle administration (see below).
At Start of Sunrise, the product/version may be in your landscape, but it is not yet in production. This is the period where people develop what is needed to put the product in production. Think of security baselines that need to be defined for a new platform, or logging, backups, etc. that are being set up for a new product or version. People are being trained or recruited to use or support the new product. Nobody in the business can use it for real production work; we are in preparation mode, agile or not.
At Start of Sunshine (which is End of Sunrise, but we ignore that name) the product/version is fully in production in your landscape. During that time everyone in the organisation can use the product as it is supported by your organisation. Whether a vendor offers you support is secondary. You decide if you want to have it in your landscape. That it is in (extended) support by a vendor or service provider is one of the complex reasons for allowing it in your landscape.
You plan when you want to retire the product/version. Before that, there needs to be a period where everyone using the product/version must move away from it. This period is the Sunset period. During that period it is still supported and allowed in your organisation, but new uses are not allowed unless an exception is given. So after Start of Sunset, you (repeatedly) warn current users of the product/version that it will be (forcefully) removed from the landscape at End of Sunset. And this cutoff is hard (but see below): after End of Sunset any instance of the product/version will be shut down.
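The four periods described above can be sketched as a small lookup. This is only an illustration: the class and function names are invented here, the dates are example values, and treating each boundary date as the first day of the period it starts is an assumption, not something the approach prescribes.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DaylightEntry:
    """One product/version in the Lifecycle administration (hypothetical shape)."""
    product: str
    start_of_sunrise: date
    start_of_sunshine: date
    start_of_sunset: date
    end_of_sunset: date


def daylight_period(entry: DaylightEntry, today: date) -> str:
    """Return the Daylight period the entry is in on the given day.

    Assumption: each boundary date belongs to the period it starts.
    """
    if today < entry.start_of_sunrise:
        return "before sunrise"   # not in the managed landscape at all
    if today < entry.start_of_sunshine:
        return "sunrise"          # present, but not yet in production
    if today < entry.start_of_sunset:
        return "sunshine"         # fully in production, new use allowed
    if today < entry.end_of_sunset:
        return "sunset"           # still supported, new use needs an exception
    return "after sunset"         # any remaining instance is shut down


example = DaylightEntry(
    product="example-platform 1.x",   # made-up product name
    start_of_sunrise=date(2016, 1, 1),
    start_of_sunshine=date(2017, 7, 1),
    start_of_sunset=date(2021, 1, 1),
    end_of_sunset=date(2022, 1, 1),
)
print(daylight_period(example, date(2019, 6, 1)))  # sunshine
```

Asking the same question for a day in 2021 would return "sunset", which is the signal to start warning current users.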
With production state added, it looks like this:
Now, it is customary to have these sorts of dates in organisations (though generally a specific Sunset period is not used; this is, I think, new in this approach), but what generally happens in an organisation is that reality forces you to keep things operational beyond what you would like and have planned. For instance, you want to scuttle Windows 2008, but there is that one important application that only runs on that platform and the replacement has been delayed. Or, you really have planned to update all the Oracle databases to a new version, but some important legally required change gets absolute priority in the organisation and there is a limit to what you can handle in terms of changes. This is reality. Watch out. It bites.
As said, products have vastly different lifecycles. And these change constantly. Some have the classic Support and Extended Support periods. Others work with Short Term Support (STS) and Long Term Support (LTS) versions. Recently, many systems have been moving to a constant flow, e.g. releases that come out every quarter and are supported for two years, period: 19a (for 2019 Q1), 19b, 19c, 19d, 20a, and so forth. You make your pick in those flows what to use and when to upgrade, but the end date for support is hard. A typical old-fashioned support/extended support could be like this:
So, what happens when your plan to scuttle a product/version gets bitten by reality? What organisations generally do is consider their standards (which are updated once in a while) and use a ‘comply or explain’ approach. And if the explanation is valid, an exception is given, for a certain period. The idea is to try to get to the point that in the end nobody uses the product/version anymore and it is gone from your landscape. In some organisations there will be — at least if LCM actually is working — so many exceptions that it is hard to maintain that you have a standard.
You might argue that each exception amounts to a lie. You say you are ‘standardised’, but the exceptions prove that you are not. Your policy document, roadmap or “landing mode of operations”, may show that Windows Server 2008 is gone, but all the exceptions are not part of that idealised reality and they still are there. Think what happens by the way when your policy documents say that Windows 2008 is no longer allowed in your organisation at some point, but there are many exceptions and some auditor comes looking. It becomes messy.
That is why I argue that you should not give exceptions to End of Sunset in your organisation. Instead, if there is a pressing reason to keep something around beyond the End of Sunset date, you move the End of Sunset date itself. You do not give an exception to the standard, you change the standard so it fits reality. By doing that, you also make clear that what remains in that landscape is still there because you want it. Because it is policy. It still gets backed up. It still is monitored. Everything needed to keep it running, safely and according to your organisational requirements, still needs to remain in place. You cannot get rid of your mainframe support people until the mainframe is really gone, can you? You can (should) even put in a pricing mechanism. If some business owner makes you keep up a group of mainframe specialists because he or she has been unable to move away from an old application on the mainframe (technical debt), that business owner is the source of all that extra cost. Why are you as platform provider so expensive for John? Well, that is because of the cost that comes from Suzan’s technical debt. It is not for nothing that in the world of outsourcing and cloud providers you have even less choice. If they drop it, they drop it. And you must move. It is your landscape, but it is their decision. When the landscape and the decisions are still fully your own, you have the advantage of making more flexible decisions. The cloud has many advantages, but is not disadvantage-free. You lose some manoeuvring room.
A small exception to a standard (as it mostly is managed these days in organisations) is easily granted, but moving the End of Sunset is a lot harder. And that is as it should be. We should stop lying about our landscape. The ‘exception’ is still part of your landscape. It is still there. Acting as if it isn’t is fooling yourself and maintaining an illusion.
Aside: the philosophy behind this is also illustrated by an old AI adage: “The best model of the world is the world itself”. As argued in Mastering ArchiMate, any administration you have about (parts of) your landscape is in fact (part of) a ‘model’ of (part of) that reality. An important hygiene factor for your organisation is to have ‘models’ not contradict each other, another problem that ‘linking to that single reality’ solves. When you rely on the model and the model lies, you may have to pay a hefty price at some point. Things get harder than they need to be. In AI during the 80’s-90’s at the height of the Winter of AI, when it had become clear that almost all that symbolic modelling of the first thirty years had failed, AI researchers decided that instead of trying to make ever better symbolic models, they should rely on direct observation more. Relying on a model instead of on observation is like walking in your home with your eyes closed and relying on memory. There are reasons we don’t do that much. So, if you are forced to use a model, and in IT this is often the case, make sure your model at the least doesn’t lie. And if possible, do not use a model (an administration) when it is feasible to do direct observation. That discovery tool for licensed software is a lot more reliable than a spreadsheet with a human-maintained administration of servers and what has been installed on them.
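To make the point about models and observation concrete, here is a minimal sketch that compares a hand-maintained administration with what a discovery scan reports. The function name, host names and installed products are all made up for illustration; a real discovery tool will deliver its findings in its own format.

```python
from typing import Dict, Iterable, List, Tuple

Installation = Tuple[str, str]  # (host, product/version)


def compare_with_reality(administered: Iterable[Installation],
                         discovered: Iterable[Installation]) -> Dict[str, List[Installation]]:
    """Report where the administration (the model) disagrees with the scan (reality)."""
    administered_set, discovered_set = set(administered), set(discovered)
    return {
        # Found running, but missing from the administration.
        "unrecorded": sorted(discovered_set - administered_set),
        # Listed in the administration, but not found by the scan.
        "stale": sorted(administered_set - discovered_set),
    }


# Hypothetical data: a spreadsheet versus a discovery scan.
spreadsheet = [("srv-01", "Oracle 12"), ("srv-02", "Windows Server 2012")]
scan = [("srv-01", "Oracle 12"), ("srv-02", "Windows Server 2016"), ("srv-03", "tomcat 9")]

gaps = compare_with_reality(spreadsheet, scan)
print(gaps["unrecorded"])
print(gaps["stale"])
```

Both lists being empty is the situation where the model does not lie.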
In a picture, moving End of Sunset looks like this:
Anyway, will End of Sunset get extended all the time? No. First, we are still in Sunset and during Sunset new instances of the product/version still do require an exception. And second, by making it visible that the end date moves, you increase the weight of the decision. Those small exceptions largely remain hidden. The move of an End of Sunset date far beyond the end of vendor support for a standard will get noticed. That someone has gotten permission to keep using it while on paper we have ended it is much harder to spot.
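One way to picture ‘move the date, do not grant an exception’ is an administration where the only way to keep something alive is to change End of Sunset itself, and where every such change leaves a visible trace. The sketch below assumes a hypothetical `SunsetPolicy` record; the product, dates, reason and decider are examples only.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Tuple


@dataclass
class SunsetPolicy:
    """End of Sunset for one product/version, with its history (hypothetical shape)."""
    product: str
    end_of_sunset: date
    # Every move is kept: (old date, new date, reason, who decided).
    moves: List[Tuple[date, date, str, str]] = field(default_factory=list)

    def move_end_of_sunset(self, new_date: date, reason: str, decided_by: str) -> None:
        """Change the standard itself and record why, so the move stays visible."""
        if not reason.strip():
            raise ValueError("moving End of Sunset requires a stated reason")
        self.moves.append((self.end_of_sunset, new_date, reason, decided_by))
        self.end_of_sunset = new_date

    def is_allowed(self, today: date) -> bool:
        """There are no per-instance exceptions: the date is the whole policy."""
        return today < self.end_of_sunset


# Example values only.
policy = SunsetPolicy("Windows Server 2008", end_of_sunset=date(2022, 1, 1))
policy.move_end_of_sunset(date(2022, 7, 1),
                          reason="replacement application delayed",
                          decided_by="platform owner")
print(policy.is_allowed(date(2022, 3, 1)))  # True
print(len(policy.moves))                    # 1
```

The `moves` list is what an auditor, or the business owner who pays for the delay, would look at.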
Now, it will happen (it does in many organisations) that some product/version is kept in operation while it has gone beyond (affordable) support, as in the example above. Apart from the cost of keeping the product/version running, we need to manage security as we will not be getting patches anymore. One way to do that is to put the not-vendor-supported product in Restrictions. Restrictions is a special separate period. After Start of Restrictions, the product might have to be isolated. Maybe behind a firewall, in a separate network segment, or in extreme cases physically. Or there are restrictions with respect to monitoring, time-of-day support, you name it. So, you can let Start of Restrictions fall together with End of Extended Support (as in this case), but there can be other reasons as well. The Restrictions period is independent from the Daylight period. In a picture:
Before the change on the End of Sunset date, the original Daylight Lifecycle Management for an item looks like this:
The yellow labels make up what you manage.
Summarising, for a product/version, it could be like this:
- Start of Sunrise: 1 Jan 2016
- Start of Sunshine: 1 Jul 2017
- Start of Sunset: 1 Jan 2021
- End of Sunset: 1 Jan 2022
- Informational: Vendor End of Support (optional, informational): 1 Jan 2023; Vendor End of Extended Support (optional, informational): 1 Jan 2024
- Start of Restrictions: 1 Jan 2023
- End of Restrictions: never (see below)
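The summary above maps naturally onto a single record per product/version. The following is a sketch under assumptions: the class and field names are invented here, ‘never’ is represented as `None`, and boundary dates are treated as the first day of the period they start.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class LifecycleRecord:
    """The managed dates for one product/version (hypothetical shape)."""
    start_of_sunrise: date
    start_of_sunshine: date
    start_of_sunset: date
    end_of_sunset: date
    start_of_restrictions: Optional[date] = None           # None: never restricted
    end_of_restrictions: Optional[date] = None             # None: restrictions never end
    vendor_end_of_support: Optional[date] = None           # informational only
    vendor_end_of_extended_support: Optional[date] = None  # informational only

    def validate(self) -> None:
        """The four Daylight dates must be in chronological order."""
        daylight = [self.start_of_sunrise, self.start_of_sunshine,
                    self.start_of_sunset, self.end_of_sunset]
        if daylight != sorted(daylight):
            raise ValueError("Daylight dates must be in chronological order")

    def is_restricted(self, today: date) -> bool:
        """Restrictions are evaluated independently of the Daylight period."""
        if self.start_of_restrictions is None or today < self.start_of_restrictions:
            return False
        return self.end_of_restrictions is None or today < self.end_of_restrictions


record = LifecycleRecord(
    start_of_sunrise=date(2016, 1, 1),
    start_of_sunshine=date(2017, 7, 1),
    start_of_sunset=date(2021, 1, 1),
    end_of_sunset=date(2022, 1, 1),
    start_of_restrictions=date(2023, 1, 1),
    end_of_restrictions=None,
    vendor_end_of_support=date(2023, 1, 1),
    vendor_end_of_extended_support=date(2024, 1, 1),
)
record.validate()
print(record.is_restricted(date(2023, 6, 1)))  # True
```

The vendor dates are stored but deliberately not used in any check: they inform the decision, they do not make it.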
Again, the vendor support is just informational. It is we who decide if we ‘support’ it being in our landscape, based — amongst many other things, from business needs to resource availability — on vendor support. And as we said, all these vendors have wildly different support schemes. By the way, we need to make the same choices for open source products where there maybe isn’t vendor support at all (just a community forum).
Now in the above example, if End of Sunset has to move, it might be wise to let the business owners that are the cause of this pay the extra cost, I think. And if it is unavoidable that the End of Sunset goes beyond Start of Restrictions, the product goes into Restrictions. E.g. if you have an old application (technical debt) that depends on Oracle 8, then you will find the database and/or the application behind a firewall. The performance may suffer. But you have choices.
Suppose Start of Restrictions is when standard Vendor Support ends. We have set it that way, because extended support is too costly. But if we have to move End of Sunset beyond standard Vendor End of Support, we have a choice. Do we put the system in Restrictions, or do we buy Extended Support and move our Start of Restrictions date?
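The two options can be compared by looking at the time span in which the product would still be allowed but already restricted. A minimal sketch, with a hypothetical helper name and a made-up new End of Sunset date; the vendor dates are the ones from the example above:

```python
from datetime import date
from typing import Optional, Tuple


def restricted_use_window(end_of_sunset: date,
                          start_of_restrictions: date) -> Optional[Tuple[date, date]]:
    """Return the span in which the product is still allowed but restricted, or None."""
    if end_of_sunset <= start_of_restrictions:
        return None
    return (start_of_restrictions, end_of_sunset)


vendor_end_of_support = date(2023, 1, 1)
vendor_end_of_extended_support = date(2024, 1, 1)
new_end_of_sunset = date(2023, 7, 1)  # hypothetical moved date

# Option 1: no extended support, so Restrictions start when standard support ends.
option_1 = restricted_use_window(new_end_of_sunset, vendor_end_of_support)

# Option 2: buy extended support and move Start of Restrictions along with it.
option_2 = restricted_use_window(new_end_of_sunset, vendor_end_of_extended_support)

print(option_1)  # (datetime.date(2023, 1, 1), datetime.date(2023, 7, 1))
print(option_2)  # None
```

In this example option 1 means six months of isolation measures, and option 2 means paying for extended support instead. Which one is cheaper or safer is a business decision the code cannot make.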
Note, Start of Restrictions may even be Start of Sunshine. For instance, you may add some appliance to your landscape that is managed by another party (say, the people you have hired to do some facilities work including climate control of your building: they use IT that they manage themselves, they do their own security patching (or not…), etc.). Such a new addition to your networks should be separated from your other systems. That already means Start of Restrictions. Technically, you might even have End of Restrictions, for instance for a product that goes into production with security flaws but for which improvements have been announced. As soon as the improvements have been installed, Restrictions may end.
Roadmap, LCM, LMO — It Is All The Same Thing
Often Architects have been tasked with maintaining “Technology Roadmaps” and more recently “Landing Modes of Operation” (LMO).
An LMO is a description of “what is supported in terms of how we can run applications”. Where the applications end up may be called a “landing zone” (LZ). The LZ is where an application ‘lands’. LMO is just another name for the products/versions that make up your standards for that LZ.
A Roadmap in fact can be seen as how an LMO develops over time. But the Technology Roadmap of the past often contained only a few major choices, like operating system brands and versions, database brands and versions, etc. It is kind of arbitrary what ends up in the (often strategic) Technology Roadmap and what not. The LMOs are often somewhat more extensive and precise, but lack the dimension of time. This is a disadvantage, as it pays for the organisation as a whole (especially the users of IT) to know that while Windows 2012 is still part of the LMO now, it soon will no longer be. Maybe it is better for the application owner to choose Windows 2016 now even if 2012 is technically still part of Sunshine.
The Daylight Lifecycle Management approach puts it all together. Your planning of all the product deployment choices of everything that you want to control (which also is your choice as an organisation) in terms of lifecycle is in fact LCM and LMO and Roadmap in one. The Lifecycle administration tells you what the LMO is now, but also what it will be in a few years (as far as planned). Is it too much? No, because the poor enterprise architects are not the ones who have to do all the work. Each product in the Daylight administration has a product owner. If your IT department (owner) offers tomcat 8.1-4 (minor upgrades, so when 8.4 replaces 8.3, all migrate, as with a security patch), tomcat 8.5 (major functional changes with respect to 8.1-4), and tomcat 9 application servers for applications to land on, you have three entries in the Lifecycle administration, each with its own Start of Sunrise, Start of Sunshine, Start of Sunset, End of Sunset and Start of Restrictions. If you’re smart, you implement it in tooling with discovery and a workflow engine. You manage all the dependencies. You’re in control as much as you want to be.
Owners may at the same time be users of the products of other owners. Business owners run applications on the tomcat platform the IT product owner offers them. Every product (application, platform, etc.) ends up in that administration (though I would start bottom-up if I were you). It might be tempting to try to manage all the date and version dependencies in a tool with some algorithmic intelligence. I think that will be far too complex. The dates of the Daylight Lifecycle Management can be set initially or changed later, and at those times one must check if there are dependencies that have to be taken into account.
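As an illustration of ‘LCM and LMO and Roadmap in one’, the sketch below derives the LMO for any given day from the three tomcat entries. All dates are invented for the example, the names `Entry` and `lmo_on` are hypothetical, and splitting the result into ‘new use allowed’ (Sunshine) and ‘existing use only’ (Sunset) is one possible reading of what an LMO should show.

```python
from datetime import date
from typing import Dict, List, NamedTuple


class Entry(NamedTuple):
    owner: str
    start_of_sunshine: date
    start_of_sunset: date
    end_of_sunset: date


# Hypothetical dates; only the three product entries come from the text.
administration: Dict[str, Entry] = {
    "tomcat 8.1-4": Entry("IT department", date(2016, 1, 1), date(2019, 1, 1), date(2020, 1, 1)),
    "tomcat 8.5": Entry("IT department", date(2017, 1, 1), date(2021, 1, 1), date(2022, 1, 1)),
    "tomcat 9": Entry("IT department", date(2019, 1, 1), date(2024, 1, 1), date(2025, 1, 1)),
}


def lmo_on(admin: Dict[str, Entry], day: date) -> Dict[str, List[str]]:
    """Derive the LMO for a given day from the Lifecycle administration."""
    result: Dict[str, List[str]] = {"new use allowed": [], "existing use only": []}
    for product, entry in sorted(admin.items()):
        if entry.start_of_sunshine <= day < entry.start_of_sunset:
            result["new use allowed"].append(product)      # in Sunshine
        elif entry.start_of_sunset <= day < entry.end_of_sunset:
            result["existing use only"].append(product)    # in Sunset
    return result


print(lmo_on(administration, date(2019, 6, 1)))
print(lmo_on(administration, date(2021, 6, 1)))
```

Calling the same function with a date two years ahead gives the planned LMO, which is the time dimension a static LMO document lacks.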
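Checking dependencies at the moment a date is set or changed does not need algorithmic intelligence. A simple list of who to talk to can be enough. The sketch below only flags products that are planned to outlive something they depend on; it resolves nothing by itself. The function name, the application names and all dates are made up for illustration.

```python
from datetime import date
from typing import Dict, List, Tuple


def dependents_to_review(end_of_sunset: Dict[str, date],
                         depends_on: List[Tuple[str, str]],
                         changed: str) -> List[str]:
    """List products that rely on `changed` and are planned to outlive it."""
    limit = end_of_sunset[changed]
    return sorted(
        user
        for user, platform in depends_on
        if platform == changed and end_of_sunset[user] > limit
    )


# Hypothetical administration: End of Sunset per product.
end_of_sunset = {
    "tomcat 8.5": date(2022, 1, 1),
    "claims application": date(2023, 6, 1),
    "customer portal": date(2021, 9, 1),
}
# (user, platform) pairs: who runs on what.
depends_on = [
    ("claims application", "tomcat 8.5"),
    ("customer portal", "tomcat 8.5"),
]

print(dependents_to_review(end_of_sunset, depends_on, "tomcat 8.5"))
# ['claims application']
```

What happens next (migrate the application, or move End of Sunset for the platform) remains a conversation between the owners involved.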
It is some work to set this up, but it will be not very hard to maintain if you make sure that your process works and the model is closely linked to reality as in: “the best model of the world is the world itself”.
You may think: “Just 5 (or 6) dates to manage for each recognisable separate item in my landscape? Is that all there is to it?” I think it is enough. Pareto and all that… Besides: In der Beschränkung zeigt sich erst der Meister (it is in limitation that the master first shows himself).
[An earlier version of this article appeared on InfoWorld. That one uses Quarantaine for Restrictions and has no images and also the text is slightly different. Of my articles, the versions on this site are the ones I maintain if fixes are necessary.]
Gerben Wierda is author of the highly rated books Chess and the Art of Enterprise Architecture (on enterprise architecture and its governance) and Mastering ArchiMate (on modelling enterprises using the ArchiMate® language), as well as several blogs (e.g. EAPJ, InfoWorld). His writings are based on his experience in the trenches. His current day job is Team Coordinator Architecture & Design at APG. Before, he (amongst other activities) was Lead Architect at APG Asset Management (which manages roughly 480 billion euro (2018) in pension assets), Lead Architect at the Judiciary (all the courts) of The Netherlands, Head of the Digital Technology Department of the Netherlands Forensic Institute, scientific staff member of the Dutch Advisory Council for Science & Technology Policy and before that held various positions in IT. He holds an M.Sc. in physics (University of Groningen) and an MBA (RSM/Erasmus Rotterdam).
Copyright Gerben Wierda, Team Coordinator Architecture & Design, APG