Taxonomies are everywhere in Information Management, but are hardly ever formally acknowledged and managed. Most code tables (lookup tables), perhaps even the overwhelming majority, contain taxonomies. Yet we seem to have no consistent guidelines on how to design the taxonomies that populate these tables. Also poorly understood is the difference between those who create taxonomies and those who actually have to use them.
Malcolm Chisholm, President, Data Millennium; follow Malcolm on Twitter @MDChisholm
Malcolm will be presenting the course ‘Successful Implementation of a Master Data Management Programme’ face-to-face and via live streaming, 16-17 November 2020, London.
Anyone confronted with filling in the average government form will likely come across a question where a choice must be selected from a list of alternatives that are difficult to interpret. No doubt the choices were intelligible to the designers of the form, but that is no guarantee that they can be understood by anyone obliged to fill in the form.
It is unfortunate that little is taught about taxonomy, or many other aspects of traditional logic, in the educational systems of the West. However, it is possible to find some interesting articles scattered across the literature. One of them is by Jorge Luis Borges, entitled The Analytical Language of John Wilkins. In this piece Borges claims to produce a taxonomy from a Chinese dictionary called The Celestial Emporium of Benevolent Knowledge. Of course, this is a hoax – no such dictionary ever existed. However, the taxonomy that Borges produces is rather interesting. It purports to classify animals as follows:
|1||Those that belong to the Emperor|
|2||Those that are embalmed|
|3||Those that are tame|
|8||Those included in this classification|
|9||Those that shake as if they were mad|
|10||Those that cannot be counted|
|11||Those drawn with a fine camel-haired brush|
|13||Those that have just broken a jar|
|14||Those that from a distance resemble flies|
Please note that the original essay was in Spanish and that several variants of the taxonomy exist in English. The above version is my translation from the Spanish.
What’s Wrong with This?
Borges’ taxonomy does seem rather exotic, and has sparked a lot of discussion in academic circles. Much of this centers on comparisons of Western and non-Western thought. For instance, Michel Foucault is quoted by Wikipedia as describing his reaction to the taxonomy as:
“…that shattered, as I read the passage, all the familiar landmarks of thought—our thought, the thought that bears the stamp of our age and our geography…”
I disagree. If the academics who made such comments about Borges’ taxonomy had bothered to look into a few of the code tables implemented in databases right here in the USA, they would have discovered examples of taxonomies at least as bizarre. And they would have been told by the creators of these taxonomies just how logical and necessary they were. I remember one code table I reviewed that was for customer credit level, and had entries for “Gold”, “Silver”, “Bronze”, “Employee”, and “Suspended”. How could “Employee” and “Suspended” be in such a table? But they were, and program logic was built around them.
Even in publicly available taxonomies we can find issues that echo what we see in Borges. For instance, here is a classification of financial instruments extracted from a document published by the venerable International Monetary Fund (http://www.imf.org/external/np/sta/bop/pdf/chap5.pdf):
|5||Currency and Deposits|
|7||Other Debt Instruments|
|11||Net Equity in Insurance Technical Reserves and Pension Funds|
|12||Financial Derivatives and Employee Stock Options|
What is a Taxonomy?
If we are to criticize taxonomies – including what Borges produced – we had better know what a taxonomy is. This is difficult to find out because, as noted above, the subject that deals with taxonomies – traditional logic – is hardly taught anymore in the West. However, if we look into texts of traditional logic, we find that taxonomies are formed by breaking a generic concept (a “genus”) into more specific concepts (“species”) that compose it. There are some rules about how this should be done. For instance, “the basis of division must remain constant”, and “the species must exhaust the genus”. This is not the place to get into a treatise on traditional taxonomies, but rather to note that there is a literature about how they should be created and governed.
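The two rules quoted above can be expressed as simple checks against a code table. What follows is only a minimal sketch, not a standard method: the `Species` class, the `basis` labels attached to each entry, and the `tier_for` thresholds are all hypothetical names and values invented for illustration, applied to the customer credit level table described earlier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Species:
    """One entry in a code table, tagged with the basis used to define it."""
    code: str
    basis: str  # hypothetical label for the criterion that sets this entry apart


def bases_used(species):
    """Return the set of distinct bases of division found in a taxonomy."""
    return {s.basis for s in species}


def has_constant_basis(species):
    """Rule: the basis of division must remain constant across all species."""
    return len(bases_used(species)) <= 1


def unclassified(members, classify):
    """Rule: the species must exhaust the genus.

    `classify` maps a member to a species code, or None if nothing fits.
    An empty result means every member of the genus is covered.
    """
    return [m for m in members if classify(m) is None]


# The credit level table from the article. The basis labels are assumptions
# made for this example; the original table did not record them.
credit_levels = [
    Species("Gold", "credit tier"),
    Species("Silver", "credit tier"),
    Species("Bronze", "credit tier"),
    Species("Employee", "relationship to company"),
    Species("Suspended", "account status"),
]


def tier_for(score):
    """Hypothetical scoring rule; scores below 300 are left uncovered."""
    if score >= 700:
        return "Gold"
    if score >= 500:
        return "Silver"
    if score >= 300:
        return "Bronze"
    return None


print(has_constant_basis(credit_levels))             # False: three bases are mixed
print(sorted(bases_used(credit_levels)))
print(unclassified([820, 640, 310, 150], tier_for))  # [150]
```

The hard part in practice is not the check itself but recording the basis of division for each entry in the first place, which is a judgment made by the people who design the taxonomy.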
However, besides the traditional top-down method of forming taxonomies (properly called “logical division”), there is also the bottom-up process of classification. This can be used to group any number of objects according to any need we have. For instance, the ten things I would take out of my house if it were on fire have no commonality among them as such (e.g. the cat, my children’s photographs, my PC, etc.) other than my purpose to prevent them being destroyed by fire. In classification we are not bound by the same rules as logical division. Rather there is some common way in which we deal with the objects in a classification.
Now, Borges’ taxonomy does not make any sense from the top-down perspective. However, it might be more allowable as a classification if some purpose could be found for it. And perhaps that is what Borges is hinting at – not that the universe is so constituted that animals fall into his classification, but that some people have a reason (admittedly unknown) for grouping animals in this way. If we could find the reason, we would understand the taxonomy.
Unfortunately, in information management, taxonomies often seem to be little more than grab bags of concepts thrown together, neither dividing up a general concept, nor aggregating distinct concepts for a specific purpose. Hopefully, as semantics progresses we will see a lot more clarity brought to bear in this area.
Malcolm Chisholm has over 25 years’ experience in data management, and has worked in a variety of sectors, including finance, insurance, manufacturing, government, defense and intelligence, pharmaceuticals, and retail. He is a consultant specialising in data governance, master/reference data management, metadata engineering, business rules management/execution, data architecture and design, and the organisation of Enterprise Information Management. Malcolm is a well-known presenter at conferences in the US and Europe, writes columns in trade journals, and has authored the books Managing Reference Data in Enterprise Databases, How to Build a Business Rules Engine, and Definitions in Information Management. In 2011, Malcolm was presented with the prestigious DAMA International Professional Achievement Award for contributions to Master Data Management. He holds an M.A. from the University of Oxford and a Ph.D. from the University of Bristol.
Copyright Malcolm Chisholm, President, Data Millennium