How can we uncover complex relationships through visualising relatedness? We explore many visualisation methods and find which ones are the most effective.
In November of last year, Fatima revealed that we have been looking at sector relatedness. This is a key consideration for many of our customers and is also of particular interest to us within The Data City. Understanding the relationships between sectors is essential for understanding collaboration across industries, identifying growth opportunities, mitigating potential risks, highlighting connections in supply chains, and informing policy development. More specifically, exploring the relatedness of our RTICs and their verticals could be key to opening up a never-before-seen picture of the emerging economy.
Important note: all of the charts are interactive, meaning you can drag the elements around to aid your reading of the data. If a chart looks wrong, it might be broken – try reloading the page.
Relationships within The Data City data
Within our database, we hold rich and valuable information regarding such company and sector relationships. But as humans, we just want to see these relationships. We want to visualise the connections and get a feel for the relatedness of sectors in an easily digestible way.
However, the sheer quantity and complexity of the data that we hold makes extracting this information, and presenting it in a palatable format from which the valuable insights outlined above can be identified, a unique and interesting challenge.
Here we outline our workings and document our thinking in developing an approach to sector relatedness. We want an approach which offers a clear and insightful visualisation of the relatedness (demonstrated using our RTICs) and which also facilitates further, crucially targeted analysis to interrogate and learn from the relationships identified.
The Tech Sector as a Case Study
Let’s consider a case study of 8 technology RTICs: AdTech, AgriTech, AI, CleanTech, EdTech, FinTech, FoodTech, and MedTech. These RTICs span 74 verticals and 19,181 distinct companies.
But how do they relate to each other? Does AI overlap the most with FinTech? Is there any relationship between AdTech and EdTech? What can we learn from the relationship between AgriTech and MedTech?
Relationship pairs
We can pull out counts of the overlaps to gain an initial understanding of relatedness.
RTIC | AdTech | AgriTech | Artificial Intelligence | EdTech | CleanTech | FinTech | FoodTech | MedTech |
AdTech | 772 | 2 | 28 | 1 | 1 | 13 | 2 | 10 |
AgriTech | 2 | 852 | 46 | 1 | 143 | 4 | 116 | 19 |
Artificial Intelligence | 28 | 46 | 2,387 | 13 | 111 | 353 | 11 | 149 |
EdTech | 1 | 1 | 13 | 1,453 | 1 | 1 | 2 | 57 |
CleanTech | 1 | 143 | 111 | 1 | 4,454 | 2 | 21 | 15 |
FinTech | 13 | 4 | 353 | 1 | 2 | 5,002 | 32 | 2 |
FoodTech | 2 | 116 | 11 | 2 | 21 | 32 | 2,622 | 11 |
MedTech | 10 | 18 | 149 | 57 | 15 | 2 | 11 | 1,238 |
The diagonal of the table shows, for each RTIC, the count of companies which are classified solely as that RTIC (within this set of RTICs). The other values show the count of companies in each overlapping pair.
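As a rough illustration of how a table like this could be put together, here is a minimal Python sketch. The company names and the `memberships` mapping are made up for the example, and this is not our production code. It assumes that an off-diagonal cell counts every company classified in both RTICs, whether or not that company also sits in a third.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: company name -> set of RTICs it is classified in.
memberships = {
    "Company A": {"AI", "FinTech"},
    "Company B": {"AI"},
    "Company C": {"AgriTech", "CleanTech", "FoodTech"},
    "Company D": {"FinTech"},
    "Company E": {"AI", "FinTech", "MedTech"},
}


def overlap_table(memberships):
    """Return (sole, pairs) counters for a pairwise overlap table.

    sole[rtic]    -> companies classified in that RTIC only (the diagonal)
    pairs[(a, b)] -> companies classified in both a and b (off-diagonal),
                     with a < b alphabetically so each pair appears once
    """
    sole = Counter()
    pairs = Counter()
    for rtics in memberships.values():
        if len(rtics) == 1:
            sole[next(iter(rtics))] += 1
        # Assumption: a company in three RTICs contributes to all three pairs.
        for a, b in combinations(sorted(rtics), 2):
            pairs[(a, b)] += 1
    return sole, pairs


sole, pairs = overlap_table(memberships)
for (a, b), count in sorted(pairs.items()):
    print(f"{a} & {b}: {count}")
```

Because each pair is stored once in alphabetical order, the full symmetric table can be filled in by mirroring the counts across the diagonal.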
Now let’s visualise this information.
A chord plot seems a great place to start when visualising relationship pairs, but in this case, it is not at all clear. Although we can immediately see a chunky connection between AI and FinTech, the majority of each RTIC does not overlap with any other, and this non-overlapping share suffocates everything else and makes any other real insight almost impossible to pick out. If we were to remove it, however, the relative size of each RTIC, which is currently represented by the size of its arc, would become skewed, and the information could become misleading.
This is not the ideal visual solution.
A second approach could be to visualise the data in a network diagram, with the RTICs as the nodes and the relationships between them as the edges. In the example below, the thickness of each edge is weighted by the count of companies assigned to both of the connected RTICs (the overlap), and the size of each node represents the size of its RTIC.
This is a much clearer visual representation of the data and the pairwise connections, which maintains the relative size of each RTIC (as reflected in the size of each node). This presents a more instant picture of the relatedness of this set of RTICs. It is clear from this diagram, for example, that there are overlaps between all RTIC pairs.
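To make the weighting concrete, here is a minimal Python sketch of how node sizes and edge thicknesses could be derived from the counts in the table for four of the RTICs. The scaling functions, their parameters and the pixel values are our own illustrative choices, and we use the sole-classification counts from the diagonal as a stand-in for RTIC size. The output could then be handed to whichever graph-drawing library you prefer.

```python
import math

# Counts taken from the table above for four of the eight RTICs.
# Assumption: the diagonal (sole-classification) count stands in for RTIC size.
sole_counts = {"AgriTech": 852, "AI": 2387, "CleanTech": 4454, "FinTech": 5002}
overlaps = {
    ("AgriTech", "AI"): 46,
    ("AgriTech", "CleanTech"): 143,
    ("AgriTech", "FinTech"): 4,
    ("AI", "CleanTech"): 111,
    ("AI", "FinTech"): 353,
    ("CleanTech", "FinTech"): 2,
}


def node_radius(count, max_radius=40.0):
    """Scale the radius with sqrt(count) so node *area* tracks RTIC size."""
    largest = max(sole_counts.values())
    return max_radius * math.sqrt(count / largest)


def edge_width(weight, min_width=1.0, max_width=12.0):
    """Scale edge thickness linearly with the overlap count."""
    heaviest = max(overlaps.values())
    return min_width + (max_width - min_width) * weight / heaviest


nodes = {name: node_radius(count) for name, count in sole_counts.items()}
edges = {pair: edge_width(weight) for pair, weight in overlaps.items()}
heaviest_pair = max(overlaps, key=overlaps.get)

print("Thickest edge:", heaviest_pair)
for name, radius in nodes.items():
    print(f"{name}: radius {radius:.1f}")
```

Scaling the radius by the square root keeps the largest RTICs from dominating the picture, while the minimum edge width keeps very small overlaps visible.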
But we’re still not 100% satisfied that we are visualising this information to its fullest potential. For example, companies classified in more than 2 of this set of RTICs are not accounted for in the table, in the chord plot, or in the network diagram. None of these approaches can reveal more complex relationships. Whilst we can now see the relationship between AgriTech and MedTech, we cannot see whether there is any relationship between AgriTech, MedTech and FoodTech that is worth exploring, or whether there are any companies working at the intersection of three, four, five, or even all of these technologies. More interestingly still, are there relationships between sets of RTICs that we might never have thought of looking into, but which might reveal themselves through a visualisation of more complex relationships? And what can we learn from any more unusual relationships that might appear?
We need to consider an alternative approach to try and answer these kinds of questions.
So let’s extend our thinking beyond pairs of RTICs.
Complex relationships
Traditionally, “complex relationships”, or relatedness of a higher than pairwise degree (i.e., triplets, quadruplets, etc.), might best be visualised using an UpSet plot. This kind of plot can represent intersections of more than 2 sets, so it has the capacity to show the complex relationships that we are keen to capture, and which are lost in the chord plot and the network diagram explored above.
This could even be ordered by RTIC, for a bit of extra clarity:
Although these do now capture the lost relationships between more than just pairs of RTICs, there is a lot of information in here, and the plots can become quite difficult to interpret.
What’s more, besides the size of the RTICs and the overlaps, not much of the information jumps right out; it takes some rooting around to dig out the gems of insight. In this instance, as we saw with the chord plot, the overwhelming proportion of each RTIC which does not overlap with any other drowns out the other, more interesting overlaps.
Whilst it is possible to customise the plots by re-scaling the overlap counts (maybe logarithmically), by re-ordering some of the information, or by adding colours to pick out groups, for example, we still feel like there is a better approach to be reached.
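For readers who want to see what sits behind such a plot, here is a minimal Python sketch of the two ideas above: counting companies by their exact combination of RTICs, and re-scaling those counts logarithmically. The toy counts are invented, and `log10(1 + n)` is just one possible re-scaling rather than the one used in our charts.

```python
import math
from collections import Counter

# Hypothetical toy counts: exact RTIC combination -> number of companies.
toy_counts = {
    ("AI",): 200,
    ("FinTech",): 300,
    ("AI", "FinTech"): 30,
    ("AI", "FinTech", "MedTech"): 3,
}

# Expand into a company -> set-of-RTICs mapping, as the raw data might look.
memberships = {}
for combo, n in toy_counts.items():
    for i in range(n):
        memberships[f"{'+'.join(combo)} #{i}"] = set(combo)


def exact_intersections(memberships):
    """Count companies by the exact set of RTICs they belong to."""
    return Counter(frozenset(rtics) for rtics in memberships.values())


counts = exact_intersections(memberships)

# Order the bars by degree (singles, pairs, triples, ...) then by size.
rows = sorted(counts.items(), key=lambda kv: (len(kv[0]), -kv[1], sorted(kv[0])))

# Logarithmic re-scaling so large sole-classified groups do not swamp the rest.
log_heights = {combo: math.log10(1 + n) for combo, n in counts.items()}

for combo, n in rows:
    print(f"{' & '.join(sorted(combo)):<24} raw={n:>4} log={log_heights[combo]:.2f}")
```

In this toy example the largest bar is 100 times the smallest on the raw scale but only around 4 times on the log scale, which is the effect we would be after.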
So instead, let’s reconsider the network approach and try something new: complex network diagrams which capture the relationships for more than just pairs.
We’re going to build these up systematically.
Let’s start by thinking of the problem in a different way. Let’s, instead, map the companies allocated to each of these RTICs as nodes in the network alongside the nodes representing RTICs.
NB: The full network has too many edges to render reliably here, but this is a snippet of what it would look like:
Instead of representing the overlaps by the weight of the edges between nodes, in this way, we have created “clouds” (or groups) of companies which are classified by two, or three, or more, RTICs, hence depicting the size of the relationships between RTIC pairs, triplets, quadruplets, etc.
This visualisation could be instantly improved by colouring the RTIC nodes a different colour to the company nodes.
With this idea in mind though, we could improve this diagram by generating a single node representing each of the clouds of company nodes which captures the size of the cloud. This would contain the same information but would be sleeker, thus easier to render and easier to visually process and interpret.
Now it feels like we are getting close to a solution. This nicely depicts the size of each RTIC with the orange nodes and the size of the overlap between RTIC pairs/triples/etc. (the count of companies shared by each unique pair/triple/etc. of RTICs) by the blue nodes. But to further increase the clarity, let’s colour pairs/triples/etc. differently:
And with that, we have something which is clear and concise, which makes it much easier to instantly identify points of interest, and which offers a good guide for where to look in the underlying data for the interesting stories or RTIC relationships which have, up until now, been hidden within the mass of information.
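The collapsing step described above can be sketched in a few lines of Python. This is a simplified illustration rather than the code behind our charts: the toy `memberships` data, the node dictionaries and the colour palette (other than orange for the RTIC nodes) are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical toy data: company name -> set of RTICs it is classified in.
memberships = {
    "Company A": {"AI", "FinTech"},
    "Company B": {"AI", "FinTech"},
    "Company C": {"AI"},
    "Company D": {"AgriTech", "CleanTech", "FoodTech"},
    "Company E": {"FinTech"},
}

# Hypothetical palette: one colour per overlap degree (pair, triple, ...).
DEGREE_COLOURS = {2: "blue", 3: "green", 4: "purple"}


def build_cloud_network(memberships):
    """Collapse each cloud of shared companies into a single sized node."""
    rtic_sizes = Counter()   # total companies per RTIC
    cloud_sizes = Counter()  # companies per exact combination of 2+ RTICs
    for rtics in memberships.values():
        rtic_sizes.update(rtics)
        if len(rtics) >= 2:
            cloud_sizes[frozenset(rtics)] += 1

    nodes = [
        {"id": rtic, "kind": "rtic", "size": n, "colour": "orange"}
        for rtic, n in sorted(rtic_sizes.items())
    ]
    edges = []
    for combo, n in sorted(cloud_sizes.items(), key=lambda kv: sorted(kv[0])):
        cloud_id = " + ".join(sorted(combo))
        nodes.append({
            "id": cloud_id,
            "kind": "cloud",
            "size": n,
            "degree": len(combo),
            "colour": DEGREE_COLOURS.get(len(combo), "grey"),
        })
        # One edge from the cloud node to each RTIC it belongs to.
        edges.extend((cloud_id, rtic) for rtic in sorted(combo))
    return nodes, edges


nodes, edges = build_cloud_network(memberships)
for node in nodes:
    print(node)
```

Each cloud node needs only as many edges as it has RTICs, so the number of edges no longer grows with the number of companies, which is what makes the diagram lighter to render.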
The “So what?” and the “What next?”
This approach not only offers (in our opinion) a superior visualisation; arranging the data in a network format in this way also introduces the opportunity for additional benefits. As Fatima highlighted in her previous discussions, we are now able to apply network analysis to the relationships to further interrogate the relatedness of the set of RTICs (or verticals).
“This way, we can measure the density of the network, identify hierarchal structures and find points of encounter between seemingly unrelated nodes” – Fatima.
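As a small example of the first of these measures, here is a Python sketch of network density for a pairwise RTIC network, using four RTICs and their overlap counts from the table earlier. Density here is the standard share of possible edges that are present; the `min_weight` threshold is our own illustrative addition, not something taken from Fatima's analysis.

```python
# Pairwise overlap counts for four RTICs, taken from the table earlier.
overlaps = {
    ("AdTech", "AI"): 28,
    ("AdTech", "EdTech"): 1,
    ("AdTech", "MedTech"): 10,
    ("AI", "EdTech"): 13,
    ("AI", "MedTech"): 149,
    ("EdTech", "MedTech"): 57,
}


def density(overlaps, min_weight=1):
    """Share of possible RTIC pairs whose overlap is at least min_weight."""
    nodes = {name for pair in overlaps for name in pair}
    possible = len(nodes) * (len(nodes) - 1) // 2
    present = sum(1 for weight in overlaps.values() if weight >= min_weight)
    return present / possible if possible else 0.0


print(density(overlaps))                 # every pair shares at least 1 company
print(density(overlaps, min_weight=20))  # only the stronger overlaps remain
```

With every pair connected by at least one company, the unthresholded density is 1.0 and says little. Raising the threshold to 20 companies leaves half of the pairs, which starts to separate the strong relationships from the incidental ones.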
So this is getting really interesting, but you might be thinking, I could have pulled that information from the platform directly. Of course, that’s true, but working in that way, would we have thought to consider that there might be a relationship between AdTech and MedTech? And more importantly, considering such a visualisation, we can see all of the relationships at once, enabling us to compare and interrogate them at the same time, delving into previously hidden insights and finding answers to questions we might never have thought to ask.
For example, from the network diagram, above, we find that there are 7 companies classified in AI, EdTech, FinTech and MedTech. Have we ever wondered whether any companies specialise in all four sectors, and would we have dug around in the database to find out if it wasn’t highlighted to us in this way?
These companies are:
- Informatica Software Limited
- Xoriant UK Limited
- Intellectsoft Ltd
- Vhl Technical Solutions Limited
- Qburst Technologies UK Ltd
- Great Software Laboratory UK Limited
- Beyondminds (Ai) Ltd
Each of these is a software company delivering software across this range of technologies.
We also see that the largest RTIC triplet is FoodTech, CleanTech and AgriTech, with 63 companies classified in all three of these RTICs. One example is Bayer Cropscience Limited, which is classified as “CleanTech: Agriculture, Forestry and Biodiversity”, “Net Zero: Agritech”, “AgriTech: AgSciences”, and “Food Tech: Agri Tech” (among other RTICs which were not in our Case Study), and which specialises in “effective crop protection products”.
Demonstrating further potential with a little MedTech Case Study
What if we considered the verticals within a single RTIC to understand the relatedness of the companies within the RTIC itself?
Let’s take MedTech, which has a total of 1,460 distinct companies classified across 7 verticals: Advanced Materials, Artificial Intelligence, Extended Reality, Imaging, Monitoring Technologies, Photonics, and Robotics. How inter-related are these verticals within the MedTech sector?
Immediately we see that the MedTech companies classified as “Extended Reality” are not classified in any other MedTech verticals.
Looking at the other verticals, there is still very limited relatedness within this RTIC.
Just 4 companies are classified in more than 2 verticals. All four are classified in “Photonics”, “Imaging” and “Robotics”:
- Oncology Imaging Systems Limited
- Oncology Imaging Limited
- PTW-UK Limited
- PTW Ltd
These are specialist providers of innovative products and solutions used in diagnostic imaging, radiotherapy treatment, and medical physics applications.
Wrapping Up Relatedness
So what have we learned from this? Well, visualising relatedness is difficult. There isn’t always a perfect solution, and it can often get visually messy. Nonetheless, it’s a very worthwhile exercise that helps to unearth deeper insights you might never have considered when working straight from the platform.
Do you think we approached this problem in the right way? Are there any other visualisation methods we should consider? Get in touch! If you’d like to chat relatedness, RTICs or anything else then we’d love to hear from you.