We are all about text data at The Data City. We use text to classify companies into our Real-Time Industrial Classifications (RTICs), the heart of our work.
But it does not stop there. We also analyse the text of company websites to find prevalent keywords and key phrases for groups of companies in comparison to the “average UK company”.
We are the only ones offering this type of insight, and we want to tell you about it. In this article, we explain what this insight is and give an example of further analysis you can apply to the output data.
Let’s take a look at the Green Economy.
To do that, we selected a set of RTICs that represent this space: Cleantech, Energy Generation (excluding fossil fuels and nuclear), Energy Storage (excluding UPS) and the Net Zero supply chain. This gave us 28,455 unique companies with a matched website.
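A minimal sketch of that selection step follows. It assumes that each RTIC is available as a set of company IDs and that website matches are a separate set of IDs. Both structures, and the function name `build_company_list`, are hypothetical and do not describe the platform's data model. The list is the union of the RTIC memberships, so a company in several RTICs is counted once, intersected with the companies that have a matched website.

```python
def build_company_list(rtic_members: dict[str, set[str]], website_match: set[str]) -> set[str]:
    """Union the selected RTICs (a company in several RTICs counts once),
    then keep only companies with a matched website."""
    selected: set[str] = set()
    for members in rtic_members.values():
        selected |= members
    return selected & website_match


# Hypothetical toy data: RTIC name -> company IDs, plus IDs with a matched website.
rtics = {
    "Cleantech": {"c1", "c2", "c3"},
    "Energy Generation": {"c2", "c4"},
    "Energy Storage": {"c4", "c5"},
    "Net Zero supply chain": {"c1", "c6"},
}
websites = {"c1", "c2", "c4", "c5", "c7"}

green_economy = build_company_list(rtics, websites)
print(sorted(green_economy))  # ['c1', 'c2', 'c4', 'c5']
```

Deduplication happens through the set union, which is why the count is of unique companies rather than RTIC memberships.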
Our platform analyses the website text of these companies and finds the keywords that are over-represented on their websites compared with the average UK company. Let’s look at the results and explain them in more detail.
The bar chart above shows the first 15 of the 147 keywords in the output.
The data reads as follows: companies in our Green Economy list are 1,888 times more likely than the average UK company to use the key phrase “vehicle to grid”.
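The precise calculation behind the platform's figures isn't covered in this article. For illustration only, one common way to compute an over-representation ratio is shown here. Take the share of group websites that mention a term and divide it by the share of baseline websites that mention it. Additive smoothing keeps the ratio finite when the baseline never uses the term. The function name, the smoothing constant and the toy texts are all assumptions.

```python
import re


def over_representation(
    group_docs: list[str],
    baseline_docs: list[str],
    terms: list[str],
    smoothing: float = 1.0,
) -> dict[str, float]:
    """For each term, divide the share of group documents that mention it by the
    share of baseline documents that mention it."""

    def share(docs: list[str], term: str) -> float:
        # Whole-word, case-insensitive match of the term (which may be a phrase).
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        hits = sum(1 for doc in docs if pattern.search(doc))
        # Additive smoothing: avoids division by zero when a term never appears.
        return (hits + smoothing) / (len(docs) + 2 * smoothing)

    return {term: share(group_docs, term) / share(baseline_docs, term) for term in terms}


# Hypothetical website snippets for a group and a baseline of companies.
green = [
    "We build vehicle to grid chargers",
    "Vehicle to grid and solar storage",
    "Solar panel installation",
    "Heat pumps and solar",
]
baseline = [
    "Accountancy services",
    "Solar panel cleaning",
    "Hair salon",
    "Bakery and cafe",
    "Vehicle repairs",
    "Estate agents",
]

ratios = over_representation(green, baseline, ["vehicle to grid", "solar"])
# Rank terms by over-representation, as in a top-N bar chart.
for term, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{term}: {ratio:.2f}x")
# vehicle to grid: 4.00x
# solar: 2.67x
```

With a large baseline in which a niche phrase is rare, a ratio of this kind can reach the thousands.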
This data helps us identify the dominant practices, technologies and narratives across the companies in a list.
You can get this data directly from the ANALYSE section of our platform for any group of companies.
Using keywords to discover topics
Our analysis returns a large number of keywords: 147 in this case.
A quick look at the list reveals recurring themes, such as energy generation and renewables, but we can apply further analysis to find the main topics across all 147 keywords.
Topic analysis is a method for dissecting text data to uncover its main themes and subjects.
Although it is typically applied to larger datasets, we found it useful for understanding the common themes across these 147 terms.
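The article doesn't name the topic-analysis method behind its results. The sketch here is a deliberately simple, dependency-free stand-in and is not presented as the method behind the topics in this article. It groups keywords that tend to appear on the same company websites. For each pair of keywords, it computes the Jaccard similarity between the sets of documents mentioning them and links the pair if that similarity reaches a threshold. Each connected group of keywords is then treated as a topic. The function name `keyword_topics`, the 0.5 threshold and the toy texts are assumptions.

```python
import re
from itertools import combinations


def keyword_topics(keywords: list[str], docs: list[str], threshold: float = 0.5) -> list[list[str]]:
    """Group keywords into 'topics': link pairs whose document sets overlap with
    Jaccard similarity >= threshold, then return the connected components."""
    # Indices of the documents that mention each keyword (whole-word, case-insensitive).
    mentions = {
        kw: {i for i, doc in enumerate(docs)
             if re.search(r"\b" + re.escape(kw) + r"\b", doc, re.IGNORECASE)}
        for kw in keywords
    }

    # Union-find over keywords to build connected components.
    parent = {kw: kw for kw in keywords}

    def find(kw: str) -> str:
        while parent[kw] != kw:
            parent[kw] = parent[parent[kw]]  # path halving
            kw = parent[kw]
        return kw

    for a, b in combinations(keywords, 2):
        union_size = len(mentions[a] | mentions[b])
        if union_size and len(mentions[a] & mentions[b]) / union_size >= threshold:
            parent[find(a)] = find(b)

    groups: dict[str, list[str]] = {}
    for kw in keywords:
        groups.setdefault(find(kw), []).append(kw)
    return list(groups.values())


# Hypothetical website snippets and keywords.
docs = [
    "Carbon capture and storage for industrial emitters",
    "Direct air capture and carbon capture projects",
    "Tidal and wave energy developers",
    "Wave and tidal stream turbines",
    "Flood risk assessment and flood defence",
]
keywords = ["carbon capture", "air capture", "tidal", "wave", "flood"]

print(keyword_topics(keywords, docs))
# [['carbon capture', 'air capture'], ['tidal', 'wave'], ['flood']]
```

Probabilistic topic models such as LDA would instead assign each keyword a weight in every topic. The co-occurrence grouping used here is simpler to inspect, but each keyword ends up in exactly one group.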
We found 5 main topics:
- Topic 1 references carbon capture technologies
- Topic 2 points towards energy management
- Topic 3 suggests a focus on climate change and its effects on the land (for instance, the prevalence of the term “flood”)
- Topic 4 focuses on clean power (with a notably high over-representation of keywords related to hydropower, such as “tidal” and “wave”)
- Topic 5 shows links with other sectors (like waste management, finance or smart buildings)
A qualitative reading of these terms leads to similar conclusions.
However, in research it is always important to triangulate findings by applying different methodologies, so that we can be more confident in the results.
In this case, our quantitative analysis of the text data supports the qualitative interpretation of the sector keyword enrichment output.
Text data is fundamental to The Data City’s technology and insights. It has allowed us to create the methods we use to classify companies into RTICs, and it also gives us more insight into what companies do.
If you are interested in the type of insights we get from text data or you have research questions for which topic analysis can be relevant, we’d love to hear about it.
Want to dig deeper into the UK’s Green Economy? Sign up for a free trial to see our data in action.