
Creating 200 new industrial classifications. An interview with Alex Craven.

The Data City has published 200 new industrial classifications.

I recently caught up with Alex Craven, one of our founders, to hear what the team has been up to over the summer.

Data City Founder Alex Craven

Tom: So Alex, how’s it been going? A busy summer!

Alex: Ha, yes, we’ve had a really busy time. Our goal was to map new and emerging economic sectors. We’ve now created 200 brand new real-time industrial classifications, which are all on our website. It’s been a real team effort to get them there.

Tom: How did you do it?

Alex: Yeah, so the first step was to agree what the emerging economy is. We started by taking all of the sectors that our clients have asked us to map – people like DCMS [Department for Digital, Culture, Media and Sport], BEIS, the Knowledge Transfer Network and the Catapult Network. We amalgamated those with other sectors that local authorities and LEPs have asked us to review. That gave us a starting point of around forty industries, which we then split down into what we call RTIC groupings. In the end each sector had up to fifteen actual classifications within it – the Net Zero supply chain is one of the largest that we’ve done, with sixteen. The job was to identify good sources of reference material, research, and experts that we could work with.

For each RTIC we needed to correctly identify the sector taxonomy that we should be using and the types of company that it reflected and then classify and build all of those sectors. The process here was to find really good examples of companies in the sector for the machine learning to learn from so that it could find all the other companies that operate in the same industry.

So yes, very busy, and a huge body of work by the team. This September, we’ve launched more than 200 individual classifications covering everything from ad tech to insurtech and clean energy, which are all in the RTIC section of our website.
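As a rough illustration of the idea Alex describes, learning a sector from a handful of good example companies and then finding others like them, here is a minimal sketch. It is not The Data City's method: the interview does not say which machine learning technique is used, so a simple word-count similarity stands in for it. The seed texts, the `classify` function and the `threshold` value are all hypothetical, invented for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def centroid(texts):
    """Sum word counts over the seed texts to form one sector profile."""
    profile = Counter()
    for text in texts:
        profile.update(tokenize(text))
    return profile


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(website_text, sector_profiles, threshold=0.2):
    """Return every sector whose profile is similar enough to the text.

    A company can land in more than one sector, or in none.
    The threshold is an arbitrary assumption, not a tuned value.
    """
    words = Counter(tokenize(website_text))
    scores = {name: cosine(words, prof) for name, prof in sector_profiles.items()}
    return sorted(name for name, score in scores.items() if score >= threshold)


# Hypothetical seed examples, invented for illustration only.
SEEDS = {
    "insurance tech": [
        "telematics insurance platform pricing motor policies from driving data",
        "digital insurance claims software for brokers and underwriters",
    ],
    "clean energy": [
        "solar and wind energy generation with battery storage",
        "renewable energy developer building offshore wind farms",
    ],
}
PROFILES = {name: centroid(texts) for name, texts in SEEDS.items()}

print(classify("We sell insurance policies priced from telematics driving data", PROFILES))
print(classify("insurance for solar energy farms", PROFILES))
```

Unlike a hierarchical code, this kind of scoring lets one company belong to several sectors at once, which matches the point made later in the interview about companies appearing in more than one RTIC.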

Tom: And these are all brand new classifications as well and they’re not covered by SIC, you’ve had to create all of them?

Alex: Yeah, that’s right. So, you know that the SIC system was last updated in 2007, so particularly the tech sector in the UK, or digital sectors in the UK, are really under-represented in SIC. Typically they’re lumped under either ‘ICT and computing’ or ‘other business activities not elsewhere classified’. Not very useful! And obviously, you know, when you look at the sheer number of niches that are now individually identifiable within digital and technology, there are a lot of new classifications needed. The interesting thing is obviously the SIC system is very hierarchical, but actually a lot of these companies aren’t neatly definable in a hierarchical system; under the traditional SIC code view they operate across the service, manufacturing and technology sectors. And actually part of the work that we’re doing is rethinking the way industrial classification works, and then building a system that keeps pace with those sectors as they evolve.

Tom: In effect SIC just can’t keep up with these new sectors that are being invented every single day?

Alex: No, that’s right. And I think as a team we’re increasingly of the opinion that the SIC system shouldn’t be updated. The approach itself is not fit for purpose. It’s not a matter of just producing a load of new classifications, because the first issue you’ve got is that all incorporated businesses that have been trading for a while won’t bother updating their classification. So you’re only going to capture new companies as they form, but also companies when they form aren’t very good at actually selecting the SIC code they operate in. There are a lot of issues with the data – we think more than 50% of businesses are incorrectly classified in SIC based on our work, which is, you know, too many to be useful, we would argue.

Tom: And that must make policy planning or any form of analysis impossible to do accurately?

Alex: We would argue that, and it’s really evidenced in some of the sectors that we do, where the traditional approach might be to pick one two-digit or a few four-digit SIC codes to try and find the sector. Some of the RTICs we’ve created show that companies haven’t been able to find a code that satisfactorily describes what they do, so you end up with over 200 different SIC codes being used in just one sector. They’ve all picked all sorts of stuff, which means most of them are being missed in any analysis.
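The effect Alex describes can be sketched with a few lines of code: when companies in one sector are scattered across many self-selected SIC codes, an analysis that picks only the few commonest codes misses the rest. The function name `sic_coverage` and the ten code values below are hypothetical, made up for illustration, and are not The Data City's data.

```python
from collections import Counter


def sic_coverage(sic_codes, top_n):
    """Return (number of distinct codes, share of companies captured
    when only the top_n most common codes are analysed)."""
    if not sic_codes:
        raise ValueError("sic_codes must not be empty")
    counts = Counter(sic_codes)
    captured = sum(count for _, count in counts.most_common(top_n))
    return len(counts), captured / len(sic_codes)


# Hypothetical self-reported SIC codes for ten companies in one sector.
codes = [
    "62012", "62012", "62012", "62090", "62090",
    "82990", "70229", "63110", "66220", "74909",
]

distinct, share = sic_coverage(codes, top_n=2)
print(f"{distinct} distinct codes; top 2 codes capture {share:.0%} of companies")
```

In this toy sample, choosing the two commonest codes finds only half of the companies, which is the kind of gap the interview is pointing at.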

Tom: So you mentioned real-time industrial classifications. What would people see if they were to go into one of those?

Alex: So, fundamentally, we’ve tried to keep it as near to the experience of SIC as possible, in that there is a list of companies. So if you log into the platform and select ‘insurance tech’, which is one of the RTICs, you will see a list of all the companies that have been classified into insurance tech. You can see the taxonomy we’ve used, and you can actually see the training sets that we’ve used, so you can see how that decision was arrived at. Those insights are all published into the RTIC through our platform, so you can see the sorts of words used in the website text of those businesses, which are part of the reason they were classified. You can also see all the other sectors they’ve been classified into, so you might find insurance tech companies are also, you know, physically installing tracking devices and may have a whole Internet of Things angle to them. So they may appear in other sectors as well. You get all of that insight, and then obviously it’s all joined to the company financials information that you find at Companies House, with some additional data points that we’ve added from either publicly available information and open data sets, or third party data providers.

Tom: It sounds like it’s been incredibly, incredibly busy recently. What can people expect to see from The Data City in the next six months?

Alex: The strategy, first off, was to map the emerging economy. Our mission now is to get our classifications accepted as the normal way of looking at emerging economic sectors. So we’ve got a lot of really exciting work coming out with partners, looking at that data and starting to publish reports and content based on the insights to try and raise awareness of that.

It’s a really important body of work. The ONS (Office for National Statistics) itself estimates about 30% of GDP growth is missed because of SIC codes, and what we’re trying to do is shine a spotlight on technology innovation in the UK properly, so that it can be truly understood and truly supported. I think it’s critical in the UK for COVID recovery, recovering with Brexit.

I mentioned the net zero supply chain work that we’ve done. I think, you know, hopefully that’s an extremely important piece of work, because working out what we’ve already got here to help us decarbonise our economic activity has got to be a critical, critical piece of information, you’d hope. We might get to make a real difference.

Alex Craven was interviewed in Leeds on Thursday 9th September 2021.

 
