Working out loud

Introducing Real-Time SIC

In this blog we introduce Real-Time SIC, our new system, and show how it will transform industry classification, improving economic analysis and the accuracy of business data.

Issue with SIC codes 

Standard Industrial Classification (SIC) codes, which are used to categorise businesses by their primary activities, have been criticised for their vagueness, inaccuracies, and misleading classifications. The key issues include:

SIC code assignment

Companies often choose SIC codes out of convenience. A company can select up to four SIC codes, but nothing enforces their accuracy and there are no penalties for getting them wrong. As a result, companies often select codes that don’t accurately describe what they do.

For example, companies such as Facebook and Google UK Limited are classified under the SIC code ‘Other business support services n.e.c.’ (82990).

Google’s headquarters in Silicon Valley

Vagueness of ‘Other’ categories

The ‘Other’ categories in SIC codes are frequently overused and lack specificity, making it challenging to analyse data accurately. For instance, big companies like Amazon and Facebook are classified under ‘other’ business support services. You can read more about it here: SIC Codes are flawed as a basis for industrial classification (thedatacity.com)

These inaccuracies can distort economic analysis. For example, the number of UK companies registered under ‘growing of rice’ has risen significantly since the pandemic, even though you can’t grow rice in the UK.

How can The Data City solve this issue? 

To assign each company to its most appropriate SIC code, many companies need to be reclassified. The Data City will use its machine learning platform to update company classifications to the most appropriate SIC codes, based on what companies say they do on their websites. We call this new SIC classification solution Real-Time SIC (RSIC).

While the concept of using machine learning for SIC reclassification is being tested by ONS Classify AI, The Data City’s approach differs in its input: we will use the comprehensive web text of individual companies in the machine learning process to reclassify them into the correct SIC codes.

To test this method, we selected 20 different five-digit SIC codes from various sectors and retrieved their definitions. We then developed a machine learning list for each SIC code by training the algorithm with examples of companies already correctly classified under those codes or by searching the platform using keywords.
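The idea of learning a SIC code from example companies can be illustrated with a minimal sketch. This is not The Data City's platform or algorithm: it assumes a simple word-count profile per SIC code and cosine similarity as the matching rule, and the seed texts, function names, and example company are all invented for illustration.

```python
import math
import re
from collections import Counter


def tokenise(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_profile(seed_texts):
    """Sum word counts across the seed companies' website text for one SIC code."""
    profile = Counter()
    for text in seed_texts:
        profile.update(tokenise(text))
    return profile


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(website_text, profiles):
    """Return the SIC code whose seed profile is most similar to the text."""
    vector = Counter(tokenise(website_text))
    return max(profiles, key=lambda code: cosine(vector, profiles[code]))


# Hypothetical seed text, invented for illustration only.
seeds = {
    "96020": [
        "hair salon offering cuts colour and beauty treatments",
        "barber and hairdressing studio with nail and beauty services",
    ],
    "62011": [
        "we develop interactive games and entertainment software",
        "studio building immersive interactive leisure software",
    ],
}
profiles = {code: build_profile(texts) for code, texts in seeds.items()}
predicted = classify("family hair salon for cuts and colour", profiles)
```

In this toy setup the unseen salon text is matched to the hairdressing code because it shares far more vocabulary with that code's seed companies. A production system would need much richer text, more examples per code, and a stronger model.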

You can learn more about how we built these machine learning lists here.

To ensure the RSIC’s accuracy, we conducted a manual quality assurance process. This involved checking the URLs and the accuracy of the machine learning results for 30 randomly chosen companies from each SIC code. 
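The sampling step of such a quality assurance check could look roughly like the sketch below. It assumes the classified companies are held in a dictionary keyed by SIC code and that a human reviewer records a true/false verdict per sampled company. The function names, the fixed random seed, and the company identifiers are hypothetical.

```python
import random


def sample_for_qa(companies_by_sic, sample_size=30, seed=0):
    """Draw up to `sample_size` companies at random, without replacement, per SIC code."""
    rng = random.Random(seed)  # fixed seed so the same sample can be re-drawn
    samples = {}
    for sic_code, companies in companies_by_sic.items():
        k = min(sample_size, len(companies))
        samples[sic_code] = rng.sample(companies, k)
    return samples


def accuracy(review_results):
    """Share of reviewed companies that the reviewer marked as correctly classified."""
    return sum(review_results) / len(review_results) if review_results else 0.0


# Hypothetical company identifiers, invented for illustration only.
companies_by_sic = {
    "96020": [f"company-{i}" for i in range(100)],
    "62011": [f"company-{i}" for i in range(12)],
}
samples = sample_for_qa(companies_by_sic)

# Hypothetical reviewer verdicts for four sampled companies.
share_correct = accuracy([True, True, False, True])
```

Sampling the same number of companies from each code keeps the check comparable across codes, and codes with fewer than 30 companies are simply reviewed in full.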

The outcome is the RSIC, which not only updates outdated SIC codes but also utilises the power of machine learning for more precise classification. 

Examples of RSIC that illustrate and validate the process

The following examples illustrate the issues described above and show how our RSIC system addresses them.

Example 1:

The following is the SIC section distribution of companies within the RSIC list for hairdressing and other beauty treatment (96020).

Aesthetics (Leicester) Limited, a hair salon, is classified under ‘Other information services’ (63990), whereas RSIC places it back in the hairdressing SIC code.

This process helps us move hairdressing companies that have been misplaced in irrelevant SIC codes to where they belong.
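A SIC section distribution like the one mentioned for the hairdressing list can be computed by mapping the first two digits of each registered SIC code (its division) to a section letter. The sketch below uses the UK SIC 2007 section ranges. The function names and the list of registered codes are hypothetical and do not reproduce the actual RSIC data.

```python
from collections import Counter

# UK SIC 2007 sections as (letter, first division, last division), inclusive.
SECTION_RANGES = [
    ("A", 1, 3), ("B", 5, 9), ("C", 10, 33), ("D", 35, 35), ("E", 36, 39),
    ("F", 41, 43), ("G", 45, 47), ("H", 49, 53), ("I", 55, 56), ("J", 58, 63),
    ("K", 64, 66), ("L", 68, 68), ("M", 69, 75), ("N", 77, 82), ("O", 84, 84),
    ("P", 85, 85), ("Q", 86, 88), ("R", 90, 93), ("S", 94, 96), ("T", 97, 98),
    ("U", 99, 99),
]


def sic_section(code):
    """Return the section letter for a five-digit SIC code string, or None if unknown."""
    division = int(code[:2])
    for letter, first, last in SECTION_RANGES:
        if first <= division <= last:
            return letter
    return None


def section_distribution(registered_codes):
    """Count how many registered SIC codes fall into each section."""
    return Counter(sic_section(code) for code in registered_codes)


# Hypothetical registered codes for companies on one RSIC list.
registered = ["96020", "96020", "63990", "82990", "96020"]
distribution = section_distribution(registered)
```

Codes are kept as strings so that leading zeros, as in 01120, are preserved. Any company on the hairdressing list whose registered code falls outside section S is a candidate for reclassification.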

Example 2: 

Immersive Reality, a platform that specialises in creating interactive, multi-sensory learning environments using advanced laser projection, gesture control, and sound technology, is currently classified as ‘Other business support service activities’ (82990), but RSIC places it in ‘Ready-made interactive leisure and entertainment software development’ (62011).

Immersive Reality is a platform that specialises in creating interactive, multi-sensory learning environments

Why RSICs and not RTICs?

RTICs have always been the best alternative for mapping emerging sectors where there is no relevant SIC classification, because they involve industry experts in defining sectors more precisely. However, SIC codes remain a widely used and established classification system across various organisations, including the ONS.

RSICs aim to address the limitations of SIC codes by assigning them based on what a company says it does on its website and tracking any changes over time. They provide a balanced approach by combining traditional classification with machine learning. While RSICs cannot replace RTICs, they offer a refined approach to SIC codes by improving their accuracy.

Interested in our approach to SIC codes and want to find out more about RSICs? Feel free to get in touch.

Want to get hands-on with our data and platform? Why not sign up for a free trial today?

About the author