Classifying companies for the first time, in real time

Working in consultation with the UK Department for Business, Energy & Industrial Strategy (BEIS) and the Knowledge Transfer Network (KTN), The Data City is creating a new standard that we call Real-Time Industry Classification. Using ground-breaking technology and the input of experts from a range of specialisms and industries, we are creating taxonomies for a range of new and emergent business sectors. These taxonomies increase the volume of accurately classified organisations by up to 95% compared to current methods.

Our Data City Explorer is an automatic classifying tool which uses machine learning to facilitate the analysis of economic sectors through company list building. In other words, it is an instrument which identifies and maps organisations working in a field.

Its central methodological approach is supervised machine learning, integrating the best aspects of traditional methods and the newest technological developments. Supervised machine learning uses training sets defined by the user to refine algorithms that mine and query company data.

Supervised machine learning therefore combines the benefits of manual search strategies, in that a user always controls the search process, with those of automated technology. With Data City Explorer, the user creates and refines training sets based on keywords, and an algorithm uses them to obtain a comprehensive list of relevant companies.

The first step is to decide what data is needed. This is an important decision that informs the creation of training sets and the overall size and quality of business and sector lists.

First: selecting keywords

Second: adding relevant companies to the training set

Confirmation of companies used to build the list

The above images showcase the initial steps to build a company list. First, the user enters relevant keywords that the software uses to identify companies. Second, the user selects the companies relevant to their approach. Then, Data City Explorer creates a list.
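The three steps above can be sketched in a few lines of Python. This is an illustrative sketch only, not the Data City Explorer implementation: the `Company` and `TrainingSet` classes, the `keyword_candidates` function and the sample companies are all hypothetical, and simple substring matching stands in for whatever matching the platform actually performs.

```python
from dataclasses import dataclass, field


@dataclass
class Company:
    name: str
    website_text: str  # text the company publishes about itself


@dataclass
class TrainingSet:
    keywords: list
    included: set = field(default_factory=set)  # names the user confirmed as relevant
    excluded: set = field(default_factory=set)  # names the user rejected


def keyword_candidates(companies, keywords):
    """Step 1: return companies whose text mentions at least one keyword."""
    lowered = [k.lower() for k in keywords]
    return [c for c in companies if any(k in c.website_text.lower() for k in lowered)]


# Hypothetical sample data, invented for this example.
companies = [
    Company("Alpha Optics", "We design photonics components and laser systems."),
    Company("Beta Bakery", "Artisan bread and laser-cut cake toppers."),
    Company("Gamma Freight", "Road haulage across the UK."),
]

training = TrainingSet(keywords=["photonics", "laser"])
candidates = keyword_candidates(companies, training.keywords)

# Step 2: the user reviews each candidate and confirms or rejects it.
training.included.add("Alpha Optics")
training.excluded.add("Beta Bakery")

# Step 3: the first list keeps the candidates the user has not rejected.
first_list = [c.name for c in candidates if c.name not in training.excluded]
print(first_list)
```

Keyword matching alone also picks up the bakery, because its text mentions "laser". The user's confirmation step is what removes it from the list.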

Finally, filtering the results allows us to target specific keywords and exclude others, so that we can select exactly the organisations we want. This allows us to identify companies within specific taxonomies or industry categories, and also to build highly accurate classification lists based on what each company identified by the classifier declares about itself on its website.
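As a rough illustration of this kind of filtering, the sketch below keeps a company only if its text mentions every included term and none of the excluded ones. The `filter_companies` function, its "all included, no excluded" rule and the sample descriptions are assumptions made for the example, not the platform's actual behaviour.

```python
def filter_companies(descriptions, include=(), exclude=()):
    """Keep companies whose text mentions every `include` term and no `exclude` term."""
    include = [t.lower() for t in include]
    exclude = [t.lower() for t in exclude]
    kept = []
    for name, text in descriptions.items():
        text = text.lower()
        if all(t in text for t in include) and not any(t in text for t in exclude):
            kept.append(name)
    return kept


# Hypothetical website descriptions, invented for this example.
descriptions = {
    "Alpha Secure": "Cyber security consultancy offering penetration testing.",
    "Beta Training": "Cyber security training courses and recruitment.",
    "Gamma Robotics": "Industrial robotics and automation.",
}

result = filter_companies(
    descriptions,
    include=["cyber security"],
    exclude=["recruitment"],
)
print(result)
```

Here the exclusion term drops the recruitment-focused company even though it matches the included term.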

View of the first list

Companies included/excluded in the training set

Score created by the algorithm. It communicates how well the company matches the keywords and training set.
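The scoring method used by the platform is not described here, so the following is only a sketch of one way such a score could be computed. It assumes single-word keywords and blends two signals with equal weight: the share of keywords found in a company's text, and how much closer that text is (by bag-of-words cosine similarity) to the included training examples than to the excluded ones. The function names, the weighting and the sample texts are all hypothetical.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Split text into lower-case word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def score(text, keywords, included_texts, excluded_texts):
    """Return a score in [0, 1]; higher means a closer match."""
    words = Counter(tokens(text))
    # Share of (single-word) keywords that appear in the company's text.
    keyword_hits = sum(1 for k in keywords if k.lower() in words) / len(keywords) if keywords else 0.0
    # Pool the training examples into one word-count vector per class.
    included = Counter(w for t in included_texts for w in tokens(t))
    excluded = Counter(w for t in excluded_texts for w in tokens(t))
    similarity = max(0.0, cosine(words, included) - cosine(words, excluded))
    return 0.5 * keyword_hits + 0.5 * similarity


# Hypothetical keywords and training examples, invented for this example.
keywords = ["photonics", "laser"]
included_texts = ["We build photonics sensors and laser modules."]
excluded_texts = ["Bakery selling laser-cut cake toppers."]

relevant = score("Laser and photonics systems for industry.", keywords, included_texts, excluded_texts)
irrelevant = score("Family bakery with cake toppers.", keywords, included_texts, excluded_texts)
print(relevant > irrelevant)
```

Sorting companies by a score like this puts the closest matches to the keywords and training set at the top of the list.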

This methodology, applied to 20 fast-moving, high-priority sectors, means that we can identify almost twice as many organisations as current methods and estimates do. It also provides a range of more granular data sets (including sector specialisms, market focus, financial data and more) for better sector analysis and engagement.

These sectors include:

  • Cyber Security
  • Artificial Intelligence and Machine Learning
  • Targeted Therapeutics
  • Photonics
  • Augmented Reality and Immersive Content
  • Advanced Manufacturing and Advanced Materials
  • Autonomous Vehicles
  • Robotics and Automation
  • And many more.

To find out more about the platform, the process or the value of this tool to your organisation, join our free webinar, which we are hosting as part of the Leeds Digital Festival. We will be showcasing the technology and offering 20 licences (free of charge) to Local Authorities and LEPs in support of their work around their Local Industrial Strategies.

Find out how to reserve a place.
