Clarifying terms we use

At The Data City we have to be specific with our terminology because of the speed at which we work. Fortunately, we can make quick decisions when we need to change the platform or respond to our clients; it’s a great benefit of being an agile team at a start-up.

As we’ve grown, we’ve realised we need to step back and help our clients understand what we’re talking about. We use terms like RTIC, Taxonomy and Sector all the time, but we forget these concepts can be tough for our users.

Terms

Real-Time Industrial Classification (RTIC)

We’ve got loads on our website on what an RTIC is (#1, #2), but essentially it’s an industrial classification made up of industry verticals. We use our Taxonomy-based process to build them, and we verify our methods and results with sector experts. Our RTICs are available to all of our users on the platform. Each RTIC comes with a definition and a Taxonomy, so every RTIC goes through the same process. They are built in real time, hence the name!

Taxonomy

Taxonomies are usually used for naming, describing and classifying organisms, but we’ve found they’re the perfect way to build a process for defining Industrial Classifications. Fatima has written some great stuff on the approach we used for CleanTech and Gaming, which describes a Taxonomy in more detail. An example like the one below can often help clarify what we mean. This is a subset of our Net Zero (RTIC) taxonomy:

|  | General Keywords | AgriTech | Building Technologies |
|---|---|---|---|
| Definition |  | Companies developing technologies and providing services transforming dominant/traditional agricultural practices. | Companies providing technology and services for increased energy efficiency in buildings |
| Example 1 |  | http://levitycropscience.com/ | http://www.smartbuildingfocus.co.uk/ |
| Example 2 |  | http://emeraldresearchltd.com/ | https://www.shields.energy/ |
| Keywords | green | “agritech” OR “agri tech” | NEAR(“smart” “sustainable” “construction”,8) |
|  | sustainable | “precision farm”* | NEAR(“sustainable” “Building”,5) |
|  | renewable | “precision agr”* | NEAR(“sustainable” “iot”,5) |
|  | “net zero” | “soil science” | NEAR(“low carbon” “construction”,5) |
|  | “low carbon” | “crop science” | “insulation” |
|  | clean | NEAR(“genet”* “crop”*, 5) | “energy efficiency” |
|  | climate | “harvest tech”* | “retrofit” |

The table above shows 2 of the 16 industry verticals in Net Zero: AgriTech and Building Technologies. We generally have some core keywords which apply to the whole sector, then some keywords for each industry vertical. Each industry vertical also includes a title, a definition and at least two best representative examples. The keywords help to identify companies to add to the training set. A company whose website includes a few of these keywords is a strong candidate for the training set. Companies in the industry vertical are not limited to these keywords, nor do they need to use all of the keywords on their site.
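As a rough illustration of how one Taxonomy column might be represented, and how keyword hits could flag candidate companies, here is a minimal Python sketch. The `IndustryVertical` class, the `keyword_hits` helper and the sample website text are hypothetical and are not the platform's data model. It only handles plain phrases; the `NEAR(...)` and `*` wildcard operators from the table are deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass
class IndustryVertical:
    """One column of a Taxonomy (hypothetical structure, for illustration only)."""
    title: str
    definition: str
    examples: list[str]
    keywords: list[str] = field(default_factory=list)


def keyword_hits(website_text: str, keywords: list[str]) -> list[str]:
    """Return the plain keyword phrases found in the text (case-insensitive).

    This is naive substring matching, so "green" would also match "greenhouse".
    """
    text = website_text.lower()
    return [kw for kw in keywords if kw.lower() in text]


# Core keywords that apply to the whole sector, taken from the table.
general_keywords = ["green", "sustainable", "renewable", "net zero",
                    "low carbon", "clean", "climate"]

agritech = IndustryVertical(
    title="AgriTech",
    definition="Companies developing technologies and providing services "
               "transforming dominant/traditional agricultural practices.",
    examples=["http://levitycropscience.com/", "http://emeraldresearchltd.com/"],
    # Plain-phrase keywords only; NEAR and wildcard queries are omitted.
    keywords=["agritech", "agri tech", "soil science", "crop science"],
)

# Invented website text, purely for demonstration.
sample_text = "We apply crop science and soil science to support sustainable farming."

vertical_hits = keyword_hits(sample_text, agritech.keywords)
general_hits = keyword_hits(sample_text, general_keywords)
print(vertical_hits)
print(general_hits)
```

A website with several hits across both lists would be worth a closer look as a training set candidate, but the hits are only a hint: a person still decides what goes into the training set.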

Industry Vertical

Our RTICs are made up of industry verticals; these are the columns in our Taxonomies. An industry vertical is not limited to one RTIC. For example, one of our industry verticals is titled “Artificial Intelligence: Data Analysis”. Currently it exists within our AI RTIC, but it could just as easily exist within a Data Analysis RTIC. We build our RTICs using a bottom-up approach, which lets industry verticals form relationships (aggregations) with similar industry verticals and so form an RTIC. Within an RTIC, a company may exist in multiple industry verticals because different parts of the business operate in different industries; this is especially prevalent in the emerging economy.
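The bottom-up relationship between industry verticals and RTICs can be sketched as a pair of mappings. This is a hypothetical illustration only: the company names are invented, the "Data Analysis" RTIC is the could-exist example from the paragraph above rather than a real RTIC, and none of the names reflect how the platform stores its data.

```python
from collections import defaultdict

# Hypothetical mapping from each industry vertical to the RTICs that aggregate it.
# A vertical may sit under more than one RTIC.
vertical_to_rtics = {
    "Artificial Intelligence: Data Analysis": ["AI", "Data Analysis"],
    "AgriTech": ["Net Zero"],
    "Building Technologies": ["Net Zero"],
}

# Invented company assignments; a company can sit in several verticals.
company_verticals = {
    "Company A": ["AgriTech", "Building Technologies"],
    "Company B": ["Artificial Intelligence: Data Analysis"],
}


def build_rtics(mapping):
    """Group verticals by RTIC, working bottom-up from the verticals."""
    rtics = defaultdict(list)
    for vertical, rtic_names in mapping.items():
        for name in rtic_names:
            rtics[name].append(vertical)
    return dict(rtics)


def rtics_for_company(company):
    """Return the sorted RTICs a company reaches through its verticals."""
    found = set()
    for vertical in company_verticals.get(company, []):
        found.update(vertical_to_rtics.get(vertical, []))
    return sorted(found)


rtics = build_rtics(vertical_to_rtics)
print(rtics["Net Zero"])
print(rtics_for_company("Company A"))
print(rtics_for_company("Company B"))
```

Starting from the verticals means an RTIC is simply whatever its verticals aggregate to, which is why the same vertical can appear under two RTICs without being duplicated.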

Custom Industrial Classification (CIC)

CICs are custom built using the same Taxonomy process as our RTICs, but they are delivered to a specific set of users rather than released to every customer registered on the platform. We encourage all of our clients to open-source their CICs to all of our users, as we are strong advocates for working in the open; however, we understand our clients may want to protect their classifications.

A machine learning list (built using The Data City’s platform)

A machine learning list is similar to an industry vertical but is not part of an RTIC or CIC. We’ve found that using a Taxonomy (title, description, best representative examples, keywords) still produces a higher-quality ML list, although it is not essential. These lists are easy to update: we just change the training set to reflect the list we’re trying to build. To remove specific companies, we recommend converting the list to a static list.

An Explore (static) list

This is just a list of companies. It can be created by filtering the data, for example companies in Newcastle with more than 250 employees, or by entering a list of company numbers. An Explore list allows the user to quickly add or remove specific companies.
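To make the idea concrete, here is a small Python sketch of a static list built from a filter and then edited by hand. The company records, field names and helper functions are all invented for illustration and do not describe the platform's data or API.

```python
# Invented company records; the field names are assumptions for this sketch.
companies = [
    {"number": "00000001", "name": "Alpha Ltd", "city": "Newcastle", "employees": 300},
    {"number": "00000002", "name": "Beta Ltd", "city": "Newcastle", "employees": 40},
    {"number": "00000003", "name": "Gamma Ltd", "city": "Leeds", "employees": 900},
]


def filter_companies(records, city, min_employees):
    """Company numbers for records in `city` with more than `min_employees` staff."""
    return [c["number"] for c in records
            if c["city"] == city and c["employees"] > min_employees]


def list_from_numbers(records, numbers):
    """Keep only the entered company numbers that exist in the records."""
    known = {c["number"] for c in records}
    return [n for n in numbers if n in known]


# Route 1: build the list from a filter (Newcastle, more than 250 employees).
explore_list = filter_companies(companies, "Newcastle", 250)
print(explore_list)

# Because the list is static, specific companies can be added or removed directly.
explore_list.append("00000003")
explore_list.remove("00000001")
print(explore_list)

# Route 2: build the list from entered company numbers.
entered = list_from_numbers(companies, ["00000002", "99999999"])
print(entered)
```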

Classifier

When we enter companies into a training set and select “Build” on the platform, we’re using machine learning to build a list of companies, each classified as being in the industry vertical or not. A company with a score above 0 has been classified as within the industry vertical. Companies are also ranked by score: the higher the score, the more representative the company is of the companies selected in the training set. The ML algorithm builds a classifier for this process.
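The way scores are read can be sketched in a few lines of Python. The scores and company names below are invented, and the sketch says nothing about how the classifier produces them; it only shows the "above 0" rule and the ranking described above.

```python
# Invented classifier scores, purely for illustration.
scores = {
    "Company A": 2.4,
    "Company B": -0.7,
    "Company C": 0.3,
    "Company D": 0.0,
}


def classified_and_ranked(company_scores):
    """Keep companies scoring above 0, ranked from most to least representative."""
    in_vertical = [(name, score) for name, score in company_scores.items() if score > 0]
    return sorted(in_vertical, key=lambda pair: pair[1], reverse=True)


ranked = classified_and_ranked(scores)
for name, score in ranked:
    print(name, score)
```

Note that a score of exactly 0 is excluded here, since the rule is "above 0".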

Training Set

When we build a list of companies for our industry verticals, we select companies we’re interested in and companies we’re not interested in. Companies whose websites are most representative of the industry vertical should be added to the positives in the training set. Companies whose websites are not representative of the industry vertical should be added to the negatives in the training set. The companies in a training set allow us to train the ML model.
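A training set can be pictured as two groups of websites that become labelled examples for a model. The `TrainingSet` class below is a hypothetical sketch, not the platform's implementation, and using a Building Technologies example as a negative for AgriTech is an assumption made only to show the shape of the data.

```python
from dataclasses import dataclass, field


@dataclass
class TrainingSet:
    """Positives and negatives for one industry vertical (hypothetical structure)."""
    positives: set[str] = field(default_factory=set)
    negatives: set[str] = field(default_factory=set)

    def add_positive(self, website: str) -> None:
        # A website cannot be both a positive and a negative.
        self.negatives.discard(website)
        self.positives.add(website)

    def add_negative(self, website: str) -> None:
        self.positives.discard(website)
        self.negatives.add(website)

    def labelled_examples(self) -> list[tuple[str, int]]:
        """Return (website, label) pairs: 1 for positives, 0 for negatives."""
        pairs = [(w, 1) for w in self.positives] + [(w, 0) for w in self.negatives]
        return sorted(pairs)


agritech_training = TrainingSet()
agritech_training.add_positive("http://levitycropscience.com/")
# Assumption for illustration: a website that is not representative of AgriTech.
agritech_training.add_negative("http://www.smartbuildingfocus.co.uk/")

examples = agritech_training.labelled_examples()
print(examples)
```

Updating an ML list then amounts to changing these two groups and building again.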

Supply Chain vs Ecosystem

Supply chain refers to the infrastructure and actors (companies more often than not) that engage in the production/manufacture/provision of a product or service. For example, the supply chain of bread will group together all companies that engage in producing bread, from the people that farm wheat, to the people that grind it and make flour and those that bake the bread itself. This is a tightly defined concept.

Ecosystem is a looser idea that covers everyone related to a sector, even if they do not explicitly engage in developing a product.

The main difference is the idea of a “chain”, which implies necessary (and often sequential) links amongst the nodes/actors/companies. Ecosystem can group nodes/actors/companies that do not have those necessary linkages amongst themselves.
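One way to picture the difference is to model a supply chain as ordered links and an ecosystem as a plain set of actors. The Python sketch below uses the bread example; the two extra ecosystem members and all function names are invented for illustration, and it assumes each supplier has a single customer.

```python
# Supply chain: necessary, sequential links (from the bread example).
supply_chain_links = [("wheat farmer", "flour mill"), ("flour mill", "bakery")]

# Ecosystem: everyone related to the sector. The last two members are invented.
ecosystem = {"wheat farmer", "flour mill", "bakery", "baking trade body", "food journalist"}


def chain_actors(links):
    """List the actors in the chain, in order of first appearance."""
    actors = []
    for supplier, customer in links:
        for actor in (supplier, customer):
            if actor not in actors:
                actors.append(actor)
    return actors


def is_linked(links, start, end):
    """Follow supplier -> customer links from `start` and report whether `end` is reached."""
    downstream = dict(links)  # assumes one customer per supplier
    current = start
    seen = set()
    while current in downstream and current not in seen:
        seen.add(current)
        current = downstream[current]
        if current == end:
            return True
    return False


print(chain_actors(supply_chain_links))
print(is_linked(supply_chain_links, "wheat farmer", "bakery"))

# Ecosystem members with no necessary link into the chain.
outside_chain = ecosystem - set(chain_actors(supply_chain_links))
print(sorted(outside_chain))
```

Every actor in the chain is also in the ecosystem, but the reverse does not hold: the ecosystem includes members that no link connects.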

About the author

Jack Lewis

Jack moved from a master’s in Geographical Information Systems (GIS) into data analytics. After university he joined the Leeds Institute for Data Analytics internship programme, accelerating his development in using code as a tool to generate accurate data.