When Tom, Paul & I decided to create The Data City, it was to prove that it was possible to provide data that was fit for modern industry. At the heart of this was a core problem: building lists of businesses was time-consuming and expensive, often resulting in poor lists and flawed analysis.
We broke the problem down into 4 main points:
SIC codes are limited.
- Businesses rarely update them even when their activities change (for instance, some digital document management businesses are still classified as printers despite having shut down their printing presses decades ago).
- The actual SIC codes don’t cover key areas of industry and can’t be updated frequently enough. This is a particular problem for emerging economy sectors such as FinTech, MedTech and AI, which have no SIC code.
- SIC codes are too crude a definition for many use cases (a simple example: I’d like a list of companies that provide managed technology services to the legal sector).
Registered addresses are problematic.
Most industrial analysis or business list building starts with Companies House data, which only holds the registered address of the business. This causes a number of problems:
- The registered address is not necessarily the trading address of the company. Unfortunately, many businesses use their accountant or company registration provider as their registered address, which is often not in the same city as their base of operations. In fact, in the UK there are 603,752 businesses at addresses with 400 or more registered businesses (https://odileeds.org/blog/2019-08-01-offices-of-multiple-occupation), which causes a lot of problems and inaccuracy.
- Many businesses have more than one office/location. When the registered address is used, the entire economic footprint of a business is credited to that one address. For example, Tata Steel, incorporating Corus Steel Ltd and British Steel Ltd, is registered to 30 Millbank, London, SW1P 4WY (https://beta.companieshouse.gov.uk/company/02280000), an office block where a tiny fraction of its operations are based but where many pieces of analysis will show its entire industrial footprint.
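As a rough illustration of the first point, addresses shared by many companies can be flagged by simply counting registrations per address. This is a minimal sketch, not our production method: the record layout, the field name `registered_address`, the function name and the sample companies are all hypothetical, and only the default threshold of 400 comes from the figure quoted above.

```python
from collections import Counter


def flag_shared_addresses(companies, threshold=400):
    """Return the set of registered addresses used by `threshold` or more companies."""
    counts = Counter(c["registered_address"] for c in companies)
    return {address for address, n in counts.items() if n >= threshold}


# Hypothetical records for illustration only; these are not real companies.
companies = [
    {"name": "Alpha Ltd", "registered_address": "1 Formation House, London"},
    {"name": "Beta Ltd", "registered_address": "1 Formation House, London"},
    {"name": "Gamma Ltd", "registered_address": "1 Formation House, London"},
    {"name": "Delta Ltd", "registered_address": "7 Mill Lane, Leeds"},
]

# A tiny threshold is used here so the toy data triggers the flag.
shared = flag_shared_addresses(companies, threshold=3)

# Companies whose registered address probably says little about where they trade.
unreliable = [c["name"] for c in companies if c["registered_address"] in shared]
print(unreliable)
```

A flag like this only tells you which registered addresses not to trust for location analysis; it does not tell you where the business actually trades.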
Companies are poor at describing themselves in their accounts.
The description of the company’s activities as provided in the annual accounts is used for keyword-based searches to find businesses in non-standard sectors. Unfortunately, this description (often written by accountants) rarely describes the business’s activity in enough variety or detail, meaning many keyword-based searches miss large numbers of businesses operating in an area of industry (see example below for DPS Software Ltd).
The ‘problem solution’
Interviewing users of business data in both the public and private sector, we observed a common workaround being employed. Analysts were resorting to a long, drawn-out process to try to build their target list, which we will illustrate by example:
- ‘Matt’ wants to build a list of businesses that supply technology to the legal sector, including their financial information, a sector he refers to as ‘LegalTech’. He looks but there is no SIC code for this sector, so he resorts to selecting SIC codes relevant to IT & Computing despite knowing only a small percentage of these will provide the technology or services he is looking for.
- Matt then performs a number of keyword searches on the company information he has been provided. The text he is searching is the company description provided with the company’s annual accounts, and unfortunately many companies provide a very brief description here. For instance, the company DPS Software (https://www.dpssoftware.co.uk/), whose ‘tag line’ is ‘Legal Technology Experts’, provide the following description in their accounts (as a PDF)
DPS do not appear in Matt’s search and he is unable to identify the company in his list.
- Matt is already aware of DPS, so is disappointed not to be able to find them this way. Matt now resorts to a series of Google searches, finding businesses in the search results, visiting their websites and then looking for their details on Companies House to match them to their company financials. Matt adds these businesses to the list he has generated from his search of Companies House-based data. It is a long and laborious process, and one he knows is likely to fail to find the whole sector.
- Eventually the time the process is taking outweighs the benefit of continuing, so Matt stops building his list and ‘makes do’.
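The gap Matt runs into can be sketched in a few lines of code: the same keyword search finds nothing in a terse accounts description but does find the company when run against the text of its website. This is a toy illustration under stated assumptions and not our Machine Learning process; the record fields (`accounts_description`, `website_text`), the keyword list and both example companies are hypothetical.

```python
def keyword_search(records, field, keywords):
    """Return names of records whose `field` text contains any keyword (case-insensitive)."""
    hits = []
    for record in records:
        text = record.get(field, "").lower()
        if any(keyword.lower() in text for keyword in keywords):
            hits.append(record["name"])
    return hits


# Hypothetical companies and text for illustration only.
records = [
    {
        "name": "Example Legal Software Ltd",
        "accounts_description": "Development of software.",
        "website_text": "Practice management and case management software for law firms.",
    },
    {
        "name": "Example Print Ltd",
        "accounts_description": "Printing services.",
        "website_text": "Digital document management for businesses.",
    },
]
keywords = ["law firm", "legal", "solicitor"]

# Only the chosen field is searched, not the company name.
from_accounts = keyword_search(records, "accounts_description", keywords)
from_websites = keyword_search(records, "website_text", keywords)

print(from_accounts)
print(from_websites)
```

With this toy data the accounts search returns nothing, while the website search returns the one company that actually serves the legal sector, which is the same pattern Matt sees with DPS.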
The observation of this process was a key moment for us. In a dream world Matt would have thousands of helpers performing Google searches and matching the websites they found to their company information. This process would overcome all 4 of the problems outlined but was clearly too expensive to operate. The perfect list builder would be a technology solution that overcame this cost limitation, and this became our mission.
We have built a platform based on a Machine Learning process that overcomes all four of these problems, quickly and cost-effectively. We think it’s the best list builder in the world, but we’d like you to be the judge of that!
So what have we built?
The Data City List Builder – Out of Beta!
Primarily it is the ability to build the best (most complete, most accurate against our problem statements) list of businesses that comprise any sector. When we say any sector, that’s exactly what we mean. For instance, a number of our customers told us that they wanted a list of Cyber security companies but struggled to build a list they were happy with, primarily because there was no SIC code for the sector. A recent government report from DCMS had identified 1221 companies active in the sector, but it was suggested that this was intuitively ‘too’ small. Could our technology prove or disprove this intuition?
Following the same taxonomy and sector component approach as the DCMS report, our technology identified over 4000 businesses in the sector: a huge difference and, importantly, a list that will now update itself as the sector evolves. The live product now includes:
Sector/list builder
- use our Machine Learning to build a list of businesses in ANY sector you can conceive
- identify ANY supply chain
- lists are ‘subscribed’ to and update with our data
- track sectors over time
RTIC codes (our Real-Time Industrial Classifications)
Gain access to lists we have built for you for the most interesting areas of the emerging economy (live, coming soon and always being added to, feel free to send us your suggestions):
- Cyber
- FinTech
- EdTech
- LegalTech
- MedTech
- HealthTech
- Internet of Things (IoT)
- Artificial Intelligence (AI)
- Software as a service (SaaS)
- Mobility
- Electric and Autonomous Vehicles
- Virtual and Augmented Reality
- 3D Printing and Advanced Manufacturing
- Blockchain
- Circular and Sustainable Economies
- GovTech and Policy Development
- Green Energy and Waste to Energy
- Photonics
- Aeronautics
- And more… get in touch if you’d like us to build one for you
A wide range of data points (over 30 now)
- Company details including financials
- ‘Website description’
- ‘URLs’
- ‘Registered & trading address(es)’
- ‘SICs’
- ‘OECD cities’
- ‘Local authorities’
- ‘LinkedIn’, ‘Facebook’, ‘Twitter’, ‘Instagram’, ‘YouTube’
- ‘Telephone’
- ‘Email’
- …plus more
Sector insights
We’ve launched version 1 of our sector/list insights, an automated page of key insights on the list or sector that you have created on our platform. You can quickly see where the sector is clustered, how it tracks back to SIC codes, sector keywords and more.
Sector reports
We’re also working on the semi-automated production of sector reports. Our first is on the Cyber security sector and is available for purchase here. It builds upon the list insights and comes as a downloadable PDF.
Going global
We worked hard to build a scalable technology platform. We are now looking to expand our platform globally, enabling local markets to enjoy the benefits of our list builder and enabling international comparisons for our UK client base.
How Covid-19 has affected us
We are glad to say that so far our team have been unaffected apart from observing the lockdown; we already operated a virtual office model with total flexibility for home working, so the adjustment required no real changes for us. In fact, if anything we’ve been even busier, particularly as we have joined a powerful Alliance of organisations led by Rolls-Royce (https://emer2gent.org/ & https://www.rolls-royce.com/media/press-releases/2020/16-04-2020-rr-establishes-new-covid-19-data-alliance.aspx) to try to help track the impact of Covid-19 on the economy and help with decision making around the recovery process. This is all being done on a voluntary basis and will be scaled globally over the coming weeks.
We’re going to run a weekly Covid-19 tracker that uses over 800,000 websites to show the impact and (hopeful) recovery by analysing their Covid-19 statements and then linking them to our data on their sector, financials, employees etc. The output will be published as open data to the Emer2gent Alliance platform. We’ll be running this throughout the crisis, starting in the UK, but are working to extend it across key global economies (you can read more here).