Where do UK businesses operate? It’s a simple question with a simple answer. We don’t know.
A good place to start is the Companies House register of all UK companies.
The first problem is that 3.5 million of the UK’s nearly 6 million active businesses are sole proprietorships. These businesses aren’t registered with Companies House.
The second problem is that, for the remaining 2.5 million active companies, we only know their registered address. Many businesses are registered at their accountant’s address or with a company registration service and do not operate at that location. Many others operate at several locations, but in the UK each is registered at only one.
Excellent work by our friends at ODILeeds in 2019 documented the scale of this problem. Around half a million companies are registered at just 400 addresses in the UK. All of those addresses have more than 400 businesses registered at them. A single address in central London has over 35,000 companies registered at it. One in Warrington has over 10,000.
We are certain that 35,000 companies do not in any meaningful sense operate out of a single address. We consider it implausible that 400 companies genuinely operate out of any single address. But there is no way to know where to set the cut-off. A large co-working space may genuinely host dozens of companies.
Lowering the cut-off gives us a further glimpse of the scale of the problem. There are 28 thousand unique addresses in the UK with 10 or more businesses registered at them, and a total of 1.4 million companies are registered at those addresses. The distribution has a tail that would make a spider monkey jealous.
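The cut-off analysis above comes down to counting companies per registered address and summing over the addresses at or above a threshold. A minimal Python sketch, assuming each company has already been reduced to one cleaned registered-address string (real Companies House addresses need normalising first); the sample data are made up.

```python
from collections import Counter

def address_concentration(registered_addresses, cutoff):
    """Count addresses with at least `cutoff` companies, and the companies registered at them."""
    counts = Counter(registered_addresses)  # companies per distinct address string
    busy = {addr: n for addr, n in counts.items() if n >= cutoff}
    return len(busy), sum(busy.values())

# Made-up sample: one registered address per company.
sample = ["1 High St"] * 12 + ["2 Mill Lane"] * 3 + ["9 Park Rd"]

for cutoff in (2, 10):
    print(cutoff, address_concentration(sample, cutoff))
# 2 (2, 15)
# 10 (1, 12)
```

Because the function uses "at least", a "more than 400" cut-off corresponds to `cutoff=401`.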
Does this matter? In normal times, maybe not. But these are not normal times.
In response to the Covid-19 lockdown, the UK government announced the Small Business Grant Fund (SBGF) to support small and rural businesses in England with their business costs. The scheme is open to businesses in England that pay business rates and receive either small business rate relief or rural rate relief.
You might see the problem now. Business rates are a tax on business premises, but most UK businesses operate out of shared premises, or out of different premises from the ones where they are registered. With so many businesses unable to prove their eligibility for the SBGF, the UK government announced a supplementary scheme to help them.
The Top-up to local business grant funds scheme provides additional money directly to local governments in England. They are already using their local data and expertise to find and support eligible businesses who cannot prove their eligibility for the national scheme. For the past two weeks The Data City has been helping them. Our method aims to find all businesses in the UK that operate out of premises of multiple occupation (PMOs) and that may therefore be ineligible for the national SBGF scheme.
Local governments set us four challenges:
- Identify businesses operating out of PMOs in every local authority (LA) of the UK.
- Use this data to calculate the fair split of funding for each LA in England (the scheme is devolved and the other three nations of the UK have their own equivalents).
- Add additional data such as websites, social media, email addresses, and phone numbers to as many companies as possible to make contacting them easier.
- Incorporate local data where available to improve the quality of the output from the first and third challenges.
This is how we’ve achieved all four.
The Data City Solution
Key to our approach is the database that we’ve been building since 2017. One table in that database matches registered companies to their websites. Today we have websites for over 800,000 of the UK’s 1.1 million active registered companies with employees. On average we have four websites for every company and we take snapshots once a month. So we have monthly snapshots of three million websites that we can search for any content. When we find something we can link it back to a registered company in the UK.
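A minimal sketch of the linking idea, assuming the company-to-URL table and one snapshot of page text are held in plain Python dictionaries (the company numbers, domains and text are invented). Inverting the table into a website-to-companies index lets any search hit on a snapshotted page be traced back to the registered companies behind it.

```python
from collections import defaultdict

# Hypothetical company-to-URL table: company number -> websites linked to it.
company_sites = {
    "01234567": ["widgets.example", "widgets-shop.example"],
    "07654321": ["acme-design.example"],
}

# Hypothetical monthly snapshot: website -> extracted page text.
snapshot = {
    "widgets.example": "Visit us at Unit 4, Mill Lane, Leeds LS1 1AA",
    "widgets-shop.example": "Free delivery across the UK",
    "acme-design.example": "Our studio: Unit 4, Mill Lane, Leeds LS1 1AA",
}

def invert(company_sites):
    """Build website -> set of company numbers, so a hit on a site links back to companies."""
    site_to_companies = defaultdict(set)
    for number, sites in company_sites.items():
        for site in sites:
            site_to_companies[site].add(number)
    return site_to_companies

def companies_mentioning(phrase, snapshot, site_to_companies):
    """Company numbers whose snapshotted websites contain `phrase` (case-insensitive substring)."""
    needle = phrase.lower()
    hits = set()
    for site, text in snapshot.items():
        if needle in text.lower():
            hits |= site_to_companies.get(site, set())
    return hits

print(sorted(companies_mentioning("unit 4, mill lane", snapshot, invert(company_sites))))
# ['01234567', '07654321']
```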
The basics of our algorithm are simple.
- We get a list of every address in the UK.
- We search for each of those addresses in the registered address and the website content of every company in the UK.
- For each local authority, we create a spreadsheet of all companies at every address that has more than one company operating from it.
- We add the extra information in our database such as email addresses, websites, social media and phone numbers to that spreadsheet.
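The four steps can be sketched in Python as a brute-force loop. This is an illustration under stated assumptions, not The Data City's code: the input shapes (`addresses` mapping each address to its local authority, `companies` holding each company's registered address, website text and contact details) and the plain case-insensitive substring match are hypothetical simplifications.

```python
from collections import defaultdict

def find_pmos(addresses, companies):
    """Group, by local authority, every company at an address shared by more than one company.

    addresses: {address: local authority}  (step 1: the list of every address)
    companies: {company number: {"registered": str, "web_text": str, "contacts": dict}}
    """
    by_la = defaultdict(list)
    for address, la in addresses.items():
        needle = address.lower()
        # Step 2: look for the address in each company's registered address and website text.
        matched = sorted(
            number
            for number, c in companies.items()
            if needle in c["registered"].lower() or needle in c["web_text"].lower()
        )
        # Step 3: keep only addresses with more than one company.
        if len(matched) > 1:
            for number in matched:
                row = {"address": address, "company_number": number,
                       "companies_at_address": len(matched)}
                # Step 4: attach whatever contact details we hold for the company.
                row.update(companies[number]["contacts"])
                by_la[la].append(row)
    return dict(by_la)

addresses = {"unit 4, mill lane, leeds": "Leeds", "1 market square, ipswich": "Ipswich"}
companies = {
    "01234567": {"registered": "Unit 4, Mill Lane, Leeds LS1 1AA", "web_text": "",
                 "contacts": {"email": "hello@widgets.example"}},
    "07654321": {"registered": "10 Fleet St, London",
                 "web_text": "Our studio: Unit 4, Mill Lane, Leeds", "contacts": {}},
    "09999999": {"registered": "1 Market Square, Ipswich IP1 1AA", "web_text": "", "contacts": {}},
}
result = find_pmos(addresses, companies)
print(result)  # only Leeds appears: two companies match there, but only one in Ipswich
```

Two caveats: comparing every address with every company scales as addresses × companies, and plain substring matching misfires (for example, "1 Mill Lane" is contained in "11 Mill Lane"). Both hint at why the real algorithm grew well beyond four steps.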
In practice the algorithm is significantly more complicated than that. We delivered our first results three days into the project and have delivered improved results every three days since, following feedback from local authorities and our internal testing team. Our code grew from 300 lines to 3,000 and has now shrunk back down to 700. Execution time has fallen from 100 days (our first two runs could only complete for a few local authorities) to 3 hours, and our error rate has decreased from about 50% to about 5%.
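The post does not say what made the later runs so much faster. One common technique for this kind of matching, offered here only as an assumption about where such speed-ups can come from, is blocking: index every company's text by the postcodes it mentions, then compare each address only against companies that share its postcode instead of against every company in the UK. A minimal Python sketch with a rough postcode regex and made-up sample data:

```python
import re
from collections import defaultdict

# Rough UK postcode pattern for blocking only; it does not validate postcodes.
POSTCODE = re.compile(r"\b[A-Z]{1,2}\d[A-Z\d]?\s*\d[A-Z]{2}\b")

def compact(postcode):
    """'LS1 1AA' and 'ls11aa' both become 'LS11AA'."""
    return "".join(postcode.split()).upper()

def build_postcode_index(texts):
    """texts: {company number: registered address + website text}. Returns postcode -> companies."""
    index = defaultdict(set)
    for number, text in texts.items():
        for pc in POSTCODE.findall(text.upper()):
            index[compact(pc)].add(number)
    return index

def candidates(address, index):
    """Companies sharing a postcode with `address`; only these need the expensive full comparison."""
    found = set()
    for pc in POSTCODE.findall(address.upper()):
        found |= index.get(compact(pc), set())
    return found

texts = {
    "01234567": "Unit 4, Mill Lane, Leeds LS1 1AA",
    "07654321": "Studio at unit 4 mill lane, leeds ls11aa",
    "09999999": "1 Market Square, Ipswich IP1 1AA",
}
index = build_postcode_index(texts)
print(sorted(candidates("Unit 4, Mill Lane, Leeds LS1 1AA", index)))  # ['01234567', '07654321']
```

An address without a recognisable postcode gets no candidates here, so a real pipeline would need a fallback for those cases.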
Sadly we can’t publish our resulting data under an open licence, because the company-to-URL table that makes our method possible is too valuable. Nor can we publish our code under an open licence, most importantly because our techniques for interpreting addresses on websites are too valuable. But we are able to share our results directly with local authorities under a simple licence, and we can share a summary of our results here. We also suggest how both UK government and local government could make this and similar future analysis much easier and better.
The distribution of companies in co-working spaces
In summary, our technique identifies that 2.2 million of the 3.9 million companies in our database are operating out of premises of multiple occupation. There are 304 thousand such premises.
Considering only premises with 2 to 10 companies operating out of them (to discount accountants and company registration services), we find that 790 thousand companies (1 in 5 of the companies in our database) are operating out of premises of multiple occupation that are likely to be their true operating location. In total we have been able to add information such as websites, email addresses, and phone numbers to 403 thousand companies. Approximately 8 thousand companies would not have been found to be operating out of premises of multiple occupation without our web scraping methodology.
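The 2 to 10 filter is a simple band on the number of companies per premises. A minimal sketch, assuming a mapping from each identified premises to its company count (the toy names and numbers are invented); the last line checks the "1 in 5" figure against the totals quoted above.

```python
def band_summary(premises, low=2, high=10):
    """Return (companies at premises with low..high companies, companies at all listed premises).

    premises: {premises: number of companies operating there}, PMOs only (2 or more companies).
    """
    in_band = sum(n for n in premises.values() if low <= n <= high)
    total = sum(premises.values())
    return in_band, total

toy = {"Unit 4, Mill Lane": 3, "Co-working hub": 9, "Accountant's office": 400}
print(band_summary(toy))  # (12, 412)

# Check of the "1 in 5" figure: 790 thousand of 3.9 million companies.
print(f"{790_000 / 3_900_000:.1%}")  # 20.3%
```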
Without our results, the fairest way to allocate central government money to local authorities for their local business grant schemes would be to distribute the total pot in proportion to the number of businesses registered in each LA. Our data suggests a better way to distribute the money.
While across the UK 20% of businesses operate out of true premises of multiple occupation (TPMOs, those with 2 to 10 businesses operating out of them), the rate varies significantly. For example, in Warrington just 11% of businesses operate out of TPMOs, while in Ipswich the rate is 27%.
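The post does not spell out the exact weighting behind its recommended adjustments. One plausible reading, which is an assumption, is to weight each LA by its estimated number of businesses in TPMOs (registered businesses × TPMO rate) rather than by registered businesses alone. The sketch compares the two rules for a hypothetical £1,000,000 pot and two LAs with an invented 10,000 registered businesses each, using the Warrington and Ipswich rates quoted above.

```python
def allocate(pot, weights):
    """Split `pot` across local authorities in proportion to `weights`, rounded to whole pounds."""
    total = sum(weights.values())
    return {la: round(pot * w / total) for la, w in weights.items()}

registered = {"Warrington": 10_000, "Ipswich": 10_000}  # invented counts
tpmo_rate = {"Warrington": 0.11, "Ipswich": 0.27}        # rates quoted above

# Baseline: in proportion to registered businesses.
baseline = allocate(1_000_000, registered)
# One plausible alternative (an assumption): in proportion to estimated TPMO businesses.
tpmo_weighted = allocate(1_000_000, {la: registered[la] * tpmo_rate[la] for la in registered})

print(baseline)       # {'Warrington': 500000, 'Ipswich': 500000}
print(tpmo_weighted)  # {'Warrington': 289474, 'Ipswich': 710526}
```

Rounding to whole pounds can leave the shares a pound or so away from the pot, which a real allocation would need to reconcile.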
The map below shows the adjustments to the weighting of central government funding that we recommend as a result and the underlying data is available on request.
The case for open addresses
The most expensive and time-consuming part of this process has come as a direct result of the UK having a closed address system. Specifically, the Postcode Address File (PAF), which links addresses to postcodes, is not openly available for us and others to use.
While options to use the PAF do exist, they are prohibitively time-consuming and expensive (especially in the legal fees required for compliance). As a result, there is no open list of addresses for us to search for on websites. There is also no standard address system that businesses can use during company registration.
Businesses in the UK commonly register the same address with Companies House in such different forms that the two records cannot be matched. Matching millions of addresses against a closed address list is expensive and imperfect.
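Part of that cost is normalising addresses before they can be compared at all. A minimal sketch of the kind of clean-up involved, with a hypothetical, deliberately tiny abbreviation table and a rough UK postcode regex; real address matching needs far more rules than this.

```python
import re

# Hypothetical, deliberately tiny abbreviation table. Note "st" can also mean "Saint".
ABBREVIATIONS = {"st": "street", "rd": "road", "ln": "lane", "ave": "avenue"}

# Rough UK postcode pattern (outward code, optional space, inward code); not a validator.
POSTCODE = re.compile(r"\b([a-z]{1,2}\d[a-z\d]?)\s*(\d[a-z]{2})\b")

def normalise(address):
    """Lowercase, close up postcodes, drop punctuation, expand abbreviations, collapse spaces."""
    text = address.lower()
    text = POSTCODE.sub(lambda m: m.group(1) + m.group(2), text)  # 'ls1 1aa' -> 'ls11aa'
    text = re.sub(r"[^\w\s]", " ", text)  # punctuation becomes whitespace
    return " ".join(ABBREVIATIONS.get(token, token) for token in text.split())

print(normalise("Unit 4, Mill Ln., Leeds LS1 1AA"))  # unit 4 mill lane leeds ls11aa
print(normalise("UNIT 4 MILL LANE LEEDS LS11AA"))    # unit 4 mill lane leeds ls11aa
```

Even this toy table shows the difficulty: "St" can mean "Street" or "Saint", and a simple lookup cannot tell them apart.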
For some areas we have been able to partly get around this problem and provide much better data. Where local authorities have published business rates collection data to the ODILeeds business rates collection standard, we can use a high-quality list of business operating addresses for our searches. This lets us match companies directly to the code used by the business rates system in that area.
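Where a published rates list is available, matching becomes a lookup against a known set of operating addresses. A minimal sketch, assuming each row of the rates list carries a property reference and an address; the field names `property_reference` and `address`, the reference values and the companies are illustrative only, not taken from the ODILeeds standard.

```python
import re

def key(address):
    """Very simple normalisation: lowercase, drop punctuation, collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", " ", address.lower()).split())

# Hypothetical rows from a published business rates list.
rates_list = [
    {"property_reference": "N00012345", "address": "Unit 4, Mill Lane, Leeds, LS1 1AA"},
    {"property_reference": "N00067890", "address": "1 Market Square, Leeds, LS2 2BB"},
]

def match_to_rates(companies, rates_list):
    """companies: {company number: address}. Returns company number -> property reference or None."""
    index = {key(row["address"]): row["property_reference"] for row in rates_list}
    return {number: index.get(key(addr)) for number, addr in companies.items()}

matches = match_to_rates(
    {"01234567": "UNIT 4 MILL LANE LEEDS LS1 1AA", "07654321": "9 Park Road, Leeds"},
    rates_list,
)
print(matches)  # {'01234567': 'N00012345', '07654321': None}
```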
Excellent work over many years by companies such as Sqwyre has made much of this possible and has generated significant extra value for those local authorities who were pro-active in publishing their business rates collection data.
For Local Authorities
If you need a list of the businesses that we have identified in your geography, please get in touch with us via this form. We will need you to sign a licence agreement, after which we can provide you with a .csv file containing the following fields:
- The address of every premises of multiple occupation in your local authority and the number of businesses that operate from it.
- All companies registered at each address, or which we believe operate at that premises of multiple occupation.
For each company the data will include:
- Company number at Companies House
- Company name
- Company registered address
And, where we have additional information on the company:
- Company trading address on their website
- Company website URL
- A short description of the business
- Company email
- Company phone number
- Company social profiles
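For authorities preparing to load the file, here is a minimal sketch of one row per company in the shape described above, written with Python's standard `csv` module. The column names are hypothetical stand-ins for the listed fields, not the actual headers of the delivered file; fields we hold no data for are left blank.

```python
import csv
import io

# Hypothetical column names mirroring the fields listed above.
FIELDS = [
    "pmo_address", "companies_at_pmo",
    "company_number", "company_name", "registered_address",
    "website_trading_address", "website_url", "description",
    "email", "phone", "social_profiles",
]

def write_rows(rows):
    """Write rows to CSV text; any field missing from a row is written as an empty string."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

rows = [{
    "pmo_address": "Unit 4, Mill Lane, Leeds LS1 1AA", "companies_at_pmo": 2,
    "company_number": "01234567", "company_name": "Example Widgets Ltd",
    "registered_address": "Unit 4, Mill Lane, Leeds LS1 1AA",
    "website_url": "https://widgets.example",
}]
print(write_rows(rows))
```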
We think that our dataset of over 800,000 businesses matched to websites is the most complete in the UK, but it does not contain information on every business.