With the launch of V5 of our platform, we’ve implemented new tools to help users keep their lists up to date. In this blog our CTO Tom Forth looks at the improvements to our data set and what this means for list maintenance.
Running a business is hard. Over time companies go bust, change what they do, and merge with competitors.
These changes mean that Machine Learning (ML) lists in our platform need maintenance to avoid degrading over time. It is particularly important to maintain lists regularly if training sets are small (under 40 companies) or unbalanced (many more includes than excludes or many more excludes than includes).
You may want to increase the training set sizes and balance of such lists to make them more robust to changes and thus easier to maintain.
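
As a rough illustration of the rule of thumb above, the check below flags a training set that is small or unbalanced. It is only a sketch and not how the platform works internally: the `TrainingSet` class, the `maintenance_flags` function, and the 3-to-1 imbalance ratio are assumptions made for this example, and it assumes that "under 40 companies" refers to includes and excludes combined.

```python
from dataclasses import dataclass

SMALL_SET_THRESHOLD = 40  # from the post: training sets under 40 companies are small
IMBALANCE_RATIO = 3.0     # assumption: the post gives no number for "many more"


@dataclass
class TrainingSet:
    """Hypothetical summary of a list's training set."""
    includes: int  # companies marked as belonging in the list
    excludes: int  # companies marked as not belonging in the list


def maintenance_flags(ts: TrainingSet, ratio: float = IMBALANCE_RATIO) -> list[str]:
    """Return the reasons a training set deserves more regular maintenance."""
    flags = []
    # Assumption: size is counted as includes plus excludes.
    if ts.includes + ts.excludes < SMALL_SET_THRESHOLD:
        flags.append("small")
    larger = max(ts.includes, ts.excludes)
    smaller = min(ts.includes, ts.excludes)
    # Unbalanced in either direction: one side outnumbers the other by more than `ratio`.
    if larger > ratio * smaller:
        flags.append("unbalanced")
    return flags


print(maintenance_flags(TrainingSet(includes=30, excludes=5)))
print(maintenance_flags(TrainingSet(includes=50, excludes=40)))
```

A list that comes back with either flag is a good candidate for adding more includes or excludes before the next round of maintenance.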

We continually maintain the ML lists that power our Real-Time Industrial Classifications (RTICs) for you and we’ve been developing tools that make it quicker and easier for us to do this internally. With our version 5 release, we’re sharing these tools with you and making it clearer when we last used them internally.
But we know that our users don’t always have time to maintain their lists. So we’re also introducing a new feature called “classifier snapshots”. This means that you can still use your lists with the benefit of our constantly updating company data while you find the time to bring them fully up to date.
Accessing classifier snapshots
When you open an ML list it will be created based on the training set as it was the last time you made changes to that training set. You should expect your lists to be largely the same size and of a similar composition to how they were the last time you updated the training set.
But this does not mean the companies in your list are frozen in time.
- Companies in your list will always have the latest data that we have for them including the latest financial, funding, skills, trade, director, and sector data.
- Companies will be classified using the latest web text from the most recent websites and web scrapes we have for them.
- Companies that are no longer active will not be in your list.
We still want you to update the training sets for your lists. This will ensure that your lists are created from a training set with the latest and best website text we have for companies. If you don’t update your training sets you risk missing companies at the cutting edge of your sector of interest, whether it’s LLMs for AI, the latest grid-scale battery technologies for net zero, or drone advances in defence.
Maintaining your training set
To help you maintain your lists we’ve tidied up our process for maintaining training sets and in version 5 we’re making it available to everyone.

We will guide you through replacing companies that have gone bust or changed their websites with similar companies that are still active. If you can’t find a good replacement, we’ll help you remove these companies from your training set.
You don’t have to click the Maintain Training Set button, but if you do you’ll know that you have the best possible lists and access to all of our latest features.
Testing and feedback
We’ve thoroughly tested our process internally and we’ve worked hard to simplify and accelerate it so that it’s as fast and easy for you as possible. The outputs from our internal tests look great, but it’s our users’ experience that matters.
If you go through the Maintain Your List process, please let us know how you found it. If it kept your lists as good as they were when you first built them, we’ll be happy. If it made them even better, we’ll be thrilled. And if it made them worse, we’re ready to get you back on track.
All the tools for you to maintain and update your lists remain available to you and you can and should still tweak the training sets for your lists regularly. When new and exciting companies enter your sectors of interest it pays to add them to the training set of your lists, especially if those training sets are currently small or unbalanced.
V5 is now live for all users. New to The Data City? Sign up for a free trial today and see the updates for yourself.
If you’d like to discuss the updates further, please get in touch.
You can see a full list of updates in V5 in our release notes.