Using The Data City’s new Global platform

Discover the power of our Global platform with CTO Tom Forth, as he effortlessly classifies French kombucha producers and tracks down Italy’s top coating technology businesses.

Four years ago I wrote a blog post on my own site about Real-Time Industrial Classifications (RTICs) and our UK product. It was still early days, and a single round of list training took 100 times as long as the 5 seconds it takes today. But I felt that our product was ready for serious use without hands-on support from us, and that it was time to share that. This is a similar blog post about our Global product.

Our ambition has always been to tell people what every company in the world does. But ambition is not action and there were very large technical obstacles in our way.

Over the past five years those obstacles have gradually disappeared. The price of RAM and solid-state storage has fallen substantially. Frameworks and programming languages that we rely on for high-performance computing, such as .NET and Rust, have matured. The hype around large language models has made working with them much easier, cheaper, and faster. New technologies like DuckDB have delivered huge performance improvements and simplified established processes. All of this means that we can now apply more and more of the methods we use to understand British companies to companies across Europe and North America.

I’ll give two examples.

What is the French for kombucha?

One of my favourite things to do when I’m away from home is try local speciality soft drinks. Whether it’s Irn-Bru in Scotland, Root Beer in the USA, Chinotto in Italy, Rivella in Switzerland, or Spezi in Germany, variety is the spice of life and nothing’s so bad you can’t enjoy it once.

UK soft drink companies are one of the lists that I maintain for myself in our UK product. And one of the most interesting and fast-growing categories of soft drink I've tracked in recent years is kombucha, a fermented tea drink.

Unsurprisingly, there isn't a SIC code for kombucha breweries. Kombucha manufacturers in the UK are spread across a wide range of SIC codes, but our RTIC system works brilliantly at picking them out with as few as 5 examples of companies that brew kombucha and 10 examples of companies that don't.

What if we do the same worldwide?

The results are very good. With a training set of just 24 companies that brew kombucha and 25 companies that don’t we get a list of 284 kombucha manufacturers spread across the USA, UK, France, Germany, Italy, and Denmark.

And they hold up even across languages and cultures. For example, we find Takubeh, a small kombucha manufacturer in the countryside outside St. Etienne, France. They are classified under the French refreshing beverages code, 11.07B: Production de boissons rafraîchissantes (production of refreshing beverages), but our product still identifies them clearly as a kombucha brewer despite their French-language website (we'll say more about how this works in the future).

Takubeh are a small French kombucha brewer in the hills just south of St. Etienne.

Another excellent result from rural France is BBKombucha, a kombucha brewer from the small village of Vailhauquès near Montpellier. They have a different French industrial classification code, 11.04Z: Production d'autres boissons fermentées non distillées (production of other non-distilled fermented beverages), showing that the lack of specificity in industrial classification codes is a problem worldwide.

BBKombucha homepage screenshot

We’ve shown that we can use our global product to build lists of companies using AI in the same way as we do in the UK. We’re still working on the user interface and the speed of that classification, but we already think it’s good enough to use more widely. We’ve solved most of the hardest problems we faced and we’re entering a phase of our development where users should start seeing rapid improvements.

There are some questions we can't answer well yet. For example, which of Germany, Britain, Italy, or France has the most kombucha manufacturers? We don't know for sure, because the share of companies we've matched to their websites differs from country to country, and while our classification engine works multilingually, it still works better in some languages than others. But we'll get there.

Applying existing RTICs to the world

But what if you don't want to build your own list? The good news is that we've already applied our more than 400 RTICs to the 12 million (and growing) companies worldwide with websites that we've added to our platform.

So if you want to see companies in Italy specialising in Coating Technologies, we've got a lot of them, even if their websites are mostly in Italian.

Take for example Verniciatura Industriale Pesarese on an industrial estate outside of the city of Pesaro. We’ve got them, and lots of details about what they do.

Vippesaro's company details from The Data City's global platform
We have details of over 120 million companies in 12 countries.

And their website makes it clear, even to someone with poor Italian, that we've classified them pretty well.

Screenshot of Vippesaro's website
As with kombucha brewers, we can't yet do everything in our global product that we can do in the UK. For example, we'll often have a good estimate of a company's employee count, but we don't have historical data on it, so we can't yet tell you whether a company is growing or shrinking. And while comparisons within countries are likely to be safe, we'd urge caution when comparing across countries.

We expect rapid improvements in the coming months, including more companies matched to their websites in more countries.

But with nearly 120 million companies across 12 countries already in our database, more than 12 million of them matched to a URL, and more than a million with at least one assigned RTIC, we think there's already a lot to explore.

Interested in discovering how our Global platform can help you understand what companies do? Visit our Global product page and sign up for a free trial today.

You can also sign up for our upcoming global webinar, where we'll be showcasing the platform for the first time.