
Data Explorer Release Notes

The core technologies and datasets that make The Data City work have improved enormously in the past five years.

In December 2019 our Data Explorer product contained 100GB of data on the UK’s companies. By December 2020 that had grown to 200GB, and by December 2021 to 400GB. Despite this growth, classification speed has increased by a factor of ten. We are exceeding our uptime target of more than 99.99% and have met our 2021 goal of delivering speed and stability while adding new features.

Using machine learning and instant keyword filtering, our unique process lets users classify companies into sectors of the economy not covered by SIC codes. Our users now achieve in days and weeks what used to take months or years.

Coming soon

Short term (Q2 2023)

  • Better documentation of Lightcast data.
  • Improved RTIC sector growth estimates in ANALYSE.
  • Refresh of innovation and sector keywords.
  • Company births and deaths data for the last 10 years.
  • Breakthrough sectors at a regional level.
  • Further improvements to quality assurance.
  • Automated list quality estimates.
  • Summary of Lightcast data results, including company counts.

Long term ambitions

  • Use past versions of company websites to track growth trends over time.
  • More detailed mapping of companies.

v4.3 (released December 2022)

New features

  • RTIC ranking and summary tables now available.
  • RTIC summary tables can now be filtered by region.

New data or data upgrade

  • Companies House data updated to December 2022 version.
  • General availability of Lightcast data on jobs and skills (available on request; please contact us for pricing details).
  • Improved salary and posting duration data from Lightcast available in ANALYSE.
  • Four new job skills breakdowns available – software skills, certifications, common skills, specialised skills.
  • Improved company growth estimates.
  • Improved Dealroom data on funding: dates of multiple funding rounds, total funding raised and funding series raised.
  • Dealroom data is now incorporated in company growth stage estimates.
  • Improved URL matching quality.
  • Number of companies with URL matches increased to 1.68 million.
  • Improved website match reasoning.

Performance improvements and bug fixes

  • Fixed bug where full download of ANALYSE results would occasionally fail.
  • Up to 4x improvement in speed of ML list building.
  • Fixed a bug where the RTIC sector counts widget in ANALYSE would sometimes report more companies per RTIC than there were companies in a list.
  • Fixed a problem whereby some local authorities created in 2021 had fewer employees reported than expected in ANALYSE.
  • Fixed a bug where some companies had more than one registered address.

v4.2 (released November 2022)

New features

  • Filter by company stage estimates (startup, scale-up, established business, unicorn, etc.).
  • Company growth stage results now available in ANALYSE.
  • See score distribution of all classified companies for ML lists.

New data or data upgrade

  • Company growth stage added to list downloads.

Performance improvements and bug fixes

  • Fixed bug where full ANALYSE results download file was unavailable.
  • Faster loading of previously classified ML lists.
  • Faster loading of ML lists – classifier explanation now returned with company results.
  • Location quotient calculations in ANALYSE are now performed server-side, making them faster.
  • Faster loading of All RTICs summary page.
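These notes mention location quotients (for business counts, employees and turnover by local authority) without defining them. The following is a minimal sketch, assuming the standard textbook definition rather than the platform's confirmed implementation; the function name and parameters are hypothetical.

```python
def location_quotient(local_sector: float, local_total: float,
                      national_sector: float, national_total: float) -> float:
    """Location quotient: a sector's share of an area's activity divided by
    its share of national activity.

    Assumption: this is the standard textbook definition, not necessarily the
    exact method used in ANALYSE. The same formula works for business counts,
    employees or turnover. LQ > 1 means the sector is over-represented in the
    area relative to the national baseline; LQ < 1 means under-represented.
    """
    if local_total <= 0 or national_sector <= 0 or national_total <= 0:
        raise ValueError("totals and the national sector value must be positive")
    local_share = local_sector / local_total          # e.g. 50 / 1,000 = 5%
    national_share = national_sector / national_total  # e.g. 2,000 / 100,000 = 2%
    return local_share / national_share
```

For example, a sector with 50 of an area’s 1,000 employees (5%) and 2,000 of 100,000 employees nationally (2%) has a location quotient of 2.5, meaning it is two and a half times as concentrated locally as it is nationally.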

v4.1 (released October 2022)

New features

  • Median company growth rate estimates are available for lists in ANALYSE.
  • Company stage estimates (startup, scale-up, established business, unicorn, etc.) available in the product.
  • Added CIC results to ANALYSE.
  • Improved line charts in ANALYSE.

New data or data upgrade

  • Improvements to company growth score estimates (switched to ln-based estimates).
  • Improved ordering of companies when filtering by company name.
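The notes say growth estimates switched to an ln-based measure and that median growth rates are reported for lists, but do not say which metric or period is used. As a hedged illustration only, assuming growth is measured between two positive values (for example employee counts in consecutive years), an ln-based growth rate and a list median could be computed like this; both function names are hypothetical.

```python
import math
from statistics import median


def log_growth(previous: float, current: float) -> float:
    """ln-based growth rate between two positive values.

    Assumption: the metric and period are illustrative; the release notes
    only state that estimates are ln-based.
    """
    if previous <= 0 or current <= 0:
        raise ValueError("log growth is only defined for positive values")
    return math.log(current / previous)


def median_list_growth(pairs):
    """Median ln-based growth across (previous, current) pairs for a list of
    companies, skipping companies with missing or non-positive values.
    Returns None if no company has usable data."""
    rates = [
        log_growth(prev, curr)
        for prev, curr in pairs
        if prev is not None and curr is not None and prev > 0 and curr > 0
    ]
    return median(rates) if rates else None
```

One reason to prefer a log measure is symmetry: doubling gives +ln 2 and halving gives -ln 2, so rises and falls of the same proportion cancel out, and growth over consecutive periods simply adds up. Using the median rather than the mean also limits the influence of a few extreme growers in a list.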

Performance improvements and bug fixes

  • Improvements to layout on mobile.
  • Fixes to RTIC and CIC filtering.
  • Faster processing of RTICs and CICs on page load.
  • Fixed issue where company details page would fail to load if it had no shareholder information.
  • Fixed bug where explain company score results would fail to load.
  • Fixed bug where filtering by certain company categories would cause a failure.
  • Fixed issue with company name filtering.

v4.0 (released September 2022)

New features

  • Increased accuracy of classification. Your lists may be slightly smaller. Add more positives to the training set if required.
  • Company growth estimate now included in ML lists and EXPLORE.
  • New financial filters – EBITDA and total Innovate UK funding.
  • New Innovate UK funding widgets in ANALYSE.
  • One click export of company URLs enables easier use of data with external providers such as Hunter.io’s Bulk Domain Search.
  • Improved EBITDA data.
  • Sort lists by company growth estimates or total Innovate UK funding.
  • New Dealroom funding filter: you can now include companies whose funding is unknown when filtering.

New data or data update

  • English and Welsh geographies updated to the 2021 census and recent local and regional government reorganisations.
  • Companies House data updated to September 2022 version.

Performance improvements and bug fixes

  • Instant classification. Average time to classify 1.6 million companies reduced from 10 minutes to 10 seconds.
  • Improved performance when loading RTICs and CICs into the platform.
  • Fixed layout issues with line charts in ANALYSE.
  • Fixed the director appointment date field.

v3.3 (released August 2022)

New features

  • Company email addresses are now matched to the company director they most likely belong to.
  • Individual company profile pages now contain all available data fields including group structure, funding and shareholders.
  • New control to instantly print a company profile.
  • New download options and formats (XLS, CSV and JSON) for company profiles.

New data or data update

  • EBITDA data is now available for companies in ML lists, EXPLORE lists and company profiles.

Performance improvements and bug fixes

  • Fixed bug that prevented director pages from loading data.
  • Fixed bug where including single quotes in keyword filters would prevent lists from being processed. Single quotes are now automatically changed to double quotes.
  • Manually included and excluded companies in ML lists are now added to copied and shared versions of the list.
  • Improved formatting of company data in EXPLORE and ML lists.

v3.2 (released July 2022)

New features

  • Updated “My lists” page now contains details of all types of list – ML list, EXPLORE list and CICs.
  • Added new tooltips and explainers.
  • New “add many companies” functionality for manually including or excluding companies in ML lists.
  • Company score is now always visible in ML lists results regardless of what property the list is being sorted by.

New data or data update

  • Updated website screenshots, increasing count from 700,000 to 1,600,000.

Performance improvements and bug fixes

  • Fixed bug where downloads of ML Lists were not correctly ordered by score.
  • Fixed bug where removing a company from a training set and then filtering the list would cause the list to be rebuilt.
  • Fixed layout issues for ML list score elements on small-width devices.
  • In ML lists, sorting by keyword ranking position is no longer applied automatically when keyword filtering is active.

v3.1 (released June 2022)

New features

  • Added location filter for a company’s ultimate parent nation.
  • Added option to filter out companies that are known to be ultimately foreign-owned for EXPLORE and ML Lists.
  • Included the CIC filter in its own section, allowing comparisons between different sectors to be performed.
  • Updates to ANALYSE and COMPARE pages. Easily navigate to different sections with the new controls.
  • New fields in ANALYSE and COMPARE – investment funding via Dealroom and Innovate UK grant funding.
  • Added parent nation and ultimate parent nation to group structure details in EXPLORE and ML Lists.
  • Downloads of ML and EXPLORE Lists are now available in JSON format.
  • New group structure fields and updated financial data years available in downloads of ML and EXPLORE Lists.
  • Improved filters UI in mobile view.

New data or data update

  • Data refresh based on Companies House records June 1st 2022.
  • Full update of company director details, including fixing the capitalisation of names.
  • Full update of financial data, including shareholdings, group structure, and beneficial ownership.
  • Funding data from Innovate UK and 360 Giving.

Performance improvements and bug fixes

  • Faster page loading.
  • Faster server that maintains its speed even with many concurrent users.
  • Fixed incorrect units for currency and location quotient fields in ANALYSE.
  • Improved company search results – more relevant companies will appear higher up in the list.
  • Fixed missing icon images.
  • Fixed bug where company website links in search results were broken.
  • Removed list size options from ML list creation process.

v3.0 (released May 2022)

New data or data update

  • More accurate company URL matches.
  • Improved website data for all companies.
  • More business locations.
  • Company funding data provided by Dealroom.
  • Added company group structure data.
  • Added company shareholders data.
  • Added persons of significant control data.
  • Manually include and exclude companies from ML lists without affecting machine learning classification.
  • Filter companies by minimum innovation score.

New features

  • Improved methods for splitting employees and turnover across multiple operating locations of a business.
  • Added new fields to ANALYSE and COMPARE.
  • Improved company list UI.
  • Added location quotient option to ANALYSE and COMPARE.
  • Added percentage option for website keywords to ANALYSE and COMPARE.
  • Improved company searches in EXPLORE: the most relevant companies will now appear highest in the results list.

Performance improvements and bug fixes

  • Faster page loading.
  • Faster filter searches.

v2.6 (released March 2022)

  • New COMPARE tool. Easily compare two different sectors or EXPLORE lists through visualisations of overview statistics. Fields such as employees, sectors and financials are included.
  • Improved ML lists. Lists are no longer limited in size by a preset return count. Faster loading of lists.
  • Improved filtering UI.
  • Improved RTICs pages. RTICs now contain unique codes and descriptions.
  • RTICs are now tagged as New or Updated if they have been edited within the last month.
  • Fixed order of recent EXPLORE lists on homepage.
  • ANALYSE and COMPARE pages now contain location quotient options for business counts, employees and turnover by local authority.
  • Improved download lists UI.
  • Updated company lists UI. See the most important data fields more easily.

v2.5 (released January 2022)

  • Faster page loading.
  • New company R&D innovation score. Filter and sort by how innovative companies are in ANALYSE and EXPLORE.
  • New similar companies measure. For companies in EXPLORE see up to five companies sharing similar characteristics.
  • Improved filtering UI in EXPLORE and ANALYSE. Easily access all available filters.
  • More data on company turnovers and employee counts provided by Red Flag Alert.
  • Added results summary panel to ANALYSE page.
  • Added option for basic or detailed downloads on EXPLORE and ML List pages.
  • Added options to sort EXPLORE lists by turnover and keyword ranking score.
  • Faster data downloads.
  • Added innovation score, turnover, and similar companies data to ML List page.

v2.4 (released December 2021)

  • Data refresh based on Companies House records December 1st 2021.
  • Over 1.6 million companies with URLs matched.
  • Financial data for 2020 now covers over 50% of companies, allowing accurate 2020 financials through extrapolation.
  • Over 5 million companies.

v2.3 (released September 2021)

  • Data refresh based on Companies House records September 1st 2021.
  • More than 200,000 new companies added to the platform, with 1.25m companies with URLs matched.
  • Improved company phone numbers data.
  • Improved company email addresses data.
  • Improved speed when building ML lists.
  • Fixed bug where companies with scores below zero were incorrectly sorted on ML list page.
  • UI updates on Define list page.
  • Fixed bug where previously deactivated filters would be turned back on after list rebuild.
  • RTICs added to individual company pages.
  • On EXPLORE page all RTICs for a company are now displayed by default.
  • ML list filtering by location can now be limited to registered address only.
  • In EXPLORE you can now lock the company numbers filter against filter reset.
  • Filtering custom EXPLORE lists will no longer automatically overwrite the saved version.
  • All “My words” used for keyword filtering are now added to shared ML lists.
  • Fixed simultaneous filtering of RTICs and company numbers to show proper overlap between companies.
  • ANALYSE page now contains list summary data on RTIC sectors.
  • Speed improvements on EXPLORE page.
  • Added incorporation date filter to EXPLORE and ANALYSE pages.

v2.2 (released August 2021)

  • Full data refresh based on Companies House records August 1st 2021.
  • Improved URL matching of companies.
  • Better company phone number matching.
  • More data on directors, now including details of officers’ past appointments.

v2.1 (released July 2021)

  • Integration of v1.x stability improvements into the v2.x branch.
  • Updated design to reflect new Data City branding.
  • RTICs are now available to view in the platform. Access individual RTIC lists from the home page to view on the EXPLORE or ANALYSE pages. RTICs can also be used as filters.
  • Improved filtering UI.
  • Increased maximum ML list download size to 30,000 companies.
  • New location options for filtering: LEPs, regions and constituencies.
  • Improved interactive line charts on ANALYSE page.
  • Save static lists generated in the EXPLORE page. These can be edited, shared or copied.
  • Access and edit saved EXPLORE lists from the home page.
  • Estimates of per-company greenhouse gas emissions are now available on the EXPLORE page.
  • Sort ML and EXPLORE lists by properties including company name, employee count and incorporation date.
  • New search UI to quickly find companies in your list by name, company number and URL.
  • Added a report mismatched RTIC button to companies in EXPLORE page.
  • Active filters in ML company lists are now autosaved. Filtered lists can now be shared with other users.
  • New print styles for improved output from the platform.
  • Improved UI on mobile devices.
  • Fixed bug where “My words” would become unavailable if an ML list was rebuilt.

v2.0 (released March 2021)

  • Increased company website matching (>1m websites) and improved website matching (<1% error rate).
  • Improved list download. Native Excel download and Excel-friendly CSVs. Please note that downloads will have slightly different column headings to the previous version of the product.
  • Filters available with a consistent UI in Lists, Data Downloads, and Insights.
  • Extended financial data including Debtors due after one year, Debtors, Trade Debtors, Depreciation of Tangibles, Amortisation of Intangibles, Directors Remuneration, Employee Remuneration.
  • Estimates of company group structures.
  • Fix for a rare bug where a user with multiple lists open at once receives the results from one list in another tab.
  • Fix for a rare bug where a keyword filter does not return the correct number of companies.
  • Fix for a rare bug where results for the wrong lists are returned during list building.
  • New Disqus user forum to send us feedback on the product.
  • The classifier explanation now allows you to see up to 100 terms used during company classification.
  • See explanations of individual companies’ scores on the list page.
  • Redesigned company list page now shows company description and homepage screenshot by default.
  • Completely new EXPLORE section: Explore details on all 4.7m UK companies. Filter companies by sector, location, financials and keywords.
  • List insights improved and renamed ANALYSE.
  • Results in ANALYSE can be viewed as bar charts or line charts as appropriate.
  • New interactive maps to explore location data.
  • New data points including website keywords, company size and employees by location are now available in the ANALYSE section.
  • New company filters in EXPLORE and ANALYSE sections. Paste in a list of company numbers to see either company details or summary statistics of the list.
  • You can now filter companies by number of employees.
  • Improved “Find a company” functionality on the homepage. Results will now show extended company details.
  • In EXPLORE you can now download details of the top 5000 companies in your search results.
  • ANALYSE now includes results on webpage keywords, including how over- or under-represented they are in your sample compared to all companies.

v1.7 (released Feb 1st 2021)

  • New home page. Instantly access, edit, and share your 10 most recent lists.
  • Quickly search for companies directly from the home page.
  • Added search functionality to list select pages.
  • Included additional opportunities to report missing and mismatched companies on the find a company and company details pages.
  • Fixed some issues with company details downloads.
  • Fixed bug where you could not access director details for companies with more than five directors on the company details page.
  • Layout improvements on the company list page. Some details are now arranged in a column structure.
  • Fixed bug where sometimes the correct score setting was not carried across from the list page to the insights page when filtering.
  • Fixed downloads of Top 10 company financials on the insights page.
  • Trial users can now see 60 companies in their lists.

v1.6 (released Jan 1st 2021)

  • The creation of a Long Term Support level product in the 1.x branch.
  • Feature development of the 1.x branch will be frozen and all efforts shifted to reliability.
  • All bug fixes in v2.0 will be backported to the v1.x branch.
  • Fix for a bug that made the product unusable on older laptops, older browsers, and systems with unusually high security restrictions.

v1.5 (released Dec 1st 2020)

  • Added folder functionality in the list page. Default folders include all lists and favourites. The top 10 most recent lists will be displayed at the top of the page.
  • Added keyword filtering functionality to the insights page, including number of companies considered after filtering.
  • Improved the algorithm for assigning a company’s financial year to a calendar year for aligned financial estimates. January 2020 results are assigned to calendar year 2019.
  • Fix for a known bug where under rare circumstances a CSV download may be malformed.
  • Improved UI so that users who over-define a training set with included companies are prompted to add negatives, preventing the list from being lost and requiring support.
  • Stronger separation between staging/alpha and live/beta environments.
  • Increase in server uptime. We are currently at >99.5% uptime for 2020 with zero lost data. However, the <0.5% of time when our service is down has tended to fall during peak load, when our users most need the service.
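The v1.5 notes describe assigning each company’s financial year to a calendar year and give one confirmed case: January 2020 results are assigned to calendar year 2019. The sketch below illustrates that idea, assuming a financial year is identified by its year-end date. Which months besides January (if any) are shifted back is not stated, so the cutoff month is a hypothetical parameter that defaults to January, the only confirmed case.

```python
from datetime import date

# Hypothetical parameter: the release note only confirms that January
# year-ends are attributed to the previous calendar year.
DEFAULT_CUTOFF_MONTH = 1


def calendar_year_for(year_end: date, cutoff_month: int = DEFAULT_CUTOFF_MONTH) -> int:
    """Assign a financial year, identified by its year-end date, to a calendar year.

    Accounts ending in or before `cutoff_month` cover mostly the previous
    calendar year, so they are attributed to that year; all other year-ends
    stay in the year they end in. A cutoff of 0 disables the shift.
    """
    if not 0 <= cutoff_month <= 11:
        raise ValueError("cutoff_month must be between 0 and 11")
    if year_end.month <= cutoff_month:
        return year_end.year - 1
    return year_end.year
```

With the default, accounts ending 31 January 2020 count towards 2019, while accounts ending 31 December 2020 count towards 2020.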

v1.2 (released — Dec. 2020)

  • Insights 1.9 released. This consists of financial projections for a given list and identifying the top 10 companies per financial field.
  • The Data City does not include financial predictions for individual companies; however, the insights page will forecast company financials for a given sector. If a company has a value for a financial field in the previous year but not the current year, the previous year’s value is used for the current year. This assumes a normal trading year.
  • 2x speed improvement to list building.
  • Added the ability to report a missing website for a company.
  • Added the ability to report a mismatched website for a company.
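The carry-forward rule from v1.2 (reuse the previous year’s value when the current year is missing) can be sketched as follows. This is a minimal illustration, not the platform’s code: the data shape (one dict per company mapping year to value for a single financial field) and both function names are hypothetical.

```python
from typing import Dict, Iterable, Optional

# Hypothetical data shape: year -> value for one financial field (e.g. turnover).
# A year with no filed value is simply absent from the dict.
CompanyFinancials = Dict[int, float]


def value_for_year(financials: CompanyFinancials, year: int) -> Optional[float]:
    """Return the company's value for `year`, falling back to the previous
    year's value when only that is available (the 'normal trading year'
    assumption). Returns None if both years are missing."""
    if year in financials:
        return financials[year]
    return financials.get(year - 1)


def sector_total(companies: Iterable[CompanyFinancials], year: int) -> float:
    """Aggregate one financial field across a list of companies for `year`,
    applying the carry-forward rule company by company."""
    total = 0.0
    for financials in companies:
        value = value_for_year(financials, year)
        if value is not None:
            total += value
    return total
```

Note that the fallback reaches back only one year, matching the rule as stated: a company whose latest figure is two years old contributes nothing to the current-year total.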

v1.1 (released — Nov. 2020)

  • Added Country of Origin into the locations filter.
  • Added ‘Always include companies present in training set’ option to the keyword filter. This ensures a user will not miss out on companies they have manually included into their list (through the training set).
  • Added Company Category into the sector filter.
  • Added column view of finances to CSV format download of lists.
  • ‘In training set’ column added to the list download.

v1.0 (released — Oct. 2020)

  • 830,000 businesses with matched websites.
  • Significant UI improvements (links, back button, multiple tabs, etc.).
  • Expansion of financial data to cover up to the past five years of trading. Financial records for nearly 100% of companies including turnover, profit, assets and liabilities.
  • Broader coverage of employee number estimates.
  • Complex keyword addition in the list-building UI to accelerate list creation and training set refinement.
  • “My Words” feature added.
  • “Find a company” feature added.
  • Added list copying.
  • Improved server stability.
  • Historical director data added to the explorer.

v0.5 (released — Sep. 2020)

  • 300,000 website matches added, bringing the total to 830,000 companies with full website details.
  • 64,000 false positive website matches removed.
  • 60x speed improvement to whole-website keyword filtering: from five minutes to five seconds.
  • Fixed duplicate financial data for companies filing their accounts twice in one year.
  • Expansion of all features of the product to Northern Ireland.
  • Classifier terms added to the CSV download feature (now served as a single .zip).
  • Directors added to the CSV download feature (now served as a single .zip).
  • Company incorporation date added as a field.

v0.4 (released — Mar. 2020)

  • Data download updated to include financial data.
  • Three years of financial data expanded to cover 75% of businesses.
  • CSV format download of lists (row view of finances).
  • Company descriptions now included. These are more detailed than those in company accounts and more accurately reflect each company’s primary IP.
  • Classifier explanation (terms) added to the list-building page. Opening these helps users see which keywords are being prioritised to build the list.

v0.3 (released — Feb. 2020)

  • Added three years of financial data for >50% of companies.
  • Copy and paste enabled on training set fields.
  • Expanded company details page for each company.
  • 600,000 website screenshots added.
  • Insights v1.0 included. Each list can now be visualised, with graphical outputs including SIC code breakdowns and location data.
  • Current company officers and directors added to company details.

v0.2 (released — Dec. 2019)

  • Autosave for lists.
  • Added “share lists”.

v0.1 (released — Nov 2019)

  • Added Phone, Email, LinkedIn, Facebook, Instagram, YouTube and Twitter to company descriptions.

v0.1a (released — Oct 2019)

  • 600,000 companies with websites.
  • User-defined classifier to define lists.
  • Company details from Companies House.

About the author

Dan Billingsley