On October 7th we joined the event Modelling an Evolving Economy, organised by the Digital Catapult in collaboration with Nesta and the Economic Statistics Centre of Excellence. It was fantastic to hear about the research conducted by the panellists, which in many ways supports some of the developments that we have worked on during the last few years. Here are just a few of our notes and thoughts from the session.
Measuring relatedness between industrial clusters
Researchers from the Copenhagen Business School and MIT introduced their method to measure relatedness between industrial clusters as part of a wider discussion of economic regional resilience. In their paper, they discussed the implications of using categorical industrial classification systems (NAICS) to understand newer economic sectors.
Currently, we are exploring industrial relatedness with the Bennett Institute by understanding our Real-Time Industrial Classifications (RTICs) in terms of networks. Furthermore, our team of amazing data scientists is developing more advanced and detailed analysis to describe how different industries relate to one another. This will be included in the platform, so anyone can automatically explore the relationship between different sectors using the locations and RTICs/sectors of interest.
Similarity measures, hierarchical clustering and network structures
The Centre for Economic Performance (LSE) introduced its work on applying hierarchical clustering to website text data to group companies that share overrepresented keywords. Dr. Amanda Otley, Data Scientist at The Data City, has worked on something similar: she applied clustering analysis to an existing FinTech RTIC as a method of validation, using various hierarchical clustering techniques. This exercise helped us conclude that the RTIC methodology (industrial taxonomies and industrial classification using our proprietary algorithm) produced more detailed and purpose-oriented databases than the hierarchical clustering techniques alone.
Likewise, the LSE project referenced the use of cosine similarity measures to prepare the website text data before applying the clustering techniques. Whilst this approach differs from the machine learning that underpins our RTICs, we have also explored the use of similarity measures to identify similar companies, the result of which is available in our platform.
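To make the idea concrete, here is a minimal sketch of cosine similarity on website text followed by a simple hierarchical (agglomerative) clustering step. It is not the LSE team's pipeline or our RTIC algorithm: the bag-of-words representation, the average-linkage rule, the 0.5 threshold and the four example companies are all assumptions chosen for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def vectorise(text):
    """Turn website text into a bag-of-words term-count vector."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def agglomerative_clusters(texts, threshold):
    """Average-linkage clustering: repeatedly merge the two most similar
    clusters until no pair is at least `threshold` similar."""
    vectors = {name: vectorise(text) for name, text in texts.items()}
    clusters = [[name] for name in texts]

    def linkage(c1, c2):
        sims = [cosine_similarity(vectors[x], vectors[y]) for x in c1 for y in c2]
        return sum(sims) / len(sims)

    while len(clusters) > 1:
        (i, j), best = max(
            (((i, j), linkage(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda pair: pair[1],
        )
        if best < threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return sorted(sorted(cluster) for cluster in clusters)


# Hypothetical website snippets, invented for this example.
websites = {
    "alpha": "online payments platform for banks and lenders",
    "beta": "payments platform for online lenders and banks",
    "gamma": "organic vegetable farm and farm shop",
    "delta": "family farm shop selling organic vegetable boxes",
}

groups = agglomerative_clusters(websites, threshold=0.5)
print(groups)
```

With these toy snippets the two payments companies end up in one group and the two farm businesses in another. Real website text would need stop-word removal and term weighting before the similarities mean much.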
Additionally, the Centre for Economic Performance introduced how network structures can help us understand the degree of connectedness of a company. Back when we had two CDRC students spending some time with us completing their master's dissertations, Michael explored how companies link to each other using a URL analysis technique. Under our supervision, Michael developed a method for measuring innovation and encountered similar problems.
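As a rough illustration of what a URL-based network can look like, the sketch below builds a graph from the outbound links found on company websites and counts how many other known companies each one is connected to. This is not Michael's method or the Centre for Economic Performance's: the domains, the link lists and the choice of a simple undirected degree count are assumptions made for the example.

```python
from collections import defaultdict
from urllib.parse import urlparse


def link_graph(outbound_links, known_domains):
    """Connect two known company domains whenever one links to the other."""
    graph = defaultdict(set)
    for source, urls in outbound_links.items():
        for url in urls:
            target = urlparse(url).hostname or ""
            if target.startswith("www."):
                target = target[4:]
            # Ignore self-links and links to sites outside the company list.
            if target in known_domains and target != source:
                graph[source].add(target)
                graph[target].add(source)
    return graph


def degree(graph):
    """Number of distinct companies each company is linked with."""
    return {node: len(neighbours) for node, neighbours in graph.items()}


# Hypothetical scraped links, keyed by the company's own domain.
outbound_links = {
    "a.example": ["https://www.b.example/about", "https://c.example/"],
    "b.example": ["https://c.example/partners", "https://unknown.example/"],
    "c.example": [],
    "d.example": ["http://a.example"],
}

degrees = degree(link_graph(outbound_links, set(outbound_links)))
print(degrees)
```

Here `a.example` comes out as the best-connected company, with three links. On real data the hard part is usually cleaning the links (shared platforms, social media and redirects) rather than counting them.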
Our second student, Jason, spent time exploring the connections between directors. He had to develop a novel method for assigning a unique identifier to each director in the UK, and figured out which directors worked for which company! Jason wrote up his work as a blog article. The results made it possible to identify the most influential directors for any industry.
Understanding job posting data with website text
Researchers at the Wharton School of the University of Pennsylvania introduced their work on further understanding Lightcast's jobs and skills postings data. They created a classifier algorithm that analyses the text of the websites that Lightcast scrapes for job postings, so that they could explore the specialisation of each company as described on its website and group companies by what they have in common. We have applied the same logic to the same problem, although we went about it slightly differently.
First, the RTIC method classifies companies according to common language patterns in their website text. Then, we match the job postings data to the correct company number. Finally, the analytical functions automatically present the aggregated results for job posting data per sector, including job postings by SOC4 name and code, average salary by SOC4 name and code, job postings by skill and average posting duration by skill.
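The last two steps can be sketched in a few lines. This is only an illustration of the join-then-aggregate idea, not our platform code: the field names, company numbers, sector labels, placeholder SOC4 codes and salaries below are all invented, and the sketch assumes each posting has already been matched to a company number.

```python
from collections import defaultdict
from statistics import mean


def aggregate_postings(postings, company_sectors):
    """Group matched job postings by (sector, SOC4 code) and summarise them."""
    salaries = defaultdict(list)
    for posting in postings:
        # A company can sit in several sectors; unmatched companies are skipped.
        for sector in company_sectors.get(posting["company_number"], []):
            salaries[(sector, posting["soc4"])].append(posting["salary"])
    return {
        key: {"postings": len(values), "average_salary": mean(values)}
        for key, values in salaries.items()
    }


# Hypothetical classification output: company number -> sectors.
company_sectors = {
    "00000001": ["FinTech"],
    "00000002": ["FinTech", "AgriTech"],
}

# Hypothetical postings; the SOC4 codes are placeholders, not real codes.
postings = [
    {"company_number": "00000001", "soc4": "9991", "salary": 40000},
    {"company_number": "00000002", "soc4": "9991", "salary": 50000},
    {"company_number": "00000002", "soc4": "9992", "salary": 30000},
    {"company_number": "00000003", "soc4": "9991", "salary": 99999},
]

summary = aggregate_postings(postings, company_sectors)
for (sector, soc4), stats in sorted(summary.items()):
    print(sector, soc4, stats)
```

The same grouping pattern extends to the other outputs mentioned above, such as postings by skill or average posting duration, by swapping the grouping key and the value being averaged.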
Likewise, the available location filters allow for a geographical analysis of job posting data. We base this information on postcode data that we have scraped from Companies House and companies’ websites, making it possible to understand jobs and skills spatially.
It was brilliant to see and hear from so many fantastic speakers at the event, all contributing insightful and differing approaches to mapping the emerging economy. A big thanks to the Digital Catapult, Nesta and the Economic Statistics Centre of Excellence for hosting, and to all the speakers for talking so openly about their work.
Want to talk more? Feel free to contact us today.