Ever since our inception in 2014, when we developed our first text summarization algorithm using Natural Language Processing (NLP), I’ve always felt the best applications of NLP are in helping people consume information faster and more efficiently. In times of COVID-19 we’ve been able to put that into practice with our Covid Insights Platform.
When the novel coronavirus pandemic spread across the globe and we were stuck hopelessly in our homes while the brave clinicians and key workers put their lives on the line for the sake of us all, we wanted to do our part to help out, and felt that we could do so by trying to make the researchers’ lives easier.
The idea came from a rather panicked phone call in early March with my sister, Dr Sophie Harris, a consultant at King’s College London. She had just been informed that her ward was being taken over for COVID-19 patients and she suddenly had to become an expert in everything to do with the virus. She was immediately overwhelmed by the huge amount of information from multiple sources, with no easy way to navigate this evolving landscape of research from across the globe.
Digging deeper into the problem, I discovered there were in the region of 4,000 papers being written and published on the virus every week. Serious information overload.
Around the same time, the fantastic CORD-19 dataset was published. This was a rich dataset of research on COVID-19, validated by leading institutions in research and medicine, and therefore one I felt we could build upon. When we started there were 28,000 research papers, and now at the time of writing there are 128,000.
Putting the pieces of the puzzle together, I came to the realization that Deeper Insights was in a unique position to help navigate this vast sea of information. As experts in analytics and data science, we have the tools and the know-how to build innovative products to tackle big problems.
Like many businesses at the start of lockdown, we’d had about 50% of our projects postponed or cancelled, which on reflection was actually quite helpful as it gave us the spare capacity to knuckle down and focus on the task at hand.
In the space of a week, I’d managed to pull together a consortium of key technology and healthcare industry professionals to help guide the development of the COVID-19 Insights Platform. Amazingly, Matthew Harker, a former BMJ director, and Meuthia Endrojono-Ellis, a former director of the NHS, agreed to provide advisory support, and were invaluable throughout the product development process. Through their network we were also connected to Dr Andrew Jones at Amazon, who very generously granted us credits to develop our platform using their Neptune Knowledge Graph.
During the product development process, we interviewed countless clinicians (with a focus on those who were self-isolating rather than taking up the valuable time of frontline doctors). From those interviews, we concluded that the product should support ongoing research into the virus rather than clinical decision support, which was our initial hypothesis. Speaking to anaesthetists in Italy and intensivists in Wales, we got a very clear picture of the problems they face:
Information overload from too many sources
Not enough time to read everything
We set out to solve these two problems. The first, information overload, we tackled through the design and implementation of our knowledge graph, which incorporates the CORD-19 dataset and is navigated via a visual representation of the graph that connects key concepts such as symptoms and drugs.
Technically, the graph uses a number of NLP techniques to link key concepts in the data, and is tied to the Unified Medical Language System (UMLS), which aids search and navigation of the platform. Additionally, we applied topic networks to slice through the graph by areas of interest such as specialisms or preventions.
This enables researchers to automatically discover the most relevant paper to read based on their search criteria. It’s a bit like using a sat nav on a long road trip, with multiple route options available to get you to your destination, except we’re hoping the destination in this case is a eureka moment for researchers in discovering new insights about the coronavirus.
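To make the idea a little more concrete, here is a minimal Python sketch of a concept graph. It is an illustration under simplifying assumptions, not the platform's actual implementation: the vocabulary, the concept ids and the sample paper titles are all made up, and plain substring matching stands in for the NLP techniques and UMLS linking described above.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy vocabulary: surface term -> (concept id, concept type).
# A real system would map terms to UMLS concepts; these ids are invented.
VOCAB = {
    "remdesivir": ("C_REMDESIVIR", "drug"),
    "dexamethasone": ("C_DEXAMETHASONE", "drug"),
    "fever": ("C_FEVER", "symptom"),
    "cough": ("C_COUGH", "symptom"),
    "covid-19": ("C_COVID19", "disease"),
}

# Made-up paper titles, used only to exercise the sketch.
PAPERS = {
    "p1": "Remdesivir in patients hospitalised with COVID-19",
    "p2": "Fever and cough as presenting symptoms of COVID-19",
    "p3": "Dexamethasone and remdesivir outcomes in COVID-19 patients with fever",
}


def extract_concepts(text):
    """Return the ids of every vocabulary concept mentioned in the text."""
    lowered = text.lower()
    return {cid for term, (cid, _type) in VOCAB.items() if term in lowered}


def build_graph(papers):
    """Index papers by concept and count concept co-occurrences as edges."""
    concept_to_papers = defaultdict(set)
    edges = defaultdict(int)
    for paper_id, text in papers.items():
        concepts = extract_concepts(text)
        for concept in concepts:
            concept_to_papers[concept].add(paper_id)
        # Two concepts are linked each time they appear in the same paper.
        for a, b in combinations(sorted(concepts), 2):
            edges[(a, b)] += 1
    return concept_to_papers, edges


def papers_on_path(concept_to_papers, path):
    """Papers that mention every concept the user has navigated through."""
    sets = [concept_to_papers.get(concept, set()) for concept in path]
    return set.intersection(*sets) if sets else set()


concept_to_papers, edges = build_graph(PAPERS)
shortlist = papers_on_path(concept_to_papers, ["C_COVID19", "C_REMDESIVIR", "C_FEVER"])
```

Each extra step along the path narrows the shortlist, which is the sat nav behaviour in miniature: the more precisely you describe where you want to go, the fewer papers are left to read.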
To tackle the second problem - the serious lack of time - we included a newsletter service. A user can create email alerts based on their path of discovery. If they’ve searched for Remdesivir and COVID-19, for example, and then navigated a few extra steps through the graph to filter the papers further, they will be notified whenever new research comes out that matches those specific filters.
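The alert matching can be sketched in a few lines of Python. This is a simplified illustration, not our production code: the `Alert` class, the function names, the email addresses and the concept labels are all hypothetical, and it assumes each new paper has already been tagged with a set of concepts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    """A saved search: who to email and which concepts a paper must mention."""
    email: str
    required_concepts: frozenset


def matching_alerts(alerts, paper_concepts):
    """Return the alerts whose filters are all satisfied by one paper."""
    found = set(paper_concepts)
    return [alert for alert in alerts if alert.required_concepts <= found]


def build_digest(alerts, new_papers):
    """Group newly published paper ids by the email address to notify."""
    digest = {}
    for paper_id, concepts in new_papers.items():
        for alert in matching_alerts(alerts, concepts):
            digest.setdefault(alert.email, []).append(paper_id)
    return digest


# Made-up alerts and papers, used only to exercise the sketch.
alerts = [
    Alert("a@example.org", frozenset({"remdesivir", "covid-19"})),
    Alert("b@example.org", frozenset({"covid-19", "fever"})),
]
new_papers = {
    "n1": {"remdesivir", "covid-19", "fever"},
    "n2": {"covid-19", "cough"},
}
digest = build_digest(alerts, new_papers)
```

A paper only triggers an alert when it mentions every concept in the saved filter, so the more steps a user takes through the graph before saving, the fewer and more relevant the emails they receive.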
The platform has been operating for a couple of weeks and we’ve had a few hundred users returning to the platform, some spending as much as 30 minutes searching and reading, so it's very encouraging to see our efforts put to good use. Even if just one researcher finds one piece of useful information that could potentially save just one life, it will all be worthwhile.
A very special thanks
I’d like to use this blog to also say a massive thank you to all the people that helped us through phone calls, weekly product meetings, introductions, and so much more:
Dr Peter Jaye
Dr Matt Yates
Dr Lydia Wuarin
Dr Maxime Baroz
Dr Sophie Harris
Dr Andrew Jones
Dr Matthew Morgan
Dr Marcia Oliveira
Dr Brett Drury
But of course the biggest THANK YOU goes to the amazing work of all the incredible scientific researchers that have created one of the largest and fastest growing medical research data sets in history.
To start using the Covid Insights Platform for free today, go to https://covid.deeperinsights.com
Please share with any clinicians or researchers in your network that might find it useful.
Ref: CORD-19. This dataset was created by the Allen Institute for AI in partnership with the Chan Zuckerberg Initiative, Georgetown University’s Center for Security and Emerging Technology, Microsoft Research, and the National Library of Medicine - National Institutes of Health, in coordination with The White House Office of Science and Technology Policy.