How web scraping and data mining can help predict, track and contain current and future disease outbreaks
COVID-19, the disease caused by a novel strain of coronavirus, started as a rare respiratory illness in the port city of Wuhan, capital of China’s Hubei province. Since 31 December 2019, when the Chinese government first reported several cases of unusual pneumonia, the virus has spread around the world and left governments scrambling to contain it.
On 30 January 2020, the WHO declared the outbreak of the then-unnamed coronavirus a global health emergency, with new confirmed cases increasing by the thousands every day. One researcher believes 40 to 70 percent of the world’s population will be infected within the coming year.
As of 11 March 2020, there have been almost 120,000 confirmed cases in more than 100 countries, including more than 4,200 deaths. While the spread appears to be under control in China, cases in the rest of the world, mainly Europe and the USA, are rapidly increasing every day.
Impact on Global Economy
The Organisation for Economic Cooperation and Development (OECD) has projected that the global economy could grow at its slowest rate (2.4%) since 2009 because of the coronavirus outbreak. It added that the forecast would look much worse if the virus wasn’t contained within the first quarter of 2020 and spread throughout Asia, Europe and North America.
The Dow Jones and FTSE 100 plunged 4.4% and 3.5% respectively on 27 February, and major stock markets lost $1.5 trillion in share value that week, their worst weekly performance since the 2008 financial crisis. Conditions got even worse over the following week, with all major stock markets posting their worst numbers since the 2008 crisis.
Impact on Global Tech Industry
The world of tech is also not immune to the effects of the coronavirus outbreak.
Almost all major events and conferences have either been cancelled or moved online, including Barcelona’s Mobile World Congress, Facebook F8, Google Cloud Next, Google I/O, IBM’s Think and Austin’s South by Southwest. The economic loss as a result of these cancellations is reportedly more than $1 billion.
Companies are also warning consumers to expect manufacturing and supply chain delays on their products, with offices, stores and factories in China still closed and employees urged to refrain from non-essential travel.
Role of Technology
As local and international authorities work to contain the outbreak, incorporating data and technology into day-to-day decision-making would be not only shrewd but also highly effective. With more relevant data, decision-makers can build a fuller picture of the outbreak and act on it decisively.
However, quick access to accurate and reliable data is not straightforward in the current climate of privacy concerns, fake news and conflicting information between sources.
The WHO has encouraged researchers, governments, businesses and scientific communities to collaborate and share data among themselves to better understand the virus and its spread, and to develop concrete action plans. This data will also be crucial in developing vaccines and preventing similar outbreaks in the future.
Artificial Intelligence and Data Analytics
AI and Big Data are at the forefront of the technological involvement in combating the global outbreak.
Techniques like web scraping and data mining play integral roles by gathering factual data and minimizing the flow of misinformation. This data helps doctors and health experts to assess their successes or failures, and reorient their actions.
Visualization
Tools like HealthMap and the Johns Hopkins University dashboard, which have become some of the most popular resources for information on the current outbreak, are good examples.
These use web scraping, data mining, machine learning and Geographic Information Systems (GIS) technologies to gather information from a variety of well-sourced sites, including local, national and international public hospitals and health centers, news reports, chatrooms and forums. This disparate data is then organized to generate visualizations that show what course the outbreak is taking.
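To make the scrape-and-organize step more concrete, here is a minimal Python sketch. It does not reflect how HealthMap or the Johns Hopkins dashboard are actually built. The HTML snippet, the region and facility names, the case counts and the helper names (CaseTableParser, total_cases_by_region) are all made up for illustration, and a real scraper would first fetch the page over the network.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Hypothetical page fragment; names and numbers are invented for this example.
SAMPLE_HTML = """
<table>
  <tr><th>Region</th><th>Facility</th><th>Confirmed</th></tr>
  <tr><td>Region A</td><td>Hospital 1</td><td>1,200</td></tr>
  <tr><td>Region B</td><td>Hospital 2</td><td>350</td></tr>
  <tr><td>Region A</td><td>Hospital 3</td><td>8</td></tr>
</table>
"""


class CaseTableParser(HTMLParser):
    """Collects the text of every <td> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr":
            # Header rows contain only <th> cells, so they stay empty and are skipped.
            if self._row:
                self.rows.append(self._row)
            self._row = None


def total_cases_by_region(html):
    """Parse the table and sum confirmed cases per region."""
    parser = CaseTableParser()
    parser.feed(html)
    totals = defaultdict(int)
    for region, _facility, count in parser.rows:
        totals[region] += int(count.replace(",", ""))
    return dict(totals)


totals = total_cases_by_region(SAMPLE_HTML)
print(totals)
```

The per-region totals are the kind of tidy, aggregated data a mapping or charting layer could then plot.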
Social media is another useful source for such GIS technologies, where users’ posts can be scraped, and keywords (or hashtags) associated with the outbreak turned into actionable data to determine areas of interest.
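As a rough illustration of turning hashtags into areas of interest, the Python sketch below counts outbreak-related posts per location. The posts, the tag list and the function name mentions_by_location are assumptions made for this example; real posts would come from a platform's API, and location data is often missing or imprecise.

```python
import re
from collections import Counter

# Assumed set of hashtags to watch; a real project would curate this list.
OUTBREAK_TAGS = {"#covid19", "#coronavirus"}
HASHTAG = re.compile(r"#\w+")


def mentions_by_location(posts):
    """Count posts per location that carry at least one outbreak hashtag."""
    counts = Counter()
    for post in posts:
        tags = {tag.lower() for tag in HASHTAG.findall(post["text"])}
        if tags & OUTBREAK_TAGS:
            counts[post["location"]] += 1
    return counts


# Invented sample posts, for illustration only.
posts = [
    {"text": "Fever clinics are full today #COVID19", "location": "City A"},
    {"text": "Stay home! #coronavirus #covid19", "location": "City A"},
    {"text": "Great match last night #football", "location": "City B"},
    {"text": "School closed because of #Coronavirus", "location": "City B"},
]

hotspots = mentions_by_location(posts)
print(hotspots.most_common())
```

Each post is counted once even if it carries several matching hashtags, so a single enthusiastic user does not inflate a location's score.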
Projects like these supplement traditional data-collection techniques used by organizations like the WHO, and are used by governments and health officials to develop prevention measures and action plans.
AI in Diagnostics
On 14 March, Chinese media outlet CGTN reported that AI could now detect cases of COVID-19 in 20 seconds with 96% accuracy. The algorithm, a convolutional neural network, learns from 5,000 CT scans and can be trained in a week.
A deep learning classifier first analyzes the images for abnormalities. The abnormal regions are then segmented, and a large set of texture features is extracted from them. The AI can then instantly differentiate between the lungs of patients with common viral pneumonia and those of patients with COVID-19, while also calculating the number and size of lesions, and determining the severity of each case.
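The reported system's internals are not public in this article, so the following Python sketch only illustrates the final counting step under a stated assumption: that segmentation has already produced a binary mask in which 1 marks a lesion pixel. It uses plain connected-component labeling, a standard image-processing technique that may or may not be what the actual system uses. The mask and the function name lesion_sizes are hypothetical.

```python
from collections import deque


def lesion_sizes(mask):
    """Return the pixel area of each 4-connected region of 1s, largest first."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    sizes = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill over one connected region.
            seen[r][c] = True
            queue = deque([(r, c)])
            area = 0
            while queue:
                y, x = queue.popleft()
                area += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            sizes.append(area)
    return sorted(sizes, reverse=True)


# Toy 4x5 mask standing in for a segmented CT slice.
mask = [
    [0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0],
]

sizes = lesion_sizes(mask)
print(len(sizes), sizes)
```

The number of regions gives the lesion count and each area gives a lesion size in pixels; converting to physical units would need the scan's pixel spacing.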
Predictive Analytics
BlueDot is another project that has gone a step further. After collecting disease data, it uses airline flight information to predict where the disease might appear next. The resulting information is valuable in identifying potentially infectious travelers and isolating or quarantining them as soon as they land, to contain the contagion at the very first point of contact and prevent any further spread.
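BlueDot's actual model is not described here, so the Python sketch below is only a naive stand-in for the general idea of combining case data with flight data: each destination is scored by the number of travelers arriving from affected cities, weighted by the case count at the origin. The city names, numbers and the function name rank_destinations are invented for illustration.

```python
def rank_destinations(cases_by_origin, passengers):
    """Rank destinations by a naive importation score.

    passengers maps (origin, destination) pairs to traveler volume.
    Score = sum over origins of (cases at origin * travelers to destination).
    """
    scores = {}
    for (origin, destination), volume in passengers.items():
        weight = cases_by_origin.get(origin, 0)
        scores[destination] = scores.get(destination, 0) + weight * volume
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Invented figures, for illustration only.
cases = {"City A": 500, "City B": 20}
passengers = {
    ("City A", "City X"): 1000,
    ("City A", "City Y"): 200,
    ("City B", "City Y"): 3000,
    ("City B", "City Z"): 100,
}

ranking = rank_destinations(cases, passengers)
for destination, score in ranking:
    print(destination, score)
```

Note that City Y receives far more travelers overall than City X, yet ranks lower because most of them come from the less affected origin. A realistic model would also account for population size, connecting flights and changes over time.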
Other research efforts, like the Global Virome Project, are building genetic and ecological databases of viruses in animal populations that could potentially be transmitted to humans. The GVP researchers aim to develop vaccines and other preventive measures against potential future outbreaks.
Thanks to advances in AI, machine learning and GIS technologies, disaster response times have never been quicker. But as with everything, there are limitations to the current techniques. There are still some blind spots around the world, namely rural areas and their populations, which may be generating little or no online data.
Having said that, the enormous amount of data that these technologies collect can be used to train AI algorithms to better deal with the more disastrous disease outbreaks of the future.
P.S.: As the COVID-19 outbreak continues to affect lives all over the world, we’d like to urge our readers and customers to stay safe and take all preventive measures. We wish the very best of health to you and your loved ones.