Girls of Steel x COVIDcast: Update #2

Data Anomalies Blog Post

Since our last update, the Girls of Steel COVIDcast group has been hard at work monitoring the COVIDcast website and providing suggestions for it. A new version of the website was recently released, and it is now more compatible with the wide range of devices people use to access the data. Delphi consulted design experts on the website, and we were able to witness the transformation from the initial website to the new interface. The website is not the only thing that changed: so did the survey that is sent out via Facebook. To date, there have been eleven iterations of the survey, and it has evolved to become both more mobile-friendly and more efficient. Respondents are now randomly assigned to module A or module B when they click on the link to fill out the form, and are asked different questions accordingly. This is expected not only to help facilitate data analysis, but also to increase participation.

In addition to monitoring the website, we have been looking at the data indicators to make sure that the data collected from Delphi’s sources is consistent and accurate. At first, we used an internal dashboard tool to see how a range of data imported behind the scenes was processed and tracked; currently, the information on that dashboard is being transferred onto the public website. The process is designed so that errors can be detected through unexpected fluctuations in a graph. Each data source has one graph, and we were each assigned different indicators to check on a weekly basis. Our main focus is looking for data anomalies in COVID cases and COVID deaths across all states. In other words, we look for unexpected fluctuations in these two signals at a larger scale, so that we can compare the data from state to state. Tracking spikes gives us the opportunity to trace them back to a data backlog or reporting error.

This is an image of the mobile version of the survey that users are asked to fill out.

These images show the same day on the old website (below) and the new website (above).

What are data anomalies?

A data anomaly occurs when inconsistent data is entered and skews the graphs or other information built from it. Data anomalies can show up in many different ways on different websites and for different kinds of information, but here is an example of one on COVIDcast.

Here we can see that on April 3rd, Missouri stands out on the map for having a large number of cases. This should already be a red flag, but if you are unsure, you can check the surrounding days to see whether the abnormality recurs or appears only on that day. Let’s look a little more closely at Missouri on April 3rd. If you scroll down a little, you will see a graph like this.

We can clearly see a giant jump in the graph. This jump levels off into what we call a plateau. Plateaus can form when a large amount of information is entered or changed all at once. The information is logged as coming from that date (April 3rd in this case), but it actually belongs to a span of several days or weeks. At the top of the page, we can confirm what these plateaus mean.

These plateaus are good indicators of where there may be data anomalies. Look back at the map and see if you can find another possible data anomaly. Delphi cares about all of this because it is important to have accurate and up-to-date information, even for dates in the past. This data is not only for us; future researchers will be able to reference it as well. We want to make sure that all of our information is correct, and data anomalies can leave holes in our accuracy. That is why finding data anomalies is such an important task, and why we decided to dedicate time to tackling them.

Detecting Data Anomalies

Detecting anomalies, our most recent job, can be a very opinion-based task, so we use a more operational definition: a data anomaly is a sudden spike or steep drop in the given statistics, for example, if Iowa suddenly jumped from 70 reported COVID-19 cases per day to 200. Our job was to report these abnormalities, specifically for COVID-19 cases and deaths. The process for communicating our observations was:

  1. Go to the COVIDcast website, select the desired indicator (cases or deaths) for the United States, and observe the states in their line-graph format.
  2. If any of the data looks suspicious, click into that state’s statistics and see whether there is already a report that explains its behavior.
  3. If there is no such report, go to the CMU-Delphi team’s sheet and file a report detailing the behavior of the anomaly, where it occurred, and over what period of time.
  4. Check that state’s COVID-related news to see whether something happened recently that can explain the sudden spike or drop, for example, paperwork that was not previously reported correctly. If so, add a link to the news article and an explanation (this helps the CMU-Delphi team announce what happened on the COVIDcast website).
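
To make the operational definition above a little more concrete, here is a minimal Python sketch of how a sudden spike or steep drop could be flagged automatically. It is only an illustration, not the tool that Delphi or our team actually uses: the function name `flag_anomalies`, the thresholds `spike_ratio` and `min_change`, and the sample numbers are all assumptions made up for this example (the numbers are loosely based on the Iowa example of 70 cases jumping to 200).

```python
from typing import List, Tuple


def flag_anomalies(
    daily_counts: List[float],
    spike_ratio: float = 2.0,   # hypothetical threshold: count doubles or halves
    min_change: float = 50.0,   # hypothetical threshold: ignore tiny day-to-day changes
) -> List[Tuple[int, str]]:
    """Return (day_index, kind) for each day that jumps or drops sharply
    compared with the previous day."""
    flagged = []
    for day in range(1, len(daily_counts)):
        prev, curr = daily_counts[day - 1], daily_counts[day]
        # Skip changes that are too small to matter in absolute terms.
        if abs(curr - prev) < min_change:
            continue
        if curr >= prev * spike_ratio:
            flagged.append((day, "spike"))
        elif curr * spike_ratio <= prev:
            flagged.append((day, "drop"))
    return flagged


# Made-up daily case counts for one state, for illustration only.
sample_counts = [68, 72, 70, 200, 75, 71]
print(flag_anomalies(sample_counts))  # [(3, 'spike'), (4, 'drop')]
```

A check like this can only point to days worth a closer look. Steps 2 through 4 above still need a person to decide whether a flagged day is a real change or a backlog or reporting error.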

Our Impact & Skills We’ve Obtained

Through our partnership with CMU, we have made a substantial impact on the community, reaching the thousands of people who visit the website every day. Our main goal is still to offer an outside perspective that lets us suggest changes that can help the general public better understand statistics about the pandemic. This includes the many iterations of the survey, which we have helped edit. These suggestions and changes have shaped the survey that goes out to millions of people through Facebook. Additionally, our new work with data anomalies helps people understand odd occurrences in the data.

Being a part of COVIDcast has given us new skills that we can continue to build on. We’ve learned how the virus is being tracked and what factors go into the data about this pandemic that is affecting all of our lives. We’ve also gained an understanding of what needs to be considered when managing such an important website, with data coming in from the many different sources that we track.

What’s next?

Delphi is working on transitioning to a 15-month plan for tracking the COVID pandemic. They plan on collecting data on a daily basis so that the data shown on COVIDcast will stay up to date and be more reliable. Another goal is to gain more detail about the data the team is receiving (what was the date of a given test, when was it reported, etc.). Of course, they also want to maintain relations with users and continue asking for their feedback on the website, so that they can keep improving it and make it more convenient for anyone who uses COVIDcast.

The Delphi team also plans to track seasonal epidemics of other diseases, such as influenza. This was the original goal of the Delphi team, and now that they have gained tools and resources for tracking COVID, they can use those same tools and resources to help track influenza. The team is planning for bumps in the road, since smaller epidemics are not nationally reported the way COVID is, and data sources may become less reliable and more delayed. In addition, the Delphi team has a 3-to-5-year plan in which Delphi will work with the national COVID system to maintain tracking of the pandemic and share resources that will help track other epidemics as well. For the long run, the team also plans on building a computing system that analyzes data and does the jobs that we at Girls of Steel are currently helping out with. Girls of Steel will continue analyzing statistics until that system has been developed, and will help with anything else Delphi would like assistance completing.