
Covid-19 US Data Analysis April 23rd

Crude CFR is the cumulative number of deaths divided by the cumulative number of cases. It is an admittedly problematic indicator. See the sections below for an analysis of how crude CFR varies over time as an epidemic progresses.
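As a minimal sketch of the definition above, crude CFR can be computed day by day from two cumulative series. The function name `crude_cfr` and the sample numbers are made up for illustration and are not taken from the notebook's data.

```python
import math

def crude_cfr(cumulative_deaths, cumulative_cases):
    """Return cumulative deaths / cumulative cases per day, NaN where undefined."""
    result = []
    for d, c in zip(cumulative_deaths, cumulative_cases):
        # Guard against missing values and division by zero
        if d is None or c is None or c == 0 or math.isnan(d) or math.isnan(c):
            result.append(math.nan)
        else:
            result.append(d / c)
    return result

# Made-up numbers, for illustration only
cases = [0, 100, 400, 1000]
deaths = [0, 1, 10, 40]
cfr_series = crude_cfr(deaths, cases)
print(cfr_series)
```

Note that the ratio changes from day to day even in this toy series, which is the behaviour the later sections examine.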

State Overview

Time-shifted State View

These charts show the growth of confirmed cases by state. To make developments more comparable across states, each state curve is shifted in time so that day 0 corresponds to the moment when that state exceeds a threshold number of confirmed cases (here, 100).
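The time shift described above can be sketched as follows. This assumes each state's data is a plain list of cumulative confirmed cases, one entry per day; `shift_to_threshold` and the sample series are hypothetical names and values, not the notebook's own.

```python
def shift_to_threshold(cumulative_cases, threshold=100):
    """Drop leading days so that index 0 is the first day exceeding the threshold."""
    for day, value in enumerate(cumulative_cases):
        if value > threshold:
            return cumulative_cases[day:]
    return []  # the threshold was never exceeded

# Made-up series, for illustration only
series_by_state = {
    "State A": [10, 50, 120, 300, 700],
    "State B": [0, 5, 20, 90, 150],
}
shifted = {name: shift_to_threshold(s) for name, s in series_by_state.items()}
print(shifted)
```

After the shift, index 0 of every curve is that state's own day 0, so curves that started at different calendar dates can be overlaid.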

State Projections Overview

State projections are based on fitting curves to the daily changes in cases and deaths. This removes a challenge with cumulative data: each cumulative data point is correlated with all previous data points. So fitting on the daily changes should result in better accuracy. These fits are transferred to the cumulative values by symbolic integration.

To project current trends, we try to fit three families of curves: the derivative of an exponential, which is itself exponential; the derivative of a sigmoid; and the derivative of a sigmoid plus a sigmoid, to simulate constant terminal growth. We select the fit with the lowest squared error. Such projections should be taken with a grain of salt, as these functions grow and diverge rapidly. The global projection is the sum of all country projections, including countries not shown here.
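The three curve families and the selection step can be sketched as below. This is an illustration under stated assumptions, not the notebook's actual fitting code: the parameterisations (`a`, `b`, `L`, `k`, `t0`, `c`) are one possible choice, the candidate parameters are hard-coded rather than produced by a least-squares optimiser, and the observations are synthetic. Each daily-change model is paired with its closed-form integral, which is how a fit on daily changes carries over to cumulative values.

```python
import math

# Family 1: exponential. The derivative of a*exp(b*t) is again exponential.
def exp_daily(t, a, b):
    return a * b * math.exp(b * t)

def exp_cumulative(t, a, b):
    return a * math.exp(b * t)

# Family 2: sigmoid. Daily changes follow the derivative of L*sigmoid(k*(t-t0)).
def _sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_daily(t, L, k, t0):
    s = _sigmoid(k * (t - t0))
    return L * k * s * (1.0 - s)

def sigmoid_cumulative(t, L, k, t0):
    return L * _sigmoid(k * (t - t0))

# Family 3: sigmoid derivative plus a sigmoid, so daily changes level off at c
# instead of falling to zero (constant terminal growth).
def sigmoid_plus_daily(t, L, k, t0, c):
    return sigmoid_daily(t, L, k, t0) + c * _sigmoid(k * (t - t0))

def sigmoid_plus_cumulative(t, L, k, t0, c):
    # The integral of sigmoid(k*(t-t0)) is log(1 + exp(k*(t-t0))) / k
    return sigmoid_cumulative(t, L, k, t0) + c * math.log1p(math.exp(k * (t - t0))) / k

def squared_error(model, params, days, observed):
    return sum((model(t, *params) - y) ** 2 for t, y in zip(days, observed))

# Synthetic daily changes, generated from the sigmoid family for illustration
days = list(range(21))
observed = [sigmoid_daily(t, 1000.0, 0.3, 10.0) for t in days]

# Hypothetical candidate fits; a real run would estimate these parameters
candidates = {
    "exponential": (exp_daily, (5.0, 0.2)),
    "sigmoid": (sigmoid_daily, (1000.0, 0.3, 10.0)),
    "sigmoid_plus": (sigmoid_plus_daily, (1000.0, 0.3, 10.0, 5.0)),
}
errors = {name: squared_error(f, p, days, observed) for name, (f, p) in candidates.items()}
best_name = min(errors, key=errors.get)
print(best_name, errors)
```

Because the synthetic data was generated from the sigmoid family, that candidate has zero error here; with real data, all three errors would be positive and the smallest would win.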

The overview chart now shows 95% prediction ranges for the values. That is, 95% of future values on the given date, including measurement noise, should fall within the given error bound. These bounds have tightened as we now use the full covariance matrix, including correlations between different parameters. Confidence ranges for the underlying future value, i.e. without noise, are even tighter. Those are shown on the country details charts.

State Projection Details

State projections are based on fitting curves to the daily changes in cases and deaths. Charts show both the 95% confidence intervals, in blue shading, and the 95% prediction intervals, in blue dashes. The confidence interval means that the true value, without measurement noise, lies in that range. The prediction interval means that future data point measurements, including noise, will lie in that range. Newly added in orange is a projection of crude CFR based on the cumulative fatality and cumulative case projections.

Note that passing the peak of new infections does not indicate the peak strain on healthcare systems is already behind us; also note this does not imply that social distancing can be relaxed. Without a reduction in physical contacts, the curves would soon resume exponential growth.

Predicting new deaths from new cases

When the fitted curves for new cases and for new deaths both depart from the exponential, we can try to predict new deaths from new cases. Intuitively, this should be governed by the case fatality rate (CFR) and a time delay. As a formula, let's try to apply predictedNewDeaths(t) = CFR × newCases(t - delay). It turns out that this is already a fair match, but two refinements are necessary. Firstly, the time delay is in reality not a single value but a distribution of delays. To reflect this, let's convolve the newCases function with a normal distribution of a given width sigma, i.e. predictedNewDeaths(t) = CFR × (N(1.0, sigma) * newCases)(t - delay), where * denotes convolution. Secondly, health systems don't exhibit a fixed CFR, as they deliver worse outcomes when overloaded. Let's apply an overload factor to the CFR around the peak of the case curve. Results are shown below.
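The delayed and smoothed prediction can be sketched as follows. Several things here are assumptions made for illustration: the kernel is a discrete Gaussian centred on zero lag and normalised to unit total weight (so smoothing spreads cases over neighbouring days without changing their sum), the delay is a whole number of days, days outside the series count as zero cases, and the overload factor is left out. The function names and sample numbers are hypothetical.

```python
import math

def gaussian_kernel(sigma, radius=None):
    """Discrete Gaussian weights over lags -radius..radius, normalised to sum to 1."""
    if radius is None:
        radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-0.5 * (k / sigma) ** 2) for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def smooth(series, kernel):
    """Convolve series with a symmetric kernel, treating out-of-range days as zero."""
    radius = len(kernel) // 2
    out = []
    for t in range(len(series)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = t + j - radius
            if 0 <= idx < len(series):
                acc += w * series[idx]
        out.append(acc)
    return out

def predict_new_deaths(new_cases, cfr, delay, sigma):
    """predictedNewDeaths(t) = cfr * smoothed_new_cases(t - delay)."""
    smoothed = smooth(new_cases, gaussian_kernel(sigma))
    return [cfr * smoothed[t - delay] if t - delay >= 0 else 0.0
            for t in range(len(new_cases))]

# Made-up input: a single spike of 100 new cases on day 8 of a 30-day window
new_cases = [0.0] * 30
new_cases[8] = 100.0
predicted = predict_new_deaths(new_cases, cfr=0.05, delay=7, sigma=2.0)
print([round(x, 3) for x in predicted])
```

In this toy run the spike of cases turns into a bell-shaped bump of predicted deaths centred seven days later, with a total equal to CFR times the number of cases.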

For the states where data is available, inferred CFRs and delays range widely. Part of this can likely be attributed to a difference in testing intensity. Part of it may also be attributable to locally overwhelmed health systems, and/or general differences in health system capacity relative to population size.

Comparing Inferred CFRs with Crude CFRs

Matching curves for new deaths and new cases, the pattern new_deaths(t)=inferred_cfr * new_cases(t - delay) generally seems to hold. The two parameters are an inferred case fatality rate inferred_cfr, and the average time between reporting of a case and the reporting of a fatal outcome, delay. Let's see how the inferred CFR relates to crude CFR, defined as number of cumulative deaths over cumulative cases.

As you can see for selected states below, crude CFRs vary strongly over time. They first dip from a high level in the early stages of an epidemic. Then they recover almost symmetrically, and converge on a final value. However, they can start both above and below the inferred CFRs. They can also end both above and below. Hence, crude CFR remains a poor indicator.

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-9-272cb6a4f679> in <module>
     20     # Calculate CFRs
     21     cfr=[d/c if d is not math.nan and c is not math.nan else math.nan for d,c in zip(dc, cc)]
---> 22     inferredCFR=inferredCFRLookup[countryName]
     23 
     24     # Prepare plot area

KeyError: 'California'

A look at weekday patterns

I have suspected for a while that there might be a weekday pattern to the data we see. To test this hypothesis, let's calculate the percentage growth of new cases for each day, subtract the average for the prior week from this, and bin the growth rates by day of week.
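The calculation described above can be sketched like this. It assumes a plain list of daily new cases starting on a known date, takes "the prior week" to mean the seven preceding days, and skips days where growth is undefined; `weekday_growth_deviation` and the synthetic series are hypothetical, not the notebook's own code or data.

```python
from collections import defaultdict
from datetime import date, timedelta

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

def weekday_growth_deviation(start_date, new_cases):
    """Average deviation of daily percentage growth from the prior 7-day mean, per weekday."""
    # Day-over-day percentage growth; undefined on day 0 and after a zero-case day
    growth = [None]
    for prev, cur in zip(new_cases, new_cases[1:]):
        growth.append((cur - prev) / prev * 100.0 if prev > 0 else None)

    bins = defaultdict(list)
    for i in range(8, len(growth)):
        window = growth[i - 7:i]  # the seven preceding days
        if growth[i] is None or any(g is None for g in window):
            continue
        baseline = sum(window) / len(window)
        weekday = WEEKDAYS[(start_date + timedelta(days=i)).weekday()]
        bins[weekday].append(growth[i] - baseline)

    return {day: sum(values) / len(values) for day, values in bins.items()}

# Synthetic series with a constant 10% daily growth, so no weekday effect is expected
new_cases = [100.0 * 1.1 ** i for i in range(21)]
deviations = weekday_growth_deviation(date(2020, 3, 1), new_cases)
for day in WEEKDAYS:
    print(day, round(deviations[day], 6))
```

With steady growth, every weekday's deviation is close to zero; a reporting pattern such as lower counts on weekends would show up as consistently negative or positive values for particular days.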

Here is what we get for the ten most impacted states: New York, California and Illinois appear to display the largest weekday differences.

Data source: Johns Hopkins. For questions and comments, please reach out to me on LinkedIn or Twitter.