Unveiling the Seasonal Rhythms of Opioid Purchases: Exploring Patterns, Spikes, and Insights

Introduction to the ARCOS Dataset

The Automation of Reports and Consolidated Orders System (ARCOS) is a data monitoring program managed by the Drug Enforcement Administration (DEA) to track the flow of controlled substances, including opioids, from their point of manufacture to the point of sale or distribution at the pharmacy level. This dataset provides a detailed view of the distribution of legally manufactured controlled substances and offers insights into patterns and trends across the United States between 2006 and 2019.


In light of the ongoing opioid crisis, understanding the dynamics of opioid distribution at a granular level is essential for crafting effective public health responses and regulatory measures. My objective was to analyze the Morphine Milligram Equivalents (MME) of opioids using the ARCOS dataset, focusing on a time series analysis from December 2005 to December 2019. MME is a standard measurement that allows for comparisons across different opioids, giving a consistent view of opioid distribution intensity and trends over time. For this analysis, I looked only at the two leading opioids, hydrocodone and oxycodone. Even after filtering, the ARCOS dataset is over 100 gigabytes, so the data wrangling had to be done with care.

Methodology: Time Series Analysis

Data Preparation

I aggregated the ARCOS transaction data on a weekly basis, calculating the total MME for each week across the study period. This approach gave me a manageable and meaningful dataset for analyzing trends over time. All data manipulation was done in SQL.

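CREATE TABLE pain_pills.weekly_aggregated_data AS
SELECT
    date_trunc('week', "TRANSACTION_DATE") AS week_start,
    SUM("MME") AS total_quantity
FROM pain_pills.arcos_transactions  -- assumed source table name; the original snippet omits it
GROUP BY 1;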
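
To make the weekly bucketing concrete, here is a minimal Python sketch of the same idea. It is only an illustration, not the code used for the analysis: it assumes PostgreSQL semantics, where date_trunc('week', ...) returns the Monday of the week, and the function names and sample rows are made up.

```python
from collections import defaultdict
from datetime import date, timedelta


def week_start(d: date) -> date:
    """Return the Monday of the week containing d (assumed date_trunc('week') behaviour)."""
    return d - timedelta(days=d.weekday())


def weekly_totals(rows):
    """Sum MME per week; rows is an iterable of (transaction_date, mme) pairs."""
    totals = defaultdict(float)
    for transaction_date, mme in rows:
        totals[week_start(transaction_date)] += mme
    return dict(sorted(totals.items()))


# Made-up example rows, not ARCOS data.
sample_rows = [
    (date(2019, 1, 7), 150.0),   # a Monday
    (date(2019, 1, 9), 50.0),    # Wednesday of the same week
    (date(2019, 1, 14), 75.0),   # the following Monday
]
weekly = weekly_totals(sample_rows)
```

The first two rows fall into the week starting 2019-01-07 and are summed together; the third starts a new week.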

Model Selection

To model the time series, I employed two forecasting models: the ARIMA (AutoRegressive Integrated Moving Average) model and the ETS (Error, Trend, Seasonality) model. These models were chosen for their ability to handle various patterns in time series data, such as seasonality, trend components, and error characteristics.

Diagnostic Checks

Using R, I conducted diagnostic checks to compare the performance of these models. These checks included analysis of the residuals (errors between predicted values and actual values), where ideally, the residuals should resemble white noise, indicating that all systematic information has been captured by the model.

I performed an Augmented Dickey-Fuller (ADF) test for stationarity, whose null hypothesis is that the series is non-stationary. Stationarity means that the statistical properties of the data, such as its mean and variance, do not change over time. The results of the test (DF = -8.08, p = 0.01) reject the null hypothesis at the 5% level, which is evidence of stationarity. Thus, I did not need to difference the time series.
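
For intuition about what the test statistic measures, the sketch below computes a simplified Dickey-Fuller statistic in plain Python. It is not the R routine used above: it assumes a regression with a constant only, and it omits the lagged differences (and any trend term) that the full ADF test adds. The function name and the simulated series are illustrative. For this constant-only form, the asymptotic 5% critical value is roughly -2.86; statistics well below it point toward stationarity.

```python
import math
import random


def dickey_fuller_stat(y):
    """t-statistic on gamma in the regression dy_t = alpha + gamma * y_{t-1} + e_t."""
    x = y[:-1]                                   # lagged levels y_{t-1}
    dy = [b - a for a, b in zip(y[:-1], y[1:])]  # first differences
    n = len(dy)
    x_mean = sum(x) / n
    dy_mean = sum(dy) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (di - dy_mean) for xi, di in zip(x, dy))
    gamma = sxy / sxx                            # OLS slope
    alpha = dy_mean - gamma * x_mean             # OLS intercept
    sse = sum((di - alpha - gamma * xi) ** 2 for xi, di in zip(x, dy))
    se_gamma = math.sqrt(sse / (n - 2) / sxx)    # standard error of the slope
    return gamma / se_gamma


rng = random.Random(0)
noise = [rng.gauss(0, 1) for _ in range(500)]    # stationary by construction

walk = [0.0]
for step in noise[1:]:
    walk.append(walk[-1] + step)                 # random walk: has a unit root

stat_noise = dickey_fuller_stat(noise)           # expected to be strongly negative
stat_walk = dickey_fuller_stat(walk)             # expected to be much closer to zero
```

A strongly negative statistic, like the -8.08 reported above, means the series is pulled back toward its mean quickly, which is what a stationary series does.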

After fitting the ARIMA and ETS models, I compared their diagnostics. The Mean Absolute Percentage Error (MAPE) measures the average absolute percentage difference between predicted and actual values. The ARIMA model’s MAPE is about 10%, while the ETS model’s is about 38%, so the ETS fit deviates noticeably more from the actual values in percentage terms. The Autocorrelation of Residuals at Lag 1 (ACF1) indicates how much systematic pattern remains in the residuals. Here the ETS model’s ACF1 is much closer to zero (0.097 versus -0.496 for ARIMA), signifying less systematic pattern in its residual errors.

Lastly, I ran the accuracy function on both models. The rows below are the training-set results:

Accuracy Metrics for ARIMA and ETS Models (training set)

Model  ME         RMSE        MAE        MPE         MAPE      MASE      ACF1
ARIMA  1764997    1797123296  222754181  -3.487738   9.98954   1.007610  -0.4964332
ETS    -71070317  1375957409  295217277  -31.356867  38.44625  1.335391  0.0965495

Each model was assessed based on several accuracy diagnostics, including Mean Error (ME), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), Mean Percentage Error (MPE), Mean Absolute Percentage Error (MAPE), Mean Absolute Scaled Error (MASE), and the autocorrelation of residuals at lag 1 (ACF1).
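
As a rough guide to how these seven numbers are defined, here is a small Python sketch that computes them from paired actual and fitted values. It is an illustration rather than the R function itself: the function name and the toy data are made up, and MASE is scaled by a lag-m naive forecast (m = 1 here), which is an assumption about the benchmark.

```python
import math


def accuracy_metrics(actual, fitted, m=1):
    """Return ME, RMSE, MAE, MPE, MAPE, MASE and ACF1 for paired series.

    m is the lag of the naive benchmark used to scale MASE (1 = previous value).
    """
    errors = [a - f for a, f in zip(actual, fitted)]
    n = len(errors)
    me = sum(errors) / n
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    pct = [100.0 * e / a for e, a in zip(errors, actual)]  # requires actual != 0
    mpe = sum(pct) / n
    mape = sum(abs(p) for p in pct) / n
    # MASE: MAE divided by the in-sample MAE of the lag-m naive forecast.
    naive_mae = sum(abs(actual[t] - actual[t - m]) for t in range(m, n)) / (n - m)
    mase = mae / naive_mae
    # ACF1: lag-1 autocorrelation of the errors.
    centred = [e - me for e in errors]
    acf1 = sum(a * b for a, b in zip(centred[1:], centred[:-1])) / sum(
        c * c for c in centred
    )
    return {"ME": me, "RMSE": rmse, "MAE": mae, "MPE": mpe,
            "MAPE": mape, "MASE": mase, "ACF1": acf1}


# Toy values for illustration only, not ARCOS data.
actual = [100.0, 110.0, 120.0, 130.0]
fitted = [90.0, 115.0, 120.0, 135.0]
metrics = accuracy_metrics(actual, fitted)
```

A MASE above 1, as both models show in the table, means the model's average absolute error is larger than that of the naive benchmark on the training data.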

The ARIMA model demonstrated a lower MAE, meaning that its average absolute errors are smaller, which typically indicates a better fit to the data. However, its forecasts appeared somewhat less realistic in the context of the data, and its residuals showed substantial lag-1 autocorrelation (ACF1 = -0.496), indicating potential overfitting or unmodeled dynamics within the data.

The ETS model, by contrast, had a higher MAE than ARIMA but showed almost no lag-1 autocorrelation in its residuals (ACF1 = 0.097). This suggests that the ETS model captured the underlying patterns in the data and left little systematic structure unexplained. Given these results, I decided to proceed with the ETS model for further analysis and forecasting.

The decomposed data shows strong seasonal influences in the seasonal plot. Perhaps the increases in opioid purchases are due to better weather during certain seasons, when people are more likely to engage in activities that could lead to injury. Another possible explanation is sports that take place during certain times of the year. Or perhaps the spikes are simply due to pharmacies stocking up near the beginning of the year.
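
The post does not say which decomposition method produced these plots, so the following Python sketch shows one common option, a classical additive decomposition, purely as an illustration. The function names and the synthetic series are made up. For weekly data the period would be about 52; a period of 4 is used here to keep the example small.

```python
def centred_moving_average(y, period):
    """Trend estimate via a centred moving average; None where the window does not fit."""
    n = len(y)
    half = period // 2
    trend = [None] * n
    for t in range(half, n - half):
        window = y[t - half:t + half + 1]
        if period % 2 == 0:
            # 2 x m moving average: half weight on the two end points.
            total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        else:
            total = sum(window)
        trend[t] = total / period
    return trend


def decompose_additive(y, period):
    """Split y into trend, seasonal effects (one per cycle position) and remainder."""
    trend = centred_moving_average(y, period)
    # Average the detrended values at each position within the cycle.
    buckets = [[] for _ in range(period)]
    for t, (value, level) in enumerate(zip(y, trend)):
        if level is not None:
            buckets[t % period].append(value - level)
    seasonal_means = [sum(b) / len(b) for b in buckets]
    # Centre the seasonal effects so they sum to zero over one cycle.
    offset = sum(seasonal_means) / period
    seasonal = [s - offset for s in seasonal_means]
    remainder = [
        None if level is None else value - level - seasonal[t % period]
        for t, (value, level) in enumerate(zip(y, trend))
    ]
    return trend, seasonal, remainder


# Synthetic series: a linear trend plus a repeating four-step pattern.
pattern = [3.0, -1.0, -2.0, 0.0]
series = [float(t) + pattern[t % 4] for t in range(16)]
trend, seasonal, remainder = decompose_additive(series, period=4)
```

On this synthetic series the recovered seasonal effects match the repeating pattern, and the trend matches the underlying line wherever the moving-average window fits.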

The downward trend in purchases is clearly shown in the trend plot. This could be explained by changes in prescribing practices among medical staff, or by public health interventions and growing awareness of the opioid epidemic. Regulatory changes could also have affected the number of opioid prescriptions, leading pharmacies to purchase smaller quantities.

The forecast plot agrees with the trend plot, showing a gradual increase before a fall toward the end of the study period. The ETS forecast suggests that purchases could continue to fall over the coming years, but because the confidence bands cross zero, and negative purchases are impossible, the forecast should be interpreted with caution.


The time series analysis of opioid purchases in the ARCOS dataset has highlighted several trends in opioid distribution in the U.S., offering insights that can inform public health policies and regulatory strategies. Further research using this dataset could look more deeply into the causes of spikes or declines in opioid distribution, correlate these trends with public health outcomes, and evaluate the impact of policy changes over time.
