
Mapping the Geography of Binge Drinking in Colorado

After completing my most recent analysis of the spatial trends of binge drinking in the US, I noticed that binge drinking is more common in Colorado than I had assumed. To take a closer look, I used census tract shapefiles from the Colorado Department of Public Health containing data on heavy drinking prevalence, education, and poverty rates.

The methodology involved a multiple linear regression analysis to rigorously examine the statistical association between poverty, education, and heavy drinking. I also calculated spatial statistics to determine if there were any spatial relationships in the data.

Spatial Autocorrelation

First, I wanted to know whether the data exhibit spatial autocorrelation, that is, whether tracts with similar numbers of heavy drinkers tend to be near each other. For this, I calculated Global Moran’s I, an index that ranges from -1 to 1. Values closer to -1 indicate negative spatial autocorrelation, where nearby values tend to be dissimilar. Values closer to 1 indicate positive spatial autocorrelation, where similar values tend to cluster. If there is significant spatial autocorrelation, steps need to be taken before conducting a regression analysis, because ordinary least squares assumes that observations are independent of one another.
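To make the index concrete, here is a minimal sketch of Global Moran’s I in plain Python. It assumes binary rook-contiguity weights on a hypothetical 4x4 grid of made-up values, not the Colorado tracts, and it is not necessarily how the figures in this post were produced.

```python
def rook_neighbors(n_rows, n_cols):
    """Map each cell index of a regular grid to the cells sharing an edge with it."""
    neighbors = {}
    for r in range(n_rows):
        for c in range(n_cols):
            adjacent = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    adjacent.append(rr * n_cols + cc)
            neighbors[r * n_cols + c] = adjacent
    return neighbors


def global_morans_i(values, neighbors):
    """Global Moran's I with binary weights (w_ij = 1 for neighbors, else 0)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    # Sum of deviation cross-products over every (directed) neighboring pair.
    cross = sum(dev[i] * dev[j] for i, js in neighbors.items() for j in js)
    total_weight = sum(len(js) for js in neighbors.values())
    return (n / total_weight) * cross / sum(d * d for d in dev)


grid = rook_neighbors(4, 4)

# Hypothetical values: similar values side by side vs. alternating values.
clustered = [1, 1, 9, 9,
             1, 1, 9, 9,
             1, 1, 9, 9,
             1, 1, 9, 9]
checkerboard = [1, 9, 1, 9,
                9, 1, 9, 1,
                1, 9, 1, 9,
                9, 1, 9, 1]

i_clustered = global_morans_i(clustered, grid)
i_checkerboard = global_morans_i(checkerboard, grid)
print(i_clustered, i_checkerboard)
```

The clustered layout gives a positive index and the checkerboard a negative one, matching the interpretation above. Judging significance would additionally require a permutation test or an analytical variance, which this sketch leaves out.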

The results of Moran’s I indicate that there is moderate positive spatial autocorrelation (Moran’s I = 0.59, P < 0.001). There is sufficient evidence that tracts with high numbers of binge drinkers tend to cluster together, as do tracts with low numbers.

Next, let’s check if there’s spatial autocorrelation in our explanatory variables. The percent of people over 24 years old without a high school diploma is moderately positively autocorrelated (Moran’s I = 0.27, P < 0.001).

Lastly, we do the same with percent living in poverty:

Again, there is strong evidence of moderate spatial autocorrelation (Moran’s I = 0.41, P<0.001). We will have to create spatially lagged variables for our regression analysis.
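A spatially lagged variable is typically the average of a variable across each tract’s neighbors. The sketch below assumes row-standardized weights and uses a hypothetical chain of five tracts with made-up poverty values; the function name and data are illustrative only.

```python
def spatial_lag(values, neighbors):
    """Row-standardized spatial lag: the mean of each unit's neighboring values."""
    lag = []
    for i in range(len(values)):
        js = neighbors.get(i, [])
        # Units without neighbors ("islands") get a lag of 0.0 by convention here.
        lag.append(sum(values[j] for j in js) / len(js) if js else 0.0)
    return lag


# Hypothetical tracts arranged in a line: 0 - 1 - 2 - 3 - 4
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
poverty = [10.0, 12.0, 20.0, 8.0, 6.0]

lagged_poverty = spatial_lag(poverty, neighbors)
print(lagged_poverty)
```

The lagged column can then be added to a regression alongside the original variable so that the model accounts for conditions in surrounding tracts.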

Clusters & Hotspots

I next wanted to visualize where clusters and high/low outlier tracts were located. To do so, I ran Anselin’s Local Moran’s I analysis. The results are shown on the webmap I created for this project. The Moran scatterplot also shows a moderately strong spatial relationship within the data, with R² = 0.62:

The last of the spatial analyses that I ran was a Getis-Ord Gi* hotspot analysis. This creates a map of hot and cold spots that can show areas of interest for policymakers. The results showed a surprising cluster of strong hotspots near the Rocky Mountains. Most surprising to me was the fact that the Denver area was mostly not significant. If you have spent any amount of time in the city, you would know that drinking is a favorite pastime for residents. The map result from this analysis is included in the webmap.
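For readers curious what the Gi* statistic measures, here is a rough sketch of the standard z-score form. It assumes binary weights, includes each unit in its own neighborhood, and uses a hypothetical row of seven tracts with made-up values. The GIS software used for the webmap may differ in its weighting and significance corrections.

```python
import math


def getis_ord_gi_star(values, neighbors):
    """Gi* z-scores with binary weights; each unit counts as its own neighbor."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    z_scores = []
    for i in range(n):
        window = [i] + list(neighbors[i])  # the "star" means i itself is included
        w_sum = len(window)                # binary weights: sum(w) == sum(w^2)
        local_sum = sum(values[j] for j in window)
        denom = s * math.sqrt((n * w_sum - w_sum ** 2) / (n - 1))
        z_scores.append((local_sum - mean * w_sum) / denom)
    return z_scores


# Hypothetical tracts in a line, with high values concentrated at one end.
values = [2, 2, 2, 2, 9, 10, 9]
n = len(values)
neighbors = {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}

z = getis_ord_gi_star(values, neighbors)
for i, score in enumerate(z):
    print(i, round(score, 2))
```

Large positive z-scores mark hot spots (high values surrounded by high values), and large negative z-scores mark cold spots.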

Regression Analysis

Lastly, I created a regression model to estimate how poverty and education relate to heavy drinking in Colorado. Using GeoDa, I calculated a spatial weights matrix using rook contiguity and ran a multiple linear regression.

The results were interesting. I discovered a statistically significant relationship between never having obtained a high school diploma and heavy drinking. With every 1 percentage point increase in persons without a diploma, there is a 0.11 per 1,000 person decrease in heavy drinking (P < 0.0001; 95% CI: (-0.14, -0.08)). The results pertaining to living in poverty were not statistically significant (P = 0.46; 95% CI: (-0.021, 0.009)). Per the intercept, if there were 0 people who did not have a high school diploma, and 0 people were living in poverty, the results estimate the prevalence of heavy drinking to be 7.1 people per 1,000 (P < 0.001; 95% CI: (6.94, 7.32)). These findings shed some light on the interplay between socioeconomic factors and alcohol consumption habits. The R-squared value of the model, 0.067 (adjusted R-squared: 0.065), means that poverty and education together explain less than 7% of the variation in heavy drinking, indicating a weak relationship between the independent and dependent variables. It would be worthwhile to look into other factors that may lead to heavy drinking.

SUMMARY OF OUTPUT: ORDINARY LEAST SQUARES ESTIMATION
Data set            :  heavy_drinkers
Dependent Variable  :  HeavyDrink  Number of Observations: 1188
Mean dependent var  :     6.49731  Number of Variables   :    3
S.D. dependent var  :     2.05614  Degrees of Freedom    : 1185 
R-squared           :    0.066521  F-statistic           :     42.2223
Adjusted R-squared  :    0.064945  Prob(F-statistic)     : 1.93635e-18
Sum squared residual:     4688.41  Log likelihood        :    -2501.16
Sigma-square        :     3.95646  Akaike info criterion :     5008.31
S.E. of regression  :     1.98909  Schwarz criterion     :     5023.55
Sigma-square ML     :     3.94647
S.E of regression ML:     1.98657
-----------------------------------------------------------------------------
       Variable          Coefficient      Std.Error    t-Statistic   Prob.
-----------------------------------------------------------------------------
     CONSTANT             7.12816      0.0961576        74.1299     0.00000
   PercentNoDiploma      -0.10928      0.0154897       -7.05505     0.00000
   Percent_Poverty       -0.00563386   0.00770492      -0.731204    0.46481
-----------------------------------------------------------------------------
REGRESSION DIAGNOSTICS  
MULTICOLLINEARITY CONDITION NUMBER   3.831280
TEST ON NORMALITY OF ERRORS
TEST                  DF           VALUE             PROB
Jarque-Bera            2           945.9973          0.00000
DIAGNOSTICS FOR HETEROSKEDASTICITY  
RANDOM COEFFICIENTS
TEST                  DF           VALUE             PROB
Breusch-Pagan test     2            12.2490          0.00219
Koenker-Bassett test   2             4.1411          0.12611
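
The 95% confidence intervals can be reproduced from the coefficients and standard errors in the output above. This small sketch assumes a normal approximation (critical value 1.96), which should be reasonable with 1185 degrees of freedom.

```python
# Coefficients and standard errors copied from the OLS output above.
estimates = {
    "CONSTANT": (7.12816, 0.0961576),
    "PercentNoDiploma": (-0.10928, 0.0154897),
    "Percent_Poverty": (-0.00563386, 0.00770492),
}
Z_95 = 1.96  # assumption: normal approximation to the t distribution


def confidence_interval(coef, std_err, z=Z_95):
    """Return (lower, upper) bounds: coefficient plus or minus z standard errors."""
    half_width = z * std_err
    return coef - half_width, coef + half_width


intervals = {name: confidence_interval(*vals) for name, vals in estimates.items()}
for name, (low, high) in intervals.items():
    print(f"{name:<18}{low:>10.4f}{high:>10.4f}")
```

An interval that spans zero, as the poverty interval does, is consistent with a coefficient that is not statistically significant.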

The Jarque-Bera test, which tests the normality of the model residuals, was significant at P < 0.001. This signals a strong departure from normality in the residuals, which can affect the validity of the regression's significance tests. Additionally, the Breusch-Pagan test was significant (P = 0.002), indicating the presence of heteroscedasticity in the residuals. This means that the variance of the errors varies across levels of the predictors. The Koenker-Bassett test, a variant that is robust to non-normal errors, was not significant (P = 0.126), so the evidence for heteroscedasticity is mixed.
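As a rough illustration of what the Jarque-Bera statistic measures, the sketch below computes it from sample skewness and kurtosis for two hypothetical sets of residuals. The data are made up, and GeoDa's implementation may differ in detail.

```python
def jarque_bera(residuals):
    """Jarque-Bera statistic: n/6 * (skewness^2 + (kurtosis - 3)^2 / 4)."""
    n = len(residuals)
    mean = sum(residuals) / n
    # Central moments of the residuals (dividing by n, not n - 1).
    m2 = sum((r - mean) ** 2 for r in residuals) / n
    m3 = sum((r - mean) ** 3 for r in residuals) / n
    m4 = sum((r - mean) ** 4 for r in residuals) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return n / 6 * (skewness ** 2 + (kurtosis - 3) ** 2 / 4)


# Hypothetical residuals: one roughly symmetric set, one with a single large outlier.
symmetric = [-2, -1, -1, 0, 0, 0, 0, 1, 1, 2]
skewed = [-1] * 9 + [9]

jb_symmetric = jarque_bera(symmetric)
jb_skewed = jarque_bera(skewed)
print(jb_symmetric, jb_skewed)
```

Under normality the statistic approximately follows a chi-square distribution with 2 degrees of freedom, so values above roughly 5.99 are significant at the 5% level. The value of about 946 in the output above is far beyond that threshold.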

It is unclear why, according to these results, a higher share of people without a high school diploma is associated with fewer heavy drinkers per 1,000 people. Perhaps people with a higher level of education have greater access to alcohol. Another possibility is the influence of religion, as some religions have teachings against alcohol consumption. It is also likely that the model's standard errors and significance tests are unreliable due to the presence of both non-normal residuals and heteroscedasticity of the residuals.

Conclusion

While the regression results are likely unreliable, the geospatial analyses were fruitful in uncovering relationships in the data. The presence of hot and cold spots of binge drinking in Colorado highlights regions where heightened awareness and targeted public health interventions may be effective. Likewise, it may be worthwhile to examine the societal characteristics of those regions identified as cold spots.
