Applied Spatial Statistics(八):GWR 和 MGWR 示例


这是一个基本示例笔记本,演示了如何校准 GWR 或 MGWR 模型。

安装

pip install mgwr

#pip install mgwr

Load packages

import numpy as np
import matplotlib.pyplot as plt
import geopandas as gpd
import pandas as pd
import libpysal as ps 
from libpysal.weights import Queen
from esda.moran import Moran

#MGWR functions
from mgwr.gwr import GWR,MGWR
from mgwr.sel_bw import Sel_BW

Load voting dataset

voting = pd.read_csv('https://raw.github.com/Ziqi-Li/gis5122/master/data/voting_2020.csv')

voting[['median_income']] = voting[['median_income']]/10000
shp = gpd.read_file("https://raw.github.com/Ziqi-Li/gis5122/master/data/us_counties.geojson")
#Merge the shapefile with the voting data by the common county_id
shp_voting = shp.merge(voting, on ="county_id")

#Dissolve the counties to obtain boundary of states, used for mapping
state = shp_voting.dissolve(by='STATEFP').geometry.boundary
variable_names = ['sex_ratio', 'pct_black', 'pct_hisp',
                  'pct_bach', 'median_income','ln_pop_den']


y = shp_voting[['new_pct_dem']].values

X = shp_voting[variable_names].values

在 GWR 或 MGWR 中,我们经常建议对独立变量和因变量进行标准化。这里的标准化意味着我们减去平均值并除以标准化值。

这种方法的好处是,获得的系数变得“无单位”,允许跨变量和位置比较变量重要性。

X = (X - X.mean(axis=0))/X.std(axis=0)
y = (y - y.mean(axis=0))/y.std(axis=0)

我们需要将坐标输入到 GWR。

coords = shp_voting[['proj_X', 'proj_Y']].values
两步拟合 GWR 模型
  • 选择最佳带宽
  • 使用最佳带宽拟合 GWR 模型

默认内核是自适应(最近邻居数)双平方。

gwr_selector = Sel_BW(coords, y, X)

gwr_bw = gwr_selector.search(verbose=True)

print("Selected optimal bandwidth is:", gwr_bw)
Bandwidth:  1219.0 , score:  4058.94
Bandwidth:  1938.0 , score:  4614.76
Bandwidth:  774.0 , score:  3557.79
Bandwidth:  499.0 , score:  3140.51
Bandwidth:  329.0 , score:  2802.09
Bandwidth:  224.0 , score:  2530.12
Bandwidth:  159.0 , score:  2375.52
Bandwidth:  119.0 , score:  2310.56
Bandwidth:  94.0 , score:  2291.48
Bandwidth:  79.0 , score:  2297.64
Bandwidth:  104.0 , score:  2296.46
Bandwidth:  89.0 , score:  2289.15
Bandwidth:  85.0 , score:  2289.31
Bandwidth:  91.0 , score:  2289.59
Bandwidth:  87.0 , score:  2291.76
Selected optimal bandwidth is: 89.0

使用最佳 bw 拟合模型

gwr_results = GWR(coords, y, X, bw=gwr_bw,name_x=variable_names).fit()

GWR 输出摘要

gwr_results.summary()
===========================================================================
Model type                                                         Gaussian
Number of observations:                                                3103
Number of covariates:                                                     7

Global Regression Results
---------------------------------------------------------------------------
Residual sum of squares:                                           1213.006
Log-likelihood:                                                   -2945.693
AIC:                                                               5905.385
AICc:                                                              5907.432
BIC:                                                             -23679.220
R2:                                                                   0.609
Adj. R2:                                                              0.608

Variable                              Est.         SE  t(Est/SE)    p-value
------------------------------- ---------- ---------- ---------- ----------
Intercept                           -0.000      0.011     -0.000      1.000
sex_ratio                            0.004      0.012      0.350      0.727
pct_black                            0.434      0.013     34.477      0.000
pct_hisp                             0.206      0.011     18.026      0.000
pct_bach                             0.574      0.017     34.171      0.000
median_income                       -0.144      0.017     -8.403      0.000
ln_pop_den                           0.227      0.014     16.502      0.000

Geographically Weighted Regression (GWR) Results
---------------------------------------------------------------------------
Spatial kernel:                                           Adaptive bisquare
Bandwidth used:                                                      89.000

Diagnostic information
---------------------------------------------------------------------------
Residual sum of squares:                                            246.915
Effective number of parameters (trace(S)):                          548.884
Degree of freedom (n - trace(S)):                                  2554.116
Sigma estimate:                                                       0.311
Log-likelihood:                                                    -475.996
AIC:                                                               2051.760
AICc:                                                              2289.148
BIC:                                                               5373.125
R2:                                                                   0.920
Adjusted R2:                                                          0.903
Adj. alpha (95%):                                                     0.001
Adj. critical t value (95%):                                          3.419

Summary Statistics For GWR Parameter Estimates
---------------------------------------------------------------------------
Variable                   Mean        STD        Min     Median        Max
-------------------- ---------- ---------- ---------- ---------- ----------
Intercept                -0.036      0.760     -2.428     -0.038      5.779
sex_ratio                -0.016      0.098     -0.463     -0.023      0.449
pct_black                 0.534      0.867     -5.910      0.608      9.513
pct_hisp                  0.327      0.373     -1.335      0.304      3.561
pct_bach                  0.406      0.236     -0.507      0.404      1.249
median_income            -0.191      0.190     -0.885     -0.176      0.289
ln_pop_den                0.249      0.253     -0.562      0.219      1.416
===========================================================================

局部估计值可以从gwr_results.params中获得,它返回一个 n x p 数组,其中 p 是模型中的预测变量的数量(包括截距)。

variable_names
['sex_ratio',
 'pct_black',
 'pct_hisp',
 'pct_bach',
 'median_income',
 'ln_pop_den']
gwr_results.params[:,4]
array([ 0.56501565,  0.61552673,  0.42141522, ..., -0.09090834,
        0.75477483,  0.39401467])
from matplotlib import colors

ax = shp_voting.plot(column=gwr_results.params[:,6],figsize=(10,5),legend=True, 
                     linewidth=0.0,aspect=1)

plt.title("Coefficients of ln_pop_den",fontsize=12)
Text(0.5, 1.0, 'Coefficients of ln_pop_den')

在这里插入图片描述

编写一些绘图代码,将参数估计表面全部可视化。我们需要将 GWR 结果与县 GeoDataFrame 连接起来。

from mpl_toolkits.axes_grid1 import make_axes_locatable
from mgwr.utils import shift_colormap,truncate_colormap
from matplotlib import cm,colors

def param_plots(result, gdf, names=[], filter_t=False,figsize=(10, 10)):
    
    #Size of the plot. Here we have a 2 by 2 layout.
    k = gwr_results.k
    
    fig, axs = plt.subplots(int(k/2)+1, 2, figsize=figsize)
    axs = axs.ravel()
    
    #The max and min values of the color bar.
    vmin = -0.8
    vmax = 0.8
    
    cmap = cm.get_cmap("bwr_r")
    norm = colors.BoundaryNorm(np.arange(-0.8,0.9,0.1),ncolors=256)
    
    for j in range(k):
        
        pd.concat([gdf,pd.DataFrame(np.hstack([result.params,result.bse]))],axis=1).plot(ax=axs[j],column=j,vmin=vmin,vmax=vmax,
                                                                                         cmap=cmap,norm=norm,linewidth=0.1,edgecolor='white',aspect=1)
        axs[j].set_title("Parameter estimates of \n" + names[j],fontsize=10)
        
        if filter_t:
            rslt_filtered_t = result.filter_tvals()
            if (rslt_filtered_t[:,j] == 0).any():
                gdf[rslt_filtered_t[:,j] == 0].plot(color='lightgrey', ax=axs[j],linewidth=0.1,edgecolor='white',aspect=1)
        
        plt.axis('off')
    
    fig = axs[j].get_figure()
    cax = fig.add_axes([0.99, 0.2, 0.025, 0.6])
    sm = plt.cm.ScalarMappable(cmap=cmap,norm=norm)
    # fake up the array of the scalar mappable. Urgh...
    sm._A = []
    fig.colorbar(sm, cax=cax)
以下是从 GWR 获得的参数估计图。每个图代表每个预测因子与 PctBach 之间的空间关系。
  • 正(负)关系以红色(蓝色)显示。
  • 较强(较弱)的关系颜色较深(较浅)。
param_plots(gwr_results, shp_voting, names=['intercept'] + variable_names,figsize = (10,15))
/var/folders/mp/9px298sd6vs8xccb_3sql0dr0000gp/T/ipykernel_19922/1374554983.py:17: MatplotlibDeprecationWarning: The get_cmap function was deprecated in Matplotlib 3.7 and will be removed two minor releases later. Use ``matplotlib.colormaps[name]`` or ``matplotlib.colormaps.get_cmap(obj)`` instead.
  cmap = cm.get_cmap("bwr_r")

在这里插入图片描述

现在让我们检查一下 GWR 模型的残差。

#Here we use the Queen contiguity
w = Queen.from_dataframe(shp_voting)

#row standardization
w.transform = 'R'

residual_moran = Moran(gwr_results.resid_response.reshape(-1), w)
residual_moran.I
/var/folders/mp/9px298sd6vs8xccb_3sql0dr0000gp/T/ipykernel_14666/2963299590.py:2: FutureWarning: `use_index` defaults to False but will default to True in future. Set True/False directly to control this behavior and silence this warning
  w = Queen.from_dataframe(shp_voting)
/Users/ziqili/anaconda3/lib/python3.11/site-packages/libpysal/weights/weights.py:224: UserWarning: The weights matrix is not fully connected: 
 There are 3 disconnected components.
 There are 2 islands with ids: 2441, 2701.
  warnings.warn(message)


('WARNING: ', 2441, ' is an island (no neighbors)')
('WARNING: ', 2701, ' is an island (no neighbors)')





0.1142496644549166

我们经常会观察到 GWR 模型的残差具有较低的 Moran’s I 值,这表明已考虑了空间结构。局部截距是一种内在背景效应,例如,表明有多少影响可以归因于“位置”。

GWR 推断

局部系数显著性

以下是**显著性(p<0.05)**参数估计值的图。显著性检验已调整以解决多重检验问题(Bonferroni)。

不显著的参数用灰色遮盖。

这是通过 param_plots() 函数中的 result.filter_tvals() 完成的。

param_plots(gwr_results, shp_voting, names=['intercept'] + variable_names,figsize = (10,15), filter_t=True)
/var/folders/mp/9px298sd6vs8xccb_3sql0dr0000gp/T/ipykernel_14666/3391375418.py:17: MatplotlibDeprecationWarning: The get_cmap function was deprecated in Matplotlib 3.7 and will be removed two minor releases later. Use ``matplotlib.colormaps[name]`` or ``matplotlib.colormaps.get_cmap(obj)`` instead.
  cmap = cm.get_cmap("bwr_r")

在这里插入图片描述


两步拟合 MGWR 模型
  • 寻找最佳带宽
  • 使用最佳带宽拟合 MGWR 模型

注意:当数据超过 5,000 条记录时,MGWR 会变慢。

%%time
mgwr_selector = Sel_BW(coords, y, X,multi=True)
mgwr_bw = mgwr_selector.search(verbose=True)

print("Selected optimal bandwidth is:", gwr_bw)
mgwr_results = MGWR(coords, y, X, selector=mgwr_selector,name_x=variable_names).fit()
mgwr_results.summary()
===========================================================================
Model type                                                         Gaussian
Number of observations:                                                3103
Number of covariates:                                                     7

Global Regression Results
---------------------------------------------------------------------------
Residual sum of squares:                                           1213.006
Log-likelihood:                                                   -2945.693
AIC:                                                               5905.385
AICc:                                                              5907.432
BIC:                                                             -23679.220
R2:                                                                   0.609
Adj. R2:                                                              0.608

Variable                              Est.         SE  t(Est/SE)    p-value
------------------------------- ---------- ---------- ---------- ----------
Intercept                           -0.000      0.011     -0.000      1.000
sex_ratio                            0.004      0.012      0.350      0.727
pct_black                            0.434      0.013     34.477      0.000
pct_hisp                             0.206      0.011     18.026      0.000
pct_bach                             0.574      0.017     34.171      0.000
median_income                       -0.144      0.017     -8.403      0.000
ln_pop_den                           0.227      0.014     16.502      0.000

Multi-Scale Geographically Weighted Regression (MGWR) Results
---------------------------------------------------------------------------
Spatial kernel:                                           Adaptive bisquare
Criterion for optimal bandwidth:                                       AICc
Score of Change (SOC) type:                                     Smoothing f
Termination criterion for MGWR:                                       1e-05

MGWR bandwidths
---------------------------------------------------------------------------
Variable             Bandwidth      ENP_j   Adj t-val(95%)   Adj alpha(95%)
Intercept               44.000    140.857            3.575            0.000
sex_ratio             1934.000      3.412            2.442            0.015
pct_black               44.000    136.138            3.566            0.000
pct_hisp              1275.000      3.610            2.463            0.014
pct_bach               180.000     31.072            3.157            0.002
median_income           44.000    146.952            3.587            0.000
ln_pop_den             228.000     21.568            3.049            0.002

Diagnostic information
---------------------------------------------------------------------------
Residual sum of squares:                                            224.288
Effective number of parameters (trace(S)):                          483.608
Degree of freedom (n - trace(S)):                                  2619.392
Sigma estimate:                                                       0.293
Log-likelihood:                                                    -326.874
AIC:                                                               1622.964
AICc:                                                              1802.784
BIC:                                                               4550.059
R2                                                                    0.928
Adjusted R2                                                           0.914

Summary Statistics For MGWR Parameter Estimates
---------------------------------------------------------------------------
Variable                   Mean        STD        Min     Median        Max
-------------------- ---------- ---------- ---------- ---------- ----------
Intercept                -0.111      0.542     -1.276     -0.055      0.955
sex_ratio                -0.022      0.007     -0.038     -0.021     -0.010
pct_black                 0.497      0.322     -0.979      0.580      1.109
pct_hisp                  0.285      0.027      0.227      0.290      0.329
pct_bach                  0.401      0.143     -0.098      0.397      0.744
median_income            -0.197      0.168     -1.069     -0.170      0.342
ln_pop_den                0.243      0.120      0.040      0.244      0.505
===========================================================================
#Here we use the Queen contiguity
w = Queen.from_dataframe(shp_voting)

#row standardization
w.transform = 'R'

residual_moran = Moran(mgwr_results.resid_response.reshape(-1), w)
residual_moran.I
('WARNING: ', 2441, ' is an island (no neighbors)')
('WARNING: ', 2701, ' is an island (no neighbors)')


/var/folders/mp/9px298sd6vs8xccb_3sql0dr0000gp/T/ipykernel_14666/3541487640.py:2: FutureWarning: `use_index` defaults to False but will default to True in future. Set True/False directly to control this behavior and silence this warning
  w = Queen.from_dataframe(shp_voting)
/Users/ziqili/anaconda3/lib/python3.11/site-packages/libpysal/weights/weights.py:224: UserWarning: The weights matrix is not fully connected: 
 There are 3 disconnected components.
 There are 2 islands with ids: 2441, 2701.
  warnings.warn(message)





0.01835193713566798
param_plots(mgwr_results, shp_voting, names=['intercept'] + variable_names,figsize = (10,15), filter_t=True)
/var/folders/mp/9px298sd6vs8xccb_3sql0dr0000gp/T/ipykernel_14666/3391375418.py:17: MatplotlibDeprecationWarning: The get_cmap function was deprecated in Matplotlib 3.7 and will be removed two minor releases later. Use ``matplotlib.colormaps[name]`` or ``matplotlib.colormaps.get_cmap(obj)`` instead.
  cmap = cm.get_cmap("bwr_r")

在这里插入图片描述

We can see that the relationships will vary at different spatial scales.


OLS vs. GWR vs. MGWR

From the comparison, we can clearly see an advantage of MGWR over GWR by allowing the bandwidth to vary across covariates.

Metric OLS GWR MGWR
R2 0.609 0.920 0.928
AICc 5907.4 2289.1 1802.7
Moran’s I of residuals 0.60 0.11 0.018

relationships will vary at different spatial scales.


OLS vs. GWR vs. MGWR

From the comparison, we can clearly see an advantage of MGWR over GWR by allowing the bandwidth to vary across covariates.

Metric OLS GWR MGWR
R2 0.609 0.920 0.928
AICc 5907.4 2289.1 1802.7
Moran’s I of residuals 0.60 0.11 0.018

Logo

GitCode 天启AI是一款由 GitCode 团队打造的智能助手,基于先进的LLM(大语言模型)与多智能体 Agent 技术构建,致力于为用户提供高效、智能、多模态的创作与开发支持。它不仅支持自然语言对话,还具备处理文件、生成 PPT、撰写分析报告、开发 Web 应用等多项能力,真正做到“一句话,让 Al帮你完成复杂任务”。

更多推荐