FUNDUS - The urban geography of inequalities - The case of Budapest

The FUNDUS project of Urbanum Lab (a tech lab empowered by the interdisciplinary Urbanum Research Foundation), as its very title suggests, aims at laying the foundations for a globally applicable, open and accessible approach (including technological as well as methodological contributions) for the assessment of how economical factors of living (e.g., real estate prices) correlate with the quality of life in a certain urban environment, as captured by environmental, infrastructural and other social indicators.

In this notebook, we will be focusing on a simple, yet central question with manifold interpretations and consequences: do higher (average) real estate prices indicate a greener environment, less prone to the heat island phenomenon? (In simpler words: does more expensive mean greener and cooler in the summer?)

Outline

In this notebook, we are using Budapest as an example, and part of our data comes from our own scraper solution to obtain average property prices from real estate offers on the internet.

In particular, the notebook is structured as follows:

  • We present technical details on the data being used,
  • we summarize the technical takeaways provided,
  • we present preliminary results of our geoeconomical analysis, and
  • we introduce a multi-domain integration platform

before we conclude and summarize the future steps to take.

How to reproduce our results

This notebook presents the first results of our exploratory data analysis. We would like to get a glimpse into the data and its possible use. First, you have to run the property price scraper. You can find more info on getting the property prices data in the repository of the project. We assume that you have a WEkEO account, if this is not the case, please register here. It is a good practice to install all the project requirements using a separated Python 3.9.10 (or higher) virtual environment. Use requirements.txt to install all dependencies. Last, you have to configure your .hdarc file using your WEkEO credentials. This article shows you how to do this.

If you would like to adapt our notebook to a different municipality or for a different time horizon, use WEkEO's online data exploration tool.

You might find the BoundingBox tool useful to get latitude and longitude coordinates of a given area.

Data used

WEkEO datasets

Dataset queries were generated by using the WEkEO online platform. The queries can be fund in the data/jsons folder

  • Global 10-daily Leaf Area Index 333m
    {
    "datasetId": "EO:CLMS:DAT:CGLS_GLOBAL_LAI300_V1_333M",
    "dateRangeSelectValues": [
      {
        "name": "dtrange",
        "start": "2022-06-01T00:00:00.000Z",
        "end": "2022-06-30T23:59:59.999Z"
      }
    ]
    }
    
  • Level 2 Land - Sea and Land Surface Temperature Radiometer (SLSTR) - Sentinel-3
    {
    "datasetId": "EO:ESA:DAT:SENTINEL-3:SL_2_LST___",
    "boundingBoxValues": [
      {
        "name": "bbox",
        "bbox": [
          18.99804053609134,
          47.42120186691113,
          19.190237776905892,
          47.58048586099437
        ]
      }
    ],
    "dateRangeSelectValues": [
      {
        "name": "position",
        "start": "2022-06-01T00:00:00.000Z",
        "end": "2022-06-30T00:00:00.000Z"
      }
    ],
    "stringChoiceValues": [
      {
        "name": "productType",
        "value": "LST"
      },
      {
        "name": "timeliness",
        "value": "Near+Real+Time"
      },
      {
        "name": "orbitDirection",
        "value": "ascending"
      },
      {
        "name": "processingLevel",
        "value": "LEVEL2"
      }
    ]
    }
    
  • Global 10-daily Fraction of Vegetation Cover 333m
    {
    "datasetId": "EO:CLMS:DAT:CGLS_GLOBAL_FCOVER300_V1_333M",
    "dateRangeSelectValues": [
      {
        "name": "dtrange",
        "start": "2022-06-01T00:00:00.000Z",
        "end": "2022-06-30T23:59:59.999Z"
      }
    ]
    }
    

External dataset

  • The square meter prices dataset was collected on 12th July 2022 using our freely available scraper. The data is in the data/aggregated folder. WARNING: Check the README.md file of the scraper to get your own data.
  • Data on the POIs was downloaded from OpenStreetMap using the pyrosm package. For more details see this repo
  • Thanks to Járókelő - a platform for street maintenance and for communication between citizens and local administrations - we got data on various issues reported by volunteers. For more on this, again, see this repo

Technical takeaways

The notebook provides a novel methodology for exploratory data analysis in the field of urban digital geography, instantiated using a limited, yet appropriate example (Budapest, Hungary). At the end of this notebook, you will know:

  • How to aggregate data using the h3 library,
  • how to visualize the data using the pydeck package, and
  • how to investigate spatial discrepancies along property prices and environmental factors

Data acquisition

Library imports

Here, we import packages for the project.

In [1]:
!pip install h3 altair pydeck xarray hda
Collecting h3
  Downloading h3-3.7.4-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.1 MB)
     |████████████████████████████████| 1.1 MB 15.6 MB/s eta 0:00:01
Collecting altair
  Downloading altair-4.2.0-py3-none-any.whl (812 kB)
     |████████████████████████████████| 812 kB 62.4 MB/s eta 0:00:01
Collecting pydeck
  Downloading pydeck-0.7.1-py2.py3-none-any.whl (4.3 MB)
     |████████████████████████████████| 4.3 MB 71.1 MB/s eta 0:00:01█▏          | 2.8 MB 71.1 MB/s eta 0:00:01
Requirement already satisfied: xarray in /home/jovyan/.local/lib/python3.8/site-packages (0.19.0)
Requirement already satisfied: hda in /opt/conda/lib/python3.8/site-packages (0.1.0)
Requirement already satisfied: numpy in /home/jovyan/.local/lib/python3.8/site-packages (from altair) (1.17.3)
Requirement already satisfied: toolz in /home/jovyan/.local/lib/python3.8/site-packages (from altair) (0.11.1)
Requirement already satisfied: pandas>=0.18 in /home/jovyan/.local/lib/python3.8/site-packages (from altair) (0.25.3)
Requirement already satisfied: jinja2 in /opt/conda/lib/python3.8/site-packages (from altair) (2.11.2)
Requirement already satisfied: jsonschema>=3.0 in /opt/conda/lib/python3.8/site-packages (from altair) (3.2.0)
Requirement already satisfied: entrypoints in /opt/conda/lib/python3.8/site-packages (from altair) (0.3)
Requirement already satisfied: ipykernel>=5.1.2; python_version >= "3.4" in /opt/conda/lib/python3.8/site-packages (from pydeck) (6.0.2)
Requirement already satisfied: ipywidgets>=7.0.0 in /opt/conda/lib/python3.8/site-packages (from pydeck) (7.6.4)
Requirement already satisfied: traitlets>=4.3.2 in /opt/conda/lib/python3.8/site-packages (from pydeck) (4.3.2)
Requirement already satisfied: setuptools>=40.4 in /opt/conda/lib/python3.8/site-packages (from xarray) (49.6.0.post20201009)
Requirement already satisfied: tqdm in /opt/conda/lib/python3.8/site-packages (from hda) (4.50.2)
Requirement already satisfied: requests>=2.5.0 in /opt/conda/lib/python3.8/site-packages (from hda) (2.24.0)
Requirement already satisfied: pytz>=2017.2 in /home/jovyan/.local/lib/python3.8/site-packages (from pandas>=0.18->altair) (2021.1)
Requirement already satisfied: python-dateutil>=2.6.1 in /opt/conda/lib/python3.8/site-packages (from pandas>=0.18->altair) (2.8.1)
Requirement already satisfied: MarkupSafe>=0.23 in /opt/conda/lib/python3.8/site-packages (from jinja2->altair) (1.1.1)
Requirement already satisfied: pyrsistent>=0.14.0 in /opt/conda/lib/python3.8/site-packages (from jsonschema>=3.0->altair) (0.17.3)
Requirement already satisfied: six>=1.11.0 in /opt/conda/lib/python3.8/site-packages (from jsonschema>=3.0->altair) (1.15.0)
Requirement already satisfied: attrs>=17.4.0 in /opt/conda/lib/python3.8/site-packages (from jsonschema>=3.0->altair) (20.2.0)
Requirement already satisfied: ipython<8.0,>=7.23.1 in /opt/conda/lib/python3.8/site-packages (from ipykernel>=5.1.2; python_version >= "3.4"->pydeck) (7.25.0)
Requirement already satisfied: debugpy<2.0,>=1.0.0 in /opt/conda/lib/python3.8/site-packages (from ipykernel>=5.1.2; python_version >= "3.4"->pydeck) (1.4.1)
Requirement already satisfied: jupyter-client<7.0 in /opt/conda/lib/python3.8/site-packages (from ipykernel>=5.1.2; python_version >= "3.4"->pydeck) (6.1.7)
Requirement already satisfied: matplotlib-inline<0.2.0,>=0.1.0 in /opt/conda/lib/python3.8/site-packages (from ipykernel>=5.1.2; python_version >= "3.4"->pydeck) (0.1.2)
Requirement already satisfied: tornado<7.0,>=4.2 in /opt/conda/lib/python3.8/site-packages (from ipykernel>=5.1.2; python_version >= "3.4"->pydeck) (6.1)
Requirement already satisfied: widgetsnbextension~=3.5.0 in /opt/conda/lib/python3.8/site-packages (from ipywidgets>=7.0.0->pydeck) (3.5.1)
Requirement already satisfied: nbformat>=4.2.0 in /opt/conda/lib/python3.8/site-packages (from ipywidgets>=7.0.0->pydeck) (5.0.8)
Requirement already satisfied: jupyterlab-widgets>=1.0.0; python_version >= "3.6" in /opt/conda/lib/python3.8/site-packages (from ipywidgets>=7.0.0->pydeck) (1.0.1)
Requirement already satisfied: ipython-genutils~=0.2.0 in /opt/conda/lib/python3.8/site-packages (from ipywidgets>=7.0.0->pydeck) (0.2.0)
Requirement already satisfied: decorator in /opt/conda/lib/python3.8/site-packages (from traitlets>=4.3.2->pydeck) (4.4.2)
Requirement already satisfied: certifi>=2017.4.17 in /opt/conda/lib/python3.8/site-packages (from requests>=2.5.0->hda) (2021.5.30)
Requirement already satisfied: chardet<4,>=3.0.2 in /opt/conda/lib/python3.8/site-packages (from requests>=2.5.0->hda) (3.0.4)
Requirement already satisfied: idna<3,>=2.5 in /opt/conda/lib/python3.8/site-packages (from requests>=2.5.0->hda) (2.10)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /opt/conda/lib/python3.8/site-packages (from requests>=2.5.0->hda) (1.25.11)
Requirement already satisfied: jedi>=0.16 in /opt/conda/lib/python3.8/site-packages (from ipython<8.0,>=7.23.1->ipykernel>=5.1.2; python_version >= "3.4"->pydeck) (0.17.2)
Requirement already satisfied: backcall in /opt/conda/lib/python3.8/site-packages (from ipython<8.0,>=7.23.1->ipykernel>=5.1.2; python_version >= "3.4"->pydeck) (0.2.0)
Requirement already satisfied: pickleshare in /opt/conda/lib/python3.8/site-packages (from ipython<8.0,>=7.23.1->ipykernel>=5.1.2; python_version >= "3.4"->pydeck) (0.7.5)
Requirement already satisfied: pexpect>4.3; sys_platform != "win32" in /opt/conda/lib/python3.8/site-packages (from ipython<8.0,>=7.23.1->ipykernel>=5.1.2; python_version >= "3.4"->pydeck) (4.8.0)
Requirement already satisfied: prompt-toolkit!=3.0.0,!=3.0.1,<3.1.0,>=2.0.0 in /opt/conda/lib/python3.8/site-packages (from ipython<8.0,>=7.23.1->ipykernel>=5.1.2; python_version >= "3.4"->pydeck) (3.0.8)
Requirement already satisfied: pygments in /opt/conda/lib/python3.8/site-packages (from ipython<8.0,>=7.23.1->ipykernel>=5.1.2; python_version >= "3.4"->pydeck) (2.7.1)
Requirement already satisfied: jupyter-core>=4.6.0 in /opt/conda/lib/python3.8/site-packages (from jupyter-client<7.0->ipykernel>=5.1.2; python_version >= "3.4"->pydeck) (4.6.3)
Requirement already satisfied: pyzmq>=13 in /opt/conda/lib/python3.8/site-packages (from jupyter-client<7.0->ipykernel>=5.1.2; python_version >= "3.4"->pydeck) (19.0.2)
Requirement already satisfied: notebook>=4.4.1 in /opt/conda/lib/python3.8/site-packages (from widgetsnbextension~=3.5.0->ipywidgets>=7.0.0->pydeck) (6.1.5)
Requirement already satisfied: parso<0.8.0,>=0.7.0 in /opt/conda/lib/python3.8/site-packages (from jedi>=0.16->ipython<8.0,>=7.23.1->ipykernel>=5.1.2; python_version >= "3.4"->pydeck) (0.7.1)
Requirement already satisfied: ptyprocess>=0.5 in /opt/conda/lib/python3.8/site-packages (from pexpect>4.3; sys_platform != "win32"->ipython<8.0,>=7.23.1->ipykernel>=5.1.2; python_version >= "3.4"->pydeck) (0.6.0)
Requirement already satisfied: wcwidth in /opt/conda/lib/python3.8/site-packages (from prompt-toolkit!=3.0.0,!=3.0.1,<3.1.0,>=2.0.0->ipython<8.0,>=7.23.1->ipykernel>=5.1.2; python_version >= "3.4"->pydeck) (0.2.5)
Requirement already satisfied: Send2Trash in /opt/conda/lib/python3.8/site-packages (from notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.0.0->pydeck) (1.5.0)
Requirement already satisfied: terminado>=0.8.3 in /opt/conda/lib/python3.8/site-packages (from notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.0.0->pydeck) (0.9.1)
Requirement already satisfied: argon2-cffi in /opt/conda/lib/python3.8/site-packages (from notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.0.0->pydeck) (20.1.0)
Requirement already satisfied: nbconvert in /opt/conda/lib/python3.8/site-packages (from notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.0.0->pydeck) (5.6.0)
Requirement already satisfied: prometheus-client in /opt/conda/lib/python3.8/site-packages (from notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.0.0->pydeck) (0.8.0)
Requirement already satisfied: cffi>=1.0.0 in /opt/conda/lib/python3.8/site-packages (from argon2-cffi->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.0.0->pydeck) (1.14.3)
Requirement already satisfied: testpath in /opt/conda/lib/python3.8/site-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.0.0->pydeck) (0.4.4)
Requirement already satisfied: bleach in /opt/conda/lib/python3.8/site-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.0.0->pydeck) (3.2.1)
Requirement already satisfied: pandocfilters>=1.4.1 in /opt/conda/lib/python3.8/site-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.0.0->pydeck) (1.4.2)
Requirement already satisfied: mistune<2,>=0.8.1 in /opt/conda/lib/python3.8/site-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.0.0->pydeck) (0.8.4)
Requirement already satisfied: defusedxml in /opt/conda/lib/python3.8/site-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.0.0->pydeck) (0.6.0)
Requirement already satisfied: pycparser in /opt/conda/lib/python3.8/site-packages (from cffi>=1.0.0->argon2-cffi->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.0.0->pydeck) (2.20)
Requirement already satisfied: webencodings in /opt/conda/lib/python3.8/site-packages (from bleach->nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.0.0->pydeck) (0.5.1)
Requirement already satisfied: packaging in /opt/conda/lib/python3.8/site-packages (from bleach->nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.0.0->pydeck) (20.4)
Requirement already satisfied: pyparsing>=2.0.2 in /opt/conda/lib/python3.8/site-packages (from packaging->bleach->nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets>=7.0.0->pydeck) (2.4.7)
Installing collected packages: h3, altair, pydeck
Successfully installed altair-4.2.0 h3-3.7.4 pydeck-0.7.1
In [2]:
import json
import os
from functools import reduce

import h3
import altair as at
import numpy as np
import pandas as pd
import pydeck as pdk
import xarray as xr
from hda import Client
Matplotlib created a temporary config/cache directory at /tmp/matplotlib-fchafuhd because the default path (/home/jovyan/.cache/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.

Gathering the data

WARNING: Downloading the datasets takes time! The data will be downloaded into the current working directory. Use your favorite tools to move the downloaded files to the appropriate folders.

SLSTR

In [ ]:
c = Client(debug=True)
with open("../data/jsons/temperature.json") as infile:
    query = json.load(infile)
matches = c.search(query)
# matches.download()

LAI

In [ ]:
c = Client(debug=True)
with open("../data/jsons/lai.json") as infile:
    query = json.load(infile)
matches = c.search(query)
# matches.download()

FCOVER

In [ ]:
c = Client(debug=True)
with open("../data/jsons/fcover.json") as infile:
    query = json.load(infile)
matches = c.search(query)
# matches.download()

Your operating system and tools might be different, below you can read tips which might be useful on a Linux machine

  • The notebook assumes that all data is in the data folder.
  • Move all zip and nc files to the folder (e.g. you can use the mv command, like mv ../data "*.zip")
  • Make folders for the data fiels (cd data; mkdir leaf_data temp_data fcover)
  • Move the *.nc files to the corresponding folders (e.g. move Leaf Area Index data into leaf_data)
  • Move the .zip files into the temp_data folder `cd temp_data; unzip ".zip"; rm "*.zip"`

Cleaning and transforming data

Property square meter prices

We scraped a Hungarian real estate listing site to get property prices in Budapest. The listing entries were geocoded using the geocoder package. The geo-coordinates were indexed using the H3 hexagonal geospatial indexing system. You can check the resolution table of the cell areas here. For more details, you can check the repository of the scraper.

The data looks like this:

In [3]:
df5 = pd.read_csv("../data/aggregated/l5.tsv", sep="\t")
df6 = pd.read_csv("../data/aggregated/l6.tsv", sep="\t")
df7 = pd.read_csv("../data/aggregated/l7.tsv", sep="\t")
df8 = pd.read_csv("../data/aggregated/l8.tsv", sep="\t")

df7.head()
Out[3]:
l7 price
0 871e020caffffff 1.219298e+06
1 871e02684ffffff 1.200000e+06
2 871e030a9ffffff 1.508301e+06
3 871e03134ffffff 9.494950e+05
4 871e03449ffffff 1.197017e+06

The hexagons listed in these files constitues our area of interest.

Temperature data

The code below aggregates the average temperature data on various levels of H3 hashing and writes the results to a tsv file.

In [4]:
h3_l5 = set(df5["l5"])
h3_l6 = set(df6["l6"])
h3_l7 = set(df7["l7"])
h3_l8 = set(df8["l8"])

root_folder = "../data/temp_data"
dirs = [
    os.path.join(root_folder, d)
    for d in os.listdir(root_folder)
    if os.path.isdir(os.path.join(root_folder, d))
]


def is_within_bounding_box(lat, long):
    if 47.392134 < lat < 47.601216 and 18.936234 < long < 19.250031:
        return True
    else:
        return False


latlong_temp = {}
for inpath in dirs:
    # geodetic_tx.nc -> latitude_tx, longitude_tx
    geodetic = xr.open_dataset(
        filename_or_obj=os.path.join(inpath, "geodetic_tx.nc"), engine="netcdf4"
    )
    lat = geodetic.data_vars["latitude_tx"].to_numpy().flatten()
    long = geodetic.data_vars["longitude_tx"].to_numpy().flatten()
    # met_tx.nc -> temperature_tx
    met_tx = xr.open_dataset(
        filename_or_obj=os.path.join(inpath, "met_tx.nc"), engine="netcdf4"
    )
    temp = met_tx.data_vars["temperature_tx"].to_numpy().flatten()
    # LST_ancillary_ds.nc -> NDVI (empyt :()
    lst = xr.open_dataset(
        filename_or_obj=os.path.join(inpath, "LST_ancillary_ds.nc"), engine="netcdf4"
    )
    ndvi = lst.data_vars["NDVI"].to_numpy().flatten()

    temp_data = zip(lat, long, temp)
    temp_data = (e for e in temp_data if is_within_bounding_box(e[0], e[1]))
    for e in temp_data:
        k = (e[0], e[1])
        if latlong_temp.get(k, False):
            latlong_temp[k] = (latlong_temp[k] + e[2]) / 2
        else:
            latlong_temp[k] = e[2]

with open("../data/temp_budapest.tsv", "w") as outfile:
    h = "lat\tlong\tcelsius\tl5\tl6\tl7\tl8\n"
    outfile.write(h)
    for k, v in latlong_temp.items():
        l5 = h3.geo_to_h3(k[0], k[1], 5)
        l6 = h3.geo_to_h3(k[0], k[1], 6)
        l7 = h3.geo_to_h3(k[0], k[1], 7)
        l8 = h3.geo_to_h3(k[0], k[1], 8)
        if l5 in h3_l5 and l6 in h3_l6 and l7 in h3_l7 and l8 in h3_l8:
            o = (
                str(k[0])
                + "\t"
                + str(k[1])
                + "\t"
                + str(v - 273.15)
                + "\t"
                + l5
                + "\t"
                + l6
                + "\t"
                + l7
                + "\t"
                + l8
                + "\n"
            )
            outfile.write(o)
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
/tmp/ipykernel_2172/2575415027.py in <module>
      7 dirs = [
      8     os.path.join(root_folder, d)
----> 9     for d in os.listdir(root_folder)
     10     if os.path.isdir(os.path.join(root_folder, d))
     11 ]

FileNotFoundError: [Errno 2] No such file or directory: '../data/temp_data'

Global 10-daily Leaf Area Index 333m

The code below computes the average LAI and assigns H3 hash codes to the values. The results will be saved into a tsv file.

In [6]:
root_folder = "../data/leaf_data"
fs = [
    os.path.join(root_folder, f)
    for f in os.listdir(root_folder)
    if os.path.isfile(os.path.join(root_folder, f))
]

ll2lai = {}

for f in fs:
    try:
        ds = xr.open_dataset(filename_or_obj=os.path.join(f), engine="netcdf4")
        lat = ds.data_vars["LAI"]["lat"].to_numpy()
        lat = [e for e in lat if 47.392134 < e < 47.601216]
        lon = ds.data_vars["LAI"]["lon"].to_numpy()
        lon = [e for e in lon if 18.936234 < e < 19.250031]
        time = ds.data_vars["LAI"]["time"].to_numpy()[0]
        for i in range(len(lat)):
            for j in range(len(lon)):
                one_point = ds["LAI"].sel(lat=lat[i], lon=lon[i])
                vals = one_point.values[0]
                if ll2lai.get((lat[i], lon[j]), False):
                    ll2lai[(lat[i], lon[j])] = (ll2lai[(lat[i], lon[j])] + vals) / 2.0
                else:
                    ll2lai[(lat[i], lon[j])] = vals
    except Exception as exc1:
        print(exc1)
        continue

with open("../data/lai_budapest.tsv", "w") as outfile:
    h = "lat\tlong\tlai\tl5\tl6\tl7\tl8\n"
    outfile.write(h)
    for k, v in ll2lai.items():
        h5 = h3.geo_to_h3(k[0], k[1], 5)
        h6 = h3.geo_to_h3(k[0], k[1], 6)
        h7 = h3.geo_to_h3(k[0], k[1], 7)
        h8 = h3.geo_to_h3(k[0], k[1], 8)
        if h5 in h3_l5 and h6 in h3_l6 and h7 in h3_l7 and h8 in h3_l8:
            o = (
                str(k[0])
                + "\t"
                + str(k[1])
                + "\t"
                + str(v)
                + "\t"
                + str(h5)
                + "\t"
                + str(h6)
                + "\t"
                + str(h7)
                + "\t"
                + str(h8)
                + "\n"
            )
            outfile.write(o)

Global 10-daily Fraction of Vegetation Cover 333m

The code below computes the average FCOVER and assigns H3 hash codes to the values. The results will be saved into a tsv file.

In [7]:
root_folder = "../data/fcover"
fs = [
    os.path.join(root_folder, f)
    for f in os.listdir(root_folder)
    if os.path.isfile(os.path.join(root_folder, f))
]


def is_within_bounding_box(lat, long):
    if 47.392134 < lat < 47.601216 and 18.936234 < long < 19.250031:
        return True
    else:
        return False


ll2fcover = {}

for f in fs:
    try:
        ds = xr.open_dataset(filename_or_obj=os.path.join(f), engine="netcdf4")
        lat = ds.data_vars["FCOVER"]["lat"].to_numpy()
        lat = [e for e in lat if 47.392134 < e < 47.601216]
        lon = ds.data_vars["FCOVER"]["lon"].to_numpy()
        lon = [e for e in lon if 18.936234 < e < 19.250031]
        time = ds.data_vars["FCOVER"]["time"].to_numpy()[0]
        for i in range(len(lat)):
            for j in range(len(lon)):
                one_point = ds["FCOVER"].sel(lat=lat[i], lon=lon[i])
                vals = one_point.values[0]
                if ll2fcover.get((lat[i], lon[j]), False):
                    ll2fcover[(lat[i], lon[j])] = (
                        ll2fcover[(lat[i], lon[j])] + vals
                    ) / 2.0
                else:
                    ll2fcover[(lat[i], lon[j])] = vals
    except Exception as exc1:
        print(exc1)
        continue

with open("../data/fcover_budapest.tsv", "w") as outfile:
    h = "lat\tlong\tfcover\tl5\tl6\tl7\tl8\n"
    outfile.write(h)
    for k, v in ll2fcover.items():
        h5 = h3.geo_to_h3(k[0], k[1], 5)
        h6 = h3.geo_to_h3(k[0], k[1], 6)
        h7 = h3.geo_to_h3(k[0], k[1], 7)
        h8 = h3.geo_to_h3(k[0], k[1], 8)
        if h5 in h3_l5 and h6 in h3_l6 and h7 in h3_l7 and h8 in h3_l8:
            o = (
                str(k[0])
                + "\t"
                + str(k[1])
                + "\t"
                + str(v)
                + "\t"
                + str(h5)
                + "\t"
                + str(h6)
                + "\t"
                + str(h7)
                + "\t"
                + str(h8)
                + "\n"
            )
            outfile.write(o)

Visualizing the data

Maps

Square meter prices

In [5]:
df_price = pd.read_csv("../data/aggregated/l7.tsv", sep="\t")
df_price["normalized"] = 255 - (df_price["price"] / np.sqrt(np.sum(df_price["price"] ** 2)) * 1000)
layer = pdk.Layer(
    "H3HexagonLayer",
    df_price,
    get_hexagon="l7",
    auto_highlight=True,
    # elevation_scale=10,
    pickable=True,
    # elevation_range=[min(df["price"]), max(df["price"])],
    extruded=True,
    coverage=0.8,
    opacity=0.01,
    get_fill_color="[255, normalized, 0]",
)

view_state = pdk.ViewState(
    latitude=47.500000, longitude=19.040236, zoom=10.5, bearing=0, pitch=35
)
r = pdk.Deck(
    layers=[layer],
    initial_view_state=view_state,
    tooltip={"text": "square meter price: {price}"},
)
r.to_html("../vizs/maps/prices_h7.html")
Out[5]:

Temperature

In [6]:
df = pd.read_csv("../data/temp_budapest.tsv", sep="\t")
df.fillna(0, inplace=True)

df_temp = df.groupby("l7").mean()
df_temp.reset_index(inplace=True, level=["l7"])
df_temp["rescaled"] = [255 - ((e**3)/100) for e in df_temp["celsius"]]
layer = pdk.Layer(
    "H3HexagonLayer",
    df_temp,
    get_hexagon="l7",
    auto_highlight=True,
    pickable=True,
    extruded=True,
    coverage=0.8,
    opacity=0.05,
    get_fill_color="[255, rescaled, 0]",
)

view_state = pdk.ViewState(
    latitude=47.500000, longitude=19.040236, zoom=10.5, bearing=0, pitch=35
)
r = pdk.Deck(
    layers=[layer],
    initial_view_state=view_state,
    tooltip={"text": "temperature (celsius): {celsius}"},
)
r.to_html("../vizs/maps/temperature_h7.html")
Out[6]:

Leaf Area Index

In [7]:
df = pd.read_csv("../data/lai_budapest.tsv", sep="\t")
df.fillna(0, inplace=True)

df_lai = df.groupby("l7").mean()
df_lai.reset_index(inplace=True, level=["l7"])
layer = pdk.Layer(
    "H3HexagonLayer",
    df_lai,
    get_hexagon="l7",
    auto_highlight=True,
    pickable=True,
    extruded=True,
    coverage=0.9,
    opacity=0.05,
    get_fill_color="[255, 255 - (lai * 100), 0]"
)

view_state = pdk.ViewState(
    latitude=47.500000, longitude=19.040236, zoom=10.5, bearing=0, pitch=35
)
r = pdk.Deck(
    layers=[layer],
    initial_view_state=view_state,
    tooltip={"text": "Leaf Area Index: {lai}"},
)
r.to_html("../vizs/maps/lai_h7.html")
Out[7]:

Fraction of Vegetation Cover

In [8]:
df = pd.read_csv("../data/fcover_budapest.tsv", sep="\t")
df.fillna(0.0, inplace=True)

df_fcover = df.groupby("l7").mean()
df_fcover.reset_index(inplace=True, level=["l7"])
df_fcover["normalized"] = (df_fcover["fcover"] / np.sqrt(np.sum(df_fcover["fcover"] ** 5))) ** -2
df_fcover["normalized"][df_fcover["normalized"] == np.inf] = 255
layer = pdk.Layer(
    "H3HexagonLayer",
    df_fcover,
    get_hexagon="l7",
    auto_highlight=True,
    pickable=True,
    extruded=True,
    coverage=0.8,
    opacity=0.05,
    get_fill_color="[255, normalized, 0]",
)

view_state = pdk.ViewState(
    latitude=47.500000, longitude=19.040236, zoom=10.5, bearing=0, pitch=35
)
r = pdk.Deck(
    layers=[layer],
    initial_view_state=view_state,
    tooltip={"text": "Fraction of Vegetation Cover: {fcover}"},
)
r.to_html("../vizs/maps/fcover_h7.html")
/tmp/ipykernel_2172/3755050406.py:7: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_fcover["normalized"][df_fcover["normalized"] == np.inf] = 255
Out[8]:

Analysis

We would like to test the common conception that wealthier neighborhoods are greener and also enjoy a lower average temperature during summer. Since we have no data on wealth at this granularity, we use property square meter prices as a proxy of wealth. This is a strong and yet not often assessed assumption. In particular, we test if

  • temperature and greenness,
  • temperature and square meter prices,
  • greenness and square meter prices

are connected.

We present our findings as interactive visualizations. We start with a relatively fine H3 resolution (7), giving us a fairly tight covergae of the geographical are under investigation. However, as we notice that the geographical resolution might not match the resolution of economical data, we also experiment with lower H3 resolutions.

H3 level 7

In [9]:
at.renderers.enable('default')

data_frames = [df_price, df_temp, df_lai, df_fcover]
df_merged = reduce(lambda  left,right: pd.merge(left,right,on=['l7'],
                                            how='outer'), data_frames)
df_merged.dropna(inplace=True)
df_merged.drop(columns=["normalized_x", "lat_x", "long_x", "rescaled", "lat_y",
                "long_y", "lat", "long", "normalized_y"], inplace=True)
df_merged.head()
Out[9]:
l7 price celsius lai fcover
4 871e03449ffffff 1.197017e+06 19.883478 4.093056 0.736756
5 871e0344affffff 9.605164e+05 22.373126 3.321111 0.779938
6 871e0344bffffff 1.556710e+06 18.937453 3.557240 0.784969
7 871e03459ffffff 1.115068e+06 20.933828 3.778786 0.813342
9 871e0345dffffff 1.449016e+06 21.627415 3.191146 0.775750
In [10]:
cor_data = (df_merged.corr().stack().reset_index().rename(columns={0: 'correlation', 'level_0': 'variable', 'level_1': 'variable2'}))
cor_data['correlation_label'] = cor_data['correlation'].map('{:.2f}'.format)  # Round to 2 decimal
cor_data
Out[10]:
variable variable2 correlation correlation_label
0 price price 1.000000 1.00
1 price celsius 0.082898 0.08
2 price lai 0.130627 0.13
3 price fcover 0.085454 0.09
4 celsius price 0.082898 0.08
5 celsius celsius 1.000000 1.00
6 celsius lai -0.064254 -0.06
7 celsius fcover -0.050965 -0.05
8 lai price 0.130627 0.13
9 lai celsius -0.064254 -0.06
10 lai lai 1.000000 1.00
11 lai fcover 0.958244 0.96
12 fcover price 0.085454 0.09
13 fcover celsius -0.050965 -0.05
14 fcover lai 0.958244 0.96
15 fcover fcover 1.000000 1.00
In [11]:
base = at.Chart(cor_data).encode(
    x='variable2:O',
    y='variable:O'
)

text = base.mark_text().encode(
    text='correlation_label',
    color=at.condition(
        at.datum.correlation > 0.5,
        at.value('white'),
        at.value('black')
    )
)

cor_plot = base.mark_rect().encode(
    color='correlation:Q'
).properties(
    width=700,
    height=700
)

cor_plot + text
Out[11]:

H3 level 6

Since our price data is collected on street name level, maybe we should use a lower resolution.

In [12]:
price_h3_df = pd.read_csv("../data/aggregated/l6.tsv", sep="\t")
temp_df = pd.read_csv("../data/temp_budapest.tsv", sep="\t")
lai_df = pd.read_csv("../data/lai_budapest.tsv", sep="\t")
fcover_df = pd.read_csv("../data/fcover_budapest.tsv", sep="\t")

temp_h3_df = temp_df.groupby("l6").mean()
temp_h3_df.reset_index(inplace=True, level=["l6"])
lai_h3_df = lai_df.groupby("l6").mean()
lai_h3_df.reset_index(inplace=True, level=["l6"])
fcover_h3_df = fcover_df.groupby("l6").mean()
fcover_h3_df.reset_index(inplace=True, level=["l6"])

h3_data_frames = [price_h3_df, temp_h3_df, lai_h3_df, fcover_h3_df]
df_merged_h3 = reduce(lambda  left,right: pd.merge(left,right,on=['l6'],
                                                   how='outer'), h3_data_frames)
df_merged_h3.dropna(inplace=True)
df_merged_h3.drop(columns=["lat_x", "long_x", "lat_y", "long_y", "lat", "long"], inplace=True)
df_merged_h3.head()
Out[12]:
l6 price celsius lai fcover
4 861e0344fffffff 1.149972e+06 21.141896 3.661387 0.775320
5 861e0345fffffff 1.071476e+06 21.493805 3.573483 0.794305
7 861e03607ffffff 8.155039e+05 20.747444 0.731667 0.253750
9 861e03617ffffff 7.243512e+05 20.387555 1.173775 0.331828
10 861e0361fffffff 8.734682e+05 22.131759 0.756120 0.290160
In [13]:
cor_data_h3 = (df_merged_h3.corr().stack().reset_index().rename(
    columns={0: 'correlation', 'level_0': 'variable', 'level_1': 'variable2'}))
cor_data_h3['correlation_label'] = cor_data_h3['correlation'].map('{:.2f}'.format)
cor_data_h3
Out[13]:
variable variable2 correlation correlation_label
0 price price 1.000000 1.00
1 price celsius -0.056756 -0.06
2 price lai 0.205356 0.21
3 price fcover 0.187743 0.19
4 celsius price -0.056756 -0.06
5 celsius celsius 1.000000 1.00
6 celsius lai -0.218399 -0.22
7 celsius fcover -0.248033 -0.25
8 lai price 0.205356 0.21
9 lai celsius -0.218399 -0.22
10 lai lai 1.000000 1.00
11 lai fcover 0.967857 0.97
12 fcover price 0.187743 0.19
13 fcover celsius -0.248033 -0.25
14 fcover lai 0.967857 0.97
15 fcover fcover 1.000000 1.00
In [14]:
base = at.Chart(cor_data_h3).encode(
    x='variable2:O',
    y='variable:O'
)

text = base.mark_text().encode(
    text='correlation_label',
    color=at.condition(
        at.datum.correlation > 0.5,
        at.value('white'),
        at.value('black')
    )
)

cor_plot = base.mark_rect().encode(
    color='correlation:Q'
).properties(
    width=700,
    height=700
)

cor_plot + text
Out[14]:

H3 Level 5

In [15]:
price_h3_df = pd.read_csv("../data/aggregated/l5.tsv", sep="\t")
temp_df = pd.read_csv("../data/temp_budapest.tsv", sep="\t")
lai_df = pd.read_csv("../data/lai_budapest.tsv", sep="\t")
fcover_df = pd.read_csv("../data/fcover_budapest.tsv", sep="\t")

temp_h3_df = temp_df.groupby("l5").mean()
temp_h3_df.reset_index(inplace=True, level=["l5"])
lai_h3_df = lai_df.groupby("l5").mean()
lai_h3_df.reset_index(inplace=True, level=["l5"])
fcover_h3_df = fcover_df.groupby("l5").mean()
fcover_h3_df.reset_index(inplace=True, level=["l5"])

h3_data_frames = [price_h3_df, temp_h3_df, lai_h3_df, fcover_h3_df]
df_merged_h3 = reduce(lambda  left,right: pd.merge(left,right,on=['l5'],
                                                   how='outer'), h3_data_frames)
df_merged_h3.dropna(inplace=True)
df_merged_h3.drop(columns=["lat_x", "long_x", "lat_y", "long_y", "lat", "long"], inplace=True)
df_merged_h3
Out[15]:
l5 price celsius lai fcover
4 851e0347fffffff 1.126057e+06 21.151154 3.594296 0.789122
6 851e0363fffffff 7.993279e+05 20.890202 1.467126 0.421782
7 851e036bfffffff 7.341340e+05 20.802957 1.497615 0.481584
8 851e0373fffffff 9.410714e+05 20.786614 3.598063 0.761485
10 851e037bfffffff 1.072051e+06 21.231511 1.795492 0.469863
22 851e1cb7fffffff 8.104594e+05 20.866897 1.859375 0.583201
In [16]:
cor_data_h3 = (df_merged_h3.corr().stack().reset_index().rename(
    columns={0: 'correlation', 'level_0': 'variable', 'level_1': 'variable2'}))
cor_data_h3['correlation_label'] = cor_data_h3['correlation'].map('{:.2f}'.format)
cor_data_h3
Out[16]:
variable variable2 correlation correlation_label
0 price price 1.000000 1.00
1 price celsius 0.828449 0.83
2 price lai 0.637665 0.64
3 price fcover 0.538340 0.54
4 celsius price 0.828449 0.83
5 celsius celsius 1.000000 1.00
6 celsius lai 0.115429 0.12
7 celsius fcover 0.033319 0.03
8 lai price 0.637665 0.64
9 lai celsius 0.115429 0.12
10 lai lai 1.000000 1.00
11 lai fcover 0.967581 0.97
12 fcover price 0.538340 0.54
13 fcover celsius 0.033319 0.03
14 fcover lai 0.967581 0.97
15 fcover fcover 1.000000 1.00
In [17]:
base = at.Chart(cor_data_h3).encode(
    x='variable2:O',
    y='variable:O'
)

text = base.mark_text().encode(
    text='correlation_label',
    color=at.condition(
        at.datum.correlation > 0.5,
        at.value('white'),
        at.value('black')
    )
)

cor_plot = base.mark_rect().encode(
    color='correlation:Q'
).properties(
    width=700,
    height=700
)

cor_plot + text
Out[17]:

Integrating other (community) data sources

In this part of the notebook, we demonstrate our unique approach to multi-domain data synthesis, while bringing in two further data sources with a distinct community-driven character.

OpenStreetMap is an open neogeography platform: the community defines the semantics of the map and populates it with POIs, i.e., data entries referring to actual places of a certain character. That character is captured by features and subfeatures, i.e., a more abstract and a more detailed categorization.

In the following, we create an integrated, multi-domain sociogeographical dataset by counting the number of POIs of certain categories.

This obsservational dataset is further augmented by data from another community-driven platform: Járókelő is a civil-empowered engagement platform for reporting issues in the urban infrasturcture and enviroment, and following up on their status. We consider all the cases (over 10000) submitted this year (2022), until Aug 20.

As this is a Hungarian platform, and, for the sake of compatibility, we used the Hungarian names as feature labels in the dataset, we provide a translation of them here:

  • Elhagyott jármű - Left vehicle
  • Parkolás - Parking
  • Graffiti - Graffiti
  • Járda - Sidewalk
  • Közvilágítás - Lighting
  • Közművek - Infrastructure
  • Forgalomtechnika - Traffic control
  • Egyéb - Misc.
  • Tömegközlekedés - Public transport
  • Kátyú - Road problems
  • Parkok és zöldterületek - Parks and green
  • Szemét - Trash
  • Kerékpárút - Bicycle roads
  • Akadálymentesítés - Accessibility

The following tables provide a data summary table and a correlation matrix for OSM features (and subfeatures, respectively) and Járókelő entries, aggregated for H3 hexagons with resolution Level 5 (where 'l5' hashes identify the hexagons). The tables have been collapsed for readability using the _df_mergedh3() function.

OSM main features and Jarokelo

In [18]:
price_h3_df = pd.read_csv("../data/aggregated/l5.tsv", sep="\t")
temp_df = pd.read_csv("../data/temp_budapest.tsv", sep="\t")
lai_df = pd.read_csv("../data/lai_budapest.tsv", sep="\t")
fcover_df = pd.read_csv("../data/fcover_budapest.tsv", sep="\t")

temp_h3_df = temp_df.groupby("l5").mean()
temp_h3_df.reset_index(inplace=True, level=["l5"])
lai_h3_df = lai_df.groupby("l5").mean()
lai_h3_df.reset_index(inplace=True, level=["l5"])
fcover_h3_df = fcover_df.groupby("l5").mean()
fcover_h3_df.reset_index(inplace=True, level=["l5"])

osm_pois = pd.read_csv("../data/osm/key_l5.tsv", sep="\t")
osm_pois = osm_pois.pivot_table(values="0", index=osm_pois.l5, columns="key", aggfunc="first")
osm_pois.reset_index(inplace=True, level=["l5"])
osm_pois.fillna(0, inplace=True)

jarokelo = pd.read_csv("../data/jarokelo/jarokelo_l5.tsv", sep="\t")
jarokelo = jarokelo.pivot_table(values="0", index=jarokelo.l5, columns="Category", aggfunc="first")
jarokelo.reset_index(inplace=True, level=["l5"])
jarokelo.fillna(0, inplace=True)

h3_data_frames = [price_h3_df, temp_h3_df, lai_h3_df, fcover_h3_df, osm_pois, jarokelo]
df_merged_h3 = reduce(lambda  left,right: pd.merge(left,right,on=['l5'],
                                                   how='outer'), h3_data_frames)
df_merged_h3.dropna(inplace=True)
df_merged_h3.drop(columns=["lat_x", "long_x", "lat_y", "long_y", "lat", "long"], inplace=True)
df_merged_h3
Out[18]:
l5 price celsius lai fcover amenity building craft landuse office ... Járda Kerékpárút Kátyú Közművek Közvilágítás Parkok és zöldterületek Parkolás Szemét Sziget Fesztivál Tömegközlekedés
4 851e0347fffffff 1.126057e+06 21.151154 3.594296 0.789122 1337.0 7.0 2.0 0.0 2.0 ... 5.0 0.0 10.0 4.0 1.0 14.0 11.0 6.0 0.0 2.0
6 851e0363fffffff 7.993279e+05 20.890202 1.467126 0.421782 2511.0 6.0 4.0 0.0 3.0 ... 76.0 15.0 89.0 183.0 83.0 160.0 17.0 220.0 0.0 47.0
7 851e036bfffffff 7.341340e+05 20.802957 1.497615 0.481584 906.0 2.0 1.0 0.0 0.0 ... 23.0 2.0 20.0 49.0 18.0 78.0 7.0 52.0 0.0 30.0
8 851e0373fffffff 9.410714e+05 20.786614 3.598063 0.761485 3768.0 10.0 4.0 0.0 4.0 ... 98.0 27.0 129.0 167.0 119.0 339.0 64.0 259.0 1.0 58.0
10 851e037bfffffff 1.072051e+06 21.231511 1.795492 0.469863 23740.0 25.0 38.0 1.0 29.0 ... 408.0 201.0 288.0 935.0 272.0 1158.0 218.0 1139.0 1.0 291.0
22 851e1cb7fffffff 8.104594e+05 20.866897 1.859375 0.583201 935.0 7.0 2.0 0.0 1.0 ... 6.0 0.0 16.0 5.0 4.0 8.0 1.0 9.0 0.0 2.0

6 rows × 25 columns

In [19]:
cor_data_h3 = (df_merged_h3.corr().stack().reset_index().rename(
    columns={0: 'correlation', 'level_0': 'variable', 'level_1': 'variable2'}))
cor_data_h3['correlation_label'] = cor_data_h3['correlation'].map('{:.2f}'.format)
cor_data_h3
Out[19]:
variable variable2 correlation correlation_label
0 price price 1.000000 1.00
1 price celsius 0.828449 0.83
2 price lai 0.637665 0.64
3 price fcover 0.538340 0.54
4 price amenity 0.502131 0.50
... ... ... ... ...
571 Tömegközlekedés Parkok és zöldterületek 0.992869 0.99
572 Tömegközlekedés Parkolás 0.984545 0.98
573 Tömegközlekedés Szemét 0.996508 1.00
574 Tömegközlekedés Sziget Fesztivál 0.725058 0.73
575 Tömegközlekedés Tömegközlekedés 1.000000 1.00

576 rows × 4 columns

In [20]:
base = at.Chart(cor_data_h3).encode(
    x='variable2:O',
    y='variable:O'
)

text = base.mark_text().encode(
    text='correlation_label',
    color=at.condition(
        at.datum.correlation > 0.5,
        at.value('white'),
        at.value('black')
    )
)

cor_plot = base.mark_rect().encode(
    color='correlation:Q'
).properties(
    width=700,
    height=700
)

cor_plot + text
Out[20]:

OSM main subfeatures and Jarokelo

In [21]:
price_h3_df = pd.read_csv("../data/aggregated/l5.tsv", sep="\t")
temp_df = pd.read_csv("../data/temp_budapest.tsv", sep="\t")
lai_df = pd.read_csv("../data/lai_budapest.tsv", sep="\t")
fcover_df = pd.read_csv("../data/fcover_budapest.tsv", sep="\t")

temp_h3_df = temp_df.groupby("l5").mean()
temp_h3_df.reset_index(inplace=True, level=["l5"])
lai_h3_df = lai_df.groupby("l5").mean()
lai_h3_df.reset_index(inplace=True, level=["l5"])
fcover_h3_df = fcover_df.groupby("l5").mean()
fcover_h3_df.reset_index(inplace=True, level=["l5"])

osm_pois = pd.read_csv("../data/osm/value_l5.tsv", sep="\t")
osm_pois = osm_pois.pivot_table(values="0", index=osm_pois.l5, columns="value", aggfunc="first")
osm_pois.reset_index(inplace=True, level=["l5"])
osm_pois.fillna(0, inplace=True)

jarokelo = pd.read_csv("../data/jarokelo/jarokelo_l5.tsv", sep="\t")
jarokelo = jarokelo.pivot_table(values="0", index=jarokelo.l5, columns="Category", aggfunc="first")
jarokelo.reset_index(inplace=True, level=["l5"])
jarokelo.fillna(0, inplace=True)

h3_data_frames = [price_h3_df, temp_h3_df, lai_h3_df, fcover_h3_df, osm_pois, jarokelo]
df_merged_h3 = reduce(lambda  left,right: pd.merge(left,right,on=['l5'],
                                                   how='outer'), h3_data_frames)
df_merged_h3.dropna(inplace=True)
df_merged_h3.drop(columns=["lat_x", "long_x", "lat_y", "long_y", "lat", "long"], inplace=True)
df_merged_h3
Out[21]:
l5 price celsius lai fcover air_filling animal_boarding animal_shelter animal_training arts_centre ... Járda Kerékpárút Kátyú Közművek Közvilágítás Parkok és zöldterületek Parkolás Szemét Sziget Fesztivál Tömegközlekedés
4 851e0347fffffff 1.126057e+06 21.151154 3.594296 0.789122 0.0 1.0 1.0 1.0 0.0 ... 5.0 0.0 10.0 4.0 1.0 14.0 11.0 6.0 0.0 2.0
6 851e0363fffffff 7.993279e+05 20.890202 1.467126 0.421782 0.0 5.0 1.0 2.0 0.0 ... 76.0 15.0 89.0 183.0 83.0 160.0 17.0 220.0 0.0 47.0
7 851e036bfffffff 7.341340e+05 20.802957 1.497615 0.481584 0.0 0.0 0.0 1.0 1.0 ... 23.0 2.0 20.0 49.0 18.0 78.0 7.0 52.0 0.0 30.0
8 851e0373fffffff 9.410714e+05 20.786614 3.598063 0.761485 0.0 1.0 0.0 3.0 0.0 ... 98.0 27.0 129.0 167.0 119.0 339.0 64.0 259.0 1.0 58.0
10 851e037bfffffff 1.072051e+06 21.231511 1.795492 0.469863 2.0 5.0 1.0 2.0 27.0 ... 408.0 201.0 288.0 935.0 272.0 1158.0 218.0 1139.0 1.0 291.0
22 851e1cb7fffffff 8.104594e+05 20.866897 1.859375 0.583201 0.0 0.0 0.0 0.0 0.0 ... 6.0 0.0 16.0 5.0 4.0 8.0 1.0 9.0 0.0 2.0

6 rows × 211 columns

In [22]:
cor_data_h3 = (df_merged_h3.corr().stack().reset_index().rename(
    columns={0: 'correlation', 'level_0': 'variable', 'level_1': 'variable2'}))
cor_data_h3['correlation_label'] = cor_data_h3['correlation'].map('{:.2f}'.format)
cor_data_h3
Out[22]:
variable variable2 correlation correlation_label
0 price price 1.000000 1.00
1 price celsius 0.828449 0.83
2 price lai 0.637665 0.64
3 price fcover 0.538340 0.54
4 price air_filling 0.486479 0.49
... ... ... ... ...
43676 Tömegközlekedés Parkok és zöldterületek 0.992869 0.99
43677 Tömegközlekedés Parkolás 0.984545 0.98
43678 Tömegközlekedés Szemét 0.996508 1.00
43679 Tömegközlekedés Sziget Fesztivál 0.725058 0.73
43680 Tömegközlekedés Tömegközlekedés 1.000000 1.00

43681 rows × 4 columns

In [23]:
from altair import pipe, limit_rows, to_values
t = lambda data: pipe(data, limit_rows(max_rows=50000), to_values)
at.data_transformers.register('custom', t)
at.data_transformers.enable('custom')

base = at.Chart(cor_data_h3).encode(
    x='variable2:O',
    y='variable:O'
)

text = base.mark_text().encode(
    text='correlation_label',
    color=at.condition(
        at.datum.correlation > 0.5,
        at.value('white'),
        at.value('black')
    )
)

cor_plot = base.mark_rect().encode(
    color='correlation:Q'
).properties(
    width=10000,
    height=10000
)

cor_plot + text
/opt/conda/lib/python3.8/site-packages/altair/utils/data.py:226: AltairDeprecationWarning: alt.pipe() is deprecated, and will be removed in a future release. Use toolz.curried.pipe() instead.
  warnings.warn(
Out[23]:

Discussion and conclusion

The present notebook fulfills a threefold purpose: a hands-on educational, an analytical and an integrational one.

The first part of the notebook presents a practical approach to aggregating and visualizing geosocial data using h3 and pydeck.

As for a technosocial analysis, we investigated the connection between property prices and environmental factors. In particular, we wanted to see if having a more expensive property in Budapest, Hungary correlates with a greener environment and a cooler summer. We used WEkEO data to get information on the environment and we scraped property prices to supplement the data. Interestingly, using lower (but still not too coarse) geographical resolutions provided us with some notable (even if not very surprising) correlations between more expensive properties and the greenness of its environment (using both the LAI and the FCOVER measure for greenness. However, we could not verify any significant direct connection between higher prices and more acceptable temperature conditions! This might be a first step towards an important future investigation: if consequences of climate change can be effectively tackled on individual level by spending more money. (Maybe, they cannot.)

The third aim of our project to lay a foundation (another reason for the name FUNDUS) for a technosocial data integration and analysis platform. To this end, we have created a novel, multi-domain integration, which combines, in a unique way, heterogeneous, yet semantically interlinked data domains: FUNDUS becomes a representation of the networked society, fusing the economical, the environmental, the urban-geographical and the civic aspects of our lives.

Future directions

In the future, we would like to increase the time window to get more LAI and FCOVER data. We investigate the possibility to incorporate other datasets like OSM and get official statistics on house prices and/or income, widening the scope of the FUNDUS project, to finally turn it into a full-fledged, environmentally and economically conscious qualitiy-of-life assessment approach. As already mentioned, a particular and potentially very interesting research question arising from the present notebook: can money effectively battle the effects of climate crisis in our direct surroundings? In order to learn more, we shall primarily proceed by expanding the geographical scope of the present notebook, and maybe even considering further economical factors. Here, the scarcity of open data is still a major impediment, but we hope we can contribute to the improvement of the situation with our endeavor.