Class Project: Data Centres in Europe
Overview
In this project you will (1) build a Europe-wide dataset of
data centre locations, (2) compile socio-economic and
infrastructure variables at the NUTS3 level, and (3) train
models to predict where data centres are located. The goal is to
practice end-to-end data science: data collection, cleaning,
standardisation, merging, modelling, evaluation, and
interpretation.
Key Dates
| Milestone | Date |
|---|---|
| Project start | 1 March |
| Mid-term Deadline (Parts 1–2) | 20 April |
| Part 3 begins | 20 April |
| Report Deadline | June 5th or two weeks before “Primo Appello” |
Part 1 (1 March → 20 April): Data Centre Collection
Task
Collect data on as many data centres as possible in your
assigned set of nations, and submit your entries to the TA, who
will collate them into a Data Centre Master Table. Each team
takes the set of nations whose number equals (team number mod
#nations_sets); the sets are listed under “Assigned Sets of
States” below.
Initial sources of data centre information include Data
Center Map. Other sources are also encouraged.
Definition
A “data centre” is a dedicated facility that provides
computing/storage/network infrastructure (e.g., hyperscale,
colocation, carrier-neutral sites). Generic office server rooms
should not be included unless clearly documented as a data
centre.
What to submit
- Each entry must include a unique identifier, location
details (including coordinates), its NUTS3 region, and at
least one credible source confirming the facility’s presence.
- Where relevant, include links to news mentioning the
facility and any named individuals (with public professional
profiles when available).
- See “What is NUTS3”, “Data Center Mandatory Variables” and
“Data Center Optional Variables” below.
Grading (Part 1)
Credit is proportional to the number of correct entries
submitted via this form.
Incorrect entries will be penalized.
Part 2 (1 March → 20 April): NUTS3 Variables (EU-wide)
Task
For all EU NUTS3 regions, compile a dataset containing:
- Population (NUTS3)
- Gross Domestic Product (GDP) (NUTS3)
- At least one additional variable type from the “NUTS3
Additional Variables” list (assigned by team number). Each
team selects the variable type equal to: (team number
mod #nuts_vars).
If the remainder is 0, select type 14. Example: team 16 → 16
mod 14 = 2 → chooses type 2: Wind potential.
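The assignment rule above can be sketched in a few lines of Python. The function name `variable_type` is hypothetical, and the code assumes team numbers start at 1 and that the list has 14 variable types, as stated in the text.

```python
N_VARIABLE_TYPES = 14  # number of entries in the "NUTS3 Additional Variables" list


def variable_type(team_number: int, n_types: int = N_VARIABLE_TYPES) -> int:
    """Return the 1-based variable type for a team.

    The type is (team number mod n_types); a remainder of 0 maps to n_types.
    """
    if team_number < 1:
        raise ValueError("team numbers are assumed to start at 1")
    remainder = team_number % n_types
    return remainder if remainder != 0 else n_types


# Team 16 from the example in the text, plus a team whose remainder is 0.
print(variable_type(16))
print(variable_type(14))
```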
Deliverable
A clean table keyed by NUTS3 code, with clear units, time
period, sources, and brief notes on processing and coverage. See
“NUTS3 Additional Variables” below.
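One possible way to assemble such a table is an outer join on the NUTS3 code that also records which regions lack a value, so coverage can be reported in the notes. This is a minimal sketch: the function name `merge_by_nuts3`, the column names, the region codes (`XX001` and so on), and all numbers are placeholders, not real data.

```python
def merge_by_nuts3(tables: dict[str, dict[str, float]]) -> tuple[list[dict], dict[str, list[str]]]:
    """Outer-join several {nuts3_code: value} tables into one row per code.

    Returns the merged rows (sorted by code) and, for each variable, the
    list of codes that have no value for it.
    """
    all_codes = sorted(set().union(*(table.keys() for table in tables.values())))
    rows = []
    missing = {name: [] for name in tables}
    for code in all_codes:
        row = {"nuts3": code}
        for name, table in tables.items():
            value = table.get(code)  # None marks a coverage gap
            row[name] = value
            if value is None:
                missing[name].append(code)
        rows.append(row)
    return rows, missing


# Placeholder inputs: codes and values are invented for illustration only.
population = {"XX001": 120_000, "XX002": 85_000}
gdp_meur = {"XX001": 4_300.0, "XX003": 950.0}

rows, missing = merge_by_nuts3({"population": population, "gdp_meur": gdp_meur})
for row in rows:
    print(row)
print("coverage gaps:", missing)
```

Keeping the gaps explicit (rather than silently dropping regions, as an inner join would) makes it easier to document coverage alongside units, time period, and sources.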
Part 3 (20 April →): Train Models to Predict Where Data Centres Are Located
- Train two models: an explainable model (logistic regression) and
the most accurate model you can build.
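To illustrate what the explainable model involves, here is a minimal logistic regression fitted by batch gradient descent in plain Python. It is a sketch under stated assumptions, not the required implementation: the two features, their values, and the labels are synthetic, and in practice a library implementation (for example scikit-learn's `LogisticRegression`) would likely be used instead.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Predicted probability that a region hosts at least one data centre."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Synthetic, standardised toy features per region (assumed for illustration):
# [population z-score, fibre-coverage z-score]; label 1 = has a data centre.
X = [[-1.2, -0.8], [-0.7, -1.1], [-0.3, -0.2], [0.4, 0.6], [0.9, 0.7], [1.3, 1.2]]
y = [0, 0, 0, 1, 1, 1]

w, b = fit_logistic(X, y)
print("weights:", w, "bias:", b)
print("odds ratios:", [math.exp(wj) for wj in w])
```

The model is explainable because each weight is a change in log-odds per unit of its feature, so `exp(weight)` reads as an odds ratio.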
NUTS3 Additional Variables (#nuts_vars)
Each team has a number. In Part 2, point 3, select the variable
type equal to (team number mod 14); if the remainder is 0, select
type 14. For example, team 16 chooses variable type 2: “Wind potential”.
- Electricity cost, water consumption
- Other renewable sources (hydropower, biomass, geothermal)
- Actual electricity generation and grid capacity
- Fibre coverage and Very High Capacity Networks (VHCN)
- Backbone and Internet trunk maps
- European data centre map
- Drought — SPEI, SPI, soil moisture (EDO)
- Heatwaves and extreme temperatures (Copernicus CDS)
- Climate risk indicators (PESETA, EEA)
- Water Exploitation Index (WEI and WEI+)
- ICT firms and industrial intensity
- Solar potential (PVGIS)
- Wind potential (Global Wind Atlas)
- Land use and availability of industrial areas (CORINE Land
Cover)
Assigned Sets of States (#nations_sets)
- DE—Germany, BG—Bulgaria, LT—Lithuania, RS—Serbia,
MK—Macedonia, XK—Kosovo
- FR—France, DK—Denmark, AT—Austria, UA—Ukraine, CY—Cyprus,
HR—Croatia, AL—Albania
- IT—Italy, SE—Sweden, NO—Norway, RO—Romania, PT—Portugal,
HU—Hungary, EE—Estonia, IS—Iceland
- ES—Spain, CH—Switzerland, PL—Poland, TR—Turkey, EL—Greece,
LV—Latvia, SK—Slovakia, MT—Malta
- NL—The Netherlands, IE—Ireland, FI—Finland, CZ—Czech
Republic, BE—Belgium, SI—Slovenia, LU—Luxembourg,
LI—Liechtenstein
What is NUTS3
NUTS3 (Nomenclature of Territorial Units for Statistics, level
3, more here)
is the third level of the hierarchical regional classification
developed by Eurostat for the European Union. It provides a
standardized system to subdivide countries into comparable small
regions for statistical analysis, regional policy, and
socio-economic reporting.
In this project, we use the NUTS2024 classification at the
NUTS3 level for EU Member States. In addition, we include
Statistical Regions (SR) for non-EU countries that are
harmonized with the NUTS framework.
To identify the corresponding NUTS3 region for a given
geographic location, Eurostat provides an official interactive
tool (Statistical Atlas). By entering or navigating to a
specific position on the map, the associated NUTS3 region and
code can be retrieved.
How to get NUTS3 from coordinates in Python
See this doc.
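Independently of the linked document, the underlying idea is a point-in-polygon lookup against the NUTS3 boundary polygons. The sketch below uses a plain ray-casting test on two toy rectangles; the region codes and shapes are invented, and the helper names are hypothetical. With real data you would load the official boundary polygons (Eurostat distributes them through GISCO) and would probably use a geospatial library such as geopandas with a spatial join rather than hand-written geometry.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: True if (lon, lat) lies inside the polygon ring.

    `polygon` is a list of (lon, lat) vertices; the ring is closed implicitly.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def nuts3_for_point(lon, lat, regions):
    """Return the code of the first region containing the point, else None."""
    for code, polygon in regions.items():
        if point_in_polygon(lon, lat, polygon):
            return code
    return None


# Toy regions with placeholder codes; real NUTS3 polygons are far more complex.
TOY_REGIONS = {
    "XX001": [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)],
    "XX002": [(2.0, 0.0), (5.0, 0.0), (5.0, 2.0), (2.0, 2.0)],
}

print(nuts3_for_point(1.0, 1.0, TOY_REGIONS))
print(nuts3_for_point(3.5, 0.5, TOY_REGIONS))
```

Note the argument order (longitude first, then latitude): swapping the two is a common source of wrong region assignments.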
Data Center Mandatory Variables
- ID & location: unique id, lat, long,
address, city, zipcode, NUTS3 ID, operator
- Source: a source certifying the presence of
this data centre
- News: list of news items mentioning the
data centre
- People: list of the people mentioned in the news, with
their social media profiles (e.g., LinkedIn)
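Before submitting, each entry can be checked against the mandatory fields listed above. The following is a sketch only: the dictionary keys mirror the list but their exact spelling is an assumption (the submission form may name them differently), the sample entry is invented, and the news and people lists are assumed to be allowed to be empty.

```python
# Assumed key names for the ID, location, and source fields.
MANDATORY_FIELDS = (
    "id", "lat", "long", "address", "city",
    "zipcode", "nuts3_id", "operator", "source",
)


def validate_entry(entry: dict) -> list[str]:
    """Return a list of problems found in one data centre entry (empty if none)."""
    problems = []
    for field in MANDATORY_FIELDS:
        if entry.get(field) in (None, ""):
            problems.append(f"missing {field}")
    lat, lon = entry.get("lat"), entry.get("long")
    if isinstance(lat, (int, float)) and not -90 <= lat <= 90:
        problems.append("lat out of range")
    if isinstance(lon, (int, float)) and not -180 <= lon <= 180:
        problems.append("long out of range")
    return problems


# Invented example entry; none of these values describe a real facility.
good_entry = {
    "id": "DC-0001", "lat": 45.0, "long": 9.0,
    "address": "Example Street 1", "city": "Exampleville", "zipcode": "00000",
    "nuts3_id": "XX001", "operator": "Example Operator",
    "source": "https://example.org/listing",
    "news": [], "people": [],
}
bad_entry = dict(good_entry, lat=123.0, source="")

print(validate_entry(good_entry))
print(validate_entry(bad_entry))
```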
Data Center Optional Variables
- Ownership / branding: company (permit
applicant); brand (mapped parent brand), brand source; private
equity or asset manager? (source)
- Permits & facility notes: first permit
issue year, latest permit issue year; new permit or existing
update?; facilities note; number of buildings; construction
year
- Generator & capacity: generator type;
rated capacity; total generator rated capacity (kW)
- Estimated power/use buckets: estimated
energy consumption (kWh) at 30% / 50% / 60%; size categories;
“in TWh” variants
- Water + hydrology: daily/annual water
consumption; water stress; basin/aquifer fields
- Emissions / pollutants (tons per year):
NOx, SOx, VOCs, CO2e, PM-related fields
- Environmental justice / population:
population within 1 mile; percentile fields where available
- Links / reporting metadata: links to
records; page refs; reporter; notes
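For the estimated power/use buckets, one plausible reading is that the percentages are load factors applied to the rated capacity, and that the unit is kWh per year. Both points are assumptions, not something the list above states. Under that reading the arithmetic is rated capacity (kW) × load factor × hours per year, with 1 TWh = 10^9 kWh. The function names and the 10,000 kW capacity are illustrative.

```python
HOURS_PER_YEAR = 8760  # non-leap year


def annual_energy_kwh(rated_capacity_kw: float, load_factor: float) -> float:
    """Annual energy in kWh, assuming a constant load factor over the year."""
    if not 0.0 <= load_factor <= 1.0:
        raise ValueError("load_factor must be between 0 and 1")
    return rated_capacity_kw * load_factor * HOURS_PER_YEAR


def kwh_to_twh(kwh: float) -> float:
    """Convert kWh to TWh (1 TWh = 1e9 kWh)."""
    return kwh / 1e9


# Illustrative capacity, evaluated at the three load factors named in the list.
capacity_kw = 10_000.0
for load in (0.30, 0.50, 0.60):
    kwh = annual_energy_kwh(capacity_kw, load)
    print(f"{load:.0%}: {kwh:,.0f} kWh/year = {kwh_to_twh(kwh):.4f} TWh/year")
```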
Appendix — NUTS3 datasets (selected sources)
1. Socio-demographic and economic
2. Energy, renewables, environment
Some sources are not directly in NUTS3 but can be aggregated
(e.g., raster means within NUTS3 polygons).
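The raster aggregation mentioned above is often called a zonal mean. A minimal sketch, assuming the NUTS3 polygons have already been rasterised onto the same grid as the data (so each cell carries a region code), is shown below. The grids, codes, and nodata value are toy placeholders; the sketch ignores partial cells and area weighting, which a real workflow may need to handle.

```python
def zonal_means(values, zones, nodata=-9999.0):
    """Mean of raster `values` per zone label in the aligned `zones` grid.

    Cells with no zone (None) or with the nodata value are skipped.
    """
    sums, counts = {}, {}
    for value_row, zone_row in zip(values, zones):
        for value, zone in zip(value_row, zone_row):
            if zone is None or value == nodata:
                continue
            sums[zone] = sums.get(zone, 0.0) + value
            counts[zone] = counts.get(zone, 0) + 1
    return {zone: sums[zone] / counts[zone] for zone in sums}


# Toy 2x3 raster (e.g. a wind or solar indicator) and its aligned zone grid.
values = [
    [4.0, 6.0, 7.0],
    [5.0, -9999.0, 9.0],
]
zones = [
    ["XX001", "XX001", "XX002"],
    ["XX001", "XX002", "XX002"],
]

print(zonal_means(values, zones))
```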
3. Digital infrastructure
4. Climate risk
5. Water security
6. Other useful variables