
weeknotes 110: toy plugin

  • my brother’s flat hunt is well and truly over, as i helped him move into his new place last week! he was indeed pleased with it, which pleased me and the rest of the family too. crisis averted, phew!
  • i finished the PyQGISChallenge! mostly! i’m about halfway through putting together a toy plugin as the mini project for the last couple of days. i’ll post about that specifically once it’s done.
  • and i sorted the last bit of OS upgrade aftermath: getting my geolab database back in order. this involved an excursion into the land of containers, which i haven’t visited for a while, to extract my postgres 13 database for loading into my new postgres 15 one (rough sketch below). now everything is pretty much as it was a few weeks ago. jolly good!
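
the container dance looked roughly like this; container names and paths are illustrative rather than my actual setup:

# run a throwaway postgres 13 container over the old data directory
# (initdb is skipped because the directory is already initialised)
docker run --rm -d --name pg13 -p 5433:5432 \
  -v /path/to/old/pgdata:/var/lib/postgresql/data \
  postgres:13

# dump everything out of the old cluster...
pg_dumpall -h localhost -p 5433 -U geolab > geolab_pg13.sql

# ...and load it into the new postgres 15 cluster
psql -h localhost -p 5432 -U geolab -d postgres -f geolab_pg13.sql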

dataset: Supermarket Retail Points


created: 30/05/2024
updated: 30/05/2024

dataset source

source:

Retail Points is a comprehensive dataset of supermarket and convenience store locations across the UK. Geolytix release it as open data, allowing unrestricted use with no licensing requirements and at no cost.

loading into geolab

connect to geolab database:

psql -h localhost -U geolab -d geolab -W

create source schema:

CREATE SCHEMA __geolytix AUTHORIZATION geolab;

create table:

CREATE TABLE __geolytix.supermarket_retail_points_import
(
  id numeric PRIMARY KEY,
  retailer varchar(36),
  fascia varchar(29),
  store_name varchar(60),
  add_one varchar(76),
  add_two varchar(34),
  town varchar(41),
  suburb varchar(28),
  postcode varchar(8),
  long_wgs numeric,
  lat_wgs numeric,
  bng_e numeric,
  bng_n numeric,
  pqi varchar(28),
  open_date varchar(8),
  size_band varchar(38),
  county varchar(24)
);

load data into geolab source schema:

\copy __geolytix.supermarket_retail_points_import FROM 'geolytix_retailpoints_v31_202403.csv' WITH (FORMAT csv, HEADER True, QUOTE '"')
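
a quick sanity check on the load; the exact count will depend on the release you grabbed:

-- count the imported rows and eyeball a few records
SELECT count(*) FROM __geolytix.supermarket_retail_points_import;

SELECT retailer, store_name, postcode
FROM __geolytix.supermarket_retail_points_import
LIMIT 5;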

create topic schema:

CREATE SCHEMA supermarkets AUTHORIZATION geolab;

create view:

N.B. using a view is slower than a table because the geometry is constructed every time the view is queried. also, all is a reserved word in postgres, so the view name has to be quoted.

CREATE VIEW supermarkets."all"
AS
SELECT
  id,
  retailer,
  fascia,
  store_name,
  add_one,
  add_two,
  town,
  suburb,
  postcode,
  long_wgs,
  lat_wgs,
  bng_e,
  bng_n,
  pqi,
  open_date,
  size_band,
  county,
  ST_SetSRID(ST_MakePoint(long_wgs,lat_wgs),4326) AS geom
FROM __geolytix.supermarket_retail_points_import
;
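
if query speed ever matters more than freshness, a hedged alternative per the N.B. above is to materialise the view and index the geometry; names here are illustrative:

-- cache the constructed geometry instead of rebuilding it per query
CREATE MATERIALIZED VIEW supermarkets.all_cached AS
SELECT * FROM supermarkets."all";

-- spatial index to speed up bounding-box and distance queries
CREATE INDEX ON supermarkets.all_cached USING GIST (geom);

-- re-run whenever the import table is reloaded
REFRESH MATERIALIZED VIEW supermarkets.all_cached;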

dataset: OS AddressBase


created: 30/05/2024
updated: 30/05/2024

dataset source

source:

AddressBase matches 29 million Royal Mail postal addresses to unique property reference numbers (UPRN), bringing a new dimension to the matched records.

loading into geolab

connect to geolab database:

psql -h localhost -U geolab -d geolab -W

create source schema:

CREATE SCHEMA __ordnance_survey AUTHORIZATION geolab;

convert field names from addressbase-header.csv to lowercase:

$ cat addressbase-header.csv | sed 's/,/\n/g' | python -c "import sys; [print(line.strip().lower()) for line in sys.stdin]"
uprn
os_address_toid
udprn
organisation_name
department_name
po_box_number
sub_building_name
building_name
building_number
dependent_thoroughfare
thoroughfare
post_town
double_dependent_locality
dependent_locality
postcode
postcode_type
x_coordinate
y_coordinate
latitude
longitude
rpc
country
change_type
la_start_date
rm_start_date
last_update_date
class

create table:

CREATE TABLE __ordnance_survey.addressbase_sample_import
(
  uprn numeric PRIMARY KEY,
  os_address_toid varchar(20),
  udprn numeric,
  organisation_name varchar(55),
  department_name varchar(40),
  po_box_number varchar(4),
  sub_building_name varchar(25),
  building_name varchar(40),
  building_number numeric,
  dependent_thoroughfare varchar(30),
  thoroughfare varchar(30),
  post_town varchar(10),
  double_dependent_locality varchar(25),
  dependent_locality varchar(30),
  postcode varchar(7),
  postcode_type varchar(1),
  x_coordinate numeric,
  y_coordinate numeric,
  latitude numeric,
  longitude numeric,
  rpc numeric,
  country varchar(1),
  change_type varchar(1),
  la_start_date varchar(10),
  rm_start_date varchar(10),
  last_update_date varchar(10),
  class varchar(1)
);

stick field names on the data:

$ cat addressbase-header.csv sx9090.csv > addressbase.csv
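
worth a quick check that the header row landed on top:

$ head -n 2 addressbase.csv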

load data into geolab source schema:

\copy __ordnance_survey.addressbase_sample_import FROM 'addressbase.csv' WITH (FORMAT csv, HEADER True, QUOTE '"')

create topic schema:

CREATE SCHEMA addressbase AUTHORIZATION geolab;

create view:

N.B. using a view is slower than a table because the geometry is constructed every time the view is queried.

CREATE VIEW addressbase.sample
AS
SELECT
  uprn,
  os_address_toid,
  udprn,
  organisation_name,
  department_name,
  po_box_number,
  sub_building_name,
  building_name,
  building_number,
  dependent_thoroughfare,
  thoroughfare,
  post_town,
  double_dependent_locality,
  dependent_locality,
  postcode,
  postcode_type,
  x_coordinate,
  y_coordinate,
  latitude,
  longitude,
  rpc,
  country,
  change_type,
  la_start_date,
  rm_start_date,
  last_update_date,
  class,
  ST_SetSRID(ST_MakePoint(longitude,latitude),4326) AS geom
FROM __ordnance_survey.addressbase_sample_import
;
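
an illustrative query against the view: addresses within 250 metres of a point. the coordinates are made up, somewhere near Exeter since that is what the sx9090 sample tile covers:

-- cast to geography so ST_DWithin works in metres rather than degrees
SELECT uprn, building_number, thoroughfare, post_town, postcode
FROM addressbase.sample
WHERE ST_DWithin(
  geom::geography,
  ST_SetSRID(ST_MakePoint(-3.53, 50.72), 4326)::geography,
  250
);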

weeknotes 99: penguin photos

  • finished up my geolab database bootstrap scripts
  • started loading some datasets into geolab
  • did some more 30DaysOfQGIS
  • started reading https://brr.fyi per a recommendation from a friend – a blog from someone deployed to Antarctica to do IT stuff. the blogger has “always been a fan of remote infrastructure, logistics, and energy systems”, so rather than a steady supply of penguin photos, the About page informs me, i’m more likely to get a steady supply of “tedious posts about infrastructure” – sounds good to me!
  • decided to investigate scrapy in relation to the web scraper i mentioned in data pipeline plans

dataset: Global Powerplants


created: 28/05/2024
updated: 28/05/2024

dataset source

source:

via: Simon Willison / https://global-power-plants.datasettes.com/

from the WRI page:

The Global Power Plant Database is a comprehensive, open source database of power plants around the world. It centralizes power plant data to make it easier to navigate, compare and draw insights for one’s own analysis. The database covers approximately 30,000 power plants from 164 countries and includes thermal plants (e.g. coal, gas, oil, nuclear, biomass, waste, geothermal) and renewables (e.g. hydro, wind, solar). Each power plant is geolocated and entries contain information on plant capacity, generation, ownership, and fuel type. It will be continuously updated as data becomes available.

from https://github.com/wri/global-power-plant-database (the code used to create it):

This project is not currently maintained by WRI. There are no planned updates as of this time (early 2022). The last version of this database is version 1.3.0. If we learn of active forks or maintained versions of the code and database we will attempt to provide links in the future.

loading into geolab

connect to geolab database:

psql -h localhost -U geolab -d geolab -W

create source schema:

CREATE SCHEMA __world_resources_institute AUTHORIZATION geolab;

create table:

N.B. the rowid field is present in an export from Simon’s datasette, but it will not be present in the download direct from WRI.

CREATE TABLE __world_resources_institute.global_powerplants_import
(
  rowid numeric PRIMARY KEY,
  country varchar(3),
  country_long varchar(32),
  name varchar(87),
  gppd_idnr varchar(12),
  capacity_mw numeric,
  latitude numeric,
  longitude numeric,
  primary_fuel varchar(14),
  other_fuel1 varchar(12),
  other_fuel2 varchar(7),
  other_fuel3 varchar(7),
  commissioning_year numeric,
  owner varchar(140),
  source varchar(73),
  url varchar(795),
  geolocation_source varchar(83),
  wepp_id varchar(25),
  year_of_capacity_data numeric,
  generation_gwh_2013 numeric,
  generation_gwh_2014 numeric,
  generation_gwh_2015 numeric,
  generation_gwh_2016 numeric,
  generation_gwh_2017 numeric,
  generation_gwh_2018 numeric,
  generation_gwh_2019 numeric,
  generation_data_source varchar(73),
  estimated_generation_gwh_2013 numeric,
  estimated_generation_gwh_2014 numeric,
  estimated_generation_gwh_2015 numeric,
  estimated_generation_gwh_2016 numeric,
  estimated_generation_gwh_2017 numeric,
  estimated_generation_note_2013 varchar(15),
  estimated_generation_note_2014 varchar(15),
  estimated_generation_note_2015 varchar(15),
  estimated_generation_note_2016 varchar(15),
  estimated_generation_note_2017 varchar(18)
);

load data into geolab source schema:

\copy __world_resources_institute.global_powerplants_import FROM 'global-power-plants.csv' WITH (FORMAT csv, HEADER True, QUOTE '"')

create topic schema:

CREATE SCHEMA powerplants AUTHORIZATION geolab;

create view:

N.B. using a view is slower than a table because the geometry is constructed every time the view is queried.

CREATE VIEW powerplants.global
AS
SELECT 
  rowid,
  country,
  country_long,
  name,
  gppd_idnr,
  capacity_mw,
  latitude,
  longitude,
  primary_fuel,
  other_fuel1,
  other_fuel2,
  other_fuel3,
  commissioning_year,
  owner,
  source,
  url,
  geolocation_source,
  wepp_id,
  year_of_capacity_data,
  generation_gwh_2013,
  generation_gwh_2014,
  generation_gwh_2015,
  generation_gwh_2016,
  generation_gwh_2017,
  generation_gwh_2018,
  generation_gwh_2019,
  generation_data_source,
  estimated_generation_gwh_2013,
  estimated_generation_gwh_2014,
  estimated_generation_gwh_2015,
  estimated_generation_gwh_2016,
  estimated_generation_gwh_2017,
  estimated_generation_note_2013,
  estimated_generation_note_2014,
  estimated_generation_note_2015,
  estimated_generation_note_2016,
  estimated_generation_note_2017,
  ST_SetSRID(ST_MakePoint(longitude,latitude),4326) AS geom
FROM __world_resources_institute.global_powerplants_import
;
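
an illustrative query to exercise the view: headline capacity by fuel, largest first:

-- total plants and capacity per primary fuel
SELECT
  primary_fuel,
  count(*) AS plants,
  sum(capacity_mw) AS total_mw
FROM powerplants.global
GROUP BY primary_fuel
ORDER BY total_mw DESC;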