- blog/
GEOG 868: Mapping the Class Roster
In Lesson 3 of GEOG 868 I played around with PostgreSQL and PostGIS for the first time. For the project at the end you are given a class_roster.txt
file that contains a list of students and their postal codes. The task was to join that with data provided earlier in the lesson and then plot the students’ locations on a map in QGIS. Here’s how I went about doing that.
Project Brief
The project brief provided some hints and tips about what you should be doing, which I translated into roughly 4 stages:
- set up a database and schema
- import the data into the database
- do the geometry column and join query bit
- connect QGIS to PostGIS and make a map
Stage 1: set up a database and schema
I already had my Lesson3db
database from working through the lesson, so I used that as there was no particular need to have a separate database for this. However, a separate schema made sense, as this project was a new and different topic to the existing schemas from the lesson (which were usa
and nyc_poi
).
CREATE SCHEMA class_roster
AUTHORIZATION postgres
;
Stage 2: import the data into the database
Before jumping into this I had a quick ponder, in the form of sketching an ER diagram, about what I was intending to do. The general plan was:
- table for
students
(names and their postal codes) - table for
postal_codes
(point geometry of postal code centroids) - and I may as well put the counties and states shapefiles in PostGIS too
A fairly unexciting ER diagram in the end – 4 boxes, 2 of them joined with a line. But a useful exercise nevertheless.
First up, creating the tables.
Starting with the postal_codes
table, I took roughly the following steps:
- created
postal_codes
table - used
pgcli \copy
to insert CSV intopostal_codes
table - used
AddGeometryColumn()
to add ageom
column topostal_codes
table
Then the students
table, I took roughly the following steps:
- created
students
table (with an ID field) - created
csv_students
(without an ID field) - used
pgcli \copy
to insert CSV intocsv_students
table - used
INSERT ... SELECT
to populatestudents
table with contents ofcsv_students
table
I say roughly, because I forgot about the \copy
command needing the source CSV and destination table to have the same number of columns, so I DROP
’d and recreated tables a couple of times.
I decided the students
table should have an ID rather than use the student’s firstname+lastname as a primary key, so to facilitate that I created the temporary csv_students
table. My reason for the ID field being “something something future proofing” – like if this was a real school database, you might eventually end up with two people with the same name, so firstname+lastname a good primary key does not make.
Here is the SQL and pgcli commands I used:
creating postal_codes
table:
CREATE TABLE class_roster.postal_codes
(
code character varying(5),
lon real,
lat real,
PRIMARY KEY (code)
)
;
ALTER TABLE IF EXISTS class_roster.postal_codes
OWNER to postgres
;
populating postal_codes
table:
\copy class_roster.postal_codes FROM 'postal_codes.txt' WITH (FORMAT csv, HEADER True, QUOTE '"')
adding geometry column to postal_codes
table:
SELECT AddGeometryColumn('class_roster', 'postal_codes', 'geom', 4269, 'POINT', 2)
;
creating students
table:
CREATE TABLE class_roster.students
(
id serial,
last_name character varying(40),
first_name character varying(40),
postal_code character varying(5),
PRIMARY KEY (id)
)
;
ALTER TABLE IF EXISTS class_roster.students
OWNER to postgres
;
creating temporary csv_students
table:
CREATE TABLE class_roster.csv_students
(
last_name character varying(40),
first_name character varying(40),
postal_code character varying(5),
)
;
ALTER TABLE IF EXISTS class_roster.csv_students
OWNER to postgres
;
populating csv_students
table:
\copy class_roster.csv_students FROM '868_roster_sp2_22.txt' WITH (FORMAT csv, HEADER True, QUOTE '"')
populating students
table:
INSERT INTO class_roster.students (last_name, first_name, postal_code)
(
SELECT last_name, first_name, postal_code
FROM class_roster.csv_students
)
;
With all the necessary data in the database, I just had the counties and states shapefiles left to import. These could have been added to the QGIS project directly rather than to PostGIS first, but I thought I may as well add them to the database since I figured how to do that earlier in Lesson 3. I did this with ogr2ogr
:
ogr2ogr -f PostgreSQL PG:"host=localhost port=5432 dbname=Lesson3db user=postgres password=......" -lco SCHEMA=class_roster -nlt GEOMETRY counties.shp
ogr2ogr -f PostgreSQL PG:"host=localhost port=5432 dbname=Lesson3db user=postgres password=......" -lco SCHEMA=class_roster -nlt GEOMETRY -nln states States.shp
Stage 3: do the geometry column and join query bit
Now for the geometry column bit. The hint from the project brief about string concatenation in SQL led me to construct the following UPDATE
query to populate the geom
column I added to the postal_codes
table earlier.
It builds a text string, such as POINT(-123.61 41.22)
, and feeds it into the ST_GeomFromText
PostGIS function which turns it into a geometry object.
UPDATE class_roster.postal_codes
SET geom = ST_GeomFromText('POINT(' || lon || ' ' || lat || ')')
;
Next up is the crux of the exercise. By doing an INNER JOIN
of the students
table and the postal_codes
table on the postal code field, and selecting the student’s name and the geom
field, I get a set of results that gives me the coordinates of the centroid for each student’s postal code – that’s exactly what I’m trying to map!
SELECT
students.last_name || ', ' || students.first_name AS name,
students.postal_code,
postal_codes.geom
FROM students
INNER JOIN postal_codes
ON students.postal_code = postal_codes.code
ORDER BY name ASC
;
At this point, there are at least two options for mapping the results of that query with QGIS. You could create a table that shows the results of that query, or you could create a view that shows the results of that query.
As the project brief asks you to consider the easier of making updates or insertions to the class roster, a view is a better choice than a table. That’s because creating a table of the query results would be like a snapshot at the time you created the table, whereas a view is not a snapshot and instead consults the source tables it is based on whenever it is run. So, if a new student was added to the students
table, or a correction made to a postal code (whether that be the stated postal code for a given student in the students
table, or the location of the centroid for a postal code in postal_codes
table) the results returned by the view would have those updates.
This is the SQL I used for creating a view:
CREATE VIEW class_roster.vw_student_postal_codes
AS
SELECT
students.last_name || ', ' || students.first_name AS name,
students.postal_code,
postal_codes.geom
FROM students
INNER JOIN postal_codes
ON students.postal_code = postal_codes.code
ORDER BY name ASC
;
ALTER TABLE class_roster.vw_student_postal_codes
OWNER TO postgres
;
Stage 4: connect QGIS to PostGIS and make a map
Last up, mapping the data!
I connected QGIS to my PostGIS server (well, it was already connected from earlier, but now my new schema was there too). I added my vw_student_postal_codes
view / layer to my map, along with counties and states, and tada! Job done. Nearly.
I rather like cartography side of things, so I spent some time making the map look a bit nicer.
First I changed the projection to one I prefer for the US.
- from EPSG:4269 (NAD83)
- to EPSG:2163 (US National Atlas Equal Area)
Secondly I played around with layer ordering and symbology until the state lines, county lines, and colours were all harmonious.
Lastly I decided to add labels so you can see which student is from where. This provided an opportunity to try out “leader lines” or “callout lines” – something I knew was in the QGIS Map Design book I bought a few months ago, and had been meaning to experiment with.