In Lesson 3 of GEOG 868 I played around with PostgreSQL and PostGIS for the first time. For the project at the end you are given a
class_roster.txt file that contains a list of students and their postal codes. The task was to join that with data provided earlier in the lesson and then plot the students’ locations on a map in QGIS. Here’s how I went about doing that.
The project brief provided some hints and tips about what you should be doing, which I translated into roughly 4 stages:
- set up a database and schema
- import the data into the database
- do the geometry column and join query bit
- connect QGIS to PostGIS and make a map
Stage 1: set up a database and schema
I already had my
Lesson3db database from working through the lesson, so I used that as there was no particular need to have a separate database for this. However, a separate schema made sense, as this project was a new and different topic to the existing schemas from the lesson (which were
CREATE SCHEMA class_roster AUTHORIZATION postgres ;
Stage 2: import the data into the database
Before jumping into this I had a quick ponder, in the form of sketching an ER diagram, about what I was intending to do. The general plan was:
- table for
students(names and their postal codes)
- table for
postal_codes(point geometry of postal code centroids)
- and I may as well put the counties and states shapefiles in PostGIS too
A fairly unexciting ER diagram in the end – 4 boxes, 2 of them joined with a line. But a useful exercise nevertheless.
First up, creating the tables.
Starting with the
postal_codes table, I took roughly the following steps:
pgcli \copyto insert CSV into
AddGeometryColumn()to add a
students table, I took roughly the following steps:
studentstable (with an ID field)
csv_students(without an ID field)
pgcli \copyto insert CSV into
INSERT ... SELECTto populate
studentstable with contents of
I say roughly, because I forgot about the
\copy command needing the source CSV and destination table to have the same number of columns, so I
DROP’d and recreated tables a couple of times.
I decided the
students table should have an ID rather than use the student’s firstname+lastname as a primary key, so to facilitate that I created the temporary
csv_students table. My reason for the ID field being “something something future proofing” – like if this was a real school database, you might eventually end up with two people with the same name, so firstname+lastname a good primary key does not make.
Here is the SQL and pgcli commands I used:
CREATE TABLE class_roster.postal_codes ( code character varying(5), lon real, lat real, PRIMARY KEY (code) ) ; ALTER TABLE IF EXISTS class_roster.postal_codes OWNER to postgres ;
\copy class_roster.postal_codes FROM 'postal_codes.txt' WITH (FORMAT csv, HEADER True, QUOTE '"')
adding geometry column to
SELECT AddGeometryColumn('class_roster', 'postal_codes', 'geom', 4269, 'POINT', 2) ;
CREATE TABLE class_roster.students ( id serial, last_name character varying(40), first_name character varying(40), postal_code character varying(5), PRIMARY KEY (id) ) ; ALTER TABLE IF EXISTS class_roster.students OWNER to postgres ;
CREATE TABLE class_roster.csv_students ( last_name character varying(40), first_name character varying(40), postal_code character varying(5), ) ; ALTER TABLE IF EXISTS class_roster.csv_students OWNER to postgres ;
\copy class_roster.csv_students FROM '868_roster_sp2_22.txt' WITH (FORMAT csv, HEADER True, QUOTE '"')
INSERT INTO class_roster.students (last_name, first_name, postal_code) ( SELECT last_name, first_name, postal_code FROM class_roster.csv_students ) ;
With all the necessary data in the database, I just had the counties and states shapefiles left to import. These could have been added to the QGIS project directly rather than to PostGIS first, but I thought I may as well add them to the database since I figured how to do that earlier in Lesson 3. I did this with
ogr2ogr -f PostgreSQL PG:"host=localhost port=5432 dbname=Lesson3db user=postgres password=......" -lco SCHEMA=class_roster -nlt GEOMETRY counties.shp
ogr2ogr -f PostgreSQL PG:"host=localhost port=5432 dbname=Lesson3db user=postgres password=......" -lco SCHEMA=class_roster -nlt GEOMETRY -nln states States.shp
Stage 3: do the geometry column and join query bit
Now for the geometry column bit. The hint from the project brief about string concatenation in SQL led me to construct the following
UPDATE query to populate the
geom column I added to the
postal_codes table earlier.
It builds a text string, such as
POINT(-123.61 41.22), and feeds it into the
ST_GeomFromText PostGIS function which turns it into a geometry object.
UPDATE class_roster.postal_codes SET geom = ST_GeomFromText('POINT(' || lon || ' ' || lat || ')') ;
Next up is the crux of the exercise. By doing an
INNER JOIN of the
students table and the
postal_codes table on the postal code field, and selecting the student’s name and the
geom field, I get a set of results that gives me the coordinates of the centroid for each student’s postal code – that’s exactly what I’m trying to map!
SELECT students.last_name || ', ' || students.first_name AS name, students.postal_code, postal_codes.geom FROM students INNER JOIN postal_codes ON students.postal_code = postal_codes.code ORDER BY name ASC ;
At this point, there are at least two options for mapping the results of that query with QGIS. You could create a table that shows the results of that query, or you could create a view that shows the results of that query.
As the project brief asks you to consider the easier of making updates or insertions to the class roster, a view is a better choice than a table. That’s because creating a table of the query results would be like a snapshot at the time you created the table, whereas a view is not a snapshot and instead consults the source tables it is based on whenever it is run. So, if a new student was added to the
students table, or a correction made to a postal code (whether that be the stated postal code for a given student in the
students table, or the location of the centroid for a postal code in
postal_codes table) the results returned by the view would have those updates.
This is the SQL I used for creating a view:
CREATE VIEW class_roster.vw_student_postal_codes AS SELECT students.last_name || ', ' || students.first_name AS name, students.postal_code, postal_codes.geom FROM students INNER JOIN postal_codes ON students.postal_code = postal_codes.code ORDER BY name ASC ; ALTER TABLE class_roster.vw_student_postal_codes OWNER TO postgres ;
Stage 4: connect QGIS to PostGIS and make a map
Last up, mapping the data!
I connected QGIS to my PostGIS server (well, it was already connected from earlier, but now my new schema was there too). I added my
vw_student_postal_codes view / layer to my map, along with counties and states, and tada! Job done. Nearly.
I rather like cartography side of things, so I spent some time making the map look a bit nicer.
First I changed the projection to one I prefer for the US.
- from EPSG:4269 (NAD83)
- to EPSG:2163 (US National Atlas Equal Area)
Secondly I played around with layer ordering and symbology until the state lines, county lines, and colours were all harmonious.
Lastly I decided to add labels so you can see which student is from where. This provided an opportunity to try out “leader lines” or “callout lines” – something I knew was in the QGIS Map Design book I bought a few months ago, and had been meaning to experiment with.