Home » Practical, Inclusive Database Design

Practical, Inclusive Database Design

This activity should be completed in your project groups. Elect one person to take notes, and submit a PDF of your responses to the discussion questions, as well as any other discussion that you have, to Gradescope.

Plan for this module

In this module, we will go through a case study of:

  • How backend designs impact end users
  • How to improve a database design for inclusivity*
  • How database views can make schema evolution easier

* Important disclaimer: This is an evolving space, and discussion/recommendations are also evolving!

Student database for Springfield Elementary


The Springfield Elementary School maintain a students database where they want to:

  • Record personal and biological info of students for identity proofing, compliance, and communication
  • Record students’ legal parents and/or guardians
    • Each student must have at least one legal parent or guardian

Below is the old schema of their database. To keep things simple for this module, assume the kinds of info captured by the old schema is mostly sufficient: i.e., don’t worry about other things like emails, social security numbers, etc.

CREATE TABLE Person(
  id INTEGER PRIMARY KEY NOT NULL,
  first_name VARCHAR(20) NOT NULL CHECK(first_name NOT LIKE '% %'),
  middle_name VARCHAR(20) CHECK(middle_name NOT LIKE '% %'),
  last_name VARCHAR(20) NOT NULL CHECK(last_name NOT LIKE '% %'),
  sex CHAR(1) NOT NULL CHECK(sex IN ('M', 'F')),
  dob DATE NOT NULL,
  address VARCHAR(200) NOT NULL,
  phone INTEGER CHECK(phone BETWEEN 0 AND 9999999999));

CREATE TABLE Student(
  id INTEGER PRIMARY KEY NOT NULL REFERENCES Person(id),
  father INTEGER REFERENCES Person(id),
  mother INTEGER REFERENCES Person(id),
  guardian INTEGER REFERENCES Person(id),
  CHECK(father IS NOT NULL
        OR mother IS NOT NULL
        OR guardian IS NOT NULL));

What are the problems of this old schema?

  1. Suppose you write down all the dependencies; would the schema be a normal form?
    • Recall normal forms are a way for algorithms to detect/remove redundancies in schema automatically
  2. Do you see any problems beyond what the algorithm can detect?
  3. Among the problems you found above:
    • Which ones could be fixed at the user interface level alone, e.g., by changing the wording on web pages?
    • Or at the application level, e.g., by changing how you SELECT, INSERT, DELETE or UPDATE the tables?
    • Which ones cannot be fixed without changing the schema?

Take some time to think about these questions. When you are ready, or if you feel stuck spotting problems, check out the scenarios below.


See Scenario 1

Scenario 1

A few exchange students just arrived at Springfield. Here are what their names look like on their travel document:

  • Gong Sun Zan (Bo Gui) from China
  • Benito Pablo Juárez García from Mexico
  • Ikponmwosa Kamiolakamioluwalamibe from Nigeria

Do you see some problems of the old schema now?

See problems and fixes
  • The first/middle/last componentization of names, the length limit of 20 per component, and the insistence on having no space within each are Anglocentric.

Here some good resources/guidelines regarding names:

  • https://www.w3.org/International/questions/qa-personal-names
  • https://designsystem.digital.gov/patterns/create-a-user-profile/name/

Here is a possible fix:

[sourcecode language=”sql”] CREATE TABLE Person(…
full_name VARCHAR(300) NOT NULL, — legal & respects cultural convention
  family_name VARCHAR(100), — substr used for sorting/backward-compatibility
  middle_name VARCHAR(100), — substr used for backward-compatibility
  given_name VARCHAR(100), — substr used for backward-compatibility
  preferred_name VARCHAR(80), — used for communication

[/sourcecode]

Note that:

  • Full name respects the persons’ cultures
  • Optional family/middle/given names (not last/first, which are confusing labels) are used for some purposes, e.g.: ORDER BY COALESCE(family_name, '') || ', ' || full_name
  • Optional preferred names are used for communication
  • For performance, you may still cap the lengths reasonably (your mileage will vary on different DBMS’s)

See Scenario 2

Scenario 2

A new student at Springfield Elementary, Aria Zavala, is adopted by lesbian parents. Her biological parents were a heterosexual couple, but she also had a gestational surrogate mother.

Do you see some more problems of the old schema now?

See problems and fixes
  • Some families don’t have the traditional father/mother roles
  • Sex isn’t binary, and recording students’ gender identity is good for inclusive communication
  • Beyond legal parents/guardians, does the school really need to know about biological/surrogate parents?

Here some good resources/guidelines regarding gender, sex, and adoption wording:

  • https://designsystem.digital.gov/patterns/create-a-user-profile/gender-identity-and-sex/
  • https://en.wikipedia.org/wiki/Language_of_adoption

Here is a possible fix:

[sourcecode language=”sql”] CREATE TABLE Person(…
  sex CHAR(1) NOT NULL CHECK(sex IN (‘M’, ‘F’, ‘X’)), — to match legal doc

CREATE TABLE Student(…
  gender_identity VARCHAR(20), — use NULL for “prefer not to answer”
  parent1 INTEGER REFERENCES Person(id),
  parent2 INTEGER REFERENCES Person(id),

  CHECK(parent1 IS NOT NULL
OR parent2 IS NOT NULL …
[/sourcecode]

Note that:

  • Still need sex for identity proofing, but extend to M/F/X
    • U.S. adopted M/F/X coding for passport in 2022
  • Don’t force parents to declare father/mother roles
    • Could add optional, self-declared roles
  • Can’t just fold parents into guardians: legally, there are still distinctions between them
  • We let students declare gender identity, but collecting this info from parents/guardians is perhaps unnecessary

A new schema

Putting together the fixes we developed above for various problems in the old schema, we have a new one:

CREATE TABLE Person(
  id INTEGER PRIMARY KEY NOT NULL,
  full_name VARCHAR(400) NOT NULL, -- legal & respecting cultural convention
  family_name VARCHAR(200), -- part used for sorting/backward-compatibility
  middle_name VARCHAR(100), -- part used for backward-compatibility
  given_name VARCHAR(100), -- part used for backward-compatibility
  preferred_name VARCHAR(80), -- used for communication
  sex CHAR(1) NOT NULL CHECK(sex IN ('M', 'F', 'X’)), -- to match legal doc
  dob DATE NOT NULL,
  address VARCHAR(200) NOT NULL,
  phone INTEGER);
CREATE TABLE Student(
  id INTEGER PRIMARY KEY NOT NULL REFERENCES Person(id),
  gender_identity VARCHAR(20), -- use NULL for “prefer not to answer”
  parent1 INTEGER REFERENCES Person(id),
  parent2 INTEGER REFERENCES Person(id),
  guardian INTEGER REFERENCES Person(id),
  CHECK(parent1 IS NOT NULL 
        OR parent2 IS NOT NULL
        OR guardian IS NOT NULL));

Note that we have also removed 10-digit requirement on phone numbers, which seems to assume U.S. numbers (unless there is a clearly stated school policy that require domestic numbers).

Next challenge: schema evolution

Changes to database schema are super-disruptive:

  • Old data must be converted
    • What if old data is lacking? Are there reasonable default or placeholder values?
  • Old code must be rewritten
    • What if you don’t have the expertise/resource to rewrite all legacy code?
  • Migrating to the new database/applications may interrupt normal service

There is a market for fancy migration tools, but just plain database views can go a long way!

Use views to test new schema/code

If you can define the transformation of old data to new schema in SQL, you can define a view for each table in the new schema, e.g.

CREATE VIEW NewPerson AS
SELECT id,
  first_name || COALESCE(' ' || middle_name, '') || ' ' || last_name AS full_name,
  last_name AS family_name, first_name AS given_name,
  NULL AS preferred_name, -- allow user to update when rolled out
  sex, -- allow user to update when rolled out
  dob, address, phone
FROM Person;

Note that:

  • NewStudent view would involve info from both Person and Student: left as an exercise
  • You can test data transformation as well as queries over new schema before deploying them!
    • No data copied; no worry about freshness/consistency

Use views to keep old code running

After converting the database, you can define a view for each table in the old schema, e.g.

CREATE VIEW OldPerson AS
SELECT id,
  COALESCE(given_name, full_name) AS first_name,
  TRIM(SUBSTRING(full_name, 
                 LENGTH(COALESCE(given_name,'')) + 1,
                 LENGTH(full_name) - LENGTH(COALESCE(given_name,'')||COALESCE(family_name,'')))) AS middle_name,
  COALESCE(family_name, full_name) AS last_name,
  sex, dob, address, phone
FROM Person;

Note that:

  • This works well for old data, but it is hard to map new names with no family/given parts back to old schema
    • Here we just default every part to be the full name
  • OldStudent view is left as an exercise
  • Old code can just refer to these views instead
    • With some SCHEMA trick you do not even need to change the table/view names in SQL!

Take-away points

  • Ethical design is not just for user interfaces — backend also matters
  • Database schema can limit or empower interfaces
  • Try putting yourself in users’ shoes,and remember there are more than just “majority” users
  • Database views are a neat way to help with schema evolution