PostgreSQL Tips

The version of PostgreSQL we have installed on the course VM is 9.3. Complete manuals are available from the PostgreSQL site. The most useful sections to this class are probably “Tutorial” and “The SQL Language”.

Command-Line Processor

Interactive Mode

Inside your VM shell, you can start the command-line SQL processor psql in the interactive mode by entering the following command:
psql congress
Here, congress is the name of the database you want to access. When you are inside psql, you will get a prompt that looks like the following:
congress=#
You can now use any SQL commands (CREATE, DROP, SELECT, INSERT, DELETE) to manipulate the database. Remember to use semicolon (“;“) to terminate SQL statements. Inside psql, you will also find several (non-SQL) commands useful (these do not need to be terminated by semicolon):
\d: list all tables and views
\d NAME: display the schema information for a table or view with NAME
\i FILE.sql: execute the SQL commands stored in a file named FILE.sql
\q: quit the command-line processor
The command-line processor provides advanced command-line editing and auto-completion support similar to what you would find in popular interactive shells such as bash.

Immediate Mode

You can use psql with the -c option to execute a psql command or SQL statement directly from your shell; e.g.:
psql congress -c "\d"
psql congress -c "select count(*) from persons"

After each command/statement completes, control is immediately returned back to your shell. Note that you do not need to end the SQL statements with ; in this case. You may need to enclose command/statement strings in quotes to prevent the shell from interpreting them (for example, if you do not quote * in the above, the shell would expand * into a list of files in your current directory).
This method of invoking psql may be useful when writing shell scripts. Sometimes you might want to use an extra option -a to have psql echo the command/statement back, which is helpful in debugging the output from scripts.

Batch mode

You can write multiple psql commands and SQL statements in one text file (say, named statements.sql), and then use the command psql congress -af statements.sql to execute them all. Use “;” to signal the end of each SQL statement in the file. You will see examples of this usage in /opt/datacourse/data/congress/db/setup.sh, which executes files create.sql and load.sql in the batch mode.

Two-Window Setup When Working with psql

This setup is very convenient when writing and debugging database queries. Basically, you open one terminal to run psql interactively, and a second window for your favorite text editor—it can be another VM shell running a terminal-based editor such as vim or nano (see Shell Basics for more info). Or, if you have a local VM, you can use an editor in your host and edit the files in the shared folder.

You write SQL in a .sql file in the second window, save it (and make sure it’s in the same working directory as psql), and then switch to the terminal running psql and run the file with the \i command.

Loading a Database

You can find an example of loading a database from data files in /opt/datacourse/data/congress/db/. First, you need to prepare the data files. The data files in the load/ subdirectory use a delimited ASCII format. Each line of the file represents a row, with columns of the row separated by vertical bar |. Then, use the \COPY command to load tables from data (see load.sql for syntax). Note that you can specify the option of loading an empty column value as a special value NULL instead of the empty string.

Finally, remember to check any loading errors reported. Errors such as invalid foreign key references are useful in spotting data quality issues.