The version of PostgreSQL we have installed on the course VM is 9.3. Complete manuals are available from the PostgreSQL site. The most useful sections to this class are probably “Tutorial” and “The SQL Language”.
Command-Line Processor
Interactive Mode
Inside your VM shell, you can start the command-line SQL processor psql
in the interactive mode by entering the following command:
psql congress
Here, congress
is the name of the database you want to access. When you are inside psql
, you will get a prompt that looks like the following:
congress=#
You can now use any SQL commands (CREATE
, DROP
, SELECT
, INSERT
, DELETE
) to manipulate the database. Remember to use semicolon (“;
“) to terminate SQL statements. Inside psql
, you will also find several (non-SQL) commands useful (these do not need to be terminated by semicolon):
\d
: list all tables and views
\d NAME
: display the schema information for a table or view with NAME
\i FILE.sql
: execute the SQL commands stored in a file named FILE.sql
\q
: quit the command-line processor
The command-line processor provides advanced command-line editing and auto-completion support similar to what you would find in popular interactive shells such as bash
.
Immediate Mode
You can use psql
with the -c
option to execute a psql
command or SQL statement directly from your shell; e.g.:
psql congress -c "\d"
psql congress -c "select count(*) from persons"
After each command/statement completes, control is immediately returned back to your shell. Note that you do not need to end the SQL statements with ;
in this case. You may need to enclose command/statement strings in quotes to prevent the shell from interpreting them (for example, if you do not quote *
in the above, the shell would expand *
into a list of files in your current directory).
This method of invoking psql
may be useful when writing shell scripts. Sometimes you might want to use an extra option -a
to have psql
echo the command/statement back, which is helpful in debugging the output from scripts.
Batch mode
You can write multiple psql
commands and SQL statements in one text file (say, named statements.sql
), and then use the command psql congress -af statements.sql
to execute them all. Use “;
” to signal the end of each SQL statement in the file. You will see examples of this usage in /opt/datacourse/data/congress/db/setup.sh
, which executes files create.sql
and load.sql
in the batch mode.
Two-Window Setup When Working with psql
This setup is very convenient when writing and debugging database queries. Basically, you open one terminal to run psql
interactively, and a second window for your favorite text editor—it can be another VM shell running a terminal-based editor such as vim
or nano
(see Shell Basics for more info). Or, if you have a local VM, you can use an editor in your host and edit the files in the shared folder.
You write SQL in a .sql
file in the second window, save it (and make sure it’s in the same working directory as psql
), and then switch to the terminal running psql
and run the file with the \i
command.
Loading a Database
You can find an example of loading a database from data files in /opt/datacourse/data/congress/db/
. First, you need to prepare the data files. The data files in the load/
subdirectory use a delimited ASCII format. Each line of the file represents a row, with columns of the row separated by vertical bar |
. Then, use the \COPY
command to load tables from data (see load.sql
for syntax). Note that you can specify the option of loading an empty column value as a special value NULL
instead of the empty string.
Finally, remember to check any loading errors reported. Errors such as invalid foreign key references are useful in spotting data quality issues.