Database

Version 1 (Paul Holland, 04/12/2012 10:20 pm)

1 1 Paul Holland
h1. Database
2 1 Paul Holland
3 1 Paul Holland
h2. Creating a new instance of the database
4 1 Paul Holland
5 1 Paul Holland
The Postgres command, pg_dump, can be used to dump the sql statements required to build the database schema and define users, functions, views, etc. When used with the options below, this will dump only the schema (not any database contents) to file ossdb.sql.
6 1 Paul Holland
7 1 Paul Holland
<code>
8 1 Paul Holland
pg_dump --create -f ossdb.sql -s ossdb
9 1 Paul Holland
</code>
10 1 Paul Holland
11 1 Paul Holland
For our purposes, the only user should be user "fossy". So the next step is to create that user (role) by adding to ossdb.sql:
12 1 Paul Holland
13 1 Paul Holland
<code>
14 1 Paul Holland
create role fossy with option createdb login password fossy
15 1 Paul Holland
</code>
16 1 Paul Holland
17 1 Paul Holland
**NOTE:** The password must match that in /usr/local/share/fossology/dbconnect/
18 1 Paul Holland
19 1 Paul Holland
After the this load file (ossdb.sql) has been created, keep it in a version control system to make future deployments easier. 
20 1 Paul Holland
21 1 Paul Holland
h3. DB API (libfossdb)
22 1 Paul Holland
23 1 Paul Holland
**NOTE: Not all of us agree with the logic below and prefer to use direct libpq calls since they are much more powerful and we don't have to maintain the interface described here.**
24 1 Paul Holland
25 1 Paul Holland
There are two ways that applications can access the DB. Either they can use the Postgres function calls, or they can use the generic DB API.
26 1 Paul Holland
27 1 Paul Holland
Programs that use the Postgres library have access to every possible function and return value available from Postgres. However, there are some detracting issues to consider:
28 1 Paul Holland
29 1 Paul Holland
  * Complexity. While the Postgres API gives a very complete level of control, it is also overly complicated.
30 1 Paul Holland
  * Portability. Developing for Postgres mean developing for a specific version of Postgres. If the database is every upgraded or replaced with a different DB, then you will need to modify every program that uses the DB.
31 1 Paul Holland
  * Consistency. Since every program will implement their DB interface independently, it is very possible for one program to be missing a critical function that another program has. Every program will be different. 
32 1 Paul Holland
33 1 Paul Holland
Although there are some generic DB APIs available, I found them to either be too complicated or in need of wrappers to work with our DB configuration. The solution was libfossdb -- the generic DB API. This is a very simple API for accessing the DB used by the FOSSology analysis system. The functions are minimalistic and intended to be used without knowledge of Postgres (but knowledge of SQL is required).
34 1 Paul Holland
35 1 Paul Holland
By using the **libfossdb**, all applications use the same DB functions. If a problem occurs or we change databases, then only the API needs to be altered. In addition, the API ensures that all tools implement the DB interface the same way. 
36 1 Paul Holland
37 1 Paul Holland
h2. Data Structures
38 1 Paul Holland
39 1 Paul Holland
Applications using libfossdb manage a "void *" structure that contains DB information. Under normal circumstances, the application should never need to access the actual contents of this structure. The structure is allocated by DBopen() and freed by DBclose().
40 1 Paul Holland
41 1 Paul Holland
For specific details, the structure is as follows:
42 1 Paul Holland
43 1 Paul Holland
<pre>
44 1 Paul Holland
struct dbapi
45 1 Paul Holland
  {
46 1 Paul Holland
  /****
47 1 Paul Holland
   This is a DB-specific structure
48 1 Paul Holland
   It holds connections and results.
49 1 Paul Holland
   All manipulations should use the DB API to access it.
50 1 Paul Holland
   If the DB ever changes, just change this structure (for state)
51 1 Paul Holland
   and the DB functions.
52 1 Paul Holland
   ****/
53 1 Paul Holland
  PGconn *Conn; /* DB-specific connection */
54 1 Paul Holland
  PGresult *Res; /* result from query */
55 1 Paul Holland
 } ;
56 1 Paul Holland
typedef struct dbapi dbapi;
57 1 Paul Holland
</pre>
58 1 Paul Holland
59 1 Paul Holland
**NOTE**: If the DB or API changes, then this structure may change! Non-libfossdb functions should not depend on the contents of this structure! (This is why all applications of the API simple manage a void pointer. They don't need to see the internals.)
60 1 Paul Holland
61 1 Paul Holland
h3. Functions
62 1 Paul Holland
63 1 Paul Holland
The following functions are defined by the libfossdb.
64 1 Paul Holland
65 1 Paul Holland
**void * DBopen ()**
66 1 Paul Holland
67 1 Paul Holland
This function initiates a connection to the DB and returns a handle for the DB connection. This function will use a configuration file that contains the DB login account and password.
68 1 Paul Holland
69 1 Paul Holland
# If the environment variable $DBCONF is defined, then it specifies the path to the configuration file.
70 1 Paul Holland
# If $DBCONF is undefined, then /etc/ossdb/dbconnect/ossdb is used. 
71 1 Paul Holland
72 1 Paul Holland
**void DBclose (void *DB)**
73 1 Paul Holland
74 1 Paul Holland
This function closes the DB connection created by DBopen() and frees the data structure. Even if the DB connection closes prematurely, this function must be called to free the memory.
75 1 Paul Holland
76 1 Paul Holland
77 1 Paul Holland
**void * DBmove (void *DB)**
78 1 Paul Holland
79 1 Paul Holland
The DB structure stores the connection and the results from the last SQL command. If you need to created a SQL query from a DB results (e.g., looping over the DB results), then you will need to move the results from the last command to a different handle. This function detaches the results from the DB handle and returns a new handle containing the results.
80 1 Paul Holland
81 1 Paul Holland
<code>DBnew = DBmove(DBold);</code>
82 1 Paul Holland
83 1 Paul Holland
There are some warnings when using this function:
84 1 Paul Holland
85 1 Paul Holland
  * The results (DBnew) contains data but not an open DB connection. You cannot use this handle to communicate with the DB but you can use it to access the results from a previous query.
86 1 Paul Holland
  * When you are finished with the handle, you must call DBclose(). For example: DBnew = DBmove(DBold); DBclose(DBnew); Closing the data handle does not close the DB connection from the original (DBold) handle.
87 1 Paul Holland
  * There is no way to re-attach the data to the old handle. 
88 1 Paul Holland
89 1 Paul Holland
90 1 Paul Holland
**int DBaccess (void *DB, char *SQL)**
91 1 Paul Holland
92 1 Paul Holland
This is the main function for the entire API. Given a DB handle created from DBopen() and an SQL query, perform the query. There are a few possible result codes:
93 1 Paul Holland
94 1 Paul Holland
  * 0. This value indicates that the query was successful, but there are no results. This usually happens when the SQL query performs an INSERT or UPDATE.
95 1 Paul Holland
  * 1. This value indicates that the query was successful, and there are results. This usually happens when the SQL query performs a SELECT. **NOTE**: The results may be empty -- zero returns rows is different from no returned rows.
96 1 Paul Holland
  * -1. The query failed due to a constraint error. The exact constraint is not specified.
97 1 Paul Holland
  * -2. The query failed due to an unspecified error.
98 1 Paul Holland
  * -3. The query timed out. By default the timeout is set to two minutes. 
99 1 Paul Holland
100 1 Paul Holland
101 1 Paul Holland
**int DBdatasize (void *DB)**
102 1 Paul Holland
103 1 Paul Holland
If there is data available, this identifies the number of rows returned. If there is no data (DBaccess() returned 0) or the DB handle is invalid, then -1 is returned. If the DBaccess() returned 1 but there is no data, then 0 is returned.
104 1 Paul Holland
105 1 Paul Holland
106 1 Paul Holland
**int DBcolsize (void *DB)**
107 1 Paul Holland
108 1 Paul Holland
Returns the number of columns in the result. If there is no data (DBaccess() returned 0) or the DB handle is invalid, then -1 is returned. If the DBaccess() returned 1 but there is no data, then the number of columns will still be returned.
109 1 Paul Holland
110 1 Paul Holland
111 1 Paul Holland
**char * DBgetcolname (void *DB, int Col)**
112 1 Paul Holland
113 1 Paul Holland
Returns the name of a specific column, or NULL if the column does not exist. This returns a pointer to a static memory location. The caller must **NOT** modify or free this value.
114 1 Paul Holland
115 1 Paul Holland
116 1 Paul Holland
**int DBgetcolnum (void *DB, char *ColName)**
117 1 Paul Holland
118 1 Paul Holland
Given a string, return the number of the column whose name matches the string. -1 is returned if the column does not exist.
119 1 Paul Holland
120 1 Paul Holland
121 1 Paul Holland
**char * DBgetvalue (void *DB, int Row, int Col)**
122 1 Paul Holland
123 1 Paul Holland
Return the contents of a cell in the results table, or NULL if the column does not exist. This returns a pointer to a static memory location. The caller must **NOT** modify or free this value.
124 1 Paul Holland
125 1 Paul Holland
126 1 Paul Holland
**char * DBgetvaluename(DB,Row,ColName)**
127 1 Paul Holland
128 1 Paul Holland
This macro returns the contents of a cell, but the column is specified by name rather than number. NULL is returned if the row or column does not exist. This returns a pointer to a static memory location. The caller must **NOT** modify or free this value.
129 1 Paul Holland
130 1 Paul Holland
131 1 Paul Holland
**int DBisnull (void *DB, int Row, int Col)**
132 1 Paul Holland
133 1 Paul Holland
The DBgetvalue() function does not distinguish between a non-existant cell and one with a value of NULL. This function determines if the cell is defined and has a value of NULL. It returns 1 if the cell is NULL and 0 if it is non-NULL or undefined. 
134 1 Paul Holland
135 1 Paul Holland
h3. Sample Usage
136 1 Paul Holland
137 1 Paul Holland
An example use of the libfossdb API:
138 1 Paul Holland
139 1 Paul Holland
<pre>
140 1 Paul Holland
void * DB; /* DB handle */
141 1 Paul Holland
int rc; /* return code */
142 1 Paul Holland
char *SQL; /* the SQL query to run */
143 1 Paul Holland
int i,j; /* iterators */
144 1 Paul Holland
145 1 Paul Holland
DB = DBopen();
146 1 Paul Holland
if (!DB) { fprintf(stderr,"ERROR: Could not connect to the DB\n"); exit(1); }
147 1 Paul Holland
rc = DBaccess(DB,SQL);
148 1 Paul Holland
switch(rc)
149 1 Paul Holland
  {
150 1 Paul Holland
  case 0: /* Ok, Got no results. Likely due to an INSERT or UPDATE SQL query */
151 1 Paul Holland
    break;
152 1 Paul Holland
  case 1: /* Got results.  Likely due to a SELECT query */
153 1 Paul Holland
    /* Display results */
154 1 Paul Holland
    for(i=0; i<DBgetdatasize(DB); i++) /* for each row */
155 1 Paul Holland
      {
156 1 Paul Holland
      printf("=====\n");  /* record separator */
157 1 Paul Holland
      for(j=0; j<DBcolsize(DB); j++) /* for each column */
158 1 Paul Holland
        {
159 1 Paul Holland
        printf("%s: %s\n",DBgetcolname(DB,j), DBgetvalue(DB,i,j));
160 1 Paul Holland
        }
161 1 Paul Holland
      }
162 1 Paul Holland
  default:
163 1 Paul Holland
     fprintf(stderr,"ERROR running SQL command: %s\n",SQL);
164 1 Paul Holland
  }
165 1 Paul Holland
 DBclose(DB);
166 1 Paul Holland
</pre>
167 1 Paul Holland
168 1 Paul Holland
169 1 Paul Holland
h3. Test Applications
170 1 Paul Holland
171 1 Paul Holland
The following test applications are provided with the libfossdb library.
172 1 Paul Holland
173 1 Paul Holland
**dbtest**
174 1 Paul Holland
175 1 Paul Holland
This stand-alone program runs DBopen(). It reads SQL commands from stdin and calls DBaccess(). All results are displayed to stdout. When EOF is received (stdin is closed), DBclose() is called.
176 1 Paul Holland
177 1 Paul Holland
This is a very poor replacement for psql. In particlar, psql can handle multiple-line SQL commands and returns very verbose messages. In contrast, the dbtest program is more than adequate for testing the library's functions.
178 1 Paul Holland
179 1 Paul Holland
180 1 Paul Holland
**dbq**
181 1 Paul Holland
182 1 Paul Holland
This program queries the job and jobqueue tables using the same calls that the Scheduler uses.
183 1 Paul Holland
184 1 Paul Holland
<pre>
185 1 Paul Holland
Usage: ./dbq <command> [args]
186 1 Paul Holland
  Commands:
187 1 Paul Holland
    list :: list ALL elements in the queue.
188 1 Paul Holland
      If args are provided then only list those queue items
189 1 Paul Holland
    top :: list top elements in the queue.
190 1 Paul Holland
      If args are provided then only list those queue items
191 1 Paul Holland
    add  :: add a queue item.
192 1 Paul Holland
      Args are field=value pairs to be inserted.  They should be SQL compliant
193 1 Paul Holland
      If no args, then you will be prompted for every value
194 1 Paul Holland
    update :: update an existing queue item.
195 1 Paul Holland
      1st arg type of record modify: 'job' or 'jobqueue'.
196 1 Paul Holland
      2nd arg is the ID of the record to modify.
197 1 Paul Holland
      Remaining Args are field=value pairs to be modified.
198 1 Paul Holland
      They should be SQL compliant
199 1 Paul Holland
      If no args, then you will be prompted for every value
200 1 Paul Holland
    delete :: remove an existing queue item.
201 1 Paul Holland
      Args are JOB IDs of the queue item to delete.
202 1 Paul Holland
      This also removes all associated JOBQUEUE records.
203 1 Paul Holland
</pre>