Database
Version 1 (Paul Holland, 04/12/2012 10:20 pm)
| 1 | 1 | Paul Holland | h1. Database |
|---|---|---|---|
| 2 | 1 | Paul Holland | |
| 3 | 1 | Paul Holland | h2. Creating a new instance of the database |
| 4 | 1 | Paul Holland | |
| 5 | 1 | Paul Holland | The Postgres command, pg_dump, can be used to dump the sql statements required to build the database schema and define users, functions, views, etc. When used with the options below, this will dump only the schema (not any database contents) to file ossdb.sql. |
| 6 | 1 | Paul Holland | |
| 7 | 1 | Paul Holland | <code> |
| 8 | 1 | Paul Holland | pg_dump --create -f ossdb.sql -s ossdb |
| 9 | 1 | Paul Holland | </code> |
| 10 | 1 | Paul Holland | |
| 11 | 1 | Paul Holland | For our purposes, the only user should be user "fossy". So the next step is to create that user (role) by adding to ossdb.sql: |
| 12 | 1 | Paul Holland | |
| 13 | 1 | Paul Holland | <code> |
| 14 | 1 | Paul Holland | create role fossy with option createdb login password fossy |
| 15 | 1 | Paul Holland | </code> |
| 16 | 1 | Paul Holland | |
| 17 | 1 | Paul Holland | **NOTE:** The password must match that in /usr/local/share/fossology/dbconnect/ |
| 18 | 1 | Paul Holland | |
| 19 | 1 | Paul Holland | After the this load file (ossdb.sql) has been created, keep it in a version control system to make future deployments easier. |
| 20 | 1 | Paul Holland | |
| 21 | 1 | Paul Holland | h3. DB API (libfossdb) |
| 22 | 1 | Paul Holland | |
| 23 | 1 | Paul Holland | **NOTE: Not all of us agree with the logic below and prefer to use direct libpq calls since they are much more powerful and we don't have to maintain the interface described here.** |
| 24 | 1 | Paul Holland | |
| 25 | 1 | Paul Holland | There are two ways that applications can access the DB. Either they can use the Postgres function calls, or they can use the generic DB API. |
| 26 | 1 | Paul Holland | |
| 27 | 1 | Paul Holland | Programs that use the Postgres library have access to every possible function and return value available from Postgres. However, there are some detracting issues to consider: |
| 28 | 1 | Paul Holland | |
| 29 | 1 | Paul Holland | * Complexity. While the Postgres API gives a very complete level of control, it is also overly complicated. |
| 30 | 1 | Paul Holland | * Portability. Developing for Postgres mean developing for a specific version of Postgres. If the database is every upgraded or replaced with a different DB, then you will need to modify every program that uses the DB. |
| 31 | 1 | Paul Holland | * Consistency. Since every program will implement their DB interface independently, it is very possible for one program to be missing a critical function that another program has. Every program will be different. |
| 32 | 1 | Paul Holland | |
| 33 | 1 | Paul Holland | Although there are some generic DB APIs available, I found them to either be too complicated or in need of wrappers to work with our DB configuration. The solution was libfossdb -- the generic DB API. This is a very simple API for accessing the DB used by the FOSSology analysis system. The functions are minimalistic and intended to be used without knowledge of Postgres (but knowledge of SQL is required). |
| 34 | 1 | Paul Holland | |
| 35 | 1 | Paul Holland | By using the **libfossdb**, all applications use the same DB functions. If a problem occurs or we change databases, then only the API needs to be altered. In addition, the API ensures that all tools implement the DB interface the same way. |
| 36 | 1 | Paul Holland | |
| 37 | 1 | Paul Holland | h2. Data Structures |
| 38 | 1 | Paul Holland | |
| 39 | 1 | Paul Holland | Applications using libfossdb manage a "void *" structure that contains DB information. Under normal circumstances, the application should never need to access the actual contents of this structure. The structure is allocated by DBopen() and freed by DBclose(). |
| 40 | 1 | Paul Holland | |
| 41 | 1 | Paul Holland | For specific details, the structure is as follows: |
| 42 | 1 | Paul Holland | |
| 43 | 1 | Paul Holland | <pre> |
| 44 | 1 | Paul Holland | struct dbapi |
| 45 | 1 | Paul Holland | { |
| 46 | 1 | Paul Holland | /**** |
| 47 | 1 | Paul Holland | This is a DB-specific structure |
| 48 | 1 | Paul Holland | It holds connections and results. |
| 49 | 1 | Paul Holland | All manipulations should use the DB API to access it. |
| 50 | 1 | Paul Holland | If the DB ever changes, just change this structure (for state) |
| 51 | 1 | Paul Holland | and the DB functions. |
| 52 | 1 | Paul Holland | ****/ |
| 53 | 1 | Paul Holland | PGconn *Conn; /* DB-specific connection */ |
| 54 | 1 | Paul Holland | PGresult *Res; /* result from query */ |
| 55 | 1 | Paul Holland | } ; |
| 56 | 1 | Paul Holland | typedef struct dbapi dbapi; |
| 57 | 1 | Paul Holland | </pre> |
| 58 | 1 | Paul Holland | |
| 59 | 1 | Paul Holland | **NOTE**: If the DB or API changes, then this structure may change! Non-libfossdb functions should not depend on the contents of this structure! (This is why all applications of the API simple manage a void pointer. They don't need to see the internals.) |
| 60 | 1 | Paul Holland | |
| 61 | 1 | Paul Holland | h3. Functions |
| 62 | 1 | Paul Holland | |
| 63 | 1 | Paul Holland | The following functions are defined by the libfossdb. |
| 64 | 1 | Paul Holland | |
| 65 | 1 | Paul Holland | **void * DBopen ()** |
| 66 | 1 | Paul Holland | |
| 67 | 1 | Paul Holland | This function initiates a connection to the DB and returns a handle for the DB connection. This function will use a configuration file that contains the DB login account and password. |
| 68 | 1 | Paul Holland | |
| 69 | 1 | Paul Holland | # If the environment variable $DBCONF is defined, then it specifies the path to the configuration file. |
| 70 | 1 | Paul Holland | # If $DBCONF is undefined, then /etc/ossdb/dbconnect/ossdb is used. |
| 71 | 1 | Paul Holland | |
| 72 | 1 | Paul Holland | **void DBclose (void *DB)** |
| 73 | 1 | Paul Holland | |
| 74 | 1 | Paul Holland | This function closes the DB connection created by DBopen() and frees the data structure. Even if the DB connection closes prematurely, this function must be called to free the memory. |
| 75 | 1 | Paul Holland | |
| 76 | 1 | Paul Holland | |
| 77 | 1 | Paul Holland | **void * DBmove (void *DB)** |
| 78 | 1 | Paul Holland | |
| 79 | 1 | Paul Holland | The DB structure stores the connection and the results from the last SQL command. If you need to created a SQL query from a DB results (e.g., looping over the DB results), then you will need to move the results from the last command to a different handle. This function detaches the results from the DB handle and returns a new handle containing the results. |
| 80 | 1 | Paul Holland | |
| 81 | 1 | Paul Holland | <code>DBnew = DBmove(DBold);</code> |
| 82 | 1 | Paul Holland | |
| 83 | 1 | Paul Holland | There are some warnings when using this function: |
| 84 | 1 | Paul Holland | |
| 85 | 1 | Paul Holland | * The results (DBnew) contains data but not an open DB connection. You cannot use this handle to communicate with the DB but you can use it to access the results from a previous query. |
| 86 | 1 | Paul Holland | * When you are finished with the handle, you must call DBclose(). For example: DBnew = DBmove(DBold); DBclose(DBnew); Closing the data handle does not close the DB connection from the original (DBold) handle. |
| 87 | 1 | Paul Holland | * There is no way to re-attach the data to the old handle. |
| 88 | 1 | Paul Holland | |
| 89 | 1 | Paul Holland | |
| 90 | 1 | Paul Holland | **int DBaccess (void *DB, char *SQL)** |
| 91 | 1 | Paul Holland | |
| 92 | 1 | Paul Holland | This is the main function for the entire API. Given a DB handle created from DBopen() and an SQL query, perform the query. There are a few possible result codes: |
| 93 | 1 | Paul Holland | |
| 94 | 1 | Paul Holland | * 0. This value indicates that the query was successful, but there are no results. This usually happens when the SQL query performs an INSERT or UPDATE. |
| 95 | 1 | Paul Holland | * 1. This value indicates that the query was successful, and there are results. This usually happens when the SQL query performs a SELECT. **NOTE**: The results may be empty -- zero returns rows is different from no returned rows. |
| 96 | 1 | Paul Holland | * -1. The query failed due to a constraint error. The exact constraint is not specified. |
| 97 | 1 | Paul Holland | * -2. The query failed due to an unspecified error. |
| 98 | 1 | Paul Holland | * -3. The query timed out. By default the timeout is set to two minutes. |
| 99 | 1 | Paul Holland | |
| 100 | 1 | Paul Holland | |
| 101 | 1 | Paul Holland | **int DBdatasize (void *DB)** |
| 102 | 1 | Paul Holland | |
| 103 | 1 | Paul Holland | If there is data available, this identifies the number of rows returned. If there is no data (DBaccess() returned 0) or the DB handle is invalid, then -1 is returned. If the DBaccess() returned 1 but there is no data, then 0 is returned. |
| 104 | 1 | Paul Holland | |
| 105 | 1 | Paul Holland | |
| 106 | 1 | Paul Holland | **int DBcolsize (void *DB)** |
| 107 | 1 | Paul Holland | |
| 108 | 1 | Paul Holland | Returns the number of columns in the result. If there is no data (DBaccess() returned 0) or the DB handle is invalid, then -1 is returned. If the DBaccess() returned 1 but there is no data, then the number of columns will still be returned. |
| 109 | 1 | Paul Holland | |
| 110 | 1 | Paul Holland | |
| 111 | 1 | Paul Holland | **char * DBgetcolname (void *DB, int Col)** |
| 112 | 1 | Paul Holland | |
| 113 | 1 | Paul Holland | Returns the name of a specific column, or NULL if the column does not exist. This returns a pointer to a static memory location. The caller must **NOT** modify or free this value. |
| 114 | 1 | Paul Holland | |
| 115 | 1 | Paul Holland | |
| 116 | 1 | Paul Holland | **int DBgetcolnum (void *DB, char *ColName)** |
| 117 | 1 | Paul Holland | |
| 118 | 1 | Paul Holland | Given a string, return the number of the column whose name matches the string. -1 is returned if the column does not exist. |
| 119 | 1 | Paul Holland | |
| 120 | 1 | Paul Holland | |
| 121 | 1 | Paul Holland | **char * DBgetvalue (void *DB, int Row, int Col)** |
| 122 | 1 | Paul Holland | |
| 123 | 1 | Paul Holland | Return the contents of a cell in the results table, or NULL if the column does not exist. This returns a pointer to a static memory location. The caller must **NOT** modify or free this value. |
| 124 | 1 | Paul Holland | |
| 125 | 1 | Paul Holland | |
| 126 | 1 | Paul Holland | **char * DBgetvaluename(DB,Row,ColName)** |
| 127 | 1 | Paul Holland | |
| 128 | 1 | Paul Holland | This macro returns the contents of a cell, but the column is specified by name rather than number. NULL is returned if the row or column does not exist. This returns a pointer to a static memory location. The caller must **NOT** modify or free this value. |
| 129 | 1 | Paul Holland | |
| 130 | 1 | Paul Holland | |
| 131 | 1 | Paul Holland | **int DBisnull (void *DB, int Row, int Col)** |
| 132 | 1 | Paul Holland | |
| 133 | 1 | Paul Holland | The DBgetvalue() function does not distinguish between a non-existant cell and one with a value of NULL. This function determines if the cell is defined and has a value of NULL. It returns 1 if the cell is NULL and 0 if it is non-NULL or undefined. |
| 134 | 1 | Paul Holland | |
| 135 | 1 | Paul Holland | h3. Sample Usage |
| 136 | 1 | Paul Holland | |
| 137 | 1 | Paul Holland | An example use of the libfossdb API: |
| 138 | 1 | Paul Holland | |
| 139 | 1 | Paul Holland | <pre> |
| 140 | 1 | Paul Holland | void * DB; /* DB handle */ |
| 141 | 1 | Paul Holland | int rc; /* return code */ |
| 142 | 1 | Paul Holland | char *SQL; /* the SQL query to run */ |
| 143 | 1 | Paul Holland | int i,j; /* iterators */ |
| 144 | 1 | Paul Holland | |
| 145 | 1 | Paul Holland | DB = DBopen(); |
| 146 | 1 | Paul Holland | if (!DB) { fprintf(stderr,"ERROR: Could not connect to the DB\n"); exit(1); } |
| 147 | 1 | Paul Holland | rc = DBaccess(DB,SQL); |
| 148 | 1 | Paul Holland | switch(rc) |
| 149 | 1 | Paul Holland | { |
| 150 | 1 | Paul Holland | case 0: /* Ok, Got no results. Likely due to an INSERT or UPDATE SQL query */ |
| 151 | 1 | Paul Holland | break; |
| 152 | 1 | Paul Holland | case 1: /* Got results. Likely due to a SELECT query */ |
| 153 | 1 | Paul Holland | /* Display results */ |
| 154 | 1 | Paul Holland | for(i=0; i<DBgetdatasize(DB); i++) /* for each row */ |
| 155 | 1 | Paul Holland | { |
| 156 | 1 | Paul Holland | printf("=====\n"); /* record separator */ |
| 157 | 1 | Paul Holland | for(j=0; j<DBcolsize(DB); j++) /* for each column */ |
| 158 | 1 | Paul Holland | { |
| 159 | 1 | Paul Holland | printf("%s: %s\n",DBgetcolname(DB,j), DBgetvalue(DB,i,j)); |
| 160 | 1 | Paul Holland | } |
| 161 | 1 | Paul Holland | } |
| 162 | 1 | Paul Holland | default: |
| 163 | 1 | Paul Holland | fprintf(stderr,"ERROR running SQL command: %s\n",SQL); |
| 164 | 1 | Paul Holland | } |
| 165 | 1 | Paul Holland | DBclose(DB); |
| 166 | 1 | Paul Holland | </pre> |
| 167 | 1 | Paul Holland | |
| 168 | 1 | Paul Holland | |
| 169 | 1 | Paul Holland | h3. Test Applications |
| 170 | 1 | Paul Holland | |
| 171 | 1 | Paul Holland | The following test applications are provided with the libfossdb library. |
| 172 | 1 | Paul Holland | |
| 173 | 1 | Paul Holland | **dbtest** |
| 174 | 1 | Paul Holland | |
| 175 | 1 | Paul Holland | This stand-alone program runs DBopen(). It reads SQL commands from stdin and calls DBaccess(). All results are displayed to stdout. When EOF is received (stdin is closed), DBclose() is called. |
| 176 | 1 | Paul Holland | |
| 177 | 1 | Paul Holland | This is a very poor replacement for psql. In particlar, psql can handle multiple-line SQL commands and returns very verbose messages. In contrast, the dbtest program is more than adequate for testing the library's functions. |
| 178 | 1 | Paul Holland | |
| 179 | 1 | Paul Holland | |
| 180 | 1 | Paul Holland | **dbq** |
| 181 | 1 | Paul Holland | |
| 182 | 1 | Paul Holland | This program queries the job and jobqueue tables using the same calls that the Scheduler uses. |
| 183 | 1 | Paul Holland | |
| 184 | 1 | Paul Holland | <pre> |
| 185 | 1 | Paul Holland | Usage: ./dbq <command> [args] |
| 186 | 1 | Paul Holland | Commands: |
| 187 | 1 | Paul Holland | list :: list ALL elements in the queue. |
| 188 | 1 | Paul Holland | If args are provided then only list those queue items |
| 189 | 1 | Paul Holland | top :: list top elements in the queue. |
| 190 | 1 | Paul Holland | If args are provided then only list those queue items |
| 191 | 1 | Paul Holland | add :: add a queue item. |
| 192 | 1 | Paul Holland | Args are field=value pairs to be inserted. They should be SQL compliant |
| 193 | 1 | Paul Holland | If no args, then you will be prompted for every value |
| 194 | 1 | Paul Holland | update :: update an existing queue item. |
| 195 | 1 | Paul Holland | 1st arg type of record modify: 'job' or 'jobqueue'. |
| 196 | 1 | Paul Holland | 2nd arg is the ID of the record to modify. |
| 197 | 1 | Paul Holland | Remaining Args are field=value pairs to be modified. |
| 198 | 1 | Paul Holland | They should be SQL compliant |
| 199 | 1 | Paul Holland | If no args, then you will be prompted for every value |
| 200 | 1 | Paul Holland | delete :: remove an existing queue item. |
| 201 | 1 | Paul Holland | Args are JOB IDs of the queue item to delete. |
| 202 | 1 | Paul Holland | This also removes all associated JOBQUEUE records. |
| 203 | 1 | Paul Holland | </pre> |