Using More Perl in PostgreSQL
Pages: 1, 2
Handing Data Back to the Database
On the other side of the coin, you can also now return records to PostgreSQL, either singly or in sets. The case of a set of records is probably the most interesting and useful. The simple way to return a set of records is to hand them back to PostgreSQL via the return statement, as an arrayref of hashrefs, with the column names as keys of the hashes, one hash per row. Notice that this is exactly the way that PostgreSQL hands rowsets to PL/Perl via spi_exec_query(). In fact, you could take the rows member of the result from spi_exec_query() and just return it, as long as it had the right keys in each hash.
Any missing field in a return value will become a NULL, as will any field whose value is undef.
You can also hand back a single record to PostgreSQL as a single hashref. PostgreSQL 8.1 allows the use of arrayrefs too.
How is this useful? Consider a task that is not easy in SQL but fairly simple using PL/Perl. Suppose that you have a table of exam scores. The table contains the name of the person with the score, the class he or she is in, and the actual score. The sample script contains some necessary type definitions as well as a PL/Perl function to populate the table with some test data. The goal is to augment that information with the ranking of the people, but only within their class. The function get_ranks() in the sample script will do that. Notice that the SQL orders the results by class and then descending score, in order to do control break processing.
In general, you need to use a PL function for any processing that needs to keep track of state, as this does. This example keeps track of the class so that it can reset the rank counter when it changes. The processing simply gets each row, looks to see whether there's a change in class and resets the counter if there is, stores the rank in the row, and stores the row on an arrayref to return at the end of processing. The ordering breaks ties in score (that's why it sorts by name), but implementing a tie-keeping mechanism would be a fairly simple alteration.
You can use a function like this one that returns a set of records as a table expression in the FROM clause of a SELECT query, just as you would a function returning a single value in the SELECT or WHERE clauses. The result looks like:
andrew=# select * from get_ranks();
name | class | score | class_rank
-------+--------+-------+------------
Ellen | green | 91 | 1
Anne | green | 37 | 2
Bill | green | 30 | 3
May | green | 23 | 4
Joe | green | 11 | 5
Fred | green | 1 | 6
Joe | red | 93 | 1
Fred | red | 69 | 2
Bill | red | 53 | 3
May | red | 51 | 4
Anne | red | 45 | 5
Ellen | red | 14 | 6
Anne | yellow | 88 | 1
Ellen | yellow | 68 | 2
May | yellow | 25 | 3
Joe | yellow | 21 | 4
Bill | yellow | 11 | 5
Fred | yellow | 8 | 6
(18 rows)
Memory Problems with Lots of Data
Fetching data into PL/Perl using the above method requires storing the whole resultset in Perl's main memory. Similarly, if you just return an array of hashrefs, PL/Perl first creates these in Perl's memory all at once, before handing them off to PostgreSQL. Because databases are often about handling huge amounts of data, which can frequently exceed the size of available memory, this is not a good idea. In version 8.0 there's no real solution--you just have to make sure you don't SELECT or return huge data sets that will use lots of memory. Version 8.1 of PostgreSQL provides some much better ways of doing things that do not suffer from the memory bloat problem. The credit for this goes to Abhijit Menon-Sen, who has done some great work on Perl and PostgreSQL.
The first tip is that when fetching a large data set, you should use spi_query() and spi_fetchrow() instead of spi_exec_query(). spi_query() returns a handle to use as the parameter to spi_fetchrow(), which returns one row at a time, or undef when there are no more rows. So you never need to get all the rows into Perl memory at one time. Just fetch one row at a time in a loop.
To return values, use a similar mechanism--in fact, to return large amounts of data your function should never return it directly at all--the return value of the function should be undef! Instead, use a new callback function called return_next(), which you give a single return value.
The function get_ranks_revised() in the sample script uses the these new mechanisms to produce the same results as above. Transforming the function was quite trivial, and the new version not only uses less memory but runs faster too.
Next Article
The final article in this series will examine sharing data between PL/Perl functions, turning on strict mode, Unicode string handling, and the road map for further enhancements to PL/Perl.
Andrew Dunstan works for a small consulting and software company in the Triangle area of North Carolina, and contributes to PostgreSQL as an enthusiastic hobbyist as well as a sometime professional user.
Return to the Databases DevCenter
You must be logged in to the O'Reilly Network to post a talkback.



