AWS Database Blog
Identify PostgreSQL semantic issues with the plpgsql_check extension
One critical aspect during the migration process from Oracle to PostgreSQL is identifying any semantic issues that may arise when translating Oracle-specific code to PostgreSQL. Although Oracle performs both semantic and syntax checks during code compilation, PostgreSQL focuses solely on syntax checks during code compilation, leaving room for potential semantic discrepancies in the converted code base. This could result in issues during functional testing and application runtime. To address this issue, database developers working with PL/pgSQL code in PostgreSQL can use the plpgsql_check extension.
In this post, we explain how you can use the plpgsql_check
extension in PostgreSQL to catch potential semantic errors and elevate your overall code quality.
Solution overview
The plpgsql_check
extension provides a set of tools to perform semantic checks on your PL/pgSQL code. plpgsql_check
is most useful for developers who are working with large or complex SQL code bases, because it can help identify and resolve issues early in the development cycle. This extension helps in identifying a wide range of issues, including syntax errors, undefined variables, unused variables, and other potential bugs. It provides a detailed list of all identified issues, along with relevant information about the line of code where the issue was detected and a clear description of the problem.
The plpgsql_check
extension has rich set of usage patterns, the details of which can be found in the GitHub repository. This post dives deep into the following usage patterns:
- Functions and procedures for semantic checks in PostgreSQL
- Triggers for semantic checks in PostgreSQL
- Dependency list
- Performance warnings
Amazon RDS for PostgreSQL or Amazon Aurora PostgreSQL support many extensions for the PostgreSQL database engine. For more details, refer to Extension versions for Amazon RDS for PostgreSQL and Extension versions for Amazon Aurora PostgreSQL.
However, this extension is not yet supported by Amazon Relational Database Service (Amazon RDS) for PostgreSQL or Amazon Aurora PostgreSQL-Compatible Edition. For this post, we use standard PostgreSQL installed on an Amazon Elastic Compute Cloud (Amazon EC2) machine and explain how plpgsql_check
can help developers identify and correct errors before running the code. If you’re migrating to Amazon RDS for PostgreSQL or Amazon Aurora PostgreSQL, you can run a pg_dump operation and employ pg_restore to transfer the metadata to the EC2 instance.
Prerequisites
To get started with the solution described in this post, you should have the following prerequisites:
- An active AWS account
- A PostgreSQL database installed on Amazon EC2
- The
plpgsql_check
extension installed as per the instructions in the GitHub repository
Challenges with semantic checks in PostgreSQL
Although Oracle performs both semantic and syntax checks during PL/SQL code compilation, PostgreSQL focuses solely on syntax checks. This behavior can lead to the following unforeseen semantic issues during functional testing when migrating from Oracle to PostgreSQL:
- Semantic issues exist within the code and don’t arise during the compilation, but are reported when you run the PL/pgSQL code
- When there are multiple semantic issues, not all issues are reported when you run the PL/pgSQL code
- Semantic issues might not get detected even while running the function because the error conditions may not arise
Let’s understand these issues through the following examples.
Example 1
The following example uses the PL/pgSQL function calculate_area
, which calculates the area of a square where users supply the length through the function parameter:
We get the following output:
Clearly, the function has two bugs:
- The
DUAL
table does not exist in PostgreSQL by default - The variable
p_width
is not declared within the function scope but it’s getting accessed
These types of semantic issues don’t arise during the compilation of the function in PostgreSQL. However, they may appear when the function is run. Therefore, to identify and address these semantic issues within your PL/pgSQL code, it requires additional efforts to run the function and iteratively fix errors one by one:
Sometimes, you might not detect the bugs even while running the function, because the error conditions may not arise. To better understand this, let’s consider an example.
Create a table t1
with two columns, a
and b
:
We get the following output:
Let’s create a function that loops through the table data and does certain processing:
We get the following output:
Now run the function:
This example illustrates how manually converting and testing functions could increase the likelihood of errors, potentially reducing the overall code quality.
Common semantic issues that plpgsql_check can identify
The plpgsql_check
extension can identify various semantic issues in the PL/pgSQL code that may cause problems, such as the following:
- Undefined variables – If you reference a variable that hasn’t been declared or is out of scope,
plpgsql_check
will flag it as an error. - Unused variables – It can highlight variables that are declared but never used within the function body.
- Type mismatches – It helps identify type mismatches that might lead to runtime errors. For example, calling a procedure with a mismatch between parameter types. The function return type mismatches between the declared variable and the expected type.
- Control flow problems – It identifies potential logic errors in conditional statements, loops, or branches:
- Incorrect use of transactions, such as mismatches between
BEGIN
andCOMMIT
statements. - Incorrect use of exceptions, especially when there is no exception block for a raised exception within the same block.
- Incorrect use of transactions, such as mismatches between
- Incorrect function calls – It detects when a function is called with the wrong number or type of arguments or when a dependent object doesn’t exist in the database.
- Trigger-related issues – For PL/pgSQL triggers, it helps find problems related to
OLD
/NEW
record usage, trigger control, and data type mismatches. - Incorrect use of SQL statements – It identifies issues like
SELECT
statement andINTO
clause mismatches in terms of column counts. - Incorrect use of subqueries – It detects improper usage of subqueries, such as missing or extra columns in subqueries or missing aliases.
- Incorrect use of cursors – It identifies incorrect cursor usage, like failing to open or close cursors properly.
- Incorrect use of arrays – It detects issues with array usage, such as data type mismatches between array elements and table columns.
Important semantic check functions
The plpgsql_check
extension provides a set of functions that allow you to check the syntax and semantics of PL/pgSQL code. The following are the main functions:
- plpgsql_check_function – Checks the syntax and semantics of a specific PL/pgSQL function and returns the result as text
- plpgsql_check_function_tb – Returns a human-readable tabular result of the check performed by
plpgsql_check_function
Let’s dive into these functions and illustrate their usage through examples.
Create the following table to understand how the two functions work with a few PL/pgSQL code examples:
In this example, we intentionally include two errors:
- A wrong reference to the identifier
r.student_name
—there is no such columnstudent_name
inbigtable
- The
RETURN
statement is missing
Although you have two semantic issues, the preceding function code will still be compiled with no errors. However, you may encounter runtime errors while running the function.
In your first run, pass an input to the function with value 1000
:
The command fails with an error: the control reached the end of the function without RETURN
. This is because the value 1000
exists in bigtable
and the row is processed but could not find the RETURN
keyword to return the output value.
In a second run, change the input parameter to 1001
:
The command fails with a different error as following: record r has no field called student_name
.
Now let’s understand how you can reveal the bugs through plpgsql_check_function
and plpgsql_check_function_tb
without even needing to run the code.
Semantic checks using plpgsql_check_function
This function analyzes a single PL/pgSQL function. It takes a function name or function OID as an argument and returns a table of issues found, with columns such as lineno
, position
, and message
to identify the location and nature of each issue.
Check the syntax and semantics of the function discussed in the previous section using plpgsql_check_function
:
In this case, the error message points to the line where the error occurred. However, it’s noteworthy that only one error is retrieved by default unless optional arguments are configured to obtain all errors (we discuss this in more detail in subsequent examples).
Semantic checks using plpgsql_check_function_tb
The plpgsql_check_function_tb
function closely resembles plpgsql_check_function
, except that it returns results in a tabular format with details like the line number of the function where the issue is, the error message, and other helpful information. This makes it straightforward for developers to understand errors and fix them quickly. The function is particularly useful for creating reports that list issues and provide a lot of information for careful examination and solving problems. See the following example code:
We get the following output.
Semantic checks for triggers in PostgreSQL
Triggers within PostgreSQL are linked to a designated table, view, or foreign table, as well as to a function named your_trigger_function_name
, which runs specific business operations outlined within it. The trigger function must be defined prior to creating the trigger itself. It must be declared as a function that takes no arguments and returning type trigger
. This function is run each time the trigger is activated by any triggering event.
Challenges for semantic checks for triggers
Let’s create a sample table, a trigger, and the trigger function to understand how you can check semantic issues within a trigger function:
Create a trigger function that inserts a record in the audit_employees_deleted
table for every employee deleted from the employees
table:
We get the following output:
In the preceding trigger function, there are two bugs:
- The
RETURN
keyword is missing DELETE
triggers can’t refer toNEW
However, the compilation is successful, and because trigger functions are a special type of functions, you can’t test simply by calling them.
Therefore, you need to create the trigger that calls the trigger function when it’s activated. Let’s create a trigger that is activated when you delete any record from the employees
table and calls the trg_fnc_delete_employee()
function to perform an insert operation in the audit_employees_deleted
table:
We get the following output:
Test manually to see the semantic issue
As explained previously, triggers are activated when the triggering event occurs. In the following example, the triggering event is DELETE
, to delete an employee from the employees
table:
Semantic checks for triggers using plpgsql_check_function_tb
You can also use the plpgsql_check
extension to check if a specified trigger is syntactically and semantically bug free. Because a trigger function and the trigger are tightly coupled and associated with a table, you need to specify the table name along with the trigger function name to plpgsql_check_function
or plpgsql_check_function_tb
to find the semantic issues. See the following code:
We get the following output.
See the following code for correct trigger semantic checking:
We get the following output.
Find semantic errors and warnings using optional arguments
Both plpgsql_check_function
and plpgsql_check_function_tb
don’t reveal all semantic errors by default. You need to pass some optional arguments to get the full list of semantic errors within your PL/pgSQL code. Let’s explore some commonly used optional arguments. You can get the full list of optional arguments supported by these two functions in the GitHub repo.
fatal_errors
In the previous examples, only one error is reported by plpgsql_check_function
and plpgsql_check_function_tb
. To retrieve all errors in a PL/pgSQL function, you can use the fatal_errors
optional parameter, which is of the BOOLEAN
type. The default value of fatal_errors
is TRUE
, indicating that the function will stop on the first error to prevent massive error reports.
The syntax is as follows:
fatal_errors boolean DEFAULT true
Let’s test the plpgsql_check_function_tb
function with the optional argument:
We get the following output:
With the fatal_errors
optional argument and plpgsql_check_function_tb
function, developers can generate all unknown bugs within the code and fix all of them in the development phase, thereby increasing code quality. This also helps error propagation to the next phase like functionality testing or integration testing.
Let’s fix all the issues reported within the function and compile once again to check if the plpgsql_check_function_tb
function reports any other bugs:
We get the following output:
After the fixed function is compiled, you can check it again to see that there are no additional errors:
We get the following output:
There are no more code semantic issues within the function. However, there could be some performance improvement areas that developers might overlook while migrating code from Oracle or writing any PL/pgSQL functions or procedures in PostgreSQL. The plpgsql_check
extension can also help you identify some performance improvement indicators. Let’s explore how you can get performance improvement recommendations from plpgsql_check
.
performance_warnings
This optional argument aids in identifying performance-related warnings, such as data type casting issues and implicit casts in the WHERE
clause, which can be the reason why indexes are not used and many other performance improvement indicators.
The syntax is as follows:
performance_warnings boolean DEFAULT false
Let’s explore the functionality of the performance_warnings
optional argument to identify some potential performance bottlenecks and recommendations generated through the plpgsql_check_function_tb
function:
We get the following output:
From the preceding screenshot, you can see the performance improvement recommendations:
- Type casting can be dangerous sometimes when fetching data from the table. This is because the table data fetch could go to a sequence scan instead of an index scan when the table column data type doesn’t match the data provided. In our code, there is the statement
SELECT * FROM bigtable WHERE id = _id
, where the table id column data type isinteger
and the value passed from the argument isnumeric
type. Because of this, this step could be costly because the index scan is skipped. - PostgreSQL functions by default are created with the volatility classification set as
VOLATILE
unless otherwise specified. This is an important attribute because it can greatly increase performance when used correctly. For more information, refer to Function Volatility Categories. Because this function performs only read operations in the form ofSELECT
statements, it can be marked asSTABLE
to have better performance.
Let’s fix all performance improvement recommendations and test the function again:
We get the following output:
After the fixed function is compiled, you can check it again to see that there are no additional errors:
We get the following output:
Output formats
The function plpgsql_check_function
has three possible output formats: text
, json
, or xml
. You can always transform the output format based on your application needs, as shown in the following example, where we format the output as a JSON
document:
These are some of the commonly used optional arguments. For the full list of optional arguments supported by plpgsql_check_function
and plpgsql_check_function_tb
, refer to Optional arguments.
Additional utility queries for semantic checks on all code objects
In this section, we present additional utility queries designed to facilitate semantic checks on all code objects within a PostgreSQL database. The following queries use the plpgsql_check_function
function to comprehensively verify non-trigger PL/pgSQL functions, trigger PL/pgSQL functions, and all PL/pgSQL functions and procedures.
Check all non-trigger plpgsql user created functions:
Check all trigger plpgsql functions:
Check all plpgsql functions (functions or trigger functions with defined triggers):
Limitations
When working with static code, plpgsql_check
is expected to catch practically all issues. However, as with any tool, it has certain limitations:
- False positives and negatives – Like any static analysis tool,
plpgsql_check
may generate false positives (flagging code as problematic when it’s not) or false negatives (not catching certain issues). - Dynamic SQL –
plpgsql_check
performs static analysis of PL/pgSQL code, meaning that it examines the code without actually running it. Code blocks that use dynamic SQL can be challenging to analyze without running them.plpgsql_check
may not fully understand the dynamic SQL and might not be able to perform accurate analysis in such cases. - Schema changes – The extension checks the code within the context of the current schema. If there are dependencies on other schema database objects,
plpgsql_check
might not catch them accurately. - Temporary tables –
plpgsql_check
can’t analyze queries in conjunction with temporary tables that are created in the plpgsql function during runtime. - Version compatibility – As PostgreSQL evolves, new language features and syntax may be introduced, and
plpgsql_check
might not always support the latest features. - Refcursors –
plpgsql_check
can’t be used to detect the structure of referenced cursors because it’s dynamic in nature.
Conclusion
In this post, we described the use of the plpgsql_check
extension to identify common mistakes in the early stages of the development process, leading to more reliable and efficient code. Despite its limitations, plpgsql_check
remains an essential tool for pinpointing potential issues in PL/pgSQL code. This approach is recommended for developers who are working on large or complex SQL code bases, and offers the following benefits:
- Catching errors early and saving time – The
plpgsql_check
extension helps developers detect errors early in the development process, thereby reducing the time and effort needed for debugging and fixing problems - Improving Code Quality – By examining both the syntax and semantics of the PL/pgSQL code, the
plpgsql_check
extension helps developers write better code that is more reliable and semantically bug-free (considering the limitations listed in the previous section)
Feel free to leave a comment if you have any questions.
About the Authors
Sashikanta Pattanayak works as a Lead Consultant with the Professional Services team at AWS, specializing in database modernization, optimization, and migration strategies. His role involves collaborating with clients to architect scalable, resilient, and secure solutions within the AWS Cloud environment. He specializes in homogeneous and heterogeneous database migrations.
Bikash Chandra Rout is a Lead Database Consultant with the Professional Services team at Amazon Web Services. Bikash focuses on helping customers build highly available, cost-effective database solutions and migrate their commercial engines to the AWS Cloud. He is passionate about databases and analytics.
Jitender Kumar is a Senior Lead Consultant with the Professional Services team at Amazon Web Services. He has worked on multiple databases as DBA and is skilled in SQL performance tuning and database migrations. He focuses on database migration to AWS and works with customers from assessment to implementation.
Prashant Borse is a Lead Consultant with the Professional Services team at Amazon Web Services. He has worked on multiple migration projects and is skilled in SQL performance tuning and database optimizations.