SQL queries

I had an opportunity to work on a specific screen that dealt with displaying the sales history (or payment information) associated with a customer. The issue with the screen was

  • It dealt with half million data
  • The response time was about 4 to 10 sec

After applying the explain plan on the query,

  • A full table scan as there were no proper indexes used
  • Higher number of rows were processed that will have high I/O
  • The CPU Cost is higher due to higher I/O operations
We analyzed the explain plan and provided the below fixes for the query. After providing the fix the response time showed an improvement by 80%.
  1. 1. Date Conversion Problems

e.g., In the filter criteria of the query a data function was applied to the table column.

DATE_FORMAT(CONVERT_TZ(fetchInfo.createdTime, '+00:00', '+5:30'), '%Y-%m-%d') = CURRENT_DATE()

How it behaves:
In the above statement, for every record in the database, the DATE_FORMAT function will be executed. This increase the time taken for I/O opeations and will consume more temporary space

Instead of applying the data function to the records in the database, its less cost effective to use it in the filter criteria value as shown below:

fetchInfo.createdTime between date_format(CONVERT_TZ(current_date(), '+00:00', '-5:30'),
'%Y-%m-%d %H:%i:%s') and date_format(CONVERT_TZ(now(), '+00:00', '-5:30'), '%Y-%m-%d %H:%i:%s')
  1. 2. Unnecessary joins

Use joins only whenever necessary. For e.g., in the query for this page, there was a need to fire two queries. One for pagination and another for fetching the data. For fetching the data, we would need inner joins for displaying additional information. However for showing the total record count on the page, there is no necessity to apply the entire filter criteria.

  1. 3. Indexing Issue

The columns that are used as part of filter criteria were not indexed. This led to a full table scan. By creating indexes, the full table scan was avoided and overall processing time has been decreased.

Additional Info:

  • There will be cases where in the column would have been indexed but SQL is not considering the index created for the column. In such a scenario, we can force usage of index using hints. Many a times, we will use NULL as part of filter criteria.
  • Applying build in function (like to_date, to_char) will invalidate the usage of index. e.g. WHERE to_char(created_date,’YYYY-MM-DD’) = ‘2014-06-07’. To effectively use index for this scenario, you can a function bases index for that column say Create index idx_created_date on product(trunc(created_date).
  • If NULL values are frequently used as part of filtering, then adding indexing for NULL values in columns would help improve the performance.
  1. 4. Usage of IN operator in Sub-query


WHERE advanceto_.payment_id in (select payment_id  from bill_payment payment0_ where paymento_.bill_id = '281480')

How it behaves:

IN keyword compare all value in the corresponding sub query column and the SQL engine will scan all the records fetched from the inner query.

When to use:

IN Operator is not always performance intensive. It should be prefered when

  1. The sub-query is static i.e., when you have static table here the number of reocrds will not significantly increase.
  2. When the sub-query result is very small
  3. When most of the filter is applied to the sub-query

Alternatives :

  1. JOINS – A join is a class of subqueries that always have corresponding equivalent query written. Subquery will be executing two select statements while the join will only be selecting one select statement, so by this attribute a join statement will always be faster. For the scenario that we are referring to, we do not need a sub-query to achieve the solution
  2. EXISTS – This will just return true or false depending on whether the sub-query returned any results or not. While using EXISTS, we need to use ‘select 1′ in the sub-query. if we are using EXISTS, the SQL engine will stop the scanning process as soon as it found a match. When your inner query fetching thousand of rows then EXIST would be better choice. Sometimes usage of MINUS in case of EXISTS will also yield better performance.

Some best practices while writing queries

  1. Use alias names for referencing columns.
  2. Avoid using Having, NOT IN, IN and LIKE
  3. Avoid using in-built function on indexed column
  4. At times Stored procedures yield better performances than SQL queries
  5. Sometimes it less costly to do sorting programmatically as compared to queries

Hope this information would help you all !!!