Install
openclaw skills install sql-profiler

The sql-profiler skill helps data engineers and developers identify and optimize performance bottlenecks in their SQL queries. It provides in-depth analysis of query plans, suggests specific optimizations, and explains complex database concepts in plain English, supporting various SQL dialects including Databricks SQL, PostgreSQL, Spark SQL, and ANSI SQL.
The sql-profiler skill is invoked with the /sql-profiler command, followed by a subcommand and relevant arguments.
analyze - Analyze a SQL query for performance issues

Analyzes a given SQL query and identifies potential performance bottlenecks.
Syntax:
/sql-profiler analyze --query "SELECT * FROM my_table WHERE ..." [--dialect postgresql|databricks|sparksql|ansi] [--explain-output "ACTUAL EXPLAIN OUTPUT HERE"]
Arguments:
--query (required): The SQL query string to analyze.
--dialect (optional): The SQL dialect. Supported: postgresql, databricks, sparksql, ansi. Defaults to ansi if not specified.
--explain-output (optional): The output from EXPLAIN or EXPLAIN ANALYZE for the query. Providing this significantly improves the accuracy and depth of the analysis.

Example:
/sql-profiler analyze --query "SELECT customer_name, SUM(order_total) FROM orders GROUP BY customer_name ORDER BY SUM(order_total) DESC LIMIT 10;" --dialect postgresql
Example with EXPLAIN output:
/sql-profiler analyze --query "SELECT * FROM large_table WHERE created_at < '2023-01-01' AND status = 'active';" --dialect databricks --explain-output "== Physical Plan ==
*(1) Project [id#123, created_at#124, status#125]
+- *(1) Filter (isnotnull(created_at#124) AND (created_at#124 < 2023-01-01) AND isnotnull(status#125) AND (status#125 = active))
+- *(1) FileScan csv [id#123, created_at#124, status#125] Batched: false, DataFilters: [isnotnull(created_at#124), (created_at#124 < 2023-01-01), isnotnull(status#125), (status#125 = active)], Format: CSV, Location: InMemoryFileIndex[dbfs:/user/hive/warehouse/large_table], PartitionFilters: [], PushedFilters: [IsNotNull(created_at), LessThan(created_at,2023-01-01), IsNotNull(status), EqualTo(status,active)], ReadSchema: struct<id:string,created_at:timestamp,status:string>
"
Output: A detailed analysis of the query that identifies likely bottlenecks, explains their causes, and suggests specific optimizations with examples.
explain-plan - Interpret EXPLAIN/EXPLAIN ANALYZE output

Provides a human-readable interpretation of a raw EXPLAIN or EXPLAIN ANALYZE output.
Syntax:
/sql-profiler explain-plan --output "RAW EXPLAIN OUTPUT HERE" [--dialect postgresql|databricks|sparksql|ansi]
Arguments:
--output (required): The full text output from an EXPLAIN or EXPLAIN ANALYZE command.
--dialect (optional): The SQL dialect the EXPLAIN output belongs to. Defaults to ansi.

Example:
/sql-profiler explain-plan --output "Aggregate (cost=250.75..250.76 rows=1 width=36) (actual time=0.089..0.089 rows=1 loops=1)
-> Sort (cost=250.75..250.76 rows=1 width=36) (actual time=0.088..0.088 rows=1 loops=1)
Sort Key: (sum(orders.order_total)) DESC
Sort Method: quicksort Memory: 25kB
-> HashAggregate (cost=250.72..250.73 rows=1 width=36) (actual time=0.082..0.082 rows=1 loops=1)
Group Key: orders.customer_name
Batches: 1 Memory Usage: 24kB
-> Seq Scan on orders (cost=0.00..200.00 rows=10000 width=16) (actual time=0.003..0.024 rows=10000 loops=1)
" --dialect postgresql
Output: A plain-English breakdown of the query plan, highlighting what each step does, where the cost concentrates, and any potential problem areas.
optimize - Get specific optimization suggestions for a query

Provides optimization suggestions for a query directly, without a full analysis, by checking for common issues.
Syntax:
/sql-profiler optimize --query "SELECT * FROM customers WHERE region = 'EMEA';" [--dialect postgresql|databricks|sparksql|ansi]
Arguments:
--query (required): The SQL query to optimize.
--dialect (optional): The SQL dialect. Defaults to ansi.

Example:
/sql-profiler optimize --query "SELECT o.order_id, c.customer_name FROM orders o JOIN customers c ON o.customer_id = c.customer_id WHERE c.registration_date < '2022-01-01';" --dialect sparksql
Output: Specific optimization suggestions for the query. Common issues the skill flags include:
Missing indexes
Problem: When a database needs to find specific rows based on conditions in WHERE clauses, JOIN conditions, or ORDER BY clauses, it often performs a "full table scan" (reads every row). This is slow for large tables.
Solution: Creating an index on the columns used in these conditions allows the database to quickly jump to the relevant rows, similar to using an index in a book.
Impact: Can dramatically reduce query execution time, especially for large tables and selective queries.
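The missing-index case above can be demonstrated with SQLite's EXPLAIN QUERY PLAN. This is a minimal sketch with a hypothetical orders table in an in-memory database; exact plan wording varies by SQLite version:

```python
import sqlite3

# Hypothetical orders table in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, order_total REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(i, i % 100, float(i)) for i in range(1000)],
)

query = "SELECT * FROM orders WHERE customer_id = 42"

# Without an index, the planner has no choice but a full table scan.
before = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()

# With an index on the filtered column, it can jump straight to matching rows.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()

print(before[0][3])  # plan detail typically reports a scan of orders
print(after[0][3])   # plan detail typically reports a search using the index
```

The same before/after comparison is what the analyze subcommand performs conceptually when an --explain-output is supplied.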
Full table scans
Problem: Reading every single row in a table to find a small subset of data is inefficient.
Solution: Often solved by adding appropriate indexes, partitioning large tables, or rewriting queries to filter earlier.
Impact: Avoids unnecessary I/O and CPU usage, leading to faster queries.
N+1 queries
Problem: Occurs when an application executes N additional queries for each result of an initial query. For example, fetching a list of users and then, for each user, fetching their associated orders in a separate query.
Solution: Use JOIN operations, IN clauses, or subqueries to fetch all related data in a single, more efficient query. In some ORMs, "eager loading" helps.
Impact: Reduces the number of round trips to the database, significantly speeding up data retrieval.
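A minimal sketch of the N+1 pattern and its JOIN-based fix, using hypothetical users and orders tables in SQLite:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, user_id INTEGER, total REAL);
    INSERT INTO users VALUES (1, 'ada'), (2, 'bob');
    INSERT INTO orders VALUES (10, 1, 5.0), (11, 1, 7.5), (12, 2, 3.0);
""")

# N+1: one query for the users, then one extra query per user for their orders.
n_plus_one = {}
for user_id, name in conn.execute("SELECT user_id, name FROM users"):
    n_plus_one[name] = conn.execute(
        "SELECT order_id, total FROM orders WHERE user_id = ? ORDER BY order_id",
        (user_id,),
    ).fetchall()

# One round trip: a single JOIN fetches all related data at once.
joined = {}
for name, order_id, total in conn.execute("""
    SELECT u.name, o.order_id, o.total
    FROM users u JOIN orders o ON o.user_id = u.user_id
    ORDER BY o.order_id
"""):
    joined.setdefault(name, []).append((order_id, total))

assert n_plus_one == joined  # same result, 1 query instead of N+1
```

With N users, the first approach issues N+1 queries; the JOIN issues exactly one.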
Inefficient joins
Problem: Joins with missing or incomplete conditions (producing Cartesian products) or a poor join order can generate very large intermediate result sets.
Solution: Ensure every join has a complete join condition (ON clause). Use EXPLAIN to see the join order and adjust if necessary (e.g., using hints, though these are generally not recommended unless absolutely needed).
Impact: Correct joins prevent performance disasters and ensure data accuracy. An optimized join order can significantly reduce intermediate result set sizes.

Databricks SQL and Spark SQL operate on a distributed architecture. Optimizations often involve:
- Broadcast joins: BROADCAST hints for small tables avoid shuffling the larger side of a join.
- Shuffle reduction: minimizing the data shuffles caused by JOIN strategies, GROUP BY, and ORDER BY clauses is crucial.
- Data layout: OPTIMIZE ... ZORDER BY helps colocate related data in the same set of files, reducing the amount of data read for queries with high-cardinality columns.

PostgreSQL is a powerful relational database. Key optimizations include:
- VACUUM and ANALYZE: regularly running these commands helps the query planner make accurate decisions and reclaims space from dead tuples.
- WITH clauses (CTEs): can improve readability but sometimes prevent the optimizer from pushing down predicates, leading to materialized CTEs that are less efficient.

General tips across dialects:

- Avoid SELECT *; select only the columns you need.
- Filter early: apply WHERE clauses to reduce data before joins or aggregations.
- Avoid functions on indexed columns in WHERE clauses: WHERE YEAR(date_column) = 2023 prevents index usage. Instead, use WHERE date_column BETWEEN '2023-01-01' AND '2023-12-31'.
- UNION vs UNION ALL: UNION implies a distinct sort, UNION ALL does not. Use UNION ALL if duplicates are acceptable and performance is critical.
- Prefer JOINs: JOINs are usually more performant than correlated subqueries.
- LIMIT with ORDER BY: if you only need a few rows, combine LIMIT with an ORDER BY to return the most relevant data efficiently.

The sql-profiler skill provides AI-driven suggestions. While highly effective, its recommendations should be reviewed and benchmarked on your own data before being applied.
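The function-on-an-indexed-column pitfall can be reproduced in SQLite, which uses strftime rather than YEAR; a sketch with a hypothetical events table (plan wording varies by SQLite version):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, created_at TEXT)")
conn.execute("CREATE INDEX idx_events_created ON events (created_at)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, "2022-06-01"), (2, "2023-03-15"), (3, "2023-11-30"), (4, "2024-02-01")],
)

# Wrapping the indexed column in a function hides it from the planner: full scan.
non_sargable = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM events "
    "WHERE strftime('%Y', created_at) = '2023'"
).fetchall()

# The equivalent range predicate leaves the column bare, so the index is usable.
sargable = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM events "
    "WHERE created_at BETWEEN '2023-01-01' AND '2023-12-31'"
).fetchall()

print(non_sargable[0][3])  # typically a scan of events
print(sargable[0][3])      # typically a search using idx_events_created
```

Both queries return the same rows; only the range form lets the planner use the index.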
This skill is a powerful assistant, not a replacement for human expertise and rigorous testing.
Found a bug or have a suggestion? Open an issue or submit a pull request on the ClawHub repository.
MIT License.