Execution plans are one of the primary tools to optimize your database queries, but they can be daunting to read and understand. In this post I walk through several execution plans, explain what Redshift is doing in each, and highlight the parts of plans that indicate problems.

To do this, I use the simulated “clickstream” data that I’ve used for my last few posts. If you want to try this out, you can find the data generator here, along with CloudFormation templates to deploy a Redshift database, and instructions for loading the data into it.

There are three tables that make up this dataset, representing three different actions that the user might take:

- PRODUCT_PAGE contains events generated by looking at a product: timestamp, user identifier, and product identifier.
- ADD_TO_CART contains events generated when the user clicks an “Add to Cart” button. It’s the same information as in the product page, along with the quantity added.
- CHECKOUT_COMPLETE contains events when the user finishes the checkout pipeline. It has a timestamp and a user identifier, along with a count of the number of items in the cart and their total value.

In addition, each table has a unique identifier for the event and a column that contains the event name (this last being a “leaky abstraction” from the source data, but one that I’ve seen numerous times).

When loaded into Redshift, the data is distributed based on the userid column, because that’s used to join the tables together. It’s sorted on the timestamp column, which is a good overall practice, since most business queries are based on time.

In the real world, there would be more fields in each table, as in this Segment example, which includes extensive context for a page view. As a practical matter, I don’t think that more fields would change anything about this article: one of Redshift’s strengths is its use of columnar data storage, which means that Redshift will ignore any fields that aren’t used in a query.

Creating and reading an execution plan

Creating an execution plan is easy: prefix your query with “explain”. When you first look at an execution plan, it seems like a wall of text.
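To make the “prefix your query with explain” step concrete, here is a minimal sketch in Python that builds the SQL text you would send to Redshift. The table layout, the column names (`eventid`, `eventtype`, `productid`), the column types, and the sample join are all assumptions for illustration; the post describes the tables only in prose. The sketch only assembles strings and does not connect to a database.

```python
# Hypothetical names and types throughout: the post describes the tables in
# prose but does not show their DDL or the query, so these are assumptions.

# One table, distributed on the user ID and sorted on the timestamp,
# matching the distribution and sort choices described in the text.
PRODUCT_PAGE_DDL = """
CREATE TABLE product_page (
    eventid      VARCHAR(64)  NOT NULL,  -- unique identifier for the event
    eventtype    VARCHAR(64)  NOT NULL,  -- event name carried over from the source
    "timestamp"  TIMESTAMP    NOT NULL,
    userid       VARCHAR(64)  NOT NULL,
    productid    VARCHAR(64)  NOT NULL
)
DISTKEY (userid)
SORTKEY ("timestamp");
"""

# An illustrative query that joins two of the tables on the distribution key.
QUERY = """
SELECT COUNT(DISTINCT pp.userid)
FROM product_page pp
JOIN add_to_cart atc ON atc.userid = pp.userid
"""


def explain(query: str) -> str:
    """Prefix a query with EXPLAIN so the database returns its plan
    instead of running the query."""
    body = query.strip().rstrip(";")
    return "EXPLAIN\n" + body + ";"


if __name__ == "__main__":
    # Print the statement; paste it into any SQL client connected to Redshift.
    print(explain(QUERY))
```

Running the resulting statement returns the plan as rows of text rather than query results, which is the output the rest of this post walks through.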