Monday, June 16, 2014

SQL Server 2008 R2 Analysis Services Performance Guide Summary - Optimizing MDX

  • Before beginning optimization, you need reproducible cold-cache baseline measurements. To this you have to clear the FE (Formula engine) cache and the SE (Storage Engine) cache by running:
    <ClearCache xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
      <Object>
        <DatabaseID>Butmah Cube 4</DatabaseID>
      </Object>
    </ClearCache>

    Also you have to clear the file system cache by using either Fsutil.exe, RAMMap (careful when using Fsutil or RAMMap, both cause disruption to service), or Analysis Services Stored Procedure Project (CodePlex).
  • Execute the query you want to optimize and then use SQL Server Profiler with Query Subcube Verbose and Get Data From Aggregation events. The text in Query Subcube Verbose contains information for each attribute in every dimension:
    • 0: Indicates that the attribute is not included in query (the All member is hit).
    • * : Indicates that every member of the attribute was requested.
    • + : Indicates that two or more members of the attribute were requested.
    • - : Indicates that a slice below granularity is requested.
    • <integer value> : Indicates that a single member of the attribute was hit. The integer represents the member’s data ID.
  • Isolate the cause of the performance issue, then look if the problem lies in the FE or in the SE, by finding the amount of time the SE spends scanning data, so look for the "Duration" spend by event subclass = 22, if this is the case, consider optimizing dimension design, designing aggregations, or using partitions to improve query performance. In addition, you may want to consider optimizing the disk subsystem. If the majority of time is not spent in the storage engine but in the query processor, focus on optimizing the MDX script or the query itself. Note, the problem can involve both the formula and storage engines. A “fragmented query space” can be diagnosed with SQL Server Profiler if you see many query subcube events generated by a single query. Each request may not take long, but the sum of them may. If this is the case, consider warming the cache to make sure subcubes and calculations are already cached. Also, consider rewriting the query to remove arbitrary shapes, because arbitrary subcubes cannot be cached. If the cube and MDX query are already fully optimized, you may consider doing thread, memory, and configuration tuning of the cube. You may even want to look at larger hardware.
  • Almost always, performance obtained by using subspace (or block computation) mode is superior to that obtained by using cell-by-cell (nor naïve) mode. Some of the most common reasons that cause NOT to use subspace mode are: Set aliases, Late binding in functions (A query is late-bound if an argument can be evaluated only in context, like LinkMember, StrToSet, StrToMember, and StrToValue), User-defined stored procedures, LookupCube, and cell level security.
  • The Analysis Services engine is very efficient at using sparsity of the data to improve performance. Adding calculations with nonempty values replacing empty values does not allow Analysis Services to eliminate these rows.
  • When you write expressions as products of two other expressions, place the sparser one on the left-hand side. Recall, an expression is sparse if there are few non-null values compared to the total number of cells.
  • One of the most common errors in MDX scripting is using IIf when the condition depends on cell coordinates (e.g. [Date].[Calendar].CurrentMember.Level) instead of values. If the condition depends on cell coordinates, use scopes and assignments (as described earlier).
  • The IIf function takes three arguments:
    IIf(<condition>, <then branch>, <else branch>)
    Where the condition evaluates to true, the value from the then branch is used; otherwise the else branch expression is used. Note the term "used", one or both branches may be evaluated even if the value is not used. It may be cheaper for the engine to evaluate the expression over the entire space and use it when needed (called eager plan), than it would be to chop up the space into a potentially enormous number of fragments and evaluate only where needed (termed strict plan).... read more about when and how to force eager in page 53.
  • Partial expressions (those that are part of a calculated member or assignment) and cell properties are not cached. So if an expensive subexpression is used more than once, consider creating a separate calculated member, then use that new calculated member in the original one, this allows the query processor to cache and reuse... page 56.
Reference:  SQL Server 2008 R2 Analysis Services Performance Guide

No comments:

Post a Comment