Sunday, June 1, 2014

Manually Designing Aggregations in SSAS Cubes

One of the first questions to ask management and users when designing a cube is what reports they are looking for, and they will expect those reports to run fast, as any BI solution is marketed to be. That makes designing aggregations one of the most essential steps in building your cube. You will probably start by running the Aggregation Design Wizard, which looks at the relationships between dimension attributes, natural user hierarchies, and some attribute properties, and builds aggregations based on that. Yet the Aggregation Design Wizard does not know what queries the users will actually run. As a result, you will end up with some aggregations that are useful when browsing natural user hierarchies, plus others at the most granular level of each dimension and at some other attribute levels, but not the truly satisfying aggregations that would be hit by those heavy, costly reports. Then you start thinking about Usage-Based Optimization, but you don’t have a usage log yet, and you can’t ask the users to start generating reports and running queries, because you don’t want them to get a bad first impression of what was promised to be a fast reporting solution, especially when you know in advance the main reports they are looking for, or at least some of them.
Before you start manually designing aggregations, remember that aggregations are pre-calculations of measures at the granularity of an attribute OR A COMBINATION OF ATTRIBUTES. For example, suppose someone wants to know the total number of ALL the acts (like fouls, goals, saves, tackles, etc.) performed by a soccer player. Instead of counting those acts at query execution time, which probably takes a while to complete, an aggregation in this case is a pre-counting of all the acts for each player, which just needs to be fetched from the hard drive, or sometimes even from the cache. Now, if another report is to show the total number of a certain act, say fouls, committed by a certain player, you can design an aggregation that accommodates the count of each act done by each player; this is what will be illustrated in this post. Note that, in this last case, having two aggregations, one at the player level and another at the act level, will NOT serve that request; it has to be one aggregation at the level of the combination of players and acts. (Remember that to find out whether an aggregation was used or not, you have to run SQL Server Profiler and add the “Get Data From Aggregation” event; if it appears when running a query, then an aggregation was used.)
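To make that last point concrete, here is a sketch of what a query at the player-by-act grain could look like, written with the same cube, measure, and attribute names used in the walkthrough below ([FBA Cube], [Measures].[F Events Count], [D Actor Player].[Actor], and [D Act].[Act]); only an aggregation that includes BOTH attributes can answer it directly:

    select [Measures].[F Events Count] on 0,
    [D Actor Player].[Actor].children * [D Act].[Act].children on 1
    from [FBA Cube]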
Using the same soccer cube example, here is how to manually design and test aggregations:
  1. In BIDS, under your cube, open the Aggregations tab.
  2. Click the “Advanced View” icon on the Aggregations tab toolbar.
  3. From the “Measure Group” drop-down, choose the measure group you want to design aggregations for.
  4. Click “New Aggregation Design”, also on the Aggregations tab toolbar. A pop-up window will appear where you can name the aggregation design; in this example I’ll name it “Manual Aggr”.
  5. Pressing OK will list all the dimensions. Expanding any of them will list its attributes.
  6. Click “New Aggregation” on the same toolbar. This will add a new column of unchecked check boxes, titled “A” followed by a sequential number (so if it is the first column, it will be A0).
  7. In the first example I will check just the [Actor] attribute from the [D Actor Player] dimension:

  8. On the Aggregations tab toolbar, click “Assign Aggregation Design”. In the pop-up window, make sure “Manual Aggr” is selected in the aggregation designs drop-down menu, and check the partition you want to assign the design to. Then click OK.
  9. Go back to the “Standard View” on the Aggregations tab. Now you have to fill in the “Estimated Partition Size” property; you can get this value from the Partitions tab.
  10. Reprocess the partition.
  11. To check whether “Manual Aggr” is being used, open SQL Server Profiler and connect to the SSAS instance. Add the “Get Data From Aggregation” event to the traced events, then start the trace.
  12. First, I will run this very simple query, which returns every player with the total number of all the acts he did:
    select [Measures].[F Events Count] on 0,
    [D Actor Player].[Actor].children on 1
    from [FBA Cube]
  13. Going back to Profiler, we will find that the aggregation has been used:
  14. Now let’s change the query to return just how many fouls were committed by each player (clear the cache first, as shown in the ClearCache sketch right after this list, and clear the trace window if you like!); so:
    select [Measures].[F Events Count] on 0,
    [D Actor Player].[Actor].children on 1
    from [FBA Cube]
    where [D Act].[Act].[Foul]
  15. This time we will not find “Get Data From Aggregation” in the trace:

    and this is expected, because the query is at a lower grain than the only aggregation we have.
  16. You may think to ADD ANOTHER aggregation at just the granularity level of the [Act] attribute from the [D Act] dimension, so you will have:
  17. After reprocessing the cube, rerun the last query; again you will have no “Get Data From Aggregation” in the trace, as both aggregations are at a higher granularity than the query.
  18. What we have to do is design an aggregation at the same granularity level as the query (or lower), i.e. one that includes both the [Actor] and [Act] attributes:
  19. Now if you rerun the last query you will find “Get Data From Aggregation” in the trace.
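As referenced in step 14, clear the Storage Engine cache between test runs; otherwise a warm cache may answer the query without touching any aggregation, and you will not see “Get Data From Aggregation” even when a suitable aggregation exists. Here is a minimal sketch of the standard XMLA ClearCache command, which you can run from an XMLA query window in SSMS (the DatabaseID shown is an assumption; replace it with your own database’s ID):

    <Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
      <ClearCache>
        <Object>
          <!-- assumed database ID; replace with your own -->
          <DatabaseID>FBA</DatabaseID>
        </Object>
      </ClearCache>
    </Batch>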
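For the curious, everything the Aggregations tab does in these steps is persisted as ASSL inside the measure group’s partition definition, so you can inspect the result in the project’s .partitions file. Roughly, and trimmed down to this example, the design from step 7 would serialize to something like the sketch below (the IDs are assumed to match the names here; in the real file every cube dimension of the measure group is listed, and a dimension with no <Attributes> entry simply rolls up to its All level):

    <AggregationDesign>
      <ID>Manual Aggr</ID>
      <Name>Manual Aggr</Name>
      <Aggregations>
        <Aggregation>
          <ID>A0</ID>
          <Name>A0</Name>
          <Dimensions>
            <Dimension>
              <CubeDimensionID>D Actor Player</CubeDimensionID>
              <Attributes>
                <Attribute>
                  <AttributeID>Actor</AttributeID>
                </Attribute>
              </Attributes>
            </Dimension>
          </Dimensions>
        </Aggregation>
      </Aggregations>
    </AggregationDesign>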

Bottom line: for an aggregation to be used by the Analysis Services Storage Engine to answer a query, the aggregation’s granularity level has to be at or below the granularity level of the query; otherwise it will not be used. It is worth mentioning that the closer the aggregation’s granularity level is to the query’s granularity level, the better the performance, as the Storage Engine has to do less work to roll the available aggregation up to the requested level of granularity and answer the Formula Engine, which in turn answers the query.
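One more example, built from the same names used above, to show the roll-up direction of this rule: a query at the [Act] level only. The player-and-act aggregation from step 18 sits at a lower grain, so the Storage Engine can answer this query by rolling that aggregation up across players, and “Get Data From Aggregation” should appear in the trace again:

    select [Measures].[F Events Count] on 0,
    [D Act].[Act].children on 1
    from [FBA Cube]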
