One of the first questions to ask management and users when designing a cube is what reports they are looking for, and they expect those reports to run fast, as any BI solution is marketed to be; so one of the most essential steps in building your cube is designing aggregations. You will probably start by running the Aggregation Design Wizard, which looks at the relationships between dimension attributes, natural user hierarchies, and some attribute properties, and then builds aggregations based on that. Yet the Aggregation Design Wizard does not know what queries the users will actually run, so you will end up with some aggregations that are useful when browsing natural user hierarchies, plus others at the most granular level of each dimension and at some other attribute levels, but not the really satisfying aggregations that would be hit by those heavy, costly reports. Then you start thinking about Usage-Based Optimization, but you don’t have a usage log yet, and you can’t ask the users to start generating reports and running queries, because you don’t want them to have a bad first impression of what was promised to be a fast reporting solution, especially when you know in advance the main reports they are looking for, or at least some of them.
Before you start manually designing aggregations, remember that aggregations are pre-calculations of measures at the granularity of an attribute OR A COMBINATION OF ATTRIBUTES. For example, suppose someone wants to know the total number of ALL the acts (fouls, goals, saves, tackles, etc.) performed by a soccer player. Instead of counting those acts at query execution time, which probably takes a while to complete, an aggregation in this case is a pre-counting of ALL the acts for each player, which only needs to be fetched from the hard drive or sometimes even from the cache. Now if another report should show the total number of a certain act, say fouls, committed by a certain player, you can design an aggregation that accommodates the count of each act done by each player; this is what will be illustrated in this post. Note that, in the last case, having two aggregations, one at the level of the player and another at the level of the act, will not serve that request; it has to be one aggregation at the level of the combination of the players and the acts. (Remember that to find out whether an aggregation was used, you have to run SQL Profiler and add the “Get Data From Aggregation” event; if it appears when running a query, then an aggregation was used.)
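To make the grain idea concrete, here is a rough sketch of an MDX query at the combined player-and-act grain (assuming the same cube, dimension, and measure names used in the queries later in this post). An aggregation built on the combination of [Actor] and [Act] (or a finer grain) can answer it directly, while separate aggregations on [Actor] alone or [Act] alone cannot:
  -- Count of each act for each player: the combined player-and-act grain
  SELECT [Measures].[F Events Count] ON 0,
         [D Actor Player].[Actor].CHILDREN * [D Act].[Act].CHILDREN ON 1
  FROM [FBA Cube]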
Using the last soccer cube example, here are the steps to manually design and test aggregations:
- In BIDS, under your cube, open the
Aggregations tab.
- Click the “Advanced View” icon in the Aggregations tab toolbar.
- From the “Measure Group” drop-down, choose the measure group you want to design aggregations for.
- Click “New Aggregation Design”, again in the Aggregations tab toolbar. A pop-up window will appear where you can name the aggregation design; in this example I’ll name it “Manual Aggr”.
- Pressing OK will list all the dimensions.
Expanding any of them will list its attributes.
- Click “New Aggregation” in the Aggregations tab toolbar. This will add a new column of unchecked check boxes, titled “A” followed by a sequential number (so if it is the first column it will be A0).
- In the first example I will check just the [Actor] attribute from the [D Actor Player] dimension:
- In the Aggregations tab toolbar, click “Assign Aggregation Design”. In the pop-up window, make sure “Manual Aggr” is selected from the aggregation designs drop-down menu, check the partition you want to assign the aggregation design to, and then click OK.
- Go back to the “Standard View” in the Aggregations tab. Now you have to fill in the “Estimated Partition Size” property; you can get this value from the Partitions tab.
- Reprocess the partition.
- To check whether the Manual Aggr design is being used, open SQL Server Profiler and connect to the SSAS instance. Add the “Get Data From Aggregation” event to the traced events, then start the trace.
- First I will run this very simple query, which returns every player with the total number of all the acts he did:
  SELECT [Measures].[F Events Count] ON 0,
         [D Actor Player].[Actor].CHILDREN ON 1
  FROM [FBA Cube]
- Going back to the Profiler, we will find that the aggregation has been used:
- If we change the query to get just how many fouls were committed by each player (clear the cache first, and clear the trace window if you like!):
  SELECT [Measures].[F Events Count] ON 0,
         [D Actor Player].[Actor].CHILDREN ON 1
  FROM [FBA Cube]
  WHERE [D Act].[Act].[Foul]
- This time we will not find “Get Data From Aggregation” in the trace, and this is expected because the query is at a lower grain than the only aggregation that we have.
- You may think to ADD ANOTHER aggregation on just the granularity level of the [Act] attribute from the [D Act] dimension, so you will have:
- After reprocessing the cube, rerun the last query; again you will find no “Get Data From Aggregation” in the trace, as both aggregations are at a higher granularity than the query.
- What we have to do is design an aggregation at the same granularity level as the query (or lower), i.e. one that covers the combination of the [Actor] and [Act] attributes:
- Now if you rerun the last query you will find “Get Data From Aggregation” in the trace.
Bottom line: for an aggregation to be used by the Analysis Services Storage Engine to answer a query, the aggregation’s granularity level has to be less than or equal to the granularity level of the query; otherwise it will not be used. It is also worth mentioning that the closer the aggregation’s granularity level is to the query’s granularity level, the better the performance, as the Storage Engine will have to do less work to roll up the available aggregation to the requested level of granularity and answer the Formula Engine, which in turn answers the query.
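As an illustration of that roll-up behaviour (a minimal sketch assuming the same cube and names as above), the following query is coarser than the player-and-act aggregation, the total count of one act across all players, yet it can still be answered from that aggregation by rolling it up to the [Act] level:
  -- Coarser than the player-and-act aggregation: one act, all players
  SELECT [Measures].[F Events Count] ON 0
  FROM [FBA Cube]
  WHERE [D Act].[Act].[Foul]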