SV - PerformanceGuidelines - Verification Academy
These guidelines are aimed at helping you identify coding idioms that are likely to affect testbench performance. Please note that a number of these guidelines run
counter to other recommended coding practices, so a balanced view of the trade-off between performance and methodology needs to be taken.
Whilst some of the code structures highlighted might be recognized and optimized out by a compiler, this may not always be the case due to the side effects of
supporting debug, interactions with PLI code and so on. Therefore, there is almost always a benefit associated with re-factoring code along the lines suggested.
SystemVerilog shares many common characteristics with mainstream software languages such as C, C++ and Java, and some of the guidelines presented here are
relevant to those languages as well. However, SystemVerilog has some unique capabilities and shortcomings which can cause the unwary user to create
low-performance, memory-hungry code without realizing it.
Tuning the performance of a testbench is made much easier by the use of code profiling tools. A code profile can identify 'hot-spots' in the code; if these places can be
refactored, the testbench is almost invariably improved. In the absence of a profiling tool, visual code inspection is required, but this takes time and concentration. These
guidelines are intended to be used before coding starts, and for reviewing code in the light of a code profile or a manual inspection.
https://verificationacademy.com/cookbook/sv/performanceguidelines 1/19
12/8/22, 3:32 PM SV/PerformanceGuidelines | Verification Academy
Contents
1 Code Profiling
2 Loop Guidelines
3 Decision Guidelines
3.1 Short-circuit logic expressions
3.2 Refactoring logical decision logic
3.3 Refactoring arithmetic decision logic
3.4 Priority encoding
4 Task and Function Call Guidelines
4.1 In-Lining Code
4.2 Task And Function Call Argument Passing
5 Class Performance Guidelines
5.1 Avoid Unnecessary Object Construction
5.2 Direct Variable Assignment Is Faster Than set()/get() Methods
5.3 Avoid Method Chains
6 Array Guidelines
6.1 Use Associative Array Default Values
7 Avoiding Work
8 Constraint Performance Guidelines
8.1 Other Constraint Examples
9 Covergroup Performance Guidelines
9.1 Bin Control
9.2 Sample Control
10 Assertion Performance Guidelines
10.1 Unique Triggering
10.2 Safety vs Liveness
10.3 Assertion Guards
10.4 Keep Assertions Simple
10.5 Avoid Using Pass And Fail Messages
10.6 Avoid Multiple Clocks
Code Profiling
Code profiling is an automatic technique that can be used during a simulation run to give you an idea of where the 'hot-spots' are in the testbench code. It is a
run-time option which, if available, will be documented in the simulator user guide. See the "Profiling Performance and Memory Use" chapter in the
Questa User Guide for more information.
When your testbench code has reached a reasonable state of maturity and you are able to reliably run testcases, then it is always worth running the profiling tool. Most
code profilers are based on sampling; they periodically record which lines of code are active and which procedural calls are in progress at a given point in time. In order
to get a statistically meaningful result, they need to be run for a long enough time to collect a representative sample of the code activity.
In a well written testbench with no performance problems, the outcome of the sampling will be a flat distribution across the testbench code. However, if the analysis
shows that a particular area of the testbench is showing up in a disproportionate number of samples then it generally points to a potential problem with that code.
With constrained random testbenches it is always worth running through alternative testcases with different seeds whilst analyzing the profiling report since these may
throw light on different coding issues.
Loop Guidelines
Loop performance is determined by the work done inside the loop body and by the overhead of checking the loop bounds. Both should be kept to a minimum. Here are some
examples of good and bad loop practices:
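A minimal sketch of the first practice, hoisting the size calculation out of the loop (hypothetical array and variable names):

```systemverilog
int data[];
int total;

// Lower performance: data.size() is re-evaluated on every iteration
for (int i = 0; i < data.size(); i++) begin
  total += data[i];
end

// Higher performance: calculate the size once, before the loop
int len = data.size();
for (int i = 0; i < len; i++) begin
  total += data[i];
end
```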
Setting a variable to the size of the array before the loop starts saves the overhead of calling array.size() on every iteration.
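The two loop forms being compared might look like this sketch (hypothetical array and accumulator):

```systemverilog
// Indexed for loop: explicit bound management on every iteration
for (int i = 0; i < data.size(); i++) begin
  total += data[i];
end

// foreach loop: no explicit bound management needed
foreach (data[i]) begin
  total += data[i];
end
```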
The foreach() loop construct is typically higher performance than for(int i = 0; i < <val>; i++) for smaller arrays.
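A sketch of the associative array lookup being hoisted out of the loop (the array name and the "gamma" key are hypothetical):

```systemverilog
int exponents[string];

// Lower performance: the associative array lookup is repeated on
// every iteration even though the key never changes
foreach (data[i]) begin
  data[i] = data[i] ** exponents["gamma"];
end

// Higher performance: look the exponent up once, before the loop
int e = exponents["gamma"];
foreach (data[i]) begin
  data[i] = data[i] ** e;
end
```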
The lookup of the exponent value in the associative array on every loop iteration is unnecessary, since it can be looked up at the beginning of the loop.
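A sketch of the two search loops (the array contents and target value are hypothetical):

```systemverilog
int values[];        // known to contain unique entries
int match_idx = -1;

// Lower performance: always iterates over the whole array
foreach (values[i]) begin
  if (values[i] == 42) match_idx = i;
end

// Higher performance: break terminates the loop at the first match
foreach (values[i]) begin
  if (values[i] == 42) begin
    match_idx = i;
    break;
  end
end
```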
In this example, an array with unique entries is searched within a loop for a given value. Using break in the second version terminates the loop
as soon as a match is found, avoiding the remaining iterations.
Decision Guidelines
When making a decision on a logical or arithmetic basis there are a number of optimizations that can help improve performance:
With an AND evaluation, if the first term of the expression is untrue, the rest of the evaluation is skipped:
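A sketch of the corresponding AND form:

```systemverilog
if(A && B && C) begin
  // do something - B and C are only evaluated if A is true
end
```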
With an OR evaluation, if the first term of the expression is true, then the rest of the evaluation is skipped:
if(A || B || C)begin
// do something
end
If the terms in the expression have a different level of "expense", then the terms should be ordered to compute the least expensive first:
Lower Performance Version

if(B.size() > 0) begin
  if(B[$] == 42) begin
    if(A) begin
      // do something
    end
  end
end

Higher Performance Version

if(A && (B.size() > 0) && B[$] == 42) begin
  // do something
end
If the inexpensive expression A evaluates untrue, then the other expensive conditional tests do not need to be made.
A slightly less obvious variant saves the computation required to arrive at a decision when C is not true.
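A sketch of the two refactorings being described, one logical and one arithmetic (the condition and variable names are hypothetical):

```systemverilog
// Logical refactor: two ANDs and an OR become one AND and one OR,
// and A short-circuits the whole expression
if((A && B) || (A && C)) begin
  // do something
end
// becomes:
if(A && (B || C)) begin
  // do something
end

// Arithmetic refactor: put the cheap condition C first so the
// expensive arithmetic is only computed when C is true
if(((X * Y) > limit) && C) begin
  // do something
end
// becomes:
if(C && ((X * Y) > limit)) begin
  // do something
end
```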
In the above example, refactoring the boolean condition removes one logical operation, and using A as a short-circuit term potentially reduces the amount of decision logic that is evaluated.
Priority encoding
If you know the relative frequency of conditions in a decision tree, move the most frequently occurring conditions to the top of the tree. This most frequently applies
to case statements and nested ifs.
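A sketch of a decision tree ordered by frequency (the opcode names and service tasks are hypothetical, assuming reads predominate):

```systemverilog
// Testing for the most frequent condition first means the case
// statement usually exits after a single comparison
case (opcode)
  READ:    do_read();
  WRITE:   do_write();
  REFRESH: do_refresh();
  default: ; // ignore other opcodes
endcase
```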
Most of the time, the case statement exits after one check, saving further comparisons.
Lower Performance Version

// ready is not valid most of the time
// read cycles predominate
//
if(write_cycle) begin
  if(addr inside {[2000:10000]}) begin
    if(ready) begin
      // do something
    end
  end
end
else if(read_cycle) begin
  if(ready) begin
    // do something
  end
end

Higher Performance Version

// ready is not valid most of the time
// read cycles predominate
//
if(ready) begin
  if(read_cycle) begin
    // do something
  end
  else begin
    if(addr inside {[2000:10000]}) begin
      // do something
    end
  end
end
In the higher performance version of this example, if ready is not valid, the rest of the code does not get evaluated. Then the read_cycle check is made, which removes
the need for the write_cycle check.
Task And Function Call Guidelines

Task And Function Call Argument Passing

In the lower performance version of this example, a queue of ints and a string are copied into the function. As the queue grows in length, this copy becomes increasingly
expensive. In the higher performance version, both the int queue and the string arguments are passed by reference, which avoids the copy operation and speeds up the
execution of the function.
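A sketch of the two versions (the function and argument names are hypothetical):

```systemverilog
// Lower performance version: the int queue and the string are
// copied into the function on every call
function automatic int count_items(int q[$], string label);
  $display("%s: %0d entries", label, q.size());
  return q.size();
endfunction

// Higher performance version: pass by reference to avoid the copy.
// const ref is preferred when the argument is not modified.
function automatic int count_items_ref(const ref int q[$],
                                       const ref string label);
  $display("%s: %0d entries", label, q.size());
  return q.size();
endfunction
```

Note that ref arguments require the function to have automatic lifetime.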
Class Performance Guidelines

Avoid Unnecessary Object Construction

//
// Lower performance version: always constructs the object
//
function bus_object get_next(bus_state_t bus_state);
bus_object bus_txn = new();
if(bus_state.status == active) begin
bus_txn.addr = bus_state.addr;
bus_txn.opcode = bus_state.opcode;
bus_txn.data = bus_state.data;
return bus_txn;
end
return null;
endfunction: get_next
//
// Higher performance version: only constructs the object when needed
//
function bus_object get_next(bus_state_t bus_state);
bus_object bus_txn;
// Only construct the bus_txn object if necessary:
if(bus_state.status == active) begin
bus_txn = new();
bus_txn.addr = bus_state.addr;
bus_txn.opcode = bus_state.opcode;
bus_txn.data = bus_state.data;
end
return bus_txn;// Null handle if not active
endfunction: get_next
It is not necessary to construct the bus transaction object unless the bus state is active; the function returns a null handle if the object is not constructed.
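A sketch of the kind of code being described, assuming bus_write_req_fifo is a mailbox (or TLM FIFO) of write_req transactions:

```systemverilog
// Lower performance: the object constructed by new() is immediately
// discarded when get() re-assigns the handle
write_req = new();
bus_write_req_fifo.get(write_req);

// Higher performance: just let get() assign the handle
bus_write_req_fifo.get(write_req);
```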
Constructing the write_req object is redundant, since its handle is immediately re-assigned by the get() from the bus_write_req_fifo.
Direct Variable Assignment Is Faster Than set()/get() Methods

Making an assignment to a data variable within a class via its hierarchical path is more efficient than calling a method to set() or get() it. However, if the set()/get()
method does more than a simple assignment - e.g. a type conversion or a check on the arguments provided - then the method approach should be used.
Note that this guideline is for performance and flouts the normal OOP guideline that data variables within a class should only be accessible via methods. Direct
access to variables improves performance, but comes at the potential cost of making the code less reusable, and relies on the assumption that the user
knows the name and type of the variable in question.
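A sketch of the comparison (the class and variable names are hypothetical):

```systemverilog
class bus_txn;
  int data;
  function void set_data(int d); data = d; endfunction
  function int  get_data();      return data; endfunction
endclass

bus_txn txn = new();

txn.set_data(42);   // Lower performance: method call overhead
txn.data = 42;      // Higher performance: direct assignment
```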
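Avoid Method Chains

A sketch of the kind of comparison being described (the class names are hypothetical):

```systemverilog
// Lower performance: a wrapper class adds a method-call layer
// around every mailbox operation
class txn_channel;
  mailbox #(bus_txn) m = new();
  task put(bus_txn t); m.put(t); endtask
  task get(output bus_txn t); m.get(t); endtask
endclass

// Higher performance: extend the mailbox directly and inherit
// put()/get() with no extra layer
class txn_mailbox extends mailbox #(bus_txn);
endclass
```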
The second implementation extends the mailbox directly and avoids the extra layer in the first example.
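A sketch of an unrolled method chain (the handle and method names are hypothetical):

```systemverilog
// Lower performance: each call in the chain is a separate method
// invocation returning an intermediate handle
count = env.get_agent().get_monitor().get_txn_count();

// Higher performance: a single method returns the result directly
count = env.get_txn_count();
```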
In the first example, a function call is implemented as a chain, whereas the second example uses a single method and will have higher performance. Your code may be
more complex, but it may contain method call chains that you could unroll.
Array Guidelines
SystemVerilog has a number of array types with different characteristics, and it is worth considering which type of array is best suited to the task in hand. The
following summarizes the considerations.
Dynamic array - int a_ray[];
    Array size determined/changed during simulation. Array indexing is efficient.
    Index is by integer. Managing size is important.
    Memory overhead: less.

Associative array - int a_ray[string];
    Index is by a defined type, not an integer; non-integer indexing can raise abstraction.
    Efficient for sparse storage or random access.
    Has methods to aid management. Elements can be deleted, but the array becomes more inefficient as it grows.
    Memory overhead: more.

Queue
    Sized or unsized at compile time, grows with use.
For example, it may be more efficient to model a large memory space that has only sparse entries using an associative array rather than a static array. However, if
the associative array acquires a large number of entries, it may become more efficient to use a fixed-size array to model the memory
space.
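A sketch of a sparse memory model built on an associative array (the names, widths and default value are hypothetical):

```systemverilog
bit [31:0] sparse_mem [bit [31:0]];  // keyed by 32-bit address

function automatic bit [31:0] mem_read(bit [31:0] addr);
  // Only written locations consume storage; unwritten locations
  // return a default value
  if (sparse_mem.exists(addr))
    return sparse_mem[addr];
  return '0;
endfunction

function automatic void mem_write(bit [31:0] addr, bit [31:0] data);
  sparse_mem[addr] = data;
endfunction
```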
Avoiding Work
The basic principle here is to avoid doing something unless you have to. This can manifest itself in various ways:
1. Minimize the number of active rand variables - if a value can be calculated from other random fields then it should not be rand
2. Use minimal data types - i.e. bit instead of logic; tune vector widths to the minimum required
3. Use hierarchical class structures to break down the randomization; use a short-circuit decision tree to minimize the work
4. Use late randomization to avoid unnecessary randomization
5. Examine repeated use of in-line constraints - it may be more efficient to extend the class
6. Avoid the use of arithmetic operators in constraints, especially the *, / and % operators
7. Implication operators are bidirectional; using solve before enforces the probability distribution of the before term(s)
8. Use the pre_randomize() method to pre-set or pre-calculate state variables used during randomization
9. Use the post_randomize() method to calculate variable values that are dependent on random variables
10. Consider whether there is an alternative, less complicated way of writing a constraint
The best way to illustrate these points is through an example - note that some of the numbered points above are referenced as comments in the code:
MONO_inside_c.constraint_mode(0);
MONO_size_c.constraint_mode(0);
YCbCr_inside_c.constraint_mode(0);
YCbCr_size_c.constraint_mode(0);
RGB_inside_c.constraint_mode(0);
RGB_size_c.constraint_mode(0);
mode = vid_type;
case (vid_type)
MONO : begin
this.MONO_inside_c.constraint_mode(1);
this.MONO_size_c.constraint_mode(1);
end
YCbCr: begin
this.YCbCr_inside_c.constraint_mode(1);
this.YCbCr_size_c.constraint_mode(1);
end
RGB : begin
this.RGB_inside_c.constraint_mode(1);
this.RGB_size_c.constraint_mode(1);
end
default : `uvm_error(get_full_name(),
                     "No valid video format selected")
endcase
function new(string name = "video_frame_item");
super.new(name);
endfunction
endclass: video_frame_item
The two code fragments are equivalent in functionality but differ dramatically in execution time. The re-factored code makes a number of changes which speed
up the generation process:
In the original code, the size of the array is calculated by randomizing two variables - length and array size. This is not necessary since the video frame is a fixed size
that can be calculated from other properties in the class.
The length of the array is calculated using a multiplication operator inside a constraint, which is expensive for the solver.
In the first example, the content of the data array is calculated by the constraint solver inside a foreach() loop. This is unnecessary and is expensive for larger arrays.
Since these values are within a predictable range they can be generated in the post_randomize() method.
The enum types live_freeze_t and video_mode_e will have an underlying integer type by default; the refactored version uses the minimal bit types possible.
The original version uses a set of constraint_mode() and rand_mode() calls to control how the randomization works; this is generally less effective than coding the
constraints to take state conditions into account.
In effect, the only randomized variable in the final example is the live_freeze bit.
The first version of the constraint uses a modulus operator to set the lowest two bits to zero; the second version does this directly, avoiding an expensive arithmetic
operation.
Other Constraint Examples

Lower Performance Version

typedef enum bit[3:0] {ADD, SUB, DIV, OR, AND, XOR, NAND, MULT} opcode_e;
opcode_e ins;
constraint select_opcodes_c {
  ins dist {ADD:=7, SUB:=7, DIV:=7, MULT:=7};
}

Higher Performance Version

typedef enum bit[3:0] {ADD, SUB, DIV, OR, AND, XOR, NAND, MULT} opcode_e;
opcode_e ins;
constraint select_opcodes_c {
  ins inside {ADD, SUB, DIV, MULT};
}
The two versions of the constraint are equivalent in the result they produce, but the first one forces a distribution to be solved, which is much more expensive than
limiting the ins value to being inside a set.
Covergroup Performance Guidelines

Bin Control
Each coverpoint automatically translates to a set of bins, or counters, for each of the possible values of the variable sampled in the coverpoint. This would equate to
2**n bins, where n is the number of bits in the variable, but it is typically limited by the SystemVerilog auto_bin_max coverage option to a maximum of 64 bins to avoid
problems with naive coding (think about how many bins a coverpoint on a 32 bit int would produce otherwise). It pays to invest in covergroup design: creating bins
that yield useful information will usually reduce the number of bins in use, and this will help with performance. Covergroup cross product terms also have the potential
to explode, but there is syntax that can be used to eliminate terms.
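A sketch of the kind of refactor being described, assuming A and B are 8-bit variables (the bin ranges shown are hypothetical):

```systemverilog
bit [7:0] A, B;

// Lower performance: automatic bins - up to auto_bin_max bins each
// for A and B, plus a large cross that is hard to interpret
covergroup auto_cg @(posedge clk);
  A_cp: coverpoint A;
  B_cp: coverpoint B;
  AxB:  cross A_cp, B_cp;
endgroup

// Higher performance: user-defined bins - 48 bins each for A and B,
// 48*48 = 2304 for the cross, mapped to meaningful value ranges
covergroup user_cg @(posedge clk);
  A_cp: coverpoint A { bins a_vals[48] = {[8:247]}; }
  B_cp: coverpoint B { bins b_vals[48] = {[8:247]}; }
  AxB:  cross A_cp, B_cp;
endgroup
```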
In the first covergroup example, the defaults are used. Without the auto_bin_max limit in place, there would be 256 bins each for A and B and 256*256 bins for
the cross, and the results are difficult to interpret. With auto_bin_max set to 64, this reduces to 64 bins for A, for B and for the cross product, which saves on performance but
makes the results even harder to understand. The second covergroup example creates user-defined bins, which reduces the number of bins to 48
for A and B and 2304 for the cross. This improves performance and makes the results easier to interpret.
Sample Control
A common error with covergroup sampling is to write a covergroup that is sampled on a fixed event, such as a clock edge, rather than at a time when the values
sampled in the covergroup are valid. Covergroup sampling should only occur when the desired testbench behavior has occurred and the covergroup
variables are stable. Careful attention to covergroup sampling improves the validity of the results obtained as well as the performance of the
testbench.
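A sketch of the two approaches being contrasted (the data and valid signals are hypothetical):

```systemverilog
// Lower performance: sampled on every rising clock edge; the iff
// guard only decides whether the bins increment
covergroup data_cg @(posedge clk);
  coverpoint data iff (valid);
endgroup

// Higher performance: no clocking event; sample() is called only
// when valid is true
covergroup data_cg_on_demand;
  coverpoint data;
endgroup

data_cg_on_demand cov = new();

always @(posedge clk)
  if (valid) cov.sample();
```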
In the first example, the covergroup is sampled on the rising edge of the clock, and the iff(valid) guard determines whether the bins in the covergroup are incremented
or not; the covergroup is therefore sampled regardless of the state of the valid line. In the second example, the built-in sample() method is used to sample the
covergroup only when the valid flag is set. This will yield a performance improvement, especially if valid is infrequently true.
Assertion Performance Guidelines

Unique Triggering
The condition that starts the evaluation of a property is checked every time it is sampled. If this condition is ambiguous, then an assertion could have multiple
evaluations in progress, which will potentially lead to erroneous results and will definitely place a greater load on the simulator.
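A sketch of the two triggering styles (the req and ack signals and the response window are hypothetical):

```systemverilog
// Lower performance: while req remains high, a new evaluation
// starts on every clock edge
property p_req_level;
  @(posedge clk) req |=> ##[0:3] ack;
endproperty

// Higher performance: $rose(req) is a discrete event, so only one
// evaluation starts per request
property p_req_edge;
  @(posedge clk) $rose(req) |=> ##[0:3] ack;
endproperty
```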
In the first example, the property is triggered every time the req signal is sampled at logic 1, which leads to multiple triggers of the assertion. In the second
example, the property is triggered on the rising edge of req, which is a discrete event. Another strategy for ensuring that triggering is unique is to pick unique
events, such as states that are known to be valid for only one clock cycle.
Safety vs Liveness
A safety property is one that has a bound in time - e.g. two clocks after req goes high, rsp shall go high. A liveness property is not bound in time - e.g. rsp shall go high
at some point following req going high. When writing assertions it is important to consider the lifetime of the check that is in progress; performance is affected by assertions
being kept in flight because there is no bound on when they complete. Most specifications should define some kind of time limit for something to happen, or there will
be some kind of practical limit that can be applied to the property.
Lower Performance Version

property req_rsp;
  @(posedge clk)
  $rose(req) |=>
    (req & ~rsp)[*1:2]
    ##1 (req && rsp)[->1] // Unbounded condition - within any number of clocks
    ##1 (~req && ~rsp);
endproperty: req_rsp

Higher Performance Version

property req_rsp;
  @(posedge clk)
  $rose(req) |=>
    (req & ~rsp)[*1:4]    // Bounds the condition to within 1-4 clocks
    ##1 (req && rsp)
    ##1 (~req && ~rsp);
endproperty: req_rsp
Assertion Guards
Assertions can be disabled using the disable iff (condition) guard construct. This makes sure that the property is only evaluated while the condition is false, which means
that it can be switched off using a state variable. This is particularly useful for filtering assertion evaluation during reset, or at a time when an error is deliberately
injected. Assertions can also be disabled using the system tasks $assertoff() and $asserton(), which can be called procedurally from within SystemVerilog testbench code.
These features can be used to manage overall performance by de-activating assertions when they are not valid or not required.
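A sketch of both mechanisms (the property, signal and label names are hypothetical):

```systemverilog
// disable iff aborts any evaluation in progress while the guard
// condition is true
property p_req_ack;
  @(posedge clk) disable iff (!rst_n || inject_error)
  $rose(req) |=> ##[0:3] ack;
endproperty
REQ_ACK: assert property (p_req_ack);

// Procedural control from the testbench:
initial begin
  $assertoff;   // suspend assertions, e.g. during reset
  wait (rst_n);
  $asserton;    // re-enable once reset is released
end
```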
Avoid Using Pass And Fail Messages

Note that even leaving an empty begin ... end for the pass clause causes a performance hit.
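For example (p_req_ack is a hypothetical property):

```systemverilog
// Lower performance: even an empty pass action block is scheduled
// on every successful evaluation
A1: assert property (p_req_ack) begin end
    else $error("req was not followed by ack");

// Higher performance: omit the pass clause entirely
A2: assert property (p_req_ack)
    else $error("req was not followed by ack");
```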
The Verification Methodology Cookbook content is provided by Mentor Graphics' Verification Methodology Team. Please contact us
(mailto:[email protected]?subject=verification-methodology-cookbook-feedback-SV/PerformanceGuidelines) with any enquiries, feedback, or
bug reports.