
VLSI SoC Design: Concepts, Perspective & Implementation




PAGES

 * Home
 * About Me
 * Computer Architecture
 * DFT
 * Low Power Methodology
 * Physical Design
 * Puzzles
 * SoC Design
 * Tcl / Perl






JUNE 14, 2019


IR DROP ANALYSIS - II


A couple of years back, I wrote about IR Drop Analysis in one of my earlier
posts. Fortunately, I got to work on IR Drop Analysis more extensively over
the past couple of months, and I thought I'd share the perspective gained from
that work in the form of a new post!



During static timing analysis, the voltage (Vdd) at all the devices is assumed
to be constant. Similarly, the ground pin (Vss) is assumed to be held at a
constant 0 V. In reality, these voltages are not constant and vary with time.
The variation on the power lines is referred to as voltage droop, and the
variation on the ground lines as ground bounce; collectively, this noise is
referred to as power noise. IR drop on data path cells impacts setup timing,
while on clock cells it may cause both setup and hold timing problems.



Voltage Droop and Ground Bounce



The robustness of the power grid needs to be tested thoroughly under two modes
of analysis, referred to as Static IR Drop and Dynamic IR Drop.

I. Static IR Drop

Static IR drop takes into account the average current drawn from the power grid
assuming average switching conditions. This analysis is performed early in the
design cycle, when simulation vectors are not yet available to the design
teams. Instead, static IR drop relies on average data switching to compute the
average current drawn from the power grid over one clock cycle.

Static IR drop can highlight power grid weaknesses in the design. Static IR
drop violations spread all across the design point to the fact that the power
grid needs to be re-designed to reduce its overall resistance. In other cases,
static IR drop violations may be concentrated around regions with inherent
power grid weaknesses, like regions with one-sided power delivery: around the
floorplan boundary, around the macros, or within the macro channels.

The power distribution network (PDN) is usually a mesh in the topmost metal
layers, with strategic drop-downs to the lower metal layers which eventually
feed the standard cells. Power is routed in the top metal layers to keep the
resistance minimal, which also ensures uniform power delivery to all parts of
the chip.

If the PDN is not designed carefully, it will result in pockets of one-sided
power delivery, which create areas of high resistance.

Power grid strengthening can be achieved by:

 * Making the power grid denser by adding wider PG straps to improve current
   conduction.
 * Incrementally inserting vias or via ladders along the power grid to drop
   from higher metal layers to lower metal layers.



Increasing the clock frequency (with or without optimizing for the higher
frequency target) has a direct impact on static IR drop, because it increases
the average current drawn from the power grid. Conversely, lowering the clock
frequency decreases the average current, and hence also decreases the static
IR drop.
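This frequency dependence falls out of the basic power relation. Here is a toy
sketch of the calculation (all numbers and the helper function are
illustrative, not from any real design or tool), using I_avg = P_avg / Vdd
with P_avg ≈ α·C·Vdd²·f:

```python
# Toy static IR drop estimate (illustrative numbers only).
# Average current over one clock cycle: I_avg = P_avg / Vdd,
# with P_avg ~ alpha * C_total * Vdd^2 * f for average switching activity.

def static_ir_drop(alpha, c_total, vdd, freq, r_grid):
    """Average current drawn from the grid and the resulting static drop."""
    p_avg = alpha * c_total * vdd ** 2 * freq   # average switching power
    i_avg = p_avg / vdd                          # average current per cycle
    return i_avg * r_grid                        # V = I * R

# Doubling the clock frequency doubles the average current, and hence
# doubles the static IR drop; lowering the frequency does the opposite.
drop_1ghz = static_ir_drop(0.2, 1e-9, 0.8, 1e9, 0.05)
drop_2ghz = static_ir_drop(0.2, 1e-9, 0.8, 2e9, 0.05)
```

With these made-up numbers the drop scales linearly with f, which is exactly
the behaviour described above.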



II. Dynamic IR Drop

Dynamic IR drop, also known as Instantaneous Voltage Drop (IVD), is the
instantaneous drop on the voltage rails caused by the high transient current
drawn from the power grid during a switching event. This analysis is usually
performed towards the end of the design cycle, when the design team has
simulation vectors available from functional or test pattern simulations. This
mode of analysis is the most time consuming, but nevertheless critical to
ensure no surprises on silicon.
Dynamic IR drop is a function of:

 * Power Distribution Network (PDN): Just like static IR drop, a weak PDN
   affects dynamic IR drop as well. A weaker power grid is not equipped to meet
   the peak current demand of switching standard cells, which exacerbates the
   dynamic IR drop.
 * Simultaneous Switching: Heavy simultaneous switching of standard cells tends
   to create local hotspots where the peak current demand is higher, causing
   the voltage to drop in these hotspots.



Potential ways to mitigate dynamic voltage drop are as follows:
 * Augmenting the power grid to minimize PG resistance- Adding more
   power/ground straps facilitates better distribution of current to the
   standard cells, thereby reducing the susceptibility to dynamic IR drop.
 * Cell Padding- Another effective way to reduce dynamic IR drop is to space
   apart cells which switch simultaneously, reducing the peak current demand
   from the power grid. This works especially well for clock cells, which tend
   to switch at the same time and be placed close together.




Cell Spacing to solve instantaneous voltage drop

 * Downsizing- Downsizing cells reduces the instantaneous current demand, with a
   possible downside on setup timing.

Downsizing cells to solve instantaneous voltage drop

 * Splitting the output capacitance- The amount of current drawn from the
   power grid is directly proportional to the output capacitance being driven.
   Splitting the output capacitance can reduce the peak current demand, and in
   most cases also improve timing.

Split output capacitance to reduce peak current drawn from the power grid

 * Inserting decap cells- Decap cells are decoupling capacitors that act as
   charge reservoirs, supplying current to the standard cells in the event of
   high demand, especially when there's simultaneous switching of cells in a
   local region. However, just like any capacitor, decaps tend to be leaky and
   add to the leakage power dissipated in the design.




Inserting decaps to minimize dynamic voltage drop
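The charge-reservoir behaviour can be turned into a rough sizing rule of
thumb. A minimal sketch, with purely hypothetical numbers, using Q = C·ΔV for
the charge a decap can deliver while the rail droops by ΔV:

```python
# Rough decap sizing sketch (hypothetical numbers, not a signoff method).
# A decap supplies Q = C * dV of charge while the local rail droops by dV.
# To cover a current spike i_peak lasting t_spike with at most dv_max droop:

def decap_needed(i_peak, t_spike, dv_max):
    """Minimum decoupling capacitance to supply the spike charge."""
    charge = i_peak * t_spike        # Q = I * t, charge drawn by the spike
    return charge / dv_max           # C = Q / dV

# e.g. a 50 mA spike for 100 ps with at most 40 mV of allowed droop:
c_min = decap_needed(50e-3, 100e-12, 40e-3)   # about 125 pF
```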



With shrinking geometries, designs are moving from gate-dominated to
wire-dominated, and operating frequencies keep increasing. More signal wires
mean fewer routing resources for the power distribution network. Moreover,
lower technology nodes allow higher packing density of standard cells, and
higher frequencies cause more switching, resulting in higher voltage droop and
higher ground bounce.

Due diligence is necessary not just to design the power grid but also to analyze
and fix the dynamic IR drop violations to avoid seeing any timing surprises on
silicon.



Posted by Naman at 5:09 PM 11 comments:
Labels: Capacitance, Cell Padding, Clock Frequency, Downsizing, Decap, Dynamic
IR Drop, Ground Bounce, Instantaneous Voltage Drop, IVD, Power Distribution
Network, Static IR Drop, Voltage Droop



DECEMBER 16, 2018


MAZE ROUTER (LEE'S ALGORITHM)


In this post, let's talk about the Maze Routing Algorithm, which is a
manifestation of the Breadth First Search (BFS) algorithm for finding the
shortest path between two nodes in a grid.

A crude version of this algorithm is also known as Lee's Algorithm. I will
discuss Lee's Algorithm, and a few improvements to it that reduce the run time
and memory usage.


Here's the problem statement. You need to connect the node S (source) and the
node T (target or the destination) with the shortest possible path. These nodes
are shown in red. The grids in blue represent a routing blockage, meaning you
cannot route over these grids. You'll need to find a way around these to reach
the destination node.



Problem Statement: Maze Router (Lee's Algorithm)

VLSI routes are laid orthogonal in X and Y direction. Diagonal (also known as
X-routing) is usually forbidden. Let's say you need to start out from node S,
you have 4 possible directions in which you can proceed:

4 possible directions from a given node

The number 1 represents the distance traveled from the source node. Once you
have traveled a distance of 1, here is how your grid looks:


Grid after traveling a distance of 1 unit

Similarly, after traveling a distance of 2 units, the grid is shown below. 



Grid after 2 iterations of Lee's algorithm



Now you've hit a wall, and as the next figure makes apparent, you cannot hop
over the wall (or the blockage). After multiple iterations of Lee's algorithm,
the grid would look something like this:

Grid after 8 iterations of Lee's algorithm


Continue doing this till you hit the target or the destination node.

Grid after you hit the target node

Now you need to backtrace from the target to the source, following
successively lower integers, to find the shortest path. Note that there may be
more than one possible choice, but every such path is guaranteed to be a
shortest one. Usually, there's a cost associated with turns (vias in a
physical context), so practically you may assign a weight or parameter to
minimize the number of turns when choosing among multiple shortest paths.

Back-tracing to find the shortest path
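The wave expansion and backtrace described above can be sketched in a few
lines of Python. The 5x5 grid, wall, source and target below are illustrative
and not the exact figure from the post:

```python
from collections import deque

def lee_route(grid, src, dst):
    """Lee's maze router: BFS wave expansion from src, then a backtrace.
    grid[r][c] is True for a blocked cell; src/dst are (row, col) tuples.
    Returns the shortest path as a list of cells, or None if unroutable."""
    rows, cols = len(grid), len(grid[0])
    dist = {src: 0}                          # distance label per reached cell
    wave = deque([src])
    while wave and dst not in dist:          # wave expansion
        r, c = wave.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols \
                    and not grid[nr][nc] and (nr, nc) not in dist:
                dist[(nr, nc)] = dist[(r, c)] + 1
                wave.append((nr, nc))
    if dst not in dist:
        return None
    path = [dst]                             # backtrace: follow labels k, k-1, ...
    while path[-1] != src:
        r, c = path[-1]
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if dist.get((nr, nc)) == dist[(r, c)] - 1:
                path.append((nr, nc))
                break
    return path[::-1]

# A 5x5 grid with a 3-cell vertical wall between S=(2,0) and T=(2,4):
blocked = [[False] * 5 for _ in range(5)]
for row in (1, 2, 3):
    blocked[row][2] = True
route = lee_route(blocked, (2, 0), (2, 4))   # detours around the wall
```

The backtrace greedily picks any neighbor with label one lower; tie-breaking
here could be extended with the turn-minimizing weight mentioned above.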

This embodiment of Lee's algorithm has high complexity, especially if the grid
is larger than the one shown in the example above. Notice how much wasteful
computation we had to perform over to the right. This can be minimized by
initiating the same computation from both the target and the source, and
back-tracing to the source and the target respectively once the two wavefronts
(the one in green and the one in yellow) intersect. This results in far lower
time complexity and much less wasteful computation.



Modification to the Lee's algorithm to start computation from both target and
the source
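The two-wavefront refinement can be sketched the same way: expand one BFS ring
from each end alternately and stop as soon as the wavefronts intersect. A
minimal sketch (returning only the path length; the full path would be
recovered by back-tracing from the meeting cell towards both ends):

```python
from collections import deque

def meet_in_middle_dist(grid, src, dst):
    """Shortest-path length via two BFS wavefronts expanded alternately
    from src and dst; stops when the wavefronts intersect."""
    rows, cols = len(grid), len(grid[0])
    dists = [{src: 0}, {dst: 0}]             # distance labels per side
    waves = [deque([src]), deque([dst])]
    side = 0
    while waves[0] or waves[1]:
        mine, other = dists[side], dists[1 - side]
        for _ in range(len(waves[side])):    # expand one full ring
            r, c = waves[side].popleft()
            if (r, c) in other:              # wavefronts intersect here
                return mine[(r, c)] + other[(r, c)]
            for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if 0 <= nr < rows and 0 <= nc < cols \
                        and not grid[nr][nc] and (nr, nc) not in mine:
                    mine[(nr, nc)] = mine[(r, c)] + 1
                    waves[side].append((nr, nc))
        side = 1 - side                      # alternate source/target sides
    return None                              # no route exists
```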



Another possible improvement concerns the memory required to save the distance
number for each node. Imagine a 10x10 grid. The worst-case distance could be
100, and you'd require 7 bits to store numbers up to 100. That means a
worst-case space complexity of 700 bits for a 10x10 grid. For a 20x20 grid,
the worst-case distance could be 400, requiring 9 bits per cell and a total
space complexity of 3600 bits. To reduce this, it's possible to count only up
to 3, then count back down to 1, and so on. Back-tracing becomes slightly more
complicated, but it saves a ton of space!
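The arithmetic above, and the saving from the up-to-3-and-back labeling, can
be checked with a quick sketch:

```python
import math

# Bits needed when every cell stores its full distance label
# (worst-case label taken as the number of cells in an n x n grid):
def full_label_bits(n):
    worst = n * n                               # e.g. 100 for a 10x10 grid
    return n * n * math.ceil(math.log2(worst))  # bits/cell * number of cells

# The up-to-3-and-back labeling replaces the full distance with a small
# repeating code, so 2 bits per cell suffice regardless of grid size:
def coded_bits(n):
    return n * n * 2
```

For 10x10 this gives 700 vs 200 bits, and for 20x20 it gives 3600 vs 800
bits, matching the numbers in the paragraph above.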














Posted by Naman at 7:50 PM 3 comments:
Labels: Global Routing, Lee's Algorithm, Maze Router, Routing



NOVEMBER 03, 2018


RC INTERCONNECT CORNERS


While PVT corners are pretty straightforward to understand, many designers are
often confused by RC corners. In this post, I'll explain the different RC
corners that we need to sign off our ASIC design on.

First, let's understand the sources of variation in R and C that lead to
multiple RC corners. Some of them are:
 * The Chemical Mechanical Planarization (CMP) process, which removes the
   excess materials deposited during manufacturing.
 * Variability of the photolithography equipment.
 * Inconsistencies during metal etching, where etching a little more or a
   little less directly impacts the thickness of the interconnect wires.

Now, let's take a look at the cross-section of semiconductor interconnects and
see how the resistance and capacitance of the wires vary.

Resistance: Resistance is directly proportional to the wire length and
inversely proportional to the cross-section area. As temperature increases,
resistance usually increases.

Capacitance: Capacitance also depends on the interconnect dimensions: it is
directly proportional to the facing area between two conductors and inversely
proportional to the distance between them (Physics 101). It comprises both
ground and coupling capacitances.










Figure 1: Cross-section of semiconductor interconnects








While the length is design dependent, the other parameters depend on the
technology node, where the minimum wire pitch, spacing and widths are defined
in the tech-file. While discussing RC corners, let's limit our focus to the
technology parameters: W, T, S and H.

Another important observation is that W and S are inversely correlated: an
increase in W means a smaller S, and vice-versa. The rest of the parameters,
W, T and H, are mutually uncorrelated. Variations in W and T manifest in
different effects on the resistance and the capacitance. The wire delay, which
is roughly a function of R*C, is not a linear function of the interconnect
width.



Figure 2: Delay vs interconnect width




There may exist a sweet spot for interconnect width where R*C is minimum;
let's call it W-opt. It varies from one technology node to another. For widths
smaller than W-opt, resistance dominates R*C, and we would see the maximum
delay at W-min. For widths greater than W-opt, capacitance dominates R*C, and
we would see the maximum delay at W-max. For widths around W-opt, it might be
difficult to say which corner, W-min or W-max, would yield the worst delay
value.

You might be looking for a straightforward answer, but you won't find one
here. :) This story was important to help you connect the dots with the
discussion on the relative strength of the aggressor and the victim, and which
case would produce the worst signal noise. The answer is: it depends. One
cannot claim that a particular RC interconnect corner would always yield the
worst noise results; it depends a lot on the victim's interconnect and the
aggressor's switching characteristics. For example, now you understand how the
victim's delay changes with the interconnect parameters W, T, H, etc. Let's
say you performed a sensitivity analysis by changing the widths of the wires
by a delta amount:


W1 = W – ΔW and W2 = W + ΔW. If the delay of the wire behaves like:

Delay at W1 < Delay at W < Delay at W2, your wire lies in the capacitance
dominated region.

Similarly, if:

Delay at W1 > Delay at W > Delay at W2, your wire lies in the resistance
dominated region.

Delay at W1 > Delay at W < Delay at W2, your wire lies across W-opt in the
graph.
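The three cases above can be wrapped into a small classifier. The delay model
below is a toy of my own (R ∝ 1/W, with coupling capacitance growing as the
spacing P − W shrinks on a fixed pitch P), purely for illustration:

```python
def classify_region(d_w1, d_w, d_w2):
    """Classify a wire from delays measured at widths W-dW, W, W+dW."""
    if d_w1 < d_w < d_w2:
        return "capacitance dominated"   # delay rises with width
    if d_w1 > d_w > d_w2:
        return "resistance dominated"    # delay falls with width
    if d_w1 > d_w < d_w2:
        return "across W-opt"            # delay minimum near this width
    return "inconclusive"

# Toy delay model on a fixed pitch P = 2: R ~ 1/W, and the coupling
# capacitance grows as the spacing shrinks, C ~ 1 + 1/(P - W).
delay = lambda w: (1.0 / w) * (1.0 + 1.0 / (2.0 - w))

narrow = classify_region(delay(0.4), delay(0.5), delay(0.6))
wide = classify_region(delay(1.7), delay(1.8), delay(1.9))
near_opt = classify_region(delay(1.20), delay(1.27), delay(1.35))
```

With this model, narrow wires come out resistance dominated, wide wires
capacitance dominated, and widths near the R*C minimum land across W-opt.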

This explains how the delays of wires change across RC corners. We have 4 RC
interconnect corners: Cmin, RCmin, Cmax, RCmax.









If your wire lies in the capacitance dominated region, it is more susceptible
to the impact of coupling capacitance, and hence to any switching activity on
the aggressor. The noise may increase or decrease depending on the relative
switching characteristics of the aggressor and the victim, as discussed in the
PVT corner section.


For example:

RCmax: Although it would be a bad design, let's say you have a very long wire
(large L) in the lower metal layers (small W, T). Resistance would dominate,
and you would see the worst delay in the RCmax corner. RCmax is usually the
most critical corner for setup timing closure. This manifests when Cc is
minimum and (R*Cg) is maximum.


RCmin: Let's say you have many min paths in your design and you're looking
for the best-case delay numbers, which can potentially result in hold time
failures. You would look at the RCmin corner, where you have many short nets
(R is therefore negligible) and the capacitance is minimum because of maximum
spacing (S) and height (H). This is usually the hold-critical corner. Cc is
maximum, and (R*Cg) is minimum.


Cmax: In the presence of noise, you would want to check the corner with the
worst coupling capacitance. That is your Cmax corner. It might also produce
the worst delay for short nets, for which the resistance is minimal. Cc is
maximum.

Cmin: Cc is minimum, R is maximum and Cg is minimum. The short nets in min
paths with minimal resistance (with or without aggressors) might see hold
violations.

I was hoping no one would notice that in the table above, S and H are shown as
positively correlated, while in reality, as also mentioned earlier, they are
inversely correlated. This assumption might make the analysis more
interesting, but I reckon EDA tools assume the worst case and add additional
pessimism to the timing analysis.

In a nutshell, it's impossible to sign off noise at just one PVT or one
interconnect corner. Designers also need to take accurate aggressor switching
activity into account in order to compute the worst noise impact. Assuming
static aggressors, and taking into account only the coupling and the ground
capacitance (Cc + Cg), may produce optimistic results, and subsequent failures
on silicon.

References:

 1.  “Parametric Analysis to Determine Accurate Interconnect Extraction Corners
    for Design Performance”, by Mutlu, Le, Molina and Celik. IEEE 2009.
 2. “Interconnect Performance Corners considering Crosstalk Noise”, by
    Gandikota, Blaauw, Sylvester. IEEE 2009.



Posted by Naman at 11:02 AM 6 comments:
Labels: Cmax, Cmin, CMP, Coupling Capacitance, Ground Capacitance, Interconnect
Delays, PVT Corner, RC Corner, RCmax, RCmin, Variations in interconnect delays



APRIL 24, 2018


FALSE PATH V/S CASE ANALYSIS V/S DISABLE TIMING


Often people have asked me the difference between set_false_path,
set_case_analysis and set_disable_timing. While the difference between these
three is quite easy to state, it's the implications that leave many designers
stumped.

Let me take a shot at explaining the difference.

1. FALSE PATH: All the timing paths which designers know won't be exercised on
the fly, and which therefore don't need to meet any timing constraints, can be
marked as false paths. Tools would still compute delays on all arcs of a false
path, and would still try to meet slope/max-fanout/max-capacitance targets for
all nodes along the path, but these paths would never surface as timing (setup
and hold) violations. However, if designers are concerned about meeting slope
and max-cap targets, they usually prefer to mark such paths with
set_multicycle_path instead.

Some examples of false paths:

Some examples of false path:



Consider the circuit above. The select lines of the two multiplexers are
complements of each other. The STA tool, however, doesn't understand this
logic and treats all nodes as X (either 0 or 1). In practice, there can never
be a timing path through

C -> E -> G
D -> F -> G

and these can be marked as false paths.

2. CASE ANALYSIS: Using set_case_analysis, any node can be constrained to a
boolean logic value of 1 or 0. All case values are evaluated and propagated
through the design. For example, if one input of an AND gate is 0 (0 being the
controlling value), the output of the AND gate is also 0, and this 0 is
propagated downstream. Timing arcs under set_case_analysis are not evaluated
and never show up in the timing reports. However, PnR tools would still fix
max-transition, max-capacitance and max-fanout violations on these nets/pins.


 * Some recent tool versions also support a case value of static, which means
   that the node will always be static (never toggle); this is used to reduce
   pessimism while doing noise analysis.

 * Case analysis is also particularly useful for DFT modes, where you would
   want to set a few configuration registers and drive the chip into a
   particular DFT mode: like atspeed, shift or stuck-at mode. This acts as an
   additional level of verification, because you'd expect to see only scan
   chains in the shift mode with scan enable being 1, only functional paths in
   the atspeed mode with scan enable being X, and only paths ending at
   functional register inputs in the stuck-at mode with scan enable being 0.
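The constant propagation described in this section can be sketched with a
three-valued gate evaluation (a simplified model of my own, not any tool's
actual engine):

```python
# Three-valued gate evaluation: 0, 1, or 'X' (unknown). A controlling
# value on one input fixes the output even when the other input is 'X'.

def and_gate(a, b):
    if a == 0 or b == 0:
        return 0                 # 0 is the controlling value of AND
    return 1 if (a, b) == (1, 1) else 'X'

def or_gate(a, b):
    if a == 1 or b == 1:
        return 1                 # 1 is the controlling value of OR
    return 0 if (a, b) == (0, 0) else 'X'

# set_case_analysis 0 on one AND input: the output becomes a constant 0,
# which keeps propagating as a constant through downstream logic.
downstream = and_gate(and_gate(0, 'X'), 'X')
```

Every node that evaluates to a constant this way drops out of timing
analysis, which is exactly why case analysis prunes the timing reports.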


3. DISABLE TIMING: This disables a particular timing arc; that arc, and any
timing path through it, is not computed. This tends to be a bit more
disruptive than false paths or case analysis, but in some cases it is
indispensable and the easiest way to achieve the intent. For example, if you
have a MUX-based divider which receives the clock signal at the select line of
the multiplexer, and two functional enables at the multiplexer inputs, the STA
tool would try to propagate the clock from the select line to the output of
the MUX. But for a MUX, the select line only controls what gets propagated to
the output: in practice, there's no timing path through it, and the
select-to-output arc should be disabled.








Both case analysis and disable timing result in fewer timing paths being
analyzed. For false paths, the tool still fixes the design rule (max-cap,
max-transition and max-fanout) violations.

Posted by Naman at 1:57 PM 4 comments:
Labels: At-Speed, DFT, Disable Timing, False Paths, Set Case Analysis, Shift,
STA, Static Timing Analysis, Stuck-At



APRIL 08, 2018


LEAKAGE POWER: INPUT VECTOR DEPENDENCE


The leakage power of a standard cell depends on various transistor parameters
like the channel length, threshold voltage, substrate or body bias voltage,
etc. Apart from these physical parameters, leakage power also depends on the
input vector applied.


Consider a 2-input NAND gate and a 3-input NAND gate. Can you arrange the
input combinations (AB = 00, 01, 10, 11 for the 2-input NAND gate, and ABC =
000, 001, 010, 011, 100, 101, 110, 111 for the 3-input NAND gate) in
increasing order of leakage current, with a word or two about the logical
reasoning behind it?


Note that the order of transistors in a stack matters here.



2-input NAND and 3-input NAND Gates













Posted by Naman at 10:38 PM 5 comments:
Labels: CMOS Nand Gate, Input Vector Dependence on Leakage Power, Leakage
Current, Leakage Power



APRIL 15, 2017


TUNING CTS RECIPE


I've been trying to debug and tune my CTS recipe for quite a few weeks now,
and this has given me basic insight into the CTS algorithm and the various
knobs available to designers for tuning their CTS results to achieve the
desired skew, transition and latency targets.

In this blog post, I'll discuss those knobs while trying my best not to go
into tool-specific commands/constructs, so as to keep the discussion
conceptual and tool independent. Before we delve any deeper into these knobs,
let's ask the basic question first: why do we need CTS to begin with, or what
goals do we expect CTS to achieve for us? The answer is: to create a balanced
clock tree. A balanced clock tree simply means minimum skew between the
sequentials in the design (of course, we are only interested in skew within
the same clock group; let me know in the comments if this part is not clear).
In addition to minimizing the skew, we would also like to achieve minimum
latency by adding the minimum number of clock buffers on the clock path,
thereby ensuring less area, less routing congestion and, most importantly, no
extra dynamic power dissipation!

Now we have the required background to discuss the CTS knobs in detail! :)


1. Creating Skew Groups: Skew groups are basically groups of sink pins (clock
end-points) which need to be balanced against each other. Some skew groups are
implicit by default; some might need to be created explicitly to help the CTS
engine. We'll take a look at some use-cases.
Default skew groups: Let's say you have 5 clocks in your design.
Group1: CLK1, CLK2 and CLK3 are synchronous to each other.
Group2: CLK4 and CLK5 are synchronous to each other.

Group1 and Group2 are logically exclusive, and therefore the clocks within
each group are implicitly asynchronous to the clocks in the other group. In
this case, by defining clock groups, we have implicitly defined skew groups.
The CTS engine would try to balance the latencies of CLK1, CLK2 and CLK3, and
independently try to balance the latencies of CLK4 and CLK5.

Sometimes, however, designers might want to create some explicit skew groups on
top of the implicit ones. Let's take a look at those use-cases.







The figure highlights the clouds of sequentials working on CLK1, CLK2 and CLK3
respectively. Assume there's heavy traffic and interaction between the CLK1
and CLK2 sequentials, while only very few sequentials working on CLK3 interact
with those working on CLK1 and CLK2. The clock enters the partition via three
different clock ports on the left side, and the distance between the CLK3 port
and the CLK3 sequentials is certainly the largest, so the CTS engine would
need to insert more clock buffers to maintain the transition (ask yourself
why: what would be the caveat if the clock transition went bad? Puzzle: Clock
Transition). Assume the average latency that CTS can manage for the CLK3
sequentials is 150 ps, while for the CLK1 and CLK2 sequentials it's 100 ps. In
order to balance these three clocks, CTS will push the clock latency of the
CLK1 and CLK2 sequentials to match the longest latency: 150 ps. If, as
designers, we know that the interaction between the CLK3 sequentials and the
CLK1/CLK2 sequentials is limited, or, even if it isn't, that we have
sufficient positive slack from a timing perspective (both hold and setup), we
don't really need to balance these three clocks. We can create a separate skew
group for the CLK3 sequentials, thereby preventing the extra latency on the
CLK1 and CLK2 trees. This helps us minimize clock tree buffers, the associated
area, routing resources and power, and perhaps even the detrimental impact of
OCVs on the uncommon clock path (read the post Common Path Pessimism for
greater insight).
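The trade-off in this example can be put into numbers with a toy calculation
(the latencies come from the example above; the helper function is purely
illustrative, not a tool command):

```python
# Achievable latencies from the example: 100 ps for the CLK1/CLK2 sinks,
# 150 ps for the far-away CLK3 sinks.

def balance(achievable):
    """Balancing one skew group pushes every clock to the worst latency."""
    worst = max(achievable.values())
    return {clk: worst for clk in achievable}

achievable = {"CLK1": 100, "CLK2": 100, "CLK3": 150}

one_group = balance(achievable)                   # everything pushed to 150 ps
split = {**balance({"CLK1": 100, "CLK2": 100}),   # CLK3 decoupled into its
         **balance({"CLK3": 150})}                # own skew group
extra_buffering = sum(one_group[c] - split[c] for c in achievable)
```

With a single skew group, CLK1 and CLK2 each pick up 50 ps of padding (100 ps
of extra buffer delay in total); the separate CLK3 skew group avoids it.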


Another case could be a hard IP in your design which is placed far away from
the rest of the sequentials working on the same clock. If you know that
there's minimal interaction between those sequentials and the hard IP, you
might want to create a separate skew group for the hard IP clock pin.







2. Sequential Clustering: (Different from Register Banking) CTS is performed
after the placement step, by which time all the sequentials and standard cells
have been placed. This placement of sequentials is invariably driven only by
data path optimization constraints. In other words, the placement engine
places sequentials at locations it finds convenient to meet timing, assuming
ideal clock distribution. As depicted in the figure below, for some reason the
placer decided to place a small bunch of sequentials working on CLK1 far away
from the port, thereby threatening to shoot up the clock latency of all the
CLK1 sequentials. Now, either you can try to create a separate skew group to
decouple these sequentials, or you can re-run placement, tightly bounding all
CLK1 sequentials together to prevent a latency (and hence clock skew)
shoot-up.







3. Clock Ordering and "dont touch subtree": You might have cases in your
design where there's clock multiplexing, let's say between functional and scan
clocks, and you need to create a clock tree for both of them.
compile_clock_tree usually works on a clock-by-clock basis. Let's say you were
smart enough to enforce the order, commanding the CTS engine to build the
clock network for the fast functional clock first and then for the slower scan
clock. That's a reasonable approach, considering that the skew, transition and
latency targets are more difficult and constrained for faster clocks; by
building the CTS for faster clocks first, you are giving the engine the leeway
to do its best possible job. However, when it tries to balance the network for
the scan clock, it can touch the functional clock network as well. One key
difference between functional and scan clocks, in addition to the difference
in clock frequencies, is that the scan clock has a greater fan-out than the
functional clocks, and therefore more scope for the CTS engine to goof up! To
prevent this, we need to do two things:



a) Enforce the CTS order to construct the clock tree for faster clocks first
and slower clocks next.
b) To prevent the slow clock from altering the clock tree network of the fast
clocks, apply a dont_touch_subtree exception on the MUX input of the slower
clock.




4. Divided Clocks and "stop_pins": By default, for all sequentials that are
flop-based dividers, the CLK pin is treated as a "non-stop pin", meaning CTS
considers the clk -> out arc of these divider flops a "through pin" and tries
to balance the latencies of the master clock and the generated clock. Now,
consider the case shown below. There are many ways to solve the problem, and
which of the two methods below gives better results would depend on the
design:






a) Create a different skew group for the sequentials placed far away. This
would decouple the sequentials placed nearby from the ones placed far away,
and the CTS engine would be able to do a decent job.

b) Another experiment well worth a shot could be defining the CLK pin of the
divider flop as a "stop_pin", so that the latency of the master clock stays in
check: CTS will treat all of its sequentials, including the divider flop, as
one group and do a relatively good job of balancing them. This avoids a
latency shoot-up of the master clock.


5. Exclude Clock from CTS: If two clocks are defined at the same pin/port with
different clock periods, whether synchronous or asynchronous, it might be a
good idea to exclude the slower clock from CTS altogether, to prevent CTS from
touching the same clock network twice and surprising you with the results.


6. Clock used as data and "exclude pins": You might have cases where the clock
is used as data inside your design. The CTS engine would be oblivious to this
fact and might go crazy while building the clock tree. In these cases, it's a
good idea to explicitly mark the beginning of the data path as an
"exclude_pin", guiding the CTS engine to exclude everything beyond it from
clock tree balancing!





I couldn't think of any more cases. If you have some interesting use cases that
I might have missed, kindly share them in the comments. :)





Posted by Naman at 1:44 PM 15 comments:
Labels: Balanced Clock Tree, Clock Balancing, Clock Latency, Clock Skew, Clock
Tree, Clock Tree Synthesis



MARCH 02, 2017


SIMULTANEOUS SETUP-HOLD CRITICAL NODE


I've got this question multiple times: how do we fix timing violations on
paths that have at least one node which is both setup critical and hold
critical simultaneously? To answer that question, one must realize that
(generally speaking) for the same PVT and the same RC corner, there cannot be
paths where all nodes are simultaneously setup and hold critical.


Let's take an example:

Test Case



Now, if we buffer at node C, the path from B to C, which was already setup
critical, will start violating.



Buffering at C



If we buffer at node A, the path from A to D, which was already setup
critical, would start violating.



What shall we do here now? Any suggestions? Thoughts? I'd like to hear from
you, and I'll post the right answer (at least one of the right answers) soon!
Just like always, I'm looking forward to engaging in the comments section
below.

Posted by Naman at 8:09 PM 14 comments:
Labels: Simultaneous Setup Hold Critical, Timing Optimization, Timing Violation
