






These are the lecture slides for Program Optimization for Multi Core Architectures, which cover Triangular Lower Limits, Multiple Loop Limits, Dependence System Solvers, Single Equation, Simple Test, Extreme Value Test, etc. Key points: Cycle Shrinking, Distance Varying Loops, Loop Peeling, Index Set Splitting, Loop Fusion, Loop Fission, Loop Reversal.
Typology: Slides
Cycle shrinking applies when a dependence cycle has distance > 1. It transforms a serial loop into two nested loops (outer serial, inner parallel). Consider the loop:
for i = 1, n
    A[i+k] = B[i] -
    B[i+k] = A[i] + C[i]
endfor
for i = 1, n, k
    forall j = i, i+k-1
        A[j+k] = B[j] -
        B[j+k] = A[j] + C[j]
    endforall
endfor
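The transformation above can be checked numerically. The following is a minimal sketch, not the slides' own code: it assumes the truncated constant in the first statement is 1, that the inner `forall` runs over `i .. min(i+k-1, n)`, and it models `forall` by executing the inner iterations in a shuffled order. The function names are hypothetical.

```python
import random

def serial(n, k, A, B, C):
    """Original serial loop; arrays are 1-based (index 0 is unused)."""
    A, B = A[:], B[:]
    for i in range(1, n + 1):
        A[i + k] = B[i] - 1          # the constant 1 is an assumption
        B[i + k] = A[i] + C[i]
    return A, B

def shrunk(n, k, A, B, C, rng):
    """Cycle-shrunk form: serial outer loop with step k, 'forall' inner loop."""
    A, B = A[:], B[:]
    for i in range(1, n + 1, k):
        block = list(range(i, min(i + k - 1, n) + 1))
        rng.shuffle(block)           # a forall may run its iterations in any order
        for j in block:
            A[j + k] = B[j] - 1
            B[j + k] = A[j] + C[j]
    return A, B

def demo(n=20, k=4, seed=0):
    """Return True if the shrunk loop reproduces the serial result."""
    rng = random.Random(seed)
    size = n + k + 1
    A = [rng.randint(0, 9) for _ in range(size)]
    B = [rng.randint(0, 9) for _ in range(size)]
    C = [rng.randint(0, 9) for _ in range(size)]
    return serial(n, k, A, B, C) == shrunk(n, k, A, B, C, rng)
```

Within one block of k iterations the writes go to indices `i+k .. i+2k-1` while the reads come from `i .. i+k-1`, so no iteration in the block reads what another one writes.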
for i = 3, n
    A[i] = B[i-2] -
    B[i] = A[i-3] * k
endfor
for j = 3, n, 2
    forall i = j, j+1
        A[i] = B[i-2] -
        B[i] = A[i-3] * k
    endforall
endfor
i = 3:  A3 = B1 -    B3 = A0 * k
i = 4:  A4 = B2 -    B4 = A1 * k
i = 5:  A5 = B3 -    B5 = A2 * k
i = 6:  A6 = B4 -    B6 = A3 * k
i = 7:  A7 = B5 -    B7 = A4 * k
i = 8:  A8 = B6 -    B8 = A5 * k
The distance may not be constant. The cycle may be reduced by the minimum distance.
for i = 1, n
    X[i] = Y[i] + Z[i]
    Y[i+3] = X[i-4] * W[i]
endfor
for j = 1, n, 3
    forall i = j, j+2
        X[i] = Y[i] + Z[i]
        Y[i+3] = X[i-4] * W[i]
    endforall
endfor
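As a rough check of the minimum-distance rule, the sketch below runs the X/Y loop serially and in blocks of `min(4, 3) = 3` iterations whose internal order is shuffled. It assumes the inner `forall` covers `j .. min(j+2, n)`; dictionaries are used only so that the index `i-4` can be negative. The helper names are hypothetical.

```python
import random

def run(blocks, X, Y, Z, W):
    """Execute the loop body for every index, block by block."""
    X, Y = dict(X), dict(Y)          # copies; dicts allow the negative index i-4
    for block in blocks:
        for i in block:
            X[i] = Y[i] + Z[i]
            Y[i + 3] = X[i - 4] * W[i]
    return X, Y

def demo(n=20, seed=1):
    rng = random.Random(seed)
    idx = range(-3, n + 4)
    X, Y, Z, W = ({i: rng.randint(1, 5) for i in idx} for _ in range(4))
    step = min(4, 3)                 # X is reused at distance 4, Y at distance 3
    serial = [[i] for i in range(1, n + 1)]
    shrunk = []
    for j in range(1, n + 1, step):
        block = list(range(j, min(j + step - 1, n) + 1))
        rng.shuffle(block)           # models the forall: any order inside a block
        shrunk.append(block)
    return step, run(serial, X, Y, Z, W) == run(shrunk, X, Y, Z, W)
```

Calling `demo()` returns the chosen step together with a flag saying whether the blocked execution matched the serial one.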
When two adjacent countable loops have the same loop limits, they can sometimes be fused.
Fusion reduces the cost of test and branch.
Fusing loops that refer to the same data enhances temporal locality, which has a significant impact on cache and virtual memory performance.
Loop fusion may increase the size of the loop body, which can reduce instruction locality (noticeable with very small cache memories).
Fusion is legal if all the dependence relations are preserved.
Before fusion, all relations must flow from body1 to body2 (unless carried by an outer loop).
for i = 1, n
    A[i] = B[i] +
endfor
for i = 1, n
    C[i] = A[i] /
endfor
for i = 1, n
    D[i] = 1 / C[i+1]
endfor
for i = 1, n
    A[i] = B[i] +
    C[i] = A[i] /
    D[i] = 1 / C[i+1]
endfor
After fusing all three loops the second dependence is violated: D[i] reads C[i+1] before iteration i+1 has written it.
for i = 1, n
    A[i] = B[i] +
    C[i] = A[i] /
endfor
for i = 1, n
    D[i] = 1 / C[i+1]
endfor
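The legality rule can be seen by running the three variants side by side. This is only an illustrative sketch: the constants lost from the slide are assumed to be `+ 1` and `/ 2`, and the function names and test data are made up for the example.

```python
def three_loops(B, C0, n):
    """The three separate loops; arrays are 1-based, index 0 is unused."""
    A, C, D = [0.0] * (n + 2), C0[:], [0.0] * (n + 2)
    for i in range(1, n + 1):
        A[i] = B[i] + 1              # constant 1 is assumed
    for i in range(1, n + 1):
        C[i] = A[i] / 2              # constant 2 is assumed
    for i in range(1, n + 1):
        D[i] = 1 / C[i + 1]
    return D

def fused_all(B, C0, n):
    """All three bodies in one loop: D[i] sees the OLD value of C[i+1]."""
    A, C, D = [0.0] * (n + 2), C0[:], [0.0] * (n + 2)
    for i in range(1, n + 1):
        A[i] = B[i] + 1
        C[i] = A[i] / 2
        D[i] = 1 / C[i + 1]          # C[i+1] is written one iteration later
    return D

def fused_first_two(B, C0, n):
    """Only the first two loops fused; the third stays separate."""
    A, C, D = [0.0] * (n + 2), C0[:], [0.0] * (n + 2)
    for i in range(1, n + 1):
        A[i] = B[i] + 1
        C[i] = A[i] / 2
    for i in range(1, n + 1):
        D[i] = 1 / C[i + 1]
    return D

n = 5
B = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 0.0]   # B[1..n] used
C0 = [1.0] * (n + 2)                      # initial C, including C[n+1]
reference = three_loops(B, C0, n)
```

With this data, `fused_first_two(B, C0, n)` matches `reference`, while `fused_all(B, C0, n)` does not.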
for i = 1,
    A[i] = B[i] +
endfor
for i = 1,
    C[i] = A[i+1] * 2
endfor

for i = 2,
    A[i] = B[i] +
endfor
for i = 1,
    C[i] = A[i+1] * 2
endfor

for j = 0,
    A[j+2] = B[j+2] +
    C[j+1] = A[j+2] * 2
endfor
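The slide's loop limits and one constant did not survive extraction, so the sketch below fills them with assumptions: the first loop runs over `1..n`, the second over `1..n-1`, and the added constant is 1. Under those assumptions it shows why fusing directly fails and how peeling the first iteration and shifting the index makes fusion safe. All function names are hypothetical.

```python
def two_loops(B, n):
    """Reference: the two separate loops (1-based arrays)."""
    A, C = [0] * (n + 1), [0] * (n + 1)
    for i in range(1, n + 1):
        A[i] = B[i] + 1              # constant 1 is assumed
    for i in range(1, n):            # i = 1 .. n-1 (assumed limit)
        C[i] = A[i + 1] * 2
    return A, C

def naive_fused(B, n):
    """Fusing as-is: C[i] reads A[i+1] one iteration too early."""
    A, C = [0] * (n + 1), [0] * (n + 1)
    for i in range(1, n):
        A[i] = B[i] + 1
        C[i] = A[i + 1] * 2
    A[n] = B[n] + 1
    return A, C

def peeled_and_fused(B, n):
    """Peel A[1], then fuse with the first loop shifted by one iteration."""
    A, C = [0] * (n + 1), [0] * (n + 1)
    A[1] = B[1] + 1                  # peeled first iteration
    for j in range(0, n - 1):        # j = 0 .. n-2
        A[j + 2] = B[j + 2] + 1
        C[j + 1] = A[j + 2] * 2
    return A, C

n = 5
B = [0, 1, 2, 3, 4, 5]
```

Here `peeled_and_fused(B, n)` agrees with `two_loops(B, n)`, and `naive_fused(B, n)` produces a different `C`.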
A single loop may be broken into smaller loops (the inverse of loop fusion).
Fission is used on machines that have a very small instruction cache.
It improves memory locality.
Construct a statement-level dependence graph of the body of the loop.
Dependence relations carried by an outer loop need not be preserved.
Inner loops are treated as single nodes.
If there are no cycles, loop fission can divide the loop into separate loops around each node.
The loops are ordered in topological order of the dependence graph.
for i = 1, n
    A[i] = A[i] + B[i-1]
    B[i] = C[i-1]*x + y
    C[i] = 1/B[i]
    D[i] = sqrt(C[i])
endfor
for ib = 0, n-1
    B[ib+1] = C[ib]*x + y
    C[ib+1] = 1/B[ib+1]
endfor
for ib = 0, n-1
    A[ib+1] = A[ib+1] + B[ib]
endfor
for ib = 0, n-1
    D[ib+1] = sqrt(C[ib+1])
endfor
i = n+1
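The fissioned loops can be compared against the original loop directly. The sketch below assumes the normalized loops run over `ib = 0 .. n-1` and that the last statement reads `C[ib+1]`; the B and C statements stay together because they form a dependence cycle. The input values are arbitrary positive numbers chosen so that the division and square root are defined.

```python
from math import sqrt

def original(A, B, C, n, x, y):
    """Single loop with all four statements (arrays indexed 0..n)."""
    A, B, C, D = A[:], B[:], C[:], [0.0] * (n + 1)
    for i in range(1, n + 1):
        A[i] = A[i] + B[i - 1]
        B[i] = C[i - 1] * x + y
        C[i] = 1 / B[i]
        D[i] = sqrt(C[i])
    return A, B, C, D

def fissioned(A, B, C, n, x, y):
    """Three loops in topological order of the statement dependence graph."""
    A, B, C, D = A[:], B[:], C[:], [0.0] * (n + 1)
    for ib in range(0, n):           # B and C form a cycle, so they stay together
        B[ib + 1] = C[ib] * x + y
        C[ib + 1] = 1 / B[ib + 1]
    for ib in range(0, n):           # A needs the new B values
        A[ib + 1] = A[ib + 1] + B[ib]
    for ib in range(0, n):           # D needs the new C values
        D[ib + 1] = sqrt(C[ib + 1])
    return A, B, C, D

n = 6
A0, B0, C0 = [1.0] * (n + 1), [2.0] * (n + 1), [0.5] * (n + 1)
```

Every value is produced by the same arithmetic in both versions, so `original(A0, B0, C0, n, 3.0, 1.0)` and `fissioned(A0, B0, C0, n, 3.0, 1.0)` compare equal.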
The compiler can decide to run a loop backward.
Reversal is always legal for parallel loops.
It is illegal for a sequential loop that has a loop-carried dependence.
Reversal allows loop fusion to proceed where it might otherwise fail.
for i = 1, n
    A[i] = B[i] +
    C[i] = A[i] /
endfor
for i = 1, n
    D[i] = 1 / C[i+1]
endfor
for i = n downto 1
    A[i] = B[i] +
    C[i] = A[i] /
    D[i] = 1 / C[i+1]
endfor
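A small experiment shows why running the loops backward lets the fusion go through. As before, the truncated constants are assumed to be `+ 1` and `/ 2`; the names and data are illustrative only.

```python
def two_loops(B, C0, n):
    """Reference: forward loop for A and C, then forward loop for D."""
    A, C, D = [0.0] * (n + 2), C0[:], [0.0] * (n + 2)
    for i in range(1, n + 1):
        A[i] = B[i] + 1              # assumed constant
        C[i] = A[i] / 2              # assumed constant
    for i in range(1, n + 1):
        D[i] = 1 / C[i + 1]
    return A, C, D

def fused(B, C0, n, backward):
    """Fused loop, run forward or backward depending on the flag."""
    A, C, D = [0.0] * (n + 2), C0[:], [0.0] * (n + 2)
    order = range(n, 0, -1) if backward else range(1, n + 1)
    for i in order:
        A[i] = B[i] + 1
        C[i] = A[i] / 2
        D[i] = 1 / C[i + 1]          # backward: C[i+1] was written one step earlier
    return A, C, D

n = 4
B = [0.0, 3.0, 5.0, 7.0, 9.0, 0.0]
C0 = [1.0] * (n + 2)
```

`fused(B, C0, n, backward=True)` matches `two_loops(B, C0, n)`; with `backward=False` the `D` values differ.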
Using normalized iteration vectors, the shape of the iteration space changes. The dependence distances are (0,1) and (1,-1). The second dependence prevents loop interchange, because interchanging it would give (-1,1), which is not lexicographically positive.
If normalization can prevent interchange, then un-normalization can enable loop interchange. This is called loop skewing.
Skewing changes the iteration vector of each iteration by adding the outer loop index value to the inner loop index: (i, j) becomes (i, j+i).
A dependence relation from (i1, j1) to (i2, j2) has distance (i2, j2) - (i1, j1) = (d1, d2).
After skewing, the distance changes to (i2, j2+i2) - (i1, j1+i1) = (d1, d2+d1).
In general, loops can be skewed by a factor f, changing the iteration label from (i, j) to (i, j+f*i). This changes the distance from (d1, d2) to (d1, d2+f*d1). The factor f can also be negative.
Whether to skew, and by what factor, depends on the goal, which is usually to enable other transformations.
Interchange the following loop using skewing:

for i = 2, n
    for j = 2, m
        A[i,j] = 0.5 * (A[i-1, j-1] + A[i-1, j+1])
    endfor
endfor

The two dependence distances are (1,1) and (1,-1). The second one prevents the interchange. Skewing the loop changes the dependence distances to (1,2) and (1,0), which allows the interchange. The compiler must generate the correct limits using the Fourier-Motzkin (FM) method.
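The distance arithmetic for this example can be written out as a few small helpers. This is a sketch with hypothetical function names; it treats a set of distance vectors as legal when every vector is lexicographically positive, which Python's tuple comparison against `(0, 0)` expresses directly.

```python
def skew(d, f=1):
    """Distance (d1, d2) becomes (d1, d2 + f*d1) after skewing by factor f."""
    d1, d2 = d
    return (d1, d2 + f * d1)

def interchange(d):
    """Swapping the two loops swaps the components of the distance vector."""
    return (d[1], d[0])

def legal(distances):
    """Every distance vector must stay lexicographically positive."""
    return all(d > (0, 0) for d in distances)

original = [(1, 1), (1, -1)]
skewed = [skew(d) for d in original]
```

For the loop above, `skewed` is `[(1, 2), (1, 0)]`. Interchanging the original distances gives `(-1, 1)` for the second one, so `legal` rejects it; interchanging the skewed distances gives `(2, 1)` and `(0, 1)`, which are both accepted.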
for js = 2, n+m-
    for is = max(0, js-m+2), min(n-2, js)
        i = is+
        j = js-is+
        A[i,j] = 0.5 * (A[i-1, j-1] + A[i-1, j+1])
    endfor
endfor
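Because several limits in the slide's skewed loop are truncated, the sketch below does not try to reproduce them. It derives its own limits for the un-normalized skew `js = i + j` (an assumption of this example) and checks that the skewed, interchanged loop computes the same array as the original nest.

```python
def original(A, n, m):
    """Original nest: i outer, j inner."""
    A = [row[:] for row in A]
    for i in range(2, n + 1):
        for j in range(2, m + 1):
            A[i][j] = 0.5 * (A[i - 1][j - 1] + A[i - 1][j + 1])
    return A

def skewed_interchanged(A, n, m):
    """Skewed index js = i + j as the outer loop, i as the inner loop."""
    A = [row[:] for row in A]
    for js in range(4, n + m + 1):                    # 2+2 .. n+m
        lo, hi = max(2, js - m), min(n, js - 2)       # keeps j = js - i in 2..m
        for i in range(lo, hi + 1):
            j = js - i
            A[i][j] = 0.5 * (A[i - 1][j - 1] + A[i - 1][j + 1])
    return A

n, m = 5, 6
# Rows 0..n and columns 0..m+1; the values are arbitrary test data.
A0 = [[float((31 * i + 17 * j) % 11) for j in range(m + 2)] for i in range(n + 1)]
```

`A[i-1][j-1]` belongs to an earlier `js`, and `A[i-1][j+1]` belongs to the same `js` with a smaller `i`, so both are ready when `A[i][j]` is computed and the two functions return identical arrays.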
Strip mining creates doubly nested loops out of single loops. It organizes the computation into chunks of approximately equal size. It is used to overcome size limitations of caches and local memory.
for i = 1, n
    A[i] = B[i] + C[i]
endfor
for j = 1, n, k
    for i = j, min(j+k-1, n)
        A[i] = B[i] + C[i]
    endfor
endfor
for i = 1, 16
    A[i+3] = A[i] + B[i]
endfor

for it = 1, 16, 5
    for i = it, min(16, it+4)
        A[i+3] = A[i] + B[i]
    endfor
endfor
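One way to sanity-check a strip-mined loop is to list the indices it visits. The sketch below does that for a strip size `k`, using `j+k-1` as the inclusive upper limit of each strip so that neighbouring strips do not overlap; the function name is made up for the example.

```python
def strip_mined(n, k):
    """Return the order in which a strip-mined loop visits i = 1..n."""
    visited = []
    for j in range(1, n + 1, k):                  # one iteration per strip
        for i in range(j, min(j + k - 1, n) + 1): # at most k indices per strip
            visited.append(i)
    return visited

def strip_starts(n, k):
    """First index of every strip."""
    return list(range(1, n + 1, k))
```

For the 16-iteration example with strips of 5, `strip_starts(16, 5)` is `[1, 6, 11, 16]` and `strip_mined(16, 5)` is exactly `1, 2, ..., 16`, so the original iteration order is preserved and the last strip is simply shorter.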
file:///D|/...ry,%20Dr.%20Sanjeev%20K%20Aggrwal%20&%20Dr.%20Rajat%20Moona/Multi-core_Architecture/lecture%2036/36_10.htm[6/14/2012 12:13:25 PM]
Interchange the jt loop with the i loop. The compiler finds the new lower limits for jt.

for it = -15, 45, 20
    for jt = max(-15, it), 45, 20
        for i = max(1, it), min(50, it+19)
            for j = max(i, jt), min(60, jt+19)
                A[i,j] = A[i,j] +
            endfor
        endfor
    endfor
endfor
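The tiled limits can be checked by enumerating the iterations they generate. The untiled loop is not visible in this extract, so the sketch assumes it was the triangular nest `i = 1..50`, `j = i..60`, and it reads the lower limit of the `i` loop as `max(1, it)`; both are assumptions. Under them, the tile loops should visit every point of that triangle exactly once.

```python
def tiled_points():
    """Iterations visited by the tiled nest with 20x20 tiles."""
    pts = []
    for it in range(-15, 46, 20):                 # it = -15, 5, 25, 45
        for jt in range(max(-15, it), 46, 20):    # only tiles on or above the diagonal
            for i in range(max(1, it), min(50, it + 19) + 1):
                for j in range(max(i, jt), min(60, jt + 19) + 1):
                    pts.append((i, j))
    return pts

def untiled_points():
    """Assumed original iteration space: 1 <= i <= 50, i <= j <= 60."""
    return [(i, j) for i in range(1, 51) for j in range(i, 61)]

tiled = tiled_points()
```

Comparing `sorted(tiled)` with `untiled_points()` confirms that no iteration is missed, and `len(tiled) == len(set(tiled))` confirms that none is repeated.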
Circular loop skewing is a variation of loop skewing.
It skews the inner loop iterations such that they wrap around a cylinder.
The shape of the iteration space does not change, but the relative positions change.
Backward dependencies with large distances make tiling unprofitable.
Circular loop skewing shortens backward dependencies.
for i = 0, n-
    for j = 0, n-
        A[i] = A[i] + B[i] * C[i]
    endfor
endfor
The circular loop skewing does not change the shape of the iteration space. It changes the iterations computed at each point.
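The slides do not give the mapping itself, so the following is only a guess at one common form: an iteration (i, j) of an n-by-n space is relabelled (i, (j + f*i) mod n). Under that assumption, the sketch checks that the set of points is unchanged and that a backward dependence from (i, j) to (i+1, j-d) has inner distance 0 after skewing with f = d. The names and the choice of f are hypothetical.

```python
def circular_skew(i, j, n, f=1):
    """Relabel (i, j) so the inner index wraps around modulo n (assumed form)."""
    return (i, (j + f * i) % n)

def demo(n=6, d=4):
    space = {(i, j) for i in range(n) for j in range(n)}
    image = {circular_skew(i, j, n, f=d) for (i, j) in space}

    # Backward dependence of inner distance -d: (i, j) -> (i+1, j-d).
    inner_distances = set()
    for i in range(n - 1):
        for j in range(d, n):
            _, src = circular_skew(i, j, n, f=d)
            _, dst = circular_skew(i + 1, j - d, n, f=d)
            inner_distances.add(dst - src)
    return image == space, inner_distances
```

`demo()` reports that the skewed space covers the same n-by-n square and that the only remaining inner distance for this dependence is 0.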