[Experience] C6000 optimization: an inline example

Posted 2021-3-2 09:29:56

C6x optimization, inline example:
1. Original program:

for (i = LO_CHAN; i <= HI_CHAN; i++)
{
    norm_shift = norm_l(st->ch_noise[i]);
    Ltmp = L_shl(st->ch_noise[i], norm_shift);

    norm_shift1 = norm_l(st->ch_enrg[i]);
    Ltmp3 = L_shl1(st->ch_enrg[i], norm_shift1 - 1);

    Ltmp2 = L_divide(Ltmp3, Ltmp);
    Ltmp2 = L_shr(Ltmp2, 27 - 1 + norm_shift1 - norm_shift);  // * scaled as 27,4 *

    if (Ltmp2 == 0)
        Ltmp2 = 1;

    Ltmp1 = fnLog10(Ltmp2);
    Ltmp3 = L_add(Ltmp1, LOG_OFFSET - 80807124);  // * -round(log10(2^4)*2^26) *
    Ltmp2 = L_mult(TEN_S5_10, extract_h(Ltmp3));
    if (Ltmp2 < 0)
        Ltmp2 = 0;
    // * 0.1875 scaled as 10,21 *
    Ltmp1 = L_add(Ltmp2, CONST_0_1875_S10_21);
    // * tmp / 0.375 2.667 scaled as 5,10, Ltmp is scaled 15,16 *
    Ltmp = L_mult(extract_h(Ltmp1), CONST_2_667_S5_10);
    ch_snr[i] = extract_h(Ltmp);
}

2. Optimized program:

// The loop body is too large, so it is split into two loops and the called
// functions are inlined so that the compiler can software-pipeline the code.
// L_div_tmp[] holds the intermediate values created by the split.
for (i = LO_CHAN; i <= HI_CHAN; i++)
{
    //norm_shift = norm_l(st->ch_noise[i]);
    norm_shift = _norm(st->ch_noise[i]);
    Ltmp = _sshl(st->ch_noise[i], norm_shift);

    //norm_shift1 = norm_l(st->ch_enrg[i]);
    norm_shift1 = _norm(st->ch_enrg[i]);
    //Ltmp3 = L_shl1(st->ch_enrg[i], norm_shift1 - 1);
    LLtmp1 = st->ch_enrg[i];
    LLtmp1 = LLtmp1 << (norm_shift1 + 7);
    Ltmp3 = (Word32)(LLtmp1 >> 8);

    Ltmp2 = IL_divide(Ltmp3, Ltmp);
    //Ltmp2 = L_shr(Ltmp2, 27 - 1 + norm_shift1 - norm_shift);
    Ltmp2 = (Ltmp2 >> (27 - 1 + norm_shift1 - norm_shift));

    if (Ltmp2 == 0)
        Ltmp2 = 1;
    L_div_tmp[i] = Ltmp2;
}
for (i = LO_CHAN; i <= HI_CHAN; i++)
{
    Ltmp2 = L_div_tmp[i];
    Ltmp1 = IfnLog10(Ltmp2);
    //Ltmp3 = L_add(Ltmp1, LOG_OFFSET - 80807124);
    Ltmp3 = _sadd(Ltmp1, LOG_OFFSET - 80807124);
    //Ltmp2 = L_mult(TEN_S5_10, extract_h(Ltmp3));
    Ltmp2 = _smpy(TEN_S5_10, (Ltmp3 >> 16));
    if (Ltmp2 < 0)
        Ltmp2 = 0;

    Ltmp1 = _sadd(Ltmp2, CONST_0_1875_S10_21);

    //Ltmp = L_mult(extract_h(Ltmp1), CONST_2_667_S5_10);
    Ltmp = _smpy((Ltmp1 >> 16), CONST_2_667_S5_10);
    //ch_snr[i] = extract_h(Ltmp);
    ch_snr[i] = (Ltmp >> 16);
}
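
A note on the LLtmp1 lines in the optimized program: the post does not explain them, but they appear to replace L_shl1(x, norm_shift1 - 1) with a shift left by norm_shift1 + 7 in a wider variable followed by a shift right by 8. The net shift is still norm_shift1 - 1, but the shift count never goes negative, so the norm_shift1 == 0 case (a right shift by one) needs no branch. Below is a host-side sketch for checking that arithmetic. It assumes int64_t as a stand-in for the wide LLtmp1 variable (whose declaration is not shown in the post), and the function name is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: shift x left by (n - 1), where n (0..31) is the
 * norm count of x, i.e. its number of redundant sign bits.
 * n == 0 gives a net right shift by one. */
static int32_t shl_norm_minus_1(int32_t x, int n)
{
    assert(n >= 0 && n <= 31);

    int64_t wide = x;  /* assumption: stands in for the wide LLtmp1 */

    /* Left shift by n + 7, written as a multiplication so that negative
     * values do not hit undefined behaviour in standard C. */
    wide = wide * ((int64_t)1 << (n + 7));

    /* Net shift is (n + 7) - 8 = n - 1. This relies on >> being an
     * arithmetic shift for negative values, which is the usual behaviour
     * but is implementation-defined in C. */
    return (int32_t)(wide >> 8);
}
```

When n really is the norm count of x, the result always fits in 32 bits, which is why the optimized code can drop the saturation logic of the original shift function.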

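3. Optimization notes
    Look at the loop above: the loop body is fairly large and contains two function calls, L_divide() and fnLog10(). The C62x has only 32 registers, and some of them are reserved for system use, such as B14 and B15. With a loop body this large there are not enough registers to allocate, so the compiler cannot software-pipeline the loop.

To get the loop pipelined, we need to split the loop body. When splitting, consider the following points:
  (1) How many loops should it be split into? Provided that each loop can be pipelined, the fewer loops the better. This means the amount of computation in each loop should be as close to equal as possible.
  (2) Where should the program be split? The data flow in a loop body is usually not a single stream, so at the split point intermediate variables are needed to hold the results of the earlier loop for use by the later one. Choose the split point so that as few intermediate variables as possible are needed.
  (3) Function calls inside the loop body must be defined in inline form; the compiler cannot pipeline a loop that contains a function call. The loop bodies also must not contain too many conditional branches, otherwise the compiler cannot pipeline them either. So resolve in advance every branch that can be resolved, and use intrinsics wherever possible.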
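
The splitting idea in points (1) to (3) can be sketched in plain C as follows. This is only an illustration of the pattern, not code from the original program: stage_a(), stage_b(), N_CHAN, and the two process_* functions are hypothetical names, and the arithmetic inside the stages is arbitrary.

```c
#include <assert.h>
#include <stdint.h>

#define N_CHAN 16  /* hypothetical upper bound on the channel count */

/* Hypothetical stand-ins for the two expensive stages; declared
 * static inline so no real function call remains in the loops. */
static inline int32_t stage_a(int32_t x) { return x / 3 + 1; }
static inline int32_t stage_b(int32_t x) { return x * 5 - 2; }

/* Original shape: both stages live in one large loop body. */
static void process_fused(const int32_t *in, int32_t *out, int n)
{
    for (int i = 0; i < n; i++) {
        int32_t t = stage_a(in[i]);
        out[i] = stage_b(t);
    }
}

/* Split shape: two small loops, with one array carrying the only
 * value that crosses the split point. */
static void process_split(const int32_t *in, int32_t *out, int n)
{
    int32_t tmp[N_CHAN];

    assert(n >= 0 && n <= N_CHAN);
    for (int i = 0; i < n; i++)
        tmp[i] = stage_a(in[i]);
    for (int i = 0; i < n; i++)
        out[i] = stage_b(tmp[i]);
}
```

Both versions compute the same outputs. The split version trades one extra array of loads and stores for two loop bodies that each need fewer live registers.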

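For the example above:
  (1) To make the amount of computation in each loop roughly equal, L_divide() and fnLog10() should go into two different loops. Judging by the size of the loop body, splitting into two loops looks about right.
  (2) Where should the program be split? After if (Ltmp2 == 0) Ltmp2 = 1;, because the only data used afterwards is Ltmp2, so a single array holding the Ltmp2 value of each iteration is enough.
  (3) Both function calls in the loop body, L_divide() and fnLog10(), have inline forms defined: IL_divide() and IfnLog10(). Once the branches that can be resolved have been resolved and intrinsics are used wherever possible, few branches remain in the loop bodies and they can be pipelined.
  Before optimization the program takes 2676 cycles; after optimization it takes 400 cycles. After optimization the MII (minimum iteration interval) of the two sub-loops is 14 and 6 cycles respectively.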
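
To check on a PC that replacing the basic operators with intrinsics keeps the results bit-exact, one option is to write small portable models of the intrinsics. The sketch below is based on my understanding of what _norm, _sadd, and _smpy compute; the model_* names are made up here, and the exact behaviour should be checked against the TI compiler documentation before relying on it.

```c
#include <assert.h>
#include <stdint.h>

/* Model of a NORM-style operation: number of redundant sign bits. */
static int model_norm(int32_t x)
{
    uint32_t u = (uint32_t)x;
    int n = 0;

    if (x < 0)
        u = ~u;  /* fold negative inputs onto the non-negative case */
    /* Bit 31 of u is now 0; count the clear bits directly below it. */
    for (int bit = 30; bit >= 0 && ((u >> bit) & 1u) == 0; bit--)
        n++;
    return n;
}

/* Model of a saturating 32-bit add. */
static int32_t model_sadd(int32_t a, int32_t b)
{
    int64_t s = (int64_t)a + (int64_t)b;

    if (s > INT32_MAX) return INT32_MAX;
    if (s < INT32_MIN) return INT32_MIN;
    return (int32_t)s;
}

/* Signed value of the low 16 bits of v. */
static int32_t low16(int32_t v)
{
    int32_t lo = (int32_t)((uint32_t)v & 0xFFFFu);
    return (lo >= 0x8000) ? lo - 0x10000 : lo;
}

/* Model of a saturating 16x16 multiply with a left shift by one.
 * Only the low 16 bits of each operand are used. */
static int32_t model_smpy(int32_t a, int32_t b)
{
    int64_t p = (int64_t)low16(a) * low16(b) * 2;

    assert(p >= INT32_MIN);  /* the smallest possible product is -2^31 + 65536 */
    return (p > INT32_MAX) ? INT32_MAX : (int32_t)p;
}
```

One detail to watch: this sign-bit count returns 31 for an input of 0, while the ETSI-style norm_l() reference code returns 0 for 0. Whether that matters in the loop above depends on whether ch_noise[i] or ch_enrg[i] can ever be zero.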


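Memory addressing: the Pentium and the C6000 are both 32-bit machines with a 32-bit word, but their memory addresses are organized by byte, and one word is 4 bytes. When viewing memory, two consecutive words sit at, for example, 0x1000 and 0x1004. In assembly, moving to the next word also means adding 4; in C, adding 1 to a pointer to int advances the address by 4. On the TigerSHARC, which is also a 32-bit machine, memory addresses are organized by word instead, so when viewing memory the addresses of consecutive words differ by 1.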
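
The byte-addressed case is easy to confirm with a few lines of C. This is a minimal sketch that assumes an ordinary host with 8-bit bytes; byte_stride() is a name made up for this example.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Distance, in address units, between two consecutive 32-bit words. */
static ptrdiff_t byte_stride(void)
{
    int32_t words[2] = {0, 0};
    const char *first = (const char *)&words[0];
    const char *second = (const char *)(&words[0] + 1);  /* pointer + 1 */

    assert(CHAR_BIT == 8);  /* assumption: byte-addressed, 8-bit bytes */
    return second - first;
}
```

On a byte-addressed machine byte_stride() returns 4: the pointer moved by one element, and the address moved by four. On a word-addressed machine such as the one described above, the same element step would move the address by 1.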