From Circuits to Verilog | Digital Circuit Design: Theory, Circuits and Code in a "Trinity"

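Picking up where we left off: the previous installment introduced the basics of sequential logic circuits and how to write them in code. Today's lecture is even better. It covers three common patterns in digital circuit design, with theory, circuits and code forming a "trinity".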

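1. Run in Parallel, Apply with Flexibility
In digital logic system design there are three very practical techniques: parallelization, pipelining and time-division multiplexing. All three are covered in detail in 《玩转 IP core》/《IP 核芯志》, so I will not go over the theory again here and will focus only on the code.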

Let me first use an everyday example to illustrate the idea of "parallelization".

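It is midsummer. Cicadas chirp nonstop in the treetops, and eating watermelon under the blazing sun is a real treat. Unfortunately, Xiao Ming not only gets no watermelon, he has to haul it. Here is what happened: Xiao Ming's dad, Da Ming, bought 30 watermelons at the wholesale market to take home and eat. But something urgent came up at his company, so he could not carry them home himself, and the all-capable Xiao Ming was sent to bring them home instead. Xiao Ming is still a primary-school pupil and not very strong, so he can carry only one watermelon at a time, a plot detail that fits the scene well enough. Carrying one watermelon from the market to home takes 10 minutes, so if Xiao Ming did it alone it would take 300 minutes in total, i.e. 5 hours. That is far too much work.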

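Xiao Ming, however, is a clever kid. He counted on his fingers and hit upon a plan: he called over his brothers Xiao Liang and Xiao Guang, his good friends Xiao Hei and Xiao An, and his girlfriend Xiao Hong, so that everyone could carry watermelons together. Many hands make light work: 30 watermelons split among 6 people is 5 trips each, about 50 minutes, so the job is done in under an hour. That is the basic idea; now let's pin down a few details.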

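But how should the watermelons be carried? Cutting them up is obviously out. Anyone with a working head knows that carrying them whole, one at a time, is the most reliable way. Next, is there a required order? Not in principle: nothing says watermelon No. 4 may not be touched until No. 3 has been moved. In practice, though, the watermelons are piled in a heap, so they still have to be taken from the outside in and from the top down. Finally, in what order should everyone work? You finish, then I start? Buddy, did you forget to top up your IQ card? Obviously it is best for everyone to carry in parallel. That concludes the basic design; next, to raise the difficulty, let's consider the details at the watermelon pile and at Xiao Ming's home.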

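Both the watermelon pile and Xiao Ming's home have the same problem: the space is small and cannot hold 6 people at once. What to do? Simple: take turns. The rule is that they go in the order Xiao Ming, Xiao Liang, Xiao Guang, Xiao Hei, Xiao An and finally Xiao Hong, each carrying watermelons in turn, with 1 minute between consecutive people.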

With that, every obstacle to carrying the watermelons is gone. This watermelon-carrying procedure applies the idea of "parallelization".

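Abstract models always feel vague and insubstantial, and beginners definitely need an example to see how things are actually implemented. All right, here is an engineering example to satisfy you.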

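Consider the simplified accumulator shown in Figure 4, which ignores overflow. The system requirements specify a 7-bit input, and an accumulated result must be output once for every 128 input samples. To be safe, the output therefore needs 14 bits.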

Figure 4: The accumulator to be "parallelized"

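Suppose the work is split into 4 branches. Each branch then accumulates only 32 of the 128 samples, so the adder in each branch needs only 12 bits. This is an unplanned bonus; parallelization does not usually give you this.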
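A quick check of these widths, assuming unsigned 7-bit samples (maximum value 127) and 4 branches of 32 samples each. In general, accumulating N samples of W bits needs W + log2(N) bits when N is a power of two, because N(2^W - 1) < 2^(W + log2 N):

$$
\begin{aligned}
128 \times (2^7 - 1) &= 16256 < 2^{14} = 16384 &&\Rightarrow\ 7 + \log_2 128 = 14 \text{ bits}\\
32 \times (2^7 - 1) &= 4064 < 2^{12} = 4096 &&\Rightarrow\ 7 + \log_2 32 = 12 \text{ bits}
\end{aligned}
$$

By the same rule, the 8-bit input_data port in Example 3 would need 8 + 5 = 13 bits per branch and 8 + 7 = 15 bits in total; the code declares 15-bit branch sums and a 17-bit total, so it keeps some spare headroom.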

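Figure 5 shows the overall structure of the system: an input-distribution module that dispatches samples according to the counter's code, and an output-merging module that computes the combined total continuously. The different modules are outlined with dashed boxes. This is the first time you meet a system built from several modules, so please pay attention and work through it carefully yourself.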

Figure 5: Structure of the parallelized accumulator, version 1: the timer value controls the sampling phase

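The code corresponding to Figure 5 is given in Example 3. Since this is the first code you have seen for this design pattern, I spent a bit more paper and copied it out in full; please don't blame this humble Taoist for the waste.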

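[Example 3] Code of the parallelized accumulator: the timer value controls the sampling phase
module sum_parallel_timer
(
    input[7:0] input_data,
    input data_start,
    //One-cycle start pulse; input_data is sampled on the 128 clock edges that follow it
    input CLK, input RST,
    output reg[16:0] sum,
    output reg sum_enable
    //One-cycle pulse: sum holds the total of the 128 samples
);

//Definition for Variables in the module
reg[7:0] count;
//Sample counter: counts down from 131 to 0 (needs 8 bits)
reg[2:0] count1;
//3-bit timer that delays sum_enable until the total has settled

reg[7:0] data1, data2, data3, data4;
//Sample for 4 parallel operation modules
reg [14:0] sum1, sum2,sum3, sum4;
//Result (sum) for 4 parallel operation modules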

//Load other module(s)

//Logical

always @(posedge CLK or negedge RST)
//Sample counter: 131 down to 0
begin
    if (!RST)
    //Reset
    begin
        count <= 8'h00;
    end
    else if(data_start)
    begin
        count <= 8'h7f + 8'h04;
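        //0x7f + 4 = 131: samples are taken while count runs from 131 down to 4 (128 data).
        //The extra counts let each lane add its last sample, because a lane sum adds
        //its data register one round (4 clocks) after the sample was captured.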
    end
    else if (count != 8'h00)
    begin
        count <= count - 8'h01;
    end
    else
    begin
        count <= 8'h00;
    end
end

always @(posedge CLK or negedge RST)
//3-bit timer for the output-valid pulse
begin
    if (!RST)
    //Reset
    begin
        count1 <= 3'b000;
    end
    else if(count == 8'h01)
    begin
        count1 <= 3'b111;
    end
    else if (count1 != 3'b000)
    begin
        count1 <= count1 - 3'b001;
    end
end

always @(posedge CLK or negedge RST)
//Sample for each operation
begin
    if (!RST)
    //Reset
    begin
        data1 <= 8'h00;
        data2 <= 8'h00;
        data3 <= 8'h00;
        data4 <= 8'h00;
    end
    else if(count != 8'h00)
    begin
        case (count[1:0])
         2'b11: data1 <= input_data;
         2'b10: data2 <= input_data;
         2'b01: data3 <= input_data;
         2'b00: data4 <= input_data;
        endcase
    end
    else
    begin
        data1 <= 8'h00;
        data2 <= 8'h00;
        data3 <= 8'h00;
        data4 <= 8'h00;
    end
end

always @(posedge CLK or negedge RST)
//Parallel sum for each operation
begin
    if (!RST)
    //Reset
    begin
        sum1 <= 15'h0000;
        sum2 <= 15'h0000;
        sum3 <= 15'h0000;
        sum4 <= 15'h0000;
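    end
    else if (data_start)
    //Clear the lane sums at the start of each frame; otherwise every new
    //frame would be added on top of the previous total.
    //(Assumes frames do not overlap: the next data_start follows sum_enable.)
    begin
        sum1 <= 15'h0000;
        sum2 <= 15'h0000;
        sum3 <= 15'h0000;
        sum4 <= 15'h0000;
    end
    else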
    begin
        case (count[1:0])
         2'b11: sum1 <= sum1 + {7'h00, data1};
         2'b10: sum2 <= sum2 + {7'h00, data2};
         2'b01: sum3 <= sum3 + {7'h00, data3};
         2'b00: sum4 <= sum4 + {7'h00, data4};
        endcase
    end
end

always @(posedge CLK or negedge RST)
//Sum
begin
    if (!RST)
    //Reset
    begin
        sum <= 17'h0_0000;
    end
    else
    begin
        sum <= {2'h0,sum1} + {2'h0,sum2} + {2'h0,sum3} + {2'h0,sum4};
    end
end

always @(posedge CLK or negedge RST)
//Output valid flag
begin
    if (!RST)
    //Reset
    begin
        sum_enable <= 1'b0;
    end
    else if (count1 == 3'b001)
    begin
        sum_enable <= 1'b1;
    end
    else
    begin
        sum_enable <= 1'b0;
    end
end
endmodule

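2. The Water Flows On, Rushing Without End
Continuing from the last lecture, we are still carrying watermelons. The previous design was made by someone with no human touch. Look: it requires everyone in the system to keep their heads down and run laps. People never even see each other, let alone chat. This is a textbook "anti-human" design that treats people as machines. "People first" is mere lip service at many companies, but for this old fogey it is a principle that must be upheld.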

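My system design is this: since there are six carriers, Xiao Ming and the others, the route is divided into 6 segments. Each person is responsible for one segment and, at the handover point, passes the watermelon face to face to the next person. Besides handing over watermelons, the two can of course have a little chat. The watermelons are passed along like this until they reach Xiao Ming's home. There is an added benefit: if a watermelon is dropped and smashed, it is obvious who broke it. In this old man's system, whoever owns the patch of ground where it broke is responsible.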

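Here a multiplier serves as the example for code design in the pipeline pattern. By expanding b in binary and applying the distributive law (together with commutativity and associativity), a multiplication can be turned into as many additions as b has bits. The addend of the k-th addition is the product of bit k of b (b_k) and a shifted left by k bits (a<<k); and since b_k can only take two values, this product can be implemented with a selector. Carrying on in this way, we arrive at the pipelined multiplier structure shown in Figure 6.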

Figure 6: Structure of the pipelined multiplier
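Written out as formulas (a sketch of the decomposition just described, for 8-bit unsigned a and b): each b_k selects either a << k or 0, and pipeline step k+1 adds one such term to the partial result r_k:

$$
a \times b = \sum_{k=0}^{7} b_k\,(a \ll k),
\qquad
b_k\,(a \ll k) =
\begin{cases}
a \ll k, & b_k = 1\\
0, & b_k = 0
\end{cases}
$$

$$
r_0 = 0,\qquad r_{k+1} = r_k + b_k\,(a \ll k),\qquad a \times b = r_8,
\qquad r_k \le 255\,(2^{k} - 1) < 2^{8+k}
$$

So after step k the partial result fits in 8 + k bits, which is why result2 through result8 in Example 4 grow from 10 to 16 bits; result1 needs only 8 bits because it is either a or 0.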

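The code of the pipelined multiplier is given in Example 4. Old Taoist Mu wrote it "horizontally"; to show the difference, this monk writes it "vertically" this time, as a chain of one module per pipeline stage. Only the key parts of the code are shown here.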

[Example 4] Code of the pipelined multiplier (partial)
module multiplication_pipeline
//Multiplication in pipeline: 8-bit x 8-bit unsigned,
//product is valid 8 clock cycles after a and b are applied
(
    input[7:0] a,
    input[7:0] b,
    input CLK, input RST,
    output[15:0] product
);

//Definition for Variables in the module
wire[7:0] a0, a1, a2, a3, a4, a5, a6, a7;
//In a's delay chain
wire[7:0] b0;
wire[6:0] b1;
wire[5:0] b2;
wire[4:0] b3;
wire[3:0] b4;
wire[2:0] b5;
wire[1:0] b6;
wire b7;
//In b's delay chain
wire [7:0] result1;
wire [9:0] result2;
wire [10:0] result3;
wire [11:0] result4;
wire [12:0] result5;
wire [13:0] result6;
wire [14:0] result7;
wire [15:0] result8;
//In Result's Chain

//Load other module(s)
mul_pipe_step1 M1(.a_prev(a0),.b_prev(b0),
                  .CLK(CLK), .RST(RST),
                  .a_next(a1),.b_next(b1),.result_next(result1));

mul_pipe_step2 M2(.a_prev(a1),.b_prev(b1),.result_prev(result1),
                  .CLK(CLK), .RST(RST),
                  .a_next(a2),.b_next(b2),.result_next(result2));

……

mul_pipe_step8 M8(.a_prev(a7),.b_prev(b7),.result_prev(result7),
                  .CLK(CLK), .RST(RST),
                  .result_next(result8));
//Pipeline chain

//Logical
assign a0 = a;
assign b0 = b;
assign product = result8;

endmodule

module mul_pipe_step1
//Step1 in pipeline
(
    input[7:0] a_prev,
    input[7:0] b_prev,
// input result_prev,
    input CLK, input RST,
    output reg[7:0] a_next,
    output reg[6:0] b_next,
    output reg[7:0] result_next
);

//Definition for Variables in the module

//Load other module(s)

//Logical

always @(posedge CLK or negedge RST)
//Storing for input delay: pass a on, drop the consumed bit of b
begin
    if (!RST)
    //Reset
    begin
        a_next <= 8'h00;
        b_next <= 7'h00;
    end
    else
    begin
        a_next <= a_prev;
        b_next <= b_prev[7:1];
    end
end

always @(posedge CLK or negedge RST)
//Partial product for bit 0 of b: a or 0 (a selector, no multiplier)
begin
    if (!RST)
    //Reset
    begin
        result_next <= 8'h00;
    end
    else
    begin
        if (b_prev[0])
        begin
            result_next <= a_prev;
        end
        else
        begin
            result_next <= 8'h00;
        end
    end
end

endmodule

module mul_pipe_step2
//Step2 in pipeline
(
    input[7:0] a_prev,
    input[6:0] b_prev,
    input[7:0] result_prev,
    input CLK, input RST,
    output reg[7:0] a_next,
    output reg[5:0] b_next,
    output reg[9:0] result_next
);

//Definition for Variables in the module

//Load other module(s)

//Logical

always @(posedge CLK or negedge RST)
//Storing for input delay
begin
    if (!RST)
    //Reset
    begin
        a_next <= 8'h00;
        b_next <= 6'h00;
    end
    else
    begin
        a_next <= a_prev;
        b_next <= b_prev[6:1];
    end
end

always @(posedge CLK or negedge RST)
//Add (a << 1) if bit 1 of the original b is set
begin
    if (!RST)
    //Reset
    begin
        result_next <= 10'h000;
    end
    else
    begin
        if (b_prev[0])
        begin
            result_next <={2'b0, result_prev} + {1'b0, a_prev, 1'b0};
            // = result + (a << 1)
            //max bit width 10
        end
        else
        begin
            result_next <= {2'b0, result_prev};
        end
    end
end

endmodule

…….

module mul_pipe_step8
//Step8 in pipeline
(
    input[7:0] a_prev,
    input b_prev,
    input[14:0] result_prev,
    input CLK, input RST,
// output reg[7:0] a_next,
// output reg b_next,
    output reg[15:0] result_next
);

//Definition for Variables in the module

//Load other module(s)

//Logical

always @(posedge CLK or negedge RST)
//Add (a << 7) if bit 7 of the original b is set
begin
    if (!RST)
    //Reset
    begin
        result_next <= 16'h0000;
    end
    else
    begin
        if (b_prev)
        begin
            result_next <={1'b0, result_prev} + {1'b0, a_prev, 7'b000_0000};
            // = result + (a << 7)
        end
        else
        begin
            result_next <= {1'b0, result_prev};
        end
    end
end

endmodule
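Because steps 3 to 7 are omitted in Example 4, it may help to see the whole pattern in one place. The following is a hypothetical, parameterized sketch (the module name mul_pipe_generic and the parameter W are my own, not from the original code) that uses a generate loop to build one pipeline stage per bit of b. To keep it short, every stage carries a full 2*W-bit partial result and the whole of b, instead of the shrinking widths of the hand-written version, so it spends a few more flip-flops.

```verilog
// Hypothetical sketch: W x W unsigned shift-and-add multiplier with one
// pipeline stage per bit of b (same idea as Example 4, generated by a loop).
module mul_pipe_generic #(
    parameter W = 8
) (
    input              CLK,
    input              RST,       // active-low asynchronous reset, as in Example 4
    input      [W-1:0] a,
    input      [W-1:0] b,
    output [2*W-1:0]   product    // a * b, valid W clock cycles after a and b are applied
);
    // Index 0 of each array is the module input; index k+1 is the output of stage k.
    // Stage k holds a and b (delayed) plus the partial sum of bits 0..k of b.
    wire [W-1:0]   a_s [0:W];
    wire [W-1:0]   b_s [0:W];
    wire [2*W-1:0] r_s [0:W];

    assign a_s[0] = a;
    assign b_s[0] = b;
    assign r_s[0] = {2*W{1'b0}};

    genvar k;
    generate
        for (k = 0; k < W; k = k + 1) begin : stage
            reg [W-1:0]   a_q, b_q;
            reg [2*W-1:0] r_q;

            always @(posedge CLK or negedge RST) begin
                if (!RST) begin
                    a_q <= {W{1'b0}};
                    b_q <= {W{1'b0}};
                    r_q <= {2*W{1'b0}};
                end else begin
                    // Pass a and b along together with the partial result
                    a_q <= a_s[k];
                    b_q <= b_s[k];
                    // Selector instead of multiplier: add (a << k) only if bit k of b is 1
                    r_q <= b_s[k][k] ? r_s[k] + ({{W{1'b0}}, a_s[k]} << k)
                                     : r_s[k];
                end
            end

            assign a_s[k+1] = a_q;
            assign b_s[k+1] = b_q;
            assign r_s[k+1] = r_q;
        end
    endgenerate

    assign product = r_s[W];
endmodule
```

Like Example 4, a new pair (a, b) can be applied on every clock cycle, and each product appears W clock cycles later.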

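3. Time-Division Multiplexing: Saving Cost
The previous two lectures introduced two design patterns, both aimed at coping with faster input data rates. From the architecture diagrams of the two patterns it is easy to see that speed has a price. Bosses, however, see it differently: they want it fast and cheap at the same time. This old monk can't do that! But if all you want is to save some cost, I still have a trick, and that is the subject of this lecture: time-division multiplexing. Time-division multiplexing is for scenarios where input data arrives relatively slowly, so once again you cannot have speed. The Way of Heaven runs in cycles; every gain comes with a loss, and not even the Buddha can escape that. Bosses always think they are gods; experience shows they are more like a certain kind of teammate.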

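Squandering is easy; saving is hard. Once designers have grasped the idea of "parallel" processing, they tend, consciously or not, to build digital logic systems out of lots of branches. That is not wrong as such, but in some places it is wasteful. Nobody calls for "bringing every last grain into the granary" any more, but this old monk is old-school and still prefers designs that save wherever possible. Tiling everything out in parallel is certainly the simplest design, but it shows little skill. Adapting to local conditions, and shaping each design to the speed its module really needs, is the mark of a true expert.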

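The basic idea of time-division multiplexing is this: for units that only need a relatively low processing speed and that contain repeated operations, reuse the relevant arithmetic hardware several times within the acceptable processing time, so as to reduce the area of the whole unit.
That was a mouthful of strict, not-quite-human language, so a summary is in order.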

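First, time-division multiplexing applies to units "that only need a relatively low processing speed", that is, units that are allowed a comparatively long time to produce each result.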

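Second, these units must contain "repeated operations". For example, an FIR filter contains many additions of the same bit width, which makes it a good candidate for time-division multiplexing.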

Finally, the means of time-division multiplexing is to "reuse the relevant arithmetic hardware several times".
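To make these three points concrete on something smaller than a complex multiplier, here is a minimal hypothetical sketch (the module sum4_tdm and its ports are my own illustration, not from the original series): four 8-bit operands are summed with a single data-path adder that is reused over four clock cycles, instead of an adder tree with three adders. It only works because the operands stay stable for several cycles, which is exactly the "relatively low processing speed" condition above.

```verilog
// Hypothetical example: add four 8-bit unsigned operands with ONE shared adder,
// spending 4 clock cycles instead of instantiating 3 adders.
module sum4_tdm (
    input            CLK,
    input            RST,          // active-low asynchronous reset, as in the article
    input            start,        // one-cycle pulse; x0..x3 must stay stable until done
    input      [7:0] x0, x1, x2, x3,
    output reg [9:0] sum,          // 4 * 255 = 1020 < 2^10
    output reg       done          // one-cycle pulse when sum is updated
);
    reg  [2:0] step;               // 0..3: adding operand 'step'; 4: idle
    reg  [9:0] acc;
    reg  [7:0] operand;

    // Input multiplexer: selects which operand feeds the shared adder this cycle
    always @(*) begin
        case (step[1:0])
            2'd0:    operand = x0;
            2'd1:    operand = x1;
            2'd2:    operand = x2;
            default: operand = x3;
        endcase
    end

    // The only data-path adder, reused once per cycle
    // (the step counter has its own small incrementer)
    wire [9:0] acc_next = acc + {2'b00, operand};

    always @(posedge CLK or negedge RST) begin
        if (!RST) begin
            step <= 3'd4;
            acc  <= 10'd0;
            sum  <= 10'd0;
            done <= 1'b0;
        end else begin
            done <= 1'b0;
            if (start) begin
                step <= 3'd0;
                acc  <= 10'd0;
            end else if (step != 3'd4) begin
                acc  <= acc_next;
                step <= step + 3'd1;
                if (step == 3'd3) begin
                    sum  <= acc_next;  // all four operands accumulated
                    done <= 1'b1;
                end
            end
        end
    end
endmodule
```

After the one-cycle start pulse, sum is updated and done pulses four clock edges later. The price of saving two adders is a 4-to-1 multiplexer, a small counter and four cycles of latency.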

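Complex multiplication is a fine example: it contains four multiplications! Any benefactor who went to high school knows the formula:

$$
(\mathrm{real\_a} + j\,\mathrm{image\_a})(\mathrm{real\_b} + j\,\mathrm{image\_b})
= (\mathrm{real\_a}\,\mathrm{real\_b} - \mathrm{image\_a}\,\mathrm{image\_b})
+ j\,(\mathrm{real\_a}\,\mathrm{image\_b} + \mathrm{image\_a}\,\mathrm{real\_b})
$$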

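Here real_x denotes the real part of the complex number x and image_x its imaginary part. Even a child can count that the formula contains four multiplications and two additions/subtractions. This is the basic algorithm, and the formula above is the core of the "algorithm specification".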

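An algorithm alone is not enough to design a digital logic system; you also need information such as the properties of the input signals and the requirements on the output signals. Having dared to make the assumptions above, and as the saying goes, "once your debts pile up you stop worrying, once the lice pile up you stop itching", there is no reason not to make assumptions about the interface signals as well. Assume the inputs are 4-bit signed numbers in the range [-7, +7] that change once every 8 clock cycles and arrive continuously. The outputs are required to be 8-bit signed numbers that are valid during at least one fixed clock cycle out of every 8 (and which cycle that is, is known in advance). That basically settles the system's input and output interfaces. However, for inputs that stay valid over several clocks, a synchronization signal is needed to mark the moment the input becomes valid. Here the sync signal (data_start in the code) is designed to be high during the first clock cycle in which the input data is valid and low at all other times, as shown in Figure 7.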

Figure 7: Timing relationship between the data and the sync signal
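A quick check of the bit widths implied by this input range, writing x and y for any two of the four inputs:

$$
x, y \in [-7, 7] \;\Rightarrow\; x\,y \in [-49, 49] \subseteq [-2^6,\ 2^6 - 1] = [-64, 63] \quad (\text{7-bit signed})
$$

$$
x_1 y_1 \pm x_2 y_2 \in [-98, 98] \subseteq [-2^7,\ 2^7 - 1] = [-128, 127] \quad (\text{8-bit signed})
$$

So a 7-bit signed register is enough for each product and an 8-bit signed output for the real and imaginary parts. Presumably this is also why the range is limited to [-7, +7] rather than the full 4-bit range [-8, +7]: (-8) × (-8) = 64 would not fit in 7 signed bits.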

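Now everything is ready except the east wind, so let's lay out the schedule in Table 2. Because the sync signal arrives together with the data, the counter used for scheduling can only be cleared at the next clock cycle, which is why there is a time step -1.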

Table 2: Time-division multiplexing schedule

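| Time | Input select: in | Input select: out | Input select: state | Multiplier: in | Multiplier: out | Output assign: in | Output assign: out | Output assign: state | Add/subtract |
|---|---|---|---|---|---|---|---|---|---|
| -1 | State counter cleared | | | | | | | | |
| 0 | real_a, real_b | invalid | REAL_REAL | invalid | invalid | invalid | invalid | | |
| 1 | image_a, image_b | real_a, real_b | IMAGE_IMAGE | real_a * real_b | invalid | invalid | invalid | | |
| 2 | real_a, image_b | image_a, image_b | REAL_IMAGE | image_a * image_b | real_a * real_b | real_a * real_b | invalid | REAL_1 | |
| 3 | image_a, real_b | real_a, image_b | IMAGE_REAL | real_a * image_b | image_a * image_b | image_a * image_b | real_a * real_b | REAL_2 | |
| 4 | | image_a, real_b | | image_a * real_b | real_a * image_b | real_a * image_b | image_a * image_b | IMAGE_1 | |
| 5 | | | | | image_a * real_b | image_a * real_b | real_a * image_b | IMAGE_2 | |
| 6 | | | | | | | image_a * real_b | | Valid result output |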

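Phew, that was close: a valid result can be output within 8 clock cycles. When a schedule refuses to fit, it can easily cost you a night's sleep. The schedule above could actually be optimized further, but it already meets the input data rate, so optimizing would gain little; here we'll settle for good enough.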

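The complex multiplication code with a time-division-multiplexed multiplier, designed according to the schedule in Table 2, is given in Example 5.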

[Example 5] Complex multiplication module
`define REAL_REAL 0
`define IMAGE_IMAGE 1
`define REAL_IMAGE 2
`define IMAGE_REAL 3
//Input-select states (values of state_counter)

`define REAL_1      2
`define REAL_2      3
`define IMAGE_1     4
`define IMAGE_2     5
//Output-assignment states (values of state_counter)

module complex_multiplication
//Complex multiplication with a single time-division-multiplexed multiplier
(
    input signed[3:0] real_a,
    input signed[3:0] image_a,
    input signed[3:0] real_b,
    input signed[3:0] image_b,
    input CLK, input RST,
    input data_start,
    output reg signed[7:0] product_real,
    output reg signed[7:0] product_image
);

//Definition for Variables in the module
reg signed[6:0] real_1, real_2;
//Two operands for the real part: real_a*real_b and image_a*image_b
reg signed[6:0] image_1, image_2;
//Two operands for the imaginary part: real_a*image_b and image_a*real_b
reg signed[3:0] a, b;
//Operands for the shared signed multiplier
reg signed[6:0] product;
//Result of the shared multiplier; |product| <= 49 fits in 7 signed bits
reg[2:0] state_counter;
//Counter for scheduling states

//Load other module(s)

//Logical
always @(posedge CLK or negedge RST)
//Multiplication to be reused
begin
    if (!RST)
    begin
        product <= 7'sh0;
    end
    else
    //The only '*' in the module: one multiplier, reused every clock
    begin
        product <= a * b;
    end
end

always @(posedge CLK or negedge RST)
//State counter: cleared by data_start, then counts 0, 1, 2, ...
begin
    if (!RST)
    begin
        state_counter <= 3'h7;
    end
    else if (data_start)
    //New data and start the statement counting
    begin
        state_counter <= 3'h0;
    end
    else
    //State counting
    begin
        state_counter <= state_counter + 3'h1;
    end
end

//Reused multiplication part
always @(posedge CLK or negedge RST)
//Input Part
begin
    if (!RST)
    begin
        a <= 4'h0;
        b <= 4'h0;
    end
    else
    //Input Operations
    begin
        case (state_counter)
            `REAL_REAL:
            begin
                a <= real_a;
                b <= real_b;
            end
            `IMAGE_IMAGE:
            begin
                a <= image_a;
                b <= image_b;
            end
            `REAL_IMAGE:
            begin
                a <= real_a;
                b <= image_b;
            end
            `IMAGE_REAL:
            begin
                a <= image_a;
                b <= real_b;
            end
        endcase
    end
end

always @(posedge CLK or negedge RST)
//Output Part
begin
    if (!RST)
    begin
        real_1 <= 7'h0;
        real_2 <= 7'h0;
        image_1 <= 7'h0;
        image_2 <= 7'h0;
    end
    else
    //Output assignment
    begin
        case (state_counter)
            `REAL_1:
            begin
                real_1 <= product;
            end
            `REAL_2:
            begin
                real_2 <= product;
            end
            `IMAGE_1:
            begin
                image_1 <= product;
            end
            `IMAGE_2:
            begin
                image_2 <= product;
            end
        endcase
    end
end

//Adder part
always @(posedge CLK or negedge RST)
//Add/subtract part
begin
    if (!RST)
    begin
        product_real <= 8'sh0;
        product_image <= 8'sh0;
    end
    else
    //Real part: subtract; imaginary part: add
    begin
        product_real <= real_1 - real_2;
        product_image <= image_1 + image_2;
    end
end
endmodule

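Dear readers, please note that the multiplication operator "*" appears only once in this code; that is what makes it reuse. If you write
real_1 <= a * b;
real_2 <= a * b;
image_1 <= a * b;
image_2 <= a * b;
or something along those lines, the synthesis tool may well generate four multipliers for you, and then it is no longer time-division multiplexing. Some tools can share an operator across mutually exclusive branches, but it is better not to count on it: writing the multiplier exactly once makes the reuse explicit.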
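For comparison, here is a hypothetical fully parallel version (the module name complex_multiplication_parallel is my own; it is a sketch, not part of the original series). It writes the four products with four separate * operators on different operands, so a synthesis tool will normally build four multipliers; in return, the result is registered on the first clock edge after the inputs, with no scheduling counter.

```verilog
// Hypothetical baseline: the same complex product with four multipliers.
module complex_multiplication_parallel (
    input  signed [3:0]     real_a,
    input  signed [3:0]     image_a,
    input  signed [3:0]     real_b,
    input  signed [3:0]     image_b,
    input                   CLK,
    input                   RST,          // active-low asynchronous reset, as in Example 5
    output reg signed [7:0] product_real,
    output reg signed [7:0] product_image
);
    // Each product lies in [-49, 49] for inputs in [-7, 7], so 7 signed bits suffice
    wire signed [6:0] rr = real_a  * real_b;
    wire signed [6:0] ii = image_a * image_b;
    wire signed [6:0] ri = real_a  * image_b;
    wire signed [6:0] ir = image_a * real_b;

    always @(posedge CLK or negedge RST) begin
        if (!RST) begin
            product_real  <= 8'sd0;
            product_image <= 8'sd0;
        end else begin
            product_real  <= rr - ii;     // real part
            product_image <= ri + ir;     // imaginary part
        end
    end
endmodule
```

This is the "tile everything out in parallel" design that the time-division version avoids: four multipliers and no counter, versus one multiplier plus the input multiplexer, the state counter and the intermediate registers of Example 5.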

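As the verse goes:
“
Sequential logic shows its might, the clock beats ding-dong through the day; edge-triggered, business thrives, and signals are sampled in the quiet middle.
Three design patterns there are; choose with care and the variations never end. Weigh frequency against area; choose with care and become an internet star.
”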

Original content from 与非网; please do not repost!

Series index:

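Part 1: Learn the new by reviewing the old: from circuits into Verilog!

Part 2: Verilog programming can't be mastered overnight; language levels require that "names be correct for words to flow"

Part 3: Don't underestimate digital logic: logic gates rule the world

Part 4: The Verilog language: truly a language with a split personality

Part 5: Verilog isn't hard to learn: let's talk about sequential logic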