Last edited by liujing1232 on 2021-5-30 11:09
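Until now I had only used the pre-trained models that ship with the Xilinx tools, which is quite limiting, so here I train on a custom dataset. The example is again face recognition, and the algorithm is the YOLOV3-tiny version. Since I am not good at C++, only the Python version is used here.

1. Training

The algorithm used is YOLOV3; the code is here: https://github.com/david8862/keras-YOLOv3-model-set

This is open-source code from GitHub. This post is for discussion and learning only, and I will take it down on request if it infringes. This YOLOV3 code is really impressive. I had looked at a lot of YOLOV3 code before, and this one has the most features and gives the best training accuracy of anything I have found so far. Best of all, it contains more than plain YOLOV3: it also offers algorithms built from several backbones, for example a YOLO3-MobileNet network, which is very accurate while being very small. Unfortunately that one is not very DPU-friendly: MobileNet-style networks apparently need to be optimized during training, which is fairly involved, and I did not dig into it.

The face recognition dataset is available at the following URL: [link]

Following the instructions in the GitHub repository is enough to train a YOLOV3 model. The result on a test image is shown below:

[image]

The trained model is reasonably accurate.

2. Testing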
Use the Xilinx compilation tools together with the ARM Linux cross-compilation toolchain to compile the model trained above into a shared library (dynamic link library), that is:

    aarch64-xilinx-linux-gcc --sysroot=sysroots/aarch64-xilinx-linux \
        -fPIC -shared dpu_${MODEL_NAME}.elf -o libdpumodel${MODEL_NAME}.so
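The output name in the command above follows the pattern `libdpumodel${MODEL_NAME}.so`. As a minimal sketch, the Python below checks that a library with that name exists before the inference script is started. The helper names and the list of search directories are hypothetical and are not part of the Xilinx toolchain.

```python
import os


def expected_library_name(model_name):
    """Return the file name the link command above produces for a given MODEL_NAME."""
    return "libdpumodel{}.so".format(model_name)


def find_model_library(model_name, search_dirs):
    """Return the first existing path to the model library, or None if it is missing.

    search_dirs is a hypothetical list of directories; adjust it to wherever
    the .so file was copied on the board.
    """
    file_name = expected_library_name(model_name)
    for directory in search_dirs:
        candidate = os.path.join(directory, file_name)
        if os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    # "tf_yolov3_voc" is the kernel name mentioned in the comments of the inference script
    name = "tf_yolov3_voc"
    print(expected_library_name(name))
    print(find_model_library(name, ["."]))
```

A check like this fails early with a clear message instead of failing later when the kernel is loaded.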
This is one step more than the C++ flow. For C++ you only need the .elf file, but calling the model from Python also requires building the shared library, which the Python program then loads. The relevant Python source is as follows:

    # Imports this excerpt relies on; args and pre_process() are defined elsewhere in the full script
    import time
    import cv2
    import numpy as np
    from dnndk import n2cube

    n2cube.dpuOpen()
    """ Create DPU Kernels for tf_yolov3_voc """
    kernel = n2cube.dpuLoadKernel(args.kernel_conv)
    """ Create DPU Tasks for tf_yolov3_voc """
    task = n2cube.dpuCreateTask(kernel, 0)  # 1 = T_MODE_PROF; 0 = normal
    #task = n2cube.dpuEnableTaskProfile(task)
    time_start = time.process_time()  # Running time calculate (start)
    """Load image to DPU"""
    print("Loading picture from image folder({})...".format(args.image_path))
    image = cv2.imread(args.image_path)
    image_size = image.shape[:2]  # first two dimensions (height, width)
    image_data = pre_process(image, (args.input_size, args.input_size))  # resize the input and expand it to [1,416,416,3]
    image_data = np.array(image_data, dtype=np.float32)  # convert to a float32 array
    input_len = n2cube.dpuGetInputTensorSize(task, args.input_node)  # length of the INPUT_NODE input tensor
    conv_time_start = time.process_time()  # <------------- Start Convolution Time Recording
    """Copy the image data held on the CPU into the DPU input tensor; the data length should be 416*416*3"""
    n2cube.dpuSetInputTensorInHWCFP32(task, args.input_node, image_data, input_len)
    """Model run on DPU"""
    n2cube.dpuRunTask(task)
    # Output size of the "small box" branch; should be 13*13*3*(5+num_classes) to match the reshape below
    conv_sbbox_size = n2cube.dpuGetOutputTensorSize(task, args.output_node0)
    conv_out1 = n2cube.dpuGetOutputTensorInHWCFP32(task, args.output_node0, conv_sbbox_size)  # fetch the tensor of output 0
    conv_out1 = np.reshape(conv_out1, (1, 13, 13, 3, 5 + args.num_classes))  # reshape the flat output into a 5-D tensor
    # Output size of the "medium box" branch; should be 26*26*3*(5+num_classes) to match the reshape below
    conv_mbbox_size = n2cube.dpuGetOutputTensorSize(task, args.output_node1)
    conv_out2 = n2cube.dpuGetOutputTensorInHWCFP32(task, args.output_node1, conv_mbbox_size)
    conv_out2 = np.reshape(conv_out2, (1, 26, 26, 3, 5 + args.num_classes))
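The two np.reshape calls in the script only succeed if each flat output holds exactly grid*grid*3*(5+num_classes) values. Here is a minimal sketch of that arithmetic using NumPy only, with a zero-filled array standing in for the DPU output. The function names and num_classes = 1 (a single face class) are assumptions made for illustration.

```python
import numpy as np

ANCHORS_PER_SCALE = 3  # the script reshapes each output to 3 anchors per grid cell


def expected_output_len(grid, num_classes):
    """Number of floats in one output: grid*grid*anchors*(4 box values + objectness + classes)."""
    return grid * grid * ANCHORS_PER_SCALE * (5 + num_classes)


def reshape_output(flat, grid, num_classes):
    """Reshape a flat HWC output into (1, grid, grid, anchors, 5 + num_classes)."""
    flat = np.asarray(flat, dtype=np.float32)
    expected = expected_output_len(grid, num_classes)
    if flat.size != expected:
        raise ValueError("expected {} values, got {}".format(expected, flat.size))
    return flat.reshape(1, grid, grid, ANCHORS_PER_SCALE, 5 + num_classes)


if __name__ == "__main__":
    num_classes = 1  # assumption: a single "face" class
    for grid in (13, 26):
        fake = np.zeros(expected_output_len(grid, num_classes), dtype=np.float32)
        print(grid, fake.size, reshape_output(fake, grid, num_classes).shape)
```

Comparing the value returned by dpuGetOutputTensorSize with this expected length is a quick way to spot a wrong output node name or a wrong num_classes.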
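This program loads the shared library and runs the model on the DPU for acceleration. The recognition result is shown below:

[image]

Video test result:

[image]

The code can still be optimized. Multithreading should make recognition faster, but I am not good at it; if you are, feel free to get in touch.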
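On the multithreading idea, here is a minimal producer/consumer sketch using only the Python standard library. It is not the author's code: `run_pipeline`, `preprocess` and `infer` are hypothetical names, and the stand-in lambdas replace the real pre-processing and DPU calls so the example runs anywhere.

```python
import queue
import threading


def run_pipeline(frames, preprocess, infer, num_workers=2):
    """Run infer(preprocess(frame)) on several threads; results keep the input order."""
    tasks = queue.Queue()
    results = [None] * len(frames)

    def worker():
        while True:
            item = tasks.get()
            if item is None:  # sentinel: no more work for this thread
                break
            index, frame = item
            results[index] = infer(preprocess(frame))

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for index, frame in enumerate(frames):
        tasks.put((index, frame))
    for _ in threads:
        tasks.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    # Stand-ins: on the board, preprocess would wrap pre_process() and infer would wrap
    # the dpuSetInputTensorInHWCFP32 / dpuRunTask / dpuGetOutputTensorInHWCFP32 calls.
    fake_frames = list(range(8))
    print(run_pipeline(fake_frames, lambda f: f + 1, lambda x: x * 2))
```

Whether this actually speeds things up depends on how much time each thread spends waiting on the DPU rather than in Python-side pre-processing, and on whether each thread needs its own DPU task. Neither was measured or verified here.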
PS: The board gets very hot, so I don't dare run it for long.