推荐|表情识别-情感分析-人脸识别（代码+教程）

表情识别-情感分析-人脸识别（代码+教程）

25-03-08 00:22

8025

blog.csdn.net

id="article_content" class="article_content clearfix"> id="content_views" class="markdown_views prism-atom-one-dark">

表情识别

面部情绪识别（FER）是指根据面部表情识别和分类人类情绪的过程。通过分析面部特征和模式，机器可以对一个人的情绪状态作出有根据的推断。这个面部识别的子领域高度跨学科，涉及计算机视觉、机器学习和心理学等领域的知识。

inference result example of facial face emotion recognition system for happy and neutral multiple face recognition detection

应用领域

以下是一些关键领域，其中这项技术可能有所帮助：

社交和内容创作平台：

根据情感反馈个性化用户体验。
自适应学习系统根据学习者的情绪状态调整内容，提供更具针对性的建议。

医疗研究：

监测患者是否存在抑郁、焦虑或其他情绪障碍的迹象。
协助治疗师在治疗过程中跟踪患者的进展或反应。
实时监测应激水平。

驾驶员安全机制：

监测驾驶员是否出现疲劳、注意力分散、压力或昏昏欲睡的迹象。

营销和市场研究：

实时分析观众对广告的反应。
根据观众的情绪状态定制广告内容。
产品测试和反馈。

安全和监控：

在拥挤区域检测可疑或异常行为。
分析公共事件中人群的反应，以确保安全。

构建面部情绪识别系统

本节深入探讨构建面部情绪识别系统的复杂性。我们首先探讨一个专门为面部情绪识别设计的数据集，确保我们的模型有稳健的基础。接下来，我们将介绍一种自定义的VGG13模型架构，该架构以其在分类任务中的效率和准确性而闻名，并阐明其与我们情绪识别目标的相关性。

最后，我们花费一些时间在实验结果部分，提供全面的评估，阐明系统的性能指标及其潜在应用。

面部情绪识别数据集（FER +）

FER + 数据集是原始面部表情识别（FER）数据集的一个重要扩展。为了改进原始数据集的局限性，FER + 提供了更精细和细致的面部表情标签。虽然原始FER数据集将面部表情分类为六种基本情绪——快乐、悲伤、愤怒、惊讶、恐惧和厌恶——但FER + 根据这一基础更进一步引入了两个额外的类别：中性和蔑视。

Different expressions from a facial expression recognition dataset.

用于人脸检测的RFB-320

Single Shot Multibox Detector (SSD)模型在识别情绪之前，需要在输入帧中检测人脸。为此，使用了超轻量级人脸检测模型RFB-320。它是一种针对边缘计算设备进行优化的创新人脸检测模型。该模型采用了改进的感受野块（RFB）模块，在不增加计算负担的情况下有效地捕捉多尺度的上下文信息。它在多样化的WIDER FACE数据集上进行训练，并针对320×240的输入分辨率进行优化，具有出色的效率平衡，计算速度为0.2106 GFLOPs，参数量仅为0.3004百万个。该模型基于PyTorch框架开发，取得了令人称赞的平均精度(mAP)为84.78%，在资源受限环境中成为高效人脸检测的强大解决方案。在此实现中，RFB-320 SSD模型以Caffe格式使用。

ssd single shot multibox detector model architecture

自定义的VGG13模型架构

情绪识别分类模型采用了定制的VGG13架构，专为64×64灰度图像设计。它使用具有最大池化和dropout的卷积层将图像分类为八个情绪类别，以防止过拟合。该架构开始于两个具有64个卷积核的卷积层，然后是最大池化和25%的dropout。额外的卷积层捕捉复杂的特征，两个具有1024个节点的密集层聚合信息，然后是50%的dropout。一个softmax输出层预测情绪类别。

customized vgg13 model architecture for facial emotion recognition that classifies images into eight emotion classes using convolutional and max pooling layers.

让我们编写一些代码来实现这个系统。

首先，需要初始化一些重要的参数。


image_mean = np.array([127, 127, 127])
image_std = 128.0
iou_threshold = 0.3
center_variance = 0.1
size_variance = 0.2
min_boxes = [
    [10.0, 16.0, 24.0], 
    [32.0, 48.0], 
    [64.0, 96.0], 
    [128.0, 192.0, 256.0]
]
strides = [8.0, 16.0, 32.0, 64.0]
threshold = 0.5
 class="hljs-button signin" data-title="登录后复制" data-report-click="{"spm":"1001.2101.3001.4334"}">

image_mean: 在RGB通道上进行图像归一化的均值。
image_std: 图像归一化的标准差。
iou_threshold: 确定边界框匹配的交并比（IoU）度量的阈值。
center_variance: 预测的边界框中心坐标的缩放因子。
size_variance: 预测的边界框尺寸的缩放因子。
min_boxes: 不同尺寸对象的最小边界框尺寸。
strides: 根据图像大小控制特征图的尺度。
threshold: 目标检测的置信度阈值。

def define_img_size(image_size): shrinkage_list = [] feature_map_w_h_list = [] for size in image_size: feature_map = [int(ceil(size / stride)) for stride in strides] feature_map_w_h_list.append(feature_map) for i in range(0, len(image_size)): shrinkage_list.append(strides) priors = generate_priors( feature_map_w_h_list, shrinkage_list, image_size, min_boxes ) return priors class="hljs-button signin" data-title="登录后复制" data-report-click="{"spm":"1001.2101.3001.4334"}">

def FER_live_cam(): emotion_dict = { 0: 'neutral', 1: 'happiness', 2: 'surprise', 3: 'sadness', 4: 'anger', 5: 'disgust', 6: 'fear' } # cap = cv2.VideoCapture('video1.mp4') cap = cv2.VideoCapture(0) frame_width = int(cap.get(3)) frame_height = int(cap.get(4)) size = (frame_width, frame_height) result = cv2.VideoWriter('result.avi', cv2.VideoWriter_fourcc(*'MJPG'), 10, size) # Read ONNX model model = 'onnx_model.onnx' model = cv2.dnn.readNetFromONNX('emotion-ferplus-8.onnx') # Read the Caffe face detector. model_path = 'RFB-320/RFB-320.caffemodel' proto_path = 'RFB-320/RFB-320.prototxt' net = dnn.readNetFromCaffe(proto_path, model_path) input_size = [320, 240] width = input_size[0] height = input_size[1] priors = define_img_size(input_size) while cap.isOpened(): ret, frame = cap.read() if ret: img_ori = frame #print("frame size: ", frame.shape) rect = cv2.resize(img_ori, (width, height)) rect = cv2.cvtColor(rect, cv2.COLOR_BGR2RGB) net.setInput(dnn.blobFromImage( rect, 1 / image_std, (width, height), 127) ) start_time = time.time() boxes, scores = net.forward(["boxes", "scores"]) boxes = np.expand_dims(np.reshape(boxes, (-1, 4)), axis=0) scores = np.expand_dims(np.reshape(scores, (-1, 2)), axis=0) boxes = convert_locations_to_boxes( boxes, priors, center_variance, size_variance ) boxes = center_form_to_corner_form(boxes) boxes, labels, probs = predict( img_ori.shape[1], img_ori.shape[0], scores, boxes, threshold ) gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY) for (x1, y1, x2, y2) in boxes: w = x2 - x1 h = y2 - y1 cv2.rectangle(frame, (x1,y1), (x2, y2), (255,0,0), 2) resize_frame = cv2.resize( gray[y1:y1 + h, x1:x1 + w], (64, 64) ) resize_frame = resize_frame.reshape(1, 1, 64, 64) model.setInput(resize_frame) output = model.forward() end_time = time.time() fps = 1 / (end_time - start_time) print(f"FPS: {fps:.1f}") pred = emotion_dict[list(output[0]).index(max(output[0]))] cv2.rectangle( img_ori, (x1, y1), (x2, y2), (0, 255, 0), 2, lineType=cv2.LINE_AA ) cv2.putText( frame, pred, (x1, y1), cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 0), 2, lineType=cv2.LINE_AA ) result.write(frame) cv2.imshow('frame', frame) if cv2.waitKey(1) & 0xFF == ord('q'): break else: break cap.release() result.release() cv2.destroyAllWindows() class="hljs-button signin" data-title="登录后复制" data-report-click="{"spm":"1001.2101.3001.4334"}">