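The Applications of Deep Learning on Traffic Identification
Zhanyi Wang
2015.9.30

Black Hat 2015 talk: The Applications of Deep Learning on Traffic Identification
Black Hat 2015 • 2015.08.01-08.06 • Las Vegas, NV

Black Hat 2015 talks related to big data and machine learning

Title | Speaker
Data-Driven Threat Intelligence: Metrics on Indicator Dissemination and Sharing | Alex Pinto
Why Security Data Science Matters and How It’s Different: Pitfalls and Promises of Data Science Based Breach Detection and Threat Intelligence | Joshua Saxe
Graphic Content Ahead: Towards Automated Scalable Analysis of Graphical Images Embedded in Malware | Alex Long
Distributing the Reconstruction of High-Level Intermediate Representation for Large Scale Malware Analysis | Rodrigo Branco
Securing Your Big Data Environment | Ajit Gaddam
Defeating Machine Learning: What Your Security Vendor is Not Telling You | Bob Klein
From False Positives to Actionable Analysis: Behavioral Intrusion Detection, Machine Learning, and the SOC | Joseph Zadeh
The Applications of Deep Learning on Traffic Identification | Zhanyi Wang
Internet-Scale File Analysis | Zachary Hanif Tamas
Deep Learning on Disassembly | Matt Wolff

Outline
• Traditional methods of traffic identification
• Neural networks and machine learning
• Applications
  – Protocol classification
  – Unknown protocol identification
  – Automatic feature learning
  – Application identification
• Summary and outlook

Traditional methods of traffic identification (1)
• Accurately mapping traffic to a protocol or application
  – Is the foundation of network security
  – Plays a major role in anomaly detection and security management
• Identification based on predefined or special ports
  – Standard HTTP port: 80
  – Default SSL port: 443
  – Drawback: does not work for non-standard or newly defined ports
• Identification based on DPI and statistical features
  – Signature strings, fingerprints, or sequences determined by experience and rules
  – Drawback: time-consuming and labor-intensive

Traditional methods of traffic identification (2)
• Identification based on behavioral features and machine learning
  – Advantage: modeling and identification are automated
  – Difficulty: feature extraction and selection depend on expert experience

How to select features?
• Is there a method that does not depend on experts?
• Is unsupervised feature learning feasible?
• Answer
  – Deep learning, a technique from the field of artificial intelligence

Deep learning is booming
• Images
• Natural language processing
• Speech

Applications of deep learning
• Gatys, L. A. (2015). A Neural Algorithm of Artistic Style. arXiv preprint arXiv:1508.06576.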
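The port-based approach listed under the traditional methods can be sketched in a few lines. This is only an illustration, not the speaker's code: the table `WELL_KNOWN_PORTS` and the function `identify_by_port` are hypothetical names, and the table holds only the two ports named in the slides.

```python
# Hypothetical lookup table: only the two ports mentioned in the slides.
WELL_KNOWN_PORTS = {80: "HTTP", 443: "SSL"}


def identify_by_port(dst_port: int) -> str:
    """Return the protocol registered for dst_port, or "unknown"."""
    return WELL_KNOWN_PORTS.get(dst_port, "unknown")


if __name__ == "__main__":
    # 8080 stands in for a non-standard port; the choice is illustrative.
    for port in (80, 443, 8080):
        print(port, identify_by_port(port))
```

A service moved to a port outside the table falls through to "unknown", which is the drawback the slide points out for non-standard or newly defined ports.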
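Neural networks
• Artificial neural networks
• Basic unit
  – Neuron
• Structure
  – Input layer
  – Hidden layers
  – Output layer
• Neurons in adjacent layers are connected to each other
• Neurons in the same layer are not directly connected
[Figure: four-layer network. Layer 1 (input) x1, x2, x3; Layer 2; Layer 3; Layer 4 (output) o1, o2; weights W1, W2, W3; bias units +1]

Auto-Encoder network
• A special kind of neural network
• Has only one hidden layer
• The output layer is identical to the input layer: the network is trained to reproduce its input
[Figure: Layer 1 (input) x = (x1, x2, x3); Layer 2 (hidden) h; Layer 3 (output) x' = (x1', x2', x3'); weights W1, W1'; bias units +1]

Auto-encoders in image recognition
• Handwritten digit recognition

Stacked Auto-Encoder
• Stacked auto-encoder (SAE)
• Composed of multiple auto-encoder (AE) networks
• An SAE is essentially also a neural network
• Trained greedily, layer by layer
• Refined with fine-tuning
[Figure: x (input); hidden layers h1, h2, h3; output; weights w1, w2, w3, w4; reconstructions h2', h1', x' with weights w3', w2', w1']

Images vs. payload data
• Are there similarities?
• Image pixels:
  255 210 21 53 …
  255 52 3 0 …
  52 6 0 85 …
  … … … …
• TCP flow payloads (hex): 474554206874…… 727665720020…… 732048545450…… 33a31353a323……
• Payload bytes (decimal): 115 32 72 84 84 80 …… 51 163 19 83 163 35 ……
• Same value range: [0, 255], 256 possible values!

Protocol traffic images
[Figure: payload images for MySQL, SSH, Whois-DAS, BitTorrent]

Protocol identification workflow
• Data collected from the company intranet
• Experimental environment
  – Framework 1: CPU cluster, 2 to 10 servers
  – Framework 2: CPU + 4 GPUs
  – Training time: days -> minutes
• Training stage: Training data association -> Training data sampling -> Training data transformation -> Deep learning model
• Identifying stage: Testing data association -> Testing data transformation -> Protocol identification -> Predicted protocol

Parallel computing on multiple GPUs
• Training time requirements
  – Takes several days on CPUs
• GPU matrix computation
• Large number of model parameters
  – More than 500,000
• Large-scale data
  – Storage requirements
• Solutions
  – Multi-machine parallelism
  – Multi-GPU parallelism
  – OpenCL framework
[Figure: a Parameter Server exchanging w and Δw with GPU 0, GPU 1, GPU 2, GPU 3 on Machine 0, Machine 1, …; Data]

Protocol classification results
• Macro-level accuracy > 99%
• Average accuracy: 97.9%

Protocol        Precision
SMB             1.0000
DCE_RPC         1.0000
NetBIOS         1.0000
TDS             1.0000
SSH             0.9996
Kerberos        0.9996
LDAP            0.9996
BitTorrent      0.9992
MySQL           0.9989
DNS             0.9989
RSYNC           0.9987
Redis           0.9985
FTP_CONTROL     0.9970
HTTP_Connect    0.9967
SMTP            0.9949
Whois-DAS       0.9943
IMAPS           0.9814
Apple           0.9640
SSL             0.9513
HTTP_Proxy      0.9174

Unknown protocol identification
• Randomly selected 10,000 records labeled "unknown" by traditional methods
• Identification rate:
  – 0%
  – 63.37%

Protocol    Number    Ratio
SSL         1956      29.12%
DCE_RPC     1454      21.65%
Skype       873       13.00%
Kerberos    517       7.70%
MSN         360       5.36%
Google      311       4.63%
DNS         260       3.87%
RTMP        234       3.48%
TDS         202       3.01%
H323        170       2.53%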
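The comparison between image pixels and payload bytes suggests a simple data transformation: treat each payload byte as a value in [0, 255] and scale it the way pixel intensities are scaled. The sketch below is a guess at what such a step could look like, not the pipeline used in the talk. The function name `payload_to_vector`, the fixed length of 1000 bytes, and the zero padding are all assumptions made for illustration.

```python
from typing import List


def payload_to_vector(payload: bytes, length: int = 1000) -> List[float]:
    """Truncate or zero-pad payload to `length` bytes, then scale to [0, 1].

    Both the default length and the zero padding are illustrative assumptions.
    """
    fixed = payload[:length].ljust(length, b"\x00")
    return [byte / 255.0 for byte in fixed]


if __name__ == "__main__":
    # One of the hex fragments shown on the slide.
    raw = bytes.fromhex("732048545450")
    print(list(raw))  # [115, 32, 72, 84, 84, 80], the decimal row on the slide
    vector = payload_to_vector(raw, length=8)
    print(len(vector), vector[0], vector[-1])
```

Every flow then yields a vector of the same size, which is what a fixed-width input layer needs, whatever lengths the original payloads had.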
