What can you get from 100 billion DNS queries, each day, in real time?

How Data Science and Machine Learning Help to Discover Botnets with DNS: Moving Beyond Honeypots and Reverse Engineering
Hongliang Liu, Senior Staff Data Scientist, Nominum Inc.
Wooyun Summit 2016, Beijing, China

“You might say that DNS is in our DNA. Nominum invented DNS, has written 90% of the world’s DNS code, and was the first to scale, secure, and leverage DNS to deliver a whole new set of services. We are passionate about great Internet experience, high quality code, and straightforward approaches to solving complex provider challenges. Now we have harnessed DNS to allow providers to deliver extraordinary value to their subscribers.”
https://nominum.com/company/

Data Science team at Nominum
Ali, Mikael, Paul and Yuriy, Alexey, Yohai, Thanh, Hongliang

About me
• A PhD degree in physics, plus two bachelor degrees in physics and computer science.
  • Not a big deal.
• No degree in security research.
  • Not a big deal either.
• Building machine learning on security
  • Have detected and blocked multiple botnets using data science.
Follow me @phunter_lau on weibo.
Photo courtesy: Danny Dong Photography

Outline
• When we talk about threat intelligence, what do we talk about?
• When we see a threat, what should we do?
• Too much, that is too much information, meow~
• Intelligent machines are here to help
• Intelligent Anomaly Detection, Correlation map, DRS… with real-life examples
• So your machines want to replace humans? Or do we work together?
[Photo: my sister @dudulee’s two cats, Pogo and Niba. Speech bubbles: “Can’t wait!” and “Should be an amazing talk!”]

When we talk about Threat Intelligence, what does it mean?
What I think I do
Source: Movie “The Matrix”

What senior white hats think I do
Source: http://tech.sina.com.cn/i/2013-07-13/07368536046.shtml

What I really do
Source: http://kirill9876.blogspot.com/

Threat Intelligence != visualization
• “Hadouken” on a world map: pew, pew, pew~
• You know who I am talking about.
• Just don’t spend much of your time on making a “hadouken” system
  • even if your boss really loves it
  • even if your boss thinks it is the future of the company.
• Your data are much more valuable than that!
Source: http://fishki.net/1481859-hadoukening-ili-vsyo-ne-tak-uzh-i-slozhno-no-jeffektno.html

[Figure: DNS traffic for 0.000001 second]

Threat Intelligence != Just see it
• Every company has or wants anomaly detection.
• Yes, I believe you see these too, but so what? What can you do about them?
• You said “Big data”? Please, no bullsh*t!
• “Big data” stays in 2012!
• “Big data” is “garbage in, garbage out”

[Figure: DNS traffic for another 0.000001 second]

What is the problem with Threat Intelligence?
• Finding new threats is too hard, and humans are not scalable.
• Too much information, too little time
• Too few security researchers
• Can’t copy-n-paste humans
• Too... much underpaid (well, this talk can’t solve this problem)

Is a new threat a needle in a haystack?
• Nope!
• Because you look too close.
“The closer you look… the less you will see.” (from the movie “Now You See Me”)

What is in this talk today
Threat Intelligence is too general. Let’s take DNS traffic as an example.
[Figure label: Threat Intelligence]
Source: https://baldscientist.wordpress.com/2013/04/03/why-the-science-newsbite-is-just-the-tip-of-the-iceberg/

In a higher dimension world
• If we have … 100 billion DNS queries per day in real time?
• If we know … some domain names are new and have never been seen before?
• If we know … the correlation between any two domain names?
• If we know … the connection among any client IPs, name servers, server IPs at any time?
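To make the questions in the “In a higher dimension world” list concrete, here is a minimal Python sketch. It is not Nominum’s system: the names `queries_per_second` and `DomainTracker` are hypothetical, the data are toy values, and Jaccard similarity over the sets of clients querying each domain is only one assumed stand-in for “correlation between any two domain names”.

```python
from collections import defaultdict

SECONDS_PER_DAY = 24 * 60 * 60


def queries_per_second(queries_per_day: float) -> float:
    """Average rate implied by a daily query volume."""
    return queries_per_day / SECONDS_PER_DAY


class DomainTracker:
    """Toy in-memory tracker (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        # domain name -> set of client identifiers that queried it
        self.clients_by_domain = defaultdict(set)

    @staticmethod
    def _normalize(domain: str) -> str:
        # DNS names are case-insensitive; drop the trailing root dot.
        return domain.lower().rstrip(".")

    def observe(self, client: str, domain: str) -> bool:
        """Record one query; return True if the domain was never seen before."""
        name = self._normalize(domain)
        is_new = name not in self.clients_by_domain
        self.clients_by_domain[name].add(client)
        return is_new

    def correlation(self, domain_a: str, domain_b: str) -> float:
        """Jaccard similarity of the client sets of two domains (assumed metric)."""
        a = self.clients_by_domain.get(self._normalize(domain_a), set())
        b = self.clients_by_domain.get(self._normalize(domain_b), set())
        union = a | b
        if not union:
            return 0.0
        return len(a & b) / len(union)


if __name__ == "__main__":
    print(f"{queries_per_second(100e9):,.0f} queries per second on average")
    tracker = DomainTracker()
    for client, domain in [("c1", "example.com"), ("c2", "example.com"),
                           ("c1", "evil.test"), ("c2", "evil.test")]:
        if tracker.observe(client, domain):
            print("new domain:", domain)
    print("correlation:", tracker.correlation("example.com", "evil.test"))
```

At the volume quoted in the slides, an in-memory dictionary like this would not be enough; the sketch only shows the shape of the questions being asked, not how to answer them at scale.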
The core idea in this talk is how to do a … dimension reduction attack

Data, all about data
• At Nominum, we receive and process in real time 100 billion DNS records per day from our global partner Internet Service Providers.
  • From Nominum Vantio CacheServe, anonymized
  • Sampled from 1.5 trillion DNS queries served per day
• Data volume: advantage or disadvantage?
  • Advantage: no need to wait for a honeypot
  • Disadvantage: if we have 100 billion records per day (~1 M per second), what should we do? Just hire more security analysts?
  • Or an intelligent way?
Source: http://www.stuckinplastic.com/2015/11/storytelling/

[Screenshot: Anomaly Detection web interface]

Intelligent anomaly detection
• Anomaly detection (AD): an intelligent machine that tells, in real time, whether a domain name looks abnormal
• Never existed before?
• Too many
2016 - “Automatically Finding Malware C&C with Machine Learning in DNS Data: Beyond Honeypots and Reverse Engineering”