SESSION ID: PS-T12

Securing the Software Development Life Cycle with Machine Learning

Scott Christiansen
Senior Security Program Manager
Microsoft Customer Security & Trust

Mayana Pereira
Data and Applied Scientist
Microsoft CELA Data Science

#RSAC

The Story of Mike the Developer
Friday

What’s In This For You?
Educate + Learn = Apply
– Educate: In complex enterprise environments, tracking bugs and understanding the security debt is vital.
– Learn: How to apply data science in security teams for more efficient and reliable processes.
– Apply: Identify bottlenecks that require intensive work from security experts and evaluate potential data science partnerships.

What is a Security Bug?
– Software code flaw
– Software design issue
– Operational flaw of implemented software

Sources of Security Bug Debt
AUTOMATION
– Fortify
– DevSkim
– Checkmarx
– Semmle
MANUAL
– Threat Modeling
– Security/Engineering Code Reviews
– Vulnerability Reporting / Bug Bounty
– Q/A, Testers, etc.
– Microsoft Secure Development Lifecycle
– Operation Security Assurance Programs

Quality of Security Bug Debt Sources
Automation
– Static Analysis Security Testing Tools
– Dynamic Analysis Security Testing Tools
– Fuzzing
– Network Vulnerability and Secure Configurations Scans
Manual
– Threat Modeling
– Vulnerability Reporting / Bug Bounty
– Penetration Testing
– Security/Engineering Code Reviews
[Chart: Quality (Low to High) plotted against Manual vs. Automated Testing and Validation. Items plotted: Pen Testing, Network Vuln Scans, Security Code Reviews, Threat Modeling, Vuln Reporting, Dynamic Analysis, Static Analysis, Fuzzing]

Where do the Security Bugs Go?
Tracking Mechanisms

How many Bugs are we talking about?
– Apache Software Foundation: ~35K bugs in the past 16 years
– Debian OS: ~85K total bugs
– GitHub: ~28.5 million issues across 100 million projects

Work-item Scale
– ~47K engineers potentially creating bugs
– ~100+ different Azure DevOps & GitHub repositories
– ~61K new Work Items created each month
– ~13M Work Items at Microsoft since 2001

Security Bug Debt Questions
Does it matter if Security Teams are seeing everything?
– Triaged Properly
– Opportunity for Security Training
– Fixed per SLA
– Coverage Gaps
– Engineering Fatigue
– Clustering

Finding ‘Everything’ is a hard problem
The Easy Finds
– SDL-mandated fields such as “Security Severity”
– Security bugs created by automation with error ID #s
– Bugs created by security teams with the words ‘THIS IS A SECURITY BUG’ in them
The Hard Finds
– Free-form bugs created from Threat Modeling exercises
– Engineering code review bugs

The Machine Learning Test
– Humans can't scale up to solve the problem
– Large curated dataset to train a model from
– On-staff, security-focused Data Scientists
– Security Subject Matter Experts who are experts with the dataset

Data Science Joins Security

What does the security team want from data science?
A classification system that is “as close as possible” to a Security Expert for the task of classifying a bug report as security/non-security.

Supervised learning: a quick recap
Labeled Data (input to the AI Model)
– XSS: Security
– Buffer overflow: Security
– Button in wrong place: Non-Security
– Wrong font in main page: Non-Security

Supervised learning: a quick recap
Data to be Labeled
– XSS: Security
– Buffer overflow: Security
– Button in wrong place: Non-Security
– Wrong font in main page: Non-Security

– Is there enough data?
– How good is the data?
– Are there data usage restrictions?
– Can data be generated in a lab?

Data Science + Subject Matter Expertise
First Step for Success
– The classification system needs to perform as a Security Expert.
  • A very important step is to have data that reflects the decisions of the security expert.
– Having the training data approved by the expert is fundamental for the success of the classifier.
  • The security expert can review the data through statistical sampling.

Our Classification System
Two-Step Machine Learning Model Operation
– Input: Data from Azure DevOps
– Step 1: Security Bugs vs. Non-Security Bugs
– Step 2 (Security Bugs only): Critical Bugs, Important Bugs, Moderate/Low Impact Bugs
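
The two-step operation above can be sketched roughly as follows. This is a minimal illustration, not the speakers' implementation: the slides do not name the learning algorithm, so a tiny hand-written Naive Bayes classifier stands in for it. The names `NaiveBayes`, `TRAINING`, `train_two_step` and `classify` are hypothetical. Four of the bug titles echo the examples on the supervised learning recap slide; the remaining titles and all severity labels are invented for the example.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Tiny multinomial Naive Bayes over whitespace tokens (illustrative stand-in)."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.label_counts = Counter(labels)      # label -> number of documents
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        tokens = text.lower().split()
        total_docs = sum(self.label_counts.values())

        def log_score(label):
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)  # add-one smoothing
            prior = math.log(self.label_counts[label] / total_docs)
            return prior + sum(math.log((counts[t] + 1) / denom) for t in tokens)

        return max(self.label_counts, key=log_score)


# Hypothetical labeled data: (bug title, step-1 label, step-2 severity or None).
TRAINING = [
    ("xss in login form", "security", "important"),
    ("buffer overflow in image parser allows remote code execution",
     "security", "critical"),
    ("verbose error message leaks internal server path",
     "security", "moderate/low"),
    ("button in wrong place", "non-security", None),
    ("wrong font in main page", "non-security", None),
    ("typo in settings dialog", "non-security", None),
]


def train_two_step(rows):
    """Step 1 learns security vs. non-security; step 2 learns severity
    from the security bugs only."""
    step1 = NaiveBayes().fit([t for t, _, _ in rows], [s for _, s, _ in rows])
    security_rows = [(t, sev) for t, s, sev in rows if s == "security"]
    step2 = NaiveBayes().fit([t for t, _ in security_rows],
                             [sev for _, sev in security_rows])
    return step1, step2


def classify(title, step1, step2):
    """Return (step-1 label, severity); severity is None for non-security bugs."""
    if step1.predict(title) == "non-security":
        return "non-security", None
    return "security", step2.predict(title)


step1, step2 = train_two_step(TRAINING)
print(classify("buffer overflow in parser", step1, step2))
print(classify("wrong font in main page", step1, step2))
```

Only bugs that step 1 marks as security reach the severity model, so step 2 never has to learn what a non-security bug looks like. In practice the labeled rows would come from expert-reviewed work items rather than a hard-coded list.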
2020_USA20_PS-T12_01_Securing the Software Development Life Cycle with Machine Learning