Open Source Vulnerabilities – Debricked, Lund University and Axis in the Industrial Open Source Network Webinar


On June 10th, 2020, Lund University, Axis Communications and Debricked participated in a webinar arranged by Swedsoft’s Industrial Open Source Network on the theme of Open Source Software Security and Vulnerabilities. The webinar was recorded and can be watched on YouTube, or read as presented in these abstracts.

Maturity Models and Open Source Security

Associate professor Martin Hell, Lund University and Co-founder of Debricked AB

Martin Hell’s presentation gave a brief overview of the two most well-known software security maturity models, BSIMM and OWASP SAMM, and introduced Lund University’s work on a maturity model that specifically handles software vulnerabilities in Open Source software.

What maturity models are used for

Generally, a maturity model is a tool that an organisation uses to assess and improve the way it works. It helps you find missing fundamental initiatives and improve and prioritise the specific initiatives you are taking. There are, however, two fundamental issues with maturity models: ‘WHO determines what is right?’ and ‘WHAT defines your maturity?’

Those issues are reflected in the fundamental differences between the BSIMM and the OWASP SAMM maturity models for software security. Although they are similar in that both try to help organizations improve their software security handling, they are fundamentally different in HOW they try to achieve it.

BSIMM and SAMM – two well-known security maturity models

Starting with the BSIMM model, it is a descriptive model based on data and observations of how hundreds of companies actually conduct their software security initiatives. The BSIMM model does not specifically tell you what to do; it just says what others do and presumes that if many are doing something, then it is probably a good idea for you to do it too.

The SAMM model is, by contrast, a prescriptive model. It is essentially a community-driven effort where experts suggest best practices for proper software security work and issue general guidelines for what you should do.

Both models are general in their descriptions, as they should fit both small and large organisations and a large variety of software development. This ‘one-size-fits-all’ approach is as much a strength as a weakness. Although both BSIMM and OWASP SAMM recognize the existence of Open Source, they lack specific practices for managing vulnerabilities in Open Source software.

For instance, efficient handling of vulnerabilities is crucial, as exploits are crafted quickly nowadays. In addition, being forced to make urgent updates, a common mitigation against exploits, may become very costly for certain organisations.

The HAVOSS model specifically addresses vulnerabilities in Open Source

To fill in those missing aspects of today’s widespread use of Open Source, Lund University recently developed the HAVOSS model, ‘Handling Vulnerabilities in third party OSS’, for a broader and deeper handling of Open Source vulnerabilities than BSIMM or SAMM offers. Still, HAVOSS is not to be seen as a replacement for either of those two models, but rather as a supplement to them.

Similar to both BSIMM and SAMM, HAVOSS defines six different domains, or Capability Areas as they are named, in which a total of 21 Practices are each assessed on five maturity levels. The core practices span the chain of events for handling a specific vulnerability, from ‘Identify’ through ‘Evaluate’ and ‘Remediate’ to ‘Deploy’.

The HAVOSS model itself has gone through two major evaluations with industry participation and is now being disseminated through academic papers and conference presentations.

Software Security at Axis

Stefan Andersson, Security Architect at Axis Communications AB

Stefan Andersson started by giving some background on Axis and its use of Open Source, which dates back to 1999 when Axis launched its first embedded Linux product, a network security camera. Because of that, the Axis engineering culture has long been inspired by the Linux community, which is also reflected in how security work is organized at Axis.

A team-centric security program

Axis’ own flavour of such a model, the Axis Security Development Model (ASDM), resembles both BSIMM and SAMM and spans all phases of software development, but is adjusted to Axis’ own specific environment, organization, and culture. One of the key principles of its security program is to be team-centric, meaning that all security work is done by the ordinary development teams themselves and not by a central entity.

There are two good reasons for taking a team-centric approach. First, it fits closely with the Axis development culture: all development teams are essentially fully responsible for everything they do, without any interference from management.

The second reason relates to modern ways of software development in general, that is, to maximize the throughput of deliverables and to avoid introducing bottlenecks or delays in the development process.

The latter is also the reason that the Software Security Group, SSG, is kept small, with only two people supporting more than 800 developers. The SSG does most of the activities related to governance, such as conducting training, assessing how the security work progresses in the teams, and gathering metrics related to software security.

Being a coach for every individual developer is hard, however, so the SSG has the support of a virtual team, ‘The Satellites’, consisting of some 20 developers and architects embedded in the development teams.

Risk based

All security work in the teams starts with a risk assessment at the very beginning of development, during backlog grooming, allowing them to determine how much time to spend on security work for a particular feature to be developed.

This early risk assessment practice means that everything made at Axis is also risk-based: teams spend significantly more security effort on what is considered high-risk than on what is considered low-risk. In addition, the teams do a lot of threat modeling, which pushes security concerns, i.e. software vulnerability management, as early in the software development process as possible.

New code first step for the security program

When the SSG was about to launch its software security program, it was clear that including all of the development teams at once would be bound to fail.

Instead, the SSG started the program by introducing the Axis Secure Development Model for teams working on new code, focusing on teaching those teams how to develop more secure software as well as getting the model evaluated.

Second step: high-risk legacy components

However, addressing just new code development does not significantly increase the overall software security of Axis products. To do that, they had to look into their legacy code as well, so the next step in the Axis security program was to look into high-risk legacy components.

As about 90 percent of the codebase of Axis products is Open Source, most of the high-risk legacy components are Open Source as well. Given that, and with the responsibility of the maintainers of the Open Source code to keep the high-risk components up to date regarding security, this essentially meant that they had to go back in the development history and apply the whole toolbox of the Axis Secure Development Model to the selected set of high-risk legacy components.

Any time Axis brings in a new open source component, it has to be assessed through threat modelling to determine whether it represents a high security risk. This assessment is not primarily based on the security properties of the component itself, but more on where it will be placed in the system and how exposed it would then become. At that point it would be great to know, or to be able to measure, the security of the new component, but as Axis also recognizes, that is hard to achieve.

Instead, Axis tries to understand the maturity of the Open Source project behind the code. It will be easier to build trust in the code itself if Axis can trust the community to produce high-quality code from a security perspective.

Understanding an open source community’s security maturity

Having a BSIMM or SAMM evaluation of an Open Source project would be the best of all worlds, but in reality Axis has to look into whatever other sources of data may be available. Typically, that means looking at historical data on reported vulnerabilities for a particular project.

In Axis’ experience, that is a good indicator of the maturity of the project: having a good number of people caring about the security aspects, reporting vulnerabilities, and fixing the security problems is at least proof that security concerns have been addressed and improved in the project over time.

Another way to assess a project’s maturity is to check whether the project has a specific security page that explains how to report vulnerabilities, where to find the latest vulnerability information, whom to contact, etc. All of those are also good indicators of a project’s security maturity.

Next step: contribute

Having addressed security in new code as well as in legacy code, the next step that Axis is now contemplating is to increase the security of those Open Source projects vital to Axis by contributing Axis’ own security analysis capabilities. In the end, concludes Stefan, “if you worry about the security level of a project that is critical for you, the only way to really fix it is getting involved in the project yourself and to contribute your own skills and time”.

Using Machine Learning to enrich Open Source Vulnerability Data

Emil Wåreus, Head of Data Science, Debricked AB

Industry analysts have reported that 96% of all companies use Open Source, with an average of 60% of their codebase being Open Source. As impressive as this advancement may be, one also has to recognize that leveraging Open Source comes with some risks that need to be addressed.

Emil Wåreus started by giving a brief background on Debricked and its mission of addressing three of those risks with the help of automation: security vulnerabilities, license and copyright issues, and quality aspects of open source communities. The latter could include, for example, measuring how responsive a community is to found vulnerabilities, but that theme is saved for another day. Instead, the focus was on the specific challenge that delays in vulnerability reporting represent today.

Source of data on open source vulnerabilities is key

One of the largest sources of listed vulnerabilities is the National Vulnerability Database, NVD. It is a U.S. government managed database containing some 130k vulnerabilities as of today. These vulnerabilities cover about 50k software products which could be proprietary software, such as MS Windows, but a majority of them originate from Open Source projects.

It all relies on community efforts to report potential vulnerabilities to NVD, whose analysts review those reports and publish newly found vulnerabilities. In comparison, NVD dwarfs other vulnerability databases; GitHub’s own security advisory database, for example, lists only some 4k vulnerabilities.

Capture, understand and remediate

When working with vulnerabilities, you need a process in your DevOps to capture, understand and remediate the vulnerabilities you have. Through Software Composition Analysis you can chart which Open Source components you have included in your software and map those to known vulnerabilities found in open databases such as NVD.

Presented with a list of identified vulnerabilities in your software, you evaluate them, then prioritize them and assign them to developers, who finally remediate them. The remediation could be simply to patch or update the open source software used or, better, to contribute a fix back to the community in order to resolve the vulnerability for all.
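
The mapping step described above can be illustrated with a minimal Python sketch. It assumes plain dotted version numbers and a half-open affected range; the `Advisory` record, the function names, the product names and the `VULN-…` identifiers are all hypothetical and are not taken from NVD or from any specific Software Composition Analysis tool.

```python
from __future__ import annotations

from dataclasses import dataclass


def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted version such as '1.2.10' into a comparable tuple (1, 2, 10)."""
    return tuple(int(part) for part in text.split("."))


@dataclass(frozen=True)
class Advisory:
    vuln_id: str         # hypothetical identifier, not a real CVE
    product: str
    first_affected: str  # inclusive lower bound of the affected range
    first_fixed: str     # exclusive upper bound of the affected range


def find_vulnerable(dependencies: dict[str, str], advisories: list[Advisory]) -> list[str]:
    """Return ids of advisories whose version range covers an installed dependency."""
    hits = []
    for adv in advisories:
        installed = dependencies.get(adv.product)
        if installed is None:
            continue  # the product is not part of our software
        low = parse_version(adv.first_affected)
        high = parse_version(adv.first_fixed)
        if low <= parse_version(installed) < high:
            hits.append(adv.vuln_id)
    return hits


# Made-up inventory and advisories, for illustration only.
dependencies = {"examplelib": "1.4.2", "otherlib": "3.0.0"}
advisories = [
    Advisory("VULN-0001", "examplelib", "1.0.0", "1.5.0"),
    Advisory("VULN-0002", "otherlib", "2.0.0", "2.9.1"),
]
print(find_vulnerable(dependencies, advisories))  # ['VULN-0001']
```

Real version schemes (pre-releases, epochs, vendor suffixes) need a more careful comparison than the integer tuples used here.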

Linking open source vulnerabilities to products takes time

Vulnerabilities listed in NVD are known as CVEs, ‘Common Vulnerabilities and Exposures’. Each one has a unique identifier, a human-written summary describing the vulnerability and its implications, and a version range to which the vulnerability applies.

A CVE is in turn linked to one or more CPEs, ‘Common Platform Enumeration’ entries, each of which identifies a unique piece of software by a specific vendor and product, with its own version range. Part of the complexity of mapping vulnerabilities to products lies here, as products may depend indirectly on each other.
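
As an illustration of how a CPE encodes vendor, product and version, the sketch below splits a CPE 2.3 formatted string into its leading fields. The `Cpe` tuple and `parse_cpe` function are hypothetical helpers, the example vendor and product are made up, and escaped colons inside a field are deliberately not handled.

```python
from typing import NamedTuple


class Cpe(NamedTuple):
    part: str     # "a" application, "o" operating system, "h" hardware
    vendor: str
    product: str
    version: str


def parse_cpe(cpe_string: str) -> Cpe:
    """Split a CPE 2.3 formatted string into its leading fields.

    Simplification: escaped colons inside a field are not handled.
    """
    fields = cpe_string.split(":")
    # "cpe", "2.3", then 11 attributes: part, vendor, product, version,
    # update, edition, language, sw_edition, target_sw, target_hw, other.
    if len(fields) != 13 or fields[0] != "cpe" or fields[1] != "2.3":
        raise ValueError(f"not a CPE 2.3 formatted string: {cpe_string!r}")
    return Cpe(part=fields[2], vendor=fields[3], product=fields[4], version=fields[5])


# Made-up vendor and product names, for illustration only.
example = parse_cpe("cpe:2.3:a:example_vendor:example_product:1.2.3:*:*:*:*:*:*:*")
print(example.vendor, example.product, example.version)
```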

Unfortunately, coupling CVEs to CPEs takes a lot of time nowadays, causing an increasing delay from when a vulnerability is disclosed until it is linked to a product, which opens a window for covert exploits. The delay used to be zero days, but since 2015 the average has grown to 35 days and beyond.

A lexicon based on past entered products is not viable

One way to reduce the delay would be to let a machine read the descriptions in the CVEs and match them against CPEs entered in the past, creating a kind of lexicon. However, there are many challenges in this. One major challenge is that a large share of reported vulnerabilities concerns just a handful of the very largest software products.

Another is that many of the newer reported vulnerabilities mention a particular piece of software for the first time. In figures, only 1% of all products that NVD covers have 40 or more vulnerabilities reported, and 60% of all vulnerabilities cover new software that is mentioned just once. Thus, creating a lexicon from past reported CPEs alone is not really a viable strategy.
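
A toy version of the lexicon idea shows why it breaks down for software that is mentioned for the first time. The product names and summaries below are invented, and the word matching is deliberately naive; this is a sketch of the limitation, not of any real matching system.

```python
def build_lexicon(past_products):
    """Collect every product name seen in previously analysed entries (lower-cased)."""
    return {product.lower() for product in past_products}


def lexicon_match(summary, lexicon):
    """Return the known product names that occur as words in a summary."""
    words = {word.strip(".,;:()").lower() for word in summary.split()}
    return sorted(words & lexicon)


# Invented product names and summaries, for illustration only.
lexicon = build_lexicon(["ExampleServer", "SampleDB"])

known = "Buffer overflow in ExampleServer before 2.1 allows remote attackers to crash the service."
unseen = "Path traversal in NewWidget 0.3 allows reading arbitrary files."

print(lexicon_match(known, lexicon))   # ['exampleserver']
print(lexicon_match(unseen, lexicon))  # [] because NewWidget was never seen before
```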

Neural networks can extract the information

What can be done instead is to extract the information contextually from the vulnerability summary, and that is exactly what Debricked has done. It has implemented a machine learning model that uses a technique called Named Entity Recognition, NER, which looks at each word in the vulnerability summary and labels it according to whether it belongs to the product class, the vendor class or the affected version class.
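
To make the word-level labelling concrete, the sketch below takes a summary that has already been labelled and merges consecutive words with the same label into vendor, product and version spans. The labels are written by hand, not produced by a model, and the label names (`VENDOR`, `PRODUCT`, `VERSION`, `O` for other words) and the example summary are assumptions for illustration; Debricked’s actual model and label scheme are not described in the talk.

```python
from itertools import groupby


def group_entities(tokens, labels):
    """Merge consecutive tokens that share a label into (label, text) spans, skipping 'O'."""
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")
    spans = []
    for label, group in groupby(zip(tokens, labels), key=lambda pair: pair[1]):
        if label != "O":
            spans.append((label, " ".join(token for token, _ in group)))
    return spans


# Hand-labelled example; a trained NER model would predict these labels.
tokens = ["Buffer", "overflow", "in", "Acme", "Web", "Server", "before", "2.1"]
labels = ["O", "O", "O", "VENDOR", "PRODUCT", "PRODUCT", "VERSION", "VERSION"]

print(group_entities(tokens, labels))
# [('VENDOR', 'Acme'), ('PRODUCT', 'Web Server'), ('VERSION', 'before 2.1')]
```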

With the large neural network that Debricked has built, it has gone a long way towards automating the coupling of CVEs to CPEs. It correctly identifies 88% of all products and 84% of all versions in the reported vulnerabilities, and it does so with zero days of delay. In particular, Debricked’s neural network is completely correct in as many as 68% of all CVEs, meaning that the labels on vendor, product, and version are all correct and no further analysis is needed to assign CPEs to the CVE. Human analysis is of course decisively better, but this at least gives a good first estimate of what software is vulnerable and where further human analysis should be applied. This establishes Debricked as the provider of the state of the art in the field, and the company has also patented its technology.
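
The difference between per-field accuracy and being completely correct can be shown with a small scoring sketch. The data is made up and does not reproduce Debricked’s figures or evaluation method; the `score` function and field names are assumptions for illustration. An entry counts as completely correct only when vendor, product and version all match, so that share can never exceed the lowest per-field score.

```python
FIELDS = ("vendor", "product", "version")


def score(predictions, references):
    """Return per-field accuracy and the share of entries where every field matches."""
    total = len(references)
    pairs = list(zip(predictions, references))
    per_field = {
        field: sum(p[field] == r[field] for p, r in pairs) / total
        for field in FIELDS
    }
    exact = sum(all(p[f] == r[f] for f in FIELDS) for p, r in pairs) / total
    return per_field, exact


# Made-up reference labels and predictions, for illustration only.
references = [
    {"vendor": "acme", "product": "server", "version": "2.1"},
    {"vendor": "acme", "product": "client", "version": "1.0"},
    {"vendor": "initech", "product": "db", "version": "3.4"},
    {"vendor": "initech", "product": "cache", "version": "0.9"},
]
predictions = [
    {"vendor": "acme", "product": "server", "version": "2.1"},
    {"vendor": "acme", "product": "server", "version": "1.0"},  # wrong product
    {"vendor": "initech", "product": "db", "version": "3.5"},   # wrong version
    {"vendor": "initech", "product": "cache", "version": "0.9"},
]

per_field, exact = score(predictions, references)
print(per_field)  # {'vendor': 1.0, 'product': 0.75, 'version': 0.75}
print(exact)      # 0.5
```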

To further explore this topic, listen to the presentation of Emil’s latest research paper here.

Open Source Risk Management is HARD

Although Debricked has solved a specific problem in handling software vulnerabilities, one has to recognize that Open Source risk management in general is hard and technically challenging. There are still issues that Debricked wants to address, such as:

Are all vulnerabilities disclosed? For example, there are research reports saying that only about 20% of JavaScript vulnerabilities are disclosed.
Are the disclosed vulnerabilities relevant? Do you actually use the vulnerable part of the software in your system?
How does one foster ownership among developers, such as updating or patching to solve those vulnerabilities? This is more of a process- and team-based issue, and one likely conclusion is that the best tool for this is the tool you actually use rather than the tool with the most vulnerabilities or best data.

To wrap things up, concludes Emil, Debricked wants to go further and address the broader scope, which is how to maximize the leverage of Open Source while minimizing the risk.