Global Security Database

Discussion on future data formats

  • 1.  Discussion on future data formats

    Posted Dec 15, 2021 09:04:00 PM
    So I invented the JSON format used by MITRE/CVE (and for which I apologize, it's a terrible format and missing many things) which is at

    What do you want to see in a future data format? 

    E.g. I want a data format that supports both machine-readable content (JSON) and human-readable content (e.g. formatted text). I sort of did this in the CVE JSON format, but the text description there doesn't support any formatting, or even line returns. So I think in general the solution here is to figure out which fields are human-readable (the description for sure, notes fields as well?) and allow the use of line returns and Markdown in them.
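
    A minimal sketch of that idea in Python. The `format` and `text` keys and the placeholder ID are assumptions made up for illustration; they are not part of the CVE JSON format or any published GSD schema.

```python
import json

# Hypothetical entry: key names and the ID are illustrative placeholders only.
entry = {
    "id": "GSD-0000-0000",
    "description": {
        "format": "markdown",  # tells consumers how to render "text"
        "text": "Heap overflow in `parse_header()`.\n\n- Affects: 1.x\n- Fixed in: 1.2.3",
    },
}

# JSON escapes line returns inside strings, so they survive a round trip.
encoded = json.dumps(entry, indent=2)
decoded = json.loads(encoded)

# Machines read the structured keys; humans get the Markdown text back intact.
human_readable = decoded["description"]["text"]
```

    Marking the format per field would let a consumer that does not render Markdown still show the raw text.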

    I also suspect taking some inspiration from JSON-LD is a good idea, but I don't think I fully understand it (so if you're using it, e.g. for events or products, please let me know!)

    So what does the community need and want to see?

    Kurt Seifried
    Chief Blockchain Officer and Director of Special Projects
    Cloud Security Alliance

  • 2.  RE: Discussion on future data formats

    Posted Dec 15, 2021 09:15:00 PM
    Oh also I forgot, I put some of my thoughts on GSD data format/requirements (in general) here:

    Kurt Seifried
    Chief Blockchain Officer and Director of Special Projects
    Cloud Security Alliance

  • 3.  RE: Discussion on future data formats

    Posted Jan 27, 2022 01:27:00 PM

    Hey Guys,

    My name is Sam, and I'm very excited about this project. My eyes lit up when I heard you guys talking about it on the OSS podcast (one of my favorites). The problem domain has been weighing on my mind ever since I read "This is how they tell me the world ends". Then Log4j hit, and the first security fix was disabling a configuration? I told my wife it was the equivalent of an attacker breaking into the house, turning the lights off, and standing still. Eventually, they will find the light switch.

    I truly believe that this domain should be open-sourced. I think we need it to be decentralized and provide a Vulnerability Disclosure Blockchain that creates an economy to combat 0-day market bounties ethically... Like I said it's something that eats at me and what really got me hyperfocused on this area.

    Anyways, here is a quick thought I had...

    I think my skill set can contribute to the project. I work for Whitesource Software as a Technical Support Engineer and we provide open source vulnerability scanning and compliance through software composition analysis solutions.

    One tool that GitHub acquired is LGTM, which provides continuous SAST and scanning on open source repos. They provide SARIF files that contain the CWEs and links to the CodeQL database on the project's site. This would help the project gain adoption by allowing vendors to easily integrate the response from GSD into their solutions. It would be very useful to include these links so anyone using SARIF for SAST or CodeQL could easily have access to how the weakness was discovered. The most we get out of a CVE/NVD entry is a link to a website of a thread that started in 2012 (slight sarcasm). Like you guys said, Twitter is a better resource for learning about CWEs.
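
    As a rough sketch of how a consumer might pull CWE identifiers out of a SARIF file, in Python. The sample document is hand-written and `ExampleScanner` is hypothetical. The code assumes CWEs appear as rule tags of the form `external/cwe/cwe-NNN`, which is the convention I understand CodeQL output to use; other tools may record them differently.

```python
import json
from typing import Dict, List

# Hand-written, minimal SARIF-shaped document for illustration only.
SARIF_TEXT = """
{
  "version": "2.1.0",
  "runs": [{
    "tool": {"driver": {"name": "ExampleScanner", "rules": [
      {"id": "js/xss",
       "properties": {"tags": ["security", "external/cwe/cwe-079"]}}
    ]}},
    "results": [
      {"ruleId": "js/xss",
       "message": {"text": "Unsanitized input reaches the DOM."}}
    ]
  }]
}
"""

CWE_TAG_PREFIX = "external/cwe/cwe-"  # assumed tag convention


def cwe_ids_by_rule(sarif: dict) -> Dict[str, List[str]]:
    """Map each rule ID to the CWE IDs found in its tags."""
    found: Dict[str, List[str]] = {}
    for run in sarif.get("runs", []):
        rules = run.get("tool", {}).get("driver", {}).get("rules", [])
        for rule in rules:
            tags = rule.get("properties", {}).get("tags", [])
            cwes = [
                f"CWE-{int(tag[len(CWE_TAG_PREFIX):])}"
                for tag in tags
                if tag.startswith(CWE_TAG_PREFIX)
            ]
            if cwes:
                found[rule["id"]] = cwes
    return found


cwes = cwe_ids_by_rule(json.loads(SARIF_TEXT))
```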

    SARIF Spec:

    Sam Jones
    Technical Support Engineer - SAST and SCA
    Whitesource Software

  • 4.  RE: Discussion on future data formats

    Posted Jan 27, 2022 05:45:00 PM
    Hi Sam,

    Welcome aboard!

    That book is amazing and I love your analogy.

    Are you suggesting we include the SARIF JSON directly in the GSD entries, or that we link to the SARIF files?

    I admit I know of SARIF, but that's about all I know. I've not really thought about something like this, but I think it's clever.


    Josh Bressers
    Product Security

  • 5.  RE: Discussion on future data formats

    Posted Jan 27, 2022 06:08:00 PM
    Edited by Kurt Seifried Jan 27, 2022 06:08:58 PM
    As for a long-term permissionless distributed yada yada technology base for this data, YES. I don't want anyone to be a gatekeeper (intentionally or otherwise), and the important stuff already runs (messily) by consensus on Twitter using PoH (proof of hashtag =). But first we build using the tools we have and figure out what we need/don't need.

    Kurt Seifried
    Chief Blockchain Officer and Director of Special Projects
    Cloud Security Alliance

  • 6.  RE: Discussion on future data formats

    Posted Jan 27, 2022 08:46:00 PM
    Well git is a blockchain, maybe we already solved that problem. Anyways that was a slight security rant, the GSD project is the first I have seen moving in the right direction. 

    Here are some thoughts I had around the data model related to SARIF and links to external source code.

    I think we should provide a link to the SARIF file. I'm just getting acquainted with the specification, but from what I have worked with they can be pretty lengthy.
    Plus it's not really JSON and would be hard to parse. VS Code hates it. I think most modern SAST solutions are moving to this format, or support it.

    Really what SARIF does is provide a way to describe the static conditions for a CWE. This enables the development of an exploit in the execution state, generating a CVE.

    Researchers could generate and verify POCs for the vulnerability, bug, or weakness much faster. If you consider the shift left philosophy, this would shorten the time to remediation and even foster the development of auto-remediation technologies.

    I also think wherever there is a link to any source code, the same object containing the link MUST include a minimum of two checksums (SHA1, SHA256) to verify the file's integrity.

    I briefly looked at some of the other specifications and noticed none of them referenced checksums or hashes directly. I just did a fast skim and page search, so I could have missed it.

    Once these data objects are open sourced, they will be abused and manipulated, but keeping them in a JSON format makes for an interesting use case of version controlling a disclosure.

    The git commit hash adds a level of integrity to the JSON object as a whole. Adding checksums to verify links to external code adds a granular level of integrity within the JSON object itself. Attempts to change the hashes within the JSON would require a new commit disclosing the change. This would make it possible to automate the validation of these changes.

    I know you are aware there are major issues with relying on metadata to identify a file or piece of code. To do this correctly it must be done using only the minified file contents of the source code (exclude file metadata when hashing).

    For example, a user says they found a CWE in a file "jquery.1.2.3.js" that came from repo A, but the file is really "jquery.6.6.6.js" that came from repo B and was manually placed in repo A and renamed.

    Right now everything goes through some formal approval process where it can be checked before being publicly released. This is a privilege the GSD would not have. Therefore it must be enforced on external files.

    I propose three MUST conditions:

    External links to files must be included with a minimum of 2 hashes (sha1, sha256)
    Hashes must exclude metadata
    Hashes must be of the minified file contents (no white space, tabs, spaces)
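
    The three conditions above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: stripping all whitespace is a crude stand-in for real minification (it also alters whitespace inside string literals), and the `reference` object layout with `url` and `hashes` keys is hypothetical.

```python
import hashlib


def normalized_bytes(content: bytes) -> bytes:
    """Remove all ASCII whitespace (spaces, tabs, newlines) from the content."""
    return b"".join(content.split())


def content_hashes(content: bytes) -> dict:
    """Hash only the normalized contents; file name, path, and mtime never enter."""
    data = normalized_bytes(content)
    return {
        "sha1": hashlib.sha1(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def matches_reference(reference: dict, content: bytes) -> bool:
    """True only if both recorded hashes are present and match the content."""
    recorded = reference.get("hashes", {})
    # Enforce the proposed minimum of two hashes (sha1 and sha256).
    if not {"sha1", "sha256"} <= set(recorded):
        return False
    actual = content_hashes(content)
    return all(actual[name] == recorded[name] for name in ("sha1", "sha256"))


# Hypothetical reference object as it might appear inside an entry.
original = b"function add(a, b) {\n    return a + b;\n}\n"
reference = {"url": "https://example.org/lib.js", "hashes": content_hashes(original)}
```

    Because only the contents are hashed, a renamed copy of the same file still matches, while any change to the code itself does not.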

    I propose 1 SHOULD condition:

    CWE should include links to a SARIF file when available
    Just some ideas! Hope it helps. I could work on tooling for the hash validation if we decide to go that route.

    Sam Jones
    Technical Support Engineer - SAST and SCA
    Whitesource Software

  • 7.  RE: Discussion on future data formats

    Posted Jan 27, 2022 09:30:00 PM
    Some quick comments:

    One part of the plan is to mirror all the data we reference. The reason is simple: if we download and process a file (e.g. a Debian Security Advisory) a URL is not enough, it might serve a DSA, it might serve a jpeg, or might serve a 404. Currently, I was using but using git for this is terrible. I have reached out to the Internet Archive (they have paid solutions) but they've not replied. We'll figure out some mirroring solution.
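
    To illustrate why a URL alone is not enough, here is a small Python sketch of the metadata a mirror might record for each fetched document. The function name, record keys, and the `text/` check are assumptions for illustration; no network access is performed and this is not the actual GSD mirroring tooling.

```python
import hashlib
from datetime import datetime, timezone


def snapshot_record(url: str, status: int, content_type: str, body: bytes,
                    expected_prefix: str = "text/") -> dict:
    """Build a mirror record for a fetched response, rejecting obvious junk."""
    if status != 200:
        # e.g. the URL now serves a 404 instead of the advisory
        raise ValueError(f"unexpected HTTP status {status} for {url}")
    if not content_type.startswith(expected_prefix):
        # e.g. the URL now serves a jpeg instead of the advisory
        raise ValueError(f"unexpected content type {content_type!r} for {url}")
    return {
        "url": url,
        "retrieved": datetime.now(timezone.utc).isoformat(),
        "content_type": content_type,
        "size": len(body),
        # The hash pins exactly which bytes were processed.
        "sha256": hashlib.sha256(body).hexdigest(),
    }


record = snapshot_record(
    "https://example.org/advisory", 200, "text/html; charset=utf-8",
    b"<html>advisory</html>",
)
```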

    Git is a good start but it inherently has gatekeeping in the sense of "this git repo is the source of truth" and if you can't get someone to accept your PR/commit, well, it doesn't exist. Too bad. In the short/medium term this is ok because Josh and I will keep things honest, but as good as BDFL is, I'd rather see a technical solution long term so that it can't be stopped.

    CWE is a challenge: CWE is slow to create new CWEs. I'm on the CWE board and we just voted for a charter. I've been pushing for better transparency and process around creating new CWEs; we'll see if it happens or not. In the meantime, my plan is to use the repo to hold (among other things) data on projects, researchers, and new vulnerability types (essentially CWEs), for example, is probably ~100 new CVEs right there.

    There's more but essentially we have 10-20 major interlocking problems that all have to be pushed forwards in order to solve the overall problem of "vulnerability identifiers".

    Kurt Seifried
    Chief Blockchain Officer and Director of Special Projects
    Cloud Security Alliance

  • 8.  RE: Discussion on future data formats

    Posted Jun 13, 2022 08:55:00 AM
    The only logically sound approach is to integrate with the existing formats supported by MITRE, the Center for Threat-Informed Defense, the CIS Community Defense Model, etc.
    • STIX 2.1
    Interoperability is guaranteed. Related projects incorporating STIX:
    • OASIS Collaborative Automated Course of Action Operations (CACAO)
    • OASIS Topology and Orchestration Specification for Cloud Applications
    As for the machine-readable aspect, I would like to see integration work around OSCAL. It could potentially bridge a very big gap.

    Bruce Lavoie
    Developer
    Montreal, Quebec

  • 9.  RE: Discussion on future data formats

    Posted Jun 13, 2022 10:26:00 AM
    I'm not sure I agree, but I would welcome some examples so we can better discuss this. Thanks.

    Kurt Seifried
    Chief Blockchain Officer and Director of Special Projects
    Cloud Security Alliance

  • 10.  RE: Discussion on future data formats

    Posted Jun 14, 2022 07:47:00 AM

    While I think interoperability with existing formats is valuable, we should not restrict or limit the functionality to only the existing challenge areas (which have gotten us here in the first place).

    That being said (and rereading this thread), I'm not clear on what we're specifically searching for a solution to.

    A data format for information exchange?
    A data format for information storage?
    A data model to transcribe external information into a GSD compatible format?

    I was under the impression (likely wrongly so) that OSV and JSON would be the preferred formats. However, OSV would require some modification to cover the variety of values needing to be expressed (such as extending beyond just open source to include proprietary software and cloud services, exploit path, evidence of compromise, tests, etc.), because the existing standards and formats are grossly lacking in the details needed to make informed decisions, let alone account for the variety of information technology and service constructs that exist in today's computing era.
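
    For discussion, a Python sketch of one way such values could ride along in an OSV-style record without changing the core schema. OSV defines `id` and `modified` as required fields and reserves `database_specific` for database-defined data; every key placed under `database_specific` here, and the placeholder ID, are hypothetical and only meant to show the shape.

```python
import json

# OSV-style record. Keys under "database_specific" are hypothetical extensions.
record = {
    "id": "GSD-0000-0000",  # placeholder identifier
    "modified": "2022-06-14T00:00:00Z",
    "summary": "Example entry for a target that is not open source",
    "database_specific": {
        "target_kind": "cloud-service",  # vs. "open-source", "proprietary"
        "status": "suspected",           # vs. "confirmed"
        "exploit_path": [],
        "evidence_of_compromise": [],
        "tests": [],
    },
}


def missing_required(rec: dict) -> list:
    """Return the OSV-required top-level keys that are absent from rec."""
    return [key for key in ("id", "modified") if key not in rec]


# The record stays plain JSON, so existing OSV consumers can ignore the extras.
round_tripped = json.loads(json.dumps(record))
```

    Whether an extension block like this is enough, or whether the core schema itself needs new fields, is exactly the delta that would have to be worked out.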

    If we're also discussing a data model, we must first understand what values we seek to retain with respect to a given vulnerability (suspected or confirmed, which also implies a state or status). So does it make sense for us to first articulate what we need to know about a vulnerability, then determine which existing formats are viable and the corresponding delta between the format and industry formats, and then describe the data model to exchange between them all?

    Let me know if I'm on the wrong path here.

    Emily Fox
    Security Engineer

  • 11.  RE: Discussion on future data formats

    Posted Jun 15, 2022 08:32:00 AM
    One option would be to support the OWASP CycloneDX standard. Technically, it's a Bill of Materials (BOM) format; however, it also supports Bill of Vulnerabilities, Advisory, and VEX formats. We bill it as a "modern standard for the software supply chain".

    Some key attributes which I think make it a viable format for this effort:

    That said, it's also not perfect. There is ongoing work to expand the vulnerability capabilities of the spec to include data relevant for reproducibility of vulnerabilities (evidence, tests, etc.) and other data that would be necessary for the spec to align with capabilities of offensive engagement platforms (HackerOne, DefectDojo, Faraday, etc.). If anyone is interested in this effort, we're collecting industry feedback for inclusion in the next version of the spec (v1.5), which we anticipate will be available Q1 2023.

    We're also taking on more advanced vulnerability use cases as they relate to the source/sink of source code. We're aligning to the SARIF spec and will be including capabilities in a future release that will provide paths to vulnerable components and the risk and analysis of each.

    Anyway, I think CycloneDX should be considered for this effort and would be happy to answer any questions the community has, or take feedback on what other aspects may be missing from the spec so we can continue to advance it.


    Steve Springett
    Chair, CycloneDX Core Working Group