Global Security Database

  • 1.  Discussion on future data formats

    Posted Dec 15, 2021 09:04:00 PM
    So, I invented the JSON format used by MITRE/CVE, which is at https://github.com/CVEProject/cve-schema/tree/master/schema (and I apologize for it: it's a terrible format and is missing many things).

    What do you want to see in a future data format? 

    For example, I want a data format that supports both machine-readable (JSON) and human-readable (e.g. formatted text) content. I sort of did this in the CVE JSON format, but the text description doesn't support any formatting, or even line returns. So I think in general the solution is to figure out which fields are human-readable (the description for sure, notes fields as well?) and allow line returns and Markdown in those fields.
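    JSON strings can already carry line returns (encoded as `\n`), so allowing formatting is mostly a schema decision about which fields may use it. A minimal sketch, assuming a hypothetical policy in which only `description` and `notes` may contain Markdown and line returns; the field names and the placeholder ID are illustrative, not part of any GSD or CVE schema:

```python
import json

# Hypothetical policy: only these fields may contain Markdown and line returns.
HUMAN_READABLE_FIELDS = {"description", "notes"}

entry = {
    "id": "EXAMPLE-0001",  # placeholder identifier, not a real GSD/CVE ID
    "description": "A **heap overflow** in `parse_header()`.\n\nAffected:\n\n- 1.0\n- 1.1",
    "notes": "Reported upstream.\nFix pending.",
}

def misplaced_line_returns(record: dict) -> list:
    """Return machine-readable fields that contain line returns (they shouldn't)."""
    return [
        key for key, value in record.items()
        if isinstance(value, str) and "\n" in value and key not in HUMAN_READABLE_FIELDS
    ]

# JSON encodes the line returns as \n escapes and restores them on load.
encoded = json.dumps(entry)
assert "\\n" in encoded
assert json.loads(encoded) == entry

print(misplaced_line_returns(entry))                    # []
print(misplaced_line_returns({"id": "EXAMPLE\n0001"}))  # ['id']
```

    A renderer would then treat the human-readable fields as Markdown and everything else as plain values.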

    I also suspect that taking some inspiration from JSON-LD (https://json-ld.org/) is a good idea, but I don't think I fully understand it (so if you're using it, e.g. for events or products, please let me know!)
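    The core JSON-LD idea is a `@context` that maps short, human-friendly keys to globally unique IRIs, so two datasets that both say "name" can agree on what it means. A toy sketch of that mapping; the schema.org terms and product values are illustrative assumptions, and `expand_keys` is a deliberately simplified stand-in, not the real JSON-LD expansion algorithm (which also normalizes values and handles nested and remote contexts):

```python
import json

# A JSON-LD style product record. The @context maps short keys to full IRIs;
# schema.org is used here purely as an example vocabulary.
doc = {
    "@context": {
        "name": "https://schema.org/name",
        "version": "https://schema.org/softwareVersion",
    },
    "@type": "https://schema.org/SoftwareApplication",
    "name": "log4j-core",
    "version": "2.14.1",
}

def expand_keys(d: dict) -> dict:
    """Toy illustration: replace top-level keys with their IRIs from @context.
    This is NOT the full JSON-LD expansion algorithm."""
    ctx = d.get("@context", {})
    return {ctx.get(k, k): v for k, v in d.items() if k != "@context"}

print(json.dumps(expand_keys(doc), indent=2))
```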

    So what does the community need and want to see?

    ------------------------------
    Kurt Seifried
    Chief Blockchain Officer and Director of Special Projects
    Cloud Security Alliance
    kseifried@cloudsecurityalliance.org
    ------------------------------


  • 2.  RE: Discussion on future data formats

    Posted Dec 15, 2021 09:15:00 PM
    Oh, also, I forgot: I put some of my thoughts on the GSD data format/requirements (in general) here: https://github.com/cloudsecurityalliance/gsd-project-plans/blob/main/data-formats/Thoughts-on-data-formats.md

    ------------------------------
    Kurt Seifried
    Chief Blockchain Officer and Director of Special Projects
    Cloud Security Alliance
    kseifried@cloudsecurityalliance.org
    ------------------------------



  • 3.  RE: Discussion on future data formats

    Posted Jan 27, 2022 01:27:00 PM

    Hey Guys,

    My name is Sam, and I'm very excited about this project. My eyes lit up when I heard you guys talking about it on the OSS podcast (one of my favorites). The problem domain has been weighing on my mind ever since I read "This Is How They Tell Me the World Ends". Then Log4j hit, and the first security fix was disabling a configuration setting? I told my wife it was the equivalent of an attacker breaking into the house, and us turning the lights off and standing still. Eventually, they will find the light switch.

    I truly believe that this domain should be open source. I think it needs to be decentralized, with a Vulnerability Disclosure Blockchain that creates an economy to ethically combat 0-day market bounties... Like I said, it's something that eats at me, and it's what really got me hyperfocused on this area.

    Anyways, here is a quick thought I had...

    I think my skill set can contribute to the project. I work for Whitesource Software as a Technical Support Engineer; we provide open source vulnerability scanning and compliance through software composition analysis (SCA) solutions.

    One tool that GitHub acquired is LGTM, which provides continuous SAST scanning of open source repos. It provides SARIF files that contain the CWEs, along with links to the CodeQL database, on the project's site. This would help the project gain adoption by allowing vendors to easily integrate the response from GSD into their solutions. It would be very useful to include these links so that anyone using SARIF for SAST, or CodeQL, could easily see how the weakness was discovered. The most we get out of a CVE/NVD entry is a link to a website or a thread that started in 2012 (slight sarcasm). Like you guys said, Twitter is a better resource for learning about CWEs.

    SARIF Spec: https://docs.oasis-open.org/sarif/sarif/v2.0/sarif-v2.0.html
    LGTM: https://lgtm.com/projects/g/apache/logging-log4j2/
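    To make the SARIF idea concrete, here is a minimal sketch that pulls CWE identifiers and source locations out of a SARIF 2.1.0-shaped log. It assumes CodeQL's convention of tagging rules with `external/cwe/cwe-NNN` in `properties.tags`; the embedded log is synthetic (not real LGTM output), and `cwe_findings` is a hypothetical helper, not part of any GSD tooling:

```python
import json
import re

# Minimal, synthetic SARIF 2.1.0-shaped log (illustrative only).
SARIF_TEXT = """
{
  "version": "2.1.0",
  "runs": [{
    "tool": {"driver": {"name": "CodeQL", "rules": [
      {"id": "java/example-injection",
       "properties": {"tags": ["security", "external/cwe/cwe-917"]}}
    ]}},
    "results": [{
      "ruleId": "java/example-injection",
      "message": {"text": "Example finding"},
      "locations": [{"physicalLocation": {
        "artifactLocation": {"uri": "src/main/java/Example.java"},
        "region": {"startLine": 42}}}]
    }]
  }]
}
"""

CWE_TAG = re.compile(r"external/cwe/cwe-(\d+)$")

def cwe_findings(sarif: dict):
    """Yield (rule id, [CWE ids], file uri, start line) for every result location."""
    for run in sarif.get("runs", []):
        rules = run.get("tool", {}).get("driver", {}).get("rules", [])
        # Map rule id -> CWE ids parsed from CodeQL-style tags.
        cwes = {}
        for rule in rules:
            tags = rule.get("properties", {}).get("tags", [])
            cwes[rule["id"]] = [f"CWE-{int(m.group(1))}" for t in tags if (m := CWE_TAG.search(t))]
        for result in run.get("results", []):
            rule_id = result.get("ruleId")
            for loc in result.get("locations", []):
                phys = loc.get("physicalLocation", {})
                uri = phys.get("artifactLocation", {}).get("uri")
                line = phys.get("region", {}).get("startLine")
                yield rule_id, cwes.get(rule_id, []), uri, line

for finding in cwe_findings(json.loads(SARIF_TEXT)):
    print(finding)  # ('java/example-injection', ['CWE-917'], 'src/main/java/Example.java', 42)
```

    A GSD entry could then carry just the extracted CWE ids plus a link to the full SARIF file, rather than embedding the whole log.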






    ------------------------------
    Sam Jones
    Technical Support Engineer - SAST and SCA
    Whitesource Software
    ------------------------------



  • 4.  RE: Discussion on future data formats

    Posted Jan 27, 2022 05:45:00 PM
    Hi Sam,

    Welcome aboard!

    That book is amazing and I love your analogy.

    Are you suggesting we include the SARIF JSON directly in the GSD entries, or that we link to the SARIF files?

    I admit I know of SARIF, but that's about all I know. I've not really thought about something like this before, but I think it's clever.

    -- 
         Josh

    ------------------------------
    Josh Bressers
    Product Security
    Elastic
    ------------------------------



  • 5.  RE: Discussion on future data formats

    Posted Jan 27, 2022 06:08:00 PM
    Edited by Kurt Seifried Jan 27, 2022 06:08:58 PM
    As for a long-term permissionless distributed yada yada technology base for this data, YES. I don't want anyone to be a gatekeeper (intentionally or otherwise), and the important stuff already runs (messily) by consensus on Twitter using PoH (proof of hashtag =). But first we build using the tools we have and figure out what we need/don't need.

    ------------------------------
    Kurt Seifried
    Chief Blockchain Officer and Director of Special Projects
    Cloud Security Alliance
    kseifried@cloudsecurityalliance.org
    ------------------------------



  • 6.  RE: Discussion on future data formats

    Posted Jan 27, 2022 08:46:00 PM
    Well, git is a blockchain, so maybe we've already solved that problem. Anyway, that was a slight security rant; the GSD project is the first one I have seen moving in the right direction.

    Here are some thoughts I had around the data model related to SARIF and links to external source code.

    I think we should provide a link to the SARIF file. I'm just getting acquainted with the specification, but from what I have worked with, these files can be pretty lengthy.
    Plus, even though SARIF is JSON, it is deeply nested and would be hard to parse. VS Code hates it. I think most modern SAST solutions are moving to this format or already support it.

    Really, what SARIF does is provide a way to describe the static conditions for a CWE. This enables the development of an exploit in the execution state, which generates a CVE.

    Researchers could generate and verify PoCs for the vulnerability, bug, or weakness much faster. If you consider the shift-left philosophy, this would shorten the time to remediation and even foster the development of auto-remediation technologies.

    I also think that wherever there is a link to any source code, the same object containing the link MUST include a minimum of two checksums (SHA1, SHA256) to verify the file's integrity.
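    A minimal sketch of such a link object and its verification, using Python's `hashlib`. The object layout (`url`, `hashes`) and the helper names are hypothetical, chosen only to illustrate the rule; the check fails unless at least two digests are recorded and every one of them matches:

```python
import hashlib

def make_link(url: str, data: bytes) -> dict:
    """Build a hypothetical link object that carries two checksums of the linked file."""
    return {
        "url": url,
        "hashes": {
            "sha1": hashlib.sha1(data).hexdigest(),
            "sha256": hashlib.sha256(data).hexdigest(),
        },
    }

def verify_link(link: dict, data: bytes) -> bool:
    """True only if at least two hashes are recorded and all of them match the data."""
    hashes = link.get("hashes", {})
    if len(hashes) < 2:
        return False
    return all(hashlib.new(algo, data).hexdigest() == expected
               for algo, expected in hashes.items())

content = b"console.log('hi');\n"
link = make_link("https://example.org/lib.js", content)
print(verify_link(link, content))         # True
print(verify_link(link, content + b" "))  # False
```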

    I briefly looked at some of the other specifications and noticed that none of them referenced checksums or hashes directly. I just did a fast skim and page search, so I could have missed it.

    If we open source these data objects, they will be abused and manipulated, but keeping them in a JSON format makes for an interesting use case: version-controlling a disclosure.

    The git commit hash adds a level of integrity to the JSON object as a whole. Adding checksums to verify links to external code adds a granular level of integrity within the JSON object itself. Any attempt to change the hashes within the JSON would require a new commit disclosing that change, which makes it possible to automate the validation of these changes.
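    A small sketch of why that works: with git's default SHA-1 object format, every file version is named by the SHA-1 of a `blob <size>\0` header followed by the file's bytes, and commits hash over those names. So editing a checksum inside a JSON entry yields a different object id and can only land as a new commit. The entry fields below are hypothetical:

```python
import hashlib
import json

def git_blob_id(content: bytes) -> str:
    """Object id git assigns to a blob: SHA-1 over a 'blob <size>\\0' header plus the content."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()

# Same id as `echo 'test content' | git hash-object --stdin` (the Pro Git book example).
print(git_blob_id(b"test content\n"))  # d670460b4b4aece5915caf5c68d12f560a9fe3e4

# Hypothetical entry: changing a recorded hash changes the blob id,
# so the edit shows up as a new, visible commit.
entry = {"url": "https://example.org/file.js", "sha256": "aa" * 32}
before = git_blob_id(json.dumps(entry, sort_keys=True).encode())
entry["sha256"] = "bb" * 32
after = git_blob_id(json.dumps(entry, sort_keys=True).encode())
print(before != after)  # True
```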

    I know you are aware that there are major issues with relying on metadata to identify a file or piece of code. To do this correctly, the hash must be computed over only the minified contents of the source file (excluding file metadata).

    For example, a user says they found a CWE in a file "jquery.1.2.3.js" that came from repo A, but the file is really "jquery.6.6.6.js", which came from repo B and was manually placed in repo A and renamed.

    Right now, everything goes through some formal approval process where it can be checked before being publicly released. This is a privilege the GSD would not have. Therefore, integrity must be enforced on the external files themselves.

    I propose three MUST conditions:

    External links to files MUST include a minimum of 2 hashes (SHA1, SHA256)
    Hashes MUST exclude metadata
    Hashes MUST be of the minified file contents (no whitespace: tabs, spaces)
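    The second and third conditions can be sketched as hashing only the file bytes after stripping all whitespace, which is a crude, assumed stand-in for "minified". Because the filename and other file metadata never enter the hash, the renamed-jQuery case above would be caught: the renamed copy still hashes to the same values as its source. `content_hashes` is a hypothetical helper:

```python
import hashlib
import re

WHITESPACE = re.compile(rb"\s+")

def content_hashes(source: bytes) -> dict:
    """Hash only the file contents, with all whitespace removed (crude 'minification')."""
    normalized = WHITESPACE.sub(b"", source)
    return {
        "sha1": hashlib.sha1(normalized).hexdigest(),
        "sha256": hashlib.sha256(normalized).hexdigest(),
    }

# The same code, re-indented and saved under another name, hashes identically,
# because neither the filename nor any other metadata is part of the input.
original = b"function add(a, b) {\n    return a + b;\n}\n"
renamed_copy = b"function add(a,b){\n\treturn a+b;\n}"
print(content_hashes(original) == content_hashes(renamed_copy))  # True
```

    One trade-off to note: stripping every whitespace character also merges inputs that differ only in whitespace between tokens or inside string literals (`return a` and `returna` normalize identically), so real tooling would probably want a language-aware minifier.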

    I propose one SHOULD condition:

    CWE references SHOULD include links to a SARIF file when available
     
    Just some ideas! Hope it helps. I could work on tooling for the hash validation if we decide to go that route.

    ------------------------------
    Sam Jones
    Technical Support Engineer - SAST and SCA
    Whitesource Software
    ------------------------------



  • 7.  RE: Discussion on future data formats

    Posted Jan 27, 2022 09:30:00 PM
    Some quick comments:

    One part of the plan is to mirror all the data we reference. The reason is simple: if we download and process a file (e.g. a Debian Security Advisory), a URL is not enough: it might serve a DSA, it might serve a JPEG, or it might serve a 404. Currently I'm using https://github.com/cloudsecurityalliance/gsd-url-downloads, but using git for this is terrible. I have reached out to the Internet Archive (they have paid solutions), but they've not replied. We'll figure out some mirroring solution.
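    A minimal sketch of the mirroring idea using only the Python standard library: record what a URL actually served (HTTP status, content type, SHA-256 of the body) and store the body content-addressed by that hash, so a DSA, a JPEG, and a 404 page remain distinguishable after the fact. The directory layout (`objects/`, `index.jsonl`) and function names are assumptions for illustration, not how gsd-url-downloads works:

```python
import hashlib
import json
import urllib.error
import urllib.request
from pathlib import Path

def snapshot_record(url: str, status: int, content_type: str, body: bytes) -> dict:
    """Describe what a URL actually served, keyed by the SHA-256 of the body."""
    return {
        "url": url,
        "status": status,              # 200, 404, ...
        "content_type": content_type,  # e.g. text/html vs image/jpeg
        "sha256": hashlib.sha256(body).hexdigest(),
        "size": len(body),
    }

def save(record: dict, body: bytes, root: Path) -> None:
    """Store the body under its hash and append the record to an index file."""
    objects = root / "objects"
    objects.mkdir(parents=True, exist_ok=True)
    (objects / record["sha256"]).write_bytes(body)
    with open(root / "index.jsonl", "a", encoding="utf-8") as index:
        index.write(json.dumps(record) + "\n")

def mirror(url: str, root: Path) -> dict:
    """Fetch a URL and keep whatever came back, including error pages."""
    try:
        with urllib.request.urlopen(url, timeout=30) as resp:
            status, ctype, body = resp.status, resp.headers.get_content_type(), resp.read()
    except urllib.error.HTTPError as err:
        status, ctype, body = err.code, err.headers.get_content_type(), err.read()
    record = snapshot_record(url, status, ctype, body)
    save(record, body, root)
    return record
```

    Keeping error responses rather than discarding them is deliberate: "this URL returned a 404 on this date" is itself useful evidence.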

    Git is a good start, but it inherently has gatekeeping in the sense of "this git repo is the source of truth", and if you can't get someone to accept your PR/commit, well, it doesn't exist. Too bad. In the short/medium term this is OK because Josh and I will keep things honest, but as good as a BDFL is, I'd rather see a long-term technical solution so that it can't be stopped.

    CWE is a challenge: the CWE program is slow to create new CWEs. I'm on the CWE board and we just voted for a charter. I've been pushing for better transparency and process around creating new CWEs; we'll see if it happens or not. In the meantime, my plan is to use the https://github.com/cloudsecurityalliance/gsd-objects repo to hold (among other things) data on projects, researchers, and new vulnerability types (essentially CWEs). For example, https://csaurl.org/blockchain-vulnerabilities is probably ~100 new CVEs right there.

    There's more, but essentially we have 10-20 major interlocking problems that all have to be pushed forward in order to solve the overall problem of "vulnerability identifiers".

    ------------------------------
    Kurt Seifried
    Chief Blockchain Officer and Director of Special Projects
    Cloud Security Alliance
    kseifried@cloudsecurityalliance.org
    ------------------------------