Well, git is a blockchain, so maybe we already solved that problem. Anyway, that was a slight security rant; the GSD project is the first I have seen moving in the right direction.
Here are some thoughts I had around the data model related to SARIF and links to external source code.
I think we should provide a link to the SARIF file. I'm just getting acquainted with the specification, but from what I have worked with they can be pretty lengthy.
Plus, even though it is JSON underneath, the nesting is deep and it would be hard to parse by hand. VS Code hates it. I think most modern SAST solutions are moving to this format, or already support it.
Really, what SARIF does is provide a way to describe the static conditions for a CWE. This enables the development of an exploit in the execution state, generating a CVE.
Researchers could generate and verify PoCs for the vulnerability, bug, or weakness much faster. If you consider the shift-left philosophy, this would shorten the time to remediation and even foster the development of auto-remediation technologies.
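To make the "static conditions" idea a bit more concrete, here is a minimal sketch in Python of pulling the rule and source location out of a SARIF log, assuming the file loads with a standard JSON library. The log itself is hand-written and hypothetical (the tool name, rule ID, and file path are made up); only the field names (runs, results, ruleId, locations, physicalLocation, artifactLocation, region) follow the SARIF 2.1.0 layout.

```python
import json

# Hypothetical, hand-written SARIF-style log. All values are made up for illustration.
SARIF_TEXT = """
{
  "version": "2.1.0",
  "runs": [
    {
      "tool": {"driver": {"name": "example-scanner"}},
      "results": [
        {
          "ruleId": "example/unsafe-deserialization",
          "message": {"text": "Untrusted data is deserialized."},
          "locations": [
            {
              "physicalLocation": {
                "artifactLocation": {"uri": "src/Loader.java"},
                "region": {"startLine": 42}
              }
            }
          ]
        }
      ]
    }
  ]
}
"""


def summarize_results(sarif_text):
    """Return one (ruleId, uri, startLine) tuple per reported location."""
    log = json.loads(sarif_text)
    findings = []
    for run in log.get("runs", []):
        for result in run.get("results", []):
            for location in result.get("locations", []):
                physical = location.get("physicalLocation", {})
                uri = physical.get("artifactLocation", {}).get("uri")
                line = physical.get("region", {}).get("startLine")
                findings.append((result.get("ruleId"), uri, line))
    return findings


print(summarize_results(SARIF_TEXT))
```

A real log from a scanner would carry many more fields per result; this only shows that the rule and the exact file and line are machine-readable.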
I also think that wherever there is a link to any source code, the same object containing the link MUST include a minimum of two checksums (SHA1, SHA256) to verify the file's integrity.
I briefly looked at some of the other specifications and noticed none of them referenced checksums or hashes directly. I just did a fast skim and page search, so I could have missed it.
Once these data objects are open sourced, they will be abused and manipulated, but keeping them in a JSON format makes for an interesting use case: version controlling a disclosure.
The git commit hash adds a level of integrity to the JSON object as a whole. Adding checksums to verify links to external code adds a granular level of integrity within the JSON object itself. Attempts to change the hashes within the JSON would require a new commit disclosing the change. This would make it possible to automate the validation of these changes.
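A rough sketch of what that automated validation could look like, in Python. The record layout (a "url" key and a "checksums" key) and the example URL are my own assumptions for illustration, not part of any existing GSD or CVE schema; the hashing uses Python's standard hashlib.

```python
import hashlib

REQUIRED_ALGORITHMS = ("sha1", "sha256")  # the two hashes proposed above


def compute_checksums(content: bytes) -> dict:
    """Hash the given bytes with every required algorithm."""
    return {name: hashlib.new(name, content).hexdigest() for name in REQUIRED_ALGORITHMS}


def verify_link(link: dict, content: bytes) -> list:
    """Return a list of problems; an empty list means the content matches the record."""
    problems = []
    recorded = link.get("checksums", {})
    actual = compute_checksums(content)
    for name in REQUIRED_ALGORITHMS:
        if name not in recorded:
            problems.append(f"missing {name}")
        elif recorded[name].lower() != actual[name]:
            problems.append(f"{name} mismatch")
    return problems


content = b"function add(a, b) { return a + b; }\n"
link = {
    "url": "https://example.com/repo/blob/main/add.js",  # hypothetical link
    "checksums": compute_checksums(content),             # hypothetical field name
}

print(verify_link(link, content))           # []
print(verify_link(link, content + b"//x"))  # ['sha1 mismatch', 'sha256 mismatch']
```

A check like this could run on every commit that touches a record, so a changed hash or a changed file both show up as a failure.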
I know you are aware there are major issues with relying on metadata to identify a file or piece of code. To do this correctly it must be done using only the minified file contents of the source code (exclude file metadata when hashing).
For example, a user says they found a CWE in a file "jquery.1.2.3.js" that came from repo A, but the file is really "jquery.6.6.6.js", which came from repo B and was manually placed in repo A and renamed.
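The renamed-file case can be sketched like this in Python. Everything here is hypothetical: the in-memory index, the repo names, and the stand-in file body are only there to show that a content hash finds the real origin no matter what the file is called.

```python
import hashlib


def content_id(content: bytes) -> str:
    """Identify a file by the SHA-256 of its bytes, ignoring name and location."""
    return hashlib.sha256(content).hexdigest()


# Hypothetical index of known files: content hash -> (repo, filename).
KNOWN_FILES = {}


def register(repo: str, filename: str, content: bytes) -> None:
    KNOWN_FILES[content_id(content)] = (repo, filename)


def identify(content: bytes):
    """Return the (repo, filename) the content was registered under, or None."""
    return KNOWN_FILES.get(content_id(content))


# Stand-in bytes for the real library file published in repo B.
real_body = b"/* stand-in for the real library body */ var version = '6.6.6';"
register("repo-b", "jquery.6.6.6.js", real_body)

# What the reporter claims: the file lives in repo A under another name.
reported_repo, reported_name = "repo-a", "jquery.1.2.3.js"
reported_body = real_body  # a renamed copy has identical contents

origin = identify(reported_body)
print(origin)                                    # ('repo-b', 'jquery.6.6.6.js')
print(origin == (reported_repo, reported_name))  # False
```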
Right now everything goes through some formal approval process where it can be checked before being publicly released. This is a privilege the GSD would not have. Therefore integrity checking must be enforced on external files.
I propose three MUST conditions:
External links to files must be included with a minimum of 2 hashes (sha1, sha256)
Hashes must exclude metadata
Hashes must be of the minified file contents (no white space, tabs, spaces)
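As a minimal sketch of the three conditions above, in Python: hash only the bytes of the file (no name, path, or timestamps), with all ASCII whitespace removed first, using both algorithms. Note the assumption: stripping every whitespace byte is a crude stand-in for "minified", not a real minifier, and it would also collapse whitespace inside string literals, so real tooling would probably need something language-aware.

```python
import hashlib


def strip_whitespace(content: bytes) -> bytes:
    """Drop all ASCII whitespace (spaces, tabs, newlines) from the content."""
    # bytes.split() with no arguments splits on runs of ASCII whitespace.
    return b"".join(content.split())


def normalized_checksums(content: bytes) -> dict:
    """SHA-1 and SHA-256 of the whitespace-stripped content only (no metadata)."""
    normalized = strip_whitespace(content)
    return {
        "sha1": hashlib.sha1(normalized).hexdigest(),
        "sha256": hashlib.sha256(normalized).hexdigest(),
    }


pretty = b"function add(a, b) {\n\treturn a + b;\n}\n"
compact = b"function add(a,b){return a+b;}"
changed = b"function add(a,b){return a-b;}"

print(normalized_checksums(pretty) == normalized_checksums(compact))   # True
print(normalized_checksums(pretty) == normalized_checksums(changed))   # False
```

Two copies that differ only in formatting hash the same, while a one-character code change does not.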
I propose 1 SHOULD condition:
CWE should include links to a SARIF file when available
Just some ideas! Hope it helps. I could work on tooling for the hash validation if we decide to go that route.
------------------------------
Sam Jones
Technical Support Engineer - SAST and SCA
Whitesource Software
------------------------------
Original Message:
Sent: Jan 27, 2022 06:08:21 PM
From: Kurt Seifried
Subject: Discussion on future data formats
As for a long-term permissionless distributed yada yada technology base for this data, YES. I don't want anyone to be a gatekeeper (intentionally or otherwise), and the important stuff already runs (messily) by consensus on Twitter using PoH (proof of hashtag =). But first we build using the tools we have and figure out what we need/don't need.
------------------------------
Kurt Seifried
Chief Blockchain Officer and Director of Special Projects
Cloud Security Alliance
kseifried@cloudsecurityalliance.org
Original Message:
Sent: Jan 27, 2022 01:23:04 PM
From: Sam Jones
Subject: Discussion on future data formats
Hey Guys,
My name is Sam, and I'm very excited about this project. My eyes lit up when I heard you guys talking about it on the OSS podcast (one of my favorites). The problem domain has been weighing on my mind ever since I read "This Is How They Tell Me the World Ends". Then Log4j hit, and the first security fix was disabling a configuration? I told my wife it was the equivalent of an attacker breaking into the house, turning the lights off, and her standing still. Eventually, they will find the light switch.
I truly believe that this domain should be open-sourced. I think we need it to be decentralized and provide a Vulnerability Disclosure Blockchain that creates an economy to combat 0-day market bounties ethically... Like I said it's something that eats at me and what really got me hyperfocused on this area.
Anyways, here is a quick thought I had...
I think my skill set can contribute to the project. I work for Whitesource Software as a Technical Support Engineer and we provide open source vulnerability scanning and compliance through software composition analysis solutions.
One tool that GitHub acquired is LGTM, which provides continuous SAST and scanning on open source repos. They provide the SARIF files that contain the CWEs and links to the CodeQL database on the project's site. This would help the project gain adoption by allowing vendors to easily integrate the response from GSD into their solutions. It would be very useful to include these links so anyone using SARIF for SAST or CodeQL could easily have access to how the weakness was discovered. The most we get out of a CVE/NVD entry is a link to a website with a thread that started in 2012 (slight sarcasm). Like you guys said, Twitter is a better resource for learning about CWEs.
SARIF Spec: https://docs.oasis-open.org/sarif/sarif/v2.0/sarif-v2.0.html
LGTM: https://lgtm.com/projects/g/apache/logging-log4j2/
------------------------------
Sam Jones
Technical Support Engineer - SAST and SCA
Whitesource Software
Original Message:
Sent: Dec 15, 2021 09:04:17 PM
From: Kurt Seifried
Subject: Discussion on future data formats
So I invented the JSON format used by MITRE/CVE (and for which I apologize, it's a terrible format and missing many things) which is at https://github.com/CVEProject/cve-schema/tree/master/schema.
What do you want to see in a future data format?
E.g. I want a data format that supports both machine-readable (JSON) and human-readable (e.g. formatted text) content. I sort of did this in the CVE JSON format, but the text description there doesn't support any formatting, or even line returns. So I think in general the solution here is to figure out which fields are human-readable (the description for sure, notes fields as well?) and allow the use of line returns and Markdown.
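A small sketch of that idea in Python, just to show that line returns and Markdown survive a JSON round trip as long as they live inside a string field. The record ID and the "description_format" field are hypothetical names invented for this example, not part of the CVE JSON schema.

```python
import json

record = {
    "id": "EXAMPLE-0001",  # hypothetical identifier
    "description": (
        "Deserialization flaw in the loader.\n"
        "\n"
        "- Affects: `load()`\n"
        "- Fixed in: 1.2.4"
    ),
    "description_format": "markdown",  # hypothetical field marking the text as Markdown
}

encoded = json.dumps(record, indent=2)  # newlines are escaped as \n inside the JSON string
decoded = json.loads(encoded)

print(decoded == record)        # True
print(decoded["description"])   # prints the description with its original line breaks
```

Machines still get plain JSON, and a renderer that knows the field is Markdown can show the formatted version to humans.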
I also suspect taking some inspiration from JSON-LD (https://json-ld.org/) is a good idea but I don't think I fully understand it (so if you're using it, for e.g. events or products please let me know!)
So what does the community need and want to see?
------------------------------
Kurt Seifried
Chief Blockchain Officer and Director of Special Projects
Cloud Security Alliance
kseifried@cloudsecurityalliance.org
------------------------------