The Inner Circle


AI and the CSA knowledge base

  • 1.  AI and the CSA knowledge base

    Posted Nov 12, 2020 01:51:00 PM
    Hi All,

    This post might be a little off-the-wall, but bear with me.

    My theory is that in the 11+ years that CSA has been in existence, we have created so much content in a variety of ways that we have solved many issues that organizations are grappling with. The problem is that this information is scattered among whitepapers, webinars, working group meetings, Circle discussion threads, chapter conferences, podcasts, etc., and there isn't an easy way to connect the dots.

    The idea I would like to explore is whether some type of AI system could ingest CSA content in all of its formats and data structures, help us organize that information, and give us insight into the industry problems we are facing and their solutions. I have had initial discussions with a company about this, but I would like to broaden the discussion and perhaps get some expertise from the Circle community to weigh in.

    Thanks!

    ------------------------------
    Jim Reavis CCSK
    Cloud Security Alliance
    Bellingham WA
    ------------------------------


  • 2.  RE: AI and the CSA knowledge base

    Posted Nov 13, 2020 07:09:00 AM
    Jim, this can be accomplished by combining structured metadata tags on all content with Elastic (free/open source) for discoverable data within documents. On the OSINT side that usually means a Kibana-style query and output, but with a little UI/UX and DevOps programming you can achieve a friendly, intuitive interface. For example, tagging a report as "compliance", "cloud", "malware", or even with ATT&CK identifiers, combined with Elastic's term matching on the content, gives you the best of both worlds for matching and relevance ranking on queries. AI can be part of that, but from a database architecture perspective, I just finished designing the same type of solution for MSSP operations, mapped back to specific attribution and large-scale analytics for threat identification and response.

    ------------------------------
    Ken Dunham
    President & Founder
    CSA BOI
    Nampa ID
    ------------------------------
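The combination Ken describes (structured tags plus Elastic term matching) maps naturally onto an Elasticsearch `bool` query: a scored `multi_match` clause ranks documents by text relevance, while an unscored `terms` filter restricts results to the chosen tags. The sketch below only builds the query body; the field names `title`, `body`, and `tags` are hypothetical and assume an index where `tags` is a keyword field.

```python
import json


def build_hybrid_query(text, tags=None, size=10):
    """Build an Elasticsearch query body mixing full-text relevance and tag filters.

    Assumed (hypothetical) index fields: "title" and "body" hold document text,
    "tags" is a keyword field such as "compliance", "cloud", or an ATT&CK ID.
    """
    bool_query = {
        # Scored clause: term matching over the document text, title boosted 2x.
        "must": [
            {"multi_match": {"query": text, "fields": ["title^2", "body"]}}
        ]
    }
    if tags:
        # Unscored clause: keep only documents carrying at least one listed tag.
        bool_query["filter"] = [{"terms": {"tags": list(tags)}}]
    return {"size": size, "query": {"bool": bool_query}}


if __name__ == "__main__":
    query = build_hybrid_query("shared responsibility", tags=["compliance", "cloud"])
    print(json.dumps(query, indent=2))
```

Putting the tags in `filter` rather than `must` means they narrow the result set without distorting the relevance score, so ranking still comes from the content itself.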



  • 3.  RE: AI and the CSA knowledge base

    Posted Nov 13, 2020 08:37:00 AM
    Hi Ken,

    How would we go back and tag 11+ years of content? It seems that some structured documents could be tagged automatically, but I'm not sure what we would do with webinars, videos, and podcasts without going back and listening to them all.

    ------------------------------
    Jim Reavis CCSK
    Cloud Security Alliance
    Bellingham WA
    ------------------------------
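For the text-based portion of the archive, a first pass at automatic tagging can be as simple as matching words against a controlled vocabulary. This is a minimal sketch, not a trained classifier: the tag names and keyword lists in `TAG_RULES` are hypothetical placeholders that a real effort would replace with a curated CSA taxonomy or an ML model.

```python
import re
from collections import Counter

# Hypothetical controlled vocabulary: tag -> keywords that suggest it.
TAG_RULES = {
    "compliance": ["audit", "compliance", "regulation", "gdpr"],
    "identity": ["iam", "identity", "authentication", "mfa"],
    "malware": ["malware", "ransomware", "trojan"],
    "cloud": ["cloud", "saas", "iaas", "paas"],
}


def auto_tag(text, rules=TAG_RULES, min_hits=1):
    """Return the sorted tags whose keywords occur at least min_hits times in text."""
    # Lowercase and split into alphanumeric tokens so "GDPR," matches "gdpr".
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    return sorted(
        tag
        for tag, keywords in rules.items()
        if sum(counts[k] for k in keywords) >= min_hits
    )
```

For example, `auto_tag("GDPR audit findings for SaaS providers")` yields `["cloud", "compliance"]`. Raising `min_hits` trades recall for precision on long documents where a keyword may appear only in passing.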



  • 4.  RE: AI and the CSA knowledge base

    Posted Nov 13, 2020 09:48:00 AM
    Hi Jim, 

    Law firms and libraries are two types of businesses with similar challenges, and there are companies selling AI solutions for making years and years of documents easily accessible. This is one I have heard of, but there may be other, more mature products out there.

    https://www.microsoft.com/en-us/ai/ai-lab-project-ida

    Haven't looked into this one, but it sounds like it may work
    https://cloud.google.com/document-ai

    Maria

    ------------------------------
    Maria Mendieta
    Hyperproof
    ------------------------------



  • 5.  RE: AI and the CSA knowledge base

    Posted Nov 13, 2020 11:39:00 AM
    Legacy content can be a high level of effort (LOE). Going forward, you can implement policy and procedure in an Elastic solution so that new content gets both the structured and unstructured benefits of that design. For legacy content, you can attempt a review of the original descriptions or authors' input to build structured data. Another option, which is more common, is to leave legacy data untagged and simply rely on unstructured search results to locate it. Obviously media like audio has limitations, but for Office documents (Word, PowerPoint, Excel, etc.) the strings are easily searchable. Podcasts are the greatest challenge. I'd have to look into whether conversion capabilities exist and how others are dealing with this, since it's not a use case I've addressed for this application in the past, but I would think it wouldn't be that hard. Perhaps solutions already exist for automatic transcription of audio?

    ------------------------------
    Ken Dunham
    President & Founder
    CSA BOI
    Nampa ID
    ------------------------------
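Ken's point that strings in Office documents are easily searchable holds for the modern formats: .docx, .pptx, and .xlsx files are ZIP archives of XML parts, and their visible text sits in elements whose local name is `t`. The sketch below uses only the Python standard library and assumes Office Open XML files; legacy binary .doc/.ppt/.xls files are not handled and would need a separate converter.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Archive parts that hold visible text in Word, PowerPoint, and Excel files.
TEXT_PARTS = ("word/document.xml", "ppt/slides/slide", "xl/sharedStrings.xml")


def extract_ooxml_text(source):
    """Return the text runs of a .docx, .pptx, or .xlsx file as one string.

    `source` may be a file path or the raw bytes of the file. Parts are read
    in lexical name order, which is adequate for search indexing.
    """
    if isinstance(source, bytes):
        source = io.BytesIO(source)
    chunks = []
    with zipfile.ZipFile(source) as zf:
        for name in sorted(zf.namelist()):
            if not name.startswith(TEXT_PARTS) or not name.endswith(".xml"):
                continue
            root = ET.fromstring(zf.read(name))
            for el in root.iter():
                # w:t (Word), a:t (PowerPoint) and t (Excel) share local name "t".
                if el.tag.rsplit("}", 1)[-1] == "t" and el.text:
                    chunks.append(el.text)
    return " ".join(chunks)
```

The returned string could then be indexed as the unstructured body of a legacy document, with no manual tagging required.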



  • 6.  RE: AI and the CSA knowledge base

    Posted Nov 13, 2020 08:26:00 AM
    Excellent idea! In Financial Services Consortia we use links to direct people to the appropriate documents.

    ------------------------------
    Mick Talley
    Director
    University Bancorp
    ------------------------------



  • 7.  RE: AI and the CSA knowledge base

    Posted Nov 13, 2020 01:33:00 PM
    Jim, I like the idea. As part of my role in data protection, I recently met with BigID, a company that does something similar to what you are describing, though with a bias toward data sensitivity. However, they do categorize data in ways that would naturally lead to the ability to build a relationship map. I do wonder if there is simply a need for improved search of existing CSA content, but I expect you have already looked into that.

    ------------------------------
    Paul Rich
    Executive Director
    JPMorgan Chase & Co.
    ------------------------------
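The idea that categorization naturally leads to a relationship map can be illustrated with a tag co-occurrence graph: any two documents that share a category are linked, and the edge records which categories they share. This is not how BigID or any specific product works; it is a minimal sketch, and the document IDs in the usage note are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def relationship_map(doc_tags):
    """Link documents that share at least one category.

    doc_tags maps a document id to its set of categories. Returns a dict
    mapping each (doc_a, doc_b) pair, ids in sorted order, to the shared
    categories.
    """
    # Invert the mapping: category -> documents carrying it.
    by_tag = defaultdict(set)
    for doc, tags in doc_tags.items():
        for tag in tags:
            by_tag[tag].add(doc)
    # Every pair of documents under the same category gets an edge.
    edges = defaultdict(set)
    for tag, docs in by_tag.items():
        for a, b in combinations(sorted(docs), 2):
            edges[(a, b)].add(tag)
    return dict(edges)
```

For instance, with hypothetical IDs `{"ccm": {"compliance", "cloud"}, "top-threats": {"cloud", "malware"}, "podcast-12": {"compliance"}}`, the map links `ccm` to `podcast-12` via "compliance" and `ccm` to `top-threats` via "cloud", connecting a whitepaper to a podcast that would otherwise sit in a separate silo.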



  • 8.  RE: AI and the CSA knowledge base

    Posted Nov 17, 2020 02:58:00 PM
    A greatly improved search capability would be a big part of it; I wonder how far we could get without going back and tagging content. The diverse data structures are also a pain point, since I think a fair amount of our knowledge is in video and audio. A good first step might be to carve out a small but diverse subset of CSA content, pilot a few solutions against it, and see what we can learn. Many of our corporate members have deep capabilities in AI/ML and search and would probably help.

    ------------------------------
    Jim Reavis CCSK
    Cloud Security Alliance
    Bellingham WA
    ------------------------------
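A "small but diverse" pilot set can be drawn with stratified sampling: group the content by format and sample a fixed number from each group, so that audio and video are not crowded out by the much larger pool of written documents. The sketch assumes each item is a dict with a hypothetical `format` key; the fixed seed makes the pilot set reproducible, so every candidate solution is evaluated on the same material.

```python
import random
from collections import defaultdict


def pilot_sample(items, per_format=5, seed=0):
    """Pick up to per_format items from each content format.

    items: list of dicts with at least a "format" key, for example
    "whitepaper", "webinar", or "podcast" (hypothetical labels).
    """
    groups = defaultdict(list)
    for item in items:
        groups[item["format"]].append(item)
    rng = random.Random(seed)  # seeded so repeated runs give the same pilot set
    sample = []
    for fmt in sorted(groups):  # sorted for deterministic iteration order
        group = groups[fmt]
        sample.extend(rng.sample(group, min(per_format, len(group))))
    return sample
```

Formats with fewer items than `per_format` are included whole, which keeps rare formats represented instead of dropping them.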



  • 9.  RE: AI and the CSA knowledge base

    Posted Nov 17, 2020 03:26:00 PM
    One other thought comes to mind here, and again it is probably obvious to you, but since I didn't see it in your post I'll say it: why go back ten years? Technology changes too rapidly for much 10-year-old material to be of use other than as history. Skipping a big chunk of the older content won't make the problem go away, but it could substantially reduce the cost, time, and hand-wringing involved in ingesting, cataloguing, and analyzing it. Three to five years seems plenty, and older documentation should probably be treated more simply, for example catalogued only by title, date, and author.

    ------------------------------
    Paul Rich
    Executive Director
    JPMorgan Chase & Co.
    ------------------------------
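Paul's suggestion amounts to a two-tier ingestion policy: content newer than a cutoff gets full processing, while older content is reduced to a lightweight catalogue record. The sketch below assumes each document is a dict with hypothetical `title`, `published` (a `datetime.date`), `author`, and `body` keys, and uses the upper end of the 3-5 year range as the default.

```python
from datetime import date


def years_before(d, years):
    """Return the same calendar date `years` earlier, mapping Feb 29 to Feb 28."""
    try:
        return d.replace(year=d.year - years)
    except ValueError:  # Feb 29 in a non-leap target year
        return d.replace(year=d.year - years, day=28)


def tier_by_age(docs, today, full_years=5):
    """Split docs into (full_ingest, metadata_only) by publication date."""
    cutoff = years_before(today, full_years)
    full, metadata_only = [], []
    for doc in docs:
        if doc["published"] >= cutoff:
            full.append(doc)  # recent: ingest, tag, and analyze everything
        else:
            # Older material: keep only title, date, and author.
            metadata_only.append({k: doc[k] for k in ("title", "published", "author")})
    return full, metadata_only
```

Keeping `full_years` as a parameter makes it easy to try both ends of the suggested range during a pilot.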



  • 10.  RE: AI and the CSA knowledge base

    Posted Nov 17, 2020 03:41:00 PM
    You have a good point; even some practices from a year ago are outdated. On the other hand, the Capital One breach had aspects (SSRF) that are even older than CSA. But yes, I would compromise on a shorter time horizon if it helps bring a solution to the fore.

    ------------------------------
    Jim Reavis
    CEO, Cloud Security Alliance
    +1.360.820.2545
    ------------------------------



  • 11.  RE: AI and the CSA knowledge base

    Posted Nov 17, 2020 04:44:00 PM
    Edited by Tom Carroux Nov 17, 2020 04:57:54 PM
    Jim, Amazon Transcribe might be useful to you. A useful overview video of Amazon Transcribe is available here: https://www.aws.training/Details/Curriculum?id=27155

    Then watch the ten-minute video module about Transcribe: https://www.aws.training/Details/Video?id=19443

    I don't work for AWS, but it appears that the use case you describe has already been largely built out by AWS, although their focus is on closed captioning rather than metadata or content description. Additional resources are available here: https://aws.amazon.com/blogs/machine-learning/category/artificial-intelligence/amazon-transcribe/

    Hope this helps.
    Tom

    ------------------------------
    Tom Carroux
    CloudMaven
    Silicon Valley
    ------------------------------
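To connect Tom's suggestion back to the search idea, a podcast could be submitted to Amazon Transcribe with boto3 and the finished transcript turned into a plain-text record for the same index as the documents. This is a sketch under stated assumptions: the audio is already in S3, boto3 and AWS credentials are configured, and the job name, bucket, and record fields (`title`, `format`, `body`) are hypothetical. Only the parsing helper runs without AWS access.

```python
import json


def start_transcription(job_name, media_uri, output_bucket, media_format="mp3"):
    """Submit an S3-hosted audio file to Amazon Transcribe (needs boto3 + credentials)."""
    import boto3  # imported lazily so the parser below works without boto3 installed

    client = boto3.client("transcribe")
    return client.start_transcription_job(
        TranscriptionJobName=job_name,
        Media={"MediaFileUri": media_uri},  # e.g. an s3:// URI (hypothetical)
        MediaFormat=media_format,
        LanguageCode="en-US",
        OutputBucketName=output_bucket,  # Transcribe writes the JSON result here
    )


def transcript_to_document(transcript_json, title):
    """Turn a completed Transcribe output file into a searchable text record."""
    data = json.loads(transcript_json)
    # The full text lives under results.transcripts[].transcript.
    text = " ".join(t["transcript"] for t in data["results"]["transcripts"])
    return {"title": title, "format": "podcast", "body": text}
```

The resulting record has the same shape as a text document, so audio content can be searched, tagged, or age-tiered with the same pipeline instead of a separate one.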