
Using Python to unearth a goldmine of threat intelligence from leaked chat logs

June 1st, 2022

Working through large volumes of data is time consuming, and Python can help analysts sort information and extract the data most relevant to their investigation. The open-source tools library MSTICPy, for example, is a Python tool dedicated to threat intelligence. It aims to help threat analysts acquire, enrich, analyze, and visualize data.

This blog provides a workflow for deeper data analysis and visualization using Python, as well as for extraction and analysis of indicators of compromise (IOCs) using MSTICPy. Data sets from the February 2022 leak of data from the ransomware-as-a-service (RaaS) coordinated operation called “Conti” are used as a case study.

An interactive Jupyter notebook with related data is also available for analysts interested in further data exploration.

This research aims to provide a view into research methodology that may help other analysts apply Python to threat intelligence. Analysts can reuse the code and continue to explore the extracted information. Additionally, it offers an out-of-the-box methodology for analyzing chat logs, extracting IOCs, and improving threat intelligence and defense process using Python.

Using Python to analyze the Conti network

On February 28, 2022, a Twitter account named @ContiLeaks (allegedly a Ukrainian researcher) began posting leaked Conti data on Twitter. The leaked data sets, which were posted in a span of several months, consisted of chat logs, source codes, and backend applications.

For this research, we focused our analysis on the chat logs, which revealed crucial information about the Conti group’s operating methods, infrastructure, and organizational structure.

Compiling and translating chat logs

The leaked chat logs are written in the Russian language. To make the analysis more accessible, we adopted the methodology published here and translated the logs to English.

The chat logs revealed that the Conti group uses the messaging application Jabber to communicate among members. Since raw Jabber logs are saved as one file per day, they can be compiled into a single JSON file so they can easily be manipulated with Python. Once the data is merged, it can be translated using the deep-translator library. After the logs are translated and saved to a new file, the data can be loaded into a dataframe for manipulation and exploration:

import pandas as pd
df = pd.read_json('translated_Log2.json', encoding='utf-8')
A screenshot of a table of chat messages translated from Russian to English. The table includes details when the message was sent, who it was from, to whom it was sent, the original text in Russian, and the translated English version.
Figure 1. Translated logs
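The merge step described above can be sketched as follows. This is a minimal sketch, assuming each per-day file holds a series of concatenated JSON objects with ts, from, to, and body keys; the directory layout and the helper names iter_json_objects and merge_daily_logs are illustrative and not taken from the original notebook. Translation is left out because it needs network access.

```python
import json
from pathlib import Path


def iter_json_objects(text):
    """Yield every JSON object found in a string of concatenated objects."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


def merge_daily_logs(log_dir, out_file):
    """Merge all per-day log files in log_dir into one JSON array file."""
    messages = []
    # Sorting by file name keeps the days in order when names start with a date.
    for path in sorted(Path(log_dir).glob("*.json")):
        messages.extend(iter_json_objects(path.read_text(encoding="utf-8")))
    Path(out_file).write_text(
        json.dumps(messages, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return messages
```

Writing the merged file outside log_dir avoids picking it up again on a second run.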

Russian slang words that the automated process does not translate properly can be handled with a lookup dictionary. A dictionary based on a list proposed here was used in this case to correctly translate the slang:

A screenshot of Python code that creates a dictionary which can be used to translate Russian slang words. It features a list of Russian slang words and their English translation.
Figure 2. Translating slang
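A dictionary-based replacement of this kind might look like the sketch below. The three entries are illustrative examples chosen for this sketch, not the list used in the notebook, and translate_slang is a hypothetical helper name.

```python
import re

# Illustrative entries only; a real slang dictionary would be much longer.
SLANG = {
    "админка": "admin panel",
    "лоадер": "loader",
    "крипт": "crypt",
}

# Match whole words only, trying longer terms first.
_PATTERN = re.compile(
    r"\b(?:"
    + "|".join(re.escape(w) for w in sorted(SLANG, key=len, reverse=True))
    + r")\b",
    flags=re.IGNORECASE,
)


def translate_slang(text):
    """Replace known slang terms in text with their English equivalents."""
    return _PATTERN.sub(lambda m: SLANG[m.group(0).lower()], text)
```

Because the pattern matches whole words, inflected forms of a slang term are left untouched unless they are added to the dictionary as well.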

Analyzing the chat activity timeline

One way to get insights from chat logs is to plot their timeline and check the number of discussions per day. The Bokeh library can be used to build an interactive diagram and explore the loaded dataframe.

A screenshot of Python code for exploring data using the Bokeh library. It shows code for filtering results, creating diagrams, and adding hover tools.
Figure 3. Python code for exploring discussions

Using the data from Conti chat logs generates the following diagram, which shows the volume of Jabber discussions over time:

A line graph that shows a volume of discussions within the Conti group from March 2021 to March 2022. The data shows several peaks in activity, mostly concentrated from September to December 2021.
Figure 4. Volume of discussions over time
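The daily counts behind such a timeline can be computed in a few lines. The sketch below uses only the standard library and assumes each message carries an ISO 8601 timestamp in a ts field; the function names are illustrative, and the same result can be obtained with a pandas groupby before plotting with Bokeh.

```python
from collections import Counter
from datetime import datetime


def messages_per_day(messages):
    """Count messages per calendar day from ISO 8601 'ts' strings."""
    counts = Counter(
        datetime.fromisoformat(msg["ts"]).date() for msg in messages
    )
    return dict(sorted(counts.items()))


def busiest_days(daily_counts, top_n=3):
    """Return the top_n (day, count) pairs with the most messages."""
    ranked = sorted(daily_counts.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_n]
```

Listing the busiest days first gives a quick set of dates to compare against public events.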

Visualizing the data as a timeline shows some peaks of activity that align to certain events. In the case of the Conti leaks, for example:

  • July 7, 2021 (615 discussions): Ransomware attack by REvil against software company Kaseya
  • August 27, 2021 (1,289 discussions): The playbook of a specific Conti affiliate was leaked
  • August 31, 2021 (1,156 discussions): FBI and CISA advisory on ransomware and Labor Day
  • August 10, 2021 (853 discussions): Ransomware attack by Conti against Meyer Corporation

It’s interesting that no peak in chat activity was observed within the Conti group after the first leak, which could indicate that the breach was ignored or not known by the group at that time.

Analyzing the level of user activity

When analyzing chat logs, identifying the number of users and analyzing the most active ones can provide insight into the size of the group and roles of users within it. Using Python, the list of users can be extracted and saved in a text file:

A screenshot of Python code that extracts the list of users from the Conti chat logs. It also shows code to remove duplicates from the list, concatenate the dataframe, and save the list to a text file.
Figure 5. Extracting list of users
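A standard-library sketch of the same idea is shown below. It assumes each message is a dictionary with from and to fields holding account names; extract_users and save_users are illustrative names, not functions from the notebook.

```python
from pathlib import Path


def extract_users(messages):
    """Return the sorted, de-duplicated accounts seen as sender or recipient."""
    users = set()
    for msg in messages:
        for field in ("from", "to"):
            account = msg.get(field)
            if account:
                # Normalize so 'Stern' and 'stern ' count as one account.
                users.add(account.strip().lower())
    return sorted(users)


def save_users(users, path):
    """Write one account per line to a text file."""
    Path(path).write_text("\n".join(users) + "\n", encoding="utf-8")
```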

Running the script above using the Conti chat logs yielded a list of 346 unique accounts. This list can then be used to create a graph and show which users sent the most messages.

A screenshot of Python code that creates a graph to show the list of users from the Conti chat logs with the most messages.
Figure 6. Creating a graph for users with most messages
A bar chart that compares the users from the Conti chat logs based on the number of messages they sent. The chart shows that the most active user sent more than eight thousand messages.
Figure 7. Most active users in the Conti chat logs
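The ranking behind such a chart comes down to counting messages per sender. A minimal sketch, assuming the sender is stored in a from field (top_senders is an illustrative name):

```python
from collections import Counter


def top_senders(messages, top_n=5):
    """Return the top_n (account, message count) pairs, most active first."""
    counts = Counter(msg["from"] for msg in messages if msg.get("from"))
    return counts.most_common(top_n)
```

The resulting pairs can be passed directly to any plotting library as labels and bar heights.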

Based on the graph, the users named defender, stern, driver, bio, and mango have the largest number of discussions. Check Point published extensive research on the structure of the organization and correlated the user discussions with several roles and services like human resources, coders, crypters, offensive team, SysAdmins, and more.

Mapping the users’ connections

Another way to analyze chat log data is to visualize the users’ connections. This can be done by creating a dynamic network graph that highlights the connections between users. The Barnes-Hut algorithm and the Pyvis library can be used to visualize this data.

A screenshot of Python code that creates a dynamic network graph of the Conti chat log data using the Barnes Hut algorithm and Pyvis library.
Figure 8. Creating a dynamic graph

Dynamic visualization shows a graphical overview of the network and allows zooming into the network to closely analyze the connections within. Bigger points represent the most active users, and it’s possible to highlight a user to analyze their connections. Additionally, the hovering tool shows which other users a specific user had conversations with.

A screenshot of a dynamic visualization showing a graphical overview of the Conti network. It shows users as points in the graph, all connected by lines that represent conversations between them.
Figure 9. Conti user network overview
A screenshot of a list of connections to the user named "Stern" from the Conti network. The graphical overview of the entire Conti network is shown on the background of the list.
Figure 10. Connections to user ‘Stern’
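Before a graph can be drawn, the messages have to be reduced to nodes and weighted edges. The sketch below shows one way to do that with the standard library, assuming from and to fields on each message; treating the conversation as undirected is a simplification made here, and the function names are illustrative.

```python
from collections import Counter


def build_edges(messages):
    """Aggregate messages into undirected, weighted edges between users."""
    edges = Counter()
    for msg in messages:
        src, dst = msg.get("from"), msg.get("to")
        if src and dst and src != dst:
            # Sort the pair so a->b and b->a count toward the same edge.
            edges[tuple(sorted((src, dst)))] += 1
    return edges


def node_degrees(edges):
    """Count how many distinct peers each user talked to."""
    degrees = Counter()
    for a, b in edges:
        degrees[a] += 1
        degrees[b] += 1
    return degrees
```

Each edge and its weight can then be added to a Pyvis network, with the degree or message count used to scale the node size.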

Searching for other topics of interest

Since reading data sets can be time-consuming, a simple search engine can be built to search for specific strings in the chat logs or to filter for topics of interest. For the Conti leak data, examples of these include Bitcoin, usernames, malware names, exploits, and CVEs, to name a few.

The following code snippet provides a simple search engine using the TextSearch library:

A screenshot of Python code that creates a search engine using the TextSearch library from Github. The code presents configuration options for the search engine widget and filters for search results.
Figure 11. Search engine using Python
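For readers without the widget, a plain keyword filter gives a similar result. This is a simple stand-in written for illustration, not the TextSearch API; it assumes the message text lives in a body field.

```python
import re


def search_messages(messages, term, field="body"):
    """Return messages whose field contains term, ignoring case."""
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    return [m for m in messages if pattern.search(m.get(field) or "")]
```

For example, search_messages(messages, "CVE-") would return every message that mentions a CVE identifier.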

Using MSTICPy to extract and analyze IOCs

Besides processing chat logs to analyze user activity and connections, Python can also be used to extract and analyze threat intelligence. This section shows how the MSTICPy library can be used to extract IOCs and how it can be used for additional threat hunting and intelligence.

Extracting IOCs

MSTICPy is a Python library used for threat investigation and threat hunting. The library can connect to several threat intelligence providers, as well as Microsoft tools like Microsoft Sentinel. It can be used to query logs and to enrich data. It’s particularly convenient for analyzing IOCs and adding more threat contextualization.

After installing MSTICPy, the first thing to do is to initialize the notebook. This allows the loading of several modules that can be used to extract and enrich the data. External resources like VirusTotal or OTX can also be added by configuring msticpyconfig.yaml and adding the API keys.

The IoCExtract module from MSTICPy offers a convenient way to extract IOCs using predefined regex. The code automatically extracts IOCs such as DNS, URLs, IP addresses, and hashes and then reports them in a new dataframe.

A screenshot of Python code that prepares the dataframe for IOC extraction. It presents code to remove "None" value from the dataframe, as well as to initiate the IOC extractor.
Figure 12. Passing the dataframe to the module for extraction
A screenshot of a sample table listing down IOC patterns found in the Conti chat logs. It includes the following data fields: IOC type, observable, source index, and input.
Figure 13. Sample of extracted IOCs
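To illustrate what regex-based extraction does, the sketch below scans message text with a few simplified patterns and returns rows shaped like the table above. The patterns are deliberately loose stand-ins written for this example, not the expressions MSTICPy ships with, and the body field name is an assumption.

```python
import re

# Simplified patterns for illustration; production regexes are stricter.
_OCTET = r"(?:25[0-5]|2[0-4]\d|1?\d?\d)"
IOC_PATTERNS = {
    "ipv4": re.compile(r"\b(?:" + _OCTET + r"\.){3}" + _OCTET + r"\b"),
    "sha256_hash": re.compile(r"\b[a-fA-F0-9]{64}\b"),
    "md5_hash": re.compile(r"\b[a-fA-F0-9]{32}\b"),
    "url": re.compile(r"https?://[^\s\"'<>]+"),
}


def extract_iocs(messages, field="body"):
    """Return one row per IOC found, with its type and source message index."""
    rows = []
    for index, msg in enumerate(messages):
        text = msg.get(field) or ""
        for ioc_type, pattern in IOC_PATTERNS.items():
            for match in pattern.finditer(text):
                rows.append(
                    {
                        "IoCType": ioc_type,
                        "Observable": match.group(0),
                        "SourceIndex": index,
                    }
                )
    return rows
```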

A regex can be added to filter specific IOCs from those extracted by the IOC extraction module by default. For example, the regex below extracts Bitcoin addresses from the Conti chat logs:

A screenshot of Python code that adds a regex in the IOCExtract module of MSTICPy. This specific regex extracts Bitcoin addresses from the Conti chat logs.
Figure 14. Extracting Bitcoin addresses and adding regex
A screenshot of a table showing the Bitcoin addresses extracted from the Conti chat logs. The table includes the following data fields: IOCtype, observable, source index, and input.
Figure 15. Sample of extracted Bitcoin addresses
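A Bitcoin address pattern of this kind can be tried out on its own. The regex below is an approximation written for this sketch (legacy Base58 addresses starting with 1 or 3, and bech32 addresses starting with bc1); it checks the format only and does not validate checksums, so some matches can still be false positives.

```python
import re

# Format check only: legacy Base58 addresses and bech32 (bc1...) addresses.
BTC_ADDRESS = re.compile(
    r"\b(?:bc1[ac-hj-np-z02-9]{11,71}|[13][a-km-zA-HJ-NP-Z1-9]{25,34})\b"
)


def extract_btc_addresses(text):
    """Return the unique Bitcoin-like addresses found in text, sorted."""
    return sorted(set(BTC_ADDRESS.findall(text)))
```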

After extracting IOCs, the dataframe can be cleaned to remove false positives as well as duplicate data. The final dataframe from the processed Conti chat logs contains the following unique IOC counts (these IOCs require additional analysis, as not all of them are considered malicious):

IOC type   Unique count
URL        1,137
DNS        474
IPv4       317
Bitcoin    175
MD5        106
SHA-256    16
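The cleaning step mentioned above depends on the data, but for IP addresses a reasonable first pass is to drop duplicates, strings that are not valid addresses, and addresses that are not publicly routable. The sketch below is one possible approach using the standard ipaddress module; the criteria are assumptions made for this example, not the exact rules applied in the notebook.

```python
import ipaddress


def clean_ip_iocs(observables):
    """Drop duplicates, invalid strings, and non-public IPv4 addresses."""
    cleaned = set()
    for value in observables:
        try:
            ip = ipaddress.ip_address(value.strip())
        except ValueError:
            continue  # not a valid IP address, e.g. a version string
        if ip.version == 4 and ip.is_global:
            cleaned.add(str(ip))
    # Sort numerically rather than as text.
    return sorted(cleaned, key=ipaddress.ip_address)
```

Private ranges such as 10.0.0.0/8 or 192.168.0.0/16 are removed because external threat intelligence providers cannot say anything useful about them.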

Investigating IP addresses

The threat intel lookup module TILookup in MSTICPy can be used to get more information on IOCs such as IP addresses. In the case of the Conti leak, 317 unique IP addresses were identified. Not all of these IOCs are malicious, but they can still reveal relevant information.

The configuration file can be specified to load the TILookup module, along with other threat intelligence providers such as VirusTotal, GreyNoise, and OTX.

A screenshot of Python code that loads the threat intel lookup module in MSTICPy. It also presents code that loads other threat intelligence providers such as VirusTotal, GreyNoise, and OTX, and filters the IOCs by type.
Figure 16. Threat intel provider within MSTICPy

Running the module generates a new dataframe with more context for every IP address provided.

A screenshot of a table generated from running the threat intel lookup module. The table presents a list of IP addresses extracted from the Conti chat logs with related threat intelligence data.
Figure 17. Sample of IP addresses enriched with additional info

The module can also request information for a single observable.

A screenshot of Python code that extracts threat intelligence data on a single observable through the threat intel lookup module of MSTICPy.
Figure 18. Extracting information for a single observable
A screenshot of the table generated from running the threat intel lookup module. It shows threat intelligence data from GreyNoise, OTX, and VirusTotal on a particular IP address.
Figure 19. Additional threat context for one IP address

The browser provided by MSTICPy can also be used to explore the IOCs previously enriched. The interactive Jupyter notebook includes this view of the IOCs.

A screenshot of the browser provided by MSTICPy that can be used to explore IOCs extracted from the Conti chat logs. The browser shows threat intelligence details related to a selected IP address.
Figure 20. IOC browser provided by MSTICPy

In addition, MSTICPy has an embedded module that looks up the geolocation of IP addresses using Maxmind, which can be used to create a map of the IP addresses previously extracted.

A screenshot of Python code that looks up the geolocation of IP addresses. It also presents code that creates a map using the generated geolocation data.
Figure 21. Generating the IP geolocation map
A world map with the geolocation of all IPs extracted from the Conti leaks marked with red pins. The image shows that the IP locations are concentrated in Europe and the US.
Figure 22. Geolocation of IPs extracted from the Conti leaks

Investigating URLs

Extracted URLs from IOC lists can provide details about targets, tools used to exchange information, and the infrastructure used to deploy attacks. A total of 1,137 unique URLs were extracted from the Conti leak dataset, but not all of them are usable for threat intelligence. The following code snippet shows how to filter for URLs.

A screenshot of Python code that filters for URLs among the IOC list.
Figure 23. Filtering the IOCs for URLs
A screenshot of the table generated by filtering URLs from the IOC list. The table includes the following data fields: IOC type, observable, source index, and input.
Figure 24. Sample of URLs extracted

A filter can be created to get details on executables, DLLs, ZIP files, and other files related to the extracted URLs. This can provide interesting insights and can be extracted for further research.

A screenshot of Python code that filters for specific file types related to extracted URLs. The code searches for URLs with .exe, .dll, .jpg, .zip, .7z, .rar, and .png files.
Figure 25. Filtering URLs for specific file formats
A screenshot of a table generated from filtering for URLs related to specific file formats. The table features the following data fields: IOC type, observable, source index, and input.
Figure 26. Sample of URLs delivering extracted file
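A filter like this can be written by parsing each URL and checking the extension of its path. The sketch below uses the extension list from the figure description; the function name and the choice to ignore query strings are assumptions of this example.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

FILE_EXTENSIONS = {".exe", ".dll", ".jpg", ".zip", ".7z", ".rar", ".png"}


def urls_with_file_extension(urls, extensions=FILE_EXTENSIONS):
    """Keep URLs whose path ends in one of the given file extensions."""
    return [
        url
        for url in urls
        # Only the path is inspected, so query strings do not interfere.
        if PurePosixPath(urlparse(url).path).suffix.lower() in extensions
    ]
```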

Using the same technique for filtering, .onion URLs can also be identified from the URL list. This proved particularly useful in this case, since the Conti group used the Tor network for some of their infrastructure.

A screenshot of a table generated by filtering for .onion URLs from the Conti chat log IOCs. The table presents the following data fields: IOC type, observable, source index, and input.
Figure 27. Sample of extracted .onion URLs
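The .onion filter follows the same pattern, except that it checks the host name instead of the path. A minimal sketch (onion_urls is an illustrative name):

```python
from urllib.parse import urlparse


def onion_urls(urls):
    """Return the URLs whose host name ends in .onion."""
    result = []
    for url in urls:
        # hostname is None when the URL has no scheme or network location.
        host = urlparse(url).hostname or ""
        if host.endswith(".onion"):
            result.append(url)
    return result
```

Checking the parsed host name rather than searching the whole string avoids matching URLs that merely mention ".onion" in a path or query.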

Pivoting extracted IOCs using VirusTotal

The pivot function within the MSTICPy library allows enrichment of data and discovery of additional infrastructure and IOCs. This is particularly useful for threat intelligence and threat actor tracking. The next sections demonstrate the use of the VirusTotal module VTlookupV3 in MSTICPy to obtain intelligence about an IP address extracted from the Conti leak dataset that was used to deliver additional malware.

The following code initiates the VTlookupV3 in MSTICPy:

A screenshot of Python code that initiates the VirusTotal module in MSTICPy.
Figure 28. Configuring the VirusTotal module in MSTICPy

The VirusTotal module can be used to get data related to a particular IOC. The code below searches for files downloaded from a particular IP address from the Conti leak dataset:

A screenshot of Python code that uses the VirusTotal module in MSTICPy to look up files downloaded from a specific IP address.
Figure 29. Getting files downloaded from one IP address

The results show that the IP address 109[.]230[.]199[.]73 delivers several strains of malware.

A screenshot of a table generated from extracting the hashes of files downloaded from a specific IP address.
Figure 30. Hashes related to IP 109[.]230[.]199[.]73

The VirusTotal module can then be used to pivot and extract more information about these hashes. The table below shows information about the first hash on the list:

authentihash 0d10a35c1bed8d5a4516a2e704d43f10d47ffd2aabd9ce9e04fb3446f62168bf
creation_date 1624910154
crowdsourced_ids_results [{[TRUNCATED]’alert_context’: [{‘dest_ip’: ‘’, ‘dest_port’: 53}, {‘dest_ip’: ‘’, ‘dest_port’: 123}], ‘rule_url’: ‘’, ‘rule_source’: ‘Snort registered user ruleset’, ‘rule_id’: ‘1:527’}, {‘rule_category’: ‘not-suspicious’, ‘alert_severity’: ‘low’, ‘rule_msg’: ‘TAG_LOG_PKT’, ‘rule_raw’: ‘alert ( gid:2; sid:1; rev:1; msg:”TAG_LOG_PKT”; metadata:rule-type preproc; classtype:not-suspicious; )’, ‘alert_context’: [{‘dest_ip’: ‘’, ‘dest_port’: 443}], ‘rule_url’: ‘’, ‘rule_source’: ‘Snort registered user ruleset’, ‘rule_id’: ‘2:1’}]
crowdsourced_ids_stats {‘info’: 0, ‘high’: 0, ‘medium’: 2, ‘low’: 1}
downloadable TRUE
exiftool {‘MIMEType’: ‘application/octet-stream’, ‘Subsystem’: ‘Windows GUI’, ‘MachineType’: ‘AMD AMD64’, ‘TimeStamp’: ‘2021:06:28 19:55:54+00:00’, ‘FileType’: ‘Win64 DLL’, ‘PEType’: ‘PE32+’, ‘CodeSize’: ‘115712’, ‘LinkerVersion’: ‘14.16’, ‘ImageFileCharacteristics’: ‘Executable, Large address aware, DLL’, ‘FileTypeExtension’: ‘dll’, ‘InitializedDataSize’: ‘69632’, ‘SubsystemVersion’: ‘6.0’, ‘ImageVersion’: ‘0.0’, ‘OSVersion’: ‘6.0’, ‘EntryPoint’: ‘0x139c4’, ‘UninitializedDataSize’: ‘0’}
first_submission_date 1624917754
last_analysis_date 16365918529
last_analysis_results { [TRUNCATED] ‘20211110’}, ‘Tencent’: {‘category’: ‘undetected’, ‘engine_name’: ‘Tencent’, ‘engine_version’: ‘’, ‘result’: None, ‘method’: ‘blacklist’, ‘engine_update’: ‘20211111’}, ‘Ad-Aware’: {‘category’: ‘malicious’, Edition’: {‘category’: ‘malicious’, ‘engine_name’: ‘McAfee-GW-Edition’, ‘engine_version’: ‘v2019.1.2+3728’, ‘result’: ‘RDN/CobaltStrike’, ‘method’: ‘blacklist’, ‘engine_update’: ‘20211110’}, ‘Trapmine’: {‘category’: ‘type-unsupported’, ‘engine_name’: ‘Trapmine’, ‘engine_version’: ‘’, ‘result’: None, ‘method’: ‘blacklist’, ‘engine_update’: ‘20200727’}, ‘CMC’: {‘category’: ‘undetected’, ‘engine_name’: ‘CMC’, ‘engine_version’: ‘2.10.2019.1’, ‘result’: None, ‘method’: ‘blacklist’, ‘engine_update’: ‘20211026’}, ‘Sophos’: {‘category’: ‘malicious’, ‘engine_name’: ‘Sophos’, ‘engine_version’: ‘’, ‘result’:
last_analysis_stats {‘harmless’: 0, ‘type-unsupported’: 6, ‘suspicious’: 0, ‘confirmed-timeout’: 1, ‘timeout’: 0, ‘failure’: 0, ‘malicious’: 47, ‘undetected’: 19}
last_modification_date 1646895757
last_submission_date 1624917754
magic PE32+ executable for MS Windows (DLL) (GUI) Mono/.Net assembly
md5 55646b7df1d306b0414d4c8b3043c283
meaningful_name 197.dll
names [197.dll, iduD2A1.tmp]
pe_info [TRUNCATED] {‘exports’: [‘StartW’, ‘7c908697e85da103e304d57e0193d4cf’}, {‘name’: ‘.rsrc’, ‘chi2’: 51663.55, ‘virtual_address’: 196608, ‘entropy’: 5.81, ‘raw_size’: 1536, ‘flags’: ‘r’, ‘virtual_size’: 1128, ‘md5’:, ‘GetStringTypeW’, ‘RtlUnwindEx’, ‘GetOEMCP’, ‘TerminateProcess’, ‘GetModuleHandleExW’, ‘IsValidCodePage’, ‘WriteFile’, ‘CreateFileW’, ‘FindClose’, ‘TlsGetValue’, ‘GetFileType’, ‘TlsSetValue’, ‘HeapAlloc’, ‘GetCurrentThreadId’, ‘SetLastError’, ‘LeaveCriticalSection’]}], ‘entry_point’: 80324}
popular_threat_classification {‘suggested_threat_label’: ‘trojan.bulz/shelma’, ‘popular_threat_category’: [{‘count’: 22, ‘value’: ‘trojan’}, {‘count’: 6, ‘value’: ‘downloader’}, {‘count’: 2, ‘value’: ‘dropper’}], ‘popular_threat_name’: [{‘count’: 6, ‘value’: ‘bulz’}, {‘count’: 6, ‘value’: ‘shelma’}, {‘count’: 3, ‘value’: ‘cobaltstrike’}]}
reputation 0
sandbox_verdicts {‘Zenbox’: {‘category’: ‘malicious’, ‘sandbox_name’: ‘Zenbox’, ‘malware_classification’: [‘MALWARE’, ‘TROJAN’, ‘EVADER’]}, ‘C2AE’: {‘category’: ‘undetected’, ‘sandbox_name’: ‘C2AE’, ‘malware_classification’: [‘UNKNOWN_VERDICT’]}, ‘Yomi Hunter’: {‘category’: ‘malicious’, ‘sandbox_name’: ‘Yomi Hunter’, ‘malware_classification’: [‘MALWARE’]}, ‘Lastline’: {‘category’: ‘malicious’, ‘sandbox_name’: ‘Lastline’, ‘malware_classification’: [‘MALWARE’]}}
sha1 ddf0214fbf92240bc60480a37c9c803e3ad06321
sha256 cf0a85f491146002a26b01c8aff864a39a18a70c7b5c579e96deda212bfeec58
sigma_analysis_stats {‘high’: 0, ‘medium’: 1, ‘critical’: 1, ‘low’: 0}
sigma_analysis_summary {‘Sigma Integrated Rule Set (GitHub)’: {‘high’: 0, ‘medium’: 0, ‘critical’: 1, ‘low’: 0}, ‘SOC Prime Threat Detection Marketplace’: {‘high’: 0, ‘medium’: 1, ‘critical’: 0, ‘low’: 0}}
size 181248
ssdeep 3072:fck3rwbtOsN4X1JmKSol6LZVZgBPruYgr3Ig/XZO9:fck3rwblqPgokNgBPr9gA
tags [assembly, invalid-rich-pe-linker-version, detect-debug-environment, long-sleeps, 64bits, pedll]
times_submitted 1
tlsh T110049E14B2A914FBEE6A82B984935611B07174624338DFEF03A4C375DE0E7E15A3EF25
total_votes {‘harmless’: 0, ‘malicious’: 0}
trid [{‘file_type’: ‘Win64 Executable (generic)’, ‘probability’: 48.7}, {‘file_type’: ‘Win16 NE executable (generic)’, ‘probability’: 23.3}, {‘file_type’: ‘OS/2 Executable (generic)’, ‘probability’: 9.3}, {‘file_type’: ‘Generic Win/DOS Executable’, ‘probability’: 9.2}, {‘file_type’: ‘DOS Executable Generic’, ‘probability’: 9.2}]
type_description Win32 DLL
type_extension dll
type_tag pedll
unique_sources 1
Vhash 115076651d155d15555az43=z55

The results indicate that the hash is a Cobalt Strike loader, which means that Conti affiliates also use the penetration testing tool as part of their infrastructure during their operation.
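Several fields in the table are easier to read after a little post-processing. The sketch below converts the epoch timestamps to UTC and summarizes the detection statistics, using values copied from the table above. Counting only malicious, suspicious, undetected, and harmless results as verdicts is an assumption made here, not a VirusTotal definition.

```python
from datetime import datetime, timezone

# Values copied from the table above.
attributes = {
    "creation_date": 1624910154,
    "first_submission_date": 1624917754,
    "last_analysis_stats": {
        "harmless": 0, "type-unsupported": 6, "suspicious": 0,
        "confirmed-timeout": 1, "timeout": 0, "failure": 0,
        "malicious": 47, "undetected": 19,
    },
}


def to_utc(epoch_seconds):
    """Convert a Unix timestamp in seconds to an aware UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


def detection_ratio(stats):
    """Return (malicious count, number of engines that gave a verdict)."""
    verdicts = sum(
        stats[key] for key in ("malicious", "suspicious", "undetected", "harmless")
    )
    return stats["malicious"], verdicts


print(to_utc(attributes["creation_date"]).isoformat())     # 2021-06-28T19:55:54+00:00
print(detection_ratio(attributes["last_analysis_stats"]))  # (47, 66)
```

The converted creation date matches the TimeStamp reported in the exiftool field, and the file was first submitted about two hours after it was compiled.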

In addition, the VirusTotal module can provide details such as detection rate, type, description, and other information related to the hashes. The code snippet below generates the list of domains to which the hashes connect.

A screenshot of Python code that generates the list of domains specific hashes connect to using the VirusTotal module.
Figure 31. Getting contacted domains
A screenshot of a table generated from extracting the domains to which certain hashes connected using the VirusTotal module.
Figure 32. Additional domains retrieved from previously extracted hashes

Doing this kind of analysis on the Conti leak data or similar data sets can lead to the discovery of possibly related domains that were not in the initial data sets.


This blog outlines how Python can be used to find valuable threat intelligence from data sets such as chat logs. It also presents details on how processing data using the MSTICPy library can be useful for enriching and hunting within environments, as well as collecting additional threat context. The interactive notebook provides additional code snippets that can also be used to continue log exploration.

The types of information extracted in this blog provide insights into the various elements of the criminal ecosystem that were coordinating their activities. Threat intelligence from research like this informs products and services like Microsoft 365 Defender, translating knowledge into real-world protection for customers. More importantly, the methodology described in this blog can be adapted to specific threat intelligence services, and the broader community is invited to use it for further analysis, enrichment of data, and intelligence sharing for the benefit of all.

Thomas Roccia
Microsoft 365 Defender Research Team



The post Using Python to unearth a goldmine of threat intelligence from leaked chat logs appeared first on Microsoft Security Blog.

MSTICPy January 2022 hackathon highlights

During the month of January 2022, the Microsoft Threat Intelligence Center (MSTIC) ran its inaugural hackathon for the open-source Jupyter and Python Security Tools library, MSTICPy. We asked the security community for their contributions to expand and improve MSTICPy’s features and capabilities, and we helped contributors shape and deliver their contributions. As MSTICPy is an open-source project, contributions from the community are highly valued and help to make the tools useful and effective. 

The response from the community was fantastic, with engagement and discussions on the future design and direction of MSTICPy, and many awesome contributions that ranged from updated documentation to completely new features. We are incredibly grateful for everyone’s engagement and wanted to take a moment to highlight some of the contributions and extend our sincere thanks to the authors. 

Some of these contributions are already released in MSTICPy 1.6.1, while most of the remaining items will make it into version 1.7.0, to be released in late February 2022. 

Contribution highlights

Data connector for Cybereason (Contributor: Florian Bracq, AXA)

This contribution added a new MSTICPy data provider for the Cybereason endpoint detection and response (EDR) product. This enables Cybereason users to query from a Jupyter Notebook and bring the data back for further analysis. The contribution also includes several pre-defined queries that users can select from.

As part of this work, Florian also added several fixes and improvements to MSTICPy’s core data provider features.

Splunk queries and async support (Contributor: Joey Dreijer (d3vzer0))

MSTICPy’s existing Splunk data provider was expanded with the addition of pre-defined Splunk queries for authentication and alert events, providing users with a much wider set of queries to select from. In addition, query performance was improved with the addition of support for Splunk’s asynchronous query execution.

Replaced Requests with HTTPX (Contributor: Grant Versfeld (@grantversfeld))

MSTICPy has traditionally used the Python Requests package to handle HTTP-based connections. However, active development on Requests ended some time ago, and it does not support Python’s asynchronous architecture, so we needed to migrate to another package. Grant’s contribution replaced Requests with HTTPX, ensuring that MSTICPy can use the improved performance that async support brings.

IntSights TI provider (Contributor: Florian Bracq, AXA)

Another contribution from Florian saw support for the IntSights Threat Intelligence (TI) platform added to MSTICPy. This feature allows users to see if indicators under investigation appear in the IntSights platform and obtain details about the indicators.

Updated QueryTime widget (Contributor: Jakub Jirasek, Chr. Hansen)

This contribution updated MSTICPy’s existing QueryTime widget to correctly accept time unit changes provided by the user.

Updated Readme (Contributor: danielc-evans)

The Readme file is often the first thing that new users to MSTICPy see, so ensuring it contains all the information they need is key. This update does just that, adding key additional information to the Readme.

Support for Sysmon data in MSTICPy’s process tree (Contributor: Nicolas Bareil (@nbareil))

This update adds schema support that allows users to generate process trees from Sysmon ProcessCreate events. This allows Sysmon users to take advantage of one of MSTICPy’s most powerful visualizations.

Blob storage connection string support (Contributor: Luis Francisco Monge (@Lukky86))

This contribution adds the ability for users to provide a connection string when using MSTICPy’s AzureBlobStorage feature. This provides additional flexibility to users when connecting to Azure Blob Storage containers.

Our thanks

We would like to thank all the contributors for their efforts during the hackathon. These contributions are great additions to MSTICPy and will make the library more useful and usable.

Wider impact 

In addition, thanks to feedback received from these and others, we (the MSTICPy team) added several new features. These include: 

Pyproject.toml and Setup.cfg 

Thanks to suggestions from Joey Dreijer (d3vzer0), we moved MSTICPy into the modern era by implementing much of the project configuration into setup.cfg and pyproject.toml. This has the side benefit of making some of our tests that check for valid package configuration easier. 

As well as these external contributions, we also worked on a number of new features during the hackathon. Full details of these can be found in the MSTICPy release notes, but below is the summary of these additions: 

  • Support for new Microsoft Sentinel APIs, including adding the ability to create Incidents and interact with Watchlists and Analytics. 
  • Added a new SentinelAlert entity to better handle Sentinel alert objects. 
  • Improved authentication features for Azure elements, allowing users to authenticate against tenants other than their home tenant. This was a first-time contribution by MSTIC member Liam Kirton. 
  • Restructured data provider documentation to make it clearer and easier to read. 
  • Updated the GitHub pipeline to make it simpler for external contributors. 
  • Implemented multiple minor fixes and improvements. 

MSTICPy restructure 

The MSTICPy package has evolved organically and we have been considering a restructure of the package for some time. Thanks to inspiration from Florian Bracq, we set about reorganizing the modules into a more logical structure. These changes will make the structure of MSTICPy more intuitive to users and make sure the package is more easily extensible and maintainable in the future. This restructure will be included in the v2.0.0 release of MSTICPy.


There are several other contributions still being worked on that we will incorporate as soon as they are ready. We will include these in a future release of MSTICPy. You can keep up to date with MSTICPy on GitHub and by following @msticpy on Twitter. 

We plan to run more hackathons in the year, but contributions, ideas, and feedback are welcome at any time. 

The MSTICPy Team (@msticpy)

The post MSTICPy January 2022 hackathon highlights appeared first on Microsoft Security Blog.