The MISTRAL project uses FAST storage to archive flow and protocol analysis data from research lab network captures. The MISTRAL team developed a pipeline that collects raw network data from several sensors, separates it into raw and normalized files, and then uploads these files to FAST storage via rclone. The data is segmented into several buckets by both data type and research lab. These buckets not only provide logical separation between the various datasets but also allow more granular access control over each type of data stored.
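The MISTRAL pipeline code itself is not reproduced here. The following is a minimal sketch of the upload step, assuming an rclone remote named `fast` already configured against FAST storage's S3-compatible endpoint, and a hypothetical bucket naming scheme of `mistral-<lab>-<data type>`. The function names `bucket_for`, `rclone_copy_command`, and `upload` are illustrative, not part of the project.

```python
import os
import subprocess

# Hypothetical: the two file categories the pipeline produces.
DATA_TYPES = ("raw", "normalized")


def bucket_for(lab, data_type):
    """Map a (research lab, data type) pair to a bucket name.

    The "mistral-<lab>-<data type>" scheme is an assumption for illustration.
    S3 bucket names must be lowercase, so the lab name is lowercased.
    """
    if data_type not in DATA_TYPES:
        raise ValueError(f"unknown data type: {data_type!r}")
    return f"mistral-{lab.lower()}-{data_type}"


def rclone_copy_command(local_dir, remote, lab, data_type):
    """Build an `rclone copy <src> <remote>:<bucket>` command as an argument list."""
    destination = f"{remote}:{bucket_for(lab, data_type)}"
    return ["rclone", "copy", os.fspath(local_dir), destination]


def upload(local_dir, remote, lab, data_type):
    """Run the upload; check=True raises CalledProcessError if rclone exits non-zero."""
    subprocess.run(rclone_copy_command(local_dir, remote, lab, data_type), check=True)


# Example (paths, remote name, and lab name are placeholders):
# upload("/data/laba/raw", "fast", "LabA", "raw")
# upload("/data/laba/normalized", "fast", "LabA", "normalized")
```

Routing each lab's raw and normalized output to its own bucket is what makes per-bucket access policies possible: a researcher can be granted read access to one lab's normalized data without seeing its raw captures.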
In addition to using FAST storage for data archives, the MISTRAL team developed a method for retrieving and analyzing datasets in FAST storage using Python and the Boto3 library. This method lets researchers download data from FAST storage directly into memory, which improves analysis performance and allows data to be analyzed without saving a local copy. Both the MISTRAL Code+ and Data+ teams used this method to rapidly analyze and query the MISTRAL datasets stored in FAST storage.
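A minimal sketch of in-memory retrieval with Boto3 is shown below. It assumes FAST storage exposes an S3-compatible endpoint and that the objects are UTF-8 CSV files; the endpoint URL, credentials, bucket, prefix, and the helper names `make_fast_client`, `list_keys`, and `read_csv_object` are placeholders, not the MISTRAL team's actual code. The Boto3 calls used (`client("s3", endpoint_url=...)`, the `list_objects_v2` paginator, and `get_object`) are standard S3 client operations.

```python
import csv
import io


def make_fast_client(endpoint_url, access_key, secret_key):
    """Create a Boto3 S3 client pointed at an S3-compatible endpoint.

    The endpoint URL and credentials are placeholders for your FAST storage account.
    """
    # Imported here so the helpers below also work with any object exposing
    # the same methods (e.g. a test double), even where boto3 is not installed.
    import boto3

    return boto3.client(
        "s3",
        endpoint_url=endpoint_url,
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
    )


def list_keys(s3_client, bucket, prefix=""):
    """Yield every object key under a prefix, following pagination."""
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Pages with no matching objects have no "Contents" entry.
        for obj in page.get("Contents", []):
            yield obj["Key"]


def read_csv_object(s3_client, bucket, key, encoding="utf-8"):
    """Download one object straight into memory and parse it as CSV rows.

    Nothing is written to local disk; the bytes exist only in this process.
    """
    response = s3_client.get_object(Bucket=bucket, Key=key)
    text = response["Body"].read().decode(encoding)
    return list(csv.DictReader(io.StringIO(text)))


# Example (endpoint, credentials, bucket, and prefix are placeholders):
# client = make_fast_client("https://fast.example.edu", "ACCESS_KEY", "SECRET_KEY")
# for key in list_keys(client, "mistral-laba-normalized", prefix="flows/"):
#     rows = read_csv_object(client, "mistral-laba-normalized", key)
```

Passing the client into each helper, rather than creating it inside them, keeps the retrieval logic independent of credentials and lets the same functions run against any bucket the caller is authorized to read.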
MISTRAL – Massive Internal System Traffic Research Analysis and Logging – NSF Award #2232819
More information about MISTRAL can be found here: MISTRAL Project