Have you ever noticed that the amount of data collected rarely matches the amount of data hosted in an eDiscovery review platform? Data grows when it is ingested: as electronically stored information (ESI) is processed and loaded for analysis and review, its size increases. This expansion happens for three reasons:
Data Conversion and Processing: During the initial stages of eDiscovery, data is collected from various sources, such as email servers, file shares, databases, and individual devices. The collected data is then processed to extract relevant information, remove duplicates, and convert it into a format compatible with the review platform. These processing steps often increase the data size, because extracted text, metadata, unpacked email attachments, and other artifacts are added to the dataset.
Metadata Enrichment: Metadata, such as file properties, timestamps, and user information, plays a crucial role in eDiscovery analysis. Review platforms often enrich the metadata during loading by adding new fields or recalculating existing ones, which increases the data size. For example, a PST email file can expand to as much as twice its original size once its contents are unpacked and metadata is added (LitSmart, 2018).
Indexing and Searchability: To enable efficient searching and analysis of the data, eDiscovery review platforms create indexes of the ingested information. Indexing involves creating a searchable database of terms and their associated locations within the data. The indexing process typically increases the data size as the index itself takes up additional storage space.
Data expansion in eDiscovery review platforms has implications for storage capacity, system performance, and overall cost. Organizations should therefore consider the scalability of the review platform and plan data management strategies that can handle the expanded dataset.
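For storage planning, the expansion can be turned into a rough estimate. The sketch below is illustrative only: the function name, the custodian file names, and the sizes are hypothetical, and the default factor of 2.0 is simply the "up to two times" upper bound cited above for PST files. Other file types and platforms may expand differently, so treat the result as a planning ceiling rather than a prediction.

```python
def estimate_hosted_gb(collected_gb: float, expansion_factor: float = 2.0) -> float:
    """Return a rough upper-bound estimate of hosted data size in GB.

    expansion_factor defaults to 2.0, the "up to two times" figure cited
    for PST files. It is an assumption, not a measured value for any
    particular review platform.
    """
    if collected_gb < 0:
        raise ValueError("collected_gb must be non-negative")
    if expansion_factor <= 0:
        raise ValueError("expansion_factor must be positive")
    return collected_gb * expansion_factor


# Hypothetical collection: PST sizes in GB per custodian.
collected = {"custodian_a.pst": 12.5, "custodian_b.pst": 4.0}

total_collected = sum(collected.values())
total_hosted = estimate_hosted_gb(total_collected)

print(f"Collected: {total_collected:.1f} GB")
print(f"Hosted (upper bound): {total_hosted:.1f} GB")
```

If your own matters show a different ratio between collected and hosted data, pass that ratio as expansion_factor instead of relying on the default.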
Legal Eagle offers several tools to reduce data hosting costs so that only the data most important to your matter is hosted. For more information, please contact us at email@example.com.
LitSmart E-Discovery, “Part Two of ESI Basics: Processing,” July 25, 2018.