JSON is hierarchical and can contain nested objects and arrays, while CSV is a flat, tabular format with rows and columns.
JSON is semi-structured because its keys and values make the data self-describing and give it organizational structure, but no fixed schema is enforced: the content stays flexible and can include nested structures.
The nested structure complicates data processing because many AI models expect flat, fixed-width tabular (2D) input. Flattening nested data can lead to an explosion of features, loss of hierarchical relationships, and inconsistent data dimensions.
Flattening can create many separate columns for each nested field, potentially losing context and relationships between features, and resulting in sparse data with many missing values.
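The flattening problem described above can be sketched in a few lines. This is a minimal illustration using only the standard library; the `flatten` helper and the sample `users` records are hypothetical, not taken from the source deck.

```python
def flatten(obj, prefix=""):
    """Flatten nested dicts/lists into one dict keyed by dotted paths."""
    if isinstance(obj, dict):
        items = obj.items()
    elif isinstance(obj, list):
        items = enumerate(obj)
    else:
        # Scalar leaf: the accumulated path becomes the column name.
        return {prefix: obj}
    flat = {}
    for key, value in items:
        path = f"{prefix}.{key}" if prefix else str(key)
        flat.update(flatten(value, path))
    return flat

# Hypothetical records with different nested fields.
users = [
    {"name": "Ana", "contact": {"phones": ["555-0100", "555-0101"]}},
    {"name": "Ben", "contact": {"email": "ben@example.com"}},
]

rows = [flatten(u) for u in users]
# The union of all paths becomes the column set of the flat table.
columns = sorted({col for row in rows for col in row})
# Fields a record does not have show up as None (missing values).
table = [[row.get(col) for col in columns] for row in rows]

print(columns)
for line in table:
    print(line)
```

Two small records already produce four columns, and each row has at least one empty cell. The fact that the two phone numbers belong to the same list is only visible in the column names.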
JSON is preferred when dealing with complex data structures that require nesting, such as user profiles with multiple attributes, while CSV is suitable for simpler, flat data.
Unstructured data includes formats like text reviews, images, videos, and audio files that do not have a predefined data model or structure.
JSON can store user preferences, such as dietary restrictions and cuisine preferences, which can be analyzed to provide tailored recommendations based on individual profiles.
Variable length issues occur when different users have different numbers of attributes (e.g., phone numbers or addresses), leading to inconsistent data dimensions that can complicate analysis and model training.
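One common workaround for variable-length attributes, sketched here as an assumption rather than something the source prescribes, is to pad or truncate each list to a fixed width and keep the original count as a separate feature. The `pad_to_width` helper and the sample data are hypothetical.

```python
def pad_to_width(values, width, fill=None):
    """Truncate or pad a list so it has exactly `width` entries."""
    return list(values[:width]) + [fill] * max(0, width - len(values))

# Hypothetical users with 3, 0, and 1 phone numbers.
phone_lists = {
    "ana": ["555-0100", "555-0101", "555-0102"],
    "ben": [],
    "cy": ["555-0199"],
}

width = 2
fixed = {user: pad_to_width(phones, width) for user, phones in phone_lists.items()}
# Keeping the original count preserves information lost by truncation.
counts = {user: len(phones) for user, phones in phone_lists.items()}

print(fixed)
print(counts)
```

Every user now has the same number of phone columns, at the cost of dropping Ana's third number and introducing missing values for the others.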
Metadata provides additional context about the data, such as schema definitions, encoding, and relationships between objects, ensuring compatibility and aiding in data processing.
JSON's ability to nest objects allows for the representation of complex relationships, such as parent-child hierarchies, which can be crucial for understanding data context.
Temporal metadata, such as timestamps and versioning, is important for tracking data freshness and changes over time, which is critical for real-time applications.
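As a rough sketch of how a timestamp can drive a freshness check, the example below compares a record's `updated_at` field against a maximum age. The field names, the `is_fresh` helper, and the sample values are all assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

def is_fresh(record, now, max_age):
    """Return True if the record was updated within `max_age` of `now`."""
    updated = datetime.fromisoformat(record["updated_at"])
    return now - updated <= max_age

# Hypothetical record carrying temporal metadata alongside its payload.
record = {"id": 7, "version": 3, "updated_at": "2024-05-01T12:00:00+00:00"}

# A fixed "current time" keeps the example deterministic.
now = datetime(2024, 5, 1, 12, 4, tzinfo=timezone.utc)

print(is_fresh(record, now, timedelta(minutes=5)))
print(is_fresh(record, now, timedelta(minutes=3)))
```

The record is four minutes old, so it passes a five-minute freshness window and fails a three-minute one.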
JSON's structured format with clear key-value pairs enhances data interoperability by providing a common format that can be easily parsed and understood by different systems.
Potential downsides of JSON include more complex parsing, larger file sizes than CSV, and difficulty querying nested data without specialized tools.
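The file-size point can be checked with a small experiment: serialize the same flat records as JSON and as CSV and compare the lengths. The records below are made up for the comparison, and the exact ratio will vary with the data.

```python
import csv
import io
import json

# Hypothetical flat records, so both formats can represent them fully.
records = [{"id": i, "name": f"user{i}", "city": "Lisbon"} for i in range(100)]

# JSON repeats every key name in every record.
json_text = json.dumps(records)

# CSV states the column names once, in the header row.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["id", "name", "city"])
writer.writeheader()
writer.writerows(records)
csv_text = buffer.getvalue()

print(len(json_text), len(csv_text))
```

The JSON output is larger mainly because the keys `id`, `name`, and `city` appear 100 times each, whereas CSV writes them once.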
Unstructured data is useful for analysis when qualitative insights are needed, such as sentiment analysis from text reviews or image analysis for visual content.
Schema definition helps ensure that the data adheres to a specific structure, which can facilitate validation, parsing, and integration with other systems.
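A schema check can be illustrated with a deliberately simple validator. This is a sketch under stated assumptions, not a standard: the `validate` function and `USER_SCHEMA` mapping are hypothetical, and real projects often use a dedicated schema language such as JSON Schema instead.

```python
def validate(record, schema):
    """Return a list of problems; an empty list means the record conforms."""
    errors = []
    for field, expected_type in schema.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            actual = type(record[field]).__name__
            errors.append(f"{field}: expected {expected_type.__name__}, got {actual}")
    return errors

# Hypothetical schema: field name mapped to the required Python type.
USER_SCHEMA = {"name": str, "age": int, "tags": list}

good = {"name": "Ana", "age": 31, "tags": ["vegan"]}
bad = {"name": "Ben", "age": "31"}

print(validate(good, USER_SCHEMA))
print(validate(bad, USER_SCHEMA))
```

Rejecting `bad` before it reaches downstream code is the validation benefit the card describes: the wrong type for `age` and the missing `tags` field are reported up front.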
AI models can use relationship metadata to understand connections between different data points, enhancing the model's ability to make informed predictions based on contextual relationships.
Normalization processes help to flatten nested structures into a usable format for AI models, ensuring that data is consistent and ready for analysis.
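One possible reading of normalization here, offered as an assumption, is the relational one: instead of widening a single table, nested lists are split into a child table linked by a key. The `split_tables` helper and the sample data are hypothetical.

```python
def split_tables(users):
    """Split nested user records into a user table and an address table."""
    user_rows, address_rows = [], []
    for user in users:
        user_rows.append({"user_id": user["id"], "name": user["name"]})
        # Each nested address becomes its own row, linked back by user_id.
        for address in user.get("addresses", []):
            address_rows.append({"user_id": user["id"], **address})
    return user_rows, address_rows

# Hypothetical nested input with a variable number of addresses per user.
users = [
    {"id": 1, "name": "Ana", "addresses": [{"city": "Lisbon"}, {"city": "Porto"}]},
    {"id": 2, "name": "Ben"},
]

user_rows, address_rows = split_tables(users)
print(user_rows)
print(address_rows)
```

Both tables are flat and have consistent columns, and the parent-child relationship survives in the `user_id` column rather than being lost.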
Encoding, such as UTF-8, is important to ensure that all characters are correctly represented and that the data can be processed without errors across different systems.
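The effect of encoding can be seen by round-tripping JSON text that contains non-ASCII characters. The sample `profile` is made up; the calls used are from Python's standard `json` module and built-in `str`/`bytes` methods.

```python
import json

# Hypothetical record with non-ASCII characters.
profile = {"name": "Zoë", "city": "São Paulo"}

# ensure_ascii=False keeps the characters as-is instead of \uXXXX escapes.
text = json.dumps(profile, ensure_ascii=False)
raw = text.encode("utf-8")

# Decoding with the same encoding restores the original data.
restored = json.loads(raw.decode("utf-8"))

# Decoding with a different encoding does not fail, but garbles the text.
garbled = raw.decode("latin-1")

print(restored == profile)
print(garbled)
```

The mismatched decode raises no error, which is why encoding problems often go unnoticed until corrupted characters appear downstream.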
JSON is lightweight, easy to read, and can represent complex data structures, making it an ideal format for API responses that need to convey structured information efficiently.
The flexibility of JSON allows for the representation of diverse data types and structures, enabling more comprehensive data analysis but also requiring careful handling to maintain consistency.
Cache headers are important for managing data freshness and performance in applications, allowing systems to determine when to retrieve new data versus using cached versions.
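The cache decision can be sketched by reading the `max-age` directive of an HTTP `Cache-Control` header. This is a simplified illustration: the helper names are hypothetical, and real HTTP caching involves more rules (revalidation, `Expires`, `ETag`) than shown here.

```python
def max_age_seconds(cache_control):
    """Extract max-age from a Cache-Control header value, or None if absent."""
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            return int(value)
    return None

def can_use_cache(cache_control, age_seconds):
    """Decide whether a cached copy of the given age may be reused as-is."""
    directives = [d.strip().lower() for d in cache_control.split(",")]
    # Simplification: both directives are treated as "fetch again".
    if "no-store" in directives or "no-cache" in directives:
        return False
    max_age = max_age_seconds(cache_control)
    return max_age is not None and age_seconds < max_age

print(can_use_cache("public, max-age=60", 30))
print(can_use_cache("public, max-age=60", 90))
print(can_use_cache("no-store", 0))
```

A 30-second-old copy is reused under a 60-second limit, while a 90-second-old copy or a `no-store` response triggers a fresh request.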