PDF Notes: DataScience-module-1-overview_0303

    Generated from uploaded pdf

    Created by @dgw

    What is Big Data?

    A process to discover insights from data to solve business problems.

    How does Machine Learning differ from Deep Learning?

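    Deep Learning is a subset of Machine Learning that uses multi-layer neural networks to learn features from data; other Machine Learning methods typically rely on hand-engineered features.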

    What are the core components of the Big Data process?

    Objective setting, data curation, inspection, preprocessing, analysis, evaluation, and deployment.

    What is the significance of data preprocessing?

    It is commonly estimated to account for about 80% of the time and effort in the Big Data process.

    What happens if data preprocessing is poor?

    It can lead to useless results and unnecessary computations.
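
    As a rough illustration of why this matters, the sketch below cleans a few hypothetical records before analysis. The field names, values, and the preprocess helper are all invented for this example and do not come from the source deck; it uses only the Python standard library.

```python
# Hypothetical raw records; field names and values are made up for illustration.
raw_rows = [
    {"customer": " Alice ", "age": "34", "spend": "120.50"},
    {"customer": "Bob", "age": "", "spend": "80"},
    {"customer": " Alice ", "age": "34", "spend": "120.50"},  # exact duplicate
    {"customer": "Carol", "age": "n/a", "spend": "15.25"},
]


def preprocess(rows):
    """Trim text, convert numeric fields, mark missing values, drop duplicates."""
    cleaned = []
    seen = set()
    for row in rows:
        name = row["customer"].strip()
        # Empty or unparseable ages become None instead of silently becoming 0.
        age = int(row["age"]) if row["age"].isdigit() else None
        spend = float(row["spend"])
        key = (name, age, spend)
        if key in seen:
            continue  # skip rows already seen after normalization
        seen.add(key)
        cleaned.append({"customer": name, "age": age, "spend": spend})
    return cleaned


clean_rows = preprocess(raw_rows)

# Only rows with a known age contribute to the mean.
known_ages = [r["age"] for r in clean_rows if r["age"] is not None]
mean_age = sum(known_ages) / len(known_ages)
```

    Without the cleaning step, the duplicate row would be counted twice and the missing ages would either crash the computation or distort the average.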

    What technologies are used for Big Data?

    Database systems, OLAP, data warehousing, and data mining software.

    What is the role of a team in Big Data projects?

    The team defines business needs, applies results, and ensures data security and quality.

    What are some application areas of Big Data?

    Business analysis, governance, medicine, and security.

    What is a common misconception about Big Data?

    That it requires large amounts of data to be effective.

    What is the purpose of using Python libraries in Big Data?

    To facilitate data analysis, preprocessing, and machine learning.

    What is the difference between RDB and NoSQL?

    RDB uses normalized relations, while NoSQL focuses on performance and scalability.
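
    The contrast can be sketched with a small example, assuming a hypothetical customers/orders dataset. The relational side uses Python's built-in sqlite3 module; the NoSQL side is only imitated with a dictionary of JSON documents, not a real NoSQL database.

```python
import json
import sqlite3

# Relational side: two normalized tables linked by a key (in-memory SQLite).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, item TEXT)"
)
conn.execute("INSERT INTO customers VALUES (1, 'Alice')")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(10, 1, "laptop"), (11, 1, "mouse")],
)

# Reading a customer's orders requires a join across the two relations.
relational_items = [
    item
    for (item,) in conn.execute(
        "SELECT o.item FROM orders o "
        "JOIN customers c ON c.id = o.customer_id "
        "WHERE c.name = ? ORDER BY o.id",
        ("Alice",),
    )
]
conn.close()

# Document side (a stand-in for a key-value or document store): the orders
# are embedded in one record, so a single key lookup returns everything.
document_store = {
    "customer:1": json.dumps({"name": "Alice", "orders": ["laptop", "mouse"]})
}
document_items = json.loads(document_store["customer:1"])["orders"]
```

    The normalized layout avoids storing the same fact twice, while the embedded document trades that guarantee for a single-lookup read that is easier to distribute across machines.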

    What is the importance of evaluating analysis results?

    To ensure algorithms are not blindly trusted and results are accurate.
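
    One simple way to avoid trusting a model blindly is to compare it against a trivial baseline on held-out data. The sketch below assumes hypothetical labels and predictions invented for illustration; the accuracy helper is not from the source.

```python
from collections import Counter

# Hypothetical hold-out labels and model predictions, invented for illustration.
y_true = ["no", "no", "no", "no", "no", "no", "no", "no", "yes", "yes"]
y_pred = ["no", "no", "no", "no", "no", "no", "no", "no", "no", "yes"]


def accuracy(truth, predicted):
    """Fraction of positions where the prediction matches the true label."""
    return sum(t == p for t, p in zip(truth, predicted)) / len(truth)


# Baseline: always predict the most frequent label.
majority_label = Counter(y_true).most_common(1)[0][0]
baseline_pred = [majority_label] * len(y_true)

model_accuracy = accuracy(y_true, y_pred)
baseline_accuracy = accuracy(y_true, baseline_pred)

# Recall on the rare "yes" class: how many true "yes" cases the model found.
yes_total = sum(t == "yes" for t in y_true)
yes_found = sum(t == "yes" and p == "yes" for t, p in zip(y_true, y_pred))
yes_recall = yes_found / yes_total
```

    Here the model's accuracy looks high, but it is only slightly better than always answering "no", and it finds just half of the "yes" cases, which is the kind of gap an evaluation step is meant to expose.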

    What are the 3Vs of Big Data?

    Volume, variety, and velocity.

    What is the iterative nature of the Big Data process?

    It involves continuous refinement of business needs and data analysis.

    What is the role of visualization in data analysis?

    To graphically represent complex relationships among data.

    What is Big Data?

    Techniques to extract insights from data for business solutions.

    What are the core components of the Big Data process?

    Objective setting, data curation, analysis, evaluation, and deployment.

    What technologies are used for Big Data?

    NoSQL databases, Hadoop, and data analysis software.

    What is the role of a trained team in Big Data?

    A trained team defines business needs and applies results effectively.

    What is the importance of evaluating analysis results?

    To ensure algorithms are reliable and results are accurate.

    What are the consequences of not using Big Data?

    Organizations may miss insights that could enhance decision-making.
