Big Data: Principles and Paradigms

Material type: Text

Publication details: Cambridge: Morgan Kaufmann Publishers, 2016

Description: 468 p.

ISBN: 9780128053942

Subject(s):

Computer - Data Processing

DDC classification:

005.74 BIG

Holdings
Item type	Current library	Item location	Collection	Call number	Status	Barcode
Book	NIMA Knowledge Centre	9th Floor Reading Zone	General	005.74 BIG	Available	M0033275

Machine generated contents note:
ch. 1 BDA = ML + CC
1.1.Introduction
1.2.A Historical Review of Big Data
1.2.1.The Origin of Big Data
1.2.2.Debates of Big Data Implication
1.3.Historical Interpretation of Big Data
1.3.1.Methodology for Defining Big Data
1.3.2.Different Attributes of Definitions
1.3.3.Summary of 7 Types of Definitions of Big Data
1.3.4.Motivations Behind the Definitions
1.4.Defining Big Data From 3Vs to 32Vs
1.4.1.Data Domain
1.4.2.Business Intelligence (BI) Domain
1.4.3.Statistics Domain
1.4.4.32 Vs Definition and Big Data Venn Diagram
1.5.Big Data Analytics and Machine Learning
1.5.1.Big Data Analytics
1.5.2.Machine Learning
1.6.Big Data Analytics and Cloud Computing
1.7.Hadoop, HDFS, MapReduce, Spark, and Flink
1.7.1.Google File System (GFS) and HDFS
1.7.2.MapReduce
1.7.3.The Origin of the Hadoop Project
1.7.4.Spark and Spark Stack
1.7.5.Flink and Other Data Process Engines
1.7.6.Summary of Hadoop and Its Ecosystems
1.8.ML + CC -> BDA and Guidelines
1.9.Conclusion
References
ch. 2 Real-Time Analytics
2.1.Introduction
2.2.Computing Abstractions for Real-Time Analytics
2.3.Characteristics of Real-Time Systems
2.3.1.Low Latency
2.3.2.High Availability
2.3.3.Horizontal Scalability
2.4.Real-Time Processing for Big Data - Concepts and Platforms
2.4.1.Event
2.4.2.Event Processing
2.4.3.Event Stream Processing and Data Stream Processing
2.4.4.Complex Event Processing
2.4.5.Event Type
2.4.6.Event Pattern
2.5.Data Stream Processing Platforms
2.5.1.Spark
2.5.2.Storm
2.5.3.Kafka
2.5.4.Flume
2.5.5.Amazon Kinesis
2.6.Data Stream Analytics Platforms
2.6.1.Query-Based EPSs
2.6.2.Rule-Oriented EPSs
2.6.3.Programmatic EPSs
2.7.Data Analysis and Analytic Techniques
2.7.1.Data Analysis in General
2.7.2.Data Analysis for Stream Applications
2.8.Finance Domain Requirements and a Case Study
2.8.1.Real-Time Analytics in Finance Domain
2.8.2.Selected Scenarios
2.8.3.CEP Application as a Case Study
2.9.Future Research Challenges
ch. 3 Big Data Analytics for Social Media
3.1.Introduction
3.2.NLP and Its Applications
3.2.1.Language Detection
3.2.2.Named Entity Recognition
3.3.Text Mining
3.3.1.Sentiment Analysis
3.3.2.Trending Topics
3.3.3.Recommender Systems
3.4.Anomaly Detection
Acknowledgments
ch. 4 Deep Learning and Its Parallelization
4.1.Introduction
4.1.1.Application Background
4.1.2.Performance Demands for Deep Learning
4.1.3.Existing Parallel Frameworks of Deep Learning
4.2.Concepts and Categories of Deep Learning
4.2.1.Deep Learning
4.2.2.Mainstream Deep Learning Models
4.3.Parallel Optimization for Deep Learning
4.3.1.Convolutional Architecture for Fast Feature Embedding
4.3.2.DistBelief
4.3.3.Deep Learning Based on Multi-GPUs
4.4.Discussions
4.4.1.Grand Challenges of Deep Learning in Big Data
4.4.2.Future Directions
ch. 5 Characterization and Traversal of Large Real-World Networks
5.1.Introduction
5.2.Background
5.3.Characterization and Measurement
5.4.Efficient Complex Network Traversal
5.4.1.HPC Traversal of Large Networks
5.4.2.Algorithms for Accelerating AS-BFS on GPU
5.4.3.Performance Study of AS-BFS on GPUs
5.5.k-Core-Based Partitioning for Heterogeneous Graph Processing
5.5.1.Graph Partitioning for Heterogeneous Computing
5.5.2.k-Core-Based Complex-Network Unbalanced Bisection
5.6.Future Directions
5.7.Conclusions
ch. 6 Database Techniques for Big Data
6.1.Introduction
6.2.Background
6.2.1.Navigational Data Models
6.2.2.Relational Data Models
6.3.NoSQL Movement
6.4.NoSQL Solutions for Big Data Management
6.5.NoSQL Data Models
6.5.1.Key-Value Stores
6.5.2.Column-Based Stores
6.5.3.Graph-Based Stores
6.5.4.Document-Based Stores
6.6.Future Directions
6.7.Conclusions
ch. 7 Resource Management in Big Data Processing Systems
7.1.Introduction
7.2.Types of Resource Management
7.2.1.CPU and Memory Resource Management
7.2.2.Storage Resource Management
7.2.3.Network Resource Management
7.3.Big Data Processing Systems and Platforms
7.3.1.Hadoop
7.3.2.Dryad
7.3.3.Pregel
7.3.4.Storm
7.3.5.Spark
7.3.6.Summary
7.4.Single-Resource Management in the Cloud
7.4.1.Desired Resource Allocation Properties
7.4.2.Problems for Existing Fairness Policies
7.4.3.Long-Term Resource Allocation Policy
7.4.4.Experimental Evaluation
7.5.Multiresource Management in the Cloud
7.5.1.Resource Allocation Model
7.5.2.Multiresource Fair Sharing Issues
7.5.3.Reciprocal Resource Fairness
7.5.4.Experimental Evaluation
7.6.Related Work on Resource Management
7.6.1.Resource Utilization Optimization
7.6.2.Power and Energy Cost Saving Optimization
7.6.3.Monetary Cost Optimization
7.6.4.Fairness Optimization
7.7.Open Problems
7.7.1.SLA Guarantee for Applications
7.7.2.Various Computation Models and Systems
7.7.3.Exploiting Emerging Hardware
7.8.Summary
ch. 8 Local Resource Consumption Shaping: A Case for MapReduce
8.1.Introduction
8.2.Motivation
8.2.1.Pitfalls of Fair Resource Sharing
8.3.Local Resource Shaper
8.3.1.Design Philosophy
8.3.2.Splitter
8.3.3.The Interleave MapReduce Scheduler
8.4.Evaluation
8.4.1.Experiments With Hadoop 1.x
8.4.2.Experiments With Hadoop 2.x
8.5.Related Work
8.6.Conclusions
Appendix CPU Utilization With Different Slot Configurations and LRS
ch. 9 System Optimization for Big Data Processing
9.1.Introduction
9.2.Basic Framework of the Hadoop Ecosystem
9.3.Parallel Computation Framework: MapReduce
9.3.1.Improvements of MapReduce Framework
9.3.2.Optimization for Task Scheduling and Load Balancing of MapReduce
9.4.Job Scheduling of Hadoop
9.4.1.Built-In Scheduling Algorithms of Hadoop
9.4.2.Improvement of the Hadoop Job Scheduling Algorithm
9.4.3.Improvement of the Hadoop Job Management Framework
9.5.Performance Optimization of HDFS
9.5.1.Small File Performance Optimization
9.5.2.HDFS Security Optimization
9.6.Performance Optimization of HBase
9.6.1.HBase Framework, Storage, and Application Optimization
9.6.2.Load Balancing of HBase
9.6.3.Optimization of HBase Configuration
9.7.Performance Enhancement of Hadoop System
9.7.1.Efficiency Optimization of Hadoop
9.7.2.Availability Optimization of Hadoop
9.8.Conclusions and Future Directions
ch. 10 Packing Algorithms for Big Data Replay on Multicore
10.1.Introduction
10.2.Performance Bottlenecks
10.2.1.Hadoop/MapReduce Performance Bottlenecks
10.2.2.Performance Bottlenecks Under Parallel Loads
10.2.3.Parameter Spaces for Storage and Shared Memory
10.2.4.Main Storage Performance
10.2.5.Shared Memory Performance
10.3.The Big Data Replay Method
10.3.1.The Replay Method
10.3.2.Jobs as Sketches on a Timeline
10.3.3.Performance Bottlenecks Under Replay
10.4.Packing Algorithms
10.4.1.Shared Memory Performance Tricks
10.4.2.Big Data Replay at Scale
10.4.3.Practical Packing Models
10.5.Performance Analysis
10.5.1.Hotspot Distributions
10.5.2.Modeling Methodology
10.5.3.Processing Overhead Versus Bottlenecks
10.5.4.Control Grain for Drop Versus Drag Models
10.6.Summary and Future Directions
ch. 11 Spatial Privacy Challenges in Social Networks
11.1.Introduction
11.2.Background
11.3.Spatial Aspects of Social Networks
11.4.Cloud-Based Big Data Infrastructure
11.5.Spatial Privacy Case Studies
11.6.Conclusions
ch. 12 Security and Privacy in Big Data
12.1.Introduction
12.2.Secure Queries Over Encrypted Big Data
12.2.1.System Model
12.2.2.Threat Model and Attack Model
12.2.3.Secure Query Scheme in Clouds
12.2.4.Security Definition of Index-Based Secure Query Techniques
12.2.5.Implementations of Index-Based Secure Query Techniques
12.3.Other Big Data Security
12.3.1.Digital Watermarking
12.3.2.Self-Adaptive Risk Access Control
12.4.Privacy on Correlated Big Data
12.4.1.Correlated Data in Big Data
12.4.2.Anonymity
12.4.3.Differential Privacy
12.5.Future Directions
12.6.Conclusions
ch. 13 Location Inferring in Internet of Things and Big Data
13.1.Introduction
13.2.Device-Based Sensing Using Big Data
13.2.1.Introduction
13.2.2.Approach Overview
13.2.3.Trajectories Matching
13.2.4.Establishing the Mapping Between Floor Plan and RSS Readings
13.2.5.User Localization
13.2.6.Graph Matching Based Tracking
13.2.7.Evaluation
13.3.Device-Free Sensing Using Big Data
13.3.1.Customer Behavior Identification
13.3.2.Human Object Estimation
13.4.Conclusion
Acknowledgements
ch. 14 A Framework for Mining Thai Public Opinions
14.1.Introduction
14.2.XDOM
14.2.1.Data Sources
14.2.2.DOM System Architecture
14.2.3.MapReduce Framework
14.2.4.Sentiment Analysis
14.2.5.Clustering-Based Summarization Framework
14.2.6.Influencer Analysis
14.2.7.AsKDOM: Mobile Application
14.3.Implementation
14.3.1.Server
14.3.2.Core Service
14.3.3.I/O
14.4.Validation
14.4.1.Validation Parameter
14.4.2.Validation Method
14.4.3.Validation Results
14.5.Case Studies
14.5.1.Political Opinion: #prayforthailand
14.5.2.Bangkok Traffic Congestion Ranking
14.6.Summary and Conclusions
ch. 15 A Case Study in Big Data Analytics: Exploring Twitter Sentiment Analysis and the Weather
15.1.Background
15.2.Big Data System Components
15.2.1.System Back-End Architecture
15.2.2.System Front-End Architecture
15.2.3.Software Stack
15.3.Machine-Learning Methodology
15.3.1.Tweets Sentiment Analysis
15.3.2.Weather and Emotion Correlation Analysis
15.4.System Implementation
15.4.1.Home Page
15.4.2.Sentiment Pages
15.4.3.Weather Pages
15.5.Key Findings
15.5.1.Time Series
15.5.2.Analysis With Hourly Weather Data
15.5.3.Analysis With Daily Weather Data
15.5.4.DBSCAN Cluster Algorithm
15.5.5.Straightforward Weather Impact on Emotion
15.6.Summary and Conclusions
ch. 16 Dynamic Uncertainty-Based Analytics for Caching Performance Improvements in Mobile Broadband Wireless Networks
16.1.Introduction
16.1.1.Big Data Concerns
16.1.2.Key Focus Areas
16.2.Background
16.2.1.Cellular Network and VoD
16.2.2.Markov Processes
16.3.Related Work
16.4.VoD Architecture
16.5.Overview
16.6.Data Generation
16.7.Edge and Core Components
16.8.INCA Caching Algorithm
16.9.QoE Estimation
16.10.Theoretical Framework
16.11.Experiments and Results
16.11.1.Cache Hits With Nu, Nc, Nm and k
16.11.2.QoE Impact With Prefetch Bandwidth
16.11.3.User Satisfaction With Prefetch Bandwidth
16.12.Synthetic Dataset
16.12.1.INCA Hit Gain
16.12.2.QoE Performance
16.12.3.Satisfied Users
16.13.Conclusions and Future Directions
ch. 17 Big Data Analytics on a Smart Grid: Mining PMU Data for Event and Anomaly Detection
17.1.Introduction
17.2.Smart Grid With PMUs and PDCs
17.3.Improving Traditional Workflow
17.4.Characterizing Normal Operation
17.5.Identifying Unusual Phenomena
17.6.Identifying Known Events
17.7.Related Efforts
17.8.Conclusion and Future Directions
ch. 18 eScience and Big Data Workflows in Clouds: A Taxonomy and Survey
18.1.Introduction
18.2.Background
18.2.1.History
18.2.2.Grid-Based eScience
18.2.3.Cloud Computing
18.3.Taxonomy and Review of eScience Services in the Cloud
18.3.1.Infrastructure
18.3.2.Ownership
18.3.3.Application
18.3.4.Processing Tools
18.3.5.Storage
18.3.6.Security
18.3.7.Service Models
18.3.8.Collaboration
18.4.Resource Provisioning for eScience Workflows in Clouds
18.4.1.Motivation
18.4.2.Our Solution
18.5.Open Problems
