Description
This certification is intended for IBM Big Data Engineers. The Big Data Engineer works directly with the Data Architect and hands-on developers to transform the architect’s Big Data vision and blueprint into a Big Data reality. The Data Engineer possesses a deep level of technical knowledge and experience across a broad array of products and technologies, understands how to apply technologies to solve big data problems, and has the ability to build large-scale data processing systems for the enterprise. Data engineers develop, maintain, test, and evaluate big data solutions within organizations, and provide input to the architects on the hardware and software that is needed.
Big Data Engineers focus on collecting, parsing, managing, and analyzing large data sets in order to provide the right data sets and visualization tools for analysis by the data scientists. They understand the complexity of data and can handle different data variety (structured, semi-structured, unstructured), volume, velocity (including stream processing), and veracity. They also handle the data governance and security challenges associated with the data, and they have a good background in software engineering along with extensive programming and scripting experience.
To earn the IBM Certified Big Data Engineer credential, candidates must pass one exam. To gain additional knowledge and skills, and to prepare for the exam based on the job roles and exam objectives, click the link to the exam below and refer to the Test Preparation tab.
Recommended Skills
Understand the data layer and the specific areas of potential challenge/risk in the data layer
Ability to translate functional requirements into technical specifications.
Ability to take a general solution/logical architecture and provide a physical architecture.
Understand Cluster Management
Understand Network Requirements
Understand Important interfaces
Understand Data Modeling
Ability to identify/support non-functional requirements for the solution
Understand Latency
Understand Scalability
Understand High Availability
Understand Data Replication and Synchronization
Understand Disaster Recovery
Understand overall performance (Query Performance, Workload Management, Database Tuning)
Propose recommended and/or best practices regarding the movement, manipulation, and storage of data in a big data solution, including but not limited to:
Understand data ingestion technical options
Understand data storage options and ramifications (for example, understand the additional requirements and challenges introduced by data in the cloud)
Understand data querying techniques and availability to support analytics
Understand data lineage and data governance
Understand data variety (social, machine data) and data volume
Understand/implement and provide guidance around data security to support implementation, including but not limited to:
Understand LDAP Security
Understand User Roles/Security
Understand Data Monitoring
Understand Personally Identifiable Information (PII) Data Security concerns
Software areas of central focus:
BigInsights
BigSQL
Hadoop
Cloudant (NoSQL)
Software areas of peripheral focus:
Information Server
Integration with BigInsights, Balanced Optimization for Hadoop, JAQL pushdown capability, etc.
Data Governance
Security features of BigInsights
Information Server (MetaData Workbench for Lineage)
Optim Integration with BigInsights (archival)
DataClick for BigInsights (future: DataClick for Cloudant – to pull operational data into Hadoop for analytics – scripts available today)
BigMatch (trying to get to a single view)
Guardium (monitoring)
Analytic Tools (SPSS)
BigSheets
Support in Hadoop/BigInsights
Data Availability and Querying Support
Streams
Interface/Integration with BigInsights
Streaming Data Concepts
In-memory analytics
Netezza
DB2 BLU
Graph Databases
Machine Learning (System ML)
Requirements
Exam C2090-101: IBM Big Data Engineer
Each exam:
1. contains questions requiring single and multiple answers. For multiple-answer questions, you need to select all required options to get the answer correct. You will be advised how many options make up the correct answer.
2. is designed to provide diagnostic feedback on the Examination Score Report, correlating back to the exam objectives, informing the test taker how he or she did on each section of the exam. To maintain the integrity of each exam, questions and answers are not distributed.
Exam Objectives
Please note that this exam has been withdrawn.
This exam consists of 5 sections containing a total of 53 multiple-choice questions. The percentages after each section title reflect the approximate distribution of the total question set across the sections.
Number of questions: 53
Number of questions to pass: 34
Time allowed: 75 minutes
Status: Withdrawn
Section 1: Data Loading (34%)
Load unstructured data into InfoSphere BigInsights
Import streaming data into Hadoop using InfoSphere Streams
Create a BigSheets workbook
Import data into Hadoop and create Big SQL table definitions
Import data to HBase (see the sketch after this list)
Import data to Hive
Use Data Click to load from relational sources into InfoSphere BigInsights with a self-service process
Extract data from a relational source using Sqoop
Load log data into Hadoop using Flume
Insert data via the IBM General Parallel File System (GPFS) POSIX file system API
Load data with the Hadoop command line utility
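As an illustration of the HBase loading task above, here is a minimal Python sketch using the happybase client over HBase's Thrift gateway. The host name, table name, column family, and row layout are all hypothetical, and the table is assumed to already exist; BigInsights ships its own tooling for this, so treat it as a generic sketch only.

```python
# Minimal sketch: writing rows into an existing HBase table from Python.
# Assumes an HBase Thrift server is reachable on the edge node; the host,
# table name ('sensor_events'), and column family ('cf') are hypothetical.
import happybase

connection = happybase.Connection('edge-node.example.com', port=9090)
table = connection.table('sensor_events')

# HBase stores raw bytes, so row keys and values are encoded explicitly.
table.put(b'device42|2015-06-01T12:00:00', {
    b'cf:temperature': b'21.4',
    b'cf:status': b'OK',
})

# Read the row back to confirm the write landed.
row = table.row(b'device42|2015-06-01T12:00:00')
print(row[b'cf:temperature'])
connection.close()
```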
Section 2: Data Security (8%)
Keep data secure within PCI standards
Use masking (e.g., Optim, Big SQL) and redaction to protect sensitive data (see the sketch after this list)
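As a generic illustration of masking, the Python sketch below replaces PII field values with truncated one-way hashes. Optim and Big SQL provide managed masking and redaction on the platform; the field names, salt, and record layout here are invented for the example.

```python
# Minimal sketch of field-level masking for PII in records that arrive as
# Python dicts. The PII field list and salt are hypothetical; managed tools
# such as Optim offer format-preserving masking instead of plain hashing.
import hashlib

PII_FIELDS = {'ssn', 'email', 'phone'}

def mask_record(record):
    """Replace PII field values with a salted, truncated SHA-256 digest."""
    masked = dict(record)
    for field in PII_FIELDS & masked.keys():
        digest = hashlib.sha256(b'demo-salt' + str(masked[field]).encode()).hexdigest()
        masked[field] = digest[:16]  # truncated digest stands in for the value
    return masked

print(mask_record({'name': 'Ann', 'ssn': '123-45-6789'}))
```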
Section 3: Architecture and Integration (17%)
Implement MapReduce (see the sketch after this list)
Evaluate use cases for selecting Hive, Big SQL, or HBase
Create and/or query a Solr index
Evaluate use cases for selecting potential file formats (e.g., JSON, CSV, Parquet, Sequence, etc.)
Utilize Apache Hue for search visualization
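For the MapReduce objective, here is a minimal word-count sketch written for Hadoop Streaming, which lets plain Python scripts act as mapper and reducer. The input/output paths and the jar location in the comment are illustrative.

```python
# Minimal word-count sketch for Hadoop Streaming. Illustrative invocation
# (paths and jar location vary by distribution):
#
#   hadoop jar hadoop-streaming.jar \
#     -files wordcount.py \
#     -input /data/text -output /data/counts \
#     -mapper "python wordcount.py mapper" \
#     -reducer "python wordcount.py reducer"
import sys

def mapper():
    # Emit one tab-separated (word, 1) pair per word.
    for line in sys.stdin:
        for word in line.split():
            print(f"{word}\t1")

def reducer():
    # Streaming sorts mapper output by key, so equal words arrive together.
    current, count = None, 0
    for line in sys.stdin:
        word, n = line.rsplit("\t", 1)
        if word != current:
            if current is not None:
                print(f"{current}\t{count}")
            current, count = word, 0
        count += int(n)
    if current is not None:
        print(f"{current}\t{count}")

if __name__ == "__main__":
    mapper() if sys.argv[1:] == ["mapper"] else reducer()
```

The same pipeline can be exercised locally without a cluster: cat input.txt | python wordcount.py mapper | sort | python wordcount.py reducer.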
Section 4: Performance and Scalability (15%)
Use Resilient Distributed Datasets (RDDs) to improve MapReduce performance (see the sketch after this list)
Choose file formats to optimize the performance of Big SQL, JAQL, etc.
Make specific performance tuning decisions for Hive and HBase
Analyze performance considerations when using Apache Spark
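To illustrate the RDD objective, a minimal PySpark sketch that parses an input once, caches it in executor memory, and reuses it for two aggregations, avoiding the repeated disk passes a chain of MapReduce jobs would make. The HDFS path and field layout are hypothetical.

```python
# Minimal PySpark sketch: cache a parsed RDD so several aggregations reuse
# one in-memory dataset instead of re-reading HDFS for each pass.
from pyspark import SparkContext

sc = SparkContext(appName="rdd-performance-sketch")

# Parse once; cache() keeps the result in executor memory for reuse.
events = (sc.textFile("hdfs:///data/clickstream")     # hypothetical path
            .map(lambda line: line.split(","))        # [user, page, latency_ms]
            .cache())

hits_per_page = events.map(lambda f: (f[1], 1)).reduceByKey(lambda a, b: a + b)
worst_latency = events.map(lambda f: (f[0], int(f[2]))).reduceByKey(max)

print(hits_per_page.take(5))
print(worst_latency.take(5))
sc.stop()
```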
Section 5: Data Preparation, Transformation, and Export (26%)
Use Jaql query methods to transform data in InfoSphere BigInsights
Capture and prepare social data for analytics
Integrate SPSS model scoring in InfoSphere Streams
Implement entity resolution within a Big Data platform (e.g., Big Match)
Utilize Pig for data transformation and data manipulation
Use Big SQL to transform data in InfoSphere BigInsights
Export processing results out of Hadoop (e.g., DataClick, DataStage, etc.; see the sketch after this list)
Utilize consistent regions in InfoSphere Streams to ensure at-least-once processing
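As a shape-of-the-work illustration for the transform-and-export objectives, here is a minimal PySpark sketch standing in for the managed paths that DataClick or DataStage provide. The Hive table and column names are hypothetical: filter, derive, aggregate, then write the results out for downstream consumers.

```python
# Minimal transform-and-export sketch. Assumes a Hive table 'raw_orders'
# with status, quantity, unit_price, and order_date columns (hypothetical).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("prep-and-export")
         .enableHiveSupport()
         .getOrCreate())

# Transform: drop incomplete rows, derive revenue, aggregate by day.
daily = (spark.table("raw_orders")
              .where(F.col("status") == "COMPLETE")
              .withColumn("revenue", F.col("quantity") * F.col("unit_price"))
              .groupBy("order_date")
              .agg(F.sum("revenue").alias("total_revenue")))

# Export: write a single CSV that downstream consumers can pick up.
daily.coalesce(1).write.mode("overwrite").csv("hdfs:///export/daily_revenue", header=True)
spark.stop()
```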