Data Engineering Community (DEC)
@data_dec
Followers: 2K · Following: 332 · Media: 345 · Statuses: 1K
A non-profit organisation providing Data Engineers a supportive and collaborative platform.
Nigeria
Joined April 2024
Data Engineering Community is a vibrant hub for anyone passionate about data engineering, from aspiring data engineers to seasoned professionals. We provide a supportive and collaborative platform for learning, A thread 🧵 #DataEngineering
Spark brings multiple data paradigms (batch, stream, graph, and SQL) into one scalable ecosystem. #dataengineering #apachespark #graphx #sql #batchprocessing #streamingprocessing
- Spark SQL: Queries structured data using SQL syntax or DataFrames. The Catalyst Optimiser ensures fast query execution.
- GraphX: Enables distributed graph computation for use cases like social network analysis or route optimisation.
processing when data doesn't change rapidly.
- Streaming Processing: This handles real-time data from IoT sensors, clicks, and financial transactions. Spark Structured Streaming treats live data as an unbounded table that continuously grows.
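The "unbounded table" idea can be sketched in plain Python. This is a toy model only, not Spark's API: the function name `run_streaming_count`, the event shape, and the sample batches are all hypothetical. Each micro-batch appends rows to an ever-growing table, and the running result is updated incrementally rather than recomputed from scratch.

```python
from collections import Counter


def run_streaming_count(micro_batches):
    """Toy model of a streaming aggregation over an unbounded table.

    Each micro-batch is treated as new rows appended to the table; the
    running counts are updated in place instead of being recomputed.
    """
    running_counts = Counter()  # the continuously updated query result
    snapshots = []              # the result as seen after each micro-batch
    for batch in micro_batches:
        running_counts.update(event["type"] for event in batch)
        snapshots.append(dict(running_counts))
    return snapshots


# Hypothetical events arriving in three micro-batches.
batches = [
    [{"type": "click"}, {"type": "click"}],
    [{"type": "payment"}],
    [{"type": "click"}, {"type": "sensor"}],
]

snapshots = run_streaming_count(batches)
for i, snapshot in enumerate(snapshots, start=1):
    print(f"after batch {i}: {snapshot}")
```

The same query logic (count by type) would work on a static dataset; only the way rows arrive differs, which is the point of treating a stream as a growing table.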
💡 What are the capabilities of Apache Spark?
Apache Spark isn't a single-purpose tool. It's a unified platform that can handle different types of data workloads:
- Batch Processing: Process large, static datasets (e.g., daily logs, transaction histories). Use batch
troubleshoot jobs, and design better data pipelines.
- The cluster manager is the kitchen supervisor (assigns resources).
- The executors are the line cooks (prepare the food).
- The DAG is the recipe, the optimized plan of steps to follow.
Note: Understanding this architecture helps you tune performance,
- DAG (Directed Acyclic Graph): Spark doesn't just run your code line by line; it builds a logical plan (DAG), optimizes it, and executes it efficiently.
Simple Analogy: Imagine a busy restaurant:
- The driver is the head chef (decides what's cooked).
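A rough way to picture "build a plan first, run it later" is the plain-Python toy below. It is not Spark's API: `LazyDataset`, `explain`, and `collect` here are hypothetical stand-ins, the plan is a simple linear chain rather than a full DAG, and no optimization step is modelled. It only shows that transformations are recorded and nothing executes until an action asks for results.

```python
class LazyDataset:
    """Toy stand-in for a lazily evaluated dataset (not Spark's API)."""

    def __init__(self, source, plan=()):
        self._source = source  # the input rows
        self._plan = plan      # recorded steps: the "recipe"

    def map(self, fn):
        # Transformation: record the step, do no work yet.
        return LazyDataset(self._source, self._plan + (("map", fn),))

    def filter(self, fn):
        # Transformation: record the step, do no work yet.
        return LazyDataset(self._source, self._plan + (("filter", fn),))

    def explain(self):
        # Show the recorded plan as a list of step names.
        return [kind for kind, _ in self._plan]

    def collect(self):
        # Action: only now is the recorded plan executed.
        rows = iter(self._source)
        for kind, fn in self._plan:
            rows = map(fn, rows) if kind == "map" else filter(fn, rows)
        return list(rows)


calls = []  # records every time the map function actually runs


def double(x):
    calls.append(x)
    return x * 2


ds = LazyDataset(range(5)).map(double).filter(lambda x: x > 4)
print(ds.explain())  # ['map', 'filter']
print(calls)         # [] because nothing has executed yet
print(ds.collect())  # [6, 8]
```

Because the whole plan is known before any work starts, a real engine has the chance to reorder or combine steps; this sketch skips that part.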
managers are YARN, Kubernetes, or Spark Standalone.
- Executors: These are worker processes, running on the cluster's worker nodes, that actually perform computations and store intermediate data in memory.
- Tasks and Jobs: Spark breaks your operations into stages and tasks that run in parallel across executors.
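The "split the work into tasks, run them side by side, combine the results" pattern can be sketched with Python's standard library. This is an illustration under loud assumptions: the thread pool merely plays the role of executors, `split_into_partitions`, `task`, and `run_job` are hypothetical names, and Python threads will not give real CPU parallelism for this workload.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(rows, num_partitions):
    """Deal rows round-robin into num_partitions chunks."""
    return [rows[i::num_partitions] for i in range(num_partitions)]


def task(partition):
    """One task: compute a partial sum over a single partition."""
    return sum(partition)


def run_job(rows, num_partitions=4):
    partitions = split_into_partitions(rows, num_partitions)
    # The thread pool stands in for the executors; one task per partition.
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partial_sums = list(pool.map(task, partitions))
    # The "driver" combines the partial results into the final answer.
    return sum(partial_sums)


total = run_job(list(range(1, 101)))
print(total)  # 5050
```

Each task only ever sees its own partition, which is why the work can be spread across many workers without them coordinating until the final combine step.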
kicks into action:
- Driver Program: This is where your code starts. It creates the SparkSession and defines the transformations and actions. Think of it as the brain that plans and coordinates your work.
- Cluster Manager: Allocates resources and manages executors. Commo
💡 Understanding Spark's Architecture
To use Spark effectively, you need to understand its architecture: how it runs your jobs behind the scenes. When you submit a Spark job (e.g., via spark-submit or a PySpark script), Spark's architecture
In essence, Spark is the backbone of modern data engineering. It's what powers engines, fraud detection, real-time dashboards, and ETL pipelines at companies like Netflix, Uber, and Amazon. Note: If you can't process data fast enough, you can't react fast enough.
- Scalability: Handles workloads across thousands of nodes.
- Versatility: Supports multiple languages like Python, SQL, Java, Scala, and R.
- Flexibility: Works with both batch (historical) and streaming (real-time) data.
for small to medium datasets, but they fall short when dealing with gigabytes or terabytes of data. That's why Apache Spark was born. It's an open-source distributed computing engine designed for:
- Speed: Performs in-memory computation, reducing disk I/O.
💡 Why Does Apache Spark Really Exist?
We live in a world where data is generated faster than ever: transactions, IoT signals, social media clicks, sensor data, and system logs. Traditional tools like Excel, SQL, and pandas are powerful
What started as a DataFestHackathon is now published in the proceedings of the AI/Robotics Conference! Happy to see my name proudly listed alongside my co-authors @josh_bori @Mkm_world @josh_salako And even more exciting, our work has been selected for an oral presentation at
Core Data Engineers at the Data Engineering Community (DEC) Meetup! In September, members of the Core Data Engineers family, past and current participants, instructors, and team members showed up and represented at the @data_dec Meetup!
Over the weekend, I had the privilege of speaking at an event hosted by @IbomData on "People, Pipeline, and Possibilities." The topic was to remind everyone that even with the rapid growth of AI, the human factor in Data & Analytics Engineering remains irreplaceable.