Job Summary
We are seeking a skilled Data Engineer to join our dynamic team. The ideal candidate will play a crucial role in designing, building, and maintaining scalable data pipelines that process real-time data feeds. This position is vital for ensuring data quality and supporting data-driven decision-making across the organization. If you are passionate about data engineering and have a strong background in building data pipelines, we want to hear from you!
Location: We are seeking candidates from Brazil, Argentina, Peru, Chile, and Colombia.
Job Responsibilities
- Designing and Building Data Pipelines: Create robust and scalable data pipelines that efficiently process real-time data feeds, ensuring high availability and performance.
- Data Quality Assurance: Implement data validation and quality checks to ensure the integrity and accuracy of data throughout the pipeline.
- Collaboration with Analytics Teams: Work closely with analytics teams to understand their data needs and provide them with the necessary data infrastructure to support their analyses and reporting.
- Utilizing Cloud Technologies: Leverage cloud storage solutions, particularly Amazon S3, to store and manage large volumes of structured and unstructured data.
- Distributed Computing: Utilize Amazon EMR (Elastic MapReduce) for distributed computing on big datasets, ensuring efficient processing and analysis.
- Documentation and Best Practices: Maintain comprehensive documentation of data pipelines and processes, and adhere to best practices in data governance and engineering.
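As a minimal illustration of the data-quality responsibility above, here is a pure-Python sketch of a record-level validation step. The field names and rules (`event_id`, `timestamp`, `user_id`) are hypothetical; a real pipeline would derive them from its schema.

```python
from typing import Any

# Hypothetical required fields for an event record in the pipeline.
REQUIRED_FIELDS = {"event_id", "timestamp", "user_id"}

def validate_record(record: dict[str, Any]) -> list[str]:
    """Return a list of data-quality violations for a single record."""
    errors = []
    missing = REQUIRED_FIELDS - record.keys()
    if missing:
        errors.append(f"missing fields: {sorted(missing)}")
    if "timestamp" in record and not isinstance(record["timestamp"], (int, float)):
        errors.append("timestamp must be numeric (epoch seconds)")
    return errors

def split_valid_invalid(records):
    """Partition a batch into clean records and (record, reasons) rejects."""
    valid, invalid = [], []
    for rec in records:
        errs = validate_record(rec)
        if errs:
            invalid.append((rec, errs))
        else:
            valid.append(rec)
    return valid, invalid
```

In practice the rejects would be routed to a dead-letter location for inspection rather than silently dropped.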
Basic Qualifications
Must-Have Skills
- Python: Proficiency in the Python programming language for scripting within ETL processes. Experience writing efficient, reusable code is essential.
- SQL: Strong expertise in SQL for querying and manipulating data. Ability to write complex queries and optimize them for performance.
- Spark: Experience with Apache Spark for processing big datasets efficiently. Familiarity with Spark’s data processing capabilities and optimization techniques.
- AWS S3: Familiarity with Amazon S3 for storing large volumes of structured and unstructured data. Understanding of S3 bucket policies and data lifecycle management.
- EMR: Experience working with Amazon EMR (Elastic MapReduce) for distributed computing on big datasets. Knowledge of configuring and managing EMR clusters.
- Data Pipeline Development: Proven experience in building end-to-end scalable data pipelines, particularly for real-time data feeds.
- Dimensions and Calculated Fields: Working knowledge of how dimensions and calculated fields operate, particularly in the context of data analytics and reporting.
- Databricks: Experience with Databricks for collaborative data engineering and analytics, including the use of notebooks for data processing.
- Adobe Analytics: Experience with Adobe Analytics, including working knowledge of its dimensions and calculated fields.
- Apache Kafka: Familiarity with real-time data processing frameworks such as Apache Kafka or Amazon Kinesis for streaming data applications.
- Data Governance: Understanding of data governance principles and best practices to ensure compliance and data integrity.
- DevOps Practices: Familiarity with DevOps practices for CI/CD in a data engineering context, including tools like Jenkins, Docker, and Kubernetes.
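To ground the SQL expectation above, a small self-contained sketch using Python's built-in `sqlite3` module. The table, columns, and data are invented for illustration; the point is the filter-aggregate-sort pattern and the index on the filtered column.

```python
import sqlite3

# In-memory database standing in for a warehouse table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id TEXT, event_type TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [("u1", "purchase", 10.0), ("u1", "purchase", 5.0), ("u2", "view", 0.0)],
)
# An index on the filter column lets the planner avoid a full table scan.
conn.execute("CREATE INDEX idx_events_type ON events(event_type)")

# Aggregate purchases per user, largest spenders first.
rows = conn.execute(
    """
    SELECT user_id, SUM(amount) AS total
    FROM events
    WHERE event_type = 'purchase'
    GROUP BY user_id
    ORDER BY total DESC
    """
).fetchall()
```

The same query shape carries over directly to Spark SQL or a warehouse engine; only the connection layer changes.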
Preferred Qualifications
- Experience Level: 3 to 6 years of experience in Data Engineering or a related field.
- Data Warehousing: Knowledge of data warehousing concepts and methodologies, including ETL processes and data modeling.
- Real-time Data Processing: Experience with real-time data processing frameworks and technologies, enhancing the ability to handle streaming data efficiently.
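As a toy illustration of the streaming idea in the qualifications above, a tumbling-window count over timestamped events in pure Python. The window size and event shape are assumptions; in production these events would be consumed from Kafka or Kinesis rather than an in-memory list.

```python
from collections import defaultdict

def tumbling_window_counts(events, window_seconds=60):
    """Count events per fixed, non-overlapping time window.

    `events` is an iterable of (epoch_seconds, payload) tuples.
    Each event is assigned to the window containing its timestamp.
    """
    counts = defaultdict(int)
    for ts, _payload in events:
        window_start = int(ts // window_seconds) * window_seconds
        counts[window_start] += 1
    return dict(counts)
```

Frameworks like Spark Structured Streaming provide the same windowing semantics with fault tolerance and late-data handling built in.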
Target Start Date: ASAP
Engagement Length: 6 to 11 months
Time Zone: PST – at least 5-6 working hours of overlap required.
Country Restrictions: Only candidates based in Argentina, Brazil, Chile, Colombia, and Peru are eligible.