Job Summary
We are seeking a skilled Data Engineer to join our dynamic team. The ideal candidate will play a crucial role in designing, building, and maintaining scalable data pipelines that process real-time data feeds. This position is vital for ensuring data quality and supporting data-driven decision-making across the organization. If you are passionate about data engineering and have a strong background in building data pipelines, we want to hear from you!
Location: We are seeking candidates from Brazil, Argentina, Peru, Chile, and Colombia.
Job Responsibilities
- Designing and Building Data Pipelines: Create robust and scalable data pipelines that efficiently process real-time data feeds, ensuring high availability and performance.
- Data Quality Assurance: Implement data validation and quality checks to ensure the integrity and accuracy of data throughout the pipeline.
- Collaboration with Analytics Teams: Work closely with analytics teams to understand their data needs and provide them with the necessary data infrastructure to support their analyses and reporting.
- Utilizing Cloud Technologies: Leverage cloud storage solutions, particularly Amazon S3, to store and manage large volumes of structured and unstructured data.
- Distributed Computing: Utilize Amazon EMR (Elastic MapReduce) for distributed computing on big datasets, ensuring efficient processing and analysis.
- Documentation and Best Practices: Maintain comprehensive documentation of data pipelines and processes, and adhere to best practices in data governance and engineering.
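As a minimal illustration of the data-quality responsibility above, here is a pure-Python sketch of a record-level validation step. The field names and rules (`event_id`, `timestamp`, `user_id`) are hypothetical; a real pipeline would derive them from its schema.

```python
from typing import Any

# Hypothetical required fields for an event record in the pipeline.
REQUIRED_FIELDS = {"event_id", "timestamp", "user_id"}

def validate_record(record: dict[str, Any]) -> list[str]:
    """Return a list of data-quality violations for a single record."""
    errors = []
    missing = REQUIRED_FIELDS - record.keys()
    if missing:
        errors.append(f"missing fields: {sorted(missing)}")
    if "timestamp" in record and not isinstance(record["timestamp"], (int, float)):
        errors.append("timestamp must be numeric (epoch seconds)")
    return errors

def split_valid_invalid(records):
    """Partition a batch into clean records and (record, reasons) rejects."""
    valid, invalid = [], []
    for rec in records:
        errs = validate_record(rec)
        if errs:
            invalid.append((rec, errs))
        else:
            valid.append(rec)
    return valid, invalid
```

In practice the rejects would be routed to a dead-letter location for inspection rather than silently dropped.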
Basic Qualifications
Must-Have Skills
- Python: Proficiency in the Python programming language for scripting within ETL processes. Experience writing efficient, reusable code is essential.
- SQL: Strong expertise in SQL for querying and manipulating data. Ability to write complex queries and optimize them for performance.
- Spark: Experience with Apache Spark for processing big datasets efficiently. Familiarity with Spark’s data processing capabilities and optimization techniques.
- AWS S3: Familiarity with Amazon S3 for storing large volumes of structured and unstructured data. Understanding of S3 bucket policies and data lifecycle management.
- EMR: Experience working with Amazon EMR (Elastic MapReduce) for distributed computing on big datasets. Knowledge of configuring and managing EMR clusters.
- Data Pipeline Development: Proven experience in building end-to-end scalable data pipelines, particularly for real-time data feeds.
- Dimensions and Calculated Fields: Working knowledge of how dimensions and calculated fields operate, particularly in the context of data analytics and reporting.
- Databricks: Experience with Databricks for collaborative data engineering and analytics, including the use of notebooks for data processing.
- Adobe Analytics: Experience with Adobe Analytics, including working knowledge of its dimensions and calculated fields.
- Apache Kafka: Familiarity with real-time data processing frameworks such as Apache Kafka or Amazon Kinesis for streaming data applications.
- Data Governance: Understanding of data governance principles and best practices to ensure compliance and data integrity.
- DevOps Practices: Familiarity with DevOps practices for CI/CD in a data engineering context, including tools like Jenkins, Docker, and Kubernetes.
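To ground the SQL expectation above, a small self-contained sketch using Python's built-in `sqlite3` module. The table, columns, and data are invented for illustration; the point is the filter-aggregate-sort pattern and the index on the filtered column.

```python
import sqlite3

# In-memory database standing in for a warehouse table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id TEXT, event_type TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [("u1", "purchase", 10.0), ("u1", "purchase", 5.0), ("u2", "view", 0.0)],
)
# An index on the filter column lets the planner avoid a full table scan.
conn.execute("CREATE INDEX idx_events_type ON events(event_type)")

# Aggregate purchases per user, largest spenders first.
rows = conn.execute(
    """
    SELECT user_id, SUM(amount) AS total
    FROM events
    WHERE event_type = 'purchase'
    GROUP BY user_id
    ORDER BY total DESC
    """
).fetchall()
```

The same query shape carries over directly to Spark SQL or a warehouse engine; only the connection layer changes.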
Preferred Qualifications
- Experience Level: 3 to 6 years of experience in Data Engineering or a related field.
- Data Warehousing: Knowledge of data warehousing concepts and methodologies, including ETL processes and data modeling.
- Real-time Data Processing: Experience with real-time data processing frameworks and technologies, enhancing the ability to handle streaming data efficiently.
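As a toy illustration of the streaming idea in the qualifications above, a tumbling-window count over timestamped events in pure Python. The window size and event shape are assumptions; in production these events would be consumed from Kafka or Kinesis rather than an in-memory list.

```python
from collections import defaultdict

def tumbling_window_counts(events, window_seconds=60):
    """Count events per fixed, non-overlapping time window.

    `events` is an iterable of (epoch_seconds, payload) tuples.
    Each event is assigned to the window containing its timestamp.
    """
    counts = defaultdict(int)
    for ts, _payload in events:
        window_start = int(ts // window_seconds) * window_seconds
        counts[window_start] += 1
    return dict(counts)
```

Frameworks like Spark Structured Streaming provide the same windowing semantics with fault tolerance and late-data handling built in.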
Target Start Date: ASAP
Engagement Length: 6 to 11 months
Time Zone: PST – at least 5-6 working hours of overlap required.
Country Restrictions: Only candidates based in Argentina, Brazil, Chile, Colombia, and Peru are eligible.