Job Summary
Ledgebrook is seeking a talented Data Engineer to join our innovative team. As a key player in our data management efforts, you will be responsible for designing and implementing robust data pipelines and ETL processes. Your expertise in Python and cloud-based solutions will be crucial as we migrate our existing data warehouse (MySQL warehouse 1.0) to Snowflake and enhance our data infrastructure. This role offers the opportunity to work in a dynamic environment where your contributions will directly impact our ability to deliver exceptional service in the Specialty insurance sector.
Job Responsibilities
- Design, develop, and maintain ETL processes to ensure efficient data flow and transformation.
- Migrate existing data from legacy systems (MySQL warehouse 1.0) to Snowflake, ensuring data integrity and quality throughout the process.
- Transform custom code in Power BI into dbt models for streamlined analytics and reporting.
- Collaborate with cross-functional teams to gather business requirements and translate them into scalable data solutions.
- Implement data governance and security best practices to protect sensitive information and ensure compliance with relevant regulations.
- Optimize data models and workflows for performance and scalability, ensuring that data is accessible and usable for analytics.
- Utilize tools like Apache Airflow for orchestration and automation of data pipelines, ensuring timely and reliable data delivery.
- Provide support for data-related issues and assist in troubleshooting production problems, ensuring minimal downtime and disruption.
- Stay updated with industry trends and emerging technologies to continuously improve our data engineering practices.
Basic Qualifications
- 5+ years of experience in data engineering, with a strong track record of designing and managing complex data systems.
- Proficiency in Python for data manipulation and ETL processes, with a solid understanding of libraries such as Pandas and NumPy.
- Good working knowledge of Power BI for data visualization and reporting.
- Experience with SQL and NoSQL databases and with cloud data platforms such as Snowflake, Amazon Redshift, and Databricks.
- Strong understanding of data modeling, ETL processes, and data warehousing concepts, with hands-on experience in building data pipelines.
- Familiarity with cloud computing solutions such as AWS, Azure, or Google Cloud Platform (GCP).
- Excellent communication skills and the ability to work collaboratively in a team environment.
- Proficiency in English at B2+ or C1 level.
Nice to Have
- Prior startup experience, demonstrating adaptability and a proactive approach to problem-solving.
- Experience with data transformation tools like dbt (data build tool) for managing the analytics workflow.
- Familiarity with data integration tools such as Fivetran and Hevo for seamless data ingestion from various sources.
- Experience with orchestration tools like Prefect and Apache Airflow for managing complex workflows.
- Understanding of data security regulations (e.g., GDPR, CCPA) and user data privacy standards.
- Familiarity with data visualization tools (e.g., Power BI, Tableau) and machine learning frameworks (e.g., TensorFlow, PyTorch) for advanced analytics.
Other Skills
- Strong analytical skills with the ability to interpret complex datasets.
- Proficiency in programming languages relevant to data engineering (Python, SQL).
- Knowledge of database management systems (DBMS) and experience with both relational and non-relational databases.
- Experience with cloud platforms (AWS, Azure, GCP) and their respective services related to data storage and processing.
- Familiarity with DevOps practices related to CI/CD pipelines for data applications.
Target Start Date: ASAP
Time zone: At least 6 hours of overlap with CST (the team is spread across several US time zones and Europe)
Country restrictions: Cuba, Venezuela.
Project length: Ongoing