Data Engineer
Company: Soothsayer Analytics
Employment Type: Full Time
Location: Hyderabad
Experience: 4–6 Years
Soothsayer Analytics is a global AI & Data Science consultancy headquartered in Detroit, with a thriving delivery center in Hyderabad. We design and deploy end-to-end custom Machine Learning & GenAI solutions—spanning predictive analytics, optimization, NLP, and enterprise-scale AI platforms—that help leading enterprises forecast, automate, and gain a competitive edge.
As a Data Engineer, you will build the foundation that powers these AI systems—scalable, secure, and high-performance data pipelines.
Job Overview
We are seeking a mid-level Data Engineer with 4–6 years of hands-on experience designing, building, and optimizing data pipelines. You will work closely with AI/ML teams to ensure data availability, quality, and performance for analytics and GenAI use cases.
Key Responsibilities
1. Data Pipeline Development
- Build and maintain scalable ETL/ELT pipelines for structured and unstructured data.
- Ingest data from diverse sources such as APIs, streaming platforms, and batch systems (see the sketch below).
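To make this responsibility concrete, here is a minimal batch ETL sketch in Python. The endpoint, field names (`order_id`, `order_ts`), and output path are hypothetical placeholders; a production pipeline would add pagination, retries, incremental loads, and schema validation.

```python
"""Minimal batch ETL sketch: pull JSON from a REST API and land it as Parquet.
The endpoint and field names are illustrative placeholders, not a real service."""
import requests
import pandas as pd

API_URL = "https://example.com/api/v1/orders"  # hypothetical endpoint

def extract(url: str) -> list[dict]:
    # Pull one page of records; a real pipeline would paginate and retry.
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    return resp.json()

def transform(records: list[dict]) -> pd.DataFrame:
    df = pd.DataFrame(records)
    df = df.dropna(subset=["order_id"])                        # drop rows missing the key
    df["order_ts"] = pd.to_datetime(df["order_ts"], utc=True)  # normalize timestamps
    return df

def load(df: pd.DataFrame, path: str) -> None:
    # Parquet keeps types and compresses well; a warehouse COPY would follow.
    df.to_parquet(path, index=False)

if __name__ == "__main__":
    load(transform(extract(API_URL)), "orders.parquet")
```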
2. Data Modeling & Warehousing
- Design efficient data models to support analytics and AI workloads.
- Develop and optimize data warehouses and lakes on Redshift, BigQuery, Snowflake, or Delta Lake (a minimal modeling sketch follows below).
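As a rough illustration of dimensional modeling for analytics workloads, the sketch below builds a tiny star schema: one fact table keyed to two dimensions. It uses sqlite3 only to stay self-contained; in practice the same DDL pattern would be written in the Snowflake, Redshift, or BigQuery dialect, and all table and column names here are invented.

```python
"""Star-schema sketch: one fact table joined to conformed dimensions.
sqlite3 keeps the example self-contained and runnable."""
import sqlite3

DDL = """
CREATE TABLE dim_customer (
    customer_key INTEGER PRIMARY KEY,
    customer_id  TEXT NOT NULL,      -- natural key from the source system
    region       TEXT
);
CREATE TABLE dim_date (
    date_key  INTEGER PRIMARY KEY,   -- e.g. 20240131
    full_date TEXT NOT NULL
);
CREATE TABLE fact_sales (
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    date_key     INTEGER REFERENCES dim_date(date_key),
    quantity     INTEGER,
    revenue      REAL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
print("tables:", [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table'")])
```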
3. Big Data & Streaming
- Work with distributed systems such as Apache Spark, Kafka, or Flink for real-time and large-scale data processing (see the streaming sketch after this list).
- Manage feature stores for machine learning pipelines.
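The following sketch shows the streaming pattern named above: Spark Structured Streaming reading a Kafka topic and counting events per one-minute window. The broker address and topic name are placeholders, and running it requires the spark-sql-kafka connector on the classpath.

```python
"""Structured Streaming sketch: consume a Kafka topic with Spark and aggregate."""
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("events-stream").getOrCreate()

raw = (spark.readStream.format("kafka")
       .option("kafka.bootstrap.servers", "localhost:9092")  # placeholder broker
       .option("subscribe", "events")                        # placeholder topic
       .load())

# Kafka delivers raw bytes; cast the payload and count events per 1-minute window.
counts = (raw.selectExpr("CAST(value AS STRING) AS value", "timestamp")
          .groupBy(F.window("timestamp", "1 minute"))
          .count())

query = (counts.writeStream
         .outputMode("complete")
         .format("console")        # a real job would write to Delta or a warehouse
         .start())
query.awaitTermination()
```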
4. Collaboration & Best Practices
- Work closely with Data Scientists and ML Engineers to ensure high-quality training data.
- Implement data quality checks, observability, and governance frameworks (a lightweight example follows below).
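Below is a lightweight example of the kind of data quality checks meant here, written with plain pandas rather than any specific governance tool; the column names and sample data are illustrative.

```python
"""Data-quality check sketch: fail fast before data ships downstream."""
import pandas as pd

def check_quality(df: pd.DataFrame) -> list[str]:
    failures = []
    # Completeness: the business key must never be null.
    if df["order_id"].isna().any():
        failures.append("order_id contains nulls")
    # Uniqueness: exactly one row per order.
    if df["order_id"].duplicated().any():
        failures.append("order_id contains duplicates")
    # Validity: revenue must be non-negative.
    if (df["revenue"] < 0).any():
        failures.append("revenue contains negative values")
    return failures

if __name__ == "__main__":
    sample = pd.DataFrame({"order_id": [1, 2, 2], "revenue": [10.0, -5.0, 3.0]})
    problems = check_quality(sample)
    # In a pipeline, any failure would block the load and alert the on-call engineer.
    print(problems or "all checks passed")
```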
Required Skills & Qualifications
- Education: Bachelor’s or Master’s in Computer Science, Data Engineering, or related field.
- Experience: 4–6 years in data engineering with expertise in:
- Programming: Python, Scala, or Java (Python preferred)
- Big Data & Processing: Apache Spark, Kafka, Hadoop
- Databases: SQL and NoSQL (Postgres, MongoDB, Cassandra)
- Data Warehousing: Snowflake, Redshift, BigQuery, or similar
- Orchestration: Airflow, Luigi, or similar (see the DAG sketch after this list)
- Cloud Platforms: AWS, Azure, or GCP (data services)
- Version Control & CI/CD: Git, Jenkins, GitHub Actions
- MLOps / GenAI Pipelines: Feature engineering, embeddings, vector databases
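For orchestration, here is a minimal sketch of a three-task daily pipeline, assuming Airflow 2.x; the DAG id, schedule, and task bodies are placeholders.

```python
"""Orchestration sketch: a three-task daily ETL DAG in Airflow 2.x."""
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    print("pull from source")      # placeholder task body

def transform():
    print("clean and model")

def load():
    print("load to warehouse")

with DAG(
    dag_id="daily_sales_etl",      # hypothetical pipeline name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",             # Airflow 2.4+; use schedule_interval on older versions
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    t_extract >> t_transform >> t_load  # linear dependency chain
```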
Skills Matrix
| Skill | Details | Last Used (MM/YYYY) | Experience (Months) | Self-Rating (0–10) |
|---|---|---|---|---|
| Python | | | | |
| SQL / NoSQL | | | | |
| Apache Spark | | | | |
| Kafka | | | | |
| Data Warehousing (Snowflake, Redshift, etc.) | | | | |
| Orchestration (Airflow, Luigi, etc.) | | | | |
| Cloud (AWS / Azure / GCP) | | | | |
| Data Quality / Governance Tools | | | | |
| MLOps / LLMOps | | | | |
| GenAI Integration | | | | |
Instructions for Candidates
- Provide a detailed resume highlighting end-to-end data engineering projects.
- Complete the skills matrix above with accurate last-used dates, experience durations, and self-ratings.