Role summary
As a Data Engineer, you will be employed by Ruah Tech Solutions and embedded into a next-generation decentralised impact investment platform, helping to design, build, and scale data infrastructure for an ESG- and impact-focused product with an automation-first, AI-assisted approach. You will build robust pipelines that ingest, transform, and activate raw ESG, transaction, blockchain, and IoT data, turning it into strategic insights that power transparency, decision-making, and ML/AI applications.
About the platform (client)
Our client is a decentralised impact investment firm that transforms overlooked real-world assets into shared prosperity for communities around the world. By securing and tokenising assets such as land, water, timber, agriculture, and other commodities, they create transparent, blockchain-enabled governance models that keep financial sovereignty in the hands of local stakeholders while opening access to retail, institutional, and community investors. Built on institutional-grade compliance and on-chain accountability, their platform uses a governance token to connect capital with high-impact projects across emerging markets, aligning long-term economic growth with measurable social and environmental outcomes.
About the engagement
You will operate as part of an Embedded Capability Team that moves in lockstep with the client’s internal rhythms, working as a true extension of their in-house team rather than a separate vendor track. The engagement is designed for long-term capability building, shared ownership of outcomes, and deep domain expertise in sustainability, impact, and financial data over time.
Key responsibilities
- Design, build, and maintain scalable data pipelines for ingesting raw ESG, transaction, blockchain, and IoT data from diverse sources including sensors, on-chain events, and off-chain APIs.
- Transform and enrich datasets to fuel analytics, reporting, ML/AI models, and dashboards that demonstrate platform impact and compliance.
- Implement data storage solutions (data lakes, warehouses, vector databases) optimised for performance, cost, and query efficiency in cloud environments supporting ML workloads.
- Collaborate with full-stack, Web3, and platform engineers to integrate data flows into product features, AI applications, dashboards, and governance tools.
- Apply automation-first practices including orchestrated ETL/ELT pipelines, infrastructure-as-code, and AI-assisted data quality monitoring to ensure reliability and velocity.
- Establish data security and governance frameworks (e.g., access controls, encryption, lineage tracking, compliance with standards like ISO 27001).
- Participate fully in the client’s agile cadences (planning, stand-ups, reviews, retrospectives) and internal rhythms, ensuring tight alignment and clear communication.
- Proactively optimise data systems for scale, cost, and performance while identifying opportunities to leverage AI/ML for anomaly detection, predictive insights, and ESG impact forecasting.
Required skills and experience
- Strong experience building production data pipelines with tools like Apache Airflow, dbt, Kafka, or similar orchestration frameworks.
- Proficiency in SQL and Python (or similar) for data transformation, with familiarity in cloud data platforms (e.g., AWS Redshift, Snowflake, BigQuery).
- Experience with data modelling, warehousing, and handling semi-structured data from sources like JSON, blockchain events, APIs, and IoT streams.
- Familiarity with cloud-native development, containerization, and CI/CD for data infrastructure.
- Practical experience with data security fundamentals (e.g., encryption at rest/transit, row/column-level security, auditing).
- Proven experience in product-oriented data teams, collaborating with engineers and stakeholders to deliver actionable insights.
- Typically 3–5+ years of relevant data engineering experience, or equivalent capability demonstrated through portfolio and prior roles.
Preferred skills
- Proficiency with ML/AI applications and infrastructure (e.g., feature stores, model serving, vector databases such as Pinecone, and MLOps tools such as MLflow).
- Experience processing blockchain/Web3 data (e.g., on-chain transactions, token metadata, DeFi protocols) and IoT sensor data for ESG monitoring.
- Hands-on experience with ESG, financial, or impact datasets, including metrics for social and environmental outcomes derived from real-world IoT instrumentation.
- Advanced automation and AI tooling (e.g., AI-driven data validation, automated schema evolution, MLOps pipelines).
- Knowledge of streaming architectures (e.g., Kafka, Flink), microservices data patterns, and security-by-design (e.g., SAST for data pipelines).
- Familiarity with AWS services (e.g., Glue, Lambda, S3, SageMaker) and infrastructure-as-code (e.g., Terraform).
- Comfort working across time zones and cultures, with clear written and verbal communication.
How we work
You will join a team that values curiosity, thoughtful craftsmanship, and mutual support, where engineers are encouraged to take ownership of both their own growth and the outcomes they deliver. The role suits someone who is energized by collaborative problem-solving, cares about the real-world impact of the products they build, and is willing to invest in long-term relationships with both Ruah Tech Solutions and the client’s product and engineering teams.
Qualifications
- Degree in Computer Science, Data Engineering, or a related field, or equivalent practical experience.
- Strong portfolio of data projects demonstrating pipeline ownership and impact delivery.
What we offer
- The opportunity to work on meaningful, impact-driven technology that connects finance, data, IoT, and communities.
- An embedded engagement model that allows you to build deep product and domain expertise while benefiting from Ruah Tech Solutions’ support structure and community of practitioners.
- A learning-oriented environment with space for experimentation, mentoring, and cross-functional collaboration across product, data, infrastructure, Web3, and AI/ML domains.
