
Lead DevOps Engineer

DataDirect Networks
Full-time
Remote
India

Overview

This is an incredible opportunity to be part of a company that has been at the forefront of AI and high-performance data storage innovation for over two decades. DataDirect Networks (DDN) is a global market leader renowned for powering many of the world's most demanding AI data centers, in industries ranging from life sciences and healthcare to financial services, autonomous cars, Government, academia, research and manufacturing.

 

"DDN's A3I solutions are transforming the landscape of AI infrastructure." – IDC 

 

“The real differentiator is DDN. I never hesitate to recommend DDN. DDN is the de facto name for AI Storage in high performance environments” - Marc Hamilton, VP, Solutions Architecture & Engineering | NVIDIA 

 

DDN is the global leader in AI and multi-cloud data management at scale. Our cutting-edge data intelligence platform is designed to accelerate AI workloads, enabling organizations to extract maximum value from their data. With a proven track record of performance, reliability, and scalability, DDN empowers businesses to tackle the most challenging AI and data-intensive workloads with confidence. 

 

Our success is driven by our unwavering commitment to innovation, customer-centricity, and a team of passionate professionals who bring their expertise and dedication to every project. This is a chance to make a significant impact at a company that is shaping the future of AI and data management. 

 

Our commitment to innovation, customer success, and market leadership makes this an exciting and rewarding role for a driven professional looking to make a lasting impact in the world of AI and data storage. 

 

Job Description

As the Lustre and EXA Engineering DevOps team, we are passionate about innovation and automation. We are looking for a Sr. DevOps Engineer with demonstrated experience in improving and scaling infrastructure as well as automating workflows for build, test and deployment stages. We prioritize reliability, scalability, visibility and security to create an ecosystem that empowers our engineering teams to build and deliver faster. Your work will directly support the development of storage solutions that are making an impact across many industries and AI applications.

 

Responsibilities for this role include but are not limited to:

  • Maintain and support infrastructure consisting of bare-metal servers and VMs, ensuring seamless functionality and performance.
  • Maintain computer networks including switches, VPNs, routers and other physical hardware.
  • Manage and optimize a suite of tools and applications for build, artifact hosting, testing, and reporting.
  • Automate configuration and provisioning of development, build, and test infrastructure.
  • Build and maintain CI/CD pipelines, streamlining delivery in multiple environments from build through deployment.
  • Develop solutions for log analysis and reporting.
  • Create and deploy RPM and DEB packages, enabling consistent and efficient software distribution.
  • Automate the provisioning of bare-metal servers, VMs, and containers for testing.
  • Troubleshoot and resolve complex issues with infrastructure, builds, pipelines, and deployments, focusing on stability and throughput.
  • Develop custom command-line tools to simplify infrastructure management and empower engineering teams.
  • Respond promptly to engineering requests and resolve time-sensitive issues as they arise.

 

Required Skills & Experience:

  • Bachelor's degree in CS or a related technical field with a minimum of 7 years of relevant industry experience.
  • Deep proficiency with Linux systems and command-line tools.
  • Strong scripting and automation skills with Bash, Python, Ruby.
  • Programming experience with Go or Rust (Rust preferred).
  • Experience with build automation (Make, CMake) and dependency management.
  • Proficiency in creating RPM and DEB package specifications.
  • Strong understanding of version control best practices (Git).
  • Experience with developing and maintaining CI pipelines with GitHub Actions.
  • Experience with PXE booting and tools such as Cobbler and Foreman.
  • Experience with Infrastructure as Code (IaC) tools such as Terraform or Pulumi.
  • Experience with configuration management tools (Chef or Ansible preferred).
  • Experience with artifact repository management tools (Artifactory, Nexus).
  • Experience with monitoring tools such as Zabbix or Prometheus.
  • Experience with log and data analysis and reporting tools such as Splunk, ELK stack, Grafana, etc.
  • Security-focused mindset, with a deep understanding of infrastructure, package management, and reporting security best practices.
  • Understanding of Agile methodologies and the unique considerations for DevOps teams within Agile frameworks.
  • Strong communication skills with the ability to convey technical information clearly and concisely to a variety of stakeholders.

Experience with the Lustre filesystem and InfiniBand networking is a plus.

DDN

Join our dynamic and driven team, where engineering excellence is at the heart of everything we do. We seek individuals who love to challenge themselves and are fueled by curiosity. Here, you'll have the opportunity to work across various areas of the company, thanks to our flat organizational structure that encourages hands-on involvement and direct contributions to our mission. Leadership is earned by those who take initiative and consistently deliver outstanding results, both in their work ethic and deliverables, making strong prioritization skills essential. Additionally, we value strong communication skills in all our engineers and researchers, as they are crucial for the success of our teams and the company as a whole.

 

Interview Process: After submitting your application, one of our recruiters will review your resume. If your application passes this stage, you will be invited to a 30-minute interview during which a member of our team will ask some basic questions. If you clear the interview, you will enter the main process, which can consist of up to four interviews in total:

 

  • Coding assessment: Often in a language of your choice.
  • Systems design: Translate high-level requirements into a scalable, fault-tolerant service (depending on role).
  • Real-time problem-solving: Demonstrate practical skills in a live problem-solving session.
  • Meet and greet with the wider team.

Our goal is to finish the main process in 2-3 weeks at most.

 

DataDirect Networks (DDN) is an Equal Opportunity/Affirmative Action employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, gender, gender identity, gender expression, transgender, sex stereotyping, sexual orientation, national origin, disability, protected Veteran Status, or any other characteristic protected by applicable federal, state, or local law.