Job Summary:
Microsoft is on a mission to empower every person and organization on the planet and is seeking passionate software engineers for its HPC/AI team. The role involves designing and building networking infrastructure for large-scale AI training in Azure Cloud environments, with a focus on performance, scalability, and reliability.
Responsibilities:
- Designs, develops, and optimizes networking solutions tailored for large-scale AI training infrastructure.
- Architects and implements high-performance, low-latency, and low-jitter communication frameworks for distributed systems.
- Benchmarks, analyzes, and enhances the scalability and reliability of networking systems to handle petabyte-scale data transfer.
- Debugs and resolves complex networking issues in large-scale, high-performance environments.
- Drives identification of dependencies and the development of design documents for a product, application, service, or platform.
- Creates, implements, optimizes, debugs, refactors, and reuses code to improve performance, maintainability, effectiveness, and return on investment (ROI).
- Acts as a Designated Responsible Individual (DRI). Follows the playbook, works on call to monitor system/product/service for degradation, downtime, or interruptions, alerts stakeholders about status, and initiates actions to restore system/product/service for simple and complex problems when appropriate.
- Proactively seeks new knowledge and adapts to new AI trends, technical solutions, and patterns that will improve the availability, reliability, efficiency, observability, and performance of products while also driving consistency in monitoring and operations at scale.
Qualifications:
Required:
- Bachelor's Degree in Computer Science or a related technical discipline with proven experience coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python, OR equivalent experience.
- Experience with networking and/or HPC.
- Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screening: Microsoft Cloud Background Check. This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.
Preferred:
- Bachelor's Degree in Computer Science or a related technical field AND 1+ year(s) of technical engineering experience coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python, OR Master's Degree in Computer Science or a related technical field with proven experience coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python, OR equivalent experience.
- 1+ year(s) of experience with networking and/or HPC.
Company:
Microsoft is a software corporation that develops, manufactures, licenses, supports, and sells a range of software products and services. Founded in 1975, the company is headquartered in Redmond, Washington, USA, has 10,001+ employees, and is a public company.