Embracing the Future: The Impact and Innovation of Cloud High-Performance Computing in Engineering

The world of engineering is rapidly transforming, a change prominently driven by advances in Cloud High-Performance Computing (HPC). In a recent episode of “The Engineering Manager” podcast, host Steven Tedjamulia spoke with Jing Xie, Sr. Director of Cloud Go-To-Market and Partnerships at MemVerge, in a discussion that not only explored these changes but also provided a blueprint for engineering leaders navigating this new landscape.

What is Cloud High-Performance Computing (HPC)?

Cloud HPC is an advanced computing model that leverages cloud technology to process complex and data-intensive tasks. It offers a scalable, flexible, and cost-effective alternative to traditional on-premises HPC solutions. Cloud HPC is characterized by its ability to provide vast computational resources on demand, significantly reducing the time and capital investment associated with large-scale computing tasks.

The Evolution of Engineering through Cloud HPC

Jing Xie’s discussion with Steven Tedjamulia underscored how Cloud HPC is reshaping the engineering sector. This transformation is evident in several ways:

  1. Enhanced Computational Power: Cloud HPC brings unprecedented computational power to engineers, allowing for faster processing of complex simulations and data analysis, which is crucial in fields like genetic research, AI, and machine learning.

  2. Accessibility and Flexibility: It democratizes access to high-performance computing resources. Engineering teams, regardless of their size, can access state-of-the-art computational resources, enabling small firms and startups to compete with larger organizations.

  3. Cost-Effectiveness: With Cloud HPC, organizations can optimize costs by paying only for the computing resources they use. This is particularly beneficial for project-based work or research where computing needs fluctuate (a brief pricing sketch follows below).
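
To make the pay-for-what-you-use point concrete, here is a minimal sketch, not taken from the podcast, of how a team might check current EC2 spot prices with the AWS SDK for Python (boto3) before scheduling flexible workloads; the region, instance type, and time window are illustrative assumptions.

```python
# Minimal sketch (illustrative assumptions): query recent EC2 spot prices with boto3
# so flexible HPC jobs can be scheduled when capacity is cheap.
from datetime import datetime, timedelta, timezone

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # placeholder region

response = ec2.describe_spot_price_history(
    InstanceTypes=["c5.4xlarge"],                    # placeholder instance type
    ProductDescriptions=["Linux/UNIX"],
    StartTime=datetime.now(timezone.utc) - timedelta(hours=1),
    MaxResults=10,
)

for record in response["SpotPriceHistory"]:
    print(record["AvailabilityZone"], record["InstanceType"], record["SpotPrice"])
```

Comparing these figures against the published on-demand rate gives a quick estimate of the kind of savings Jing describes later in the episode.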

Real-World Applications and Case Studies

During the podcast, Jing Xie highlighted several case studies, particularly in university research settings, where Cloud HPC has been instrumental. For instance, research groups that previously relied on a limited number of on-premises servers can now access hundreds of thousands of cloud servers. This scalability is vital for projects requiring massive data analysis, such as brain research or regenerative tissue studies.

The Journey towards Cloud HPC Adoption

For engineering managers considering a move to Cloud HPC, the transition process is pivotal. Jing outlined a straightforward approach:

  1. Initial Setup: Signing up for a cloud service provider like AWS, which offers comprehensive support and resources.
  2. Partnering with Cloud HPC Providers: Companies like MemVerge offer tools and services to facilitate the integration of Cloud HPC into existing workflows.
  3. Customized Solutions: Solutions tailored to specific needs, such as virtual private cloud network architecture or data lake setup, are available through collaboration with cloud service providers and partners (a minimal VPC sketch follows below).
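
As one concrete illustration of step 3, the sketch below shows how a basic virtual private cloud and subnet might be created with boto3. It is a minimal, assumption-laden starting point with placeholder CIDR ranges and names, not the tailored architecture a cloud provider or partner would design with you.

```python
# Minimal sketch (placeholder CIDRs and names): create a small VPC and one subnet
# with boto3 as a starting point for a cloud HPC environment.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # placeholder region

vpc = ec2.create_vpc(CidrBlock="10.0.0.0/16")
vpc_id = vpc["Vpc"]["VpcId"]
ec2.create_tags(Resources=[vpc_id], Tags=[{"Key": "Name", "Value": "hpc-vpc"}])

subnet = ec2.create_subnet(VpcId=vpc_id, CidrBlock="10.0.1.0/24")
print("VPC:", vpc_id, "Subnet:", subnet["Subnet"]["SubnetId"])
```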

Challenges and Considerations

While Cloud HPC offers numerous benefits, there are challenges to consider:

  1. Security and Data Privacy: Ensuring data security in the cloud is paramount, especially for sensitive engineering data.
  2. Skill Gaps: Teams may require training to effectively utilize cloud HPC resources.
  3. Cost Management: Understanding and optimizing cloud resource usage to prevent unexpected expenses is crucial; one simple guardrail is sketched below.
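
On the cost-management point, a simple guardrail is a budget alert through the AWS Budgets API, so unexpected spend is flagged before it snowballs. The sketch below uses placeholder values for the account ID, limit, and email address and is not a full cost-governance setup.

```python
# Minimal sketch (placeholder values): create a monthly cost budget with an email
# alert once actual spend crosses 80% of the limit, via the AWS Budgets API.
import boto3

budgets = boto3.client("budgets", region_name="us-east-1")

budgets.create_budget(
    AccountId="123456789012",  # placeholder account ID
    Budget={
        "BudgetName": "hpc-monthly-budget",
        "BudgetLimit": {"Amount": "5000", "Unit": "USD"},
        "TimeUnit": "MONTHLY",
        "BudgetType": "COST",
    },
    NotificationsWithSubscribers=[
        {
            "Notification": {
                "NotificationType": "ACTUAL",
                "ComparisonOperator": "GREATER_THAN",
                "Threshold": 80.0,
                "ThresholdType": "PERCENTAGE",
            },
            "Subscribers": [{"SubscriptionType": "EMAIL", "Address": "team@example.com"}],
        }
    ],
)
```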

Conclusion

The podcast episode with Jing Xie serves as a comprehensive guide for engineering leaders looking to leverage Cloud HPC. It goes beyond a mere introduction to the technology, offering actionable insights, real-world applications, and practical steps for adoption. Cloud HPC stands as a transformative force in engineering, promising innovation, efficiency, and a competitive edge in a rapidly evolving technological landscape.

Join The Engineering Manager Community: Are you looking to stay ahead in the dynamic world of engineering management and connect with like-minded professionals? Join the Engineering Manager community on LinkedIn here. Our community is a hub for the latest best practices, insightful articles, and engaging discussions tailored for engineering managers. It’s an ideal platform to expand your network, share experiences, and learn from industry leaders like Jing Xie.

How OpenTeams Can Help: If you’re seeking to hire expert engineering managers akin to Jing Xie, visit OpenTeams.com. Here, you’ll find a diverse pool of talented professionals ready to bring their expertise to your organization. Join us today to enhance your knowledge and elevate your team to new heights!

Podcast Full Transcript

Steven Tedjamulia (00:01.211)
Welcome to another exciting episode of the Engineering Manager podcast. We had an episode with Reuven Lerner yesterday. It was really good. And now we’re looking forward to a new one, where we’re going to do a deep dive into the world of engineering, leadership, and innovation. I’m your host, Steven Tedjamulia. And today we have a fascinating discussion lined up to explore the future of engineering

that was mentioned recently in one of our articles. But before jumping into the podcast, a quick reminder to subscribe, rate, and review our podcast on your favorite platform, and don’t forget to connect with us in the Engineering Manager community. Today we have a distinguished guest with a unique professional background, someone I admire in his field for doing

high-level innovation and leading an amazing product that we’ll discuss in our podcast today. He’s gonna share his insights on cloud high-performance computing and how it’s poised to reshape the engineering landscape. Please join me in welcoming Jing Xie, Sr. Director of Cloud Go-To-Market and Partnerships at MemVerge. Jing, it’s a pleasure to have you on the show.

Jing Xie (01:24.174)
Thank you, Steven. I’m thrilled to be here, but also scared that I’m following Reuven Lerner in this series. That’s some big shoes to fill. Ha ha ha.

Steven Tedjamulia (01:33.071)
Yeah, it was an amazing podcast yesterday, and today I’m super excited for this one as well. So I’m absolutely glad to have you here. Let’s get started. I asked Reuven the same question: can you tell us a little bit about your background and the journey that you’ve taken to get to MemVerge, the position you have, and the impact you’re making? The audience would love to know a little bit about this journey.

Jing Xie (02:00.866)
Sure, Steven. So I’d like to say that for me, it actually all started and it all comes back to video games. When I was a kid and well into my college years, I loved playing these two games. One was StarCraft and the second one was Counter-Strike. And we all know that if you’re a serious gamer, one of the key factors to high win rates is how good you are, your actual skill level. But also…

if you’ve played a fair share of games online, especially the multiplayer type, you know also that a fast computer and an even faster internet connection will get you quite a ways along in terms of, you know, being the king of the hill. And I’m gonna date myself when I talk about these games, but back then a T1 connection was nirvana: you were gonna be faster

and react faster than your competition. So fast forward from my childhood days to now: I eventually had to grow up, had to find a career, had to make money. And what really guided me in my career choices, and what ultimately led me to MemVerge, was actually the same kind of roles and aspects that I liked about playing those two games. And so I focus on

opportunities and jobs that require a high level of strategy, a competitive spirit, and a curious mind. And ultimately, I like to stay close to fast computers and connections. So that really hasn’t changed either. And yeah, I’m very excited to talk about high performance computing with your audience.

Steven Tedjamulia (03:50.287)
Yeah, I’m glad you brought it up too, because I think just this week, my kids, they love fighting over the fastest computer and which one they can use. So. Yeah, we have several and they’re like, no, that’s the one I want, that’s the one I want. So I totally understand. Now, jumping in a little bit, we’re talking about fast computers.

Jing Xie (04:02.192)
Every week there’s a faster one at Best Buy or on Amazon, right?

Steven Tedjamulia (04:18.139)
Maybe give the audience an overview of cloud high-performance computing. What is it, why is it becoming so important to organizations?

Jing Xie (04:28.278)
Sure, and maybe I’ll even start by highlighting that when people say high-performance computing, there are actually kind of two subterms: one is high-performance computing, the other one is high-throughput computing. And it might help to quickly describe the difference between each one. So the similarity is that HPC and HTC, they both tend to require a lot of compute

running simultaneously to help process either a lot of data or go through a series of very complex computations. The main difference, if we want to get really technical, is that HPC is typically trying to solve one or a few very complex computations by dividing that really messy, hairy big problem up and sending it to a bunch of computers, all

with the mission of solving a smaller subcomponent of that bigger problem. And HTC, high throughput, is essentially: we have lots of simple but still computationally intensive tasks, and they’re independent. They don’t require recombining the results to answer one big question. They’re just lots of separate

computational tasks that you’re just trying to complete very quickly. And so really this concept of HPC or HTC, it’s not different in the cloud versus running in, perhaps, your own lab or your own data center. But what I’ve noticed, especially over the last five years, is that AI, machine learning, and just core science,

things like genetic research, are really creating the need for more individuals to have more computers. And it’s not so easy to just build and set up your own high-performance computing environment or lab. And the cloud has really kind of served that role of: when people need the compute, how do you get it quickly and how do you use it effectively?

Jing Xie (06:51.066)
And I think going forward, just to make it easier for the audience, now that I’ve explained these two things, I will just refer to everything as HPC. Um, but if and when there are examples of one or the other specifically, I’ll make sure to highlight it.
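
[Sidebar: a minimal, illustrative Python sketch of the HPC/HTC distinction described above. The tasks and numbers are placeholders, not anything from the episode: the HTC half runs many independent tasks, while the HPC half splits one large computation into shards and recombines the partial results.]

```python
# Illustrative sketch only: HTC = many independent tasks; HPC = one large problem
# split into shards whose partial results are recombined into a single answer.
from concurrent.futures import ProcessPoolExecutor

def score_sample(sample_id: int) -> int:
    # stand-in for one independent, compute-heavy task (HTC style)
    return sum(i * i for i in range(sample_id * 1000))

def partial_sum(chunk: range) -> int:
    # stand-in for one shard of a single large computation (HPC style)
    return sum(i * i for i in chunk)

if __name__ == "__main__":
    with ProcessPoolExecutor() as pool:
        # HTC: independent tasks; collecting the outputs is all the "assembly" needed
        htc_results = list(pool.map(score_sample, range(8)))

        # HPC: one big problem (sum of squares below 1,000,000) split into four shards,
        # then recombined into a single result
        shards = [range(start, start + 250_000) for start in range(0, 1_000_000, 250_000)]
        hpc_result = sum(pool.map(partial_sum, shards))

    print(htc_results, hpc_result)
```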

Steven Tedjamulia (07:03.839)
Thanks for defining those and breaking them down. You mentioned going from on-premises to the cloud. For engineering managers who are looking to move to the cloud, or contemplating it, what are the advantages that the cloud offers with HPC slash HTC?

Jing Xie (07:22.434)
Sure. Yeah, sure, absolutely. There’s a ton of advantages. And I will also try to quickly hit on why it can become a double-edged sword sometimes. But to start with the advantages, elasticity and scalability are two of the biggest ones that come to mind. I run into customers and people I talk to all the time who complain

to me that they either don’t have enough compute in their own labs or in their own HPC clusters, or they have the wrong configurations of compute to do what they actually need to do in the current research project that they’re pursuing. And so if you think about the cloud, it just basically represents a much larger lab or HPC cluster that you can now access. And so when you need a ton of compute, you can get it.

And then the next day, if you don’t have any more projects, you don’t have to manage the infrastructure and you don’t have to worry about it. You literally can just move on to the next problem that you’re trying to solve. And so it just really represents an enormous amount of scale that is basically accessible now at your fingertips.

Some of the largest customers on AWS, they run millions of HPC jobs in a single day. And they still have enough computers left for everyone else, all the individuals like you and me. The next set of advantages that gets me really excited, and this is kind of an area that my company focuses on, is this concept of a pricing-model

advantage. So because you don’t own the servers, you can just pay for them when you need them. And you actually can pay for them at different price points. So if you need a server and you don’t want the cloud provider to ever take it away till you’re done with it, you can use what’s typically called on-demand or pay-as-you-go pricing. It’s going to be more expensive. But again, you have the benefit that tomorrow, when you don’t need it, you don’t have to pay anything.

Jing Xie (09:44.906)
And then in the cloud, for that exact same machine, for that exact same server, you also have a spot market price. And this is what gets me really excited, because I feel like not a lot of people know about this pricing mechanism. And you can get the same machine in the spot market, especially on a quieter day, for 70, 80, 90% less than the pay-as-you-go, on-demand price for that machine.

So to me, this also represents pricing power and pricing arbitrage opportunities. And if you have different research projects that are not as time sensitive, you can try to run these during less busy times in the day, and you can run these things on spot compute to essentially optimize how much you’re spending to do this research. So that’s really, really exciting. And that’s one of the things that I spend some of my time helping customers to look at:

strategies and tools and solutions for being able to use this lower-priced tier of compute in the cloud. And third, you know, the other key difference: if you’ve got your own servers, that means you have to take care of them. I like to use the term babysitting servers. I don’t know anybody that actually likes doing that. And cloud compute is a different paradigm in the sense that you’re trusting Google, AWS, Azure,

Steven Tedjamulia (11:02.891)
Thank you.

Jing Xie (11:13.502)
other cloud providers, you’re trusting them to maintain and care for those servers and also keep them at a high uptime for you. But it also means that, again, you don’t have to invest the same level of resources to access a lot of compute when you need it. And you also don’t have to worry about all the accessories that come with having your own data center.

Things like, okay, which storage do I use? Which storage software solutions do I use? What’s my networking setup? Do I have enough racks? Do I have enough power cooling in the room? And there’s just all sorts of headaches that you run into when you’re not in the cloud and you have to maintain your own server environment.

One of the stories I like to share is that at a previous startup, it was actually so burdensome to run our own servers. We didn’t have that many, we had about, I would say, 25 to 30 servers. It was so burdensome that I basically lost one field sales engineer at any given time, because he was running back and forth from the office, or from a customer meeting, back to our data center to fix something that was off and that one of our developers or engineers clued us into.

And I almost fell off my chair when I saw the quote for a new Palo Alto firewall subscription that I had to renew just to keep that data center up and running and secure.
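
[Sidebar: a minimal sketch of the spot-pricing idea described above, requesting the same machine type at the spot market price with boto3. The AMI, instance type, and price cap are placeholder assumptions, and handling of spot interruptions is omitted.]

```python
# Illustrative sketch only: launch one EC2 instance at the spot market price
# instead of on-demand. All identifiers below are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # placeholder AMI
    InstanceType="c5.4xlarge",         # placeholder instance type
    MinCount=1,
    MaxCount=1,
    InstanceMarketOptions={
        "MarketType": "spot",
        "SpotOptions": {
            "MaxPrice": "0.40",        # placeholder cap, set below the on-demand rate
            "SpotInstanceType": "one-time",
        },
    },
)
print(response["Instances"][0]["InstanceId"])
```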

Steven Tedjamulia (12:40.287)
No, thanks for sharing that example and those three benefits with us. You shared an example of what you were doing and how it’s helped you. Do you have some examples of customers, maybe in a specific vertical that the engineering managers listening could relate to, that might give some more instances, especially on the spot side, that could highlight the benefits of your product?

Jing Xie (13:13.834)
Yeah, sure. So I’m working with a few university research groups who are interested in this idea that, wow, now I don’t have to wait, and I might be able to just run all my backlog of research in the cloud. And what I notice is that many of them have this kind of characteristic in common: they’re typically grant-funded research groups, so every year they can buy a few servers and add them to their cluster. So

most of these teams, they have maybe 100, 200 servers, and they share it across a team of post-docs and other research collaborators. We’re getting to the point where now, to study things like the brain in a more advanced way, to understand, you know, there are some projects going on with regenerative tissue, you’re needing to spin up

in the hundreds of thousands of servers just to complete the analysis of a single data set. And so if you think about, you know, 100 or 200 machines versus the need for hundreds of thousands of machines running, the benefits become pretty obvious if you can get that from the cloud. And so we’re working with a number of these groups

who themselves don’t have IT skills or cloud administration skills, but we’re working with them to implement a solution where they can largely run these research computing workloads the same way they do in that 100-server environment, but instead on thousands of servers in the cloud. So basically change as little as you can about their interface, let them submit jobs the way

they’re used to submitting them, with tools that they’re used to using, such as Slurm, you know, qsub, LSF, some of these traditional on-premises HPC scheduling tools, let them use those. And then our solution becomes essentially the interface to a cloud platform like AWS, and then we go out there, we help them find the best machines. We help them find machines that are available at a lower cost in the spot market, and we essentially run and manage those jobs until

Jing Xie (15:37.574)
they’re done, and then we give all the compute back to the cloud provider when those research computing workloads are completed. And so there’s a level of automation as well as scale that we’re bringing together to allow these research scientists to do much faster science than they would have otherwise, staying only on-prem.
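
[Sidebar: an illustrative sketch of the “submit the jobs you already have, at cloud scale” pattern, using an AWS Batch array job as a generic stand-in. This is not MemVerge’s Memory Machine Cloud tooling, and the queue and job-definition names are placeholders.]

```python
# Illustrative sketch only: submit a large array of independent analysis tasks to a
# managed cloud queue. Queue and job definition names are placeholders.
import boto3

batch = boto3.client("batch", region_name="us-east-1")

response = batch.submit_job(
    jobName="tissue-analysis-array",    # placeholder name
    jobQueue="research-spot-queue",     # placeholder queue backed by spot capacity
    jobDefinition="analysis-task:1",    # placeholder job definition
    arrayProperties={"size": 10000},    # 10,000 independent tasks, HTC style
)
print(response["jobId"])
```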

Steven Tedjamulia (16:00.827)
No, that’s great. And just to summarize, because you did illustrate the benefits and how to move: if someone says, look, Jing, I’m ready to go, what’s the process? Do I need to call you? How should I start this process to use the cloud and these capabilities?

Jing Xie (16:28.322)
Sure. So it’s really nice, because I think the experience for most people will be that they get help from many different places. So let’s say I want to start doing this and I want to start doing more of my research on AWS. You just go and you sign up for an AWS account, and it’s really simple, it’s really easy these days. You essentially get assigned a team

that supports you on the AWS side. And then if you want to quickly stand up this kind of HPC automation capability, so that you don’t really have to change a whole lot in terms of what you’re already doing on premises, you can call somebody like us here at MemVerge and we help you install a piece of software into your AWS environment. And that piece of software is called Memory Machine Cloud. And really all it…

All it is doing is taking orders from you. So if you like to boss people around, if you like to give orders, this software is really, really good at taking your instructions. And so for some of the same tools and some of the same data sets that you’re now trying to analyze on that smaller local HPC environment, you can move some of that data over and use our software to run those same tools that you’re using.

And the main difference is, once again, you get that much larger cloud scale, and you get the automation of us talking to the cloud for you. We make sure that you get the compute you need, and we make sure that the compute is given back when you’re done.

Steven Tedjamulia (18:10.847)
That’s perfect. And for them to get ahold of you and the team to find out more, what is the process? What’s the best way for them to get a demo, find out more, and get started?

Jing Xie (18:22.638)
Sure, so our website is mmcloud.io, that’s M as in Martha, M as in Mary, cloud.io. If you go there, you can sign up for an account. If you’re more introverted and you just wanna try this yourself, you can literally sign up for an account, you can access our docs, and you can essentially try this completely without talking to anybody further. But if you like talking to people, if you would like help,

there’s also a little “book a meeting” button on our site, and it’s super easy to set up time with either myself or someone from our team who can walk you through deploying and setting up this research computing environment in the cloud. And we partner really, really well, as I said before, with cloud platforms like AWS. And so we also kind of sometimes play the role of, okay, if you need somebody who is really, really

good at setting up a particular AWS virtual private cloud network architecture, we’ll go pull that person in for you. And if you need someone else who knows about how to set up a data lake, because you actually have a ton of data that you want to move into the cloud and you need help thinking it through, we’ll help pull in that data lake person for you. And so it’s really kind of a teamwork type of experience and you’ll get help from AWS and you’ll also get help from us.

Steven Tedjamulia (19:53.679)
We’re coming up on time here, but this was a perfect overview of HPC and how to get started. You gave some great case studies and laid out the benefits of moving to the cloud. I’m sure a lot of engineering managers have more questions, and they can feel free to reach out to Jing in our Engineering Manager community and also sign up on the website to learn more, download the software, and start using it. So it’s been a pleasure having you on the show, Jing.

Thank you very much for being here.

Jing Xie (20:25.154)
Thanks for having me.

Steven Tedjamulia (20:27.531)
And to the listeners, thank you for tuning in to another episode of the Engineering Manager podcast. Stay curious, stay innovative, and stay tuned to our next episode next week, where we’ll continue exploring the dynamics of the world of engineering leadership. So until next time, thank you.