Profiling in Python: How to Find Performance Bottlenecks
Do you want to optimize the performance of your Python program to make it run faster or consume less memory? Before diving into any performance tuning, you should strongly consider using a technique called software profiling. It may help you answer whether optimizing the code is necessary and, if so, which parts of the code you should focus on.
Sometimes, the return on investment in performance optimizations just isn’t worth the effort. If you only run your code once or twice, or if it takes longer to improve the code than execute it, then what’s the point?
When it comes to improving the quality of your code, you’ll probably optimize for performance as a final step, if you do it at all. Often, your code will become speedier and more memory efficient thanks to other changes that you make. When in doubt, go through this short checklist to figure out whether to work on performance:
- Testing: Have you tested your code to prove that it works as expected and without errors?
- Refactoring: Does your code need some cleanup to become more maintainable and Pythonic?
- Profiling: Have you identified the most inefficient parts of your code?
Only when all the above items check out should you consider optimizing for performance. It’s usually more important that your code runs correctly according to the business requirements and that other team members can understand it than that it’s the most efficient solution.
The actual time-saver might be elsewhere. For example, having the ability to quickly extend your code with new features before your competitors will make a real impact. That’s especially true when the performance bottleneck lies not in the underlying code’s execution time but in network communication. Making Python run faster won’t win you anything in that case, but it’ll likely increase the code’s complexity.
Finally, your code will often become faster as a result of fixing the bugs and refactoring. One of the creators of Erlang once said:
> Make it work, then make it beautiful, then if you really, really have to, make it fast. 90 percent of the time, if you make it beautiful, it will already be fast. So really, just make it beautiful!
>
> — Joe Armstrong
As a rule of thumb, anytime you’re considering optimization, you should profile your code first to identify which bottlenecks to address. Otherwise, you may find yourself chasing the wrong rabbit. Because of the Pareto principle or the 80/20 rule, which applies to a surprisingly wide range of areas in life, optimizing just 20 percent of your code will often yield 80 percent of the benefits!
But without having factual data from a profiler tool, you won’t know for sure which parts of the code are worth improving. It’s too easy to make false assumptions.
So, what’s software profiling, and how do you profile programs written in Python?
How to Find Performance Bottlenecks in Your Python Code Through Profiling
Software profiling is the process of collecting and analyzing various metrics of a running program to identify performance bottlenecks known as hot spots. These hot spots can happen due to a number of reasons, including excessive memory use, inefficient CPU utilization, or a suboptimal data layout, which will result in frequent cache misses that increase latency.
Note: A performance profiler is a valuable tool for identifying hot spots in existing code, but it won’t tell you how to write efficient code from the start.
It’s often the choice of the underlying algorithm or data structure that can make the biggest difference. Even when you throw the most advanced hardware available on the market at some computational problem, an algorithm with a poor time or space complexity may never finish in a reasonable time.
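To see how much the choice of data structure alone can matter, consider membership testing in a list versus a set. This is a hypothetical micro-benchmark, not from the article itself; the container sizes and variable names are made up for illustration:

```python
import timeit

haystack_list = list(range(100_000))
haystack_set = set(haystack_list)
needle = 99_999  # Worst case for the list: the element sits at the very end

# List membership scans elements one by one: O(n) per lookup
list_time = timeit.timeit(lambda: needle in haystack_list, number=100)

# Set membership uses hashing: O(1) on average per lookup
set_time = timeit.timeit(lambda: needle in haystack_set, number=100)

print(f"list: {list_time:.4f}s, set: {set_time:.6f}s")
```

On any reasonable machine, the set lookups finish orders of magnitude faster, even though both containers hold the same elements. No profiler can make the list-based version competitive; only changing the data structure can.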
When profiling, it’s important that you perform dynamic analysis by executing your code and collecting real-world data rather than relying on static code review. Because dynamic analysis often entails running a slow piece of software over and over again, you should start by feeding small amounts of input data to your algorithm if possible. This will limit the amount of time that you spend waiting for results on each iteration.
Once you have your code running, you can use one of the many Python profilers available. There are many kinds of profilers out there, which can make your head spin. Ultimately, you should know how to pick the right tool for the job. Over the next few sections, you’ll get a quick tour of the most popular Python profiling tools and concepts:
- Timers like the `time` and `timeit` standard library modules, or the `codetiming` third-party package
- Deterministic profilers like `profile`, `cProfile`, and `line_profiler`
- Statistical profilers like Pyinstrument and the Linux `perf` profiler
Fasten your seatbelt because you’re about to get a crash course in Python’s performance profiling!
`time`: Measure the Execution Time

In Python, the most basic form of profiling involves measuring the code execution time by calling one of the timer functions from the `time` module:
```python
>>> import time

>>> def sleeper():
...     time.sleep(1.75)
...

>>> def spinlock():
...     for _ in range(100_000_000):
...         pass
...

>>> for function in sleeper, spinlock:
...     t1 = time.perf_counter(), time.process_time()
...     function()
...     t2 = time.perf_counter(), time.process_time()
...     print(f"{function.__name__}()")
...     print(f" Real time: {t2[0] - t1[0]:.2f} seconds")
...     print(f" CPU time: {t2[1] - t1[1]:.2f} seconds")
...     print()
...
sleeper()
 Real time: 1.75 seconds
 CPU time: 0.00 seconds

spinlock()
 Real time: 1.77 seconds
 CPU time: 1.77 seconds
```
You first define two test functions, `sleeper()` and `spinlock()`. The first function asks your operating system’s task scheduler to suspend the current thread of execution for about 1.75 seconds. During this time, the function remains dormant without occupying your computer’s CPU, allowing other threads or programs to run. In contrast, the second function performs a form of busy waiting by wasting CPU cycles without doing any useful work.

Later, you call both of your test functions. Before and after each invocation, you check the current time with `time.perf_counter()` to obtain the elapsed real time, or wall-clock time, and `time.process_time()` to get the CPU time. These will tell you how long your functions took to execute and how much of that time they spent on the processor. If a function waits for another thread or an I/O operation to finish, then it won’t use any CPU time.
Note: The performance of a computer program is typically limited by the available processing power, memory amount, input and output operations, and program latency.
If a given task predominantly does a lot of computation, then the processor’s speed will determine how long it’ll take to finish. Such a task is CPU-bound. You can sometimes run such tasks in parallel on multiple CPU cores simultaneously to reduce the overall computation time.
On the other hand, an I/O-bound task spends most of its time waiting for data to arrive from a disk, a database, or a network. Such tasks can benefit from using faster I/O channels or running them concurrently as well as asynchronously.
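The CPU-bound case can be sketched with the standard library's `concurrent.futures` module. The workload function and the input sizes here are illustrative assumptions, and the actual speedup depends on your number of CPU cores:

```python
import math
import time
from concurrent.futures import ProcessPoolExecutor

def cpu_bound(n):
    # Pure computation with no I/O: the CPU is the only bottleneck
    return sum(math.sqrt(i) for i in range(n))

if __name__ == "__main__":
    # The __main__ guard is required so worker processes can safely
    # re-import this module on platforms that spawn rather than fork.
    numbers = [2_000_000] * 4

    t1 = time.perf_counter()
    sequential_results = [cpu_bound(n) for n in numbers]
    sequential = time.perf_counter() - t1

    t1 = time.perf_counter()
    with ProcessPoolExecutor() as executor:
        parallel_results = list(executor.map(cpu_bound, numbers))
    parallel = time.perf_counter() - t1

    print(f"Sequential: {sequential:.2f}s, parallel: {parallel:.2f}s")
```

Because each `cpu_bound()` call runs in a separate process, the tasks can occupy multiple CPU cores at once. The same approach would buy you nothing for an I/O-bound task, where threads or `asyncio` are usually the better fit.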
Read the full article at https://realpython.com/python-profiling/ »