Using Python to make unstable APIs reliable
Introduction
Sales Spirit is the company behind several publisher websites, such as Prijsvergelijken.nl, the largest telecom comparison website in the Netherlands. A complete and flawless product offering on our websites is important for our success. To achieve this, we work closely with many business partners. Keeping up with their different and changing product offerings is a constant struggle.
Automating this maintenance using partner APIs is an important aspect of our work at Sales Spirit. These APIs vary widely in reliability, performance, protocols and language, completeness, etc. At the same time we like to offer our clients reliable, well-performing and complete websites. It wasn’t until we incorporated Python into our workflow that we were fully up to this challenge. Python has helped us build an API platform that not only delivers output of great quality, but is also very readable and maintainable.
Building an API platform with Python
We developed a tailor-made API platform at Sales Spirit to deal with the APIs provided by our business partners. The platform has many use cases; one of them is our postcode check. Clients can fill in their postcode and house number on our broadband comparison page to check for broadband availability at their home address. Under the hood, we check the availability of broadband at several broadband providers using their APIs. In this use case performance and reliability are very important, since we are dealing with a live service. The APIs themselves do not always offer this.
We have ensured performance and reliability by organizing our API platform in a master-worker configuration. The workers run concurrently, each dealing with one API call. The workers are also sandboxed, so that the API platform remains stable should anything happen to a worker. Another advantage of having the workers sandboxed is that they can easily be killed. We can therefore put workers on a time bound, which helps to ensure that our platform delivers within a certain timespan. The master-worker configuration was implemented using Python threads. Compared with other thread solutions, Python threads are really easy to use. Many operations on Python's built-in types are effectively atomic in CPython, due to the Global Interpreter Lock (GIL). This has saved us a tremendous amount of time while ensuring thread safety for our workers. As a bonus, the infrastructure for creating Python threads is well organized, well documented and part of Python’s standard library.
To finish off our API platform, we had to come up with a solution that makes processing and collecting API results easy. For the processing part we ended up writing a library containing a set of data processing tools. Most tools are essentially abstractions over popular Python libraries, tailored entirely to our needs. For example, our tool for generating SQL queries from API results is based entirely on SQLAlchemy. Python really shines when it comes to writing powerful and easy-to-use abstractions of otherwise complicated operations.
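To make the master-worker configuration with a time bound more concrete, here is a minimal sketch using only the standard library. It is not our production code: the function name run_workers and the three provider functions are hypothetical, and the sketch assumes each API call can be wrapped in a plain callable. Note that the threading module has no call to kill a thread, so this sketch simply stops waiting once the deadline has passed and lets daemon threads finish on their own.

```python
import threading
import time


def run_workers(api_calls, timeout):
    """Run every API call in its own thread and wait at most `timeout` seconds.

    `api_calls` maps a (hypothetical) provider name to a callable.
    Returns only the results that arrived before the deadline.
    """
    results = {}
    lock = threading.Lock()

    def worker(name, call):
        try:
            value = call()
        except Exception as exc:
            # A failing partner API must not take the master down with it.
            value = exc
        with lock:
            results[name] = value

    threads = [
        threading.Thread(target=worker, args=(name, call), daemon=True)
        for name, call in api_calls.items()
    ]
    for thread in threads:
        thread.start()

    # One shared deadline for all workers, not `timeout` per worker.
    deadline = time.monotonic() + timeout
    for thread in threads:
        thread.join(max(0.0, deadline - time.monotonic()))

    with lock:
        return dict(results)


# Hypothetical stand-ins for partner APIs.
def fast_provider():
    return {"available": True}


def slow_provider():
    time.sleep(1.0)
    return {"available": False}


def broken_provider():
    raise RuntimeError("partner API returned an error")
```

Calling run_workers({"fast": fast_provider, "slow": slow_provider, "broken": broken_provider}, timeout=0.3) returns after roughly 0.3 seconds with an entry for the fast provider and the exception raised by the broken one; the slow provider is left out because it missed the deadline.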
We designed a special thread-safe output class for collecting API results. An instance of the class is passed to each worker. Workers can write results to this object without having to worry about how all the data eventually comes together. We used Python lists and dictionaries as the basic building blocks of the output class, so most of the heavy lifting for making it thread-safe came with Python out of the box.
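A thread-safe output object of this kind could look roughly like the sketch below. The class name Output and its methods write and snapshot are hypothetical and chosen for illustration only. The sketch also adds an explicit lock around the dictionary and lists, which is a conservative assumption rather than a description of our implementation.

```python
import threading


class Output:
    """Collects results from concurrent workers (hypothetical interface)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._results = {}  # source name -> list of records

    def write(self, source, record):
        """Called by a worker to store one record under its source name."""
        with self._lock:
            self._results.setdefault(source, []).append(record)

    def snapshot(self):
        """Called by the master; returns a copy that workers cannot modify."""
        with self._lock:
            return {source: list(records) for source, records in self._results.items()}
```

Each worker receives the same Output instance and only ever calls write; the master calls snapshot once it is ready to combine the results.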
The results written to the output class are shared with the master thread. It is the master’s job to bring the individual results together and to generate the final output. The master gets its instructions from a configuration file. The configuration file may contain simple instructions, such as outputting all results as a list. It is also possible to set up a repeat instruction for failed API requests. Instructions such as these may improve the quality of the output, but they come at the cost of performance. For live services such as the postcode check, this may not be the best way to improve output quality and completeness. Instead, we fill in the blanks using the output from workers that were successful. With our API platform, this advanced post-processing technique is set up with just a few lines in the configuration file.
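The two post-processing strategies mentioned above, repeating failed requests and filling in blanks from successful workers, can be illustrated with the following sketch. The function names call_with_repeat and fill_blanks are hypothetical, as is the interpretation of "filling in the blanks" as copying a missing field from another worker's record. The format of our configuration file is not shown here; values such as the repeat count would come from it.

```python
def call_with_repeat(call, repeat):
    """Try `call` once, plus up to `repeat` extra times after a failure.

    Returns the first successful result, or None if every attempt fails.
    Each extra attempt improves completeness but adds latency.
    """
    for _ in range(repeat + 1):
        try:
            return call()
        except Exception:
            continue
    return None


def fill_blanks(records):
    """Fill fields that are None with the first value another record has for them.

    `records` is a list of dictionaries, one per successful worker.
    The input records are not modified.
    """
    known = {}
    for record in records:
        for key, value in record.items():
            if value is not None:
                known.setdefault(key, value)
    return [
        {key: known.get(key) if value is None else value for key, value in record.items()}
        for record in records
    ]
```

For a live service, fill_blanks has the advantage that it needs no additional API calls, whereas call_with_repeat can multiply the waiting time for a slow or failing partner.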
Conclusion
A complete and flawless product offering on our websites is important for the success of Sales Spirit. The pages are often populated automatically with data from partner APIs. Ensuring the quality of websites that rely on these APIs is a real challenge.
We used Python to develop a tailor-made API platform that deals with the APIs provided by our business partners. The API platform ensures the quality of our products and is part of our success. Python has helped us build an API platform that not only delivers output of great quality, but is also very readable and maintainable. Python has saved us days of development time and will save us even more in the future thanks to the readability and maintainability of the code.