Recently, I caused a pretty big production issue. It was bad. It all happened when I tried to harden our APIs – by disabling weak cipher suites in the TLS protocol. If you’re not sure what that means – or how it is done, stay tuned! In this post, I’ll explain what happened, why it’s important to harden your APIs, and how to do it properly.
Mmm, something looks weird here…
A few months ago, while investigating a bug in our iOS app, I noticed something weird: Each device I checked had no records in our logging system – meaning, it had not sent any logs for the past 14 days. Now, I know we at Soluto are really good developers – but no errors in the last 14 days? That’s pretty suspicious! So, I decided to run a query to show all the errors from our iOS app in the last 14 days and was amazed by the results:
Before we keep investigating this bug, let’s do a quick recap of how logging works at Soluto. We have an API that receives all the logs from our mobile apps (Android/iOS) and forwards them to our logging system. This allows us, for example, to easily change how and where we send logs without needing to release a new version of the mobile apps.
Back to the graph above. It’s clear that something bad happened on September 7th (notice the big orange circle – where are all the logs?), but what was it? Such a clear drop in the logs could indicate that the issue is related to the API. To make things even weirder – this issue only presented itself in iOS logs – Android logs kept going through as usual.
As I said, it seemed to me like an issue with the Logging API. And since I did publish a security fix to disable weak cipher suites on that very day, it was very likely related to that change.
Wait – what are cipher suites?
At a high level, TLS is the protocol behind HTTPS, and cipher suites are the building blocks of the connection. TLS (among other things) is responsible for encrypting the traffic between the client and the server. Now, as there are many encryption algorithms, the client and the server need to negotiate and choose which ones to use for this specific connection. The negotiation is done using cipher suites – each cipher suite describes the key exchange method, the encryption algorithm, the key length, and a few more factors. The technical details are a bit too complicated for this discussion, and if you want to learn more – you are more than welcome to read this.
Now, there are many cipher suites out there – and not all of them are strong. Some of them could be cracked in minutes. Let’s say an attacker is able to tamper with the cipher suite negotiation and force the client and server to use a weak cipher suite. The attacker could then crack the encryption and decrypt the traffic, even though both the client and the server think they are talking over a secure channel. The only way to protect against this is to disable weak cipher suites on the server side. After disabling them, even if an attacker is able to tamper with the negotiation, the server will refuse to use a weak cipher and abort the connection.
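To make the downgrade scenario concrete, here is a minimal toy model of the selection step. It is only a sketch: real TLS stacks do this inside the handshake, and the names `WEAK_MARKERS`, `is_weak` and `negotiate` are made up for illustration (the list of weak markers is deliberately not exhaustive).

```python
# Toy model of cipher suite selection. Not a real TLS implementation.
# WEAK_MARKERS is an illustrative, non-exhaustive assumption.
WEAK_MARKERS = ("RC4", "3DES", "_DES_", "NULL", "EXPORT", "MD5")


def is_weak(suite):
    """Treat a suite as weak if its name contains a known weak algorithm."""
    return any(marker in suite for marker in WEAK_MARKERS)


def negotiate(client_offer, server_enabled):
    """Return the first server-preferred suite the client also offers.

    Returns None when there is no overlap, which models an aborted handshake.
    """
    for suite in server_enabled:
        if suite in client_offer:
            return suite
    return None


all_server_suites = [
    "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256",
    "TLS_RSA_WITH_RC4_128_MD5",
]
# Hardening: keep only the suites that are not considered weak.
hardened_server = [s for s in all_server_suites if not is_weak(s)]

# An attacker strips the strong suites from the client's offer.
tampered_offer = ["TLS_RSA_WITH_RC4_128_MD5"]

print(negotiate(tampered_offer, all_server_suites))  # weak suite gets selected
print(negotiate(tampered_offer, hardened_server))    # None: connection aborted
```

With the weak suite still enabled, the tampered offer succeeds. Once it is removed on the server, there is nothing left to agree on and the connection fails instead of silently downgrading.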
Testing weak cipher suites
Before disabling weak cipher suites, as with any other feature, I want to have a relevant test case. The test is simple: get all the available cipher suites from the server, and fail the test if a weak cipher suite is found (read this OWASP guide on how to test it manually for more information). Luckily for us, we can use the Nmap tool for that. Nmap is a free security scanner that can scan a target for various security vulnerabilities, including weak cipher suites. Using Nmap is pretty straightforward:
nmap --script ssl-enum-ciphers -p 443 -Pn <host name>
Just replace <host name> with the host that you want to check. Nmap can produce an XML file with the results that is easy to process – you can use this script I wrote: it sets the exit code to 1 if Nmap reports any cipher suite with a grade lower than A. Setting the exit code allows us to easily integrate it into the CI/CD pipeline and fail the build if a weak cipher suite is found. In the future, this might be included in OWASP Glue. You can run the script easily using docker:
docker run -it --rm soluto/test-ssl-cipher-suites <your-host>
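If you are curious what such a check could look like, here is a rough sketch (not the actual script from the image above). It assumes the report was written with Nmap's `-oX` option and that each cipher appears as a `<table>` with `name` and `strength` elements, which is how I understand the `ssl-enum-ciphers` XML output; verify against your own report. The `SAMPLE` document and the function names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hand-written sample that mimics the assumed layout of an Nmap XML report.
SAMPLE = """
<nmaprun><host><ports><port protocol="tcp" portid="443">
<script id="ssl-enum-ciphers">
  <table key="TLSv1.2">
    <table key="ciphers">
      <table>
        <elem key="name">TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256</elem>
        <elem key="strength">A</elem>
      </table>
      <table>
        <elem key="name">TLS_RSA_WITH_RC4_128_SHA</elem>
        <elem key="strength">C</elem>
      </table>
    </table>
  </table>
</script>
</port></ports></host></nmaprun>
"""


def weak_suites(nmap_xml):
    """Return (name, grade) pairs for every cipher suite graded below A."""
    root = ET.fromstring(nmap_xml)
    found = []
    for script in root.iter("script"):
        if script.get("id") != "ssl-enum-ciphers":
            continue
        for table in script.iter("table"):
            # Only direct <elem> children describe this particular table.
            fields = {e.get("key"): (e.text or "").strip()
                      for e in table.findall("elem")}
            if "name" in fields and fields.get("strength", "A") != "A":
                found.append((fields["name"], fields["strength"]))
    return found


def exit_code(nmap_xml):
    """1 if any suite is graded below A, otherwise 0 (for use in CI)."""
    return 1 if weak_suites(nmap_xml) else 0


print(weak_suites(SAMPLE))
print(exit_code(SAMPLE))
```

In a pipeline you would read the real report from disk and pass the result of `exit_code` to `sys.exit`, so a weak suite fails the build.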
Time to disable weak ciphers on IIS
Ok, we have a failing test in our CI/CD pipeline that checks the cipher suites – let’s work on fixing it! The bad news – disabling weak ciphers on IIS is only possible by changing a Registry key – not so fun. The good news? There is a tool that makes it easy to define which ciphers you want to disable, and it does that for you – IISCrypto. IISCrypto can work either as a command line utility or with a UI. You can even create a template, by specifying which ciphers you want to disable, and saving it to a file. Then, you can use the command line utility to apply the template to the host by running:
IISCryptoCli.exe /template soluto.ictpl
We host many of our APIs on the Azure Cloud Services platform. Cloud Services is a PaaS solution, which allows you to (relatively) easily deploy your code. To install additional software on the server running your code, you can use a Startup Task. A Startup Task is basically a batch script that you deploy with your code. This script then runs on the server during the provisioning process. We can bundle IISCrypto with our dedicated template into a startup task, and voila – no more weak TLS cipher suites. Now, after publishing the new code to production, the test from the previous section will pass.
The next step was to roll out this startup task to all our APIs (microservices can be a challenge sometimes). One of the first APIs I changed was the Logging API – the one I described at the beginning. This is the API that’s responsible for shipping the logs from our mobile apps. All the tests were green, and I felt pretty safe with the deployment. Then, I found out that the deployment also caused all the log requests from our iOS app to fail. Why? Well, it took me some time to find the answer, but we finally figured it out – Apple ATS.
Starting with iOS 9, Apple rolled out a new feature called ATS, or App Transport Security. ATS aims to improve the security of mobile apps by enforcing many things, including HTTPS. If you’ve developed an iOS app in the last 2 years, you’ve probably encountered an error when trying to send a request over HTTP (not HTTPS). To make such a request work, you had to disable ATS (careful, this is not good practice in production!); see this question on Stack Overflow for an example. This is a pretty common occurrence with ATS, and I encountered it myself a few times before. What I was not aware of is that ATS also requires specific cipher suites (ones that provide PFS – perfect forward secrecy – you can find more about it here). If the server does not support any of them, ATS will not allow the TLS connection.
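A check like the following would have caught this before deployment. It is only a sketch and rests on an assumption: to my understanding, ATS expects forward secrecy via ECDHE key exchange, which shows up in the suite name as a `TLS_ECDHE_` prefix. Check Apple's documentation for the exact list it accepts. The function name and the sample lists are hypothetical.

```python
def ats_candidate_suites(enabled_suites):
    """Return the enabled suites that use ECDHE key exchange.

    Assumption: a TLS_ECDHE_ prefix is used here as a stand-in for
    "provides forward secrecy in a way ATS accepts".
    """
    return [s for s in enabled_suites if s.startswith("TLS_ECDHE_")]


# Hypothetical server configurations, for illustration only.
healthy_server = [
    "TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384",
    "TLS_RSA_WITH_AES_256_GCM_SHA384",
]
broken_server = [
    "TLS_RSA_WITH_AES_256_GCM_SHA384",
    "TLS_RSA_WITH_AES_128_GCM_SHA256",
]

print(ats_candidate_suites(healthy_server))
print(ats_candidate_suites(broken_server))  # empty: iOS clients would be cut off
```

An empty result means that no suite with forward secrecy is left, so an ATS-enabled client has nothing it is willing to negotiate, even though every remaining suite may still look strong to a scanner.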
So ATS was the reason – but why? Apparently, the issue was the server OS: Microsoft changed the names of some cipher suites between Windows Server 2012 and 2016 (see this page for all the keys per OS version). The Logging API was deployed to servers running Windows Server 2012, but the template was created using the 2016 cipher suite names. As a result, some of the strong cipher suites (the ones that also support PFS) ended up disabled.
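The mismatch can be pictured as a simple set comparison. The sketch below assumes that a name in the template which the OS does not recognise is silently ignored, so only names known to both sides stay enabled. The suite names are illustrative (older Windows versions appended a curve suffix such as `_P256` to some names); take the real lists from Microsoft's per-version documentation. The function names are hypothetical.

```python
def effective_suites(template_names, os_names):
    """Suites that stay enabled: template names the OS also recognises."""
    known = set(os_names)
    return [name for name in template_names if name in known]


def unrecognised(template_names, os_names):
    """Template names the OS does not know, i.e. entries that have no effect."""
    known = set(os_names)
    return [name for name in template_names if name not in known]


# Illustrative names only; not a complete or authoritative list.
server_2012_names = [
    "TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384_P256",
    "TLS_RSA_WITH_AES_256_GCM_SHA384",
]
template_from_2016 = [
    "TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384",
    "TLS_RSA_WITH_AES_256_GCM_SHA384",
]

print(effective_suites(template_from_2016, server_2012_names))
print(unrecognised(template_from_2016, server_2012_names))
```

Failing the build whenever `unrecognised` returns anything for the target OS version is one cheap way to notice that a template was written for a different server.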
So, what have I learned from this story? Firstly, you can’t be too careful, especially when dealing with things that you don’t fully understand. Secondly, setting strong TLS ciphers is complicated. Always take all of your clients into consideration.
I hope that you enjoyed reading this post and learned something new from my mistakes. After all, that’s the best way to learn! 🙂