Users connect again and again… forever. The full user profile (name, address, phone, etc…) is requested on each connection.
This is the preferred mode of testing because it is the most intensive and the closest to what production sees during peak hours.
Bottlenecks can be identified. Unreliable components will break apart:
- Disk, CPU and network get maxed out (until one becomes the bottleneck)
- Blocking I/O or inefficient I/O will destroy performance
- Logs will saturate disks when not managed properly
- Memory leaks pile up (if present) until the application crashes
- Race conditions are more likely to happen and to be detected
- Bad resource usage (e.g. not freeing file handles and other resources) is fatal
- Several of these issues can add up fast and kill the application right away
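To make the scenario concrete, here is a minimal sketch of that loop in Python. It is not the JMeter test plan used for the actual measurements. The names `run_load` and `fake_fetch_profile` are made up for illustration, and the stub stands in for whatever call a real test would make to fetch a profile.

```python
import threading
import time
from collections import Counter


def run_load(fetch_profile, user_ids, workers=4, duration_s=1.0):
    """Call fetch_profile(user_id) in a tight loop from several threads.

    fetch_profile is any callable that connects and requests one full
    profile; it is a placeholder for the real client call.
    """
    deadline = time.monotonic() + duration_s
    totals = Counter()
    lock = threading.Lock()

    def worker(offset):
        local = Counter()
        i = offset
        # "Again and again": keep going until the deadline is reached.
        while time.monotonic() < deadline:
            try:
                fetch_profile(user_ids[i % len(user_ids)])
                local["ok"] += 1
            except Exception:
                # Count failures instead of stopping, so errors under load show up.
                local["error"] += 1
            i += 1
        with lock:
            totals.update(local)

    threads = [threading.Thread(target=worker, args=(n,)) for n in range(workers)]
    started = time.monotonic()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = max(time.monotonic() - started, 1e-9)
    return {
        "ok": totals["ok"],
        "error": totals["error"],
        "per_second": totals["ok"] / elapsed,
    }


def fake_fetch_profile(user_id):
    """Stub standing in for a real profile request (hypothetical fields)."""
    return {"uid": user_id, "name": "Test User", "address": "1 Test Street", "phone": "555-0100"}
```

Calling `run_load(fake_fetch_profile, ["u1", "u2"], workers=2, duration_s=0.5)` returns the number of successful and failed requests plus a rough requests-per-second figure. A real run would replace the stub with an actual client call and run for much longer than a second.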
The authentication service is a web service exposing various APIs (REST, OAuth, SAML…) for use by all the applications throughout the organisation to authenticate users and get information about them. Nothing ever deals with LDAP directly, except that authentication service.
While we could authenticate directly against the LDAP server for performance testing, we explicitly DO NOT want to do that. The performance of a single isolated LDAP server makes little sense and is of limited interest. We care about the performance of the full authentication chain, of which the LDAP server is an important factor.
Software and Configuration
- CentOS 6.5
- OpenDJ 2.6
- OpenLDAP 2.4 (hdb)
- Symas Silver (OpenLDAP mdb)
All testing is done with 100 000 users in the LDAP server.
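For reference, a directory of that size can be populated from a generated LDIF file. The sketch below is an assumption about how such test data could be produced, not the script used for these tests. The base DN `ou=people,dc=example,dc=com`, the attribute values and the function names are all placeholders.

```python
import io


def user_entry(n, base_dn="ou=people,dc=example,dc=com"):
    """Return one LDIF entry for a synthetic user (all values are made up)."""
    uid = f"user{n:06d}"
    lines = [
        f"dn: uid={uid},{base_dn}",
        "objectClass: top",
        "objectClass: person",
        "objectClass: organizationalPerson",
        "objectClass: inetOrgPerson",
        f"uid: {uid}",
        f"cn: Test User {n}",
        f"sn: User{n}",
        f"mail: {uid}@example.com",
        f"telephoneNumber: +1 555 {n:07d}",
        f"postalAddress: {n} Test Street",
        f"userPassword: password{n}",
    ]
    return "\n".join(lines) + "\n"


def write_ldif(stream, count, base_dn="ou=people,dc=example,dc=com"):
    """Write `count` entries to a text stream, separated by blank lines."""
    for n in range(count):
        stream.write(user_entry(n, base_dn))
        stream.write("\n")


def build_ldif(count):
    """Convenience wrapper returning the LDIF as a string (for small counts)."""
    buf = io.StringIO()
    write_ldif(buf, count)
    return buf.getvalue()
```

For the full data set, `write_ldif(open("users.ldif", "w"), 100_000)` would stream the entries to disk. The resulting file can then be loaded with each server's own import tool.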
All applications are optimized, configured, tweaked and tuned for maximum performance. We went through all the official documentation, the performance optimization guide (when there is one), all the first page results on Google for “Tuning <ldap server name>” and similar search strings.
We ran the performance tests dozens of times to find the best settings for each piece of software. All the fluff about optimizing indexes, tuning cache settings, tuning database size, tuning connection pool, tuning logging levels… and much more, was done.
Settings are as close to production as possible. That implies one notable rule. We did not enable any of the ‘production unsafe’ settings like changing database syncing or data consistency behavior. They’re the kind of settings that give a performance boost at the cost of eventually destroying all user data on a power loss. Definitely not acceptable for production. If it’s not acceptable for production, it’s not worth testing.
PreTesting – Development Machine
- Windows 7 Enterprise
- CPU i5-2520m (2 cores, 4 threads)
- Memory 8 GB
- HDD 320 GB 7200 rpm
- VMware Player
Each piece of software runs inside a dedicated VM (1 core, 2 GB). The scenario itself runs in JMeter on the host.
Note: OpenLDAP is at 0 because it crashes.
PreProduction – Servers
Everything is virtualised on VMware ESXi servers. Unfortunately, I can’t fully disclose the physical hardware of the hosts. Each piece of software runs inside a dedicated VM.
- CPU 4 cores of a Xeon v3 (2015)
- 4 GB memory
- 10 Gbit networking
- (shared disks) 16 HDD SAS 10k rpm in RAID
- (shared disks) multiple 4 GB battery-backed RAID controllers
Note: OpenLDAP (hdb) is missing because it failed the preliminary tests miserably in the laptop environment.
OpenLDAP & ApacheDS
They have poor write performance and mediocre read-only performance. They both use BerkeleyDB internally and exhibit similar behavior. OpenLDAP crashes under load. ApacheDS had to be configured with a special option (no write sync) to add the initial users, or it would have taken an entire week. They are not satisfactory. It looks like there is some sort of internal locking in the LDAP server or the database which blocks access to entries and results in shitty performance.
Symas OpenLDAP has good performance, yet it lacks a proper administration interface, configuration tools and instructions (same as the bare OpenLDAP). The claims on the Internet that it is 3-10 times faster than OpenLDAP while using about 3-10 times less memory are about right (though it can be tough at times to compare <number> to <crashed>).
We believe that high traffic sites admitting to using OpenLDAP are actually using the Symas Edition, either directly via a paid subscription or indirectly by scraping open source code and packages to rebuild it themselves.
The top end version is actually quite cheap. I remember seeing a figure of 75k€ somewhere for a site license (one deployment in one company, unlimited computers, unlimited cores). It makes sense that someone (e.g. a telecom company with 2M users) starts with the classic OpenLDAP only to get disappointed by it, and then transitions to the Symas Edition, which is able to take the load and seems reliable enough.
OpenDJ has the best performance, the best administration tools, both graphical and command line, as well as the best documentation. The multi-site replication is designed for a worldwide scale deployment with scalability, high availability and high performance in mind. This is the best LDAP server by an order of magnitude, whether in features, reliability or technical details.
The license costs money though. If you have the budget then you should go for it. It is worth every penny and it can even save you money in the long run.
So far OpenDJ has been a great experience to use and manage. I wish every other LDAP server would die, especially OpenLDAP. Then everyone would be forced to use this one and it would be awesome 😀
Open Source vs Proprietary | Free vs Paid
(I will not get into a philosophical debate about those, I do not care, I take the tool which can do the job for the budget I have.)
Most people tend to equate Open Source with Free (as in ‘no money’). This is a common mistake but a mistake nonetheless. There are old sayings like “You get what you pay for” and “If it sounds too good to be true, it probably is”. They turn out to be especially true in the case of LDAP servers.
All of the above servers are (mostly) open source. Yet the 2 working ones (Symas and OpenDJ) are not free at all. They will charge you a license fee and an optional support subscription to get and use their packages.
You still have the option to grab some of the source and try to build everything yourself. That can satisfy the cheapskate hacker in you, but it isn’t necessarily the best option once you realize that a day of your work costs $1000 and you have some serious critical stuff to run (e.g. a telecom company with 2M users).