This article is a follow up one I wrote last week entitled “The NoSQL Movement, LINQ, and MongoDB – Oh My!”. In that article I introduced the NoSQL movement, MongoDB, and showed you how to program against it in .NET using LINQ and NoRM. You can also watch my conference presentation at MongoDB Seattle 2011 or this DevelopMentor webcast.
I highlighted two cornerstone reasons why you might ditch your SQL Server for the NoSQL world of MongoDB. Those were
1. Ease-of-use and deployment
For ease-of-use, you’ll want to read the original article.
This article is about the performance argument for MongoDB over SQL Server (or MySql or Oracle). In the first article, I threw out a potentially controversial graph showing MongoDB performing 100 *times* better than SQL Server for inserts.
“A potentially controversial graph showing MongoDB performing 100 times better than SQL Server”
We’ll see source code, downloadable and executable examples and you can verify all of this for yourselves. But first, here’s a new twist on an old proverb:
“Data is money”
If your application is data intensive and stores lots of data, queries lots of data, and generally lives and breathes by its data, then you’d better do that efficiently or have resources (i.e. money) to burn.
Let’s imagine you’re creating a website that is for-pay and data intensive. If you were to attempt to plan out your operating costs per user to help guide the pricing of your product then the cost of storing, querying, and managing your data will likely be a significant part of that calculation.
If there is a database that is 100 times faster than SQL Server, free, easy to administer and you program it with LINQ just as you would with SQL Server then that is a very compelling choice.
When you have such a database, it means you can run your system on commodity hardware rather than high-end servers. It means you can have fewer servers to maintain and purchase or lease. It means you can charge a lot less per user of your application and get the same revenue. Think about it.
“It means you can charge a lot less per user of your application and get the same revenue. Think about it.”
One more story before we see the statistics. Kristina Chodorow from 10Gen gave a talk a few weeks ago at San Francisco’s MySQL Meetup entitled “Dropping ACID with MongoDB”. You can watch the recording here:
[The audio and video isn’t too hot, but the content is. Skip the first minute without audio.]
During this talk, Kristina describes SourceForge’s experience moving from MySql to MongoDB. On MySql, SourceForge was reaching its limits of performance at its current user load. Using some of the easy scale-out options in MongoDB, they fully replaced MySQL and found MongoDB could handle the current user load easily. In fact, after some testing, they found their site can now handle 100 times the number of users it currently supports.
Not convinced of this NoSQL thing yet? Fair enough. Here are some graphs, some stats, and some code.
Model a data intensive web application aiming to support as many concurrent users as possible. There will be users from the web application itself. But there will also be users from an API and external applications. Users will interact with the data by having nearly as many inserts as they do queries. Their inserts are all small pieces of data and are all independent of each other.
Let me just get this out of the way and I mean the following in the nicest of ways: I don’t care about your scenario or use-case. The scenario above is what I’m trying to model. I’m not trying to do bulk-inserts or loading large files into databases or anything like that. MongoDB may be great for these. SQL Server may have specialized features around your use-case, etc. They don’t apply in my scenario. So please don’t wonder why I’m not using bulk inserts or anything like that in the examples below.
Insert Speed Comparison
It’s the inserts where the differences are most obvious between MongoDB and SQL Server.
These inserts were performed by inserting 50,000 independent objects using NoRM for MongoDB and LINQ to SQL for SQL Server 2008. Here are the data models:
MongoDB basic class
SQL Server basic class
I ran five concurrent clients hammering the databases with inserts. Here’s the screenshots for running against MongoDB and against SQL Server. Let’s zoom into the most important result with the output from one of five concurrent clients:
That’s right. It’s 2 seconds verses 3 1/2 minutes!
Now to be fair, this was using LINQ to SQL on the SQL side which is slow on the inserts. After discussing these results with some friends, I re-ran the tests using raw ADO.NET style programming and saw a 1.5x-3x performance improvement for SQL. That still leaves MongoDB 30x-50x faster than SQL.
Query Speed Comparison
Now let’s see about getting the data out using the same objects above on the indexed Id field for each database.
Here MongoDB still kicks some SQL butt with almost 3x performance. If we were to leverage the mad scale-out options that MongoDB affords then we could kick that up to many times more.
“If we were to leverage the mad scale-out options that MongoDB affords then we could kick that up to many times more.”
Complex Data and the Real World
Feel like that was an overly simplified example? Here’s some real world data with foreign keys and joins. Below is the complex data model.
It shouldn’t surprise you that MongoDB does even better here without its joins.
All of these tests were run on a Lenovo T61 on Windows 7 64-bit with a dual-core 2.8 GHz processor using the 64-bit versions of both SQL Server 2008 Standard and MongoDB 1.4.1. You can even see a picture of the computer here: http://twitpic.com/hywa8
If you want to see the entire set of data above as an Excel spreadsheet, you can download that here:
You can also download the sample code. Before you do, realize I haven’t done a bunch of work to make it super easy to run. But you should be able to figure it out. Just turn the knobs on the PerfConstants class for the number of inserts and queries. Then comment or uncomment sections of the code in the clients for your scenarios.
The expected use is that you’ll start the launcher application then use it to launch five concurrent clients at exactly the same time.
Some thanks are in order for all the help I got bouncing around ideas as well as trying different scenarios. Thanks to