MongoDB vs. SQL Server 2008 Performance Showdown

This article is a follow up one I wrote last week entitled “The NoSQL Movement, LINQ, and MongoDB – Oh My!”. In that article I introduced the NoSQL movement, MongoDB, and showed you how to program against it in .NET using LINQ and NoRM. You can also watch my conference presentation at MongoDB Seattle 2011 or this DevelopMentor webcast.

I highlighted two cornerstone reasons why you might ditch your SQL Server for the NoSQL world of MongoDB. Those were

1. Ease-of-use and deployment
2. Performance

For ease-of-use, you’ll want to read the original article.

This article is about the performance argument for MongoDB over SQL Server (or MySql or Oracle). In the first article, I threw out a potentially controversial graph showing MongoDB performing 100 *times* better than SQL Server for inserts.

“A potentially controversial graph showing MongoDB performing 100 times better than SQL Server”

We’ll see source code, downloadable and executable examples and you can verify all of this for yourselves. But first, here’s a new twist on an old proverb:

“Data is money”

If your application is data intensive and stores lots of data, queries lots of data, and generally lives and breathes by its data, then you’d better do that efficiently or have resources (i.e. money) to burn.

Let’s imagine you’re creating a website that is for-pay and data intensive. If you were to attempt to plan out your operating costs per user to help guide the pricing of your product then the cost of storing, querying, and managing your data will likely be a significant part of that calculation.

If there is a database that is 100 times faster than SQL Server, free, easy to administer and you program it with LINQ just as you would with SQL Server then that is a very compelling choice.

When you have such a database, it means you can run your system on commodity hardware rather than high-end servers. It means you can have fewer servers to maintain and purchase or lease. It means you can charge a lot less per user of your application and get the same revenue. Think about it.

“It means you can charge a lot less per user of your application and get the same revenue. Think about it.”

One more story before we see the statistics. Kristina Chodorow from 10Gen gave a talk a few weeks ago at San Francisco’s MySQL Meetup entitled “Dropping ACID with MongoDB”. You can watch the recording here:

http://www.ustream.tv/recorded/6146875

[The audio and video isn’t too hot, but the content is. Skip the first minute without audio.]

During this talk, Kristina describes SourceForge’s experience moving from MySql to MongoDB. On MySql, SourceForge was reaching its limits of performance at its current user load. Using some of the easy scale-out options in MongoDB, they fully replaced MySQL and found MongoDB could handle the current user load easily. In fact, after some testing, they found their site can now handle 100 times the number of users it currently supports.

Not convinced of this NoSQL thing yet? Fair enough. Here are some graphs, some stats, and some code.

The scenario:

Model a data intensive web application aiming to support as many concurrent users as possible. There will be users from the web application itself. But there will also be users from an API and external applications. Users will interact with the data by having nearly as many inserts as they do queries. Their inserts are all small pieces of data and are all independent of each other.

Let me just get this out of the way and I mean the following in the nicest of ways: I don’t care about your scenario or use-case. The scenario above is what I’m trying to model. I’m not trying to do bulk-inserts or loading large files into databases or anything like that. MongoDB may be great for these. SQL Server may have specialized features around your use-case, etc. They don’t apply in my scenario. So please don’t wonder why I’m not using bulk inserts or anything like that in the examples below.

Insert Speed Comparison

It’s the inserts where the differences are most obvious between MongoDB and SQL Server.

These inserts were performed by inserting 50,000 independent objects using NoRM for MongoDB and LINQ to SQL for SQL Server 2008. Here are the data models:

MongoDB basic class

SQL Server basic class

I ran five concurrent clients hammering the databases with inserts. Here’s the  screenshots for  running against MongoDB and  against SQL Server. Let’s zoom into the most important result with the output from one of five concurrent clients:

MongoDB:

SQL Server:

That’s right. It’s 2 seconds verses 3 1/2 minutes!

Now to be fair, this was using LINQ to SQL on the SQL side which is slow on the inserts. After discussing these results with some friends, I re-ran the tests using raw ADO.NET style programming and saw a 1.5x-3x performance improvement for SQL. That still leaves MongoDB 30x-50x faster than SQL.

Query Speed Comparison

Now let’s see about getting the data out using the same objects above on the indexed Id field for each database.

Here MongoDB still kicks some SQL butt with almost 3x performance. If we were to leverage the mad scale-out options that MongoDB affords then we could kick that up to many times more.

“If we were to leverage the mad scale-out options that MongoDB affords then we could kick that up to many times more.”

Complex Data and the Real World

Feel like that was an overly simplified example? Here’s some real world data with foreign keys and joins. Below is the complex data model.

MongoDB:

SQL Server:

It shouldn’t surprise you that MongoDB does even better here without its joins.

The Hardware

All of these tests were run on a Lenovo T61 on Windows 7 64-bit with a dual-core 2.8 GHz processor using the 64-bit versions of both SQL Server 2008 Standard and MongoDB 1.4.1. You can even see a picture of the computer here: http://twitpic.com/hywa8

Your Turn

If you want to see the entire set of data above as an Excel spreadsheet, you can download that here:

sql-vs-mongo.xlsx

You can also download the sample code. Before you do, realize I haven’t done a bunch of work to make it super easy to run. But you should be able to figure it out. Just turn the knobs on the PerfConstants class for the number of inserts and queries. Then comment or uncomment sections of the code in the clients for your scenarios.

The expected use is that you’ll start the launcher application then use it to launch five concurrent clients at exactly the same time.

Download Sample:

SpeedOfSqlVsMongoDBAnddotNetSample.zip

Got feedback? Write a comment or contact me on Twitter: @mkennedy or find me in
any of these other ways
.

Thanks!

Some thanks are in order for all the help I got bouncing around ideas as well as trying different scenarios. Thanks to

Eric Cain @arcain
Jim Lehmer@dullroar
Karl Seguin @karlseguin

Cheers!
@mkennedy 

37 thoughts on “MongoDB vs. SQL Server 2008 Performance Showdown

  1. Why is the Customer/Order example for SQLServer modelled as a many-to-many relationship? An order cannot have multiple customers, so the customer id should have gone directly in the Order table.

    • Hi,

      That is a good point and something I overlooked when putting the sample together. The most important take-away from the sample is this:

      SQL requires joins, joins are slow. MongoDB is fast in large part because it doesn’t use joins (most of the time).

      That it doesn’t correctly match Customers/Orders isn’t that important.

      I guess in some sense you could say it’s not a fair comparison. However, we’ve certainly other tests with different data models and it turns out the performance of SQL is way off of that of MongoDB, so I haven’t re-written it.

      I’m considering putting together another post using EC2 machines so we can all test exact platform. I’ll update the sample if I do so.

      Thanks!
      Michael

      • SQL JOINS are not slow if you use an appropriate key. GUIDS will work. However (as implemented in your sample code) they are not monotonically increasing. Therefore inserts alone will be slow due to page splits and fragmentation. Then, because they are somewhat random, the index becomes more of a scan than an seek. Therefore it’s not an apples to apples comparison.

  2. Hi
    I am using both MYSQL and MONGODB in my application for retriving data, I am using java for the development of my application,
    I have a collection name as tesing_mongodb in Mongodb contains 2,00,000 documents and i have a table in mySql named as testing_sql contains same records as in testing_mongodb, means table testing_sql contains 2,00,000 rows as well and both contains exactly the same data,

    I created 2 different pages to retrieve records, one for connection with mongodb and one for MySql,
    And i retrieved all the data from Mongodb Collection testing_mongodb on 1 page and i retrieved all the data from MYSQL table testing_sql on another page,
    the response time from mongodb Collection to fetch result is around or more than 1 minute and
    the response time from MYSQL is around 3 to 4 second.
    Now, I heard that Mondodb is very fast in compare with MYSQL but I found this statement is wrong, can any one tell me the reason,
    I am using latest version of mongodb with single server,
    I am using windows xp service pack 2 and my ram is just 2 GB
    and I am running MYSQL on this same system

    • Hi,

      Sounds like something is off. First of all, do you have indexes on your queries in Mongo?

      Second, check out this quote from Wordnik on the MongoDB users page:

      http://www.mongodb.org/display/DOCS/Production+Deployments

      “Wordnik stores its entire text corpus in MongoDB – 1.2TB of data in over 5 billion records. The speed to query the corpus was cut to 1/4 the time it took prior to migrating to MongoDB.”

      They were using MySQL previously AFAIK.

      Cheers,
      Michael

    • Check you have 64 bit machine on which Mongodb is running.
      Have at least 4-8 GB RAM.
      Check where clause is indexed.
      Pretty much it… Mongodb will throw MySQL out of the roof…
      We have achieved 7k decent shopping cart objects and inserted up to 1 Million records and queried random objects and were able to retrieve that shopping cart in < 2 milliseconds.

  3. Comparing MongoDB to SQLServer makes me want to bang my head against a wall. You sure like comparing Apples to Oranges, do you?

  4. Does anyone know how RAM requirements compare? The last comment talks about 4-8 GB RAM… getting a 4-8 GB VPS or cloud host could be a very expensive proposition for a small developer.

    • That is a good point. MongoDB is pretty hungry for memory. However, you can get the m1.medium instance on EC2 for around $90/month or so and that has 3.7 GB. Probably a pretty good host. If you host MongoDB in Linux (still accessible via Windows instance) you could cut that in half or double the RAM.

  5. Michael, Are these results using local(client and server in same machine) MongoDB or remote(client and DB Sever in seperate machines) because results with remote MongoDB are like SQL Server took 27secs for insertion of 25000 records and MongoDB took 40 secs for the same.

    • Hi,

      They are on the same machine. There are changes in the way the newer MongoDB drivers wait on the server which makes individual calls slower without affecting the real scalability.

      Look at the disk usage & CPU usage for the remote SQL & MongoDB servers. Add more clients until you see the server utilization in the same range.

      Cheers,
      Michael

  6. Michael,
    YesSQL is the transactional world.
    Wrap yours inserts in transaction and you’ll get the great performance in SQL Server.
    Or use trasactions in MongoDb for each insert :)

    Hope this helps.

  7. Thank you for this article and sharing your work.

    What were the indexes in place on each of the Mongo and the MS-SQL databases during this test?

    Did the tests run multiple times, or once for each presented timing?

    What were the db config files (ConfigurationFile.ini and memory/working set/etc settings for MS-SQL and mongodb.conf for Mongo, e.g).

    • Hi Ross. You’re welcome!

      All queries are against primary or foreign-key relationships both in SQL & MongoDB. I ran the tests a few times to “warm up” the servers before doing the timing. I recorded just one run (not the average) for the numbers, but running them repeatedly generated roughly the same numbers.

      All the config settings are default, out of the box config.

      Cheers,
      Michael

  8. MongoDB relies on the system pager to write the memory mapped file to disk and will return as soon as it has updated the memory — before the OS has finished syncing the memory mapped file. SQL Server returns after the data has been at least reliably logged. Your faster MongoDB is exposing you to the possibility of data loss.

    • Hi,

      That is partly why MongoDB is fast (the other is more coherent data structures). However, you’re not looking at the whole picture when making this comparison.

      One instance of MongoDB != One instance of SQL Server

      MongoDB is build to be used in replica sets: http://www.mongodb.org/display/DOCS/Replica+Sets. The recommendation is to have at least one slave, one master. This configuration is potentially provides more data safety than a single SQL Server.

      Here’s an example connectoin string that will not return from the write until replication has succeeded on at least two machines out of 3 in a replica set:

      mongodb://host1,host2,host3/?safe=true;w=2

      This gives you both speed + data safety.

      Cheers,
      Michael

  9. Now that I have an Eiffel driver for MongoDB, I will try my own comparison at the end of next week. In the middle of the week, we are getting new PCI-e SSD cards, which are much faster than their SSD SATA counterparts and certainly faster than a typical SATA HDD. With the high-end PCI-e SSD cards, I am expecting very big performance increases and will want to see how SQL Server stacks up in this arena.

    We started as a SQL Server shop, but will someday make the cut over to MongoDB if I can convince the power-that-be about the real-world business Use Case and money difs between the two. For instance, our DBA team has spent almost a month making changes to a SQL Server DB that could have been literally swallowed in two days by coding it directly and letting the DB just have it! At about an average of $75/hour for DBA+AppDev labor for 3 people, we have spent about $20K+! It is time for that kind of expenditure to bite the dust!

    Thanks for your hard work. I will remember this site and your conclusions as a ready-reference.

  10. Is it about lots of small rights and lots of small reads?
    I work with financial data, my queries retrieve 1/2gb at a time or compare two columns each of which is 1/2gb of data. Does noSQl perform better in doing comparisons on in generating calculated fields from existing data? Again is noSQL better for transactions similar to amazon.com purchases or searches which are lots and lots “small” searches, and no calculated fields generated?

  11. Pingback: { name: "mongo", type: "db"} - open source NOSQL

  12. Pingback: .NET Blog | { name: "mongo", type: "db"} - open source NOSQL

  13. I dont understand, I have really bad result in access time (both read and write) do you have any link to advise me to :
    create index correctly
    reduce read access time
    reduce write access time

    and so, an other question, does the number of collection have an impact on this time

  14. Pingback: MONGODB – Parte 1 Dati documentali VS dati relazionali PRO e CONTRO | Web Improvement

  15. Pingback: ORM-Wahl fürs Web: Entity Framework+MSSQL oder MongoDB? ← Benjamin ABT

  16. Hi, I’ve read this article, and, I want to give my own opinion.
    In fact, I’ve found that the performance difference between MongoDB and Sql Server is more represents the difference between MongoDB’s client driver and Sql Server’s client driver(ADO.NET).
    I once made a test, I emit ADO.NET’s code and ignore some useless ILCode, then I run such query over SqlServer, our test finished about 80,000 query operation in one second(Platform:I7, 12GB MEM, WIN7).

    • Thanks for the stats. You need to be careful of a few things.

      1. That you run as many clients as your server will support. The natural network latency is a bigger driver in MongoDB than for SQL so do things in parallel to tax the server.

      2. Watch disk & cpu load to see your actually taxing the server for each.

      3. That inserts are NOT bulk inserts (that’s not the scenario we’re testing here).

      Cheers,
      Michael

  17. As far as I remember MongoDB is not transactional and comparing transactional system with no transactional (which does not ensure data integrity) is pointless. However results are quite interesting.

    • Hi q,

      MongoDB *is* somewhat transactional as long as you are only working with one document. In many cases, this is the same as standard RDBMSes. I update a customer, their address, their orders, etc. You could do that transactionally / atomically without trouble using MongoDB provided your schema is setup correctly.

      You cannot update multiple documents atomically, true. But for many apps, there are ways around this (e.g. compensation, optimistic concurrency – a la Entity Framework, etc).

      Cheers,
      Michael

  18. Hi Michael,

    the database setup you used, isn’t ideal for SQL Server. Having GUIDs as clustered indexes is probably causing a lot of IO on your system because the inserts happen basically randomly across the database pages and cause page splits.

    This article by Kimberly L. Tripp (http://www.sqlskills.com/blogs/kimberly/ever-increasing-clustering-key-the-clustered-index-debate-again/) explains how a clustered index should be : unique, narrow, static and evern-increasing.

    With that in mind, I modified your database script to include a RowId column for each table. I made it a bigint identity column and the clustering index. The primary key was kept on your GUID ID columns.

    Running 10000 inserts with the ADO.NET style client on my Lenovo W530 with SQL Server 2012 ran between 2.104 seconds and 2.174 seconds. Quite different from the 204 seconds you initially reported and very close to the MongoDB results.

    Cheers

    Hans

      • I am pretty sure it will be similar to the guy above who talked about how they needed a team of DBA’s to work over an entire db. Imagine having to spend $20k for a month just to get something to perform decently… sounds like maybe the wrong tool is being used if you ask me.

        I think the big draw is the fact its easier to get good results without being a MS-Sql dba or expert, And on top of that its nice that your data store can exist on more than just a windows box.

        So I am not sure if anyone is aware that the cost of Sql Server license is not cheap, neither is windows VMs or windows Server licenses, then lets say you multiply that cost by 3 because you want some redundancy or load balancing…. Linux + mongo could provide equivalent or better performance with a greatly reduced cost is what I am taking away from this.

  19. re: Tim Burnett on March 28, 2014 at 11:08 am:
    You don’t need to be an uber dba to set your primary key to an int and have it auto-increment…

  20. Pingback: שלום, מונגו! (MongoDB) | חדשות

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s