Thanks for checking out this article. Everything around performance is very sensitive to changes in versions, hardware, and much more. Since 2008 there have been so many different factors and changes to all the software involved that I’ve decided to take down this article, as it has become really out of date.
If you want to learn more about MongoDB or get started using it, please consider checking out one of my courses:
MongoDB Quickstart with Python [free]
MongoDB for Developers with Python
If you happen to create a modern performance comparison, feel free to add a link and comment to this article.
Why is the Customer/Order example for SQLServer modelled as a many-to-many relationship? An order cannot have multiple customers, so the customer id should have gone directly in the Order table.
That is a good point and something I overlooked when putting the sample together. The most important take-away from the sample is this:
That it doesn’t correctly match Customers/Orders isn’t that important.
I guess in some sense you could say it’s not a fair comparison. However, we’ve certainly run other tests with different data models, and the performance of SQL Server was still way off that of MongoDB, so I haven’t re-written it.
I’m considering putting together another post using EC2 machines so we can all test on the exact same platform. I’ll update the sample if I do.
SQL JOINs are not slow if you use an appropriate key. GUIDs will work; however, as implemented in your sample code, they are not monotonically increasing. Therefore inserts alone will be slow due to page splits and fragmentation. Then, because the keys are somewhat random, using the index becomes more of a scan than a seek. Therefore it’s not an apples-to-apples comparison.
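The effect of non-sequential keys on a clustered index can be illustrated with a small sketch. This is plain Python rather than SQL Server, just an analogy: keys are kept in one sorted list, and any insert that does not land at the end plays the role of a page split.

```python
import bisect
import uuid

def middle_insert_fraction(keys):
    """Insert keys into a sorted list; count how often the new key
    lands anywhere but the end (the analogue of a page split)."""
    sorted_keys = []
    middle_inserts = 0
    for key in keys:
        pos = bisect.bisect_left(sorted_keys, key)
        if pos != len(sorted_keys):
            middle_inserts += 1
        sorted_keys.insert(pos, key)
    return middle_inserts / len(keys)

sequential = list(range(10_000))
random_guids = [str(uuid.uuid4()) for _ in range(10_000)]

print(middle_insert_fraction(sequential))    # 0.0 -- every insert appends
print(middle_insert_fraction(random_guids))  # close to 1.0 -- almost every insert lands mid-list
```

Sequential keys always append; random GUIDs almost always land in the middle, which is exactly the behavior that forces page splits and fragmentation in a clustered index.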
I agree. Bad design causes bad response times. MongoDB is NOT an RDBMS. MongoDB was designed for unstructured data, not for many large tables. I have a table with rows way wider than 600 KB and over 20 million rows, and it performs like crap in MongoDB. The data provided above is true for that specific scenario, but you cannot generalize against major platforms like Microsoft SQL Server or Oracle.
I am using both MySQL and MongoDB in my application for retrieving data, and I am using Java for the development of my application.
I have a collection named testing_mongodb in MongoDB containing 200,000 documents, and I have a table in MySQL named testing_sql containing the same records as testing_mongodb, meaning the table testing_sql contains 200,000 rows as well, and both contain exactly the same data.
I created 2 different pages to retrieve the records: one connecting to MongoDB and one to MySQL.
I retrieved all the data from the MongoDB collection testing_mongodb on one page, and all the data from the MySQL table testing_sql on the other.
The response time to fetch the results from the MongoDB collection is around 1 minute or more, and
the response time from MySQL is around 3 to 4 seconds.
Now, I heard that MongoDB is very fast compared with MySQL, but I found this statement to be wrong. Can anyone tell me the reason?
I am using the latest version of MongoDB on a single server.
I am using Windows XP Service Pack 2, my RAM is just 2 GB,
and I am running MySQL on the same system.
Sounds like something is off. First of all, do you have indexes on the fields you’re querying in Mongo?
Second, check out this quote from Wordnik on the MongoDB users page:
“Wordnik stores its entire text corpus in MongoDB – 1.2TB of data in over 5 billion records. The speed to query the corpus was cut to 1/4 the time it took prior to migrating to MongoDB.”
They were using MySQL previously AFAIK.
Same as always, it’s about use case.
I can make a raspberry pi outperform a blade server… if I get to design the test.
Check that MongoDB is running on a 64-bit machine.
Have at least 4-8 GB of RAM.
Check that the fields in your WHERE clause are indexed.
That’s pretty much it… MongoDB will blow MySQL out of the water.
We achieved 7k decent-sized shopping cart objects, inserted up to 1 million records, queried random objects, and were able to retrieve a given shopping cart in < 2 milliseconds.
Something is wrong with your SQL Server. My instance of SQL Server 2008 inserts 10,000 rows in milliseconds, so your tests are irrelevant. In 204 seconds, millions of records can be inserted into a SQL database.
Comparing MongoDB to SQL Server makes me want to bang my head against a wall. You sure like comparing apples to oranges, don’t you?
Articles like this make me want to bang my head against a wall. Comparing apples to oranges….
Please watch this video: http://www.xtranormal.com/watch/6995033/mongo-db-is-web-scale
I was hoping for a deep analysis of the performance differences, with details of the setup and optimizations, but nothing like that is included. Any mention of indexes, caching, data integrity? None that I could see. I’d expect at least an analysis of where in the internal process the slowdown in SQL Server is happening. Have you tried any profiling? Do you use prepared statements? What about memory pools?
It looks to me like you have very little experience in database modelling and RDBMS performance optimization. Just the fact that you call SQL Server “SQL” indicates you probably don’t know that SQL is a language.
I’m sure NoSQL style databases are better in certain circumstances, but there’s far too little info here for an informed decision.
BTW, is SQL Server 2008 faster at updates than MongoDB? You didn’t mention any stats. Wordnik discovered that if you update a MongoDB record it locks the whole database!!
Check this out as well:
Man, do you really believe that SQL Server needs 2 minutes to insert 10,000 rows? I’m not a SQL Server advocate, but if this were true, nobody would be using SQL Server now. Recently I did performance tests for a simple application, and SQL Server was happy to do 15,000 inserts per second, even faster than your Mongo benchmark.
I suspect something is wrong with your testing application, and if you knew SQL well enough, such results would look very suspicious to you.
Does anyone know how RAM requirements compare? The last comment talks about 4-8 GB RAM… getting a 4-8 GB VPS or cloud host could be a very expensive proposition for a small developer.
That is a good point. MongoDB is pretty hungry for memory. However, you can get the m1.medium instance on EC2 for around $90/month or so, and that has 3.7 GB; probably a pretty good host. If you host MongoDB on Linux (still accessible from a Windows instance), you could cut that cost in half or double the RAM for the same price.
Michael, are these results using a local MongoDB (client and server on the same machine) or a remote one (client and DB server on separate machines)? Because my results with remote MongoDB are that SQL Server took 27 seconds to insert 25,000 records while MongoDB took 40 seconds for the same.
They are on the same machine. There are changes in the way the newer MongoDB drivers wait on the server which makes individual calls slower without affecting the real scalability.
Look at the disk usage & CPU usage for the remote SQL & MongoDB servers. Add more clients until you see the server utilization in the same range.
In SQL Server, all inserts are transactions, so you should not insert rows one by one but in blocks of 10, 100, 1,000, etc.
Start a transaction, insert 1,000 rows, then commit, and MongoDB will be the slow one.
See more in my article “MongoDB vs SQL Server?” (sorry, it’s in French, but it contains more code and graphs than words).
YesSQL is the transactional world.
Wrap your inserts in a transaction and you’ll get great performance from SQL Server.
Or use transactions in MongoDB for each insert :)
Hope this helps.
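The batching advice above is easy to demonstrate with any transactional engine. A minimal sketch using Python’s built-in sqlite3 as a stand-in for SQL Server; the principle is the same: one commit per block instead of one per row.

```python
import sqlite3
import time

def timed_inserts(rows, batch):
    """Insert `rows` rows either inside one transaction (batch=True)
    or with a commit after every single insert (batch=False)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
    start = time.perf_counter()
    if batch:
        # One transaction for all rows: the context manager commits once at the end.
        with conn:
            conn.executemany("INSERT INTO orders (item) VALUES (?)",
                             [("item",)] * rows)
    else:
        # One transaction per row: commit after every insert.
        for _ in range(rows):
            conn.execute("INSERT INTO orders (item) VALUES ('item')")
            conn.commit()
    elapsed = time.perf_counter() - start
    count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
    conn.close()
    return count, elapsed

count_slow, t_slow = timed_inserts(10_000, batch=False)
count_fast, t_fast = timed_inserts(10_000, batch=True)
print(count_slow, count_fast)  # both 10000; the batched run is usually much faster
```

Against an on-disk database the gap widens further, since every per-row commit forces a log flush.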
Thank you for this article and sharing your work.
What were the indexes in place on each of the Mongo and the MS-SQL databases during this test?
Did the tests run multiple times, or once for each presented timing?
What were the db config files (ConfigurationFile.ini and memory/working set/etc settings for MS-SQL and mongodb.conf for Mongo, e.g).
Hi Ross. You’re welcome!
All queries are against primary or foreign-key relationships both in SQL & MongoDB. I ran the tests a few times to “warm up” the servers before doing the timing. I recorded just one run (not the average) for the numbers, but running them repeatedly generated roughly the same numbers.
All the config settings are default, out of the box config.
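For anyone re-running tests like these, the warm-up-then-time approach can be sketched as a small harness. This is generic Python, not the original benchmark code; the workload lambda is a placeholder for the real insert/query calls.

```python
import statistics
import time

def benchmark(operation, warmups=3, runs=5):
    """Run `operation` a few times untimed to warm caches and connections,
    then time several runs and report a summary rather than a single run."""
    for _ in range(warmups):
        operation()
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        operation()
        timings.append(time.perf_counter() - start)
    return {"min": min(timings),
            "median": statistics.median(timings),
            "max": max(timings)}

# Stand-in workload; replace with the actual database calls under test.
stats = benchmark(lambda: sum(i * i for i in range(100_000)))
print(stats)
```

Reporting min/median/max (instead of a single run) makes it obvious when the numbers are noisy.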
MongoDB relies on the system pager to write the memory mapped file to disk and will return as soon as it has updated the memory — before the OS has finished syncing the memory mapped file. SQL Server returns after the data has been at least reliably logged. Your faster MongoDB is exposing you to the possibility of data loss.
That is partly why MongoDB is fast (the other is more coherent data structures). However, you’re not looking at the whole picture when making this comparison.
One instance of MongoDB != One instance of SQL Server
MongoDB is built to be used in replica sets: http://www.mongodb.org/display/DOCS/Replica+Sets. The recommendation is to have at least one master and one slave. This configuration potentially provides more data safety than a single SQL Server.
Here’s an example connection string that will not return from the write until replication has succeeded on at least two machines out of three in a replica set:
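A plausible example of such a connection string (hostnames are hypothetical; in the URI options of the drivers of that era, `w=2` asks that the write be acknowledged by two of the three replica-set members before the call returns):

```
mongodb://host1:27017,host2:27017,host3:27017/?replicaSet=rs0&w=2
```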
This gives you both speed + data safety.
Now that I have an Eiffel driver for MongoDB, I will try my own comparison at the end of next week. In the middle of the week, we are getting new PCI-e SSD cards, which are much faster than their SSD SATA counterparts and certainly faster than a typical SATA HDD. With the high-end PCI-e SSD cards, I am expecting very big performance increases and will want to see how SQL Server stacks up in this arena.
We started as a SQL Server shop, but will someday make the cutover to MongoDB if I can convince the powers-that-be about the real-world business use case and money differences between the two. For instance, our DBA team has spent almost a month making changes to a SQL Server DB that could literally have been swallowed in two days by coding it directly and letting the DB just take it! At an average of $75/hour for DBA + AppDev labor for 3 people, we have spent about $20K+! It is time for that kind of expenditure to bite the dust!
Thanks for your hard work. I will remember this site and your conclusions as a ready-reference.
Is it about lots of small writes and lots of small reads?
I work with financial data; my queries retrieve 1/2 GB at a time, or compare two columns, each of which is 1/2 GB of data. Does NoSQL perform better at doing comparisons or at generating calculated fields from existing data? Or is NoSQL better for transactions similar to amazon.com purchases or searches, which are lots and lots of “small” searches with no calculated fields generated?
I don’t understand: I have really bad results in access time (both read and write). Do you have any links to advise me how to:
create indexes correctly
reduce read access time
reduce write access time
And another question: does the number of collections have an impact on these times?
Hi, I’ve read this article and I want to give my own opinion.
In fact, I’ve found that the performance difference between MongoDB and SQL Server is more a reflection of the difference between MongoDB’s client driver and SQL Server’s client driver (ADO.NET).
I once made a test: I emitted ADO.NET’s code directly, skipping some useless IL code, and ran such a query against SQL Server; our test finished about 80,000 query operations in one second (platform: i7, 12 GB RAM, Win7).
Thanks for the stats. You need to be careful of a few things.
1. Run as many clients as your server will support. Natural network latency is a bigger factor for MongoDB than for SQL Server, so do things in parallel to tax the server.
2. Watch disk & CPU load to see that you’re actually taxing the server in each case.
3. That inserts are NOT bulk inserts (that’s not the scenario we’re testing here).
From this post:
(Third comment) I understand that when not using getLastError, an insert statement issued to Mongo returns before it is successfully persisted, even on one node. Is your test using Mongo in fire-and-forget mode or in safe mode?
As far as I remember, MongoDB is not transactional, and comparing a transactional system with a non-transactional one (which does not ensure data integrity) is pointless. However, the results are quite interesting.
MongoDB *is* somewhat transactional as long as you are only working with one document. In many cases, this is the same as with standard RDBMSes: I update a customer, their address, their orders, etc. You could do that transactionally / atomically without trouble using MongoDB, provided your schema is set up correctly.
You cannot update multiple documents atomically, true. But for many apps, there are ways around this (e.g. compensation, optimistic concurrency – a la Entity Framework, etc).
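One of the workarounds mentioned, optimistic concurrency, can be sketched against a plain in-memory store. The names here are illustrative; with MongoDB the same pattern works by putting both `_id` and the version field in the update’s query filter, so the check-and-set stays atomic on a single document.

```python
# In-memory stand-in for a document store: _id -> document.
store = {1: {"_id": 1, "balance": 100, "version": 1}}

def update_if_unchanged(doc_id, read_version, changes):
    """Apply `changes` only if nobody bumped the version since we read it.
    Returns False when the document has moved on, so the caller re-reads
    and retries (or runs a compensation step)."""
    doc = store.get(doc_id)
    if doc is None or doc["version"] != read_version:
        return False  # someone else won the race
    doc.update(changes)
    doc["version"] += 1
    return True

# First writer succeeds; a second writer holding the stale version fails.
print(update_if_unchanged(1, 1, {"balance": 80}))  # True
print(update_if_unchanged(1, 1, {"balance": 60}))  # False -- version moved on
```

This is essentially what Entity Framework’s concurrency tokens do, expressed by hand.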
With a big DB (4,000,000-6,000,000 rows), how is the query speed?
Definitely better than SQL Server (if it involves joins, at least). Check out what Wordnik is doing with Mongo. See this link (search for Wordnik):
Wordnik stores its entire text corpus in MongoDB – 3.5T of data in 20 billion records. The speed to query the corpus was cut to 1/4 the time it took prior to migrating to MongoDB. More about MongoDB at Wordnik…
The database setup you used isn’t ideal for SQL Server. Having GUIDs as clustered indexes is probably causing a lot of IO on your system, because the inserts happen basically randomly across the database pages and cause page splits.
This article by Kimberly L. Tripp (http://www.sqlskills.com/blogs/kimberly/ever-increasing-clustering-key-the-clustered-index-debate-again/) explains how a clustered index should be: unique, narrow, static, and ever-increasing.
With that in mind, I modified your database script to include a RowId column for each table. I made it a bigint identity column and the clustering index. The primary key was kept on your GUID ID columns.
Running 10000 inserts with the ADO.NET style client on my Lenovo W530 with SQL Server 2012 ran between 2.104 seconds and 2.174 seconds. Quite different from the 204 seconds you initially reported and very close to the MongoDB results.
VERY nice. Can’t wait to see Michael’s response to this salient comment!
I am pretty sure it will be similar to the guy above who talked about needing a team of DBAs to rework an entire DB. Imagine having to spend $20k over a month just to get something to perform decently… sounds like maybe the wrong tool is being used, if you ask me.
I think the big draw is the fact that it’s easier to get good results without being an MS SQL DBA or expert. And on top of that, it’s nice that your data store can exist on more than just a Windows box.
I am not sure everyone is aware that a SQL Server license is not cheap, and neither are Windows VMs or Windows Server licenses; then let’s say you multiply that cost by 3 because you want some redundancy or load balancing… Linux + Mongo could provide equivalent or better performance at a greatly reduced cost is what I am taking away from this.
Thanks Tim. Totally agree.
Michael, if you totally agree, you should take down this post or put a disclaimer at the top; it is spamming the internet for “Mongo vs SQL Server”.
You are using GUIDs, which are large and not even sequential, causing fragmentation. You are being an internet view hog. NoSQL has legitimate benefits, but don’t stretch the truth and create false impressions.
Actually, SQL Server can do the exact same thing: it can store XML, object graphs, and relational data.
Some RDBMS systems (SQL Server, for example) can store XML or object graphs alongside relational data, and used like this they might be just as performant as MongoDB. Most apps are not written that way because most developers probably don’t realize it is in the box, so to speak, or because they are in a transactional context (dealing with money) or one where data needs to be updated. Telerik’s Sitefinity, running on SQL Server, stores and retrieves JSON in and out of SQL Server, particularly for the custom widgets in its implementation.
re: Tim Burnett on March 28, 2014 at 11:08 am:
You don’t need to be an uber dba to set your primary key to an int and have it auto-increment…
Nice benchmark… I’m currently using MongoDB…
Please, Michael, YOU MUST FIRST STUDY DATABASE MODELING.
I don’t normally call somebody an idiot, but geez, could you have made a more sophomoric argument? Michael holds a PhD, I believe. Prove you are so much more knowledgeable by pointing out the flaws in his model instead of assuming that writing a statement in caps is enough to prove its validity.
You can read Hans Van Eylen’s comment. After that, let’s all happily agree with NPI.
And let me say this: a PhD doesn’t give you an infallible license.
You’re telling everyone that it took over 3 minutes to insert 10,000 rows of data containing 3 columns? If that were the case, SQL Server wouldn’t still be around. It seems like MongoDB is for developers who couldn’t care less about understanding the relational model, stored procedures, and security, and who don’t want to have to go through DBAs to push their non-performing code.
Yes, I am not a SQL Server fanboy but this article is totally biased, sadly.
Most of my own SQL Server woes seem to be related to indexes. We didn’t know how to maintain them, and apparently neither does our application. Also, I think we abuse the relational model… for us it’s normal to regularly add fields to our “main” table. Databases grow and performance takes a holiday, and we end up building additional databases for new data; sharding a project across many databases is horrible. Everything is less than optimal.
Our business needs seem to be realized by MongoDB… flexible “schema” and horizontal scaling. Getting away from MS products is also attractive (as previous posters commented) since we fight our problems by adding more SQL Servers. I’m a .NET programmer so I’m not actually biased against MS!
MongoDB is really compelling but I can’t shake the feeling that our database application and its data model are to blame, not SQL Server.
Thanks for the article Michael, I found this via google when looking for a cheaper solution for certain data storage.
My opinion on the comment stream is that each database type has its uses. The solution is simple: use NoSQL for fast data inserts and retrieval on the web (for example, a product catalog) and use a relational SQL database where atomic transactions really matter (for example, payment gateways and bank transactions).
What about a database that isn’t constantly being written to? I am currently fighting with an SQL 2008 DB with three columns (1PK, 2IX) and over 4 BILLION rows. All this database does is get queried through a web app, after initial creation it doesn’t really grow. MongoDB still good for this? These databases I am working on will grow to nearly 8TB.
I suppose it depends on the type of queries you’ll run. But given that you can horizontally scale MongoDB to many servers in a cluster and spread the calls across those (and execute the queries in parallel on them), generally I’d say yes. But running a cluster of servers can be more work so just be aware of that.
poor programming == poor performance.