Document: WM-045 P. Webb
Category: Tutorial 2020.01.28
Migrating from MongoDB to RethinkDB
Abstract
Thank me later
Body
RethinkDB, seemingly on life support for quite some time, is seeing a
revival[1] of sorts. As such, I thought it prudent to make available
evergreen content for my favorite database these days. If you are
interested in trying RethinkDB you can check out these[2] two[3]
tutorials (my guide will not cover installation or setup).
1. Preparing MongoDB exports
# Command
mongoexport --port PORT_NUMBER --db DATABASE_NAME --collection COLLECTION_NAME --out COLLECTION_NAME-`date "+%Y-%m-%d"`.json --pretty --jsonArray
# Example
mongoexport --port 27017 --db dawebb --collection users --out users-`date "+%Y-%m-%d"`.json --pretty --jsonArray
There’s a bit to unpack here so I’ll break it down. Keep in mind
that all the parameters yelling at you are placeholders (for you
to replace with your own parameters).
Actually, the placeholders are self-explanatory but the second
half of the command is interesting.
COLLECTION_NAME-`date "+%Y-%m-%d"`.json makes it so the
exported collection looks like users-2020-01-24.json, with the
date being whenever you ran the above command (the backticks tell
the shell to run date and drop its output into the filename).
Super nifty for backups too.
The --pretty flag isn't necessary for the import into RethinkDB
to work; it's there so *you* can inspect the export if you need to.
The last flag[4], --jsonArray, is the most important. For some
reason, MongoDB exports each item in a collection as its own
object *not* separated by commas. Maybe MongoDB's import process
doesn't choke on malformed JSON but everything else does.
--jsonArray puts the contents of the export into a single JSON
array. Like you'd expect by default…maybe that's just me.
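To see why that flag matters, here's a tiny sketch in plain JavaScript. The two strings are made-up stand-ins for what an export looks like without and with --jsonArray; they are not real mongoexport output.

```javascript
// Two documents written one after another, no commas, no brackets
// (made-up sample standing in for an export without --jsonArray).
const concatenated = '{"name":"a"}\n{"name":"b"}';

// The same two documents wrapped in a single JSON array
// (made-up sample standing in for an export with --jsonArray).
const asArray = '[{"name":"a"},{"name":"b"}]';

// Wrap JSON.parse so a failure is reported instead of thrown.
function tryParse(text) {
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch (error) {
    return { ok: false, value: null };
  }
}

const withoutFlag = tryParse(concatenated);
const withFlag = tryParse(asArray);

console.log(withoutFlag.ok); // false: a standard JSON parser rejects it
console.log(withFlag.ok, withFlag.value.length); // true 2
```

Anything that reads the file with a regular JSON parser (like a cleanup script) will hit the first case unless the export is a single array.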
NOTE: --out is the destination path so if you haven't prefaced
COLLECTION_NAME-`date "+%Y-%m-%d"`.json with a path, the
export will land in whatever directory you ran the command from
(your home directory, if you just logged in).
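If you have more than a couple of collections, typing that command over and over gets old. Here's a rough sketch of a helper that builds the command string for each collection. The helper names, the /tmp/exports directory, and the fixed date are all made up for illustration; only the flags come from the command above. Note that it stamps the date in UTC, whereas the shell's date uses your local time zone.

```javascript
// Hypothetical helper: YYYY-MM-DD in UTC, e.g. 2020-01-24.
function datestamp(date) {
  return date.toISOString().slice(0, 10);
}

// Hypothetical helper: builds one mongoexport command string.
// Only flags that appear in the command above are used.
function buildExportCommand({ port, db, collection, outDir, date }) {
  const file = `${outDir}/${collection}-${datestamp(date)}.json`;
  return [
    "mongoexport",
    "--port", String(port),
    "--db", db,
    "--collection", collection,
    "--out", file,
    "--pretty",
    "--jsonArray",
  ].join(" ");
}

// Assumed values: port, output directory, and date are placeholders.
const exportDate = new Date(Date.UTC(2020, 0, 24));
const commands = ["users", "posts"].map((collection) =>
  buildExportCommand({
    port: 27017,
    db: "dawebb",
    collection,
    outDir: "/tmp/exports",
    date: exportDate,
  })
);

console.log(commands.join("\n"));
```

Passing an explicit output directory also sidesteps the question of where the export ends up.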
Anyhoo once you've exported the collections you care about, SFTP
into that server to grab them and place them on your Desktop so
you don't have a brain fart and forget where you put them
moments later.
2. Migrating, phase 01
MongoDB comes with some oddities that you may not want in your new
database. Notably, how it deals with IDs. Here's an example:
<!--CODE:BLOCK:1-->
In RethinkDB the primary key is simply id, and you have no need
for __v, so you probably don't want these values in your shiny
new database.
Also, you may have decided to use this migration period to switch
up your schema. Combine nameFirst with nameLast? Drop plan?
Update timezone? Replace createdAt with created? Regardless,
you're gonna need to do a bit of legwork to clean your
MongoDB export(s).
The entire script I use is hosted here[5] but I'll point out some
relevant pieces.
If you have any fields with dates/milliseconds, your import will
fail unless you wrap those fields in new Date like so:
<!--CODE:BLOCK:2-->
To reuse the IDs that were generated in MongoDB for usage in
RethinkDB, you're gonna need to do something like this:
<!--CODE:BLOCK:3-->
You'll also need to make sure to explicitly select the fields you
want to transfer into your new export. The gist linked above
should answer any remaining questions you may have.
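To make "explicitly select the fields" a bit more concrete, here's a minimal sketch of a cleanup function. It is not the code from the gist. The field names are borrowed from the examples in this post, and the record shape is an assumption: I'm assuming _id arrives either as a plain string or as an object with a $oid key, and that createdAt is a millisecond timestamp. The sample record is made up.

```javascript
// Sketch only: keeps the fields we want and drops everything else.
function cleanUser(doc) {
  // Assumption: _id is either a string or { "$oid": "..." }.
  const id =
    typeof doc._id === "object" && doc._id !== null ? doc._id.$oid : doc._id;

  // Fields not listed below (__v, plan, ...) never make it into the result.
  return {
    id,
    name: `${doc.nameFirst} ${doc.nameLast}`.trim(),
    // Assumption: createdAt is a millisecond timestamp.
    created: new Date(doc.createdAt),
  };
}

// Made-up sample record, not real data.
const exported = [
  {
    _id: { $oid: "5e2a1f0c9d3b2a0017a1b2c3" },
    __v: 0,
    nameFirst: "Ada",
    nameLast: "Lovelace",
    plan: "free",
    createdAt: 1579824000000,
  },
];

const cleaned = exported.map(cleanUser);

// JSON.stringify turns each Date into an ISO 8601 string.
console.log(JSON.stringify(cleaned, null, 2));
```

Writing the stringified result to a new file gives you something to import without touching the original export.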
3. Importing into RethinkDB
Even though you've already installed RethinkDB, you need to
install the Python driver[6] as well because the import tool
depends on it (at least I had to do this on macOS).
Also, make sure you are importing your newly processed/migrated
data into RethinkDB, not the original nonsense from your MongoDB
export (unless of course, that's your plan).
<!--CODE:BLOCK:4-->
If you don't have a password on your RethinkDB database, you can
safely omit the --password-file flag. Otherwise, make sure the
password file only contains the password. If your IDE
automatically generates new lines in files, just create the
password file with nano.
Make sure you run the above command while RethinkDB is running and
you'll see your freshly created tables.
4. Migrating, phase 02
Alright, we're almost at the finish line!
One of the neat things about RethinkDB (and a feature that
convinced me to make the jump) is its Data Explorer. It's a UI
that allows you to manipulate or check out your tables. There are
just two remaining things we need to do and they're quick and
easy: 1) set up indexes for our tables and 2) update time-based
data to a format RethinkDB likes.
Visit http://localhost:8080 (default port, unless you changed
it) and click on "Data Explorer" in the header. In the text field
you'll be able to perform queries using JavaScript.
4.1. Setting up indexes
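By default id is an index but you may want more. Secondary indexes
don't require unique values; they're for fields you'll look records
up by. Fields with unique values (an email, a slug) tend to be the
obvious candidates, so it's easy to think of which field(s) would
be suitable.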
Sometimes, only the ID would be unique and that’s fine.
// Command
r.db("DATABASE_NAME").table("TABLE_NAME").indexCreate("FIELD_WITH_UNIQUE_VALUE");
// Examples
r.db("dawebb").table("users").indexCreate("email");
r.db("dawebb").table("posts").indexCreate("slug");
4.2. Updating time-based fields
Now let’s update our time-based fields:
// Command
r.db("DATABASE_NAME").table("TABLE_NAME").update({
  created: r.ISO8601(r.row("created")),
  updated: r.ISO8601(r.row("updated"))
});
// Examples
r.db("dawebb").table("users").update({
  created: r.ISO8601(r.row("created")),
  updated: r.ISO8601(r.row("updated"))
});
r.db("dawebb").table("visits").update({
  timestamp: r.ISO8601(r.row("timestamp"))
});
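As far as I can tell, r.ISO8601 wants a string in ISO 8601 format, so a record whose date field is still a number (or missing) is worth catching before you run the update. Here's a rough sketch of a check you could run over your cleaned JSON first. The function name and sample records are made up, and the pattern is deliberately stricter than what RethinkDB itself accepts: it only allows the shape that JavaScript's toISOString() produces, plus numeric time zone offsets.

```javascript
// Strict pattern: date, "T", time, optional fraction, then "Z" or an offset.
const ISO_PATTERN =
  /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})$/;

// Hypothetical helper: returns one entry per field that fails the check.
function findBadDates(records, fields) {
  const problems = [];
  records.forEach((record, index) => {
    for (const field of fields) {
      const value = record[field];
      const valid =
        typeof value === "string" &&
        ISO_PATTERN.test(value) &&
        !Number.isNaN(Date.parse(value));
      if (!valid) {
        problems.push({ index, field, value });
      }
    }
  });
  return problems;
}

// Made-up sample: the second record still has a millisecond timestamp.
const sample = [
  { id: "a", created: "2020-01-24T00:00:00.000Z", updated: "2020-01-25T08:30:00.000Z" },
  { id: "b", created: 1579824000000, updated: "2020-01-25T08:30:00.000Z" },
];

const problems = findBadDates(sample, ["created", "updated"]);
console.log(problems);
```

An empty result means every listed field at least looks like something the update can convert.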
FIN
And there you have it! A super easy guide to move from MongoDB to
RethinkDB. I’ve been using RethinkDB for several months now and I
am way happier than I was with MongoDB. MongoDB is super easy to
get into, but once you get in too deep it becomes an exercise in
frustration to find solutions to ambiguous errors, and the MongoDB
docs are not user-friendly.
Contrast that with RethinkDB’s Data Explorer, clear error
messages, and clean documentation and it’s not difficult to
imagine why I’d make the switch. 🕸
P.S. New year, new projects[7], and now I feel like I need a new
design for this blog. And then I remembered that first I need to
create a personal API[8] so this blog can just become the
presentation layer for the content.
2020.01.30 update
Another reason to migrate is the license of MongoDB: SSPL vs.
Apache 2 of RethinkDB.
— af[9]
For others who may not know what SSPL[10] entails (like me until I
read the linked post):
Basically, SSPL means one cannot offer MongoDB as a hosted
service without releasing the source of the whole service stack
under the same license. That makes sense from their end as they
offer hosting.
However, it’s a bit of a punk move because they are preventing
potential competition from forcing them to improve
their product. 🕸