Mongo DB

Airbyte's certified MongoDB connector offers the following features:

Quick Start

This section provides information about configuring the MongoDB V2 source connector. If you are upgrading from a previous version of the MongoDB V2 source connector, please refer to the upgrade instructions in this document.

New Installation/New Source Connector Configuration

Here is an outline of the minimum required steps to configure a new MongoDB V2 source connector:

  1. Create or discover the configuration of a MongoDB replica set, either hosted in MongoDB Atlas or self-hosted.
  2. Create a new MongoDB source in the Airbyte UI.
  3. (Airbyte Cloud only) Allow inbound traffic from Airbyte IPs.

Once this is complete, you will be able to select MongoDB as a source for replicating data.

Step 1: Create a dedicated read-only MongoDB user

These steps create a dedicated, read-only user for replicating data. Alternatively, you can use an existing MongoDB user with access to the database.

MongoDB Atlas
  1. Log in to the MongoDB Atlas dashboard.
  2. From the dashboard, click on "Database Access" under "Security".
  3. Click on the "+ ADD NEW DATABASE USER" button.
  4. On the "Add new Database User" modal dialog, choose "Password" for the "Authentication Method".
  5. In the "Password Authentication" section, set the username to READ_ONLY_USER in the first text box and set a password in the second text box.
  6. Under "Database User Privileges", click on "Select one built-in role for this user" under "Built-in Role" and choose "Only read any database".
  7. Enable "Restrict Access to Specific Clusters/Federated Database instances" and enable only those clusters/databases that you wish to replicate.
  8. Click on "Add User" at the bottom to save the user.

Self Hosted

These instructions assume that the MongoDB shell is installed. To install the MongoDB shell, please follow these instructions.

  1. From a terminal window, launch the MongoDB shell:
       > mongosh <connection string to cluster> --username <user with admin permissions>
  2. Switch to the admin database:
       test> use admin
       switched to db admin
  3. Create the READ_ONLY_USER user with the read role:
       admin> db.createUser({user: "READ_ONLY_USER", pwd: "READ_ONLY_PASSWORD", roles: [{role: "read", db: "TARGET_DATABASE"}]})

     Note: Replace READ_ONLY_PASSWORD with a password of your choice and TARGET_DATABASE with the name of the database to be replicated.

  4. Next, enable authentication, if not already enabled. Edit /etc/mongodb.conf (the file may be named /etc/mongod.conf, depending on the installation) and add or update these keys:
       net:
         bindIp: 0.0.0.0

       security:
         authorization: enabled

     Note: Setting the bindIp key to 0.0.0.0 allows connections to the database from any IP address. Setting the security.authorization key to enabled turns on access control, so that only authenticated users can access the database.
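For scripted setups, the same user can also be described as a createUser database command document. The sketch below only builds that document in Python; the helper name build_read_only_user_command is hypothetical, and actually sending the command to the admin database (for example through a driver's generic command call) is deliberately left out.

```python
def build_read_only_user_command(username: str, password: str, target_database: str) -> dict:
    """Build a createUser command equivalent to the db.createUser() call above."""
    if not username or not password or not target_database:
        raise ValueError("username, password and target_database are all required")
    return {
        # The command name has to be the first key of the command document.
        "createUser": username,
        "pwd": password,
        # The built-in "read" role grants read-only access to a single database.
        "roles": [{"role": "read", "db": target_database}],
    }


# Placeholder values, matching the ones used in the shell steps.
command = build_read_only_user_command(
    "READ_ONLY_USER", "READ_ONLY_PASSWORD", "TARGET_DATABASE"
)
```

Keeping the role list limited to read on the target database mirrors the shell steps: the user can read the data to be replicated and nothing else.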

Step 2: Discover the MongoDB cluster connection string

These steps outline how to discover the connection string of your MongoDB instance.

MongoDB Atlas

Atlas is MongoDB's cloud-hosted offering. Below are the steps to discover the connection configuration for a MongoDB Atlas-hosted replica set cluster:

  1. Log in to the MongoDB Atlas dashboard.
  2. From the dashboard, click on the "Connect" button of the source cluster.
  3. On the "Connect to <cluster name>" modal dialog, select "Shell" under the "Access your data through tools" section.
  4. Copy the connection string from the entry labeled "2. Run your connection string in your command line" on the modal dialog, without the surrounding quotation marks.

Self Hosted Cluster

Self-hosted clusters are MongoDB instances that are hosted outside of MongoDB Atlas. Below are the steps to discover the connection string for a MongoDB self-hosted replica set cluster.

  1. Refer to the MongoDB connection string documentation for instructions on discovering a self-hosted deployment connection string.
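As a rough illustration of the standard connection string format, the sketch below assembles a replica set URI from its parts. The host names, the replica set name rs0 and the helper build_connection_string are placeholders rather than values from this guide; replicaSet and tls are standard MongoDB URI options. Credentials are left out because the connector takes the username and password as separate parameters.

```python
from urllib.parse import urlencode


def build_connection_string(hosts: list[str], replica_set: str, tls: bool = True) -> str:
    """Assemble a mongodb:// URI that lists every replica set member."""
    if not hosts:
        raise ValueError("at least one host:port pair is required")
    # urlencode keeps the insertion order of the options and escapes their values.
    options = {"replicaSet": replica_set, "tls": "true" if tls else "false"}
    return f"mongodb://{','.join(hosts)}/?{urlencode(options)}"


# Hypothetical hosts; replace with the members of your own replica set.
uri = build_connection_string(
    ["db0.example.com:27017", "db1.example.com:27017"], "rs0"
)
print(uri)  # mongodb://db0.example.com:27017,db1.example.com:27017/?replicaSet=rs0&tls=true
```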

Step 3: Configure the Airbyte MongoDB Source

To configure the Airbyte MongoDB source, use the database credentials and connection string from steps 1 and 2, respectively. The source will test the connection to the MongoDB instance upon creation.

Upgrade From Previous Version

Caution: The 1.0.0 version of the MongoDB V2 source connector contains breaking changes from previous versions of the connector.

The quickest upgrade path is to click upgrade on any out-of-date connection in the UI. These connections will display the following message banner:

Action Required There is a pending upgrade for MongoDB.

Version 1.0.0: We advise against upgrading until you have run a test upgrade as outlined here. This version brings a host of updates to the MongoDB source connector, significantly increasing its scalability and reliability, especially for large collections. As of this version with checkpointing, CDC incremental updates and improved schema discovery, this connector is also now certified. Selecting Upgrade will upgrade all connections using this source, require you to reconfigure the source, then run a full reset on all of your connections.

Upgrade MongoDB by Dec 1, 2023 to continue syncing with this source. For more information, see this guide.

After upgrading to the latest version of the MongoDB V2 source connector, users will be required to manually re-configure existing MongoDB V2 source connector configurations. The required configuration parameter values can be discovered using the quick start steps in this documentation.

Replication Methods

The MongoDB source utilizes change data capture (CDC) as a reliable way to keep your data up to date.

CDC

Airbyte uses the change streams feature of a MongoDB replica set to incrementally capture inserts, updates and deletes using a replication plugin. To learn more about how Airbyte implements CDC, refer to Change Data Capture (CDC).
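To make the idea concrete, the sketch below replays a list of change events onto an in-memory copy of a collection. It is a simplified illustration and not Airbyte's implementation: the events are hand-written dictionaries that follow the shape of MongoDB change stream events (operationType, documentKey, fullDocument, updateDescription), only top-level field updates are handled, and apply_change_event is a hypothetical helper.

```python
def apply_change_event(collection: dict, event: dict) -> None:
    """Apply one change event to a dict that maps _id -> document."""
    op = event["operationType"]
    if op not in ("insert", "replace", "update", "delete"):
        return  # other event types (for example "drop" or "invalidate") are ignored here
    doc_id = event["documentKey"]["_id"]
    if op in ("insert", "replace"):
        collection[doc_id] = dict(event["fullDocument"])
    elif op == "update":
        doc = collection.setdefault(doc_id, {"_id": doc_id})
        changes = event["updateDescription"]
        doc.update(changes.get("updatedFields", {}))
        for field in changes.get("removedFields", []):
            doc.pop(field, None)
    else:  # delete
        collection.pop(doc_id, None)


# Hand-written sample events; a real change stream would supply these.
events = [
    {"operationType": "insert", "documentKey": {"_id": 1},
     "fullDocument": {"_id": 1, "name": "a", "tmp": True}},
    {"operationType": "update", "documentKey": {"_id": 1},
     "updateDescription": {"updatedFields": {"name": "b"}, "removedFields": ["tmp"]}},
    {"operationType": "insert", "documentKey": {"_id": 2},
     "fullDocument": {"_id": 2, "name": "c"}},
    {"operationType": "delete", "documentKey": {"_id": 2}},
]

replica = {}
for event in events:
    apply_change_event(replica, event)
# replica is now {1: {"_id": 1, "name": "b"}}
```

Because every insert, update and delete arrives as its own event, the copy converges on the source without rereading the whole collection, which is the property incremental syncs rely on.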

Schema Enforcement

By default, the MongoDB V2 source connector enforces a schema. This means that while setting up a connector, it samples a configurable number of documents and creates a set of fields to sync. From that set of fields, an admin can then deselect specific fields on the Replication screen to filter them out of the sync.
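A minimal sketch of this kind of sampling-based discovery, assuming documents are plain dictionaries: it inspects at most sample_size documents and records every top-level field together with the value types seen. The function discover_fields and its use of random sampling are illustrative assumptions; the connector's actual sampling and type mapping may differ.

```python
import random


def discover_fields(documents: list[dict], sample_size: int = 10_000, seed: int = 0) -> dict[str, set[str]]:
    """Collect the distinct top-level fields (and value types) from a sample of documents."""
    rng = random.Random(seed)
    if len(documents) <= sample_size:
        sample = documents
    else:
        sample = rng.sample(documents, sample_size)
    fields: dict[str, set[str]] = {}
    for doc in sample:
        for name, value in doc.items():
            fields.setdefault(name, set()).add(type(value).__name__)
    return fields


docs = [
    {"_id": 1, "name": "a"},
    {"_id": 2, "name": "b", "age": 30},
    {"_id": 3, "tags": ["x"]},
]
schema = discover_fields(docs)

# Deselecting a field, as an admin would on the Replication screen.
selected = sorted(set(schema) - {"tags"})
```

A field that appears only in documents outside the sample is never discovered, which is why a larger sample size raises the chance of a complete field set at the cost of a slower discovery.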

When the schema enforced option is disabled, MongoDB collections are read in schema-less mode, which doesn't assume that documents share the same structure. This allows for greater flexibility in reading data that is unstructured or varies widely between documents in a single collection. When schema is not enforced, each document generates a record that contains only the following top-level fields:

{
  "_id": <document id>,
  "data": {<a JSON object containing the entire set of fields found in the document>}
}

The contents of data will vary according to the contents of each document read from MongoDB. Unlike in schema-enforced mode, the same field can vary in type between documents. For example, field "xyz" may be a String in one document and a Date in another. As a result, no field will be omitted and no document will be rejected. When schema is not enforced, there is no way to deselect fields, as all fields are read for every document.
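The wrapping can be pictured with a short sketch. The helper to_schemaless_record is hypothetical, and whether data repeats the _id field is an assumption made here for simplicity.

```python
def to_schemaless_record(document: dict) -> dict:
    """Wrap a document in the two top-level fields used when schema is not enforced."""
    # Assumption: "data" carries every field of the source document, including _id.
    return {"_id": document["_id"], "data": dict(document)}


records = [
    to_schemaless_record({"_id": "a1", "xyz": "2023-12-18"}),        # xyz is a string here
    to_schemaless_record({"_id": "a2", "xyz": 1702857600, "n": 1}),  # and a number here
]
```

Both records have the same two top-level keys even though the documents disagree on the type of xyz and on which fields exist, so neither document has to be rejected.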

Limitations & Troubleshooting

  • Only the replica set cluster type is supported.
  • Schema discovery samples documents to collect all distinct top-level fields. The sample size is applied uniformly to all collections discovered in the target database. The approach is modelled after MongoDB Compass sampling and is used for efficiency. By default, 10,000 documents are sampled. This value can be increased up to 100,000 documents to increase the likelihood that all fields will be discovered. The trade-off is time: a higher value makes sampling each collection take longer.
  • When running with Schema Enforced set to false, there is no attempt to discover any schema. See Schema Enforcement for details.
  • TLS/SSL is required by this connector. TLS/SSL is enabled by default for MongoDB Atlas clusters. To enable a TLS/SSL connection for a self-hosted MongoDB instance, please refer to the MongoDB documentation.
  • Views, capped collections and clustered collections are not supported.
  • Empty collections are excluded from schema discovery.
  • Collections with different data types for the values in the _id field among the documents in a collection are not supported. All _id values within the collection must be the same data type.
  • MongoDB's change streams are based on the Replica Set Oplog, which has retention limitations. Syncs that run less frequently than the retention period of the oplog may encounter issues with missing data.
  • Atlas DB clusters are only supported on dedicated tiers, M10 and above. Lower tiers may fail during connection setup.
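The oplog limitation can be checked with simple arithmetic: if the interval between syncs is longer than the time span the oplog currently covers, changes may be purged before they are read. The sketch below assumes you already know the timestamps of the oldest and newest oplog entries (for example from rs.printReplicationInfo() in the MongoDB shell); oplog_window and sync_is_safe are hypothetical helpers, and the 0.5 safety margin is an arbitrary choice.

```python
from datetime import datetime, timedelta


def oplog_window(first_event: datetime, last_event: datetime) -> timedelta:
    """Time span currently covered by the oplog."""
    return last_event - first_event


def sync_is_safe(window: timedelta, sync_interval: timedelta, margin: float = 0.5) -> bool:
    """Return True when the sync interval fits within a fraction (margin) of the oplog window."""
    return sync_interval <= window * margin


# Example values: an oplog that covers exactly 24 hours.
window = oplog_window(datetime(2023, 12, 17, 0, 0), datetime(2023, 12, 18, 0, 0))
print(sync_is_safe(window, timedelta(hours=6)))   # True: 6 h is within half of the 24 h window
print(sync_is_safe(window, timedelta(hours=24)))  # False: the interval is as long as the window
```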

Configuration Parameters

| Parameter Name | Description |
| --- | --- |
| Cluster Type | The type of the MongoDB cluster (MongoDB Atlas replica set or self-hosted replica set). |
| Connection String | The connection string of the source MongoDB cluster. For Atlas hosted clusters, see the quick start guide for steps to find the connection string. For self-hosted clusters, refer to the MongoDB connection string documentation for more information. |
| Database Name | The name of the database that contains the source collection(s) to sync. |
| Username | The username which is used to access the database. Required for MongoDB Atlas clusters. |
| Password | The password associated with this username. Required for MongoDB Atlas clusters. |
| Authentication Source | (MongoDB Atlas clusters only) Specifies the database that the supplied credentials should be validated against. Defaults to admin. See the MongoDB documentation for more details. |
| Schema Enforced | Controls whether schema is discovered and enforced. See discussion in Schema Enforcement. |
| Initial Waiting Time in Seconds (Advanced) | The amount of time the connector will wait when it launches to determine if there is new data to sync or not. Defaults to 300 seconds. Valid range: 120 seconds to 1200 seconds. |
| Size of the queue (Advanced) | The size of the internal queue. This may affect the memory consumption and efficiency of the connector, so change it with care. |
| Discovery Sample Size (Advanced) | The maximum number of documents to sample when attempting to discover the unique fields for a collection. Default is 10,000 with a valid range of 1,000 to 100,000. See the MongoDB sampling method for more details. |

For more information regarding configuration parameters, please see the MongoDB documentation.
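The documented defaults and valid ranges lend themselves to a quick pre-flight check before a source is created. The sketch below is only an illustration: the class MongoSourceSettings and its field names are hypothetical and do not correspond to the connector's actual configuration keys; only the numeric defaults and ranges come from the table above.

```python
from dataclasses import dataclass


@dataclass
class MongoSourceSettings:
    connection_string: str
    database: str
    initial_waiting_seconds: int = 300   # documented default
    discover_sample_size: int = 10_000   # documented default
    schema_enforced: bool = True

    def validate(self) -> list[str]:
        """Return a list of human-readable problems; an empty list means the settings look valid."""
        errors = []
        if not self.connection_string.startswith(("mongodb://", "mongodb+srv://")):
            errors.append("connection_string must start with mongodb:// or mongodb+srv://")
        if not self.database:
            errors.append("database is required")
        if not 120 <= self.initial_waiting_seconds <= 1200:
            errors.append("initial_waiting_seconds must be between 120 and 1200")
        if not 1_000 <= self.discover_sample_size <= 100_000:
            errors.append("discover_sample_size must be between 1,000 and 100,000")
        return errors


ok = MongoSourceSettings("mongodb://db0.example.com:27017/?replicaSet=rs0", "sales")
bad = MongoSourceSettings("mongodb://db0.example.com:27017", "sales",
                          initial_waiting_seconds=60, discover_sample_size=500_000)
print(ok.validate())   # []
print(bad.validate())  # two range errors
```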

Changelog

| Version | Date | Pull Request | Subject |
| --- | --- | --- | --- |
| 1.2.1 | 2023-12-18 | 33549 | Add logging to understand op log size. |
| 1.2.0 | 2023-12-18 | 33438 | Remove LEGACY state flag |
| 1.1.0 | 2023-12-14 | 32328 | Schema less mode in mongodb. |
| 1.0.12 | 2023-12-13 | 33430 | Add more verbose logging. |
| 1.0.11 | 2023-11-28 | 33356 | Support for better debugging tools. |
| 1.0.10 | 2023-11-28 | 32886 | Handle discover phase OOMs |
| 1.0.9 | 2023-11-08 | 32285 | Additional support to read UUIDs |
| 1.0.8 | 2023-11-08 | 32125 | Fix compilation warnings |
| 1.0.7 | 2023-11-07 | 32250 | Add support to read UUIDs. |
| 1.0.6 | 2023-11-06 | 32193 | Adopt java CDK version 0.4.1. |
| 1.0.5 | 2023-10-31 | 32028 | url encode username and password. Handle a case of document update and delete in a single sync. |
| 1.0.3 | 2023-10-19 | 31629 | Allow discover operation use of disk file when an operation goes over max allowed mem |
| 1.0.2 | 2023-10-19 | 31596 | Allow use of temp disk file when an operation goes over max allowed mem |
| 1.0.1 | 2023-10-03 | 31034 | Fix field filtering logic related to nested documents |
| 1.0.0 | 2023-10-03 | 29969 | General availability release using Change Data Capture (CDC) |
| 0.2.5 | 2023-07-27 | 28815 | Revert back to version 0.2.0 |
| 0.2.4 | 2023-07-26 | 28760 | Fix bug preventing some syncs from succeeding when collecting stats |
| 0.2.3 | 2023-07-26 | 28733 | Fix bug preventing syncs from discovering field types |
| 0.2.2 | 2023-07-25 | 28692 | Fix bug preventing statistics retrieval from views |
| 0.2.1 | 2023-07-21 | 28527 | Log server information |
| 0.2.0 | 2023-06-26 | 27737 | License Update: Elv2 |
| 0.1.19 | 2022-10-07 | 17614 | Increased discover performance |
| 0.1.18 | 2022-10-05 | 17590 | Add ability to enforce SSL in MongoDB connector and check logic |
| 0.1.17 | 2022-09-08 | 16401 | Fixed bug with empty strings in fields with aibyte_transform |
| 0.1.16 | 2022-08-18 | 14356 | DB Sources: only show a table can sync incrementally if at least one column can be used as a cursor field |
| 0.1.15 | 2022-06-17 | 13864 | Updated stacktrace format for any trace message errors |
| 0.1.14 | 2022-05-05 | 12428 | JsonSchema: Add properties to fields with type 'object' |
| 0.1.13 | 2022-02-21 | 10276 | Create a custom codec registry to handle DBRef MongoDB objects |
| 0.1.12 | 2022-02-14 | 10256 | (unpublished) Add -XX:+ExitOnOutOfMemoryError JVM option |
| 0.1.11 | 2022-01-10 | 9238 | Return only those collections for which the user has privileges |
| 0.1.10 | 2021-12-30 | 9202 | Update connector fields title/description |
| 0.1.9 | 2021-12-07 | 8491 | Configure 10000 limit doc reading during Discovery step |
| 0.1.8 | 2021-11-29 | 8306 | Added milliseconds for date format for cursor |
| 0.1.7 | 2021-11-22 | 8161 | Updated Performance and updated cursor for timestamp type |
| 0.1.5 | 2021-11-17 | 8046 | Added milliseconds to convert timestamp to datetime format |
| 0.1.4 | 2021-11-15 | 7982 | Updated Performance |
| 0.1.3 | 2021-10-19 | 7160 | Fixed nested document parsing |
| 0.1.2 | 2021-10-07 | 6860 | Added filter to avoid MongoDb system collections |
| 0.1.1 | 2021-09-21 | 6364 | Source MongoDb: added support via TLS/SSL |
| 0.1.0 | 2021-08-30 | 5530 | New source: MongoDb ported to java |