I love sharing my screw-ups with you all. One of my favorite things, right up there with raindrops on roses and whiskers on kittens. It's therapeutic for me and, hopefully, helpful for you in the God-I-hope-I-can-avoid-HIS-idiotic-mistakes way. Win/win.
Thankfully, it happens so often I've got enough material to keep this blog afloat for months. Take our recent Oracle issues at the office.
We have a fairly large Oracle database in one of our remote offices that's home to about 200GB of GIS data. In addition to that production version, we maintain a development copy of the data for, well, development.
Occasionally, we need to move data between the two instances, and my prefered method for doing so involves using the Oracle Datapump utilities (expdp for exporting, impdp for importing). My process looks something like this:
- Dump source database to disk via expdp.
- Copy dump file to destination server.
- Drop the schemas I need to refresh.
- Import the specific schemas via impdp.
I could just export the specific schemas that I'm interested in, but I like to get the full backup just in case there's some object I forgot I needed, etc.
Couple quick explanations for any non-Oracle types out there. First, in Oracle, a "schema" and a "user" are essentially interchangeable terms. Schemas own objects, can have roles/privileges granted to them, etc.
For example, the TestUser schema could be granted the CONNECT role, thereby giving it the ability to, well, connect to the database. The TestUser role might also be granted the ability to create objects like tables, views, etc., so you might find yourself querying the TestUser.TestTable table.
Second, Oracle has an object called a Sequence that doesn't exist in certain other databases (<cough>MS SQL<cough>). Like the name implies, a Sequence is an object you can use to create unique, sequential-ish* values to insert into tables. Something you might insert, for example, into an ID field.
Got it? Everything making sense? Ok, on to My Screw Up.
The other day, I needed to refresh several Schemas in our Dev environment with Production data and, as usual, I did so using those 4 basic steps I mentioned earlier. Export, copy, drop, import. Seemed to work like a champ.
Until, that is, users started playing with Dev again.
Suddenly, they'd receive various contraint violations when trying to insert into tables. In a nutshell, some of the tables had unique constraints on ID fields so a user or application couldn't accidentally insert records with the same ID. The users and applications would get those ID values by using the NextVal function of existing sequences.
Now, as I mentioned, Oracle GUARANTEES that a sequence will never ever give out the same value twice (unless, of course, you went and fiddled with the Sequence by reseeding it, dropping/recreating, etc). And yet, a database that worked fine in Production seemed to be doing just that.
Or rather, it seems that somehow the Sequences and Tables using them were getting out of sync.
The weird thing was that it didn't happen all the time. We had some 2000 sequences in this database, and not all of them were acting like this. Some applications would work fine after one database refresh, and fail completely after another. No rhyme or reason.
We hit our heads against this for days, trying to get to the bottom of it before we figured out what was happening.
Remember how I mentioned that the Production database was pretty large, about 200GB in size? Well, with our hardware, Oracle Data Dump would take about 1 hour to completely export the data. And, because we were lazy, we were performing the export during normal working hours, meaning that users were actively hitting the database during that hour long export process.
Furthermore, after looking at the export logs, we realized that the expdp process dumped out the Sequence data toward the beginning of the process and dumped the table data toward the end.
See what was happening?
Let's say that we started the export at noon. In our mythical database, we've got a TestUser schema with a TestSequence object used to populate the TestId field in TestTable.
Before the export begins, TestSequence's Current Value would be 100, it's NextVal would be 101, and the MAX(TestId) in TestUser.TestTable should be 100.
So the export kicks off and dumps the TestSequence info to disk at around 12:01. It then goes on to dump other objects to disk. Constraints, views, etc.
Now it gets to the tables, maybe 15 minutes later. It starts processing those tables, starting with the largest ones. TestUser.TestTable is a smaller table, so maybe expdp doesn't dump it to disk until 12:45.
Meanwhile, endusers have been actively using the database and inserting rows into TestUser.TestTable, using TestUser.TestSequence to get the TestID data. So maybe now TestSequence's Current Value is up to 105. More importantly, the MAX(TestId) in TestTable is also up to 105 when it's finally written to the dumpfile.
Now, when I copy that dump file over to the destination server and import it, it imports TestSequence with the data it had when it was first exported (CurrVal: 100, NextVal: 101) and the TestTable with the data it had when it was exported (Max(TestID): 105).
See the problem? When someone in Dev next tries to insert into the copy of TestTable, it gets the value 101 from TestSequence. However, that table already had that value inserted. User ends up with an ORA-00001: unique constraint violated message, and I get my lunch interrupted with phone calls.
Simple fix, of course, once we figured out the problem: don't export while users are connected. Either schedule the dump to occur during after hour maintenance windows, or lock everyone out for that hour.
Alternatively, we could have taken a second export of the source database, one that only dumps the Sequence info. Then, after importing the first, full file, import the second sequence-only one.
*Oracle guarantees that Sequence values will be unique, but not necessarily sequential. Has to do with how Oracle handles caching sequence values for different sessions, etc. If you need the values to be sequential, you'll need to write your own function.