This post is also available in: Português
Those days, my production dataguard system suddenly started to receive a weird ORA-01031: insufficient privileges on DGMGRL for any task like "show configuration" or "edit database set state". More weird is that nothing (apparently) was changed on the system. Another troubleshooting saga in my DBA life was about to start..
My broker was configured with user SYS.. how is it possible for user SYS to have a missing privilege?! If the error was related with invalid password, this would be much easier to analyze.
As sometimes DGMGRL is a little bit buggy, first thing I've tried was to remove the configuration and add it back. The remove only happened in one side.. so I had to drop manually the broker files from the standby side to create a new configuration. When I executed the ADD DATABASE command, I received the same boring error: ORA-01031: insufficient privileges. Time to double check all the dataguard setup.
So I started my check list:
- Check if my password file and database SYS user password were aligned. ✅
- Check if all password files were aligned across primary instances and standby. ✅
- Check if REMOTE_LOGIN_PASSWORDFILE was correct. ✅
- Check if AUTHENTICATION_SERVICES on sqlnet.ora was not blocking password file authentication. ✅
- If the static listeners were up and working. ✅
- If the TNS file was aligned across primary and standby instances. ✅
- Tested if SYS connection using @TNS alias worked using sqlplus to primary and standby. ✅
- Tested if SYS connection using @TNS alias worked using dgmgrl to primary and standby. ✅
And everything was working but the ADD DATABASE command returned this damn boring "ORA-01031: insufficient privileges". Again: it was not an authentication problem but a privilege problem with SYS!!
I've decided to open a SR and meanwhile I started debugging DGMGRL. Running the ADD DATABASE with "dgmgrl -debug" didn't give me any info at all.. so next step is to enable trace on the database side.
I've turned on event 1031 on all my instances on both primary and standby:
SQL> alter system set events '1031 trace name errorstack level 10';
However, the ADD COMMAND, even triggering the ORA-01031 error, was not generating anything on the traces! I've turned it off to avoid excessive trace logs on the production daily operation:
SQL> alter system set events '1031 trace name context off';
A teammate also told me to check if the password on pwfile was case sensitive while insensitive withing the database. That was a good shoot.. my SEC_CASE_SENSITIVE_LOGON was set to FALSE and my SYS password had a mix of upper and lower chars. My pwfile was not created with ignorecase=y.. could it be the reason? Unfortunately, no.. I've changed it and realigned the pwfile but the error persisted.
Finally, after +6 hours on the issue, another friend told me about this Doc ID:
Basically this note points some checks I've already tested, like if oracle account is within dba group, etc.. plus a very specific item that surprised me:
NSCD issues Intermittent ora-1031 errors for sysdba OS authentication can occur because of the nscd service that caches information from the /etc/passwd and /etc/group as configured in /etc/nscd.conf, if stopping this service makes the problem go away then please refer to the OS vendor.
And voila! After a service nscd stop/start, the ADD COMMAND succeeded. I could have never imagined this service would impact the SYS privileges.. meanwhile, oracle support was still asking in my sev1 ticket to check the archive gaps between primary and standby...
Have you enjoyed? Please leave a comment or give a 👍!