关于ORA-3136造成数据库宕机的疑惑

mayongzhi 2008-06-23 05:59:22

环境：linux as4 2台 +oracle 10.2.1 +盘柜两个节点的rac (asm+raw)

其中一个节点在持续报（每分钟三次，持续9个小时)
WARNING: inbound connection timed out (ORA-3136)
然后数据库无响应，应用连不上数据库。客户端也连接不上这个节点

当时的日志注意时间点10.11-15左右无响应了：
ARNING: inbound connection timed out (ORA-3136)
Thu Jun 19 10:07:33 2008
WARNING: inbound connection timed out (ORA-3136)
Thu Jun 19 10:08:40 2008
WARNING: inbound connection timed out (ORA-3136)
Thu Jun 19 10:09:24 2008
WARNING: inbound connection timed out (ORA-3136)
Thu Jun 19 10:09:24 2008
WARNING: inbound connection timed out (ORA-3136)
Thu Jun 19 10:11:21 2008
parallel recovery started with 7 processes
Thu Jun 19 10:11:22 2008
WARNING: inbound connection timed out (ORA-3136)
Thu Jun 19 10:11:26 2008
Started redo scan
Thu Jun 19 10:11:48 2008
Completed redo scan
7949 redo blocks read, 877 data blocks need recovery
Thu Jun 19 10:12:03 2008
WARNING: inbound connection timed out (ORA-3136)
Thu Jun 19 10:12:13 2008
WARNING: inbound connection timed out (ORA-3136)
Thu Jun 19 10:12:13 2008
WARNING: inbound connection timed out (ORA-3136)
Thu Jun 19 10:12:33 2008
WARNING: inbound connection timed out (ORA-3136)
Thu Jun 19 10:12:33 2008
WARNING: inbound connection timed out (ORA-3136)
Thu Jun 19 10:13:36 2008
WARNING: inbound connection timed out (ORA-3136)
Thu Jun 19 10:13:36 2008
WARNING: inbound connection timed out (ORA-3136)
Thu Jun 19 10:13:41 2008
Started redo application at
Thread 2: logseq 428, block 350638
Thu Jun 19 10:14:13 2008
Recovery of Online Redo Log: Thread 2 Group 4 Seq 428 Reading mem 0
Mem# 0 errs 0: +ASM_DATA/oradb/onlinelog/group_4.266.652718953
Thu Jun 19 10:14:17 2008
Completed redo application
Thu Jun 19 10:14:19 2008
WARNING: inbound connection timed out (ORA-3136)
Thu Jun 19 10:14:35 2008
WARNING: inbound connection timed out (ORA-3136)
Thu Jun 19 10:15:23 2008
WARNING: inbound connection timed out (ORA-3136)
Thu Jun 19 10:15:23 2008
WARNING: inbound connection timed out (ORA-3136)
Thu Jun 19 10:16:27 2008
Completed instance recovery at
Thread 2: logseq 428, block 358587, scn 5360583084
755 data blocks read, 3186 data blocks written, 7949 redo blocks read
Thu Jun 19 10:17:47 2008
WARNING: inbound connection timed out (ORA-3136)
Thu Jun 19 10:17:49 2008

查看sqlnet.log，持续报如下错:
Fatal NI connect error 12170.

VERSION INFORMATION:
TNS for Linux: Version 10.2.0.1.0 - Production
Oracle Bequeath NT Protocol Adapter for Linux: Version 10.2.0.1.0 - Production
TCP/IP NT Protocol Adapter for Linux: Version 10.2.0.1.0 - Production
Time: 19-JUN-2008 10:08:40
Tracing not turned on.
Tns error struct:
ns main err code: 12535
TNS-12535: TNS:operation timed out
ns secondary err code: 12606
nt main err code: 0
nt secondary err code: 0
nt OS err code: 0
Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=10.1.4.101)(PORT=23001))

在网上查找了一番，都是摘了METALINK的一段话，改为0，如下面：
Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.123)(PORT=58147)) 这是和网络连接相关的一个错误,Metalink上给出了如下的解决方案:
1.set INBOUND_CONNECT_TIMEOUT_ =0 in listener.ora
2. set SQLNET.INBOUND_CONNECT_TIMEOUT = 0 in sqlnet.ora of server.
3. stop and start both listener and database.
4. Now try to connect to DB and observe the behaviour

但没人说是如何产生的，怎样可以重现，改为0的意义（问题是否依然存在，只是不报警了？)

哎，希望高手们有知道的分析下，
还有此故障的一个节点坏了后，为什么另一个节点也不行了呢?

...全文

6034 14 打赏收藏转发到动态举报

写回复

用AI写文章

14 条回复

切换为时间正序

请发表友善的回复…

发表回复

lindejin 2011-12-20

打赏
举报

Well 2011-03-18

打赏
举报

学习了。。

william3033 2009-06-10

打赏
举报

学习。

liuyi8903 2009-06-10

打赏
举报

Set-up client sqlnet tracing and listener tracing, both set to level 16 or support

eg:

CLIENT (SQLNET.ORA)

trace_level_client = 16
trace_file_client = cli
trace_directory_client = /u01/app/oracle/product/9.0.1/network/trace
trace_unique_client = on
trace_timestamp_client = on
trace_filelen_client = 100
trace_fileno_client = 2
log_file_client = cli
log_directory_client = /u01/app/oracle/product/9.0.1/network/log
tnsping.trace_directory = /u01/app/oracle/product/9.0.1/network/trace
tnsping.trace_level = admin

SERVER (SQLNET.ORA)
...
trace_level_server = 16
trace_file_server = svr
trace_directory_server = /u01/app/oracle/product/9.0.1/network/trace
trace_unique_server = on
trace_timestamp_server = on
trace_filelen_server = 100
trace_fileno_server = 2
log_file_server = svr
log_directory_server = /u01/app/oracle/product/9.0.1/network/log

namesctl.trace_level = 16
namesctl.trace_file = namesctl
namesctl.trace_directory = /u01/app/oracle/product/9.0.1/network/trace
namesctl.trace_unique = on

LISTENER (LISTENER.ORA)
...
trace_level_listener = 16
trace_file_listener = listener
trace_directory_listener = /u01/app/oracle/product/9.0.1/network/trace
trace_timestamp_listener = on
trace_filelen_listener = 100
trace_fileno_listener = 2
logging_listener = off
log_directory_listener = /u01/app/oracle/product/9.0.1/network/log
log_file_listener=listener

inthirties 2009-06-10

打赏
举报

这么有规律的连接，很像攻击

inthirties 2009-06-10

打赏
举报

up，问题解决了没有

inthirties 2009-06-05

打赏
举报

是不是有类似Dos的攻击，

==================================================================
Inthirties关注Oracle数据库维护优化，安全，备份，恢复，迁移，故障处理

如果你需要帮助或想和我一起学习的请联系
联系方式QQ：370140387
电子邮件：dba@Inthirties.com
网站: http://www.inthirties.com

manchun 2009-06-04

打赏
举报

[Quote=引用 6 楼 soonwind 的回复:]
我也碰到这个问题，日志如下：

Sun Jun 4 17:42:31 2006
Created guaranteed restore point C090604
Sun Jun 4 19:01:38 2006
*************************************************************
Unable to allocate flashback log of 2041 blocks from
current recovery area of size 4294967296 bytes.
Recovery Writer (RVWR) is stuck until more space is
available in the recovery area.
Unable to w…
[/Quote]

支持，学习了，楼主怎么还不结贴啊？

soonwind 2009-06-04

打赏
举报

我也碰到这个问题，日志如下：

Sun Jun 4 17:42:31 2006
Created guaranteed restore point C090604
Sun Jun 4 19:01:38 2006
*************************************************************
Unable to allocate flashback log of 2041 blocks from
current recovery area of size 4294967296 bytes.
Recovery Writer (RVWR) is stuck until more space is
available in the recovery area.
Unable to write Flashback database log data because the
recovery area is full, presence of a guaranteed
restore point and no reusable flashback logs.
Use ALTER SYSTEM SET db_recovery_file_dest_size command
to add space. DO NOT manually remove flashback log files
to create space.

后来发现是Created guaranteed restore point 这个搞的，原因闪回空间已经不足了。
把ALTER SYSTEM SET db_recovery_file_dest_size=加大。

问题就解决了。

mayongzhi 2008-06-25

打赏
举报

可是其它服务器好好的.而且,在那个时间点确实是alert.log 有异常
Thu Jun 19 10:14:13 2008
Recovery of Online Redo Log: Thread 2 Group 4 Seq 428 Reading mem 0
Mem# 0 errs 0: +ASM_DATA/oradb/onlinelog/group_4.266.652718953