Linux系统内核崩溃如何排查？( 二 ) _系统

这一次一开始自己便换了一个角度去思考问题，去年是基于单台服务器去分析内核日志错误信息，已经没有什么好的方式了。所以准备同时分析所有出错服务器的日志（幸好以前找运维要了所有出错服务器的内核日志并且保存下来了，不然怎么死的都不知道），找出他们的共同点。首先他们的共同点就是出现了trace子系统打印的警告信息“Delta way too big！… 。。”的信息，但是根据相关信息，这个是不会导致linux系统挂起的。而且确实我们线上的服务器并不是全部都ssh不上去，不过还是在RedHat官方网站找到类似的bug（url：
https： //access.redhat.com/knowledge/solutions/70051），并且给出了解决方案。bug信息和解决方案如下：
why kernel is throwing “Delta way too big” out with
WARNING： at kernel trace ring_buffer，c:1988 rb_reserve_next_event+0x2ce/0×370 messages
0 Issue kernel is throwing ”Delta way too big” out with kernel oops on server
Environment（环境）
•Red Hat Enterprise Linux 6 service pack 1
Resolution（解决方案）
The warning ”Delta way too big” warning might appear on a system with unstable shed clock right after the system is resumed and tracingwas enabled during the suspend.
Since it’s not realy bug，and the unstable sched clock is working fast and reliable otherwise，We suggested to keep using the sched clock in any case and just to make note in the warning itself.or disables tracing by #echo 0 》 /sys/kernel/debug/tracing/tracing_on
Root Cause（根本原因） this case was ftrace involved ftrace due to teh call to ftrace_raw_event_power_end （debugfs is mounted and ftrace loaded in this case），they are to do with problems calculating a time stamp for a ring buffer entry.
Message comes from here and appears to be indicating problems with time stability.
1966 static int
1967 rb_add_time_stamp（struct ring_buffer_per_cpu *cpu_buffer，
1968 u64 *ts，u64 *delta）
1969 {
1970 struct ring_buffer_event *event;
1971 static int once;
1972 int ret;
1973
1974 if （unlikely（*delta 》（1ULL 《《 59） && ！once++）） {
1975 int local_clock_stable = 1;
1976 #ifdef CONFIG_HAVE_UNSTABLE_SCHED_CLOCK
1977 local_clock_stable = sched_clock_stable;
1978 #endif
1979 printk（KERN_WARNING ”Delta way too big！ %llu”
1980 “ ts=%llu write stamp = %llu\n%s”，
1981 （unsigned long long）*delta，
1982 （unsigned long long）*ts，
1983 （unsigned long long）cpu_buffer-》write_stamp，
1984 local_clock_stable ？ ”“ ：
1985 “If you just came from a suspend/resume，\n”
1986 “please switch to the trace global clock：\n”
1987 ” echo global 》 /sys/kernel/debug/tracing/trace_clock\n”）;
1988 WARN_ON（1）;
This called from rb_reserve_next_event（） here.
2122 /*
2123 * Only the first commit can update the timestamp.
2124 * Yes there is a race here. If an interrupt comes in
2125 * just after the conditional and it traces too，then it
2126 * will also check the deltas. More than one timestamp may
2127 * also be made. But only the entry that did the actual
2128 * commit will be something other than zero.
2129 */
2130 if （likely（cpu_buffer-》tail_page == cpu_buffer-》commit_page &&
2131 rb_page_write（cpu_buffer-》tail_page） ==
2132 rb_commit_index（cpu_buffer））） {
2133 u64 diff;
2134
2135 diff = ts - cpu_buffer-》write_stamp;
2136
2137 /* make sure this diff is calculated here */
2138 barrier（）;
2139
2140 /* Did the write stamp get updated already？ */
2141if （unlikely（ts 《 cpu_buffer-》write_stamp））

Linux系统内核崩溃如何排查？( 二 )

推荐阅读

独生子女高考加多少分，广西农村独生子女高考加多少分

蚂蚁腰是多少厘米

重庆公租房有燃气灶吗

vv7浦东什么地方有4s 长城vv54s店

LDPE吹桶可以吗

电磁炉的常见故障有哪些电磁炉常见故障有什么

单方面提出离婚应该是怎么申请

什么颜色的围巾百搭有图片

香葱品种有哪几种？各有什么特点

马鞭草有香味吗

干贝煮10分钟可以吃吗

如何修改qq密码？改密码的方法是什么？

自动挡开车启动步骤

那一刻我真的很感动那一刻我感动了

固安房价为什么这么贵，听说现在北京人都去固安买房了是真的么

牙签发射器,750克白萝卜牙签