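Linux Operations in Practice: 10 Common Problems and Their Solutions

As a senior operations engineer, I have collected the Linux system problems I run into most often in day-to-day work. They span system performance, disk management, network configuration, and other core areas. Each problem comes with diagnostic steps and more than one way to resolve it.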

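1. High System Load and Slow Response

Symptoms

• System response time is noticeably longer
• User operations feel sluggish
• Applications start slowly

Diagnostic steps

# Check system load
uptime
top -c
htop
# Check CPU usage
vmstat 1 5
iostat -x 1 5
# Check memory usage
free -h
cat /proc/meminfo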

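Solutions

Immediate relief:

# Find the processes using the most CPU
ps aux --sort=-%cpu | head -10
# Kill a runaway process (use with caution; try a plain kill before kill -9)
kill PID
kill -9 PID
# Drop the page cache (use with caution)
sync
echo 3 > /proc/sys/vm/drop_caches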

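Long-term optimization:

• Adjust process priority: nice -n 10 command (or renice -n 10 -p PID for a process that is already running)
• Tune kernel parameters: edit /etc/sysctl.conf and apply the changes with sysctl -p
• Add hardware resources or optimize the application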
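
The load averages printed by uptime are easier to interpret relative to the number of CPU cores. The following is a minimal sketch of that calculation; the function name load_per_core is made up for illustration, and the common rule of thumb that a sustained value above 1.0 per core deserves a closer look is only a rough guide, because the Linux load average also counts tasks waiting on I/O.

```shell
# load_per_core LOAD CORES
# Prints LOAD divided by CORES with two decimals.
# Returns non-zero if CORES is not a positive number.
load_per_core() {
    awk -v load="$1" -v cores="$2" 'BEGIN {
        if (cores <= 0) exit 1
        printf "%.2f\n", load / cores
    }'
}

# Example on a live system (1-minute load average, online CPU count):
#   load_per_core "$(cut -d" " -f1 /proc/loadavg)" "$(nproc)"
```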

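2. Running Out of Disk Space

Symptoms

• The system reports "No space left on device"
• New files cannot be created
• Applications exit unexpectedly

Diagnostic steps

# Check disk usage
df -h
du -sh /*
du -sh /var/log/*
# Find large files (-xdev keeps find on the root filesystem and out of /proc)
find / -xdev -type f -size +100M -exec ls -lh {} \;
find /var/log -name "*.log" -size +50M
# Check inode usage
df -i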

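Solutions

Immediate cleanup:

# Clean up system logs
journalctl --vacuum-time=7d
logrotate -f /etc/logrotate.conf
# Remove temporary files (careful: running programs may still be using them)
rm -rf /tmp/*
rm -rf /var/tmp/*
# Clear the package cache
apt clean      # Ubuntu/Debian
yum clean all  # CentOS/RHEL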

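Prevention:

# Set up log rotation ("user" and "group" stand for the log file's owner)
cat > /etc/logrotate.d/custom << EOF
/var/log/application.log {
    daily
    rotate 7
    compress
    delaycompress
    missingok
    notifempty
    create 644 user group
}
EOF
# Set up disk monitoring (admin@example.com is a placeholder address)
# \$5+0 converts "85%" to a number; comparing the raw field would be a string comparison
echo "*/10 * * * * root df -hP | awk 'NR > 1 && \$5+0 > 80 {print \$0}' | mail -s 'Disk Usage Alert' admin@example.com" >> /etc/crontab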
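
The threshold check in the cron entry is easier to test when it is pulled out into a function. Below is a small sketch of that idea; disks_over_threshold is a hypothetical name, and the function assumes df -P style input with mount points that contain no spaces. It strips the percent sign before comparing, so a value such as 9% is not mistaken for something larger than 80 by a string comparison.

```shell
# disks_over_threshold PERCENT
# Reads `df -P` output on stdin and prints "mountpoint usage" for every
# filesystem whose usage is above PERCENT.
disks_over_threshold() {
    awk -v limit="$1" 'NR > 1 {
        use = $5
        sub(/%$/, "", use)                    # "85%" -> "85"
        if (use + 0 > limit + 0) print $6, $5
    }'
}

# Example on a live system:
#   df -P | disks_over_threshold 80
```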

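3. Network Connectivity Problems

Symptoms

• No access to external networks
• Communication between services fails
• Network latency is too high

Diagnostic steps

# Check network interfaces (ifconfig is the legacy net-tools equivalent)
ip addr show
ifconfig
# Test connectivity
ping -c 4 8.8.8.8
traceroute google.com
mtr google.com
# Check the routing table (route is the legacy net-tools equivalent)
ip route show
route -n
# Check DNS resolution
nslookup google.com
dig google.com
# Check firewall status
iptables -L -n
firewall-cmd --list-all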
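
When working outward from the host, it often helps to ping the default gateway before pinging an external address. As a sketch under the assumption that the routing table has the usual "default via ADDRESS" form, the helper below (default_gateway is an illustrative name) extracts that address from ip route show output.

```shell
# default_gateway
# Reads `ip route show` output on stdin and prints the first default
# gateway address. Returns non-zero if no "default via" route exists.
default_gateway() {
    awk '$1 == "default" && $2 == "via" { print $3; found = 1; exit }
         END { exit !found }'
}

# Example on a live system:
#   gw=$(ip route show | default_gateway) && ping -c 2 "$gw"
```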

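Solutions

Repairing the network configuration:

# Restart the network service
systemctl restart networking  # Ubuntu
systemctl restart network     # CentOS
# Configure an IP address manually (temporary)
ip addr add 192.168.1.100/24 dev eth0
ip route add default via 192.168.1.1
# Change the DNS configuration (may be overwritten if a resolver service manages this file)
echo "nameserver 8.8.8.8" > /etc/resolv.conf
echo "nameserver 8.8.4.4" >> /etc/resolv.conf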

Firewall configuration:

# Allow specific ports
iptables -A INPUT -p tcp --dport 80 -j ACCEPT
iptables -A INPUT -p tcp --dport 443 -j ACCEPT
# Save the firewall rules
iptables-save > /etc/iptables/rules.v4

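4. A Service Fails to Start

Symptoms

• The systemctl start command fails
• The service status shows failed
• The application port is not listening

Diagnostic steps

# Check service status
systemctl status service_name
journalctl -u service_name -n 50
# Check configuration files
systemctl cat service_name
nginx -t               # validate the nginx configuration
apache2ctl configtest  # validate the Apache configuration
# Check whether the port is already in use
netstat -tulpn | grep :80
ss -tulpn | grep :80
lsof -i :80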
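
One caveat with the grep :80 checks is that they also match ports such as 8080. The sketch below shows one way to match the local port exactly; listeners_on_port is a hypothetical helper name, and it assumes the column layout that ss -tulpn normally prints, with the local address and port in the fifth column.

```shell
# listeners_on_port PORT
# Reads `ss -tulpn` output on stdin and prints the lines whose local
# port is exactly PORT (so 80 does not match 8080).
listeners_on_port() {
    awk -v port="$1" 'NR > 1 {
        n = split($5, parts, ":")   # the port is the text after the last colon
        if (parts[n] == port) print
    }'
}

# Example on a live system:
#   ss -tulpn | listeners_on_port 80
```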

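Solutions

Repairing the configuration file:

# Back up the original configuration
cp /etc/nginx/nginx.conf /etc/nginx/nginx.conf.bak
# Test the configuration syntax and reload only if it passes
nginx -t && systemctl reload nginx
# View service dependencies
systemctl list-dependencies service_name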

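Handling permission problems:

# Check file permissions (on Debian/Ubuntu nginx runs as www-data rather than nginx)
ls -la /var/log/nginx/
chown -R nginx:nginx /var/log/nginx/
chmod 755 /var/log/nginx/
# Check SELinux status
getenforce
setsebool -P httpd_can_network_connect 1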

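5. SSH Connection Refused

Symptoms

• "Connection refused" error
• "Permission denied" message
• Connection times out

Diagnostic steps

# Check the SSH service status
systemctl status sshd
ps aux | grep sshd
# Check the effective SSH configuration
sshd -T | grep -E "(port|permitrootlogin|passwordauthentication)"
# Check network connections
netstat -tulpn | grep :22
iptables -L | grep ssh
# View authentication logs (/var/log/secure on RHEL/CentOS)
tail -f /var/log/auth.log
journalctl -u sshd -f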

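Solutions

Repairing the service:

# Restart the SSH service
systemctl restart sshd
# Check the configuration file syntax
sshd -t
# Change the SSH port (if needed; open the new port in the firewall first)
sed -i 's/^#Port 22$/Port 2222/' /etc/ssh/sshd_config
sshd -t && systemctl reload sshd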

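Security hardening:

# Disable root login (matches the line whether or not it is commented out, whatever its current value)
sed -i -E 's/^#?PermitRootLogin.*/PermitRootLogin no/' /etc/ssh/sshd_config
# Enable key-based authentication
sed -i -E 's/^#?PubkeyAuthentication.*/PubkeyAuthentication yes/' /etc/ssh/sshd_config
# Limit failed login attempts
echo "MaxAuthTries 3" >> /etc/ssh/sshd_config
echo "MaxStartups 10:30:60" >> /etc/ssh/sshd_config
# Validate and apply
sshd -t && systemctl reload sshd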
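
Appending with echo adds a duplicate line every time the commands are re-run, and a sed substitution silently does nothing when the expected line is missing. The following is a sketch of an idempotent alternative; set_sshd_option is a hypothetical helper, not part of OpenSSH, and it assumes a simple file without Match blocks (a setting appended at the end of the file would otherwise land inside the last Match block). Run sshd -t on the result before reloading.

```shell
# set_sshd_option FILE KEY VALUE
# Ensures FILE contains exactly one "KEY VALUE" line. The first existing
# "KEY ..." or "#KEY ..." line is replaced in place, later duplicates are
# dropped, and the setting is appended if the key was absent.
# Comment lines with a space after "#" are treated as prose and kept.
set_sshd_option() {
    local file=$1 key=$2 value=$3 tmp status
    tmp=$(mktemp) || return 1
    awk -v key="$key" -v value="$value" '
        BEGIN { want = tolower(key) }
        {
            line = $0
            sub(/^[ \t]*#?/, "", line)      # drop indentation and one leading "#"
            n = split(line, parts, /[ \t]+/)
            if (n > 0 && tolower(parts[1]) == want) {
                if (!done) { print key " " value; done = 1 }
                next                         # skip further lines for the same key
            }
            print
        }
        END { if (!done) print key " " value }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"   # cat keeps the file owner and mode
    status=$?
    rm -f "$tmp"
    return $status
}

# Example (as root):
#   set_sshd_option /etc/ssh/sshd_config PermitRootLogin no
#   sshd -t && systemctl reload sshd
```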

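6. High Memory Usage

Symptoms

• The system responds slowly
• Applications are killed by the OOM killer
• Swap usage is very high

Diagnostic steps

# View memory usage details
free -h
cat /proc/meminfo
vmstat 1 5
# Show the processes consuming the most memory
ps aux --sort=-%mem | head -10
top -o %MEM
# Check swap usage
swapon -s
cat /proc/swaps
# View OOM killer logs
dmesg | grep -i "killed process"
journalctl -k | grep -i "killed process"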

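Solutions

Freeing memory temporarily:

# Flush dirty pages to disk first, then drop the caches
sync
echo 3 > /proc/sys/vm/drop_caches
# Restart the service that is using too much memory
systemctl restart high_memory_service
# Adjust how readily the kernel swaps
echo 10 > /proc/sys/vm/swappiness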

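Long-term optimization:

# Set swappiness permanently
echo "vm.swappiness=10" >> /etc/sysctl.conf
sysctl -p
# Configure application memory limits
# Add to the systemd service file (MemoryMax applies on cgroup v2; MemoryLimit is the older cgroup v1 setting)
[Service]
MemoryLimit=1G
MemoryMax=1G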
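
When deciding whether memory is really short, the MemAvailable figure in /proc/meminfo is usually more telling than the free figure, because page cache that the kernel can reclaim is not truly in use. A minimal sketch for turning it into a percentage follows; the function name mem_available_pct is an assumption made for this example, and it requires a kernel that reports MemAvailable.

```shell
# mem_available_pct
# Reads /proc/meminfo-formatted text on stdin and prints MemAvailable as
# a whole-number percentage of MemTotal (rounded down).
# Returns non-zero if MemTotal is missing.
mem_available_pct() {
    awk '$1 == "MemTotal:"     { total = $2 }
         $1 == "MemAvailable:" { avail = $2 }
         END {
             if (total <= 0) exit 1
             printf "%d\n", avail * 100 / total
         }'
}

# Example on a live system:
#   mem_available_pct < /proc/meminfo
```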

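7. Filesystem Errors

Symptoms

• The filesystem is read-only
• Files are corrupted or missing
• Disk I/O errors

Diagnostic steps

# Check for filesystems mounted read-only ("ro," alone would also match errors=remount-ro)
mount | grep -E '\(ro[,)]'
df -h
# Look for disk errors
dmesg | grep -i error
grep -i error /var/log/messages
# Check disk health
smartctl -a /dev/sda
badblocks -v /dev/sda1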
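
The same read-only check can be done against /proc/mounts by comparing each mount option exactly instead of pattern-matching the whole line. This is a sketch only: readonly_mounts is an illustrative name, and limiting the output to devices under /dev is an assumption made here to skip pseudo-filesystems that are read-only by design. Loop-mounted images such as squashfs will still appear and are normally harmless.

```shell
# readonly_mounts
# Reads /proc/mounts-formatted text on stdin and prints "device mountpoint"
# for every /dev-backed filesystem whose option list contains exactly "ro".
readonly_mounts() {
    awk '$1 ~ /^\/dev\// {
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] == "ro") { print $1, $2; break }
    }'
}

# Example on a live system:
#   readonly_mounts < /proc/mounts
```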

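Solutions

Repairing the filesystem:

# Unmount the filesystem (if possible); do not run a repair on a mounted filesystem
umount /dev/sda1
# Run a filesystem check
fsck -f /dev/sda1
e2fsck -f /dev/sda1   # ext filesystems
xfs_repair /dev/sda1  # XFS
# Force a remount in read-write mode
mount -o remount,rw /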

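Prevention:

# Schedule periodic filesystem checks. Running fsck from cron against mounted
# filesystems risks corruption, so for ext filesystems let the check run at boot
# instead (every 30 mounts or once a month, whichever comes first):
tune2fs -c 30 -i 1m /dev/sda1
# Monitor disk health
smartctl -t short /dev/sda
smartctl -a /dev/sda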

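8. Time Synchronization Problems

Symptoms

• The system clock is inaccurate
• Log timestamps are inconsistent
• Authentication fails

Diagnostic steps

# Show the current time
date
timedatectl status
# Check the NTP service
systemctl status ntp
systemctl status chrony
ntpq -p
# Check the time zone setting
ls -la /etc/localtime
cat /etc/timezone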

Solutions

NTP configuration:

# Install the NTP service
apt install ntp  # Ubuntu
yum install ntp  # CentOS
# Configure NTP servers
cat > /etc/ntp.conf << EOF
server 0.pool.ntp.org iburst
server 1.pool.ntp.org iburst
server 2.pool.ntp.org iburst
EOF
# Enable and start NTP (the unit is named ntpd on CentOS)
systemctl enable ntp
systemctl start ntp

Using timedatectl (recommended):

# Enable NTP
timedatectl set-ntp true
# Set the time zone
timedatectl set-timezone Asia/Shanghai
# Sync the time manually
ntpdate -s time.nist.gov

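9. Zombie and Orphan Processes

Symptoms

• Large numbers of zombie processes on the system
• Processes cannot be terminated normally
• Resources are not released

Diagnostic steps

# List zombie processes
ps aux | awk '$8 ~ /^Z/ {print $0}'
ps -eo pid,stat,comm | awk '$2 ~ /^Z/'
# View the process tree
pstree -p
ps -ef --forest
# View the status of a process
cat /proc/PID/status
ls -la /proc/PID/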

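Solutions

Cleaning up zombie processes:

# A zombie is already dead and cannot be killed; its parent has to reap it.
# Find the parent process and ask it to reap its children
ps -eo pid,ppid,state,comm | awk '$3 == "Z"'
kill -s CHLD parent_pid
# Force-kill the whole process group
kill -9 -- -process_group_id
# Restart the affected service
systemctl restart problematic_service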
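
Because the fix has to happen at the parent, it is useful to see which parents are accumulating zombies. The sketch below groups zombies by parent PID; zombie_parents is a hypothetical helper, and it assumes input in the column order produced by ps -eo pid,ppid,stat,comm.

```shell
# zombie_parents
# Reads `ps -eo pid,ppid,stat,comm` output on stdin and prints
# "PPID COUNT" for each parent that has zombie children, largest count first.
zombie_parents() {
    awk 'NR > 1 && $3 ~ /^Z/ { count[$2]++ }
         END { for (ppid in count) print ppid, count[ppid] }' |
        sort -k2,2nr -k1,1n
}

# Example on a live system:
#   ps -eo pid,ppid,stat,comm | zombie_parents
```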

Prevention:

# Write a script that monitors zombie processes (admin@example.com is a placeholder address)
cat > /usr/local/bin/zombie_monitor.sh << 'EOF'
#!/bin/bash
zombies=$(ps aux | awk '$8 ~ /^Z/ {print $2}' | wc -l)
if [ "$zombies" -gt 10 ]; then
    echo "Found $zombies zombie processes" | mail -s "Zombie process warning" admin@example.com
fi
EOF
chmod +x /usr/local/bin/zombie_monitor.sh
echo "*/5 * * * * root /usr/local/bin/zombie_monitor.sh" >> /etc/crontab

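10. System Security Problems

Symptoms

• Unusual network connections
• Unknown processes running
• System files have been modified

Diagnostic steps

# Check for unusual connections
netstat -antup | grep ESTABLISHED
ss -tuln
# Review login records
last -n 20
lastlog
who -a
# Check for suspicious processes
ps aux | grep -v "\["
lsof -i
find /tmp -type f -executable
# Verify system integrity
rpm -Va     # RHEL/CentOS
debsums -c  # Ubuntu/Debian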

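Solutions

Security hardening:

# Apply system updates
apt update && apt upgrade  # Ubuntu
yum update                 # CentOS
# Configure the firewall (allow SSH before enabling ufw to avoid locking yourself out)
ufw allow ssh              # Ubuntu
ufw enable                 # Ubuntu
firewall-cmd --permanent --add-service=ssh
firewall-cmd --reload      # CentOS
# Install security tools
apt install fail2ban rkhunter chkrootkit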

Monitoring script:

# Create a security monitoring script (admin@example.com is a placeholder address)
cat > /usr/local/bin/security_check.sh << 'EOF'
#!/bin/bash
# Check for unusual logins (NR > 1 skips the lastlog header line)
lastlog | awk 'NR > 1 && $2 !~ /Never/ && $2 !~ /pts/ {print "Unusual login: " $0}'
# Check for suspicious processes
ps aux | awk '$11 ~ /^\[/ {next} $1 == "root" && $11 !~ /^\// {print "Suspicious process: " $0}'
# Check network connections
netstat -antup | awk '$6 == "ESTABLISHED" && $5 !~ /^(127\.|192\.168\.|10\.)/ {print "External connection: " $0}'
EOF
chmod +x /usr/local/bin/security_check.sh
echo "0 */6 * * * root /usr/local/bin/security_check.sh | mail -s 'Security Check Report' admin@example.com" >> /etc/crontab
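
The connection filter in the script treats only 127.x, 192.168.x, and 10.x peers as internal, so peers in the 172.16.0.0/12 private range are reported as external. Below is a small sketch of a fuller check that could be used instead; is_private_ipv4 is a made-up name, it covers only loopback and the three RFC 1918 ranges, and it does not validate that its argument is a well-formed IPv4 address.

```shell
# is_private_ipv4 ADDRESS
# Succeeds if ADDRESS is loopback (127.0.0.0/8) or in an RFC 1918 range:
# 10.0.0.0/8, 172.16.0.0/12, or 192.168.0.0/16.
is_private_ipv4() {
    case $1 in
        10.*|127.*|192.168.*) return 0 ;;
        172.1[6-9].*|172.2[0-9].*|172.3[01].*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example:
#   is_private_ipv4 172.20.1.5 && echo "internal peer"
```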