The Dangers of Default SSH Configurations

Summary

An unexpected discovery of SSH brute-force attempts led me to implement better security measures to protect my VPS, write a Python script that first resolves the geolocations of the attacking IP addresses and then transforms the log file into an Excel file, and finally develop a Power BI report to analyze the log more effectively and better understand the threat actors.

The Unexpected Discovery

It all began when I logged into my VPS (ahmed.ovh) like any other day to run a git pull. Out of curiosity, I decided to run ss, and to my surprise, I noticed another open SSH connection.
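
For reference, something along these lines (my exact invocation may have differed) lists established TCP connections involving the SSH port:

ss -tn state established '( sport = :22 or dport = :22 )'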

Established SSH connection from 199.245.100.193 to my VPS

This was extremely bizarre, as I am the only person who knows the credentials to my VPS. Determined to figure out what was happening, I started investigating.

Investigating the Anomaly

  1. I first executed the who command to see all currently logged-in SSH sessions, but there were no connections other than mine.

  2. Next, I ran the last command to check previously logged-in SSH sessions, but the mysterious IP address wasn't there either.

  3. I then executed sudo journalctl -u ssh | grep "199.245.100.193", and my suspicions were confirmed: this was indeed malicious activity.

Multiple failed brute-force attempts
  4. I executed sudo journalctl -u ssh | grep "Failed" > failed_ssh_attempts to identify whether other IP addresses had attempted to brute-force SSH authentication. The results were shocking:

    1. The failed_ssh_attempts file turned out to be 58 MB in size, containing 541,156 lines (authentication attempts) from 8,007 different IP addresses, all within a 90-day period.

I'm now considering creating an entire dashboard to better analyze the log file as a fun side project 🤷‍♂️ so more on that later.

Investigation Conclusion

So far, I've come to the conclusion that my VPS has been targeted by threat actors from all around the globe (a quick geolocation of a few IP addresses returned results from India, the Netherlands, the USA, and more).

It's very easy to search for servers worldwide with open SSH ports and launch a dictionary attack against them. I believe that's what happened in my case.

That said, I still don't fully understand why the IP 199.245.100.193 appears in the ss output but is absent everywhere else. My best guess is that an ESTABLISHED connection doesn't necessarily mean authentication was successful. This could indicate that, at the time I ran ss, the attacker was actively attempting a brute-force attack.

This was later confirmed: when I executed tail failed_ssh_attempts, the last entry was indeed from that IP address.

With all of this in mind, it's clear that it's finally time to invest some effort into securing my VPS!

Next Step: Securing My VPS

  1. Update SSH Configuration (see the configuration sketch after this list)

    1. Disable root login via SSH

    2. Change the default SSH port to a non-standard one for added security and allow access to it through the firewall

    3. Disable password-based authentication; enforce SSH key-based login only

  2. Install and configure Fail2ban

    1. Enable fail2ban: Automatically block IP addresses that attempt brute-force attacks or other malicious activities

  3. Kill all SSH processes

  4. Make sure the unattended-upgrades package is installed and running
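
For reference, here is a minimal sketch of what these changes could look like; the port number (2222) and the Fail2ban values are illustrative placeholders rather than my exact configuration:

# /etc/ssh/sshd_config (relevant directives; 2222 is a placeholder port)
Port 2222
PermitRootLogin no
PasswordAuthentication no
PubkeyAuthentication yes

# Allow the new port through UFW and restart the SSH service
sudo ufw allow 2222/tcp
sudo systemctl restart ssh

# /etc/fail2ban/jail.local (values are illustrative)
[sshd]
enabled  = true
port     = 2222
maxretry = 5
bantime  = 1h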

And voilà! We can now sit back and watch UFW block potential attackers using tail -f /var/log/ufw.log, all while sipping on our coffee of victory.

We won, we can now chill

Interactive Report and Data Analysis

  1. Download the failed_ssh_attempts file from the VPS to my local machine using SCP (see the example after this list).

    1. Reminder: this file is the output of journalctl -u ssh | grep "Failed password" > failed_ssh_attempts

  2. Convert it into a table format and save it as failed_ssh_attempts.xlsx

  3. Create a new Power BI Report in Power BI Desktop
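
The SCP download in step 1, for example, looks roughly like this (the remote username and path are placeholders for my actual ones); I save the file locally as failed_ssh_attempts.txt, which is the filename the script below expects:

scp ahmed@ahmed.ovh:~/failed_ssh_attempts ./failed_ssh_attempts.txt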

Python Script
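
For context, each line of failed_ssh_attempts looks roughly like the following (the timestamp, PID, username, and IP here are made up):

Jul 14 03:12:45 vps-360cad1e sshd[12345]: Failed password for invalid user admin from 203.0.113.7 port 51234 ssh2

The script below splits each line on fixed markers such as " vps-360cad1e", "Failed ", "for ", and "from " to extract the timestamp, authentication method, username, and attacker IP, then geolocates each IP with geocoder and pycountry.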

import geocoder
import pycountry
import pandas as pd

data = []

with open("failed_ssh_attempts.txt") as file:
    for line in file:
        # Everything before the hostname is the timestamp
        dateAndTime = line.split(" vps-360cad1e")[0]
        # "Failed password ..." or "Failed none ...": which authentication method failed
        passwordAuthOrNoneAuth = line.split("Failed ")[1].split(" ")[0]
        username = line.split("for ")[1]
        if "invalid user" in username:
            username = line.split("for invalid user ")[1]
        username = username.split(" ")[0]
        attackerIp = line.split("from ")[1].split(" ")[0]
        # Resolve the IP to a country code and city, then map the code to a full country name
        attackerGeolocation = geocoder.ip(attackerIp)
        attackerCountry = pycountry.countries.get(alpha_2=attackerGeolocation.country).name
        attackerCity = attackerGeolocation.city

        data.append({
            'dateAndTime': dateAndTime,
            'passwordAuthOrNoneAuth': passwordAuthOrNoneAuth,
            'username': username,
            'attackerIp': attackerIp,
            'attackerCountry': attackerCountry,
            'attackerCity': attackerCity,
        })

df = pd.DataFrame(data)
df.to_excel('failed_ssh_attempts.xlsx', index=True)

The script worked perfectly, but I hit the geocoder rate limit and was only able to resolve about 600 IP addresses. I needed a solution, so initially, I considered rotating between different free proxies. However, I decided to switch to using a VPN instead. Additionally, I had to modify the script to process the log file in chunks. I will also make sure to save the geolocation data of resolved IP addresses to avoid sending duplicate requests for IPs whose locations have already been determined.

import geocoder
import pycountry
import pandas as pd
from itertools import islice
import pickle

data = []
# Set both counters to the number of lines already processed (0 on the first run); islice() skips that many lines
old_last_reached_attempt = last_reached_attempt = 0

# Load previously resolved IPs so duplicate geolocation requests are avoided
try:
    with open("cached_ips.pkl", "rb") as file:
        cached_ips = pickle.load(file)
except FileNotFoundError:
    cached_ips = []

with open("failed_ssh_attempts.txt") as file:
    for i, line in enumerate(islice(file, last_reached_attempt, None)):
        # Progress: absolute line number and rough percentage of the ~541k-line log
        print(i + old_last_reached_attempt, ((i + old_last_reached_attempt) / 541160) * 100)
        dateAndTime = line.split(" vps-360cad1e")[0]
        passwordAuthOrNoneAuth = line.split("Failed ")[1].split(" ")[0]
        username = line.split("for ")[1]
        if "invalid user" in username:
            username = line.split("for invalid user ")[1]
        username = username.split(" ")[0]

        attackerIp = line.split("from ")[1].split(" ")[0]

        try:
            # Reuse the cached geolocation if this IP has already been resolved
            attacker_info = next((entry for entry in cached_ips if entry["ip"] == attackerIp), None)
            if attacker_info:
                print(attacker_info)
                attackerCountry = attacker_info["country"]
                attackerCity = attacker_info["city"]
            else:
                attackerGeolocation = geocoder.ip(attackerIp)
                attackerCountry = pycountry.countries.get(alpha_2=attackerGeolocation.country).name
                attackerCity = attackerGeolocation.city

                cached_ips.append({"ip": attackerIp, "country": attackerCountry, "city": attackerCity})

            data.append({
                'dateAndTime': dateAndTime,
                'passwordAuthOrNoneAuth': passwordAuthOrNoneAuth,
                'username': username,
                'attackerIp': attackerIp,
                'attackerCountry': attackerCountry,
                'attackerCity': attackerCity,
            })

            last_reached_attempt = i + old_last_reached_attempt
            print(last_reached_attempt)
        except Exception:
            # A failed lookup (typically the geocoder rate limit) ends this run; results so far are still saved below
            break

df = pd.DataFrame(data)
df.to_excel(f'failed_ssh_attempts_{last_reached_attempt}.xlsx', index=True)

with open("cached_ips.pkl", "wb") as file:
    pickle.dump(cached_ips, file)

During the first execution of this script, I was able to add 60,013 new entries to an Excel file, roughly a 9,870% increase, all thanks to implementing caching for the resolved IPs.

However, the process isn't perfect. I still need to change the VPN server for each execution whenever I hit the rate limit. Additionally, there is some manual effort involved in running the script and updating the last_reached_attempt value. That said, it's not a big issue since my primary goal is to get the job done.

In the future, I might optimize the process, improve logging, and enhance the script to accept any SSH log file as input. I might even create a public Github repository for it. 🤷‍♂️

After a couple of executions, we finally have all chunks of data, each stored in a separate Excel file. Our next task is to merge them into a single file.

import pandas as pd
import os
import re

# Get the list of all .xlsx files in the current directory
xlsx_files = [file for file in os.listdir() if file.endswith('.xlsx')]
xlsx_files.sort(key=lambda x: int(re.search(r'(\d+)', x).group()))

# Initialize an empty list to store dataframes
dfs = []

# Loop through each .xlsx file
for file in xlsx_files:
    # Read the Excel file
    df = pd.read_excel(file)

    # Remove the first column (index column)
    df = df.iloc[:, 1:]

    # Append the dataframe to the list
    dfs.append(df)

# Concatenate all dataframes vertically (stack them on top of each other)
merged_df = pd.concat(dfs, ignore_index=True)

# Save the merged dataframe to a new Excel file
merged_df.to_excel('merged_failed_ssh_attempts.xlsx', index=False)

print("Files merged successfully into 'merged_failed_ssh_attempts.xlsx'.")

And voilà! All that's left to do for this unexpected project is report creation!

Github Repository

I then created a single Python script that automates all these tasks more efficiently. To execute it, all you need to do is:

python ssh_log_to_excel.py (-i log_file.txt/-g) -o output

Power BI Report

  1. Loaded data into Power BI Desktop and opened my Excel sheet in Power Query.

  2. Promoted the first row as Headers.

  3. I tried to change the dateAndTime column to Datetime, but it returned errors. To resolve this, I:

    1. Split the column by space, which created three columns: Month, Day, and Time.

      1. Assigned the correct data type to each column.

    2. Merged the Month and Day columns, then changed the new column's type to Date.

  4. Added an index column.

  5. Created a date dimension using M Query and assigned the correct data types.

  6. Applied the changes and exited Power Query.

  7. Created a one-to-many relationship between the dateAndTime column in the failedSSHAttempts table and the date column in the dimDate table in the Model view.

  8. I ensured that the months in the dimDate table were sorted by their corresponding month numbers in the Table view.

Sorting Month by Month Number
  9. Now that everything is set up and ready, all that's left is creating the report and finding interesting insights to learn more about these threat actors.

Final Power BI Report

Conclusion

There are some interesting insights to take away from the big picture of this report, especially when pinpointing where most of the attacks come from, which sheds some light on the current state of cybersecurity. One particularly interesting finding is that there were 2,371 attempts using the username ahmed, indicating that threat actors are likely incorporating domain names into their dictionary attacks. Additionally, had I not secured my VPS, we would expect about 220,047 attacks by the end of November, which makes sense given that servers become more discoverable to scanners over time.

This project has been both enjoyable and educational. It highlights the importance of robust security configurations and effective security controls. For example, if I were using a SIEM, I would be notified of such malicious behavior.

ใคใฅใ
