The Dangers of Default SSH Configurations
Summary
An unexpected discovery of SSH brute force attempts led me to implement better security measures to protect my VPS, create a Python script to first, resolve the geolocations of IP addresses and second, transform the log file into an Excel file and, finally, develop a Power BI report to analyze the log file more effectively and better understand the threat actors.
The Unexpected Discovery
It all began when I logged into my VPS (ahmed.ovh) like any other day to run a git pull. Out of curiosity, I decided to run ss, and to my surprise, I noticed another open SSH connection.

This was extremely bizarre, as I am the only person who knows the credentials to my VPS. Determined to figure out what was happening, I started investigating.
Investigating the Anomaly
I first executed the
whocommand to see all currently logged-in SSH sessions, but there were no connections other than mine.Next, I ran the
lastcommand to check previously logged-in SSH sessions, but the mysterious IP address wasn't there either.I then executed
sudo journalctl -u ssh | grep "199.245.100.193"and my suspicions were confirmed, this is indeed a malicious activity.

I Executed
sudo journalctl -u ssh | grep "Failed" > failed_ssh_attemptsto help me identify if other IP addresses had attempted to brute-force SSH Auth. The results were shocking:The
failed_ssh_attemptsfile turned out to be 58MB in size, containing 541,156 lines (authentication attempts) from 8007 different IP addresses, all occurring within a 90-day period.
I'm now considering creating an entire dashboard to better analyze the log file as a fun side project ๐คทโโ๏ธ so more on that later.
Investigation Conclusion
So far, I've come to the conclusion that my VPS has been targeted by users from all around the globe (a quick geolocation of a few IP addresses returned results from India, the Netherlands, the USA, and more).
That said, I still donโt fully understand why the IP 199.245.100.193 appears in the ss output but is absent everywhere else. My best guess is that an ESTABLISHED connection doesnโt necessarily mean authentication was successful. This could indicate that, at the time I ran ss, the attacker was actively attempting a brute-force attack.
With all of this in mind, itโs clear that itโs finally time to invest some efforts into securing my VPS!
Next Step: Securing My VPS
Update SSH Configuration
Disable root login via SSH
Change the default SSH port to a non-standard one for added security and allow access to it through the firewall
Disable password-based authentication; enforce SSH key-based login only
Install and configure Fail2ban
Enable fail2ban: Automatically block IP addresses that attempt brute-force attacks or other malicious activities
Kill all SSH processes
Make sure that I have unattended-upgrades package installed and running
And voilร ! We can now sit back and watch UFW block potential attackers using tail -f /var/log/ufw.log, all while sipping on our coffee of victory.

Interactive Report and Data Analysis
Download the
failed_ssh_attemptsfile from the VPS to my local machine using SCP.Reminder: this file is the output of
journalctl -u ssh | grep "Failed password" > failed_ssh_attempts
Convert it into a table format and saved it as
failed_ssh_attempts.xlsxCreate a new Power BI Report in Power BI Desktop
Python Script
import geocoder
import pycountry
import pandas as pd
data = []
with open("failed_ssh_attempts.txt") as file:
for _, line in enumerate(file.readlines()):
dateAndTime = line.split(" vps-360cad1e")[0]
passwordAuthOrNoneAuth = line.split("Failed ")[1].split(" ")[0]
username = line.split("for ")[1]
if "invalid user" in username:
username = line.split("for invalid user ")[1]
username = username.split(" ")[0]
attackerIp = line.split("from ")[1].split(" ")[0]
attackerGeolocation = geocoder.ip(attackerIp)
attackerCountry = pycountry.countries.get(alpha_2=attackerGeolocation.country).name
attackerCity = attackerGeolocation.city
data.append({
'dateAndTime': dateAndTime,
'passwordAuthOrNoneAuth': passwordAuthOrNoneAuth,
'username': username,
'attackerIp': attackerIp,
'attackerCountry': attackerCountry,
'attackerCity': attackerCity,
})
df = pd.DataFrame(data)
df.to_excel('failed_ssh_attempts.xlsx', index=True)
The script worked perfectly, but I hit the geocoder rate limit and was only able to resolve about 600 IP addresses. I needed a solution, so initially, I considered rotating between different free proxies. However, I decided to switch to using a VPN instead. Additionally, I had to modify the script to process the log file in chunks. I will also make sure to save the geolocation data of resolved IP addresses to avoid sending duplicate requests for IPs whose locations have already been determined.
import geocoder
import pycountry
import pandas as pd
from itertools import islice
import pickle
data = []
old_last_reached_attempt = last_reached_attempt = 0
try:
with open("cached_ips.pkl", "rb") as file:
cached_ips = pickle.load(file)
except FileNotFoundError:
cached_ips = []
with open("failed_ssh_attempts.txt") as file:
for _, line in enumerate(islice(file, last_reached_attempt, None)):
print(_+old_last_reached_attempt, ((_+old_last_reached_attempt)/541160)*100)
dateAndTime = line.split(" vps-360cad1e")[0]
passwordAuthOrNoneAuth = line.split("Failed ")[1].split(" ")[0]
username = line.split("for ")[1]
if "invalid user" in username:
username = line.split("for invalid user ")[1]
username = username.split(" ")[0]
attackerIp = line.split("from ")[1].split(" ")[0]
try:
attacker_info = next((entry for entry in cached_ips if entry["ip"] == attackerIp), None)
if attacker_info:
print(attacker_info)
attackerCountry = attacker_info["country"]
attackerCity = attacker_info["city"]
else:
attackerGeolocation = geocoder.ip(attackerIp)
attackerCountry = pycountry.countries.get(alpha_2=attackerGeolocation.country).name
attackerCity = attackerGeolocation.city
cached_ips.append({"ip": attackerIp, "country": attackerCountry, "city": attackerCity})
data.append({
'dateAndTime': dateAndTime,
'passwordAuthOrNoneAuth': passwordAuthOrNoneAuth,
'username': username,
'attackerIp': attackerIp,
'attackerCountry': attackerCountry,
'attackerCity': attackerCity,
})
last_reached_attempt = _+old_last_reached_attempt
print(last_reached_attempt)
except:
break
df = pd.DataFrame(data)
df.to_excel(f'failed_ssh_attempts_{last_reached_attempt}.xlsx', index=True)
with open("cached_ips.pkl", "wb") as file:
pickle.dump(cached_ips, file)During the first execution of this script, I was able to add 60,013 new entries to an Excel file, that's a ~9870% growth, all thanks to implementing caching for the resolved ips.
However, the process isn't perfect. I still need to change the VPN server for each execution whenever I hit the rate limit. Additionally, there is some manual effort involved in running the script and updating the last_reached_attempt value. That said, it's not a big issue since my primary goal is to get the job done.
In the future, I might optimize the process, improve logging, and enhance the script to accept any SSH log file as input. I might even create a public Github repository for it. ๐คทโโ๏ธ
After a couple of executions, we finally have all chunks of data, each stored in a separate Excel file. Our next task is to merge them into a single file.
import pandas as pd
import os
import re
# Get the list of all .xlsx files in the current directory
xlsx_files = [file for file in os.listdir() if file.endswith('.xlsx')]
xlsx_files.sort(key=lambda x: int(re.search(r'(\d+)', x).group()))
# Initialize an empty list to store dataframes
dfs = []
# Loop through each .xlsx file
for file in xlsx_files:
# Read the Excel file
df = pd.read_excel(file)
# Remove the first column (index column)
df = df.iloc[:, 1:]
# Append the dataframe to the list
dfs.append(df)
# Concatenate all dataframes vertically (stack them on top of each other)
merged_df = pd.concat(dfs, ignore_index=True)
# Save the merged dataframe to a new Excel file
merged_df.to_excel('merged_failed_ssh_attempts.xlsx', index=False)
print("Files merged successfully into 'merged_failed_ssh_attempts.xlsx'.")
And voila! All that's left to do for this unexpected project is report creation!
Github Repository
I then created a single Python script that automates all these tasks more efficiently. To execute it, all you need to do is:
python ssh_log_to_excel.py (-i log_file.txt/-g) -o outputPower BI Report
Loaded data into Power BI Desktop and opened my Excel sheet in Power Query.
Promoted the first row as Headers.
I tried to change the
dateAndTimecolumn toDatetime, but it returned errors. To resolve this, I:Split the column by space, which created three columns: Month, Day, and Time.
Assigned the correct data type to each column.
Merged the Month and Day columns, then changed the new column's type to
Date.
Added an index column.
Created a date dimension using M Query and assigned the correct data types.
Applied the changes and exited Power Query.
Created a one-to-many relationship between the
dateAndTimecolumn in thefailedSSHAttemptstable and thedatecolumn in thedimDatetable in the Model view.I ensured that the months in the
dimDatetable were sorted by their corresponding month numbers in the Table view.

Now, that everything is set and ready. All thatโs left is report creation and finding interesting insights to learn more about these threat actors.

Conclusion
There are some interesting insights to take away from the big picture of this report, especially when pinpointing where most of the attacks are coming from which sheds some light on the current state of cybersecurity. One particularly interesting insight is that there were 2,371 attempts using the username ahmed indicating that threat actors are likely incorporating domain names into their dictionary attacks. Additionally, if I were to not secure my VPS, we'd expect about 220,047 attacks by the end of November which makes sense given the fact that servers become more discoverable by scanners over time.
This project has been both enjoyable and educational. It highlights the importance of robust security configurations and effective security controls. For example, if I were using a SIEM, I would be notified of such malicious behavior.
ใคใฅใ
Last updated
Was this helpful?
