Remaking HaveIBeenPwned

Asphyxia

Owner
Administrator
Apr 25, 2015
1,845
2
2,199
327
So, a lot of sites similar to HaveIBeenPwned exist. I had a go at making my own on a much smaller scale, with only one large breach (Cit0day). Here's the shell history from the server setup:
Code:
apt update
apt upgrade -y
wget localhost
ls
apt install apache2
ls
ls -l
ls -la
ls
cd /
cd /var/www/
ls
cd html/
ls
apt install php
ls
rm index.html
nano index.php
nano index.php
ip a
ip a | grep "inet "
apt install mariadb
apt install mysql_server
apt install mysql
apt install mariadb
apt install mariadb_server
sudo apt update
sudo apt install gnupg
cd /tmp
cd /tmp
wget https://dev.mysql.com/get/mysql-apt-config_0.8.13-1_all.deb
ls
rm mysql-apt-config_0.8.13-1_all.deb
wget https://dev.mysql.com/get/mysql-apt-config_0.8.16-1_all.deb
sudo dpkg -i mysql-apt-config*
sudo apt update
sudo dpkg-reconfigure mysql-apt-config
sudo apt install mysql-server
ls -lah
sudo systemctl status mysql
mysql_secure_installation
mysqladmin -u root -p version
ls
cd /var/www/html
ls
rm index.php
nano index.php
nano index.php
ssh [email protected]
ls
cd /root
ls
mv emails.txt /var/www/html
ls
cd /var/www/
ls
cd html/
ls
time LC_ALL=C grep -Fx -m1 "[email protected]" ./emails.txt
ls
mv emails.txt ../
nano index.php
cd /var/www/html
ls
nano index.php
ls
history
history | nc termbin.com 9999
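
The `time LC_ALL=C grep -Fx -m1 ...` line in that history is the core of the whole lookup, so here is a minimal sketch of what those flags do. The sample addresses and the `has_line` helper are made up for illustration, and `-q` and `--` are additions that are not in the original command.

```shell
# Build a tiny stand-in for emails.txt (hypothetical sample data).
list=$(mktemp)
printf '%s\n' 'alice@example.com' 'bob@exampleXorg' 'xalice@example.com.au' > "$list"

# has_line LINE FILE: exit 0 if FILE contains LINE as an exact, whole line.
#   LC_ALL=C  compare raw bytes instead of locale-aware characters
#   -F        treat the pattern as a fixed string, so "." is not a wildcard
#   -x        match whole lines only, so substrings do not count
#   -m1       stop at the first hit instead of scanning the rest of the file
#   -q        print nothing; only the exit status is used
#   --        end of options, so a pattern starting with "-" is not a flag
has_line() {
    LC_ALL=C grep -Fxq -m1 -- "$1" "$2"
}

has_line 'alice@example.com' "$list" && echo "exact line: found"
has_line 'bob@example.org' "$list" || echo "dot is literal: not found"
has_line 'alice@example' "$list" || echo "substring only: not found"
```

Without `-F`, the second lookup would match `bob@exampleXorg`, because an unescaped `.` in a regular expression matches any character.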

If you wanted closer to real-time search results, you'd probably want to run against something like Azure SQL Database, or perhaps Solr / Lucene / Elasticsearch: something that can index the data and return near-immediate results (milliseconds, not seconds).

You could roll this yourself the open source DIY way, deploy it on AWS, or hell, just build on Azure. Pretty sure HIBP leans heavily on serverless-style functions so it can scale up and down, limiting cost while keeping performance when busy times arise.

Anyways..

http://cit0day.com/ was a side-project (just for fun.. to try to keep peeps safe 'n' sh**)

http://cit0day.com/ .. and if you want to see all the emails that were in the listings (this is very noisy and has buncha extra junk data, but here)

http://cit0day.com/ ...... dump like literally just add the word "dump" after http://cit0day.com/ that is the 2 GB of email data (which has lotta phone #s and other garbage) that were in the cit0day premium leaks.

One could go through this grepping/awking/whatever-the-fu**ing to get valid emails out of this mess of data then potentially use verify commands via SMTP to check which emails are presently existing.

Then.. you have a valid email list ;) aye.

Not to be an a-hole or anything but, if your email is in any of this.. maybe time to just make a new email lol.
 

Asphyxia

Dayum, I almost forgot to drop the structure of the site setup:
/var/www/html is the web root where the site's index page goes, so one directory up (/var/www) I placed an emails.txt file with all the emails stacked one per line. Sitting outside the web root, the raw list isn't served directly.

Inside /var/www/html I have an index.php file:
Code:
<!DOCTYPE html>
<html lang="en">
<head>
  <title>Cit0day search</title>
  <meta charset="utf-8">
  <meta name="viewport" content="width=device-width, initial-scale=1">
  <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.4.1/css/bootstrap.min.css">
  <script src="https://ajax.googleapis.com/ajax/libs/jquery/3.5.1/jquery.min.js"></script>
  <script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.4.1/js/bootstrap.min.js"></script>
</head>
<body>

<div class="container">




<h1>Cit0day</h1>
<p>We make it simple for victims to check their email against the 20,000+ breached databases in the Cit0day collection.</p>

<form action="index.php" method="post">
<input type="text" placeholder="Email" name="Email">
<input type="submit" value="Search">
</form>

<?php
if (isset($_POST['Email'])) {
    $email = $_POST['Email'];

    if (!filter_var($email, FILTER_VALIDATE_EMAIL)) {
        die("Not a valid email address");
    }

    // FILTER_VALIDATE_EMAIL still accepts shell metacharacters such as ` and $
    // in the local part, so quote the value before it reaches the shell.
    // "--" stops grep from reading an address that starts with "-" as an option.
    $output = shell_exec('LC_ALL=C grep -Fx -m1 -- ' . escapeshellarg($email) . ' ../emails.txt');

    // grep prints the matching line on a hit and nothing on a miss,
    // in which case shell_exec() does not return a string.
    if (is_string($output) && trim($output) !== '') {
        echo "Your email address was found in the breach. <b>Change your passwords now.</b>";
    } else {
        echo "Your email address was not found in the premium breach. This tool is free and takes no liability, your security is your own responsibility.";
    }
}
?>
</div>

</body>
</html>
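
Since `index.php` hands the submitted address to a shell, it is worth seeing what the shell does with that string. This is a minimal sketch against a throwaway list in a temp file; `unsafe_lookup` and `safe_lookup` are hypothetical names, not part of the page above. The first mimics pasting the input into a double-quoted command string, the second passes it as a single argument.

```shell
list=$(mktemp)
printf '%s\n' 'alice@example.com' > "$list"

# Builds the whole command as one string, so the inner shell re-parses the
# input and runs any $(...) or backticks it contains.
unsafe_lookup() {
    sh -c "LC_ALL=C grep -Fx -m1 \"$1\" \"$2\""
}

# Passes the input as one argument; it is never re-parsed as shell code.
safe_lookup() {
    LC_ALL=C grep -Fx -m1 -- "$1" "$2"
}

# Harmless stand-in for hostile input: the substitution only echoes a word.
input='$(echo alice)@example.com'

unsafe_lookup "$input" "$list"                      # prints alice@example.com
safe_lookup "$input" "$list" || echo "no match"     # prints no match
```

The unsafe version finds `alice@example.com` even though that is not what was typed, which shows the `$(...)` was executed. In PHP, the usual way to get the second behaviour is to wrap the value in `escapeshellarg()` before it goes into `shell_exec()`.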

I added a bit more sanity checking..

if the address has |, &, $, ;, <, >, a quote or other such characters in it, f that. Nah thanks.. no ' ' (space), no %.. skip skip..............


Code:
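<?php
if (isset($_POST['Email'])) {

    $email = $_POST['Email'];

    if (!filter_var($email, FILTER_VALIDATE_EMAIL)) {
        die("Not a valid email address");
    }

    // strpos() returns the match position, which is 0 for the first
    // character, and 0 == true is false. Compare against false instead
    // so a bad character at the very start is caught too.
    if (strpos($email, '&') !== false) {
        die("Fuck no");;
    }

    if (strpos($email, '$') !== false) {
        die("Fuck no");;
    }

    // Backticks also run commands inside a double-quoted shell string.
    if (strpos($email, '`') !== false) {
        die("Fuck no");;
    }

    if (strpos($email, ';') !== false) {
        die("Fuck no");;
    }

    if (strpos($email, '%') !== false) {
        die("Fuck no");;
    }

    if (strpos($email, ' ') !== false) {
        die("Fuck no");;
    }

    if (strpos($email, '<') !== false) {
        die("Fuck no");;
    }

    if (strpos($email, '>') !== false) {
        die("Fuck no");;
    }

    if (strpos($email, '|') !== false) {
        die("Fuck no");;
    }

    if (strpos($email, "'") !== false) {
        die("Fuck no");;
    }

    if (strpos($email, '"') !== false) {
        die("Fuck no");;
    }

    // ...the grep lookup from index.php continues here, unchanged.
}
?>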

Also, inb4 "y u did not use || instead of all the separate if statements.." dgaf. I also left extra ;'s in there, also dgaf. Tired, going 2 bed.

Also, from a computer science perspective, this could be optimized by keeping more of it in RAM and by splitting/chunking the data: for example, addresses starting with aa through ac go in a[a-c].txt, addresses starting with ad through af go in a[d-f].txt, and so on, so we only grep the one chunk an address could be in. An SSD, possibly with the data loaded in RAM (or even an SQL db), would suffice just fine in my opinion.
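
The chunking idea could look roughly like this. It is a sketch, not what the site runs: it shards on the first character only (coarser than the a[a-c] ranges described above), and `build_shards`, `lookup` and the sample addresses are invented for the example.

```shell
# build_shards LIST DIR: one pass over LIST, writing each line to
# DIR/<key>.txt. <key> is the lowercased first character, with anything
# outside [a-z0-9] mapped to "_" so it is always a safe file name.
build_shards() {
    mkdir -p "$2"
    LC_ALL=C awk -v dir="$2" 'length($0) > 0 {
        key = tolower(substr($0, 1, 1))
        gsub(/[^a-z0-9]/, "_", key)
        print > (dir "/" key ".txt")
    }' "$1"
}

# lookup EMAIL DIR: compute the same key, then grep only that one shard.
lookup() {
    key=$(printf '%s\n' "$1" | LC_ALL=C awk '{
        k = tolower(substr($0, 1, 1))
        gsub(/[^a-z0-9]/, "_", k)
        print k
        exit
    }')
    [ -f "$2/$key.txt" ] && LC_ALL=C grep -Fxq -m1 -- "$1" "$2/$key.txt"
}

list=$(mktemp)
shards=$(mktemp -d)
printf '%s\n' 'alice@example.com' 'Bob@example.org' 'carol@example.net' '+tag@example.com' > "$list"

build_shards "$list" "$shards"
lookup 'Bob@example.org' "$shards" && echo "found"
lookup 'dave@example.com' "$shards" || echo "not found"
```

Only the shard name is case-folded; the match itself stays an exact, case-sensitive whole-line comparison, same as the single-file grep. With 2 GB of data, each lookup then scans a fraction of the list rather than the whole file.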

No sense in over-complicating this sh** if we just want an email checker.. yes/no on this email list. Either way, sorting through 2GB of text file data and kicking a search off.. that's fun! Feel free to contrib and tell us what you would do for optimization(s)
 