
tex: reformat stuff around ImportSchema

surtur 2023-08-20 05:49:41 +02:00
parent 05cbde6ac5
commit 57e3ae4cd3
Signed by: wanderer
SSH Key Fingerprint: SHA256:MdCZyJ2sHLltrLBp0xQO0O1qTW9BT/xl5nXkDvhlMCI


@@ -600,19 +600,19 @@ in its raw form anymore but has to have been morphed into the precise shape the
 application needs for further processing. Once imported, the application can
 query the data at will, as it knows exactly the shape of it.
-This supposes the existence of a \emph{format} for importing, schema of which
-is devised in Section~\ref{sec:localDatasetPlugin}.
+This supposes the existence of a \emph{format} for importing, the schema of
+which is devised in Section~\ref{sec:localDatasetPlugin}.
 \n{3}{Local Dataset Plugin}\label{sec:localDatasetPlugin}
 Unstructured breach data from locally available datasets can be imported into
 the application by first making sure it adheres to the specified schema (have a
-look at the \emph{Breach Data Schema} in Listing~\ref{breachDataGoSchema}). If
-it does not (which is very likely with random breach data, as already mentioned
-in Section~\ref{sec:dataSources}), it needs to be converted to a form that
-\emph{does} before importing it to the application, e.g.\ using a Python script
-or a similar method.
+look at the breach \texttt{ImportSchema} in Listing~\ref{breachImportSchema}).
+If it does not (which is very likely with random breach data, as already
+mentioned in Section~\ref{sec:dataSources}), it needs to be converted to a form
+that \emph{does} before importing it to the application, e.g.\ using a Python
+script or a similar method.
 Attempting to import data that does not follow the outlined schema should
 result in an error. Equally so, importing a dataset which is over a reasonable
@@ -622,15 +622,18 @@ out-of-memory (OOM) situation on the host running the application, assuming
 contemporary consumer hardware conditions (not HPC).
 \vspace{\parskip}
-\begin{lstlisting}[language=Go, caption={Breach Data Schema represented as a Go
-struct with imports from the standard library assumed},
-label=breachDataGoSchema,
+\begin{lstlisting}[language=Go,
+caption={Breach \texttt{ImportSchema} Go struct (imports from the standard
+library assumed)},
+label=breachImportSchema,
 backgroundcolor=\color{lstbg},
-morekeywords={any}
+morekeywords={any,time}
 ]
-type breachDataSchema struct {
+// ImportSchema is the model for importing locally available breach data.
+type ImportSchema struct {
 Name string
-Time time.Time
+Description string
+Date time.Time
 IsVerified bool
 ContainsPasswords bool
 ContainsHashes bool
@@ -639,18 +642,20 @@ morekeywords={any}
 HashPepperred bool
 ContainsUsernames bool
 ContainsEmails bool
-Data any
+Data *Data
 }
 \end{lstlisting}
 \vspace*{-\baselineskip}
-The Go representation shown in Listing~\ref{breachDataGoSchema} will in
+The Go \emph{struct} shown in Listing~\ref{breachImportSchema} will in
 actuality translate to a YAML document written and supplied by an
-administrative user of the program. The YAML format was chosen for several
+administrative user of the program. And while the author is personally not the
+greatest supporter of YAML, however, the format was still chosen for several
 reasons:
 \begin{itemize}
-\item relative ease of use (plain text, readability)
+\item relative ease of use (plain text, readability) for machines and people
+alike
 \item capability to store multiple \emph{documents} inside of a single file
 \item most of the inputs being implicitly typed as strings
 \item support for inclusion of comments
@@ -663,7 +668,8 @@ and written by humans and programs alike.
 \smallskip
 \begin{lstlisting}[style=yaml,
-caption={Example Breach Data Schema supplied to the program as a YAML file,
+caption={A YAML file containing breach data formatted according to the
+\texttt{ImportSchema},
 optionally containing multiple documents},
 label=breachDataYAMLSchema,
 backgroundcolor=\color{lstbg},
@@ -696,11 +702,14 @@ backgroundcolor=\color{lstbg},
 \vspace*{-\baselineskip}
 Notice how the emails list (\texttt{.data/emails}) in
-Listing~\ref{breachDataYAMLSchema} misses one record, perhaps because it was
-not supplied or mistakenly omitted. This is a valid scenario (mistakes happen)
-and the application needs to be able to handle it. The alternative would be to
-require the user to prepare the data in such a way that the empty/partial
-records would be dropped entirely.
+Listing~\ref{breachDataYAMLSchema} is missing one record, perhaps because it
+was mistakenly omitted due to either machine error or unfamiliarity with the
+format. This is a valid scenario (mistakes do happen) and the application needs
+to be account for it. Alternatively, the program could start dropping
+empty/partial records, but that behaviour could quickly lead to unhappy users.
+The golden rule for the program is to \emph{always do the expected thing} (and
+also not being overly smart about it, i.e.\ the simpler program flow is often
+better).
 \n{3}{Have I Been Pwned? Integration}