tex: reformat stuff around ImportSchema

surtur 2023-08-20 05:49:41 +02:00
parent 05cbde6ac5
commit 57e3ae4cd3
Signed by: wanderer
SSH Key Fingerprint: SHA256:MdCZyJ2sHLltrLBp0xQO0O1qTW9BT/xl5nXkDvhlMCI

@ -600,19 +600,19 @@ in its raw form anymore but has to have been morphed into the precise shape the
application needs for further processing. Once imported, the application can
query the data at will, as it knows exactly the shape of it.
This supposes the existence of a \emph{format} for importing, schema of which
is devised in Section~\ref{sec:localDatasetPlugin}.
This supposes the existence of a \emph{format} for importing, the schema of
which is devised in Section~\ref{sec:localDatasetPlugin}.
\n{3}{Local Dataset Plugin}\label{sec:localDatasetPlugin}
Unstructured breach data from locally available datasets can be imported into
the application by first making sure it adheres to the specified schema (have a
look at the \emph{Breach Data Schema} in Listing~\ref{breachDataGoSchema}). If
it does not (which is very likely with random breach data, as already mentioned
in Section~\ref{sec:dataSources}), it needs to be converted to a form that
\emph{does} before importing it to the application, e.g.\ using a Python script
or a similar method.
look at the breach \texttt{ImportSchema} in Listing~\ref{breachImportSchema}).
If it does not (which is very likely with random breach data, as already
mentioned in Section~\ref{sec:dataSources}), it needs to be converted to a form
that \emph{does} before importing it to the application, e.g.\ using a Python
script or a similar method.
Attempting to import data that does not follow the outlined schema should
result in an error. Equally so, importing a dataset which is over a reasonable
@ -622,15 +622,18 @@ out-of-memory (OOM) situation on the host running the application, assuming
contemporary consumer hardware conditions (not HPC).
\vspace{\parskip}
\begin{lstlisting}[language=Go, caption={Breach Data Schema represented as a Go
struct with imports from the standard library assumed},
label=breachDataGoSchema,
\begin{lstlisting}[language=Go,
caption={Breach \texttt{ImportSchema} Go struct (imports from the standard
library assumed)},
label=breachImportSchema,
backgroundcolor=\color{lstbg},
morekeywords={any}
morekeywords={any,time}
]
type breachDataSchema struct {
// ImportSchema is the model for importing locally available breach data.
type ImportSchema struct {
Name string
Time time.Time
Description string
Date time.Time
IsVerified bool
ContainsPasswords bool
ContainsHashes bool
@ -639,18 +642,20 @@ morekeywords={any}
HashPepperred bool
ContainsUsernames bool
ContainsEmails bool
Data any
Data *Data
}
\end{lstlisting}
\vspace*{-\baselineskip}
The Go representation shown in Listing~\ref{breachDataGoSchema} will in
The Go \emph{struct} shown in Listing~\ref{breachImportSchema} will in
actuality translate to a YAML document written and supplied by an
administrative user of the program. The YAML format was chosen for several
administrative user of the program. While the author is personally not the
greatest supporter of YAML, the format was still chosen for several
reasons:
\begin{itemize}
\item relative ease of use (plain text, readability)
\item relative ease of use (plain text, readability) for machines and people
alike
\item capability to store multiple \emph{documents} inside of a single file
\item most of the inputs being implicitly typed as strings
\item support for inclusion of comments
@ -663,7 +668,8 @@ and written by humans and programs alike.
\smallskip
\begin{lstlisting}[style=yaml,
caption={Example Breach Data Schema supplied to the program as a YAML file,
caption={A YAML file containing breach data formatted according to the
\texttt{ImportSchema},
optionally containing multiple documents},
label=breachDataYAMLSchema,
backgroundcolor=\color{lstbg},
@ -696,11 +702,14 @@ backgroundcolor=\color{lstbg},
\vspace*{-\baselineskip}
Notice how the emails list (\texttt{.data/emails}) in
Listing~\ref{breachDataYAMLSchema} misses one record, perhaps because it was
not supplied or mistakenly omitted. This is a valid scenario (mistakes happen)
and the application needs to be able to handle it. The alternative would be to
require the user to prepare the data in such a way that the empty/partial
records would be dropped entirely.
Listing~\ref{breachDataYAMLSchema} is missing one record, perhaps because it
was mistakenly omitted due to either machine error or unfamiliarity with the
format. This is a valid scenario (mistakes do happen) and the application needs
to account for it. Alternatively, the program could drop empty/partial
records, but that behaviour could quickly lead to unhappy users.
The golden rule for the program is to \emph{always do the expected thing} (and
not to be overly smart about it, i.e.\ the simpler program flow is often
better).
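One way to account for a partial dataset without dropping anything is to detect the mismatch and report it, then import the data as supplied. The sketch below assumes a hypothetical \texttt{Data} type holding parallel lists; its field names and the \texttt{isPartial} helper are illustrative and not taken from the application.

```go
package main

import "fmt"

// Data is a hypothetical payload shape assumed for this sketch only.
type Data struct {
	Usernames []string
	Emails    []string
}

// isPartial reports whether the lists differ in length, i.e. whether
// at least one record is missing from one of them. A nil payload is
// treated as empty rather than partial.
func isPartial(d *Data) bool {
	if d == nil {
		return false
	}
	return len(d.Usernames) != len(d.Emails)
}

func main() {
	d := &Data{
		Usernames: []string{"alice", "bob"},
		Emails:    []string{"alice@example.com"}, // one record missing
	}
	if isPartial(d) {
		// Warn and continue; no records are dropped or guessed.
		fmt.Printf("warning: %d usernames but %d emails; importing as-is\n",
			len(d.Usernames), len(d.Emails))
	}
}
```

The sketch deliberately does not try to guess which record is missing, in line with keeping the program flow simple.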
\n{3}{Have I Been Pwned? Integration}