tex: reformat stuff around ImportSchema
parent 05cbde6ac5
commit 57e3ae4cd3
@@ -600,19 +600,19 @@ in its raw form anymore but has to have been morphed into the precise shape the
 application needs for further processing. Once imported, the application can
 query the data at will, as it knows exactly the shape of it.

-This supposes the existence of a \emph{format} for importing, schema of which
-is devised in Section~\ref{sec:localDatasetPlugin}.
+This supposes the existence of a \emph{format} for importing, the schema of
+which is devised in Section~\ref{sec:localDatasetPlugin}.


 \n{3}{Local Dataset Plugin}\label{sec:localDatasetPlugin}

 Unstructured breach data from locally available datasets can be imported into
 the application by first making sure it adheres to the specified schema (have a
-look at the \emph{Breach Data Schema} in Listing~\ref{breachDataGoSchema}). If
-it does not (which is very likely with random breach data, as already mentioned
-in Section~\ref{sec:dataSources}), it needs to be converted to a form that
-\emph{does} before importing it to the application, e.g.\ using a Python script
-or a similar method.
+look at the breach \texttt{ImportSchema} in Listing~\ref{breachImportSchema}).
+If it does not (which is very likely with random breach data, as already
+mentioned in Section~\ref{sec:dataSources}), it needs to be converted to a form
+that \emph{does} before importing it to the application, e.g.\ using a Python
+script or a similar method.

 Attempting to import data that does not follow the outlined schema should
 result in an error. Equally so, importing a dataset which is over a reasonable
@@ -622,15 +622,18 @@ out-of-memory (OOM) situation on the host running the application, assuming
 contemporary consumer hardware conditions (not HPC).

 \vspace{\parskip}
-\begin{lstlisting}[language=Go, caption={Breach Data Schema represented as a Go
-struct with imports from the standard library assumed},
-label=breachDataGoSchema,
+\begin{lstlisting}[language=Go,
+caption={Breach \texttt{ImportSchema} Go struct (imports from the standard
+library assumed)},
+label=breachImportSchema,
 backgroundcolor=\color{lstbg},
-morekeywords={any}
+morekeywords={any,time}
 ]
-type breachDataSchema struct {
+// ImportSchema is the model for importing locally available breach data.
+type ImportSchema struct {
 Name string
-Time time.Time
+Description string
+Date time.Time
 IsVerified bool
 ContainsPasswords bool
 ContainsHashes bool
@@ -639,18 +642,20 @@ morekeywords={any}
 HashPepperred bool
 ContainsUsernames bool
 ContainsEmails bool
-Data any
+Data *Data
 }
 \end{lstlisting}
 \vspace*{-\baselineskip}

-The Go representation shown in Listing~\ref{breachDataGoSchema} will in
+The Go \emph{struct} shown in Listing~\ref{breachImportSchema} will in
 actuality translate to a YAML document written and supplied by an
-administrative user of the program. The YAML format was chosen for several
+administrative user of the program. While the author is personally not the
+greatest supporter of YAML, the format was still chosen for several
 reasons:

 \begin{itemize}
-\item relative ease of use (plain text, readability)
+\item relative ease of use (plain text, readability) for machines and people
+alike
 \item capability to store multiple \emph{documents} inside of a single file
 \item most of the inputs being implicitly typed as strings
 \item support for inclusion of comments
@@ -663,7 +668,8 @@ and written by humans and programs alike.

 \smallskip
 \begin{lstlisting}[style=yaml,
-caption={Example Breach Data Schema supplied to the program as a YAML file,
+caption={A YAML file containing breach data formatted according to the
+\texttt{ImportSchema},
 optionally containing multiple documents},
 label=breachDataYAMLSchema,
 backgroundcolor=\color{lstbg},
@@ -696,11 +702,14 @@ backgroundcolor=\color{lstbg},
 \vspace*{-\baselineskip}

 Notice how the emails list (\texttt{.data/emails}) in
-Listing~\ref{breachDataYAMLSchema} misses one record, perhaps because it was
-not supplied or mistakenly omitted. This is a valid scenario (mistakes happen)
-and the application needs to be able to handle it. The alternative would be to
-require the user to prepare the data in such a way that the empty/partial
-records would be dropped entirely.
+Listing~\ref{breachDataYAMLSchema} is missing one record, perhaps because it
+was mistakenly omitted due to either machine error or unfamiliarity with the
+format. This is a valid scenario (mistakes do happen) and the application needs
+to account for it. Alternatively, the program could start dropping
+empty/partial records, but that behaviour could quickly lead to unhappy users.
+The golden rule for the program is to \emph{always do the expected thing} (and
+not to be overly smart about it, i.e.\ the simpler program flow is often
+better).

 \n{3}{Have I Been Pwned? Integration}
