L4 to Logical English transpiler¶

How to write L4 that translates to Logical English (LE); or, how L4 gets translated to LE¶

The following explains how the L4->LE transpiler translates L4 to LE with examples. If you are in a rush, ignore the explanations and just skim the examples.

This is an intuitive, high-level discussion aimed at someone who wants to understand how L4 gets translated to LE, so that they can more effectively model law formally using the Logical-English-y fragment of L4. It is not meant to be a rigorous specification of that fragment; for that, see Denotational semantics of L4 constitutive rules and predicates. Nor does it cover the implementation of the L4->LE transpiler in the Natural L4 Haskell codebase (though it does explain how the most important constructs get translated).

Although the following discussion does not assume prior knowledge of Logical English, it does assume some understanding of the generic L4 syntax and concepts, as well as of basic logic programming / Prolog concepts. (You’ll probably get something out of this even if you are new to logic programming, but you shouldn’t expect to understand everything.)

Simple Horn clauses¶

Let’s start with an L4 file with just one constitutive rule (or Horn clause).

GIVEN	x	IS A	Animal
DECIDE	x	is an aquatic animal
IF	x	lives in water

This simple L4 encoding gets translated to a .le file that has this sort of structure:

the target language is: prolog.

the templates are:
    *a x* is an aquatic animal,
    *a x* lives in water.

% Predefined stdlib for translating natural4 -> LE.
the knowledge base lib includes:
    <stuff that we won't repeat here for the sake of brevity>

the knowledge base encoding includes:
    a x is an aquatic animal
    if x lives in water.

The structure of the .le output¶

Before discussing the translation of the Horn clause in detail, it’s worth briefly surveying the structure of the .le output.

The indented block below “the templates are:” consists of the templates, or natural language annotations, for the Logical English output. These templates declare to the downstream LE engine which predicates the LE program will use.

The indented block below “the knowledge base encoding includes:” consists of the Logical English rules and facts.

The declaration “the target language is: prolog.” tells the Logical English compiler that the program should subsequently be transformed into Prolog. In this way, Logical English is used merely as a wrapper around Prolog.

The LE templates and parameters or argument places in the predicates¶

Now that we’ve seen what the high level structure of the .le output looks like, let’s look at the LE translation of the constitutive rule in more detail. There are two things to explain about the translation: (i) the templates or natural language annotations and (ii) the LE version of the rule itself.

Let’s begin with the templates. The templates declare to the LE compiler what the predicates of the rules and facts of this LE program will be, and in particular, via asterisks, what the argument places of these predicates are.

For example, in the templates that the L4->LE transpiler generated for the aquatic-animal example,

the templates are:
    *a x* is an aquatic animal,
    *a x* lives in water.

the template *a x* lives in water corresponds to the one-argument-place Prolog predicate lives_in_water. And we know there’s only one argument place, because there’s only one pair of asterisks in that template. (If you are new to Prolog, you can think of an argument place as something that can be substituted by either a variable or a constant. This isn’t the most general formulation, but it’s good enough for our purposes.)
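To make the link between a template and its predicate concrete, here is a toy Python sketch. It is not how the LE compiler is implemented; the function name `template_to_predicate` and the underscore naming scheme are assumptions made for illustration, chosen to be consistent with the `lives_in_water` example above.

```python
import re

def template_to_predicate(template: str) -> tuple[str, int]:
    """Toy model: derive a Prolog-style predicate name and arity from an LE template."""
    # Each *...* span marks one argument place.
    arity = len(re.findall(r"\*[^*]+\*", template))
    # What remains once the argument places are removed is the predicate's wording.
    wording = re.sub(r"\*[^*]+\*", " ", template)
    name = "_".join(wording.lower().split())
    return name, arity

print(template_to_predicate("*a x* lives in water"))  # ('lives_in_water', 1)
```

One pair of asterisks yields arity 1; a template with two asterisk-delimited spans would yield arity 2.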

Now, you might wonder: what is it in the L4 that indicates that this should be a predicate with just one argument place?

The answer is, it has to do with the GIVEN L4 keyword. Whenever you want to declare that some string is a variable in your L4 constitutive rule, you have to (i) declare it as a GIVEN variable and (ii) when writing the rule, have that string be in its own cell in the spreadsheet.
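The two conditions just described can be sketched as a small Python function. This is only an illustrative model of the idea, not the transpiler's actual (Haskell) code; `make_template` is a hypothetical name, and a rule row is assumed to be represented as a list of cell strings.

```python
def make_template(cells, givens):
    """Toy model: a cell becomes an argument place only if it exactly matches a GIVEN variable."""
    parts = []
    for cell in cells:
        if cell in givens:
            parts.append(f"*a {cell}*")  # argument place
        else:
            parts.append(cell)           # part of the predicate's wording
    return " ".join(parts)

print(make_template(["x", "lives in water"], givens={"x"}))  # *a x* lives in water
```

Because the match is on the whole cell, a variable only counts as such when it sits in a cell of its own.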

So, e.g., if you want to write the equivalent of this Prolog

grandparent(X, Z) :- parent(X, Y), parent(Y, Z).

you should write in L4

GIVEN	x
	y
	z
DECIDE	x	is grandparent of	z
IF	x	is parent of	y
AND	y	is parent of	z

This will get transpiled to this LE rule

a x is grandparent of a z
if x is parent of a y
and y is parent of z.

and, when the LE compiler is subsequently invoked, to the equivalent Prolog.

Exercise for the reader: what would the corresponding LE template(s) look like?

Aside: a potential gotcha to note about the `GIVEN` variables¶

Important: You want to make sure that the GIVEN variables are in their own cells, and that the thing that’s declared as a GIVEN is exactly the same as the thing that’s used in the rule itself.

For example, if what is in the cell is x is rather than just x, as in

GIVEN	x
	y
	z
DECIDE	x is	grandparent of	z
IF	x	is parent of	y
AND	y	is parent of	z

then that will not get transpiled to the intended LE.

How the L4 rules get translated to LE rules¶

Now that we’ve seen what LE templates do and how they get generated from the L4, let’s look at the LE rules.

Recall that the L4 aquatic animal example

GIVEN	x	IS A	Animal
DECIDE	x	is an aquatic animal
IF	x	lives in water

was translated to the following LE rule:

the knowledge base encoding includes:
    a x is an aquatic animal
    if x lives in water.

How does the L4->LE transpiler translate simple L4 constitutive rules to LE rules? As the foregoing examples demonstrate, it does the following, among other things:

It drops L4-specific keywords like DECIDE.
For every term t that (i) is declared in the L4 as a GIVEN variable and (ii) is put in a cell of its own in the L4 rule (cf. x in the aquatic animal example), it adds an “a” prefix to t the first time that t appears in the rule.

The latter might seem mysterious: why do we have to prefix such terms with “a” in the LE? That’s because the LE compiler needs to know, when an argument place or variable indicator in a template has been substituted with a term, whether the substituting term is a variable or something else (e.g. a constant, a non-constant expression, or a compound term). And the way that a variable gets marked as such to the LE compiler in an LE rule is by being prefixed with “a” the first time it occurs in the rule.
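The prefixing behaviour can be modelled in a few lines of Python. This is a sketch under simplifying assumptions (rows are given as keyword and cell-list pairs, and only DECIDE, IF, AND and OR are handled); `render_rule` is a hypothetical helper, not part of the transpiler.

```python
def render_rule(rows, givens):
    """Toy model: prefix each GIVEN variable with 'a' the first time it appears."""
    connectives = {"DECIDE": "", "IF": "if ", "AND": "and ", "OR": "or "}
    seen = set()
    lines = []
    for keyword, cells in rows:
        words = []
        for cell in cells:
            if cell in givens and cell not in seen:
                seen.add(cell)
                words.append(f"a {cell}")  # first occurrence: mark as a variable
            else:
                words.append(cell)         # later occurrence, or not a variable
        lines.append(connectives[keyword] + " ".join(words))
    return "\n".join(lines) + "."

rule = [
    ("DECIDE", ["x", "is grandparent of", "z"]),
    ("IF", ["x", "is parent of", "y"]),
    ("AND", ["y", "is parent of", "z"]),
]
print(render_rule(rule, givens={"x", "y", "z"}))
# a x is grandparent of a z
# if x is parent of a y
# and y is parent of z.
```

The printed rule matches the LE output shown for the grandparent example.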

And yes, this is yet another reason why you want to be careful that, e.g., the thing that’s declared as a GIVEN is exactly the same as the thing that’s used in the rule itself. That is, this sort of thing affects not only the generation of the LE templates by the L4->LE transpiler, but also the generation of the LE rules.

The other things you need to get Boolean Prolog compound terms¶

We’ve seen a few basic examples of constitutive rules, including one with AND (the grandparent example). Let’s talk now about the other key things you need to know to model law with basic clausal logic; namely, OR, indentation, and negation as failure / weak negation.

What if you wanted to model the following, more complicated rule?

A data breach with an organization harms an individual
if (i) it exposed data from the individual
and (ii) it either relates to the name of the individual
          or to an account the individual had with the organization

There are various ways to formally model this, but let’s suppose you wanted to treat data breach, organization, and individual as variables.

You can encode this in L4, for LE (and by extension Prolog), with

GIVEN	data breach		IS A	Data Breach
	organization		IS A	Organization
	individual		IS A	Person
DECIDE	data breach	with	organization	harms	individual
IF	data breach	with	organization	exposed data from	individual
AND	data breach	with	organization	relates to the name of	individual
	OR	data breach	with	organization	relates to an account	individual	had with	organization

This example also demonstrates how indentation in L4 matters: that’s how we make it clear that this has the form (p if q and (r or s)) as opposed to the form (p if (q and r) or s).
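To see that the two groupings really are different conditions, here is a short Python check that enumerates every truth assignment to q, r and s. The function names `nested` and `flat` are labels for the two readings, introduced only for this illustration.

```python
from itertools import product

def nested(q, r, s):
    # q and (r or s): the reading expressed by the indented OR
    return q and (r or s)

def flat(q, r, s):
    # (q and r) or s: the reading the indentation rules out
    return (q and r) or s

differing = [
    (q, r, s)
    for q, r, s in product([False, True], repeat=3)
    if nested(q, r, s) != flat(q, r, s)
]
print(differing)  # [(False, False, True), (False, True, True)]
```

The two readings disagree exactly when s holds but q does not: in the data breach example, that is a breach that relates to an account but did not expose the individual's data.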

Exercise for the reader: to what extent are the indentation rules in L4 and LE the same? Try experimenting with examples!

Negation as failure also works the way you might expect:

GIVEN	person	IS A	Person
DECIDE	person	qualifies for this country’s benefits
IF	person	is citizen
AND	NOT	person	is citizen of any other country

gets transpiled into this LE rule

a person qualifies for this country's benefits
if person is citizen
and it is not the case that
    person is citizen of any other country.
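For readers new to negation as failure, the following Python sketch illustrates the idea: a negated condition succeeds when the condition cannot be established from the known facts. The facts and the names alice, bob and carol are made up for illustration, and the sketch models only the closed-world reading, not how Prolog actually evaluates the rule.

```python
# Known facts, as (predicate wording, person) pairs. Hypothetical data.
facts = {
    ("is citizen", "alice"),
    ("is citizen", "bob"),
    ("is citizen of any other country", "bob"),
}

def holds(predicate, person):
    # Closed-world assumption: whatever is not recorded as a fact is not provable.
    return (predicate, person) in facts

def qualifies_for_benefits(person):
    return holds("is citizen", person) and not holds("is citizen of any other country", person)

print(qualifies_for_benefits("alice"))  # True
print(qualifies_for_benefits("bob"))    # False
```

Note that alice qualifies not because we know she lacks other citizenships, but because nothing says she has one.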

Exercise for the reader: what would the corresponding LE template(s) look like?

Working with dates when transpiling to LE (in broad brush strokes)¶

You’ll want to be able to work with dates in a ‘first-class’ way, when modelling contracts and legislation. Fortunately, the L4->LE transpiler allows you to write L4 constitutive rules that involve dates. For example, suppose that you’re administering a grant with an application deadline of 2023-10-30:

GIVEN	date of application
DECIDE	you do not qualify for our fabulous grant
IF	date of application	is after	2023-10-30

(Note that dates must be in YYYY-MM-DD format.)

This gets transformed to this Logical English rule

you do not qualify for our fabulous grant
if a date of application is after 2023-10-30.

before being handled in turn by Joe Watt’s date-related Logical English predicates (see our fork of Logical English) and Prolog date library.

We just discussed after, but there’s also within and before. You can also ask whether a date is a certain number of days or weeks or months before/after/within some other date; for more information on those predicates, or on how the date-related functionality works, see Denotational semantics of L4 constitutive rules and predicates.
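As a rough analogue of the grant rule, here is a Python sketch using the standard `datetime` module. It is not the LE or Prolog date library mentioned above, and it assumes that “is after” means strictly later than the deadline; `does_not_qualify` is a hypothetical name.

```python
from datetime import date

DEADLINE = date.fromisoformat("2023-10-30")

def does_not_qualify(date_of_application: str) -> bool:
    # Assumption: "is after" is strict, so applying on the deadline itself is fine.
    return date.fromisoformat(date_of_application) > DEADLINE

print(does_not_qualify("2023-10-31"))  # True
print(does_not_qualify("2023-10-30"))  # False
```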

An interlude on `IS` and “is”¶

IS as in term equality¶

We often want to be able to check if some term is really some other term. For example, how would you encode in the LE fragment of L4 that an income source is taxable if the income source is profits or is investment dividends?

For this, you would use the uppercase IS L4 keyword:

GIVEN	income source
DECIDE	income source	is taxable
IF	income source	IS	profits
OR	income source	IS	investment dividends

This example, though short, is subtle.

The most important thing to note is that you have to use IS and not the lowercase is when checking if income source is the same term as profits or investment dividends, and you have to put the IS in its own cell.

By contrast, in “DECIDE income source is taxable,” we want to stick with the lowercase “is”, since what we are really saying there is the Prolog is_taxable(X) — the “is” there is part of the predicate, and not a term equality operator. (And to make things clearer, though this is not required by the transpiler, when the “is” is really part of the predicate, we should group the “is” in the same cell as the rest of the predicate rather than by itself, as in the example above.)

`t_1 IS NOT t_2`¶

Relatedly, you might want to check that some term is not some other term. You can do this with IS NOT, where the IS and NOT must be broken up into separate cells (that are next to each other).

`t_1 IS IN t_2`¶

This gets transpiled to Logical English’s t_1 is in t_2, and thence to the Prolog member(t_1, t_2).

Doing arithmetic in L4, with LE as the target¶

Again, let’s approach this with examples. Suppose you’re trying to operationalize the following bank regulations regarding an upcoming higher-than-usual interest rates promotion:

If a customer has stashed at least $3,000 with us across their current and savings accounts — i.e., if what we might call the total balances, or the sum of what they have in savings and in their current account, is at least $3,000 — then they qualify for this interest-rates promotion.

How would you express this in the LE fragment of L4?

Here’s one approach:

GIVEN	customer
	funds in current account
	savings
	total balances
DECIDE	customer	qualifies for higher interest rate promotion
IF	customer’s	curr acc funds	is	funds in current account
AND	customer’s	savings acc funds	is	savings
AND	total balances	IS	SUM	funds in current account
				savings
AND	total balances	>=	3000

This gets transpiled to this LE rule

a customer qualifies for higher interest rate promotion
if customer's curr acc funds is a funds in current account
and customer's savings acc funds is a savings
and a total balances is the sum of [funds in current account, savings]
and total balances >= 3000.

You can test that this does what we expect, with the VSCode Logical English extension, by adding the following to the outputted .le file and querying with the query q and scenario test.

scenario test is:
  alice's curr acc funds is 5000.
  alice's savings acc funds is 10.
  bob's curr acc funds is 40.
  bob's savings acc funds is 100.

query q is:
  which customer qualifies for higher interest rate promotion.

There are a couple of things to note about this example.

First, the X's F is value pattern corresponds conceptually to entity-attribute-value triples. This pattern is a convenient way to work with classes or data types in the Logical-English and JSON-Schema fragments of L4; see Web form for more details.

Second, the key bit of syntax you need for summing up things is, well, SUM (these keywords will tend to be capitalized in L4). As the example shows, SUM takes arguments vertically in L4. Note, as the transpiled output suggests, that it can take an arbitrary number of arguments — it’s not limited to two arguments.
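If it helps to see the same logic outside of LE, here is a Python sketch of the promotion rule run against the test scenario. The dictionary keyed by (entity, attribute) is just one way to mirror the X's F is value pattern; the name `qualifies_for_promotion` is made up for this illustration.

```python
THRESHOLD = 3000

# (entity, attribute) -> value, mirroring the "X's F is value" pattern.
scenario = {
    ("alice", "curr acc funds"): 5000,
    ("alice", "savings acc funds"): 10,
    ("bob", "curr acc funds"): 40,
    ("bob", "savings acc funds"): 100,
}

def qualifies_for_promotion(customer):
    funds_in_current_account = scenario[(customer, "curr acc funds")]
    savings = scenario[(customer, "savings acc funds")]
    # Like SUM in L4, sum() accepts any number of values, not just two.
    total_balances = sum([funds_in_current_account, savings])
    return total_balances >= THRESHOLD

customers = sorted({entity for entity, _ in scenario})
print([c for c in customers if qualifies_for_promotion(c)])  # ['alice']
```

Alice's total is 5010 and Bob's is 140, so only Alice meets the $3,000 threshold.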

Other arithmetic-related predicates¶

Other arithmetic-related predicates include:

t IS MAX t_1 t_2 ... t_n
t IS MIN t_1 t_2 ... t_n
t IS PRODUCT t_1 t_2 ... t_n

The syntactic transformations from L4 to LE for these predicates are similar to what we saw with t IS SUM t_1 t_2 ... t_n.
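For intuition about what these predicates compute, the Python standard library has direct counterparts. The values below are arbitrary, and this says nothing about how the LE or Prolog side implements them.

```python
import math

values = [3, 7, 5]  # stands in for t_1 t_2 ... t_n

print(max(values))        # 7    (t IS MAX ...)
print(min(values))        # 3    (t IS MIN ...)
print(math.prod(values))  # 105  (t IS PRODUCT ...)
print(sum(values))        # 15   (t IS SUM ...)
```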

There are also:

t IS MAX x where φ(x)
t IS MIN x where φ(x)
t IS SUM x where φ(x)

These get translated somewhat differently. For example, consider a scenario where you want to sum across all the taxable income that a person has earned from various income sources; suppose for simplicity that an income source is taxable if it’s profits or investment dividends.

sum_where_phi¶
GIVEN	income source
	person
	amount
DECIDE	person	earned taxable	amount	from	income source
IF	person	earned	amount	from	income source
AND	income source	is taxable
;;

GIVEN	income source
DECIDE	income source	is taxable
IF	income source	IS	profits
OR	income source	IS	investment dividends
;;

GIVEN	total taxable income
	person
	amount
	income source
DECIDE	person’s	total taxable income	is	amount
IF	amount	IS	SUM	x	where	person	earned taxable	x	from	income source

gets transpiled to these LE rules:

the knowledge base rules includes:
a person earned taxable a amount from a income source
if person earned amount from income source
and income source is taxable.

a income source is taxable
if income source is profits
or income source is investment dividends.

a person's a total taxable income is a amount
if amount is the sum of each x such that
    person earned taxable x from a income source.

You can check that this does what you might expect, with the following LE query and scenario:

scenario test is:
    alice earned 500 from profits.
    alice earned 700 from investment dividends.
    alice earned 212 from non-taxable source.

query q is:
    alice's total taxable income is which amount.
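The “sum of each x such that” construct behaves like a filtered aggregation. Here is a Python sketch of the same computation over the test scenario; the tuple layout and the name `total_taxable_income` are choices made for this illustration only.

```python
# (person, amount, income source), taken from the scenario above.
earnings = [
    ("alice", 500, "profits"),
    ("alice", 700, "investment dividends"),
    ("alice", 212, "non-taxable source"),
]

TAXABLE_SOURCES = {"profits", "investment dividends"}

def total_taxable_income(person):
    # Sum each amount x such that the person earned x from a taxable source.
    return sum(
        amount
        for who, amount, source in earnings
        if who == person and source in TAXABLE_SOURCES
    )

print(total_taxable_income("alice"))  # 1200
```

The 212 from the non-taxable source is excluded, leaving 500 + 700 = 1200.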

Arithmetic relations for comparing two arithmetic values¶

Finally, the L4->LE transpiler supports the following arithmetic relations for comparing arithmetic values:

<
>
<=
>=

Exercises¶

How would you write or model in L4: “The minimum monthly payment for the credit card is 3% of current balance or $50, whichever is higher, plus any overdue amounts” (adapted from https://www.uob.com.sg/assets/pdfs/gen_info_cards.pdf)?
The transpiled LE code does not seem to be giving you the results you expect. When looking at the templates, you see that the generated templates include ones of the form **a t_1** is ... t_2. What might the issue(s) be?