L4 to Logical English transpiler

How to write L4 that translates to Logical English (LE); or, how L4 gets translated to LE

The following explains, with examples, how the L4->LE transpiler translates L4 to LE. If you are in a rush, skip the explanations and just skim the examples.

This is an intuitive, high-level discussion aimed at someone who wants to understand how L4 gets translated to LE, so that they can more effectively formally model law using the Logical-English-y fragment of L4. It is not meant to be a rigorous specification of that fragment; for that, see Denotational semantics of L4 constitutive rules and predicates. Finally, this discussion does not cover the implementation of the L4->LE transpiler in the Natural L4 Haskell codebase (though it does explain how the most important constructs get translated).

Although the following discussion does not assume prior knowledge of Logical English, it does assume some understanding of the generic L4 syntax and concepts, as well as of basic logic programming / Prolog concepts. (You’ll probably get something out of this even if you are new to logic programming, but you shouldn’t expect to understand everything.)

Simple Horn clauses

Let’s start with an L4 file with just one constitutive rule (or Horn clause).

GIVEN

x

IS A

Animal

DECIDE

x

is an aquatic animal

IF

x

lives in water

This simple L4 encoding gets translated to a .le file that has this sort of structure:

le
the target language is: prolog.

the templates are:
    *a x* is an aquatic animal,
    *a x* lives in water.

% Predefined stdlib for translating natural4 -> LE.
the knowledge base lib includes:
    <stuff that we won't repeat here for the sake of brevity>

the knowledge base encoding includes:
    a x is an aquatic animal
    if x lives in water.

The structure of the .le output

Before discussing the translation of the Horn clause in detail, it’s worth briefly surveying the structure of the .le output.

The indented block below “the templates are:”
  • consists of the templates, or natural language annotations, for the Logical English output. These templates declare to the downstream LE engine which predicates the LE program will use.

The indented stuff below “the knowledge base encoding includes:”
  • consists of the Logical English rules and facts.

The declaration “the target language is: prolog.”
  • tells the Logical English compiler that the program should in turn be compiled to Prolog.

Logical English is, in this way, used merely as a wrapper around Prolog.
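
To make the layout of the .le file concrete, here is a minimal Python sketch that assembles the three parts described above from a list of templates and a list of rules. It is only an illustration: render_le is a hypothetical helper, not part of the transpiler, and it leaves out the predefined stdlib section entirely.

```python
def render_le(templates, rules):
    """Assemble a minimal .le file (hypothetical helper, stdlib section omitted)."""
    lines = ["the target language is: prolog.", "", "the templates are:"]
    # Templates are separated by commas; the last one ends with a period.
    for i, template in enumerate(templates):
        terminator = "." if i == len(templates) - 1 else ","
        lines.append(f"    {template}{terminator}")
    lines += ["", "the knowledge base encoding includes:"]
    # Each rule is a list of lines; the last line of a rule ends with a period.
    for rule in rules:
        body = [f"    {line}" for line in rule]
        body[-1] += "."
        lines += body
    return "\n".join(lines)


le_text = render_le(
    templates=["*a x* is an aquatic animal", "*a x* lives in water"],
    rules=[["a x is an aquatic animal", "if x lives in water"]],
)
print(le_text)
```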

The LE templates and parameters or argument places in the predicates

Now that we’ve seen what the high-level structure of the .le output looks like, let’s look at the LE translation of the constitutive rule in more detail. There are two things to explain about the translation: (i) the templates or natural language annotations and (ii) the LE version of the rule itself.

Let’s begin with the templates. The templates declare to the LE compiler what the predicates of the rules and facts of this LE program will be, and in particular, via asterisks, what the argument places of these predicates are.

For example, in the templates that were generated by the L4->LE compiler for the aquatic-animal example

le
the templates are:
    *a x* is an aquatic animal,
    *a x* lives in water.

the template *a x* lives in water corresponds to the one-argument-place Prolog predicate lives_in_water. And we know there’s only one argument place, because there’s only one pair of asterisks in that template. (If you are new to Prolog, you can think of an argument place as something that can be substituted by either a variable or a constant. This isn’t the most general formulation, but it’s good enough for our purposes.)
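
If it helps to see the "count the pairs of asterisks" idea spelled out, here is a small Python sketch. Both functions are hypothetical illustrations; in particular, joining the remaining words with underscores is an assumption that happens to reproduce lives_in_water, and the naming scheme the LE compiler actually uses may differ.

```python
import re


def argument_places(template):
    """Return the text inside each pair of asterisks in an LE template."""
    return re.findall(r"\*([^*]+)\*", template)


def predicate_name(template):
    """Join the words outside the asterisks with underscores (assumed naming)."""
    outside = re.sub(r"\*[^*]+\*", " ", template)
    return "_".join(outside.split())


template = "*a x* lives in water"
print(argument_places(template))  # ['a x'], so one argument place
print(predicate_name(template))   # lives_in_water
```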

Now, you might wonder: what is it in the L4 that indicates that this should be a predicate with just one argument place?

The answer has to do with the L4 keyword GIVEN. Whenever you want to declare that some string is a variable in your L4 constitutive rule, you have to (i) declare it as a GIVEN variable and (ii) when writing the rule, have that string be in its own cell in the spreadsheet.
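
As a rough mental model, you can picture template generation like the following Python sketch: any cell whose content exactly matches a GIVEN variable becomes an argument place, and every other cell is copied verbatim. make_template is a hypothetical name, and always using the article "a" is an assumption based on the examples in this document.

```python
def make_template(givens, cells):
    """Wrap every cell that exactly matches a GIVEN variable in '*a ...*'."""
    parts = []
    for cell in cells:
        if cell in givens:
            parts.append(f"*a {cell}*")  # the cell is a variable: argument place
        else:
            parts.append(cell)           # the cell is part of the predicate text
    return " ".join(parts)


print(make_template({"x"}, ["x", "is an aquatic animal"]))  # *a x* is an aquatic animal
print(make_template({"x"}, ["x", "lives in water"]))        # *a x* lives in water
```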

So, e.g., if you want to write the equivalent of this Prolog

prolog
grandparent(X, Z) :- parent(X, Y), parent(Y, Z).

you should write in L4

GIVEN

x

y

z

DECIDE

x

is grandparent of

z

IF

x

is parent of

y

AND

y

is parent of

z

This will get transpiled to this LE rule

le
a x is grandparent of a z
if x is parent of a y
and y is parent of z.

and, when the LE compiler is subsequently invoked, to the equivalent Prolog.

Exercise for the reader: what would the corresponding LE template(s) look like?

Aside: a potential gotcha to note about the GIVEN variables

Important: You want to make sure that the GIVEN variables are in their own cells, and that the thing that’s declared as a GIVEN is exactly the same as the thing that’s used in the rule itself.

For example, if what is in the cell is “x is” rather than just “x”, as in

GIVEN

x

y

z

DECIDE

x is

grandparent of

z

IF

x

is parent of

y

AND

y

is parent of

z

then that will not get transpiled to the intended LE.
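
One way to catch this gotcha before transpiling is a quick check along the following lines. This is a heuristic Python sketch, not something the transpiler provides: suspicious_cells is a hypothetical helper that flags cells which mention a GIVEN variable as a whole word without being exactly that variable, and it may also flag cells that are perfectly fine.

```python
def suspicious_cells(givens, cells):
    """Flag cells that contain a GIVEN variable but are not exactly that variable."""
    flagged = []
    for cell in cells:
        if cell in givens:
            continue  # an exact match is what we want
        padded = f" {cell} "
        # Whole-word containment, so "x" does not match inside "taxable".
        if any(f" {given} " in padded for given in givens):
            flagged.append(cell)
    return flagged


givens = {"x", "y", "z"}
print(suspicious_cells(givens, ["x is", "grandparent of", "z"]))  # ['x is']
print(suspicious_cells(givens, ["x", "is grandparent of", "z"]))  # []
```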

How the L4 rules get translated to LE rules

Now that we’ve seen what LE templates do and how they get generated from the L4, let’s look at the LE rules.

Recall that the L4 aquatic animal example

GIVEN

x

IS A

Animal

DECIDE

x

is an aquatic animal

IF

x

lives in water

was translated to the following LE rule:

le
the knowledge base encoding includes:
    a x is an aquatic animal
    if x lives in water.

How does the L4->LE transpiler translate simple L4 constitutive rules to LE rules? As the foregoing examples demonstrate, it does the following, among other things:

  • drops L4-specific keywords like DECIDE

  • prefixes with “a” every term t that (i) is declared in the L4 as a GIVEN variable and (ii) is put in a cell of its own in the L4 rule (cf. x in the aquatic animal example), but only the first time that t appears in the rule.

The latter might seem mysterious: why do we have to prefix such terms with “a” in the LE? The reason is that the LE compiler needs to know, when an argument place or variable indicator in a template has been substituted with a term, whether the substituting term is a variable or something else (e.g. a constant, a non-constant expression, or a compound term). And the way a variable gets marked as such to the LE compiler in an LE rule is by prefixing it with “a” the first time it occurs in the rule.

And yes, this is yet another reason why you want to be careful that, e.g., the thing that’s declared as a GIVEN is exactly the same as the thing that’s used in the rule itself. That is, this sort of thing affects not only the generation of the LE templates by the L4->LE transpiler, but also the generation of the LE rules.
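
The two steps just described can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the actual Haskell implementation: to_le_lines is a hypothetical helper, each L4 line is given as a (connective, cells) pair, and the L4 keywords are assumed to have already been dropped or lowercased.

```python
def to_le_lines(givens, l4_lines):
    """Prefix each GIVEN variable with 'a' the first time it appears in the rule."""
    seen = set()
    out = []
    for connective, cells in l4_lines:
        words = [connective] if connective else []
        for cell in cells:
            if cell in givens and cell not in seen:
                seen.add(cell)
                words.append(f"a {cell}")  # first occurrence: mark as a variable
            else:
                words.append(cell)         # later occurrence, or predicate text
        out.append(" ".join(words))
    out[-1] += "."  # an LE rule ends with a period
    return out


rule = [
    ("", ["x", "is grandparent of", "z"]),
    ("if", ["x", "is parent of", "y"]),
    ("and", ["y", "is parent of", "z"]),
]
for line in to_le_lines({"x", "y", "z"}, rule):
    print(line)
```

Running this on the grandparent example reproduces the LE rule shown earlier. Note that a cell like “x is” would never match a GIVEN variable here either, so it would not receive the “a” prefix.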

The other things you need to get Boolean Prolog compound terms

We’ve seen a few basic examples of constitutive rules, including one with AND (the grandparent example). Let’s talk now about the other key things you need to know to model law with basic clausal logic; namely, OR, indentation, and negation as failure / weak negation.

What if you wanted to model the following, more complicated rule?

default
A data breach with an organization harms an individual
if (i) it exposed data from the individual
and (ii) it either relates to the name of the individual
          or to an account the individual had with the organization

There are various ways to formally model this, but let’s suppose you wanted to treat data breach, organization, and individual as variables.

You can encode this in L4, for LE (and by extension Prolog), with

GIVEN

data breach

IS A

Data Breach

organization

IS A

Organization

individual

IS A

Person

DECIDE

data breach

with

organization

harms

individual

IF

data breach

with

organization

exposed data from

individual

AND

data breach

with

organization

relates to the name of

individual

OR

data breach

with

organization

relates to an account

individual

had with

organization

This example also demonstrates how indentation in L4 matters: that’s how we make it clear that this has the form (p if q and (r or s)) as opposed to the form (p if (q and r) or s).
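
To see why the grouping matters, here is a small Python sketch that compares the two readings over all truth assignments. The names q, r and s stand for the three conditions in the rule (data was exposed, relates to the name, relates to an account); the functions are purely illustrative.

```python
from itertools import product


def nested(q, r, s):
    """Body of: p if q and (r or s)."""
    return q and (r or s)


def flat(q, r, s):
    """Body of: p if (q and r) or s."""
    return (q and r) or s


# Collect every assignment of (q, r, s) on which the two readings disagree.
differing = [
    (q, r, s)
    for q, r, s in product([False, True], repeat=3)
    if nested(q, r, s) != flat(q, r, s)
]
print(differing)  # [(False, False, True), (False, True, True)]
```

In other words, the readings come apart exactly when the breach relates to an account but did not expose the individual's data: the second reading would conclude that the individual was harmed, while the intended one would not.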


Exercise for the reader: to what extent are the indentation rules in L4 and LE the same? Try experimenting with examples!

Negation as failure also works the way you might expect:

GIVEN

person

IS A

Person

DECIDE

person

qualifies for this country’s benefits

IF

person

is citizen

AND

NOT

person

is citizen of any other country

gets transpiled into this LE rule

le
a person qualifies for this country's benefits
if person is citizen
and it is not the case that
    person is citizen of any other country.

Exercise for the reader: what would the corresponding LE template(s) look like?
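
If negation as failure is new to you, the following Python sketch may help. It models the knowledge base as a plain set of known facts (an assumption made for illustration only) and treats anything that cannot be found there as false.

```python
def qualifies(person, facts):
    """facts is a set of (person, predicate) pairs that are known to hold."""
    is_citizen = (person, "is citizen") in facts
    # Negation as failure: if the fact cannot be found, it counts as false.
    other_citizenship = (person, "is citizen of any other country") in facts
    return is_citizen and not other_citizenship


facts = {
    ("alice", "is citizen"),
    ("bob", "is citizen"),
    ("bob", "is citizen of any other country"),
}
print(qualifies("alice", facts))  # True: nothing says alice has another citizenship
print(qualifies("bob", facts))    # False
print(qualifies("carol", facts))  # False: carol is not known to be a citizen
```

The point to notice is alice: we never stated that she has no other citizenship, but because no such fact can be found, the negated condition succeeds.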

Working with dates when transpiling to LE (in broad brush strokes)

You’ll want to be able to work with dates in a ‘first-class’ way when modelling contracts and legislation. Fortunately, the L4->LE transpiler allows you to write L4 constitutive rules that involve dates. For example, suppose that you’re administering a grant with an application deadline of 2023-10-30:

GIVEN

date of application

DECIDE

you do not qualify for our fabulous grant

IF

date of application

is after

2023-10-30

(Note that dates must be in YYYY-MM-DD format.)

This gets transformed to this Logical English rule

le
you do not qualify for our fabulous grant
if a date of application is after 2023-10-30.

before being handled in turn by Joe Watt’s date-related Logical English predicates (see our fork of Logical English) and Prolog date library.

We just discussed after, but there’s also within and before. You can also ask whether a date is a certain number of days or weeks or months before/after/within some other date; for more information on those predicates, or on how the date-related functionality works, see Denotational semantics of L4 constitutive rules and predicates.
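
For intuition about the grant example, here is how the “is after” check could be mirrored in Python with the standard datetime module. This is not how the LE date predicates are implemented; it also assumes that “is after” means strictly after, so an application made on the deadline itself is not rejected.

```python
from datetime import date

DEADLINE = date.fromisoformat("2023-10-30")


def does_not_qualify(date_of_application):
    """True if the application date (a YYYY-MM-DD string) is after the deadline."""
    # Assumption: "is after" is strict, so the deadline day itself is still fine.
    return date.fromisoformat(date_of_application) > DEADLINE


print(does_not_qualify("2023-10-31"))  # True
print(does_not_qualify("2023-10-30"))  # False
print(does_not_qualify("2023-09-01"))  # False
```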

An interlude on IS and “is”

IS as in term equality

We often want to be able to check if some term is really some other term. For example, how would you encode in the LE fragment of L4 that an income source is taxable if the income source is profits or is investment dividends?

For this, you would use the uppercase IS L4 keyword:

GIVEN

income source

DECIDE

income source

is taxable

IF

income source

IS

profits

OR

income source

IS

investment dividends

This example, though short, is subtle.

The most important thing to note is that you have to use IS and not the lowercase is when checking if income source is the same term as profits or investment dividends, and you have to put the IS in its own cell.

By contrast, in “DECIDE income source is taxable,” we want to stick with the lowercase “is”, since what we are really saying there is the Prolog is_taxable(X): the “is” there is part of the predicate, and not a term equality operator. (And to make things clearer, though this is not required by the transpiler, when the “is” is really part of the predicate, we should keep it in the same cell as the rest of the predicate, as the example above does with “is taxable”, rather than in a cell by itself.)

t_1 IS NOT t_2

Relatedly, you might want to check that some term is not some other term. You can do this with IS NOT, where the IS and NOT must be broken up into separate cells (that are next to each other).

t_1 IS IN t_2

This gets transpiled to Logical English’s t_1 is in t_2, and thence to the Prolog member(t_1, t_2).
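
As a rough analogy, the three operators in this interlude behave like the following Python checks. This is only an analogy: Prolog unifies terms rather than merely comparing values, and the function names below are made up for illustration.

```python
def is_taxable(income_source):
    # income source IS profits OR income source IS investment dividends
    return income_source == "profits" or income_source == "investment dividends"


def is_not(t_1, t_2):
    # t_1 IS NOT t_2
    return t_1 != t_2


def is_in(t_1, t_2):
    # t_1 IS IN t_2, read as a membership check on a list
    return t_1 in t_2


print(is_taxable("profits"))               # True
print(is_taxable("gifts"))                 # False
print(is_not("profits", "gifts"))          # True
print(is_in("profits", ["profits", "x"]))  # True
```

Note that the “is” in is_taxable is just part of the function's name, in the same way that the lowercase “is” in “is taxable” is part of the predicate.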

Doing arithmetic in L4, with LE as the target

Again, let’s approach this with examples. Suppose you’re trying to operationalize the following bank regulations regarding an upcoming higher-than-usual interest rates promotion:

If a customer has stashed at least $3,000 with us across their current and savings accounts — i.e., if what we might call the total balances, or the sum of what they have in savings and in their current account, is at least $3,000 — then they qualify for this interest-rates promotion.

How would you express this in the LE fragment of L4?

Here’s one approach:

GIVEN

customer

funds in current account

savings

total balances

DECIDE

customer

qualifies for higher interest rate promotion

IF

customer’s

curr acc funds

is

funds in current account

AND

customer’s

savings acc funds

is

savings

AND

total balances

IS

SUM

funds in current account

savings

AND

total balances

>=

3000

This gets transpiled to this LE rule

le
a customer qualifies for higher interest rate promotion
if customer's curr acc funds is a funds in current account
and customer's savings acc funds is a savings
and a total balances is the sum of [funds in current account, savings]
and total balances >= 3000.

You can test that this does what we expect, with the VS Code Logical English extension, by adding the following to the outputted .le file and querying with the query q and scenario test.

le
scenario test is:
  alice's curr acc funds is 5000.
  alice's savings acc funds is 10.
  bob's curr acc funds is 40.
  bob's savings acc funds is 100.

query q is:
  which customer qualifies for higher interest rate promotion.

There are a couple of things to note about this example.

First, the “X's F is value” pattern corresponds conceptually to entity-attribute-value triples. This pattern is a convenient way to work with classes or data types in the Logical-English and JSON-Schema fragments of L4; see Web form for more details.

Second, the key bit of syntax you need for summing up things is, well, SUM (these keywords will tend to be capitalized in L4). As the example shows, SUM takes arguments vertically in L4. Note, as the transpiled output suggests, that it can take an arbitrary number of arguments — it’s not limited to two arguments.
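
If you want to sanity-check what the scenario and query above should return, the same computation can be mirrored in Python. The accounts dictionary below simply restates the scenario's entity-attribute-value facts; the code is an illustration and has nothing to do with how LE or Prolog evaluates the rule.

```python
accounts = {
    "alice": {"curr acc funds": 5000, "savings acc funds": 10},
    "bob": {"curr acc funds": 40, "savings acc funds": 100},
}


def qualifies(customer):
    record = accounts[customer]
    # total balances IS SUM of funds in current account and savings
    total_balances = sum([record["curr acc funds"], record["savings acc funds"]])
    return total_balances >= 3000


qualifying = [customer for customer in accounts if qualifies(customer)]
print(qualifying)  # ['alice']
```

Alice's total is 5010, which clears the threshold; Bob's is 140, which does not. Like SUM, Python's sum takes a list of any length.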

Arithmetic relations for comparing two arithmetic values

Finally, the L4->LE transpiler supports the following arithmetic relations for comparing arithmetic values:

  • <

  • >

  • <=

  • >=
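
These four relations have their usual meanings. As a quick reference, here they are mapped onto Python's operator module; the compare function and the RELATIONS table are illustrative names only.

```python
import operator

# The four relations listed above, mapped to their usual meanings.
RELATIONS = {
    "<": operator.lt,
    ">": operator.gt,
    "<=": operator.le,
    ">=": operator.ge,
}


def compare(left, relation, right):
    """Evaluate e.g. compare(5010, '>=', 3000)."""
    return RELATIONS[relation](left, right)


print(compare(5010, ">=", 3000))  # True
print(compare(3000, ">", 3000))   # False
print(compare(3000, "<=", 3000))  # True
```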

Exercises

  1. How would you write or model in L4: “The minimum monthly payment for the credit card is 3% of current balance or $50, whichever is higher, plus any overdue amounts” (adapted from https://www.uob.com.sg/assets/pdfs/gen_info_cards.pdf)?

  2. The transpiled LE code does not seem to be giving you the results you expect. When looking at the templates, you see that the generated templates include ones of the form **a t_1** is ... t_2. What might the issue(s) be?