Eric Way's Personal Site

I Write $\sin(x)$ Not Tragedies

Solving SQL Murder Mystery With Logica

2021-06-30 Coding

Logica is a logic language for data manipulation developed by Google. It can be compiled into several SQL dialects, including BigQuery, SQLite and PostgreSQL. SQL, as a programming language, has several design flaws and suffers from bad readability especially when it comes to complex data queries. Logica has been aimed at mitigating those problems with SQL, and this post attempts to show that it does a quite good job in enhancing code readability and reusability, taking the SQL Murder Mystery as an example.

For each step, a series of Logica predicates I write, the SQL query the last Logica predicate is compiled to, and the tabular query results will be given. The first query simply retrieves relevant crime scene reports.

1
2
3
4
T(..r) :- 
`crime_scene_report`(..r),
r.type == "murder",
r.city == "SQL City";
1
2
3
4
5
6
7
SELECT
crime_scene_report.*
FROM
`crime_scene_report` AS crime_scene_report
WHERE
crime_scene_report.type = "murder" AND
crime_scene_report.city = "SQL City";
date type description city
20180215 murder REDACTED REDACTED REDACTED SQL City
20180215 murder Someone killed the guard! He took an arrow to the knee! SQL City
20180115 murder Security footage shows that there were 2 witnesses. The first witness lives at the last house on “Northwestern Dr”. The second witness, named Annabel, lives somewhere on “Franklin Ave”. SQL City

Only the last row seems to offer us useful information. We first need to ascertain the two witnesses mentioned. A combine clause is used to compute the maximum address number on “Northwestern Dr”. Notice that giving a field an alias of a constant expression can concisely restrict the value of the field. Also notice that A LIKE B expression in SQL is written as Like(A, B) in Logica. Predicate A includes the information of the witnesses, which is thereafter joined with the interview table to retrieve the transcripts. When joining tables, joining keys does not have to be specified explicitly as in SQL, only required to have the same alias.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
Person(..r) :- `person`(..r);
A(..r) :-
Person(..r),
(
(r.address_number == (combine Max= num :-
Person(
address_number:num,
address_street_name:"Northwestern Dr"
)
),
r.address_street_name == "Northwestern Dr")

(Like(r.name, "%Annabel%"),
r.address_street_name == "Franklin Ave")
);

B(id:, name:, transcript:) :-
A(id:, name:),
`interview`(person_id:id, transcript:);

The complied SQL query is obviously more verbose, though it is very much the way how a human would write it. The subquery in place of the combine clause is harder to read, and when using UNION ALL we need another FROM clause which is repetitive. The WITH structure to a certain extent improves readability, but not as much as the use of separate, sequential predicates in Logica.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
WITH t_0_A AS (SELECT * FROM (

SELECT
person.*
FROM
`person` AS person
WHERE
person.address_number = (SELECT
MAX(t_2_person.address_number) AS logica_value
FROM
`person` AS t_2_person
WHERE
"Northwestern Dr" = t_2_person.address_street_name) AND
person.address_street_name = "Northwestern Dr"
UNION ALL

SELECT
t_4_person.*
FROM
`person` AS t_4_person
WHERE
(t_4_person.name LIKE "%Annabel%") AND
t_4_person.address_street_name = "Franklin Ave"

) AS UNUSED_TABLE_NAME )
SELECT
A.id AS id,
A.name AS name,
interview.transcript AS transcript
FROM
t_0_A AS A, `interview` AS interview
WHERE
interview.person_id = A.id;
id name transcript
14887 Morty Schapiro I heard a gunshot and then saw a man run out. He had a “Get Fit Now Gym” bag. The membership number on the bag started with “48Z”. Only gold members have those bags. The man got into a car with a plate that included “H42W”.
16371 Annabel Miller I saw the murder happen, and I recognized the killer from my gym when I was working out last week on January the 9th.

It turns out that we only need to use the first interview to find out the murderer. Here predicate A pins down the clues regarding the gym and B the car plate, both of which are joined to give the answer. Notice that the complied SQL query joins three tables directly, which can be circumvented using a @With predicate as shown in later steps.

1
2
3
4
5
6
7
8
9
10
A(..r) :- `get_fit_now_member`(..r),
r.membership_status == "gold",
Like(r.id, "48Z%");

B(id:, plate_number:, license_id:) :-
`drivers_license`(plate_number:, id:license_id),
`person`(id:, license_id:),
Like(plate_number, "%H42W%");

C(id:, name:) :- A(person_id:id, name:), B(id:);
1
2
3
4
5
6
7
8
9
10
11
SELECT
get_fit_now_member.person_id AS id,
get_fit_now_member.name AS name
FROM
`get_fit_now_member` AS get_fit_now_member, `drivers_license` AS drivers_license, `person` AS person
WHERE
get_fit_now_member.membership_status = "gold" AND
(get_fit_now_member.id LIKE "48Z%") AND
(drivers_license.plate_number LIKE "%H42W%") AND
person.id = get_fit_now_member.person_id AND
person.license_id = drivers_license.id;
id name
67318 Jeremy Bowers
value
Congrats, you found the murderer! But wait, there’s more… If you think you’re up for a challenge, try querying the interview transcript of the murderer to find the real villain behind this crime. If you feel especially confident in your SQL skills, try to complete this final step with no more than 2 queries. Use this same INSERT statement with your new suspect to check your answer.

A simple query to retrieve the interview of the murderer.

1
2
A(..r) :- `interview`(..r),
r.person_id == 67318;
1
2
3
4
5
6
SELECT
interview.*
FROM
`interview` AS interview
WHERE
interview.person_id = 67318;
person_id transcript
67318 I was hired by a woman with a lot of money. I don’t know her name but I know she’s around 5’5″ (65″) or 5’7″ (67″). She has red hair and she drives a Tesla Model S. I know that she attended the SQL Symphony Concert 3 times in December 2017.

Now predicate A is regarding the event and B driving license, and C joins them together with person. In A, an aggregation is used to count the number of times a person is checked in at the given event name during Dec 2017. (The dates are given in integers in the database.) I leave the “3 times” constraint to the joining predicate in a concise way. Notice that we can explicitly demand that Logica compile the subqueries individually without mixing them up in a large join query using a @With predicate for better efficiency in execution (though it doesn’t make much difference here).

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
@With(A);
A(person_id:, count? += 1) distinct :-
`facebook_event_checkin`(
event_name:"SQL Symphony Concert",
person_id:,
date:
),
date >= 20171201,
date <= 20171231;

@With(B);
B(..r) :- `drivers_license`(..r),
r.height >= 65, r.height <= 67,
r.hair_color == "red",
r.gender == "female",
r.car_make == "Tesla",
r.car_model == "Model S";

C(id:, name:) :-
A(person_id:id, count:3),
B(id:license_id),
`person`(id:, license_id:, name:);
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
WITH t_0_A AS (SELECT
facebook_event_checkin.person_id AS person_id,
SUM(1) AS count
FROM
`facebook_event_checkin` AS facebook_event_checkin
WHERE
(facebook_event_checkin.date >= 20171201) AND
(facebook_event_checkin.date <= 20171231) AND
facebook_event_checkin.event_name = "SQL Symphony Concert"
GROUP BY person_id),
t_1_B AS (SELECT
drivers_license.*
FROM
`drivers_license` AS drivers_license
WHERE
(drivers_license.height >= 65) AND
(drivers_license.height <= 67) AND
drivers_license.hair_color = "red" AND
drivers_license.gender = "female" AND
drivers_license.car_make = "Tesla" AND
drivers_license.car_model = "Model S")
SELECT
A.person_id AS id,
person.name AS name
FROM
t_0_A AS A, t_1_B AS B, `person` AS person
WHERE
A.count = 3 AND
person.id = A.person_id AND
person.license_id = B.id;
id name
99716 Miranda Priestly
value
Congrats, you found the brains behind the murder! Everyone in SQL City hails you as the greatest SQL detective of all time. Time to break out the champagne!

Problem solved. It can be clearly seen above that writing Logica predicates instead of SQL queries greatly bolsters code readability and reusability.

This article was last updated on days ago, and the information described in the article may have changed.