Chapter 8 Sample Solutions

8.1 Datalemur 20 Solved SQL

8.1.1 Repeated transactions

Sometimes, payment transactions are repeated by accident; it could be due to user error, API failure or a retry error that causes a credit card to be charged twice.

Using the transactions table, identify any payments made at the same merchant with the same credit card for the same amount within 10 minutes of each other. Count such repeated payments.

Assumptions:

The first transaction of such payments should not be counted as a repeated payment. This means, if there are two transactions performed by a merchant with the same credit card and for the same amount within 10 minutes, there will only be 1 repeated payment.

Solution for PostgreSQL 14

SELECT 
COUNT(DISTINCT A.transaction_id) AS PAYMENT_COUNT
FROM transactions A
INNER JOIN transactions B 
  ON A.merchant_id=B.merchant_id
  AND A.credit_card_id=B.credit_card_id
  AND A.AMOUNT=B.AMOUNT
WHERE EXTRACT(EPOCH FROM (A.transaction_timestamp - B.transaction_timestamp))/60 > 0 
  AND EXTRACT(EPOCH FROM (A.transaction_timestamp - B.transaction_timestamp))/60 <= 10

Alternative solution in SNOWFLAKE

SELECT 
    COUNT(DISTINCT A.transaction_id) AS PAYMENT_COUNT
FROM transactions A
INNER JOIN transactions B 
    ON A.merchant_id = B.merchant_id
    AND A.credit_card_id = B.credit_card_id
    AND A.AMOUNT = B.AMOUNT
    AND A.transaction_id != B.transaction_id
WHERE 
    DATEDIFF(minute, B.transaction_timestamp, A.transaction_timestamp) > 0 
    AND DATEDIFF(minute, B.transaction_timestamp, A.transaction_timestamp) <= 10;

Using LAG() window function

WITH payments AS (
  SELECT 
    merchant_id, 
    EXTRACT(EPOCH FROM transaction_timestamp - 
      LAG(transaction_timestamp) OVER(
        PARTITION BY merchant_id, credit_card_id, amount 
        ORDER BY transaction_timestamp)
    )/60 AS minute_difference 
  FROM transactions
) 

SELECT COUNT(merchant_id) AS payment_count
FROM payments 
WHERE minute_difference <= 10;

8.1.2 Describe how recursive queries work in SQL.

Provide an example scenario where a recursive query would be beneficial.

Recursive queries in SQL enable iteration over hierarchical data structures like organizational charts, bill-of-materials, or nested comments. For example, you could use a recursive query to traverse a hierarchical data structure and retrieve all descendants of a specific node.

Scenario: Organizational Hierarchy Reporting

Imagine you have an employee table in a database that stores information about employees and their reporting structure in a large company. The table includes an EmployeeID, EmployeeName, and ManagerID, where ManagerID is a reference to another EmployeeID, indicating who the manager is for that employee.

Objective: You need to generate a report that lists all employees under a specific manager, including all levels of the hierarchy (i.e., the manager’s direct reports, the direct reports of those direct reports, and so on).

CREATE TABLE Employees (
    EmployeeID INT PRIMARY KEY,
    EmployeeName VARCHAR(100),
    ManagerID INT -- This references EmployeeID in the same table
);

Sample Data

INSERT INTO Employees (EmployeeID, EmployeeName, ManagerID)
VALUES 
(1, 'Alice', NULL),      -- Alice is the CEO, no manager
(2, 'Bob', 1),           -- Bob reports to Alice
(3, 'Charlie', 2),       -- Charlie reports to Bob
(4, 'David', 2),         -- David reports to Bob
(5, 'Eve', 3),           -- Eve reports to Charlie
(6, 'Frank', 3);         -- Frank reports to Charlie

8.1.2.1 Recursive Query Example:

To retrieve the entire hierarchy under Bob (EmployeeID = 2):

WITH RECURSIVE EmployeeHierarchy AS (
    -- Anchor member: Start with the manager (Bob)
    SELECT 
        EmployeeID, 
        EmployeeName, 
        ManagerID
    FROM Employees
    WHERE EmployeeID = 2

    UNION ALL

    -- Recursive member: Retrieve all employees who report to the current level of employees
    SELECT 
        e.EmployeeID, 
        e.EmployeeName, 
        e.ManagerID
    FROM Employees e
    INNER JOIN EmployeeHierarchy eh ON e.ManagerID = eh.EmployeeID
)

SELECT 
    EmployeeID, 
    EmployeeName, 
    ManagerID
FROM EmployeeHierarchy;

Why Use a Recursive Query Here?

The recursive query effectively traverses the hierarchy and retrieves all descendants of a specific manager. This approach is scalable and adaptable to varying levels of hierarchy depth, which would be challenging to achieve with a non-recursive query.

8.1.3 Median Google Search Frequency

Google’s marketing team is making a Superbowl commercial and needs a simple statistic to put on their TV ad: the median number of searches a person made last year.

However, at Google scale, querying the 2 trillion searches is too costly. Luckily, you have access to the summary table which tells you the number of searches made last year and how many Google users fall into that bucket.

Write a query to report the median of searches made by a user. Round the median to one decimal point.