Chapter 5 MOST USED FUNCTIONS

5.1 string functions

LENGTH(): Returns the length of the string in the column.
LOWER() and UPPER(): Converts all characters in the string column to lowercase (LOWER) or uppercase (UPPER).
TRIM(): Removes leading and trailing spaces from the string in the column.
SUBSTR(string, start_position, [length]): Extracts a substring from a string, starting at a specified position, and optionally for a specified number of characters.
LEFT(column, 5): Extracts a specified number of characters from the left (beginning) of a string.
CONCAT(col1, col2): Concatenates two or more strings together.

SELECT
  LENGTH(first_name) AS lenght_of_first_name,
  LOWER(first_name) AS lower_case,
  UPPER(first_name) AS upper_case,
  TRIM(first_name) AS NAME,
  SUBSTR(first_name, 1, 5) AS NAME_5_CHAR,
  SUBSTR(last_name, 3) AS remaining_name,
  LEFT(first_name, 3) AS short_name,
  CONCAT(first_name, '_',  last_name) AS FULL_NAME
FROM employees;

5.2 Aggregate Functions:

These functions can also serve as window functions.

AVG() : Computes the average value of a set of rows within the window or partition or across all rows

SELECT 
      AVG(SALARY) AS AVERAGE_SALARY,
      AVG(SALARY) OVER (PARTITION BY DEPT) AS AVG_DEPT_SALARY
FROM employees;

COUNT() : Counts the number of rows in the window (partition) or across the entire result set.

SELECT 
      COUNT(DISTINCT ID) AS DIISTINCT_ID_COUNT,
      COUNT(EMP_ID) OVER (PARTITION BY DEPARTMENT) AS DEPT_EMP_COUNT
FROM employees;

SUM() : Computes the sum of values within the window (partition) or across the entire result set.

SELECT 
    SUM(SALARY) AS SALARY_TOTAL,
    SUM(SALARY) OVER (PARTITION BY DEPT) AS DEPT_SALARY_TOTAL
FROM employees;

MIN() and MAX(): Returns the minimum or maximum value within the window or partition.

SELECT 
      MIN(SALARY) AS MIN_SALARY,
      MAX(SALARY) AS MAX_SALARY,
      MIN(SALARY) OVER (PARTITION BY DEPT) AS MIN_DEPT_SALARY,
      MAX(SALARY) OVER (PARTITION BY DEPT) AS MAX_DEPT_SALARY
FROM employees;

5.3 Non-Aggregate Functions:

ABS() : Returns the absolute value of each number in the column.
CEIL() or CEILING() : Returns the smallest integer value greater than or equal to each number in the column.
FLOOR() : Returns the largest integer value less than or equal to each number in the column.
ROUND(col, 2): Rounds the values in cola to 2 decimal places.
POWER(col, n): Returns the value of the column raised to the power n.
SQRT(): Returns the square root of each number in the column.
EXP(): Returns e raised to the power of each number in the column.
LOG(): Returns the natural logarithm of each number in the column.
TRUNCATE(col, n): Truncates (cuts off) the values in the column to n decimal places, without rounding.

5.4 Window Functions

ROW_NUMBER(): Assigns a unique sequential integer to rows within a partition of a result set, starting at 1 for the first row in each partition.

This assigns a unique rank based on the salary, with the highest salary getting rank 1.

SELECT 
      ROW_NUMBER() OVER (PARTITION BY XYZ ORDER BY salary DESC) AS rank
FROM employees;

RANK(): Assigns a rank to each row within a partition of a result set, with the same rank assigned to rows with equal values. The next distinct value(s) are given a rank that skips numbers if there are ties.

If two employees have the same salary, they will receive the same rank, and the next rank will skip accordingly (e.g., 1, 2, 2, 4).

SELECT 
      RANK() OVER (PARTITION BY XYZ ORDER BY salary DESC) AS rank
FROM employees;

DENSE_RANK(): Similar to RANK(), but it does not skip ranks after ties.

If two employees have the same salary, they will receive the same rank, but the next rank will continue sequentially (e.g., 1, 2, 2, 3).

SELECT 
      DENSE_RANK() OVER (PARTITION BY XYZ ORDER BY salary DESC) AS rank
FROM employees;

NTILE(n): Divides the rows in an ordered partition into n buckets or groups, and assigns a bucket number to each row. This is useful for distributing rows into quantiles.

This divides the employees into four quartiles based on their salary.

SELECT 
      employee_id, salary,
      NTILE(4) OVER (ORDER BY salary DESC) AS quartile
FROM employees;

LAG() and LEAD(): Accesses data from previous (LAG) or subsequent (LEAD) rows in the result set, without using a self-join.

SELECT 
       LAG(salary, 1) OVER (ORDER BY salary DESC) AS previous_salary,
       LEAD(salary, 1) OVER (ORDER BY salary DESC) AS next_salary
FROM employees;

FIRST_VALUE() and LAST_VALUE(): Returns the first or last value in a window or partition.

SELECT 
      FIRST_VALUE(salary) OVER (ORDER BY salary DESC) AS HIGHEST_SALARY
      LAST_VALUE(salary) OVER (ORDER BY salary DESC) AS LOWEST_SALARY
FROM employees;

PERCENT_RANK(): Calculates the relative rank of the current row within a partition, expressed as a percentage.

SELECT 
       PERCENT_RANK() OVER (ORDER BY salary DESC) AS percentile_rank
FROM employees;

5.5 Functions comparing columns:

These functions all operate on a per-row basis, evaluating or combining values from multiple columns within the same row.

GREATEST(cola, colb, 0): Returns the greatest (largest) value from the list of columns for each row.
LEAST(cola, colb, 0): Returns the smallest value among the list of arguments within a single row.
COALESCE(cola, colb, 0) : Returns the first non-null value in the column, or the default_value if all are null.

SELECT
    GREATEST(SALARY_2000, SALARY_2010) AS GREATEST_SALARY,
    LEAST(SALARY_2000, SALARY_2010) AS LEAST_SALARY,
    COALESCE(SALARY_2000, SALARY_2010) as SALARY
FROM employees;

5.6 Some Advanced Functions

5.6.1 LISTAGG

concatenates values from multiple rows into a single string.

This query lists all medications in each ID, ordered by their name of medications, with the names concatenated into a single string.

SELECT 
    ID, 
    LISTAGG(DISTINCT MEDICATIONS, ', ') WITHIN GROUP (ORDER BY MEDICATIONS) AS MED_ARRAY
FROM table_name
GROUP BY ID

5.6.2 LATERAL

This SQL query involves a few advanced concepts, including the use of the LATERAL join and the FLATTEN function, which are specific to certain SQL platforms like Snowflake.

Here’s a breakdown of the query:

ID_ARRAY: LIST OF IDS IN EACH ROW

FLATTEN function is used to “unpack” or “expand” the ID_ARRAY column into individual elements. Essentially, it takes an array and returns each element in the array as a separate row.

SELECT F.MY_ID::VARCHAR AS MY_ID
FROM TABLE_1 A,  LATERAL FLATTEN(A.ID_ARRAY) AS F

5.6.3 UPDATE

you want to update the salary of an employee with an EmployeeID of 5 to $75,000.

You would write:

UPDATE MY_TABLE

SET Salary = 75000, JobTitle = 'Senior Analyst'

WHERE EmployeeID = 5;

5.6.4 ALTER

DROP COLUMN “MiddleName” FROM Employees TABLE

ALTER TABLE Employees
DROP COLUMN MiddleName;

5.6.5 DELETE ROWS

This will delete all rows but keep data structure

TRUNCATE TABLE table_name;

This will delete all rows but keep data structure. And also log row deletion.

DELETE FROM table_name;

5.6.6 INSERT

You can now insert new data into the table:

INSERT INTO table_name (column1, column2, ...)

SELECT column1, column2, ...
FROM source_table
WHERE condition;

INSERT INTO table_name 

SELECT *
FROM source_table