Chapter 5 MOST USED FUNCTIONS
5.1 string functions
LENGTH()
: Returns the length of the string in the column.LOWER()
andUPPER()
: Converts all characters in the string column to lowercase (LOWER) or uppercase (UPPER).TRIM()
: Removes leading and trailing spaces from the string in the column.SUBSTR(string, start_position, [length])
: Extracts a substring from a string, starting at a specified position, and optionally for a specified number of characters.LEFT(column, 5)
: Extracts a specified number of characters from the left (beginning) of a string.CONCAT(col1, col2)
: Concatenates two or more strings together.
SELECT
LENGTH(first_name) AS lenght_of_first_name,
LOWER(first_name) AS lower_case,
UPPER(first_name) AS upper_case,
TRIM(first_name) AS NAME,
SUBSTR(first_name, 1, 5) AS NAME_5_CHAR,
SUBSTR(last_name, 3) AS remaining_name,
LEFT(first_name, 3) AS short_name,
CONCAT(first_name, '_', last_name) AS FULL_NAME
FROM employees;
5.2 Aggregate Functions:
These functions can also serve as window functions.
AVG()
: Computes the average value of a set of rows within the window or partition or across all rows
SELECT
AVG(SALARY) AS AVERAGE_SALARY,
AVG(SALARY) OVER (PARTITION BY DEPT) AS AVG_DEPT_SALARY
FROM employees;
COUNT()
: Counts the number of rows in the window (partition) or across the entire result set.
SELECT
COUNT(DISTINCT ID) AS DIISTINCT_ID_COUNT,
COUNT(EMP_ID) OVER (PARTITION BY DEPARTMENT) AS DEPT_EMP_COUNT
FROM employees;
SUM()
: Computes the sum of values within the window (partition) or across the entire result set.
SELECT
SUM(SALARY) AS SALARY_TOTAL,
SUM(SALARY) OVER (PARTITION BY DEPT) AS DEPT_SALARY_TOTAL
FROM employees;
MIN()
andMAX()
: Returns the minimum or maximum value within the window or partition.
5.3 Non-Aggregate Functions:
ABS()
: Returns the absolute value of each number in the column.CEIL()
orCEILING()
: Returns the smallest integer value greater than or equal to each number in the column.FLOOR()
: Returns the largest integer value less than or equal to each number in the column.ROUND(col, 2)
: Rounds the values in cola to 2 decimal places.POWER(col, n)
: Returns the value of the column raised to the power n.SQRT()
: Returns the square root of each number in the column.EXP()
: Returns e raised to the power of each number in the column.LOG()
: Returns the natural logarithm of each number in the column.TRUNCATE(col, n)
: Truncates (cuts off) the values in the column to n decimal places, withoutrounding.
5.4 Window Functions
ROW_NUMBER()
: Assigns a unique sequential integer to rows within a partition of a result set, starting at 1 for the first row in each partition.
This assigns a unique rank based on the salary, with the highest salary getting rank 1.
SELECT
ROW_NUMBER() OVER (PARTITION BY XYZ ORDER BY salary DESC) AS rank
FROM employees;
RANK()
: Assigns a rank to each row within a partition of a result set, with the same rank assigned to rows with equal values. The next distinct value(s) are given a rank that skips numbers if there are ties.
If two employees have the same salary, they will receive the same rank, and the next rank will skip accordingly (e.g., 1, 2, 2, 4).
SELECT
RANK() OVER (PARTITION BY XYZ ORDER BY salary DESC) AS rank
FROM employees;
DENSE_RANK()
: Similar toRANK()
, but it does not skip ranks after ties.
If two employees have the same salary, they will receive the same rank, but the next rank will continue sequentially (e.g., 1, 2, 2, 3).
SELECT
DENSE_RANK() OVER (PARTITION BY XYZ ORDER BY salary DESC) AS rank
FROM employees;
NTILE(n)
: Divides the rows in an ordered partition into n buckets or groups, and assigns a bucket number to each row. This is useful for distributing rows into quantiles.
This divides the employees into four quartiles based on their salary.
SELECT
employee_id, salary,
NTILE(4) OVER (ORDER BY salary DESC) AS quartile
FROM employees;
LAG()
andLEAD()
: Accesses data from previous (LAG) or subsequent (LEAD) rows in the result set, without using a self-join.
SELECT
LAG(salary, 1) OVER (ORDER BY salary DESC) AS previous_salary,
LEAD(salary, 1) OVER (ORDER BY salary DESC) AS next_salary
FROM employees;
FIRST_VALUE()
andLAST_VALUE()
: Returns the first or last value in a window or partition.
SELECT
FIRST_VALUE(salary) OVER (ORDER BY salary DESC) AS HIGHEST_SALARY
LAST_VALUE(salary) OVER (ORDER BY salary DESC) AS LOWEST_SALARY
FROM employees;
PERCENT_RANK()
: Calculates the relative rank of the current row within a partition, expressed as a percentage.
5.5 Functions comparing columns:
These functions all operate on a per-row basis, evaluating or combining values from multiple columns within the same row.
GREATEST(cola, colb, 0)
: Returns the greatest (largest) value from the list of columns for each row.LEAST(cola, colb, 0)
: Returns the smallest value among the list of arguments within a single row.COALESCE(cola, colb, 0)
: Returns the first non-null value in the column, or the default_value if all are null.
5.6 Some Advanced Functions
5.6.1 LISTAGG
concatenates values from multiple rows into a single string.
This query lists all medications in each ID, ordered by their name of medications, with the names concatenated into a single string.
5.6.2 LATERAL
This SQL query involves a few advanced concepts, including the use of the LATERAL join and the FLATTEN function, which are specific to certain SQL platforms like Snowflake.
Here’s a breakdown of the query:
ID_ARRAY
: LIST OF IDS IN EACH ROW
FLATTEN
function is used to “unpack” or “expand” the ID_ARRAY column into individual elements. Essentially, it takes an array and returns each element in the array as a separate row.
5.6.3 UPDATE
you want to update the salary of an employee with an EmployeeID of 5 to $75,000.
You would write: