Amazon Redshift to Databricks
Conversion Information
- Transpiler: BladeBridge - Target: Databricks SQL (experimental)
- Transpiler: Switch - Target: Databricks Notebook (experimental)
- Input format: SQL files (.sql)
Supported Redshift Versions
- Amazon Redshift (all versions)
- Amazon Redshift Serverless
Input Requirements
Export your Redshift SQL scripts:
- Schema Scripts: DDL statements (CREATE TABLE, CREATE VIEW, etc.)
- Stored Procedures: Procedural SQL code
- Query Scripts: SELECT, INSERT, UPDATE, DELETE statements
- ETL Scripts: Data transformation logic
Export from Redshift:
-- Export table DDL (uses the admin.v_generate_tbl_ddl view from the
-- AWS amazon-redshift-utils repository; SHOW TABLE schema_name.table_name
-- also returns the DDL for a single table)
SELECT
    schemaname,
    tablename,
    ddl
FROM admin.v_generate_tbl_ddl
WHERE schemaname = 'schema_name'
  AND tablename = 'table_name';
-- Export view definitions
SELECT
schemaname,
viewname,
definition
FROM pg_views
WHERE schemaname = 'your_schema';
SQL Features
Data Types - Supported
| Redshift Type | Databricks Type | Notes |
|---|---|---|
| SMALLINT / INT2 | INT | 2-byte integer |
| INTEGER / INT / INT4 | INT | 4-byte integer |
| BIGINT / INT8 | BIGINT | 8-byte integer |
| DECIMAL / NUMERIC | NUMERIC | Precision preserved |
| REAL / FLOAT4 | FLOAT | Single precision |
| DOUBLE PRECISION / FLOAT8 / FLOAT | FLOAT | Double precision |
| BOOLEAN / BOOL | BOOLEAN | True/false |
| CHAR / CHARACTER | STRING | Fixed-length string |
| VARCHAR / CHARACTER VARYING | STRING | Variable-length string |
| NCHAR / NVARCHAR | STRING | Unicode strings |
| TEXT | STRING | Unlimited length |
| DATE | DATE | Date only |
| TIMESTAMP | TIMESTAMP | Date and time |
| TIMESTAMPTZ | TIMESTAMP | Timestamp with timezone converted |
| TIMETZ | TIMESTAMP | Time with timezone converted |
| VARBYTE / VARBINARY | BINARY | Binary data |
| IDENTITY columns | GENERATED AS IDENTITY | Auto-incrementing columns |
Data Types - Unsupported
| Redshift Type | Reason | Workaround |
|---|---|---|
| SUPER | Semi-structured data type | Use STRING or VARIANT with manual parsing |
| HLLSKETCH | HyperLogLog sketches | Use approx_count_distinct() or custom UDFs |
| GEOMETRY | Spatial data | Migrate to Databricks spatial functions with STRING representation |
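For example, a SUPER column that holds JSON documents can be landed as STRING and parsed with the JSON functions listed later in this guide. A minimal sketch, assuming an illustrative events table with a payload column:
-- Redshift: payload is a SUPER column queried as payload.device.id
-- Databricks workaround: keep payload as STRING and extract values explicitly
SELECT
    GET_JSON_OBJECT(payload, '$.device.id') AS device_id
FROM events;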
SQL Functions
Date & Time Functions - Supported
| Redshift Function | Databricks Equivalent | Notes |
|---|---|---|
| GETDATE() | CURRENT_TIMESTAMP | Current timestamp |
| SYSDATE | CURRENT_TIMESTAMP() | Current timestamp |
| DATE_PART('year', date) | YEAR(date) | Extract year |
| DATE_PART('month', date) | MONTH(date) | Extract month |
| DATE_PART('day', date) | DAY(date) | Extract day |
| DATE_PART('hour', date) | HOUR(date) | Extract hour |
| DATE_PART('minute', date) | MINUTE(date) | Extract minute |
| DATE_PART('second', date) | SECOND(date) | Extract second |
| DATE_PART('quarter', date) | QUARTER(date) | Extract quarter |
| DATE_PART('week', date) | WEEKOFYEAR(date) | Week number |
| DATE_PART('dow', date) | DAYOFWEEK(date) | Day of week (note: Redshift returns 0-6, Databricks returns 1-7) |
| DATE_PART('doy', date) | DAYOFYEAR(date) | Day of year |
| DATE_PART('epoch', date) | UNIX_TIMESTAMP(date) | Seconds since epoch |
| DATEDIFF('day', start, end) | DATEDIFF(DAY, start, end) | Difference in days |
| DATEDIFF('month', start, end) | MONTH(end) - MONTH(start) + 12 * (YEAR(end) - YEAR(start)) | Difference in months |
| DATEDIFF('year', start, end) | YEAR(end) - YEAR(start) | Difference in years |
| DATEDIFF('hour', start, end) | (UNIX_TIMESTAMP(end) - UNIX_TIMESTAMP(start)) / 3600 | Difference in hours |
| DATEDIFF('second', start, end) | UNIX_TIMESTAMP(end) - UNIX_TIMESTAMP(start) | Difference in seconds |
| DATEADD('day', n, date) | DATE_ADD(date, n) | Add days |
| DATEADD('month', n, date) | ADD_MONTHS(date, n) | Add months |
| DATEADD('year', n, date) | ADD_MONTHS(date, n * 12) | Add years |
| DATEADD('hour', n, date) | TIMESTAMPADD(HOUR, n, date) | Add hours |
| EXTRACT(YEAR FROM date) | EXTRACT(YEAR FROM date) | Extract year |
| TRUNC(date) | TRUNC(date, 'DD') | Truncate to day |
| CONVERT_TIMEZONE(tz, ts) | FROM_UTC_TIMESTAMP(ts, tz) | Timezone conversion (2 args) |
| CONVERT_TIMEZONE(src_tz, tgt_tz, ts) | FROM_UTC_TIMESTAMP(TO_UTC_TIMESTAMP(ts, src_tz), tgt_tz) | Timezone conversion (3 args) |
String Functions - Supported
| Redshift Function | Databricks Equivalent | Notes |
|---|---|---|
| LEN(str) | CHAR_LENGTH(str) | String length |
| CHAR_LENGTH(str) | CHAR_LENGTH(str) | String length |
| TEXTLEN(str) | CHAR_LENGTH(str) | String length |
| STRPOS(str, substr) | POSITION(substr IN str) | Find substring position |
| CHARINDEX(substr, str) | INSTR(str, substr) | Find substring position |
| REPLICATE(str, n) | REPEAT(str, n) | Repeat string |
| REPLACE_CHARS(str, old, new) | REPLACE(str, old, new) | Replace characters |
| SPLIT_PART(str, delim, part) | SPLIT(str, delim)[part-1] | Split and get part |
| REGEXP_SUBSTR(str, pattern) | REGEXP_EXTRACT(str, pattern, 0) | Extract using regex |
| REGEXP_COUNT(str, pattern) | LENGTH(str) - LENGTH(REGEXP_REPLACE(str, pattern, '')) | Count regex matches |
| CHR(n) | CHAR(n) | ASCII code to character |
| QUOTE_LITERAL(str) | CONCAT("'", REPLACE(str, "'", "''"), "'") | Quote string literal |
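Two of these mappings change more than the function name: Databricks arrays are zero-indexed, and INSTR takes its arguments in the opposite order from CHARINDEX. A minimal sketch with illustrative column names:
Redshift:
SELECT
    SPLIT_PART(email, '@', 2) AS email_domain,
    CHARINDEX('@', email) AS at_position
FROM customers;
Databricks SQL:
SELECT
    SPLIT(email, '@')[1] AS email_domain,
    INSTR(email, '@') AS at_position
FROM customers;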
Aggregate Functions - Supported
| Redshift Function | Databricks Equivalent | Notes |
|---|---|---|
| LISTAGG(col, delim) | ARRAY_JOIN(COLLECT_LIST(col), delim) | Concatenate strings |
| LISTAGG(DISTINCT col, delim) WITHIN GROUP (ORDER BY x) | ARRAY_JOIN(ARRAY_DISTINCT(...), delim) | Distinct concatenation with ordering |
| MEDIAN(col) | PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY col) | Median value |
| STDDEV_POP(col) | STDDEV_POP(col) | Population standard deviation |
| STDDEV_SAMP(col) | STDDEV_SAMP(col) | Sample standard deviation |
| VAR_POP(col) | VAR_POP(col) | Population variance |
| VAR_SAMP(col) | VAR_SAMP(col) | Sample variance |
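As a quick illustration of the MEDIAN mapping (a minimal sketch; table and column names are illustrative):
Redshift:
SELECT customer_id, MEDIAN(total_amount) AS median_order FROM orders GROUP BY customer_id;
Databricks SQL:
SELECT customer_id, PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY total_amount) AS median_order FROM orders GROUP BY customer_id;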
JSON Functions - Supported
| Redshift Function | Databricks Equivalent | Notes |
|---|---|---|
| JSON_EXTRACT_PATH_TEXT(json, 'key') | GET_JSON_OBJECT(json, '$.key') | Extract JSON value (2 args) |
| JSON_EXTRACT_PATH_TEXT(json, 'k1', 'k2') | GET_JSON_OBJECT(json, '$.k1.k2') | Extract nested JSON value |
| JSON_EXTRACT_ARRAY_ELEMENT_TEXT(json, idx) | GET_JSON_OBJECT(json, '$[idx]') | Extract array element |
| JSON_PARSE(str) | FROM_JSON(str, schema) | Parse JSON string |
| JSON_QUERY(json, path) | JSON_EXTRACT_SCALAR(json, path) | Query JSON |
| IS_VALID_JSON(str) | CASE WHEN TRY_PARSE_JSON(str) IS NOT NULL THEN TRUE ELSE FALSE END | Validate JSON |
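For example, nested key access and array indexing both collapse into a single JSON path expression (a minimal sketch; table and column names are illustrative):
Redshift:
SELECT
    JSON_EXTRACT_PATH_TEXT(payload, 'customer', 'email') AS customer_email,
    JSON_EXTRACT_ARRAY_ELEMENT_TEXT(items, 0) AS first_item
FROM raw_events;
Databricks SQL:
SELECT
    GET_JSON_OBJECT(payload, '$.customer.email') AS customer_email,
    GET_JSON_OBJECT(items, '$[0]') AS first_item
FROM raw_events;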
Conditional Functions - Supported
| Redshift Function | Databricks Equivalent | Notes |
|---|---|---|
| NVL(expr1, expr2) | COALESCE(expr1, expr2) | Null value replacement |
| ISNULL(expr1, expr2) | COALESCE(expr1, expr2) | Null value replacement |
| DECODE(expr, val1, res1, def) | CASE WHEN expr=val1 THEN res1 ELSE def END | Conditional expression |
| DECODE(expr, v1, r1, v2, r2, def) | CASE WHEN expr=v1 THEN r1 WHEN expr=v2 THEN r2 ELSE def END | Multiple conditions |
Mathematical Functions - Supported
| Redshift Function | Databricks Equivalent | Notes |
|---|---|---|
| DLOG1(x) | LN(x) | Natural logarithm |
| CEILING(x) | CEILING(x) | Round up |
| FLOOR(x) | FLOOR(x) | Round down |
| ROUND(x, n) | ROUND(x, n) | Round to n decimal places |
| TRUNC(x, n) | TRUNC(x, n) | Truncate to n decimal places |
| ABS(x) | ABS(x) | Absolute value |
| SIGN(x) | SIGN(x) | Sign of number |
| MOD(x, y) | MOD(x, y) | Modulo |
| POWER(x, y) | POWER(x, y) | Exponentiation |
Array Functions - Supported
| Redshift Function | Databricks Equivalent | Notes |
|---|---|---|
| ARRAY[1,2,3] | ARRAY(1,2,3) | Array literal |
| ARRAY_TO_STRING(arr, delim) | ARRAY_JOIN(arr, delim) | Array to string |
| ARRAY_UPPER(arr, dim) | SIZE(arr) | Array size |
| GET_ARRAY_LENGTH(arr) | SIZE(arr) | Array length |
| SPLIT_TO_ARRAY(str, delim) | SPLIT(str, delim) | String to array |
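For example, splitting a delimited string, counting its elements, and re-joining it map directly onto the table above (a minimal sketch; table and column names are illustrative):
Redshift:
SELECT
    ARRAY_TO_STRING(SPLIT_TO_ARRAY(tags, ','), ' | ') AS tag_list,
    GET_ARRAY_LENGTH(SPLIT_TO_ARRAY(tags, ',')) AS tag_count
FROM products;
Databricks SQL:
SELECT
    ARRAY_JOIN(SPLIT(tags, ','), ' | ') AS tag_list,
    SIZE(SPLIT(tags, ',')) AS tag_count
FROM products;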
Window Functions - Supported
| Redshift Function | Databricks Equivalent | Notes |
|---|---|---|
| ROW_NUMBER() OVER (...) | ROW_NUMBER() OVER (...) | Row number |
| RANK() OVER (...) | RANK() OVER (...) | Rank |
| DENSE_RANK() OVER (...) | DENSE_RANK() OVER (...) | Dense rank |
| FIRST_VALUE(col) IGNORE NULLS | FIRST_VALUE(col) IGNORE NULLS | First value ignoring nulls |
| LAST_VALUE(col) IGNORE NULLS | LAST_VALUE(col) IGNORE NULLS | Last value ignoring nulls |
| LAG(col, offset) OVER (...) | LAG(col, offset) OVER (...) | Previous row value |
| LEAD(col, offset) OVER (...) | LEAD(col, offset) OVER (...) | Next row value |
Utility Functions - Supported
| Redshift Function | Databricks Equivalent | Notes |
|---|---|---|
| GENERATE_SERIES(start, end) | EXPLODE(SEQUENCE(start, end, 1)) | Generate integer series |
| GENERATE_SERIES(start, end, step) | EXPLODE(SEQUENCE(start, end, step)) | Generate integer series with step |
| URLPARSE(url, part) | PARSE_URL(url, part) | Parse URL |
| PG_BACKEND_PID() | CURRENT_USER() | No direct equivalent in Databricks |
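For example, generating a small integer series (a minimal sketch; the date-series variant works the same way with DATE arguments and an INTERVAL step):
Redshift:
SELECT generate_series(1, 10, 2) AS n;
Databricks SQL:
SELECT EXPLODE(SEQUENCE(1, 10, 2)) AS n;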
DDL Statements
CREATE TABLE - Supported Features
Redshift:
CREATE TABLE customers (
customer_id INT IDENTITY(1,1) PRIMARY KEY,
first_name VARCHAR(50) NOT NULL,
last_name VARCHAR(100) NOT NULL,
email VARCHAR(255) UNIQUE,
birth_date DATE,
registration_date TIMESTAMP DEFAULT GETDATE(),
loyalty_points DECIMAL(10,2) DEFAULT 0,
status VARCHAR(20)
)
DISTSTYLE KEY
DISTKEY (customer_id)
SORTKEY (registration_date, customer_id);
Converted Databricks SQL:
CREATE OR REPLACE TABLE customers (
customer_id INT GENERATED ALWAYS AS IDENTITY (START WITH 1 INCREMENT BY 1),
-- FIXME databricks.migration.unsupported.feature PRIMARY KEY
first_name STRING NOT NULL,
last_name STRING NOT NULL,
email STRING,
-- FIXME databricks.migration.unsupported.feature UNIQUE Constraint
birth_date DATE,
registration_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
loyalty_points NUMERIC(10,2) DEFAULT 0,
status STRING
)
ZORDER BY (registration_date, customer_id);
CREATE TABLE - Conversion Notes
| Redshift Feature | Databricks Conversion | Status |
|---|---|---|
| IDENTITY(start, increment) | GENERATED ALWAYS AS IDENTITY | ✅ Supported |
| PRIMARY KEY | Commented with FIXME | ⚠️ Not enforced (commented) |
| FOREIGN KEY | Commented with FIXME | ⚠️ Not enforced (commented) |
| UNIQUE constraints | Commented with FIXME | ⚠️ Not enforced (commented) |
| CHECK constraints | Commented with FIXME | ⚠️ Not enforced (commented) |
| DEFAULT values | DEFAULT | ✅ Supported |
| NOT NULL | NOT NULL | ✅ Supported |
| DISTSTYLE KEY/EVEN/ALL | Removed | ℹ️ Not applicable in Databricks |
| DISTKEY (column) | Removed | ℹ️ Not applicable in Databricks |
| SORTKEY (columns) | ZORDER BY (columns) | ✅ Converted to Z-ordering |
| COMPOUND SORTKEY | ZORDER BY | ✅ Converted to Z-ordering |
| INTERLEAVED SORTKEY | ZORDER BY | ✅ Converted to Z-ordering |
| ENCODE (compression) | Removed | ℹ️ Databricks handles compression automatically |
CREATE VIEW
Redshift:
CREATE VIEW active_customers AS
SELECT
customer_id,
first_name || ' ' || last_name AS full_name,
email,
DATEDIFF(year, birth_date, CURRENT_DATE) AS age
FROM customers
WHERE status = 'active';
Converted Databricks SQL:
CREATE OR REPLACE VIEW active_customers AS
SELECT
customer_id,
first_name || ' ' || last_name AS full_name,
email,
YEAR(CURRENT_DATE) - YEAR(birth_date) AS age
FROM customers
WHERE status = 'active';
CREATE TABLE AS SELECT (CTAS)
Redshift:
CREATE TABLE customer_summary AS
SELECT
customer_id,
COUNT(*) AS order_count,
SUM(total_amount) AS total_spent
FROM orders
GROUP BY customer_id;
Converted Databricks SQL:
CREATE OR REPLACE TABLE customer_summary AS
SELECT
customer_id,
COUNT(*) AS order_count,
SUM(total_amount) AS total_spent
FROM orders
GROUP BY customer_id;
CREATE TABLE LIKE
Redshift:
CREATE TABLE customers_backup (LIKE customers INCLUDING DEFAULTS);
Converted Databricks SQL:
CREATE OR REPLACE TABLE customers_backup AS
SELECT * FROM customers;
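Note that the AS SELECT * form copies the rows as well as the table structure. If the original LIKE statement was intended to create an empty copy of the schema, a predicate that filters out every row restores that behavior (a minimal sketch):
CREATE OR REPLACE TABLE customers_backup AS
SELECT * FROM customers WHERE 1 = 0;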
DML Statements
UPDATE with JOIN - Converted to MERGE
Redshift:
WITH customer_segments AS (
SELECT
customer_id,
UPPER(TRIM(first_name)) || ' ' || UPPER(TRIM(last_name)) AS full_name,
DATEDIFF(year, date_of_birth, CURRENT_DATE) AS age,
CASE
WHEN DATEDIFF(year, date_of_birth, CURRENT_DATE) < 25 THEN 'Young'
WHEN DATEDIFF(year, date_of_birth, CURRENT_DATE) BETWEEN 25 AND 45 THEN 'Adult'
ELSE 'Senior'
END AS age_segment,
DATE_PART(month, registration_date) AS registration_month
FROM customers
WHERE status = 'active'
)
UPDATE customers
SET
first_name = cs.full_name,
updated_at = CURRENT_TIMESTAMP
FROM customer_segments cs
WHERE customers.customer_id = cs.customer_id
AND cs.age_segment = 'Senior'
AND cs.registration_month IN (1, 12);
Converted Databricks SQL:
WITH customer_segments AS (
SELECT
customer_id,
UPPER(TRIM(first_name)) || ' ' || UPPER(TRIM(last_name)) AS full_name,
YEAR(CURRENT_DATE) - YEAR(date_of_birth) AS age,
CASE
WHEN YEAR(CURRENT_DATE) - YEAR(date_of_birth) < 25 THEN 'Young'
WHEN YEAR(CURRENT_DATE) - YEAR(date_of_birth) BETWEEN 25 AND 45 THEN 'Adult'
ELSE 'Senior'
END AS age_segment,
DATE_PART(month, registration_date) AS registration_month
FROM customers
WHERE status = 'active'
)
MERGE INTO customers
USING customer_segments cs
ON customers.customer_id = cs.customer_id
AND cs.age_segment = 'Senior'
AND cs.registration_month IN (1, 12)
WHEN MATCHED THEN UPDATE SET
first_name = cs.full_name,
updated_at = CURRENT_TIMESTAMP;
DELETE with JOIN - Converted to MERGE
Redshift:
DELETE FROM orders
USING customers
WHERE orders.customer_id = customers.customer_id
AND customers.status = 'inactive'
AND orders.order_date < CURRENT_DATE - 365;
Converted Databricks SQL:
MERGE INTO orders
USING customers
ON orders.customer_id = customers.customer_id
AND customers.status = 'inactive'
AND orders.order_date < CURRENT_DATE - INTERVAL 365 DAYS
WHEN MATCHED THEN DELETE;
INSERT INTO SELECT
Redshift:
INSERT INTO customer_archive
SELECT * FROM customers
WHERE status = 'deleted'
AND updated_at < CURRENT_DATE - 730;
Converted Databricks SQL:
INSERT INTO customer_archive
SELECT * FROM customers
WHERE status = 'deleted'
AND updated_at < DATE_ADD(current_date, -730);
Verified Conversion Examples
The following examples are taken directly from the functional test suite and show actual input/output conversions.
Example 1: Date Functions
Source Redshift SQL:
SELECT
o.order_number,
CONCAT(c.first_name, ' ', c.last_name) AS customer_name,
TO_CHAR(o.order_date, 'YYYY-MM-DD') AS formatted_order_date,
DATEDIFF(day, o.order_date, o.estimated_delivery) AS delivery_days
FROM orders o
INNER JOIN customers c ON o.customer_id = c.customer_id
WHERE o.order_date >= DATEADD(month, -1, CURRENT_DATE)
ORDER BY o.order_date DESC;
Converted Databricks SQL:
SELECT
o.order_number,
CONCAT(c.first_name, ' ', c.last_name) AS customer_name,
DATE_FORMAT(o.order_date,'y-MM-dd') AS formatted_order_date,
DATEDIFF(DAY, o.order_date, o.estimated_delivery) AS delivery_days
FROM orders o
INNER JOIN customers c ON o.customer_id = c.customer_id
WHERE o.order_date >= ADD_MONTHS(CURRENT_DATE, -1)
ORDER BY o.order_date DESC;
Key Conversions:
- TO_CHAR(date, format) → DATE_FORMAT(date, format) with format translation
- DATEDIFF(day, ...) → DATEDIFF(DAY, ...)
- DATEADD(month, n, date) → ADD_MONTHS(date, n)
Example 2: UPDATE with FROM → MERGE
Source Redshift SQL:
UPDATE customers
SET last_login = CURRENT_TIMESTAMP,
updated_at = CURRENT_TIMESTAMP
WHERE UPPER(email) LIKE '%@GMAIL.COM'
AND last_login < DATEADD(month, -3, CURRENT_DATE);
Converted Databricks SQL:
UPDATE customers
SET last_login = CURRENT_TIMESTAMP,
updated_at = CURRENT_TIMESTAMP
WHERE UPPER(email) LIKE '%@GMAIL.COM'
AND last_login < ADD_MONTHS(CURRENT_DATE, -3);
Key Conversions:
- DATEADD(month, -3, date) → ADD_MONTHS(date, -3)
- Simple UPDATE statements remain as UPDATE (no MERGE needed)
Example 3: CTEs with UPDATE FROM → MERGE
Source Redshift SQL:
WITH recent_orders AS (
SELECT
o.customer_id,
COUNT(*) AS order_count,
SUM(o.total_amount) AS total_spent,
MAX(o.order_date) AS last_order_date,
SUBSTRING(o.order_number, 1, 4) AS order_prefix
FROM orders o
WHERE o.order_date >= DATEADD(month, -6, CURRENT_DATE)
AND o.status IN ('completed', 'shipped')
GROUP BY o.customer_id, SUBSTRING(o.order_number, 1, 4)
HAVING COUNT(*) >= 3
),
loyalty_updates AS (
SELECT
ro.customer_id,
FLOOR(ro.total_spent / 100) * 10 AS bonus_points,
CONCAT('VIP-', LPAD(ro.customer_id::VARCHAR, 6, '0')) AS new_customer_code
FROM recent_orders ro
)
UPDATE customers
SET
loyalty_points = loyalty_points + lu.bonus_points,
customer_code = lu.new_customer_code,
updated_at = CURRENT_TIMESTAMP
FROM loyalty_updates lu
WHERE customers.customer_id = lu.customer_id;
Converted Databricks SQL:
WITH recent_orders AS (
SELECT
o.customer_id,
COUNT(*) AS order_count,
SUM(o.total_amount) AS total_spent,
MAX(o.order_date) AS last_order_date,
SUBSTRING(o.order_number, 1, 4) AS order_prefix
FROM orders o
WHERE o.order_date >= ADD_MONTHS(CURRENT_DATE, -6)
AND o.status IN ('completed', 'shipped')
GROUP BY o.customer_id, SUBSTRING(o.order_number, 1, 4)
HAVING COUNT(*) >= 3
),
loyalty_updates AS (
SELECT
ro.customer_id,
FLOOR(ro.total_spent / 100) * 10 AS bonus_points,
CONCAT('VIP-', LPAD(ro.customer_id::STRING, 6, '0')) AS new_customer_code
FROM recent_orders ro
)
MERGE INTO customers
USING loyalty_updates lu
ON customers.customer_id = lu.customer_id
WHEN MATCHED THEN UPDATE SET
loyalty_points = loyalty_points + lu.bonus_points,
customer_code = lu.new_customer_code,
updated_at = CURRENT_TIMESTAMP;
Key Conversions:
- UPDATE ... FROM ... → MERGE INTO ... USING ... WHEN MATCHED
- ::VARCHAR → ::STRING
- DATEADD(month, -6, date) → ADD_MONTHS(date, -6)
- CTEs preserved with updated syntax
Example 4: REGEXP_COUNT Function
Source Redshift SQL:
UPDATE customers
SET loyalty_points = loyalty_points +
CASE
WHEN REGEXP_COUNT(email, '[A-Z]') > 3 THEN 100
WHEN REGEXP_COUNT(email, '[0-9]') > 2 THEN 50
ELSE 25
END,
status = DECODE(
REGEXP_COUNT(phone, '[0-9]'),
10, 'VERIFIED',
11, 'VERIFIED',
'PENDING'
)
WHERE customer_id IN (
SELECT DISTINCT o.customer_id
FROM orders o
JOIN order_items oi ON o.order_id = oi.order_id
WHERE o.order_date >= CURRENT_DATE - INTERVAL '30 days'
AND o.total_amount > 500
);
Converted Databricks SQL:
UPDATE customers
SET loyalty_points = loyalty_points +
CASE
WHEN LENGTH(email) - LENGTH(REGEXP_REPLACE(email, '[A-Z]', '')) > 3 THEN 100
WHEN LENGTH(email) - LENGTH(REGEXP_REPLACE(email, '[0-9]', '')) > 2 THEN 50
ELSE 25
END,
status = case when LENGTH(phone) - LENGTH(REGEXP_REPLACE(phone, '[0-9]', ''))=10 then 'VERIFIED' when LENGTH(phone) - LENGTH(REGEXP_REPLACE(phone, '[0-9]', ''))=11 then 'VERIFIED' else 'PENDING' END
WHERE customer_id IN (
SELECT DISTINCT o.customer_id
FROM orders o
JOIN order_items oi ON o.order_id = oi.order_id
WHERE o.order_date >= CURRENT_DATE - INTERVAL '30 days'
AND o.total_amount > 500
);
Key Conversions:
- REGEXP_COUNT(str, pattern) → LENGTH(str) - LENGTH(REGEXP_REPLACE(str, pattern, ''))
- DECODE(expr, v1, r1, v2, r2, default) → CASE WHEN expr=v1 THEN r1 WHEN expr=v2 THEN r2 ELSE default END
Example 5: INSERT with REGEXP_COUNT and Complex Functions
Source Redshift SQL:
INSERT INTO addresses (customer_id, address_type, street_address, city, state_province, postal_code, country, is_primary, created_at)
SELECT
c.customer_id,
'BILLING',
TRANSLATE(c.first_name || ' ' || c.last_name, 'AEIOU', '12345') || ' Street',
CASE
WHEN REGEXP_COUNT(c.email, '@gmail') > 0 THEN 'New York'
WHEN REGEXP_COUNT(c.email, '@yahoo') > 0 THEN 'Los Angeles'
ELSE 'Chicago'
END,
DECODE(
REGEXP_COUNT(c.phone, '^1'),
1, 'NY',
'CA'
),
LPAD(ABS(RANDOM() * 99999)::INT, 5, '0'),
'USA',
1,
CURRENT_TIMESTAMP
FROM customers c
WHERE c.customer_id NOT IN (
SELECT DISTINCT customer_id
FROM addresses
WHERE address_type = 'BILLING'
)
AND c.registration_date >= CURRENT_DATE - INTERVAL '90 days';
Converted Databricks SQL:
INSERT INTO addresses (customer_id, address_type, street_address, city, state_province, postal_code, country, is_primary, created_at)
SELECT
c.customer_id,
'BILLING',
TRANSLATE(c.first_name || ' ' || c.last_name, 'AEIOU', '12345') || ' Street',
CASE
WHEN LENGTH(c.email) - LENGTH(REGEXP_REPLACE(c.email, '@gmail', '')) > 0 THEN 'New York'
WHEN LENGTH(c.email) - LENGTH(REGEXP_REPLACE(c.email, '@yahoo', '')) > 0 THEN 'Los Angeles'
ELSE 'Chicago'
END,
case when LENGTH(c.phone) - LENGTH(REGEXP_REPLACE(c.phone, '^1', ''))=1 then 'NY' else 'CA' END,
LPAD(ABS(RANDOM() * 99999)::INT, 5, '0'),
'USA',
1,
CURRENT_TIMESTAMP
FROM customers c
WHERE c.customer_id NOT IN (
SELECT DISTINCT customer_id
FROM addresses
WHERE address_type = 'BILLING'
)
AND c.registration_date >= CURRENT_DATE - INTERVAL '90 days';
Key Conversions:
- Multiple REGEXP_COUNT() conversions in CASE statements
- DECODE() with 3 parameters → inline CASE
- TRANSLATE() function preserved (supported in Databricks)
- LPAD(), ABS(), RANDOM() preserved
Example 6: DELETE with CTEs
Source Redshift SQL:
WITH expired_products AS (
SELECT
p.product_id,
p.product_name,
REPLACE(LOWER(p.product_name), ' ', '_') AS slug_name,
DATEDIFF(day, p.launch_date, CURRENT_DATE) AS days_since_launch,
DATE_TRUNC('quarter', p.launch_date) AS launch_quarter
FROM products p
WHERE p.launch_date < DATEADD(year, -2, CURRENT_DATE)
AND p.stock_quantity = 0
AND p.is_active = 1
),
category_info AS (
SELECT
c.category_id,
INITCAP(c.category_name) AS formatted_name
FROM categories c
WHERE c.is_active = 1
)
DELETE FROM products
WHERE product_id IN (
SELECT ep.product_id
FROM expired_products ep
JOIN category_info ci ON ep.product_id IN (
SELECT product_id FROM products WHERE category_id = ci.category_id
)
WHERE ep.days_since_launch > 730
AND CHARINDEX('discontinued', LOWER(ep.product_name)) > 0
);
Converted Databricks SQL:
WITH expired_products AS (
SELECT
p.product_id,
p.product_name,
REPLACE(LOWER(p.product_name), ' ', '_') AS slug_name,
DATEDIFF(DAY, p.launch_date, CURRENT_DATE) AS days_since_launch,
DATE_TRUNC('quarter', p.launch_date) AS launch_quarter
FROM products p
WHERE p.launch_date < DATEADD(year, -2, CURRENT_DATE)
AND p.stock_quantity = 0
AND p.is_active = 1
),
category_info AS (
SELECT
c.category_id,
INITCAP(c.category_name) AS formatted_name
FROM categories c
WHERE c.is_active = 1
)
DELETE FROM products
WHERE product_id IN (
SELECT ep.product_id
FROM expired_products ep
JOIN category_info ci ON ep.product_id IN (
SELECT product_id FROM products WHERE category_id = ci.category_id
)
WHERE ep.days_since_launch > 730
AND INSTR(LOWER(ep.product_name), 'discontinued') > 0
);
Key Conversions:
- DATEDIFF(day, ...) → DATEDIFF(DAY, ...)
- CHARINDEX(substr, str) → INSTR(str, substr) (note: argument order swapped)
- INITCAP() preserved (supported in Databricks)
- DATE_TRUNC() preserved
- CTEs with DELETE preserved
Advanced Features
Common Table Expressions (CTEs)
Redshift:
WITH monthly_sales AS (
SELECT
DATE_TRUNC('month', order_date) AS month,
SUM(total_amount) AS monthly_total
FROM orders
GROUP BY 1
),
customer_orders AS (
SELECT
customer_id,
COUNT(*) AS order_count,
AVG(total_amount) AS avg_order_value
FROM orders
GROUP BY customer_id
)
SELECT
ms.month,
ms.monthly_total,
COUNT(DISTINCT co.customer_id) AS active_customers,
AVG(co.avg_order_value) AS avg_customer_value
FROM monthly_sales ms
CROSS JOIN customer_orders co
GROUP BY ms.month, ms.monthly_total;
Converted Databricks SQL:
WITH monthly_sales AS (
SELECT
DATE_TRUNC('month', order_date) AS month,
SUM(total_amount) AS monthly_total
FROM orders
GROUP BY 1
),
customer_orders AS (
SELECT
customer_id,
COUNT(*) AS order_count,
AVG(total_amount) AS avg_order_value
FROM orders
GROUP BY customer_id
)
SELECT
ms.month,
ms.monthly_total,
COUNT(DISTINCT co.customer_id) AS active_customers,
  AVG(co.avg_order_value) AS avg_customer_value
FROM monthly_sales ms
CROSS JOIN customer_orders co
GROUP BY ms.month, ms.monthly_total;
Window Functions with IGNORE NULLS
Redshift:
SELECT
customer_id,
order_date,
total_amount,
FIRST_VALUE(payment_method) IGNORE NULLS
OVER (PARTITION BY customer_id ORDER BY order_date) AS first_payment_method,
LAST_VALUE(payment_method) IGNORE NULLS
OVER (PARTITION BY customer_id ORDER BY order_date
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_payment_method
FROM orders;
Converted Databricks SQL:
SELECT
customer_id,
order_date,
total_amount,
FIRST_VALUE(payment_method) IGNORE NULLS
OVER (PARTITION BY customer_id ORDER BY order_date) AS first_payment_method,
LAST_VALUE(payment_method) IGNORE NULLS
OVER (PARTITION BY customer_id ORDER BY order_date
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_payment_method
FROM orders;
LISTAGG with DISTINCT
Redshift:
SELECT
customer_id,
LISTAGG(DISTINCT product_category, ', ')
WITHIN GROUP (ORDER BY product_category) AS categories
FROM order_items
GROUP BY customer_id;
Converted Databricks SQL:
SELECT
customer_id,
ARRAY_JOIN(
ARRAY_DISTINCT(
SPLIT(
LISTAGG(product_category, ', ')
WITHIN GROUP (ORDER BY product_category),
', '
)
),
', '
) AS categories
FROM order_items
GROUP BY customer_id;
Timezone Conversions
Redshift:
SELECT
order_id,
order_timestamp,
order_timestamp AT TIME ZONE 'America/New_York' AS ny_time,
CONVERT_TIMEZONE('America/New_York', order_timestamp) AS ny_converted,
CONVERT_TIMEZONE('UTC', 'America/Los_Angeles', order_timestamp) AS la_time
FROM orders;
Converted Databricks SQL:
SELECT
order_id,
order_timestamp,
FROM_UTC_TIMESTAMP(order_timestamp, 'America/New_York') AS ny_time,
FROM_UTC_TIMESTAMP(order_timestamp, 'America/New_York') AS ny_converted,
FROM_UTC_TIMESTAMP(TO_UTC_TIMESTAMP(order_timestamp, 'UTC'), 'America/Los_Angeles') AS la_time
FROM orders;
Lateral Column References
Redshift allows referencing previously defined columns in the same SELECT clause. Databricks conversion handles this automatically.
Redshift:
SELECT
customer_id,
first_name || ' ' || last_name AS full_name,
LENGTH(full_name) AS name_length, -- Lateral reference to full_name
UPPER(full_name) AS full_name_upper -- Another lateral reference
FROM customers;
Converted Databricks SQL:
SELECT
customer_id,
first_name || ' ' || last_name AS full_name,
LENGTH(first_name || ' ' || last_name) AS name_length,
UPPER(first_name || ' ' || last_name) AS full_name_upper
FROM customers;
Redshift-Specific Features
Distribution Styles
Redshift distribution styles (DISTSTYLE, DISTKEY) are removed during conversion as Databricks handles data distribution automatically.
Redshift:
CREATE TABLE orders (
order_id INT,
customer_id INT,
order_date DATE,
total_amount DECIMAL(10,2)
)
DISTSTYLE KEY
DISTKEY (customer_id);
Converted Databricks SQL:
CREATE OR REPLACE TABLE orders (
order_id INT,
customer_id INT,
order_date DATE,
total_amount NUMERIC(10,2)
);
Sort Keys
Redshift SORTKEY is converted to Databricks ZORDER BY for optimized query performance.
Redshift:
CREATE TABLE events (
event_id BIGINT,
event_date DATE,
user_id INT,
event_type VARCHAR(50)
)
COMPOUND SORTKEY (event_date, user_id);
Converted Databricks SQL:
CREATE OR REPLACE TABLE events (
event_id BIGINT,
event_date DATE,
user_id INT,
event_type STRING
)
ZORDER BY (event_date, user_id);
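In Delta Lake, Z-ordering takes effect when the table is optimized, so once data is loaded the former sort-key columns are typically used in an OPTIMIZE command (a minimal sketch):
OPTIMIZE events ZORDER BY (event_date, user_id);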
Compression Encoding
Redshift ENCODE clauses for column compression are removed as Databricks manages compression automatically.
Redshift:
CREATE TABLE products (
product_id INT ENCODE az64,
product_name VARCHAR(200) ENCODE lzo,
description TEXT ENCODE zstd,
price DECIMAL(10,2) ENCODE raw
);
Converted Databricks SQL:
CREATE OR REPLACE TABLE products (
product_id INT,
product_name STRING,
description STRING,
price NUMERIC(10,2)
);
Vacuum and Analyze
Redshift VACUUM and ANALYZE commands are removed as Databricks handles optimization automatically.
Redshift:
VACUUM orders;
ANALYZE customers;
VACUUM FULL products;
Converted Databricks SQL:
-- VACUUM and ANALYZE commands are not needed in Databricks
-- Databricks performs automatic optimization
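If explicit maintenance is still wanted, the closest Databricks equivalents are OPTIMIZE for compaction, ANALYZE TABLE for statistics, and Delta VACUUM for removing stale data files (a minimal sketch; retention behavior follows your workspace defaults):
OPTIMIZE orders;
ANALYZE TABLE customers COMPUTE STATISTICS;
VACUUM products;  -- removes unreferenced data files, unlike Redshift VACUUM, which re-sorts rows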
Known Limitations
Features Not Supported
- SUPER Data Type
  - Redshift's semi-structured SUPER type
  - Workaround: Use STRING type and parse with JSON functions, or use the VARIANT type
- HLLSKETCH Data Type
  - HyperLogLog sketches for approximate counting
  - Workaround: Use APPROX_COUNT_DISTINCT() or custom UDFs
- GEOMETRY Data Type
  - Spatial/geographic data
  - Workaround: Store as STRING and use Databricks spatial functions
- Referential Integrity Constraints
  - PRIMARY KEY and FOREIGN KEY constraints are commented out
  - Workaround: Implement validation logic in application or ETL code
- CHECK Constraints
  - Column-level CHECK constraints are commented out
  - Workaround: Implement validation in application code or use Delta table constraints where applicable
- UNIQUE Constraints
  - UNIQUE constraints are commented out
  - Workaround: Use MERGE operations or application-level validation
- Cursor Operations
  - DECLARE CURSOR, OPEN, FETCH, CLOSE
  - Workaround: Rewrite using set-based operations or Databricks SQL stored procedures
- UNLOAD Command
  - Redshift UNLOAD to S3
  - Workaround: Use Databricks COPY INTO or DataFrame write operations
- COPY Command
  - Redshift COPY from S3/files
  - Workaround: Use Databricks COPY INTO or DataFrame read operations
- Redshift Spectrum External Tables
  - Querying data directly in S3
  - Workaround: Use Databricks external tables or Delta Lake
Next Steps
- Export Redshift SQL scripts to .sql files
- Run conversion using Lakebridge CLI:
databricks labs lakebridge transpile \
--source-dialect redshift \
--input-source /path/to/redshift/scripts \
--output-folder /output/databricks-sql \
--target-technology databricks-sql
- Review generated SQL for FIXME comments
- Address unsupported features (constraints, SUPER types, etc.)
- Test converted SQL in Databricks workspace
- Optimize with Delta Lake features (OPTIMIZE, Z-ORDER)
- Deploy to production
For more information, see: