
Distribution Styles

As we know, AWS Redshift uses compute nodes to distribute and process data, and it CAN do so in parallel. I say CAN because this is not always automatic: when we run complex queries (and in big data we do that a lot), we sometimes need to help Redshift do it.


There are two major bottlenecks in database processing: joining tables and sorting data. For those two we have the DISTKEY and the SORTKEY.

DISTKEY

Using a DISTKEY basically means telling Redshift how to distribute the data (i.e. a table) across the slices: it uses a column as the KEY and stores rows that have the same value in that column physically in the same place, in the same slice.

Imagine a join between two tables that both use the join column as their key. The rows whose key values fall, for example, in the range 0 to 100 will be on the same node/slice for both tables, so no node needs to ask another node for data, which avoids data movement.
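To make the co-location idea concrete, here is a minimal Python sketch that simulates it. It is only an illustration under assumptions: the tables (`customers`, `orders`), the slice count and the CRC32 hash are all made up for the example, and Redshift's real hashing is internal and different. The point is just that rows with the same key value always land on the same slice, so each slice can join its own rows without asking another slice.

```python
import zlib
from collections import defaultdict

NUM_SLICES = 4  # assumed slice count, for illustration only


def slice_for(key):
    """Map a key value to a slice with a stable hash (a stand-in for Redshift's own)."""
    return zlib.crc32(str(key).encode()) % NUM_SLICES


def distribute_by_key(rows, key_column):
    """Group rows by the slice their key value hashes to."""
    slices = defaultdict(list)
    for row in rows:
        slices[slice_for(row[key_column])].append(row)
    return slices


# Hypothetical tables that share the join column customer_id.
customers = [{"customer_id": i, "name": f"customer-{i}"} for i in range(8)]
orders = [{"order_id": 100 + i, "customer_id": i % 8} for i in range(20)]

customer_slices = distribute_by_key(customers, "customer_id")
order_slices = distribute_by_key(orders, "customer_id")

# Join slice by slice: each slice only looks at the rows it already holds.
joined = []
for slice_id, local_orders in order_slices.items():
    local_customers = {c["customer_id"]: c for c in customer_slices.get(slice_id, [])}
    for order in local_orders:
        customer = local_customers[order["customer_id"]]  # found on the same slice
        joined.append((order["order_id"], customer["name"]))

print(len(joined))  # 20: every order matched its customer without leaving its slice
```

If the two tables were distributed by different columns, the local lookup would fail for most rows, which is the situation where data has to move between nodes.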

ALL

Another style we have is ALL, which creates a copy of the table on each node. In general, it is used for small tables (fewer than 2M rows) that aren't updated very often. Because the copies live in different places (nodes), an update costs more, since every copy has to be updated, and the copies also take more storage space. Good candidates are tables that store city or country codes and names for standardization.

EVEN

This style is used for tables that we won't use in JOINs and that are highly denormalized. The leader node distributes the rows across the slices in round-robin fashion.
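As a rough illustration of round-robin distribution, the sketch below deals rows out to slices one at a time, ignoring their values. The function name `distribute_even` and the slice count are hypothetical, chosen only for the example.

```python
from itertools import cycle


def distribute_even(rows, num_slices):
    """Assign rows to slices in turn, regardless of what the rows contain."""
    slices = [[] for _ in range(num_slices)]
    for row, slice_id in zip(rows, cycle(range(num_slices))):
        slices[slice_id].append(row)
    return slices


rows = list(range(10))  # ten placeholder rows
slices = distribute_even(rows, 4)
print([len(s) for s in slices])  # [3, 3, 2, 2]
```

The slices end up almost the same size, which is good for scans, but equal key values are scattered across slices, which is why this style suits tables that are not joined.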

AUTO

If you don't define any of these styles, Redshift will try to distribute the table in an optimal way according to the table size.

SORTKEY

The other pain point in database processing is ORDER BY and the functions that need to sort data (e.g. window functions). We have a feature to improve that as well, the SORTKEY. It is similar to the DISTKEY: it tells Redshift to store the data ordered by the chosen field/column. It helps a lot when you need to sort or use window functions, especially on a large amount of data.

If you are looking to improve your query performance in Redshift, I think the features presented here are the best ones to start with.

What if my table already has a different DISTKEY than the one I need?

The important thing to keep in mind is that sometimes we can't cover all queries/processes with just one DISTKEY and one SORTKEY, or we may not be able to change the table's storage keys. In those cases we can use TEMP TABLES to improve the query/process time.

Now imagine that you have to join two tables, but one or both of them aren't distributed by the column you will join on. In other words, the data for these tables is spread over the slices by a key that isn't the best one for your join, so Redshift has to make the nodes talk to each other and move data between them in order to join the tables, and the query takes much longer to process.

One way to try to reduce this query time is the CREATE TEMP TABLE command. Instead of using the source table directly, we create a temporary table that is populated from the source and has the distribution key we will use in the query. We explicitly tell Redshift to redistribute the data before we start our processing.

Keep in mind that this “copy” (in quotes because you can already apply filters to bring in only the data relevant to your scenario) can take some time to create, so it is your responsibility to check whether the strategy is worth it for your scenario. In general, with a huge amount of data it is, because the time spent creating the temp table is compensated when you use it in the join or in window functions.
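To show what such a statement could look like, here is a small Python helper that renders a Redshift CREATE TEMP TABLE ... AS statement with its own DISTKEY and SORTKEY. Treat it as a sketch: the helper `temp_table_sql` and the table and column names (`orders`, `tmp_orders`, `customer_id`, `order_date`) are hypothetical, and the filter is only an example of bringing in just the relevant rows.

```python
def temp_table_sql(temp_name, source, dist_column, sort_column, where=None):
    """Render a CREATE TEMP TABLE ... AS statement for Redshift.

    Plain string formatting is used, so only pass trusted identifiers.
    """
    lines = [
        f"CREATE TEMP TABLE {temp_name}",
        f"DISTKEY({dist_column})",   # redistribute by the column used in the join
        f"SORTKEY({sort_column})",   # store the copy ordered by this column
        "AS",
        f"SELECT * FROM {source}",
    ]
    if where:
        lines.append(f"WHERE {where}")  # optional filter to copy only relevant rows
    return "\n".join(lines) + ";"


sql = temp_table_sql(
    temp_name="tmp_orders",
    source="orders",
    dist_column="customer_id",
    sort_column="order_date",
    where="order_date >= '2023-01-01'",
)
print(sql)
```

The join would then use `tmp_orders` instead of `orders`. Whether the extra copy pays off still depends on the data volume, as discussed above.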
