Redshift Research Project

Blog

2021-08-09

Update

The site is now approaching the runway.

I’m doing the final, pre-publication reviews of the initial set of five or so PDFs and their scripts.

I’ve made the first rough version of the System Table Explorer. I have a bunch of historical dumps of the system tables, and I want to get them onto the site. First, though, I need to get the current dump (made this morning) to full release quality.

2021-08-17

And We’re Live

Well, that’s it - the site is now public, with the first white paper. Going to publish one white paper per day for the first five days, and then kick back to the normal rate, which I expect to be about one a week.

2021-08-19

Unsorted Tables

I think I’ve just realised it is no longer possible to create unsorted tables.

It used to be that a table would be unsorted if sorting was not specified. What happens now is that if sorting is not specified, you get auto, and Redshift is now at liberty to change the table’s sorting.

There is no longer a way to specify that you want a table to be unsorted, and to stay unsorted.

This totally breaks a ton of my test work. I’m now going to have to specify sorting and then, when working with the table, handle the unsorted segment so that it doesn’t mess up the results.

This is also going to mess things up for everyone who is deliberately using unsorted tables because they know it’s what they want, often to save disk space.

I’ve said it before and I’ll say it again now: the Redshift dev team know about making Redshift, but they do not know about making systems with Redshift.

2021-08-20

Temp Tables and K-Safety

Temp tables do not participate in k-safety.

I’m improving the test method for one of the white papers, and as a part of this I’ve just had the opportunity to compare temp and non-temp tables.

I have a 1600-column test table, with 4 slices and 5 blocks per column.

With k-safety (so a normal table), populating took 572 seconds.

Without (so a temp table), populating took 449 seconds.

A very worthwhile improvement, since I have to create and populate that test table 20 times.
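To put a number on "very worthwhile", here is the arithmetic on the two timings above, as a small Python sketch. The only inputs are the 572 s and 449 s populate times and the 20 repetitions; the function name is purely illustrative.

```python
def compare_populate_times(normal_s, temp_s, repetitions):
    """Compare populate times for a normal (k-safe) table and a temp table."""
    saved_per_run = normal_s - temp_s             # seconds saved per populate
    pct_less = 100.0 * saved_per_run / normal_s   # relative reduction in time
    total_saved = saved_per_run * repetitions     # seconds saved over all populates
    return saved_per_run, pct_less, total_saved


saved, pct, total = compare_populate_times(572, 449, 20)
print(f"{saved} s saved per populate ({pct:.1f}% less time)")
print(f"{total} s saved over 20 populates (~{total / 60:.0f} minutes)")
```

That works out to 123 seconds (about 21.5%) per populate, or roughly 41 minutes over the 20 populates.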

2021-08-23

Update

Just published “The Commit Queue”.

I had intended to post one white paper per day for the first five days, then kick back to one weekly.

After the second white paper was published, a mod in r/aws wrote an excellent review, and I spent about four days improving the test method. Basically, when populating tables and/or inserting data, I moved to using exactly full blocks rather than a given number of rows; and, for that white paper in particular, I now drop and remake the test table before every test SQL query. It all took much longer than it should have, because of issues which were uncovered in Redshift and had to be worked around.
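Working in full blocks rather than rows means deriving the row count from the block size. The sketch below is a minimal illustration of that calculation. It assumes, purely for illustration, a 1 MB block that is entirely payload (no per-block header, which I have not accounted for), a fixed-width raw-encoded column, and rows spread perfectly evenly across slices. The names `BLOCK_BYTES`, `rows_per_block` and `rows_for_full_blocks` are hypothetical.

```python
# Assumption: 1 MB block, all of it usable for values (no header overhead).
BLOCK_BYTES = 1024 * 1024


def rows_per_block(value_width_bytes):
    """Rows of a fixed-width, raw-encoded column that fit in one block (assumed)."""
    return BLOCK_BYTES // value_width_bytes


def rows_for_full_blocks(value_width_bytes, blocks_per_slice, slices):
    """Total rows so that every slice holds exactly `blocks_per_slice` full blocks,
    assuming rows are distributed perfectly evenly across slices."""
    return rows_per_block(value_width_bytes) * blocks_per_slice * slices


# Example: an 8-byte (int8) column, 5 blocks per slice, 4 slices.
print(rows_per_block(8))              # 131072
print(rows_for_full_blocks(8, 5, 4))  # 2621440
```

The point of the design is that the test controls the number of blocks, which is what the disk-level behaviour depends on, rather than a row count whose block usage varies with column width.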

Having done this, I wanted to propagate the improvements to the other white papers, which I’ve just finished doing; then, having re-run the tests, I finished updating the text for “The Commit Queue”.

I should now be able to go back to my intended schedule of one post per day for a few days.

2021-08-25

Work In Progress

I’m currently working on a paper about query compilation performance. It’s been taking a while, partly because of other demands on my time and partly because quite a bit of exploration work has been involved.

I think I’m more or less on the home straight now, although I also need to review a contract this evening, so we’ll see how it goes. I really want to publish tonight if I possibly can, since I’ve been working on this for some days now.

It’s actually quite a tricky piece of work, because the compiled segment cache has quite a long memory, so you only really get one clear shot at a full test run; if you were to run it twice, you’d find a lot of compilation cached, and there go your results.

Users absolutely need to be able to disable the compiled segment cache, so that they can test their systems for situations where the cache has been cleared, such as after a cluster upgrade; this would also help me here.
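Why you only get one clean shot at a test run can be shown with a toy model. The class below is hypothetical and is not how Redshift's compiled segment cache actually works; it simply assumes that each query is made of named segments, that a cached segment costs nothing to compile, and that the cache survives between runs. The `enabled` flag stands in for the ability to disable the cache, which Redshift does not offer.

```python
class ToySegmentCache:
    """Hypothetical toy model of a compiled-segment cache (not Redshift's mechanism)."""

    def __init__(self, enabled=True):
        self.enabled = enabled
        self._compiled = set()  # segments compiled so far; persists across runs

    def run_query(self, segments):
        """Return how many of the query's segments had to be compiled."""
        compiled = 0
        for seg in segments:
            if self.enabled and seg in self._compiled:
                continue  # cache hit: no compilation needed
            compiled += 1  # cache miss, or cache disabled: compile
            if self.enabled:
                self._compiled.add(seg)
        return compiled


def full_test_run(cache, queries):
    """Total compilations over one pass through the test queries."""
    return sum(cache.run_query(q) for q in queries)


# Hypothetical segment names for three test queries.
queries = [["scan_a", "join_ab"], ["scan_a", "agg_b"], ["scan_c"]]

cached = ToySegmentCache(enabled=True)
cached_runs = [full_test_run(cached, queries) for _ in range(2)]

uncached = ToySegmentCache(enabled=False)
uncached_runs = [full_test_run(uncached, queries) for _ in range(2)]

print("cache on: ", cached_runs)    # [4, 0]
print("cache off:", uncached_runs)  # [5, 5]
```

With the cache on, the second run reports no compilation at all, so its timings say nothing about a freshly upgraded cluster; with the cache off, every run measures the same thing and tests can be repeated.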

In my experience, though, the devs never think about this kind of thing. I’m very much of the view that they lack experience when it comes to actually making systems using Redshift.


