Redshift Research Project

Blog

2022-08-07

Static and Dynamic Column Ordinals in System Tables

I’ve been working on the second version of the Cluster Toolkit.

As part of this, I want to know how many blocks are consumed per slice, per column, for a table.

stv_blocklist has one row per block, per slice, per column; it counts from 0, and numbers the three system columns (insertxid, deletexid and oid dynamically, which is to say, these are given the ordinal of the number of user defined columns plus one, plus two and plus three, respectively, so their numbers vary by the number of columns in the table.

Postgres numbers these columns statically, and gives them negative numbers (i.e. is sane).

So in other words (remembering stv_blocklist counts from 0);

Column Postgres Ordinal Redshift Ordinal
insertxid -8 number of user columns + 0
deletexid -9 number of user columns + 1
oid -2 number of user columns + 2

You get the names of columns from pg_attribute, so when you join stv_blocklist to pg_attribute, you have to figure out the numbers being used in stv_blocklist (as they vary by table) and then join when the ordinal in use in stv_blocklist equals the static, negative ordinal in Postgres.

So you end up with this (as a simpler example, getting sorted and unsorted blocks per column);

select
  pg_namespace.nspname               as schema_name,
  pg_class.relname                   as table_name,
  pg_attribute.attname               as column_name,
  subquery_blocklist.blocks_sorted   as blocks_sorted,
  subquery_blocklist.blocks_unsorted as blocks_unsorted
from
       pg_class
  join pg_namespace on pg_namespace.oid      = pg_class.relnamespace
  join pg_attribute on pg_attribute.attrelid = pg_class.oid
  join pg_type      on pg_type.oid           = pg_attribute.atttypid
  (
    select
      tbl                                         as table_id,
      col                                         as column_ordinal,
      slice                                       as slice_id,
      stv_blocklist_column_counts.column_count -3 as column_user_column_count,
      sum( 1-unsorted )                           as blocks_sorted,
      sum(   unsorted )                           as blocks_unsorted
    from
      stv_blocklist,
      (
        select
          tbl                   as table_id,
          count( distinct col ) as column_count
        from
          stv_blocklist
        group by
          tbl
      ) as stv_blocklist_column_counts
    where
      stv_blocklist.tbl = stv_blocklist_column_counts.table_id
    group by
      tbl,
      col,
      slice,
      column_user_column_count
  ) as subquery_blocklist
  on     subquery_blocklist.table_id = pg_class.oid
     and (
              ( subquery_blocklist.column_ordinal   = column_user_column_count+0 and pg_attribute.attnum = -8 )
           or ( subquery_blocklist.column_ordinal   = column_user_column_count+1 and pg_attribute.attnum = -9 )
           or ( subquery_blocklist.column_ordinal   = column_user_column_count+2 and pg_attribute.attnum = -2 )
           or ( subquery_blocklist.column_ordinal+1 = pg_attribute.attnum                                     )
         )
where
  pg_class.relkind = 'r';

What I should have is this;

select
  pg_namespace.nspname            as schema_name,
  pg_class.relname                as table_name,
  pg_attribute.attname            as column_name,
  sum( 1-stv_blocklist.unsorted ) as blocks_sorted,
  sum(   stv_blocklist.unsorted ) as blocks_unsorted
from
       pg_class
  join pg_namespace  on pg_namespace.oid      = pg_class.relnamespace
  join pg_attribute  on pg_attribute.attrelid = pg_class.oid
  join stv_blocklist on     pg_class.tbl        = stv_blocklist.tbl
                        and pg_attribute.attnum = stv_blocklist.col
where
  pg_class.relkind = 'r'
group by
  schema_name,
  table_name,
  column_name;

My general view of Redshift is that the devs are very much lacking in real-world practical experience, and this is an archetypal outcome of that, which is also representative of the usual issues throughout the system tables. To my thought, anyone with experience would instantly perceive making column numbers dynamic was bad, and that it would then further be egregiously wrong to have the leader-node being static with column ordinals in some tables and dynamic in others.

Kyiv Week Two

Well, I intended to blog daily but the first two weeks have been too bus with settling in. That’s probably not a bad thing : there’s a lot of minutiae in life which isn’t worth the writing :-)

So…

Speaking generally, it is and naturally usually harder to notice when something is absent than when something is present.

During the week, one morning, I consciously realized something which I had perceived but not actually thought about because it was obviously the case : in the two weeks I’ve been here, in this city of about three million people, there has not been one single aeroplane, not one, in the sky.

Those of you reading, whether you live in a city or not, you will see aeroplanes or contrails most days. Imagine then living in London, or New York, and there being not one single aeroplane, ever.

Something else which particularly caught my eye in town is that I’ve seen one or two healthy young men with whole-leg casts, and crutches, and I saw a man in his early 30s, walking with his attractive partner, missing an arm.

I suspect these are soldiers.

I’ve visited a number of supermarkets since coming here, to find the foods, or replacements for them, that I usually consume. Everywhere supplies seem absolutely fine - the shelves are packed - with perhaps one except : I have a bit of a feeling certain fresh foods are in rather erratic supply - in particular I notice (as I buy them) broccoli and golden kiwi and just recently, oranges. The latter I can understand - it’s not so common - but the formers are surprising. Broccoli and oranges are ordinary and common.

As I’m writing of food, the bananas here are amazing; big, golden, ripe. The supply of blueberries is fantastic - you buy huge packs of them for about two euros (I’d say you’d be looking at 10 euros in Europe) and they’re always good, never, not once, squishy.

Different foods in different countries, and your diet adapts on where you go.

English is more common here than I expected, it’s not only the younger generation : I was in the swimming pool on Friday, some middle aged woman (ha - my age, scarily) and I were sorting out how to swim in the lane we shared, and lo and behold, perfectly good English.

I’m picking up Ukrainian, at my usual rate of about one word per day. I’m not willingly learning Cyrillic, it just had too many letters and I just feel it’s a bad design, so you get transliteration into Latin : “zaraz” is “now”, “tse” is “this” and “tut” is “here”, “dobre” is “good”.

Cyrillic is derived from Greek, and I can read Greek, so I get some letters for free. Pi and rho, for example, have come over unchanged.

Here in Kyiv, the air raid sirens go off about once every three days. I was just thinking to write about this and lo and behold, there they go - sounding now as I write.

The official advice is get two walls between you and the outside - so, in my case, this would be getting into my bathroom. In reality, Kyiv is enormous and the chances of being hit by one or two missiles is inseparable from zero. I ignore the sirens, and everyone else does, too, except those people who are required to pay attention to this - and by this I mean people working in shops, or for example at my swimming pool.

They all have orders from their bosses to shut the store (or clear the pool) and move to a place of safety, and they have to do it.

The first time the sirens went off was in the first week and I was in a shopping mall. All the shops were shut and the mall was closed, we all went outside… …where everyone was simply carrying on with their business.

As its, the incoming missiles while I’ve been here have been aimed at various this-and-that’s some kilometers from Kyiv. Once I start socializing here I’d be interested to see if I can hear from the locals what’s being hit.

I see few soldiers in town, when I’m out and about doing various bits of routine shopping, or rather, I see few people in the camouflage outfits worn by the army who on the face of it are going to be soldiers; typically I’d say one to three per hour of walking about. Kyiv is about two, three hundred kilometers from the frontier, and the Russians have pulled back to the frontier, so it’s quite out of the way. I suspect these soldiers I see here are just arriving on leave, or just returning from leave.

There’s a law in force which is that males in the age range for conscription cannot leave the Ukraine. An unintended consequence of this is that all males in that age range who are outside of the Ukraine now cannot return at all, because they will not be able to leave again. This is fine if you return to fight, but it’s a problem if you say live in Germany normally; you’ve not been conscripted, you have a life to be getting on with, you enter the Ukraine and you cannot leave.

Related to this is that where I am not Ukrainian, I can come into the Ukraine, put money into the economy, but then must leave when my visa-waiver runs out.

There have been in this second week a final few items which I’ve needed to buy - my flip-flops died (had them for a year, no complains for 20 euro), I’ve needed new t-shirts (merino wool), and… Marmite. Important source of B vitamins for vegetarians.

When you know exactly what you want, ordering internationally on the Internet is not really a thing these days. It’s fragile - lots and lots of factors have to work correctly, and that of course is inherently unlikely.

Marmite I’ve concluded is not sold in Ukraine. However, by what can be only be considered a fairly hefty miracle, I found a local, an individual, who has a supply which they themselves have brought into the Ukraine to resell. I have bought now Marmite, enough and with some buffer for my entire stay, at what can only be called “war prices”, 10 euros per 250g :-)

It was in fact impossible to order internationally, and on one hand I find that rather shocking in this day and age, but on the other, as I say, international ordering is fragile, too fragile to be viable.

(And sales tax shows up a profound unintended consequence. If I order in say the USA, but they will not ship to Ukraine, and send to a friend in Germany, who sends to me, I pay sales tax and import duty first in Germany and then again in Ukraine. Sales taxes and import duties should be abolished, everywhere, because they have major unintended consequences and are complex taxes to administer. They also, for those who want to redistribute wealth, fall harder the poorer the buyer is. If I have almost no money, 20% tax is a lot; and every poor person must pay that tax. If I am wealthy, I don’t care.)

So, finally, I am minded to write a little about the progress in the war and the current military situation.

I’ve read military history since I was child, and I have then something of the knowledge to at least imagine I can, to the extent of what I do know of what is going on, gauge the state and development of the war.

To quote Churchill in I think it would have late 1941 or early 1942 : “this is not the beginning of the end, but it is the end of the beginning”.

The Russians have lost the strategic initiative; they no longer have the strength to attack. When you have the strength to attack, you by launching attacks dictate to your opponent their actions. It’s like chess. If I in chess attack a high value piece with a low value piece, you are almost certainly going to need to move that high value piece; my attack demands your defence.

Attacking though in war is expensive. Defence is much more powerful, and attacking units suffer heavy losses. Additionally, and particularly so for Russia which is weak in these matters, there are logistical problems; Russia attacked from its own territory, with preparation, and so had as much in the way of supplies available to fighting units as was possible. As those units advance, you then need to move supplies forward, and this becomes progressively harder as distances increase. There is also the problem of running out of supplies, be it fuel, food or ammunition.

Russia has now shot its bolt. The offensive has run its course. The Ukrainians have survived, and now, unimpeded by needing to respond to attacks, they themselves attack - and in doing so, the Russians now must respond, and so we see as we now do that the Ukrainians are moving to recover Kherson, and the Russians have had to move considerable forces to that region to assist in defence.

What then of the future?

The Ukraine needs to re-equip with heavy Western weapons, because there is no way to produce enough ammunition for the heavy Soviet weapons it is currently largely equipped with. The problem here is that the West generally has allowed its military to fall into “bare bones” states. They don’t have that much to give. There’s a couple of thousand kilometers of front line, and Germany sends say five really modern artillery pieces. Five. What’s more, of these, two or three them - these are all very complex machines - will be in maintenance at any given time.

There needs to be a massive infantry training programme. This will take at least a year, probably two. NATO is getting on with this now.

The USA and EU need to maintain economic and military support. My concern here is Trump. He is happy to subvert and destroy the democratic process to stay in power and as such he supports Putin, because he thinks Putin is trying to help him get into power (which he was, and will do). Still, we have two more years at least without Trump. I would be completely happy if someone shot him, or he died from a heart attack. He is directly responsible even when we think of nothing else of the nine deaths in the Capitol riot, let alone that catastrophe of a return to power. I pray the Justice department in the end level criminal charges and this bars him from becoming president; but even then he has so much influence amongst the naive, gullible and deluded in the USA that he will continue to strongly influence the Republican Party. Only his death will see the Republicans begin to return to sanity and democracy.

So, given weapons, infantry and economic support, in time, the Ukraine pushes Russia back to its borders. Putin controls the population and the country, so he has nothing to fear from this; like Hitler, he is ensconced, and no catastrophe, no matter how large, will lead to his ejection. We must wait for his death also for change. Russia is a nation of the surveilled, brainwashed and beaten; those who do disagree and make it known get picked up and beaten up.

All in all, the situation is pretty good and hopeful.

2022-08-08

System Table Functions

So to get information about late binding view you must use the function pg_get_late_binding_view_cols(). I’ve seen a number of new sets of information in the system table being published via functions, rather than tables.

A problem with this is that as far as I can see there’s no way to indicate that you are, if you are, and I am, interested in only one view.

You have to get every damn row for every damn column for every single late binding view, and this shows up as a problem when you have thousands of late-binding view; queries which need information about them grind to a halt.

2022-08-10

Real-Time Video of a Table Being VACUUMed

I’m developing tooling for Redshift.

Part of this is real-time monitoring of tables. You can see how their sorted and unsorted block counts are changing, on a per-column basis, across all slices, in real-time.

I have used this functionality to produce a video of VACUUM.

So, first, I create a brand new cluster, 2 x dc2.large.

I then create a table, like so;

create table table_1
(
  column_1  int8  encode raw
)
diststyle even
compound sortkey( column_1 );

And I populate it with random int8 integers, using insert and self-join.

I end up with this table;

There’s are two sorted blocks because the very first insert to an empty table is sorted, and that insert was for one row, so there’s one sorted block in column_1 and one sorted block in the system column oid (the deletexid and insertxid columns behave rather oddly, differently to their alleged encoding, and the insertxid block is unsorted, and we see here there’s no deletexid as no delete has occurred).

And here are the columns;

There’s no raw estimate (the final column) for oid as I’ve not yet written code to compute this or that data type (this column generally is useful to let you see quickly if the current encoding is doing any good).

You can see here the system tables seem to be behaving quite differently to their purported encoding. We’ll see more on this as the VACUUM proceeds.

And now here is a screenshot of the monitor;

The values in each cell are unsorted blocks / sorted blocks.

The update is from the web-server clock, not the javascript clock in the browser.

So, the table having been created, the vacuum command is issued;

vacuum full table_1 to 100 percent;

Now the good stuff. The video. The update rate is about once per 2.5 seconds, and the video frame rate is 1 per second. The video is 7m 40s long, beginning a few seconds before the vacuum command is issued until a few seconds after the command ends.

There are long periods of minimal action, so I recommend you play at about 5x and watch it a couple of times to pick up all the events.

If the browser has trouble playing, the files are here;

https://amazonredshiftresearchproject.org/slblog/files/vacuum_video/vacuum.mp4
https://amazonredshiftresearchproject.org/slblog/files/vacuum_video/vacuum.ogg
https://amazonredshiftresearchproject.org/slblog/files/vacuum_video/vacuum.webm

(The mp4 is the original. It was when I came to write this page I re-discovered it will not play in Firefox. The others are re-encodings of the mp4, so quality will be a bit less.)

Now, observations about what’s seen in the video.

First, the biggie : vacuum runs concurrently on one slice per node. I’ve yet to try this on node types with more slices to see what happens.

Vacuum begins by slowly increasing the number of deletexid and insertxid blocks (system columns which behave oddly) and then at the 1:55 mark, we suddenly find we have an extra 36 unsorted blocks. The number of insertxid and deletexid blocks continued to gradually increase, reaching a maximum of 46 and 47, respectively, and then there’s some delay and then - pow - at 3:39 all the blocks become sorted, except one (off-by-one error somewhere? or could it relate to the single sorted block when we began?), and the number of deletexid and insertxid blocks drops to 9 and 20 sorted blocks, respectively, and both have a single unsorted block (being used by the single unsorted block in column_1).

This process repeats for the next two slices and then we’re done.

Too late now to think about or write about the deletexid and insertxid. I’ll blog once I’ve thought it through. What we see there is telling us about how VACUUM is working internally.

2022-08-11

ATMs

Today I went to collect a package from Nova Poshta, which I believe is an independent postal company. They have pick-up shops all over Kyiv. You have your package sent to one of those shops, and then you go pick it up.

I may be wrong, but I have the impression they’re a lot better than Ukrposhta, which I think is the State postal service. The sender of the package preferred NP, and the NP office is packed with packages and people, where the UKRP office, next door, was empty except for the people behind the counters.

So, in I went, and the young girl at the counter spoke fluent English, which made everything easy.

I needed to pay cash on delivery, and both my cards were not accepted. I think this is because of anti-fraud set up by NP; the first card is working fine everywhere else, the second card (which is an account in Sterling, but which can be billed in other currencies) was rejected for not being denominated in UAH.

I left to find an ATM, to withdraw cash.

The eighth ATM I visited worked.

Four were I think out of cash (and give the wrong error messages; rather than saying “out of cash”, they say “withdrawal limit exceeded”), two were broken, one was not fully in English and provided me with a screen fully in Ukrainian and asking me yes/no, which was a non-starter =-)

The final ATM has a UI designed by a rabid, dyslexic myopic. I suspect it might have been an attempted Russian sabotage :-)

I suspect the cash-less machines might be an effect of the war; ATMs are not being kept full of notes as much as before, or kept repaired.

Having obtained cash, I then picked up package, took it home, went out again and bought some nice chocolate and delivered it to the English-speaking young girl. It’s a lot of work to learn a language, and I benefited from her work :-)

2022-08-12

Signs of War

Something I’ve just realised; there are no - none - not one - street nameplates in Kyiv.

I suspect they were all removed when the Russians were on the outskirts of the city, to hinder their navigation if fighting moving into the city.

2022-08-15

Kyiv Photos and Audio

Normally, no photos are taken in town. Russians were using them to correct missile fire.

However, Sunday I went for a long walk in the city, exploring, and happened to come by a square (Saint Michael’s Square, I believe) where wrecked Soviet vehicles are on display.

Photos there were okay.

The upper photo is what I think is an abandoned tank. I walked around it looking carefully and could find no munition entry point.

The lower photo is a 152mm self-propelled artillery piece.

I made an audio recording also of the air raid sirens. Starts quiet, gets louder.

If the browser has trouble playing, the file is here;

https://amazonredshiftresearchproject.org/slblog/files/kyiv_1/air_raid_sirens.m4a

2022-08-20

Minor Update

So…

I saw a helicopter today, a military helicopter, first aircraft I’ve seen since coming here.

Getting money to the owner of the apartment, to pay rent, has been difficult.

The owner tells me SWIFT is erratic; and an alternative he suggested, WayForPay, did not work, charged 7.5% (!), and informed me that I could safely email a photo of my passport as “passports are not confidential documents”.

I then thought to withdraw cash from ATMs, which would cost 1.75%.

It turns out ATMs in Ukraine have very low withdrawal limits - apparently, 3000 UAH - about 80 euro - is the maximum for any one withdrawal.

I wanted about 1800 euro (in UAH).

I found a bank with no charges and a max withdrawal of 5000 UAH, and withdrew eight times from the two ATMs in the foyer - seven times from one, and then I think it was out of 500 UAH notes, and once more from the second.

It looks like the 500 UAH note is the largest available, which is about 12.5 euro.

I think, but I do not know, that this has been done to cut down on the black economy.

What it has also done is neutered a general purpose and flexible system (cash) which can work effectively when other systems fail.

2022-08-23

Air Raid Warnings

I think I’ve figured out how these work now.

Yesterday, as usual, I went swimming - as I walked up to the doorway of the pool building, the air raid sirens went off.

Now, I knew already - I’d been told by the English speaking girl at the pool - that when the sirens go off, the pool is evacuated and I can either come with the staff (and, presumably, other swimmers) to the “safe place”, or leave the building.

So I knew swimming was off - and I was most grateful that this had occurred as I approached the pool, rather than when I was in the pool, and perhaps half-way through a session.

The way it works then is this : the city-wide sirens go off to indicate an incoming raid or missile strike. Also, the air raid warning app kicks off. This causes all public buildings to close - the malls close, the pools close.

Anyone inside has to leave, or head to a safe place.

At the pool, as I made it into reception, I saw the people there heading downstairs, into the basement. I and some others simply went outside and sat on benches and enjoyed the hot sunshine =-)

I think the city-wide sirens are used only to indicate warning; they are not used to indicate the all-clear. I think this must come via the app.

So the pool was closed, and I have no idea how long closure normally lasts for. I bailed on swimming on continued with my errands for the day, and then went back to the pool at about 7pm.

Those errands then took me about ten minutes later to the local big supermarket, and also to the local decent chocolate shop (“Lviv Chocolates” - Lviv seems to be a byword in Ukrainian culture for fancy sweet foods) and then the dry cleaner to pick up shirts. These were all open. Either the all-clear had sounded, or they had not closed, and I am inclined to the latter; if the supermarket had shut, I do not think it would have repopulated so quickly.

2022-08-24

Independence Day

A hundred men died today.

A hundred more were maimed.

Over those men, Nature cast upon the heavens whole in their arcing splendour blue and yellow; the colours of Independence Day.

For those men, for their wives, their children, there are from my heart into my eyes tears.

I am very glad to say I now live in Kyiv, I work remotely into the EU, and all the money I make goes to the military.

2022-08-27

Query Compilation

I am not yet certain, I have not formally investigated, but I am coming to the view that query compilation behaviour has changed. It used to be query off-load would happen once, at the beginning of a query, taking 4 to 8 seconds, and that would be it; no additional delays. Now it looks to me like query off-load is being invoked while a query runs, and at the beginning of each stream. This means compilation can occur many times - each time taking 4 to 8 seconds - and, necessarily, queries will pause while compilation occurs (as the segments they need to proceed do not yet exist).

Back in the old days, compilation was on the leader-node and was serial; about 4 seconds per segment. You could with complex BI queries be waiting many minutes for compilation to complete.

Query off-load was a game changer : now the entire query, no matter what it was, took 4 to 8 seconds.

This is adequate for BI tool use.

However, now, to my eye, the devs have blundered; from a BI tool you easily get 10 segments, and quite possibly get two or three calls to query off-load at 4 to 8 seconds each.

Query off-load is no longer suitable for BI tool use.

2022-08-28

Updates

  1. Bring-Up Times have been updated.

    https://www.amazonredshiftresearchproject.org/bring_up_times/index.html

  2. Cross-Region Benchmarks have been updated.

    dc2.large in ap-east-1 has doubled the read benchmark time (the halving which was applied in many regions has gone away).

    Hmm. Something I’ve just noticed; for dc2.large in ap-southeast-1, the standard deviation for the disk-read-write benchmark is large, often about 2.5s. In other regions, which have much the same mean, the standard deviation is maybe 0.2s (each benchmark runs five times, and the slowest and fastest runs are discarded, the mean and SD come from the remaining three).

    So, broadly speaking, there are two groups of regions; a fast group and a slow group.

    Slow group is; af-south-1, ap-northeast-3, ap-southeast-3, eu-north-1, eu-south-1 and me-south-1 (for dc2-large, but not ra3.xlplus, the only other node type in that region).

    Note also last time I tested, most of these regions also lack query compilation off-load (af-south-1, ap-northeast-3, ap-southeast-3, eu-north-1, eu-south-1 and me-south-1).

    https://www.amazonredshiftresearchproject.org/cross_region_benchmarks/index.html

  3. Price Tracker has been updated, new prices came out, but as ever, prices are unchanged.

    https://www.amazonredshiftresearchproject.org/redshift_price_tracker/index.html

  4. The version tracker shows no change now for two full weeks. This is almost unprecedented, and I suspicious. I’ve discovered this week changes in query execution, with queries executing before segments are all compiled, and along with this are changes in compilation off-load, so compilation now occurs not once at the beginning of a query but can occur once per stream, which is not what you want; each compilation takes about 4 to 8 seconds, and there used to be just one, and now there can be many.

    https://www.amazonredshiftresearchproject.org/redshift_version_tracker/index.html



Home 3D Друк Blog Bring-Up Times Consultancy Cross-Region Benchmarks Email Forums IRC Mailing Lists Reddit Redshift Price Tracker Redshift Version Tracker Redshift Workbench System Table Tracker The Known Universe Twitter White Papers