
Why does the ArriScan oversample by 50%, per axis?


cole t parzenn

Recommended Posts

Ahh - I misunderstood. I was under the impression the Arriscan used a full 6k sensor and downsampled to resolutions like 2k or 4k. Instead it's using a 3k sensor, taking two images to make a 6k composite, then downsampling that?

 

Is there a technical advantage to making a 6k scan this way, or is it a result of the lack of 6k sensors at the time the scanner was designed? On its face, it sounds kind of kludgy.

 

-perry


What I'm not really understanding is how it's making a 6k image from a 3k x 2k sensor. From their web site, on the technical specs:

 

Custom CMOS area sensor with piezo actuator for microscanning
Native resolution: 3K x 2K
Max. resolution with microscanning: 6K x 4K

 

 

The way I interpret this is that the sensor is taking 4 shots of the frame, to result in a 6k x 4k image. There are two ways one could do this, I'd think:

 

1) Move the sensor down 2k or across 3k, depending on which part of the frame is being captured, then stitch the images together.

 

However, based on what you said earlier in the thread about it moving a half pixel, it sounds like:

 

2) Take multiple images with a micro-shift in the sensor position, and use those to interpolate a 6k image.

 

Now, that's not the same as blowing up a 3k x 2k image to 6k, which would involve making up a lot of image data. A lot of accurate information can be derived from subpixel changes in an image, so you'd probably get a significantly better interpolation to 6k this way than with a single shot at 3k x 2k.

 

But I'm not understanding how this is truly a 6k image if it's doing that. It's still interpolated.

 

Or am I completely misunderstanding what's going on inside the machine?

 

-perry


2) Take multiple images with a micro-shift in the sensor position, and use those to interpolate a 6k image.

 

Now, that's not the same as blowing up a 3k x 2k image to 6k, which would involve making up a lot of image data. A lot of accurate information can be derived from subpixel changes in an image, so you'd probably get a significantly better interpolation to 6k this way than with a single shot at 3k x 2k.

 

But I'm not understanding how this is truly a 6k image if it's doing that. It's still interpolated.

 

 

Why do you suggest that this is an interpolation? Data isn't being inferred, it's being directly sampled.

 

At its essence, it's taking advantage of the fact that the image is static in time by trading money spent for time spent. Assuming elemental sensor parts - i.e. pixels - correspond linearly to $, and excluding market forces like supply and demand, a 6k sensor would cost 4x as much as a 3k one, but the time spent doing two scans is closer to 2x worse.

 

4 > 2

 

Maybe I got my logic and math backwards somewhere, but I still think it's a simple case of trading 'space' complexity for time complexity - quite common in industry.


 

Why do you suggest that this is an interpolation? Data isn't being inferred, it's being directly sampled.

 

Assume the image is projected onto a plane that the sensor is focused on. If the sensor is 3k x 2k, and the image fills the sensor area, and there's a micro-movement in the sensor position between the images it takes of the film frame (all of which is what I'm getting from this thread), then how is a 6k image a direct sample? If the sensor is 3000x2000 and the output is 6000x4000, then it must be using those multiple exposures from slightly different positions to interpolate up a new image. Again, this would be more accurate than a single 3000x2000 upscale, but it's still scaling, no?

 

Or is the sensor taking an image, moving 3000 to the left and taking another, then 2000 down for a third, then 3000 back for the last, then stitching? That would produce a 6000x4000 image, directly sampled. If that's the case, then I'm not understanding the 'half-pixel offset' that David Mullen describes, thus my confusion.

 

-perry

Edited by Perry Paolantonio

  • Premium Member

It's neither scaling nor data interpolation; it's an actual 6K scan. Scaling means resizing, and interpolation means estimating data that doesn't exist based on existing data.

 

Here's a quick drawing I did of a grid of photosites. I took the first grid, let's say it represents the 3K sensor, and then offset it by half a pixel horizontally, then half a pixel vertically, then half a pixel diagonally -- so four scans of the same piece of film:

 

[Image: 3Kto6K.jpg]

 

The blue circles are the first scan and the red circles are from the second, third, and fourth scans. So you essentially get the SAME thing as if you had built a scanner that had four times the number of photosites. There is no data interpolation, no scaling; it's an actual scan of the piece of film with four times the resolution, which is what a 6K scan is instead of a 3K scan. It is no different than if you had actually built a sensor with four times as many photosites.
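
To make that interleaving concrete, here's a minimal sketch in Python/NumPy of combining four half-pixel-offset exposures into one array with four times the samples. The function name, the array shapes, and the use of NumPy are my own illustration rather than Arri's actual pipeline, and it ignores any calibration or filtering a real scanner would apply:

```python
import numpy as np

def interleave_quad(base, right, down, diag):
    """Interleave four equally sized exposures taken at half-pixel offsets
    (original, +1/2 px horizontal, +1/2 px vertical, +1/2 px diagonal)
    into one array with twice the sample density on each axis."""
    h, w = base.shape
    out = np.empty((2 * h, 2 * w), dtype=base.dtype)
    out[0::2, 0::2] = base    # blue circles: first scan
    out[0::2, 1::2] = right   # red circles: second scan
    out[1::2, 0::2] = down    # red circles: third scan
    out[1::2, 1::2] = diag    # red circles: fourth scan
    return out

# e.g. four simulated 2K x 3K exposures -> one 4K x 6K array
scans = [np.random.rand(2000, 3000) for _ in range(4)]
print(interleave_quad(*scans).shape)  # (4000, 6000)
```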


  • Premium Member

Subpixel repositioning like that is something that's been done quite a bit in cameras, particularly things like the Panasonic HVX200, where the green sensor was slightly offset. It does improve resolution quite nicely (perhaps mainly by allowing better antialiasing), but there probably is an argument that it isn't quite as nice as having the bigger sensor to begin with.

 

P


  • Premium Member

The query I'd have with it is that the sheer size of the pixels and the optical low-pass filtering ought to mean that each pixel "sees" an area that more or less adjoins the adjacent one (in reality they will cross over slightly, and need to). Without modification, you'd be seeing overlapping pixels. So I wonder if the Arriscan has some mechanical way of altering its OLPF for offset scans.

 

P


Here's a quick drawing I did of a grid of photosites. I took the first grid, let's say it represents the 3K sensor, and then offset it by half a pixel horizontally, then half a pixel vertically, then half a pixel diagonally -- so four scans of the same piece of film

Yes,

 

and the drawing makes it clear there was a flaw in my logic - you'd need four scans not two.

 

Which means that the time spent is 4 times as much - which in turn implies that the cost of pixel density (in dollars or other associated 'bad' factors) isn't linear. There must be something that makes having larger sensor elements worth trading the extra time for...

Edited by Chris Millar

  • Premium Member

I'm not sure of the exact method, but when I drew the first grid and then tried a 1/2-pixel offset diagonally overlaid on that, I realized that it wouldn't be the same as an increase from 3K to 6K, which is four times the data; it would be only twice as much data, and it would become in some ways a 45-degree rotated sensor array like in the Sony F65.

 

So I'd have to do some research to confirm how the Arriscan goes up to 6K, but it seems like it would have to do four scans at 3K with half-pixel offsets horizontally, vertically, and diagonally to achieve that.
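
As a rough way to see why a diagonal-only offset falls short, here's a toy NumPy sketch (my own construction, tiny scale) that marks which positions of the doubled-density grid actually get sampled when only the original and the diagonally offset scans are used:

```python
import numpy as np

# Mark sampled positions on a grid with twice the density per axis.
n = 4
mask = np.zeros((2 * n, 2 * n), dtype=int)
mask[0::2, 0::2] = 1   # first exposure: original grid positions
mask[1::2, 1::2] = 1   # second exposure: half-pixel diagonal offset
print(mask)            # a quincunx / checkerboard pattern, like a 45-degree rotated array
print(mask.sum(), "of", mask.size, "positions sampled")  # 32 of 64 -> only 2x the data
```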


 

[Image: 3Kto6K.jpg]

 

 

Are "photosites" adequately represented by the small cicles with much space between? If they are larger then the sensor offsets may produce overlaps rather than just an increased density of circles.

 

Over the complete sensor one could look at, or imagine, a distribution of photon impacts, or counted photons. Ignoring the filters, one might see a distribution that resembles a monochrome pointillist painting, with the "gaps" between pixels made visible by the varying density of impacts approaching the pixel edge, and no impacts in the gaps.


For the sake of discussion let's assume two cases:

 

a pixel pitch equal to the sensor size divided by the resolution(s) - i.e. no dead space around the pixels

 

Now imagine black and white stripes that are focused on the sensor exactly in line with the sensor pixels - take a sample, you'll see black and white stripes. Offset the sensor by 1/2 pixel pitch, you'd get a grey image...

 

Obviously one sample is 'correct' and the other 'wrong' - key point #1 >> it's the randomness of the initial placement of the image on the sensor that determines which one you get.

 

Solution - assume it's the 'wrong' sample every time, do the half offset, then combine the images - you'd get black-grey-white-grey-black-grey-white... and so on - key point #2 >> you'd approach this result every time.

 

Pretty much it's allowing the intensity to be distributed more accurately over space.

 

Now, pixels with surrounding nothingness (like real sensors) - it might be a bit harder to intuitively perceive how this affects the results, but consider that the half offset exposes the acquiring pixel to a signal that partly incorporates image intensity information it never had access to previously. That's quite a plus.

 

Try it in 1D on paper (pretty much audio actually) - quite interesting.
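
Taking up that suggestion, here's a small 1D sketch in NumPy (the stripe width, aperture size, and function names are my own choices for illustration): it box-averages a stripe pattern over pixel-sized apertures at the aligned and half-offset positions, then interleaves the two sample sets.

```python
import numpy as np

fine = 16       # sub-samples per pixel aperture
n_pixels = 8
# black/white stripes exactly at the pixel pitch
pattern = np.tile([0.0] * fine + [1.0] * fine, n_pixels // 2)

def sample(signal, offset_subsamples):
    """Average the signal over consecutive pixel apertures, starting at an offset."""
    shifted = np.roll(signal, -offset_subsamples)
    return shifted.reshape(n_pixels, fine).mean(axis=1)

aligned = sample(pattern, 0)          # alternating 0.0 / 1.0 -> "black and white stripes"
offset  = sample(pattern, fine // 2)  # all 0.5 -> "a grey image"

combined = np.empty(2 * n_pixels)     # interleave: double the sample density
combined[0::2] = aligned
combined[1::2] = offset               # black-grey-white-grey-... as described above
print(aligned, offset, combined, sep="\n")
```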


  • Premium Member

 


a pixel pitch equal to the sensor size divided by the resolution(s) - i.e. no dead space around the pixels

Now imagine black and white stripes that are focused on the sensor exactly in line with the sensor pixels - take a sample, you'll see black and white stripes. Offset the sensor by 1/2 pixel pitch, you'd get a grey image...

Hence optical low pass filtering.

P


Obviously my drawing is not accurate, it's just a way of showing how a 1/2 pixel offset can allow increased resolution for the scan.

 

I'm just trying to suggest that the physical pixel distribution and the actual distribution of photon impacts may affect what is possible when compounding these displaced sensor outputs. Normally we use an averaged or counted value for each pixel. But if the photon impact distribution is not uniform over the pixel's area, then this may affect the way that displaced-sensor methods work in practice.

Edited by Gregg MacPherson

David M's construction is best understood with the little circles being little squares of a size such that altogether they fill 1/4 of the image area. Then the four displaced exposures are equivalent to one exposure made on an image sensor having four times as many square pixels that fill the whole image area. Image sensors normally strive for that high "fill factor" for light efficiency. CCDs can have 100% fill factor, but CMOS sensors always have much less. Arri can afford to use a CMOS with just 25% fill factor because a scanner is not subject to available light. A scanner can simply increase the illumination (until heat becomes a factor). So four shots with the low-fill 3K CMOS simulate one shot with a high-fill 6K CCD. I suspect that the higher effective fill factor offers some reduction in aliasing too.

 

If Arri had used an image sensor with high fill factor the micropositioning trick would be imperfect. Exact 6K can't be derived from four exposures with a high-fill 3K. Proof: imagine a 6K consisting of a checkerboard of 1's and 2's. Then the high-fill 3K always reads 1+2+1+2=6 wherever it is placed. In that case the trick doesn't work at all though it usually works somewhat.
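
To make the checkerboard argument concrete, here's a tiny NumPy check (toy scale, my own construction): a fine checkerboard of 1s and 2s is read by a high-fill coarse pixel that sums a full 2x2 block, and every half-pixel placement returns the same value.

```python
import numpy as np

board = np.indices((8, 8)).sum(axis=0) % 2 + 1   # fine checkerboard of 1s and 2s

def coarse_read(img, row, col):
    """One high-fill coarse pixel: the sum of the 2x2 fine block at (row, col)."""
    return img[row:row + 2, col:col + 2].sum()

# All four half-pixel placements read 1+2+1+2 = 6, so the offsets add nothing here.
print([coarse_read(board, dr, dc) for dr in (0, 1) for dc in (0, 1)])  # [6, 6, 6, 6]
```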

 

Chris M's suggestion that Arri traded 4× time for 4× space (~cost) overlooks that the larger sensor would be slower. Sensors take time to unload the image. I don't know if this time is proportional to the number of pixels, but if so there's no time lost in making four exposures, each with 1/4 as many pixels, excepting the time for the half-pixel motions by the piezo drivers. (How fast are they?)

 

I suspect the ArriScan makes separate R, G, B exposures of each frame. Why would it suffer with a Bayer-pattern sensor? I think it aims to control the R, G, B spectral responsivities by means of an array of umpteen different colored LEDs. Then in 6K mode the Arriscan makes 12 exposures from each frame: 3 colors × 4 positions. The number of exposures can increase further when it is aiming at maximum dynamic range. This is how scanning should be done.

 

It's funny that the question launching this topic has been overlooked. The question was written as if it were well known why the Arriscan oversamples -- e.g., uses a 3K sensor for its 2K scanning and an effectively 6K sensor for its 4K scanning -- and asked why 50% oversampling. I don't think there is any magic in the choice of 1.5×. If Arri could have used an approximately 13K sensor it could have met the Nyquist condition so no OLPF would be needed, and then it could extract a 2K, 4K, or whatever scan from the full optical data. The topic question might better have been put: how much is gained by just 50% oversampling, per axis?
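
As a rough illustration of the oversampling idea, here's a toy 3:2 box resample in NumPy that reduces a 3K line to 2K (repeat by 2, then average groups of 3). This only shows the general shape of scan-high-then-downsample; it is not Arri's actual resampling filter:

```python
import numpy as np

def downsample_3_to_2(line):
    """Toy 3:2 box resample: 3000 samples -> 2000 samples."""
    doubled = np.repeat(line, 2)                # 3000 -> 6000
    return doubled.reshape(-1, 3).mean(axis=1)  # 6000 -> 2000

line_3k = np.random.rand(3000)
print(downsample_3_to_2(line_3k).shape)         # (2000,)
```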

Edited by Dennis Couzin
