Daniel D. Teoli Jr.

How does the Internet Archive / Cloud store all that data?

They use a Sun data center in California, with a mirrored backup in Egypt. (https://web.archive.org/web/20090326200212/http://www.sun.com/aboutsun/pr/2009-03/sunflash.20090325.1.xml)

As of last year, they had 25 petabytes of information (25,000 terabytes). As someone who helps manage an offline database of around 250 TB of assets, I honestly can't imagine the amount of staff and work that goes into managing a live database that large.

Like most data centers, I would assume they're using a huge number of large hard-drive RAID arrays in servers. Drive longevity is probably handled with RAID redundancy and in-depth hardware monitoring to stay ahead of drive failure (Backblaze actually publishes an interesting report each year detailing drive performance and failure in their data centers: https://www.backblaze.com/blog/hard-drive-stats-for-2018/). Solid-state drives are used as well, but usually only for the small part of the whole that requires higher performance, such as caches or frequently accessed data. As solid-state storage prices fall, we will definitely see them in use more and more (again, Backblaze also has an article about their use of SSDs now and in the future: https://www.backblaze.com/blog/hdd-vs-ssd-in-data-centers/).
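
To put rough numbers on that, here is a minimal back-of-the-envelope sketch in Python. Only the 25 PB figure comes from this post; the drive size, RAID group layout, and annual failure rate are assumptions picked for illustration, not the Internet Archive's actual setup, and the mirrored copy is ignored.

```python
import math


def drives_needed(data_tb: float, drive_tb: float,
                  group_size: int, parity_per_group: int) -> int:
    """Drives required to hold data_tb when every RAID group of
    group_size drives gives up parity_per_group drives to redundancy."""
    usable_per_group_tb = (group_size - parity_per_group) * drive_tb
    groups = math.ceil(data_tb / usable_per_group_tb)
    return groups * group_size


def expected_failures_per_year(drive_count: int,
                               annual_failure_rate: float) -> float:
    """Average number of drive replacements per year at a given AFR."""
    return drive_count * annual_failure_rate


DATA_TB = 25_000   # 25 PB, the figure quoted in the post
DRIVE_TB = 10      # ASSUMPTION: 10 TB drives
GROUP_SIZE = 8     # ASSUMPTION: 8 drives per RAID group
PARITY = 2         # ASSUMPTION: 2 parity drives per group (RAID 6 style)
AFR = 0.015        # ASSUMPTION: 1.5% annual failure rate

drives = drives_needed(DATA_TB, DRIVE_TB, GROUP_SIZE, PARITY)
failures = expected_failures_per_year(drives, AFR)

print(f"Drives needed: {drives}")
print(f"Expected failures per year: {failures:.1f}")
```

Under those assumptions that works out to a few thousand drives and roughly one failed drive a week, which is why monitoring and redundancy matter so much at this scale. Change the constants and the totals move accordingly.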


Interestingly, it also seems like tape storage may be on the rise, though it is more popular for long-term and offline storage:

https://www.datacenterknowledge.com/industry-perspectives/how-tape-storage-changing-game-data-centers

https://spectrum.ieee.org/computing/hardware/why-the-future-of-data-storage-is-still-magnetic-tape

 

As someone who's too young to have really worked with LTO tape backups, I'm kind of surprised by the speed performance of tapes: "By 2025, tape transfer rates are predicted to be five times faster than HDDs."


Racks upon racks of hard disks.

There are warehouses all over the world which provide storage to places like Amazon, Google and Microsoft for their cloud computing platforms, as well as things like YouTube.

P


I've worked with Iron Mountain to set up solutions here in Hollywood. While they do have some online storage solutions, the vast majority of storage is done on tapes. There is generally a set of backups made at different locations across the country. Customers pay X amount for online, nearline or offline storage. If you want the files to be accessed immediately, they are generally stored on RAID arrays. If you want the files to be accessed within a few minutes OR if all you're doing is transferring files, then nearline works fine and it's on tape. Then of course offline storage is generally tapes not located in the robotic tape library at all. So it would take a while to get the media back, but it's safely on tape and this is the lowest cost solution.
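
The online / nearline / offline split described here can be sketched as a simple lookup: pick the cheapest tier that still gets the files back within the wait you can tolerate. This is only an illustration in Python. The tier names and media come from the post, but the worst-case wait times and the `cheapest_tier` helper are hypothetical, not Iron Mountain's actual service levels or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    medium: str
    worst_case_wait_s: float  # ASSUMPTION: illustrative retrieval times


# Ordered from most to least expensive, following the post.
TIERS = [
    Tier("online", "RAID arrays", 1),
    Tier("nearline", "tape in the robotic library", 5 * 60),
    Tier("offline", "tape stored outside the library", 2 * 24 * 3600),
]


def cheapest_tier(max_wait_s: float) -> Tier:
    """Return the lowest-cost tier whose worst-case retrieval time
    still fits within the acceptable wait."""
    candidates = [t for t in TIERS if t.worst_case_wait_s <= max_wait_s]
    if not candidates:
        raise ValueError("no tier can deliver files that quickly")
    # TIERS is sorted by falling cost, so the last candidate is cheapest.
    return candidates[-1]


print(cheapest_tier(10).name)             # need it within seconds
print(cheapest_tier(10 * 60).name)        # can wait ten minutes
print(cheapest_tier(7 * 24 * 3600).name)  # can wait a week
```

The point of the design is that cost falls as the acceptable wait grows, so a customer only pays for disk when they truly need immediate access.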

On 4/1/2019 at 7:55 AM, Adam Froehlich said:

Interestingly, it also seems like tape storage may be on the rise, though it is more popular for long-term and offline storage:

https://www.datacenterknowledge.com/industry-perspectives/how-tape-storage-changing-game-data-centers

https://spectrum.ieee.org/computing/hardware/why-the-future-of-data-storage-is-still-magnetic-tape

 

As someone who's too young to have really worked with LTO tape backups, I'm kind of surprised by the speed performance of tapes: "By 2025, tape transfer rates are predicted to be five times faster than HDDs."

I thought they would be very slow. I remember back in the day Radio Shack used cassette tapes for early computer storage. That is my only experience with tape. I'd like to get into it for my material, but very $$. 

20 hours ago, Tyler Purcell said:

I've worked with Iron Mountain to set up solutions here in Hollywood. While they do have some online storage solutions, the vast majority of storage is done on tapes. There is generally a set of backups made at different locations across the country. Customers pay X amount for online, nearline or offline storage. If you want the files to be accessed immediately, they are generally stored on RAID arrays. If you want the files to be accessed within a few minutes OR if all you're doing is transferring files, then nearline works fine and it's on tape. Then of course offline storage is generally tapes not located in the robotic tape library at all. So it would take a while to get the media back, but it's safely on tape and this is the lowest cost solution.

Never heard of a robotic tape library.  I looked it up. Fascinating stuff!

Can you imagine if a hacker infected a monster tape library with a virus? What a mess it would be.

 

https://www.youtube.com/results?search_query=+robotic+tape+library

...in the old days the only robotics we had were jukeboxes.

 

See the 2:20 mark:

https://www.youtube.com/watch?v=morLkJDxWyg

14 minutes ago, Daniel D. Teoli Jr. said:

I remember back in the day Radio Shack used cassette tapes for early computer storage. That is my only experience with tape.

There's no comparison.

LTO-8 tapes hold up to 30TB compressed (12TB native) and read at up to 750MB/s compressed (360MB/s native). They're a million times faster than cassettes at least.
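
As a rough sanity check on that comparison, here is a small Python sketch. The LTO-8 capacity and speed are the figures quoted in this post. The cassette rate is an assumption: 1500 baud, treated as 1500 bits per second with no framing overhead, which is in the range of home-computer cassette interfaces of that era.

```python
LTO8_BYTES_PER_S = 750e6     # 750 MB/s, figure quoted in the post
LTO8_CAPACITY_BYTES = 30e12  # 30 TB, figure quoted in the post

CASSETTE_BAUD = 1500                      # ASSUMPTION: 1500 baud interface
CASSETTE_BYTES_PER_S = CASSETTE_BAUD / 8  # ASSUMPTION: 1 bit per baud, no overhead

# How many times faster the LTO-8 stream is than the cassette stream.
speedup = LTO8_BYTES_PER_S / CASSETTE_BYTES_PER_S

# Time to stream one full tape end to end at the quoted rate.
full_tape_hours = LTO8_CAPACITY_BYTES / LTO8_BYTES_PER_S / 3600

print(f"LTO-8 vs cassette: about {speedup:,.0f}x faster")
print(f"Reading a full LTO-8 tape: about {full_tape_hours:.1f} hours")
```

With those assumptions the ratio lands in the millions, so "a million times faster at least" holds up, and even at full speed a whole tape still takes hours to read.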


I have not had much experience with SSDs. I have had lots of HDDs and only had 1 fail out of a few dozen. I have had 3 SSDs and 2 of the 3 failed. One was dead right out of the box. The other failed in a couple of days.
