
How does the Internet Archive / Cloud store all that data?


Daniel D. Teoli Jr.


They use a Sun Microsystems data center in California, with a mirrored backup in Egypt. (https://web.archive.org/web/20090326200212/http://www.sun.com/aboutsun/pr/2009-03/sunflash.20090325.1.xml)

As of last year, they had 25 petabytes of information (25,000 terabytes). As someone who helps manage an offline database of around 250 TB of assets, I honestly can't imagine the staff and work that go into managing a live database that large.
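For a sense of the scale gap, here's a quick back-of-the-envelope check in Python. It assumes decimal (SI) units, as drive vendors use, and a hypothetical 10 TB drive size that isn't from the post:

```python
# Decimal (SI) units, the convention drive and tape vendors use.
TB = 10**12  # bytes per terabyte
PB = 10**15  # bytes per petabyte

archive_bytes = 25 * PB       # Internet Archive figure quoted in the post
offline_db_bytes = 250 * TB   # the offline asset database mentioned above
drive_bytes = 10 * TB         # hypothetical drive size, not from the post

print(archive_bytes // TB)                # 25000 terabytes
print(archive_bytes // offline_db_bytes)  # 100 times larger
print(archive_bytes // drive_bytes)       # 2500 drives, before any redundancy
```

Real capacity needs are higher still, since RAID parity and mirrored copies multiply the raw storage required.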

Like most data centers, I would assume they're using huge numbers of large hard drives in RAID arrays across their servers. Drive longevity is probably handled through RAID redundancy and in-depth hardware monitoring to stay ahead of drive failures. (Backblaze publishes an interesting annual report detailing drive performance and failure rates in their data centers: https://www.backblaze.com/blog/hard-drive-stats-for-2018/) Solid-state drives are used as well, but usually only for the small part of the system that needs higher performance, such as caches or frequently accessed data. As solid-state storage prices fall, we will definitely see SSDs used more and more. (Backblaze also has an article about their use of SSDs now and in the future: https://www.backblaze.com/blog/hdd-vs-ssd-in-data-centers/)
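To show how RAID redundancy lets an array survive a dead drive, here's a minimal sketch of single-parity (RAID 5 style) reconstruction in Python. It's a toy under simplifying assumptions: real RAID 5 rotates parity across drives, RAID 6 adds a second parity syndrome, and nothing here reflects how any particular data center configures its arrays.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks striped across three drives (toy 4-byte blocks).
data = [b"ARCH", b"IVE!", b"DATA"]
parity = xor_blocks(data)  # stored on a fourth drive

# Simulate losing drive 1, then rebuild it from the survivors plus parity.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt)  # b'IVE!'
```

Because XOR is its own inverse, XORing the surviving blocks with the parity cancels everything except the missing block. That's also why a second failure in the same stripe before the rebuild finishes loses data in a single-parity array.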


Interestingly, it also seems like tape storage may be on the rise, though it's more popular for long-term and offline storage:

https://www.datacenterknowledge.com/industry-perspectives/how-tape-storage-changing-game-data-centers

https://spectrum.ieee.org/computing/hardware/why-the-future-of-data-storage-is-still-magnetic-tape

 

As someone who's too young to have really worked with LTO tape backups, I'm kind of surprised by how fast tape is: "By 2025, tape transfer rates are predicted to be five times faster than HDDs."



Racks upon racks of hard disks.

There are warehouses all over the world that provide storage to companies like Amazon, Google, and Microsoft for their cloud computing platforms, as well as services like YouTube.

P


I've worked with Iron Mountain to set up solutions here in Hollywood. While they do have some online storage solutions, the vast majority of storage is done on tape. Generally, a set of backups is made at different locations across the country. Customers pay X amount for online, nearline, or offline storage. If you want the files to be accessible immediately, they generally store them on RAID arrays. If you can wait a few minutes for access, or if all you're doing is transferring files, then nearline works fine, and it's on tape. Offline storage, of course, is generally tapes that aren't in the robotic tape library at all. It takes a while to get the media back, but it's safely on tape, and this is the lowest-cost solution.
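The online/nearline/offline trade-off described above boils down to picking the cheapest tier whose retrieval time you can live with. A minimal Python sketch, where the retrieval times are hypothetical placeholders rather than Iron Mountain's actual figures:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    medium: str
    max_retrieval_seconds: float  # hypothetical worst-case wait


# Ordered from cheapest to most expensive, following the description above.
TIERS = [
    Tier("offline", "tape shelved outside the library", 2 * 24 * 3600),
    Tier("nearline", "tape in a robotic library", 10 * 60),
    Tier("online", "RAID arrays", 1),
]


def cheapest_tier(max_wait_seconds: float) -> Tier:
    """Return the cheapest tier whose retrieval time fits the allowed wait."""
    for tier in TIERS:
        if tier.max_retrieval_seconds <= max_wait_seconds:
            return tier
    # Even online can't meet the deadline, so fall back to the fastest tier.
    return TIERS[-1]


print(cheapest_tier(3600).name)            # nearline
print(cheapest_tier(7 * 24 * 3600).name)   # offline
print(cheapest_tier(5).name)               # online
```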


On 4/1/2019 at 7:55 AM, Adam Froehlich said:


As someone who's too young to have really worked with LTO tape backups, I'm kind of surprised by how fast tape is: "By 2025, tape transfer rates are predicted to be five times faster than HDDs."

I thought they would be very slow. I remember back in the day Radio Shack used cassette tapes for early computer storage. That is my only experience with tape. I'd like to get into it for my material, but it's very $$.


20 hours ago, Tyler Purcell said:

Offline storage, of course, is generally tapes that aren't in the robotic tape library at all.

Never heard of a robotic tape library.  I looked it up. Fascinating stuff!

Can you imagine if a hacker infected a monster tape library with a virus? What a mess it would be.

 

https://www.youtube.com/results?search_query=+robotic+tape+library

...in the old days, the only robotics we had were jukeboxes.

 

See the 2:20 mark:

https://www.youtube.com/watch?v=morLkJDxWyg


14 minutes ago, Daniel D. Teoli Jr. said:

I remember back in the day Radio Shack used cassette tapes for early computer storage. That is my only experience with tape.

There's no comparison.

LTO-8 tapes hold 12TB natively (30TB compressed) and read at up to 360MB/s natively (750MB/s compressed). That's at least a million times faster than cassettes.
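A rough check of the "million times faster" claim, using LTO-8's native (uncompressed) figures of 12 TB and 360 MB/s. The cassette rate is an assumption: roughly 500 bit/s, the figure usually quoted for Radio Shack's TRS-80 Level II cassette interface, ignoring framing overhead.

```python
MB = 10**6   # bytes per megabyte (SI)
TB = 10**12  # bytes per terabyte (SI)

lto8_native_bytes_per_s = 360 * MB
lto8_native_capacity = 12 * TB
cassette_bytes_per_s = 500 / 8  # assumed ~500 bit/s cassette interface

speedup = lto8_native_bytes_per_s / cassette_bytes_per_s
hours_to_read_full_tape = lto8_native_capacity / lto8_native_bytes_per_s / 3600

print(f"{speedup:,.0f}x faster")               # 5,760,000x faster
print(f"{hours_to_read_full_tape:.1f} hours")  # 9.3 hours
```

Even with generous assumptions for the cassette, the gap is several million to one, and a full LTO-8 tape can be read end to end in under half a day.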
