Project Nautilus Emerged as Dell EMC's Streaming Data Platform

Disclaimer: I recently attended Storage Field Day 19. My flights, accommodation and other expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event. Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.

Yesterday, Dell EMC's Project Nautilus officially emerged as the Dell EMC Streaming Data Platform. I wrote this post based on the presentation we were given at #SFD19, and decided to keep the Project Nautilus name throughout my report.

I love it when presenters tell us what world they are coming from, and tie our shared past to new products. Ted Schachter started his career at Tandem doing real-time processing with ATM machines. But as he pointed out, these days we have the capacity to store much more information than he had to work with back in his Tandem days. I loved how he drew a line from the past to the present. We really need more of that generational, legacy knowledge shared in our presentations to help us ground new technologies as they emerge.

From the Project Nautilus #SFD19 presentation

Data Structures are Evolving

Developers are using the same data structures they've used for decades, but there is an emerging data type called a stream. Log files, sensor data, and image data are the kinds of elements you will find in a stream. Traditional storage people think in batches; the goal with streams is to move to transacting and interacting with all available data in real time, along a single path. By combining all of these data types into a stream you can start to observe trends and do things like the ones shown on the slide above.

Since the concept of streams is pretty new, most of the implementations you'll see today are DIY. There are "accidental architectures" based on Kafka, the open source Apache platform for building real-time data pipelines and streaming apps.
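To make "DIY mode" concrete, here's a minimal sketch of the kind of Kafka producer these accidental architectures are built around. The broker address, topic, and sensor payload are my own illustrative assumptions, not anything shown at #SFD19.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class SensorProducer {
    public static void main(String[] args) {
        // Hypothetical broker and serializers; substitute your own cluster details.
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Each sensor reading becomes one event on a "sensor-readings" topic,
            // keyed by sensor ID so readings from one device stay ordered.
            producer.send(new ProducerRecord<>("sensor-readings",
                    "sensor-42", "{\"temp\": 21.5, \"ts\": 1579700000}"));
        }
    }
}
```

Every team that builds this also ends up building the tiering, retention, and replay machinery around it, which is exactly the "accidental architecture" problem Nautilus is aimed at.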

Project Nautilus Emerged to Work with Streams

Project Nautilus from Dell EMC Storage is a platform that builds on open source tools like Spark and Kafka to do real-time and historical analytics and storage. Ingest and storage are via Pravega: streams come in and are automatically tiered to long-term storage. From there the data is connected to analytic tools like Spark and Flink (which was written specifically for streams). Finally, everything is glued together with Nautilus software to achieve scale (this is coming from Dell EMC Storage, after all), and it is built on VMware and PKS. More details were to be announced at MWC, so hopefully we'll have some new info soon.
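Pravega itself is open source, so here's a minimal sketch of what writing events to a Pravega stream looks like with its published Java client. The controller URI, scope, and stream names are illustrative assumptions on my part; treat this as a sketch of the ingest model, not anything Dell EMC demoed.

```java
import io.pravega.client.ClientConfig;
import io.pravega.client.EventStreamClientFactory;
import io.pravega.client.admin.StreamManager;
import io.pravega.client.stream.EventStreamWriter;
import io.pravega.client.stream.EventWriterConfig;
import io.pravega.client.stream.ScalingPolicy;
import io.pravega.client.stream.StreamConfiguration;
import io.pravega.client.stream.impl.UTF8StringSerializer;
import java.net.URI;

public class PravegaIngest {
    public static void main(String[] args) {
        // Hypothetical controller endpoint and names, for illustration only.
        URI controller = URI.create("tcp://localhost:9090");
        String scope = "iot", stream = "sensor-stream";

        try (StreamManager manager = StreamManager.create(controller)) {
            manager.createScope(scope);
            manager.createStream(scope, stream, StreamConfiguration.builder()
                    .scalingPolicy(ScalingPolicy.fixed(1)) // one segment to start
                    .build());
        }

        ClientConfig config = ClientConfig.builder().controllerURI(controller).build();
        try (EventStreamClientFactory factory =
                     EventStreamClientFactory.withScope(scope, config);
             EventStreamWriter<String> writer = factory.createEventWriter(
                     stream, new UTF8StringSerializer(),
                     EventWriterConfig.builder().build())) {
            // Events sharing a routing key are ordered; tiering the stream to
            // long-term storage happens behind the scenes, not in app code.
            writer.writeEvent("sensor-42", "{\"temp\": 21.5}");
        }
    }
}
```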

Real Talk

Project Nautilus emerged as a streaming data platform. This is another example of Dell EMC Storage trying to help their customers tame unstructured data. In this case, they are tying older technology that customers already use to newer technology: data streams. They see so much value in the new technology that they created a way for customers to get out of DIY mode, while at the same time taking advantage of existing technical debt.

This is also a reminder that we're moving away from the era of 3-tier architecture. Hardware innovations have led to software innovations, and we are going to see more and more architectural innovations. Those who are open to learning how tech is evolving will be best positioned to apply the lessons of the past couple of decades.

How are you learning about the new innovations?


Taming Unstructured Data with Dell EMC Isilon

Disclaimer: I recently attended Storage Field Day 19. My flights, accommodation and other expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event. Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.

Taming Unstructured Data

A common thread discussed by almost every vendor we visited was the issue of taming unstructured data. Vendors are building products that their customers can use to turn massive amounts of unstructured data into information. They all told us that their customers are demanding intelligent insights that are available anytime and accessible anywhere. The groups from Dell EMC Storage were no different; they are also tackling this problem.

Four storage product teams came to chat with us during SFD19: Isilon, the Project Nautilus team, a team building DevOps tools, and PowerOne. What's interesting is that in addition to tackling the challenge of taming unstructured data, each of these product groups is working on innovations to traditional storage products that let them integrate with products and services we usually associate with cloud native solutions, for example Kubernetes.

I'll tackle each of the areas mentioned above; this post concentrates on Isilon.

Taming Unstructured Data with Isilon

Isilon Systems was founded in 2001 and acquired by EMC in 2010. Dell EMC Isilon is a scale-out NAS that runs on a file system called OneFS. The team has even won an Emmy for its early development of HSM (hierarchical storage management).

Isilon's definition of scale-out is policy-based management. Every node is independent and able to access data coherently. The files aren't being split, but you can keep snapshots in a different tier. Users write the policies and the system takes care of it from there.

via this slidedeck on slideshare

CloudIQ (Dell EMC's SaaS infrastructure management tool) now supports Isilon. They also acquired a tool called ClarityNow, which is included with an Isilon license (as is CloudIQ), although you are charged for non-Dell EMC storage.

OneFS Gets Data Closer to Cloud Compute

Isilon OneFS is also available to run alongside compute in the public cloud. Dell EMC partners with service providers to offer Isilon OneFS on Dell EMC metal in co-los located close to the public cloud providers. It's offered as a SaaS service and is great for current on-premises Isilon customers who want to extend their Isilon implementation to the cloud for DR, replication, or even new types of compute like machine or deep learning.

But *why* would customers want to do this? If you've stored your unstructured data in Isilon for, say, 10 years, that is a tremendous amount of data gravity. It's going to be hard to move that data to the cloud, even if the services and tools you'd like to use are there. Isilon's OneFS structure allows you to extend this data to other locations, and if those locations are connected to a cloud via a fast pipe from a co-lo center, you can design to take advantage of the best of both worlds.

Real Talk

This is a great example of how traditional storage product teams are working with cloud product teams to create offerings that support the customers who are writing apps and taming unstructured data. Customers realize that to do that, they have to move beyond polarizing architectural attitudes like "everything cloud" or "cloud is evil".

These customers understand that when it comes to taming unstructured data, the devil is in the details. It is still the responsibility of the architect to understand what you'll be signing up for with any of these types of solutions. Ask lots of questions, and weigh the risks and benefits to be sure this type of solution will work for your organization.


Tiger Technology Brings the Cloud to You

Disclaimer: I recently attended Storage Field Day 19. My flights, accommodation and other expenses were paid for by Tech Field Day. There is no requirement for me to blog about any of the content presented and I am not compensated in any way for my time at the event. Some materials presented were discussed under NDA and don’t form part of my blog posts, but could influence future discussions.

The first presentation of Storage Field Day 19 was Tiger Technology. They are a data management company that has been around since 2004, providing solutions primarily for the media and entertainment industry.

This industry is interesting to modern storage because of its application requirements, in particular for video. These applications are usually mission critical and require high bandwidth and low latency. Because the applications are so diverse, there really isn't a standard. One requirement they do all have in common is that they are intolerant of data loss. Think of video games suffering lag, or a live sporting event dropping frames or even pixels – that kind of performance is simply not acceptable in this industry.

The Tiger Technology team took us on the journey of how they built their new Tiger Bridge offering. Tiger Bridge is a cloud tiering solution for Windows (they are working on Linux) that brings cloud storage to current (and legacy) workflows in a way that is invisible to your workers.

Tiger Technology’s Journey to the Tiger Bridge

The customer problem that set them on the journey to create Tiger Bridge was surveillance for an airport. The airport wanted to upgrade its surveillance systems: it had 300 HD cameras with a retention time of 2 weeks, and wanted to scale within 3 years to 10,000 4K cameras with a retention of 6 months. Tiger Technology computed that the project would require an ongoing capacity of 15 petabytes of data.

Tackling this problem using standard file systems would be prohibitively expensive, not to mention that it wasn't even possible to get Windows to that capacity at the time they started. They knew object storage would work better. Because of the security implications, the other requirements were no latency or bandwidth impact, no tamper points, software only, and scalability.

If you think about surveillance cameras, you need a way to keep the data on-site for a while, then you need to send the data someplace that doesn’t cost as much to store it. But you need to be able to bring that data back with fidelity if you need to check the videos for something. These customer challenges are how they came up with the idea for Tiger Bridge.

What is Tiger Bridge?

Tiger Bridge is a hierarchical storage management (HSM) system that installs on a server in less than five minutes. The agent is a Microsoft filter driver that sits between application reads and writes and the target storage. Since it is integrated with the file system as a filter driver, it also falls under Active Directory control, which is great for existing workloads and policies.

With Tiger Bridge, files are replicated and tiered automatically based on policies set on last access time and/or volume capacity. The agent does the tiering work in the background, so sending or retrieving a file from the cloud, even cold cloud storage, is transparent to the user.

Via the TigerBridge website
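Tiger Technology didn't share driver internals, but the policy mechanics are easy to sketch. This hypothetical Java fragment shows the kind of last-access sweep an HSM agent could run in the background; every name, path, and threshold in it is mine, not Tiger Technology's.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.BasicFileAttributes;
import java.time.Duration;
import java.time.Instant;

public class TieringSweep {
    // Hypothetical policy: anything untouched for 14 days moves to the cloud tier.
    static final Duration COLD_AFTER = Duration.ofDays(14);

    public static void main(String[] args) throws IOException {
        Path watched = Paths.get("D:\\footage"); // hypothetical watched volume
        try (DirectoryStream<Path> files = Files.newDirectoryStream(watched)) {
            for (Path file : files) {
                BasicFileAttributes attrs =
                        Files.readAttributes(file, BasicFileAttributes.class);
                Instant lastAccess = attrs.lastAccessTime().toInstant();
                if (Duration.between(lastAccess, Instant.now())
                        .compareTo(COLD_AFTER) > 0) {
                    replicateToCloudAndStub(file);
                }
            }
        }
    }

    static void replicateToCloudAndStub(Path file) {
        // In a real HSM the file body is uploaded and replaced by a stub so
        // reads transparently recall it; here we just log the decision.
        System.out.println("Would tier: " + file);
    }
}
```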

The team focused on providing this seamless experience to applications hosted on the Windows platform. Since they wanted this to also work for legacy apps, one thing they had to figure out was how to handle all the commands that are common in a file system but have no direct equivalent in the cloud – things like lock, move, rename, etc. They also wanted to support cloud storage features like versioning, soft delete, and global replication, since applications written for the cloud require them.

The example they gave of bridging cloud and file system features was rename. You can rename any Windows file, no problem. But rename isn't available on public cloud object stores; you have to copy the object and delete the original. For a couple of files, that's probably no big deal. But if you rename a folder with lots of files in it, that could be a huge rename job. It will take time, and it will probably get expensive.
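To see why, here's a sketch of what a "rename" looks like against an S3-style object store, using the AWS SDK for Java v2 (the bucket and key names are hypothetical). Every renamed object costs a copy plus a delete, so renaming a folder multiplies that pair by the number of files inside it.

```java
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.CopyObjectRequest;
import software.amazon.awssdk.services.s3.model.DeleteObjectRequest;

public class ObjectRename {
    public static void main(String[] args) {
        String bucket = "surveillance-archive"; // hypothetical bucket
        try (S3Client s3 = S3Client.create()) {
            // There is no rename verb: copy the object to the new key...
            s3.copyObject(CopyObjectRequest.builder()
                    .sourceBucket(bucket).sourceKey("cam01/old-name.mp4")
                    .destinationBucket(bucket).destinationKey("cam01/new-name.mp4")
                    .build());
            // ...then delete the original. Two requests (and a full data copy)
            // per file, versus one metadata update on a local file system.
            s3.deleteObject(DeleteObjectRequest.builder()
                    .bucket(bucket).key("cam01/old-name.mp4")
                    .build());
        }
    }
}
```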

Their solution keeps track of where the files are and any changes that have been made. This solves the problem of data being rendered useless because it's no longer associated with its original application, a common issue that brings on lock-in anxiety. Files under Tiger Bridge control maintain a link with the file system on premises and the public cloud. Users never know whether they are hitting the data on premises or in the cloud.

Check out the demo from a user perspective:

What does Tiger Technology do for users?

What this means is that a user on their laptop can use the Windows file system they are familiar with, and the agent handles where the file actually lives in the background. Administrators can create tiering rules that make sense to the business. It allows organizations to use the cloud as an extension of their data storage.

Another use case is disaster recovery. Having a backup of your data in a different location like the cloud, without having to manage another site or tapes, is very attractive. Since it is so easy to bring files back from the cloud, Tiger Bridge handles this use case as well.

Real Talk about Tiger Technology

I think this is the year we're going to see a lot more solutions bubble up that truly bridge on-premises and the cloud, and I think we'll see them from older companies like Tiger Technology. These companies understand application requirements and the technical debt their customers are battling, and they are finding ways to make the cloud model fit their customers' current realities.

The Tiger Technology presentation reminded me of something we used to say at EMC: a disk, is a disk, is a disk. Users and applications don't really care where the disk they are writing to is located, who manages it, or what it costs. They care about their application being easy to use, low latency, and secure. Tiger Technology has figured out how to make that old storage saying work for public cloud and legacy applications.

What do you think? Let us know in the comments!


Is storage still relevant?


Disclosure: I was invited by GestaltIT as a delegate to their Storage Field Day 19 event from Jan 22-24, 2020 in the Silicon Valley CA. My expenses, travel, accommodation and conference fees were covered by GestaltIT, the organizer. I was not obligated to blog or promote the vendors’ technologies. The content of this blog is of my own opinions and views.

Is storage still relevant in today’s cloud and serverless environments? At Storage Field Day 19 we spent several hours with Western Digital, and heard from ten different presenters. Did they show us that storage is still relevant?

Hardware Must Innovate for Software to Innovate

I think the industry often forgets that software innovation is impossible without hardware innovation. We’ve seen some pretty amazing hardware innovations over the last decade or so, and hardware companies are still at it.

You may be asking: how is an old hardware company able to keep up, let alone still be innovating? Well, Western Digital has 50 years of storage experience, and they are still innovating. Their heritage is highlighted in this slide.

Western Digital’s 50 year heritage via https://www.youtube.com/watch?v=Lqw3_HgiA9o

Western Digital is looking at how to solve the data storage challenges for emerging workloads. They already have tons of experience, so they know that the data must be stored, and that more data is being created now than ever before.

All of that data needs to be stored so it is available to have compute applied to it – compute is what turns data into actionable information. But there is so much data now: how should it get stored? How will it be accessed? It's becoming pretty obvious that the old ways of doing this won't be performant enough, or maybe not even scalable enough.

One workload they talked about throughout many of the presentations was video. Just think about the kinds of devices that now create streams of video: IoT devices, surveillance cameras, cars, the general public, etc. Much of this new streaming video is being created at the edge, and the edge cases are so diverse that even our understanding of "edge" may be antiquated.

So is storage still relevant? Maybe not the type I came up on – SANs and NASs. But the next evolution of storage has never been more relevant than now.

Composable Infrastructure

Western Digital also discussed composable infrastructure, and how technologies such as NVMe over Fabric make composable infrastructure possible. Don't worry if you have no idea what I'm talking about – the standards work for NVMe over Fabric didn't get pulled together until 2014, and the standard became real in 2016. Also, hardware standards bodies are peculiar – they don't use the NVMe acronym, they use "NVM Express". This makes it hard to find primary source information, so keep that in mind when you're googling.

What can NVMe over Fabric do for composable infrastructure? First, let's answer why you would need composable infrastructure at all.

Western Digital's Scott Hamilton walked us through this. First of all, new types of applications like machine learning and deep learning need the data to be close to where the compute is happening. Even after considering the tradeoffs that must be made because of data gravity, traditional architecture slows things down because resources are locked into that traditional stack.

Composable infrastructure takes the resources trapped in traditional infrastructure, breaks them up, and disaggregates them. Those resources can then be recombined into the leanest combination possible for a workload, virtually composed into a new type of logical server. The beauty is that this composition can be modified as the dynamics of the workload change.

According to Hamilton, Western Digital believes NVMe will be the foundation of next-gen infrastructures, and that eventually Ethernet will be the universal backplane. It was an interesting session – check it out for yourself below.

Western Digital at Tech Field Day via https://www.youtube.com/watch?v=LuRI1TlBJgA

Zoned Storage

Western Digital is also championing the Zoned Storage initiative, which will be part of the NVMe standard. Zoned Storage divides the address space of a disk (HDD or SSD) into zones. Data must be written sequentially within a zone and can't be overwritten in place; a zone has to be reset before it can be rewritten. Here's Western Digital's explanation:

[Zoned Storage] involves the ability to store and retrieve information using shingled magnetic recording (SMR) in hard disk drives (HDDs) to increase the storage density and its companion technology called Zoned Name Spaces in solid state drives (SSDs).

via https://www.westerndigital.com/company/innovations/zoned-storage

Why does the industry need this? According to Swapna Yasarapu, Sr. Director of Product Marketing for Western Digital's Data Center Business Unit, we're moving into an era where large portions of the data being created are unstructured, and all of it can't be stored via traditional methods. These unstructured streams come from IoT edge devices, video, smart video, telemetry, and various other end devices, and many of them must be written sequentially to unlock the information the data contains.

Finally, this is an open source initiative that will help write these types of data streams to HDDs and SSDs in a more practical way.
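The write rule is easier to feel in code. Below is a toy Java model of a single zone – entirely my own illustration, not a real ZNS or ZBC API – that enforces the two constraints: writes land only at the write pointer, and nothing is rewritten until the zone is reset.

```java
public class Zone {
    private final byte[] blocks;
    private int writePointer = 0; // next writable offset; only moves forward

    Zone(int capacity) {
        blocks = new byte[capacity];
    }

    /** Appends data at the write pointer; any other offset is rejected. */
    void write(int offset, byte[] data) {
        if (offset != writePointer) {
            throw new IllegalArgumentException(
                    "zoned write must be sequential: expected offset " + writePointer);
        }
        if (writePointer + data.length > blocks.length) {
            throw new IllegalStateException("zone full; reset before reuse");
        }
        System.arraycopy(data, 0, blocks, writePointer, data.length);
        writePointer += data.length;
    }

    /** Overwriting is only possible by resetting the whole zone. */
    void reset() {
        writePointer = 0;
    }
}
```

Sequential streams like surveillance footage or telemetry fit this model naturally, which is why it unlocks the density gains of SMR drives and ZNS SSDs.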

Watch the entire presentation here:

Acronyms as an innovation indicator

One way I can tell when there is innovation is when I come across acronyms I don’t know. After 3 years focusing on virtualization hardware, I found myself having a hard time keeping up with the acronyms thrown at us during the presentations.

The good news is that some of these technologies are brand new. So much for storage being old school! Plus, can you imagine what apps are waiting to be written on these new architectures that have yet to be built?

Here are the acronyms I didn’t know. How many can you define?

  • TMR: tunneling magnetoresistance
  • TPI: Tracks Per Inch (disk density)
  • PZT: lead zirconate titanate, the piezoelectric material used in actuators (see this earlier Storage Field Day post)
  • VCM: Voice Coil Motor (see this video )
  • SMR: Shingled Magnetic Recording
  • SSA: Solid State Array
  • ZBC: SCSI Zoned Block Commands
  • ZAC: Zoned ATA Commands
  • ZNS: Zoned Namespaces

Is Storage Still Relevant? Final thoughts

I think you know my answer to the question "is storage still relevant?": of course! We are just beginning to create the standards that will usher in real digital transformation, so there is plenty of time to catch up.


Storage Field Day 19: Getting Back to My Roots


I'm excited that I have been invited to be a delegate at Storage Field Day 19. This is a little different from the Tech Field Day I attended in 2019, because the focus of all the presentations at this event is data storage.

I am looking forward to this because I am a storage person. My career started as a Technical Trainer at EMC, and then I was a storage admin for a pharma company. I went back to EMC to develop technical training, then went to work for Dell Storage, and then for Inktank (a startup that provided services and support for Ceph). I guess you could say storage is in my blood, so Storage Field Day should be lots of fun.

What to expect at Storage Field Day

Here are the companies we'll be visiting (in the order they will be presenting), and what I'm looking forward to hearing from them. Remember, you can join in on this event too by watching the livestream and participating in the Twitter conversation using the hashtag #SFD19. You can @ me during the livestream and I can ask a question for you.

Disclosure: I am invited by GestaltIT as a delegate to their Storage Field Day 19 event from Jan 22-24, 2020 in Silicon Valley. My expenses, travel, accommodation and conference fees will be covered by GestaltIT, the organizer and I am not obligated to blog or promote the vendors’ technologies to be presented at this event. The content of this blog represents my own opinions and views.

Tiger Technology

The first presentation we hear will be from Tiger Technology. Just looking at their website, they claim to do lots of stuff. Their About page says they've been around since 2004 "developing software and designing high-performance, secure, data management solutions for companies in Enterprise IT, Surveillance, Media and Entertainment, and SMB/SME markets". They are headquartered in Bulgaria and Alpharetta, and since my mom was born and raised in Alpharetta, they get extra points.

Skipping to their News page, it looks like they have a new solution that tiers data in the cloud. I’m looking forward to hearing how they do that!

NetApp

NetApp talked with us at TFD20 (my blog review of that presentation). They talked to us then a bit about their flavor of Kubernetes, and the work they are doing to make it easy for their customers to have data where they want it to be. I'm hoping they do a deeper dive on CVS and ANF, their PaaS offerings for the current public clouds.

Western Digital

Western Digital has presented at previous Tech Field Day events, and they have acquired many companies that are Tech Field Day presenting alums. The last time they presented, back in February 2019, they talked about NVMe, and I love that topic.

One thing I think that doesn’t get enough attention is the incredible innovation that has happened over the last several years in storage hardware. The software is now catching up, and apps will follow. So there is cool tech stuff happening on prem too, not just in the public cloud domain.

I peeped their Twitter account, and they have interesting things they are showing this week at CES, like this 8TB prototype that looks like a cell phone battery bank. That would be a pretty sweet piece of swag! 😊

Infrascale

This will be Infrascale's first appearance at Storage Field Day. Their website says what they do right up front: they have a DRaaS (Disaster Recovery as a Service) solution that fails over to a second site, booting from an appliance or the cloud.

After storage, the biggest chunk of my career has been spent on data protection and disaster recovery, so I'll be looking forward to this presentation as well. I'm really looking forward to hearing how this solution can be included in an architecture.

Dell EMC

Since I’ve worked in storage at Dell and EMC, and I’m just coming off a tour at VMware, of course I’m excited to sit in on presentations from my Dell Technologies federation homies! There will be presentations on Isilon and PowerOne, but the one I’m most curious about is one on DevOps.

Komprise

Komprise has presented at Storage Field Day before (in 2018). They are a data management and tiering solution. At AWS re:Invent they unveiled a cloud data growth analytics solution; I hope we hear about that.

WekaIO

WekaIO has presented at Tech Field Day a couple of times before. They have a distributed storage system for ML/AI, and it looks like they directly access NVMe flash drives. It looks like they also have a solution on AWS, so this should be an interesting conversation. I'm just hoping we don't have to listen to a "what is AI" story before they get to the good stuff.

Minio

This will be Minio's first presentation at Tech Field Day. Minio sells high performance object storage. One of the other Tech Field Day delegates, Chin-Fah Heoh, has already written a blog post about how Minio is in a different class than other object storage providers. I'm really looking forward to this presentation.
