Behind the Innovator: Jeff Glatstein, Senior Manager of Cyberinfrastructure, Woods Hole Oceanographic Institution
Welcome to our Q&A series: Behind the Innovator.
Behind the Innovator takes a peek behind the scenes with learnings and best practices from leading architects, operators, and developers building cloud-native, data-driven applications with Apache Cassandra® and open source technologies in unprecedented times.
This week, we spoke with Jeff Glatstein, the senior manager for cyberinfrastructure for the Ocean Observatories Initiative (OOI) program whose PMO is at Woods Hole Oceanographic Institution, a private, nonprofit ocean research, engineering, and education organization based in Massachusetts. OOI is a science-driven ocean observing network that delivers real-time data from more than 800 instruments to address critical science questions regarding the world’s oceans. Funded by the National Science Foundation to encourage scientific investigation, OOI data are freely available online to anyone with an Internet connection.
Here’s what he had to say.
1. Tell me a little bit about your technical background, your role at Woods Hole, key accomplishments, and achievements.
My technical background is in massively parallel computing systems for trading applications for commodities, stocks, bonds, et cetera. I did that for about 20 years and got into data warehousing in higher education, which led me to Woods Hole.
At Woods Hole, I was hired to take over the infrastructure that was seen as having some gaps that needed to be resolved. Over the last two years, we really have been working on making that system far more transparent, far more reliable, and more performant. That's really where our success has been, and Apache Cassandra is the heart of our particular system.
At Woods Hole, we gather data from about 850 instruments placed in several locations across the oceans. So we have six years' worth of data, about 90 terabytes of Cassandra data, out there for folks to download for free. We’ve been focusing a lot of our efforts on making this work more effectively.
2. What is your experience using Cassandra in terms of ease of use, how it enables you to do your job better, and how it can be flexible and scalable for future uses?
To be fair, I didn't have Cassandra experience until I came to Woods Hole. My expertise was in Oracle, and I think Cassandra has been a bit of a mystery to us. The development team has good experience in how to get the data in. But really, when it came to configuring Cassandra, I think what we've learned over the last two years of taking over this project is that we probably weren't architected the way we really needed to be and weren’t configured the way we really needed to be. So, I would say the first year was not really marked with loving Cassandra. We were more focused on whether this was the direction we wanted to go in.
The DataStax Luna program really turned it around because Cassandra, the tool itself, is very flexible and usable when architected correctly. We are now going down that path of architecting it correctly, and we have seen a vast improvement in our understanding and the throughput in Cassandra now.
3. What was the impetus that really led you guys to discovering Luna?
Before Luna, we really had no eyes into Cassandra at all in terms of performance. Were we well architected or were we not? I had read a lot of DataStax documentation and saw that they really were leading or at the forefront of Cassandra. Again, coming from an Oracle background, I had access to DBAs. But also Oracle had a lot of tools that we understood on how to measure our performance and determine what our key configuration should be.
We didn't have that feeling with Cassandra. And I think one thing that Luna did bring to us was getting that insight.
At our first engagement, we provided a lot of information to DataStax through a tool that they had us run. And it just opened us up to the world in terms of realizing we had more tables than we probably should. That was news to us that there was a limit. We had known that our node sizes were a little on the large side. We were able to understand that from the skills that we had, but we had no idea that our partitions were too big. We didn't know that our configuration actually varied from node to node or why.
Luna was able to get in and work us through those issues, and it cleared up a lot of mystery there.
4. What have been some of the other successes or wins that you have seen just since working with Luna?
I think we were able to deal with some of the issues that we saw on our own—particularly in nodes dropping and some issues with our equipment. Then we were noticing that it would take over 72 hours to bring a node back online because there is a particular bug in the version of Cassandra that we have. But we started to drop nodes a lot quicker. At first, it was one maybe every couple of months and then it came to a point where we dropped three nodes in two days. We were getting to the point where we knew we wouldn't be able to operate, and that’s when we contacted Luna support.
And our particular Cassandra cluster had not been bounced in three years—which is not good practice, at least not to the best of my knowledge—and they talked us through bouncing it. And by doing that, it alleviated us from that 72-hour waiting period.
So, now we can put nodes in as we need them or maintain them. But it also explained why our configuration was different across servers. And it also showed that we had a little bit of corruption in some of our files. It did not affect our data, but it was something that affected our performance.
Also, our garbage cleanup was taking way, way too long to complete. We fixed that so that our compaction and garbage cleanup was no longer impacting our performance. We measure that by looking at how much data we have in the queue to ingest into the system. Prior to this issue, we were getting queue alerts more and more often—to the point where we had to change the level at which we would alert on that. Once we went through that period of rebooting Cassandra, getting all the garbage cleanup configuration standardized across all our nodes, that alert went away.
5. Has Luna helped you free up any resources or enabled someone on the team to focus more on other aspects of their role?
Well, we're a very small team. Any time we spend maintaining the system or searching for health problems is time that someone's not spending on moving the system forward.
We absolutely have seen that. For example, I think it took about five hours to track down our configuration issues. From the Luna team’s point of view, that should have taken an hour, but on our own it would have taken us days just to track down the issues. So right there was a productivity gain.
6. What other technologies are involved in your solution?
Postgres, Python, and Java are our core development basis. We're fully open-sourced, so we use tools like Qpid. You probably have not heard of THREDDS or OPeNDAP or ERDDAP or anything like that, as they tend to be more oceanographic-specific. But that's the level of our architecture. I guess you would call it a bit of a homegrown system.
7. What does the future look like at Woods Hole in terms of your technology solutions? How does DataStax, how does Cassandra, and how does Luna play into that future vision?
Right now, we're moving our data center, so we're building a brand new hardware system. Our Cassandra cluster is going to get a nice new home with solid state disks, more nodes, and we see that continuing on for the foreseeable future.
Clearly, we are going to want to use the cloud more. Whether Cassandra goes out there or not remains to be seen. But we will certainly look to make cloud computing far more effective for us.
8. What are some key learnings that you've picked up during your two years working at Woods Hole with Cassandra? What advice would you share with others in your shoes who are trying to navigate today's software landscape?
I can tell you what worked for us, and that is being open and listening. I think that's a skill that a lot of technical folks may not have because they get so invested in the solution, and no one wants to hear that the solution is maybe not meeting all of the goals that have been put forth.
I think that has been more important at Woods Hole than anything, which is bringing people together and just talking honestly and listening. And we've made a lot of changes based upon that.
I had one or two technology/language choices back when I started my career. Today, you've got easily eight or nine and then subsets of those, and they tend to have to work together, I suppose, to be just a single solution. So, you really have to listen and be data- and business-driven as opposed to, “Wow, that would really be cool if we did it that way.”
9. Any other thoughts or comments that you wanted to share about working with DataStax, or Luna, or Cassandra in general?
When you have such a small team, it's always nice to get someone to come in and do what I call a health check, which is what ultimately ended up happening with Luna.
When you have one or two people and you don't have deep experience in this, you do want that level of comfort. And I would encourage anybody out there—using any technology where you may not have as much experience—to access expert support when it’s available.
I recommend anybody using Cassandra who is unsure if they’re getting the most out of it to try Luna. It’s been great.