
Webinar: Starfish—Taking the Meh Out of Metadata

April 14, 2021

Presented by Jacob Farmer, Founder and Chief Evangelist, and Peter Galvin, Principal Solutions Architect and GUI Product Manager, Starfish Storage 

Starfish is the most sophisticated platform for unstructured data management. It gives all the stakeholders (management, storage administrators, content creators, and content curators) visibility into a rich array of both current and historical file information. In this session, we will explore how Starfish can enrich data to drive value for gateway users and simplify data management for gateway administrators.

Jacob Farmer is the Chief Technology Officer of Cambridge Computer, a position he has held since co-founding the company in 1991. Jacob has been a familiar face in the data storage industry for over 25 years. He has written for a number of trade publications and has been a featured speaker at many conferences and educational events, most notably USENIX, LOPSA, and SNIA.

In the data storage industry, Jacob is best known for authoring best practices on enterprise data protection and for helping numerous early-stage companies define their target use cases and establish their first customers. In academic circles, he is highly regarded for his work in defining best practices to manage the life cycle of scientific research data and for identifying novel solutions to reduce costs and streamline operations related to digital preservation. Jacob is a graduate of Yale University.

Webinar Q&A

Q: How does the time travel interact with moving files over time? You can’t necessarily access moved data, right? Or can it be “restored” from where it was moved? Asking about the actual files, not the historical metadata.
A: To clarify: we are a metadata management software product, not a file system or data storage product. Your files and objects are yours and stay where they currently are unless you ask Starfish to, for example, copy or move the data. Then we’ll put it onto one of your file systems or your cloud/object stores. If Starfish moves the files, we know where we put them and can bring them back. We will try to show that in the demo if we have time.

Q: Can jobs be triggered based on sensing that certain files are added or updated somewhere? And apart from sensing is there scheduling for jobs, or does one use e.g. cron to call the Starfish API to start jobs?
A: Yes. When we scan a file system, you can then ask Starfish which files are new and do things to those files. Or you could create a signal file to say, “Hey, the instrument is done with this run; Starfish, it’s okay to process the files now.” The time of detection depends on how often you have Starfish run scans on a given file system (it’s configurable). On Lustre and GPFS we can listen to events. Other file systems we (currently) scan...
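
As a rough illustration of the general idea of finding new files between two scans, here is a minimal Python sketch. It is not Starfish's implementation or API. The function names `snapshot` and `diff_scans` are hypothetical, and the sketch assumes a file counts as changed when its size or modification time differs from the previous scan.

```python
import os

def snapshot(root):
    """Walk root and map each file's relative path to (size, mtime_ns)."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            state[os.path.relpath(full, root)] = (st.st_size, st.st_mtime_ns)
    return state

def diff_scans(previous, current):
    """Compare two snapshots; return (new_paths, changed_paths), both sorted."""
    new = sorted(p for p in current if p not in previous)
    changed = sorted(
        p for p in current if p in previous and current[p] != previous[p]
    )
    return new, changed
```

A scheduler would keep the snapshot from the previous scan and pass both to `diff_scans`. As the answer notes, how quickly a new file is noticed depends on how often the scan runs.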

Q: So it seems that “scans” and “signal files” are terms Starfish uses?
A: “Scans” is a Starfish term. “Signal files” is a generic idea: you create a file to indicate that data is done being written.
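
The signal-file idea can be sketched in a few lines of generic Python. This is only an illustration of the pattern, not anything Starfish-specific. The marker name `RUN_COMPLETE` and the function names are assumptions made for the example, as is the layout of one directory per instrument run.

```python
from pathlib import Path

SIGNAL_NAME = "RUN_COMPLETE"  # hypothetical marker file name

def mark_run_complete(run_dir):
    """Writer side: create the marker after the last data file is closed."""
    (Path(run_dir) / SIGNAL_NAME).touch()

def ready_runs(root):
    """Reader side: return run directories under root that hold the marker."""
    return sorted(
        p for p in Path(root).iterdir()
        if p.is_dir() and (p / SIGNAL_NAME).exists()
    )

def files_to_process(run_dir):
    """List a completed run's data files, skipping the marker itself."""
    return sorted(
        p for p in Path(run_dir).iterdir()
        if p.is_file() and p.name != SIGNAL_NAME
    )
```

The writer creates the marker last, so a reader that only acts on directories containing it never picks up a run that is still being written.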

Q: Do file systems have to be "public"? Otherwise, how does Starfish gain access?
A: We are a general-purpose unstructured data management system. So if you have files in a file system, we can manage them, regardless of whether they are public or private. You run us on site and point us at your file systems, etc. We could, for example, copy files to some public gateway if you want, or use us for anything else...

Duke Health did a webinar to explain how Starfish helped them save over $1 million by enabling researchers to manage their own files and objects. The recording is available here:

Anyone with questions may contact Peter Galvin at

Webinar Slides