Aaron Bray, Co-Founder & CEO of Phylum

Tony Zayas 0:04
Hey, everybody, and welcome to the Tech Founders Show. It’s Tony Zayas, joined by Andy Halko. On this show, we dive into some really interesting cutting-edge technologies out there and hear the stories from founders who are doing amazing things in this space. So, Andy, how are you doing today?

Andy Halko 0:23
I’m doing fantastic. How are you, Tony?

Tony Zayas 0:25
Awesome. Yeah, dude.

Andy Halko 0:29
I’m excited for today’s show, another great tech founder. We’ve had some really interesting conversations the last couple of weeks, and this one will be great, too.

Tony Zayas 0:39
For sure. So without further ado, we have Aaron Bray, who is the founder of Phylum. Phylum helps you uncover what’s lurking in your software’s dependencies, so we’ll talk all about that: malware, backdoors, all that kind of stuff. Let me bring Aaron on, and we can jump right in. Hey, Aaron, great to meet you. Thanks for taking the time to join. We’d love to hear what you guys are doing over at Phylum. To get started, tell us about the business, what you guys do, and we’ll go from there.

Aaron Bray 1:14
Absolutely. Great question. I guess the best way to summarize what we do, and what makes us different and unique, is that we’ve shifted the conversation away from where most companies in this DevSecOps space, specifically around software composition analysis, put their focus. Most of them essentially focus on giving you back a software bill of materials, a list of vulnerabilities, and the commercial license liability issues that exist in the packages you’re using from the open source world. And while those things are certainly very important from both a legal and a risk compliance perspective, we go much further than that: we dig deep and analyze the entire software supply chain. We analyze information about the authors themselves and the source code inside the packages, and we hunt for attackers and malicious code in the dependencies being used by organizations large and small.

Tony Zayas 2:21
Very interesting. I was just going to ask, Aaron: what’s the origin story? Where did the idea for Phylum come from, and how did you guys start the business?

Aaron Bray 2:33
Great question. So one of my co-founders, Lewis, and I did some consulting on a contract a couple of years back. The organization we were working with was really trying to get a good understanding of what their software supply chain looked like. While we were working on that project, we found that there really weren’t any products in the commercial space that spoke to the issue at scale. There were a lot of products that talked about software supply chain security and giving you insight into the open source you’re using, but for the most part, those products fell flat when we actually dug under the hood and looked at what they were doing. Fast forward a few years, and we’ve seen a pretty dramatic shift in the landscape of open source and third-party packages and how they’re being used in the wild. The ecosystems themselves have grown massive. If we look at JavaScript, which is one of dozens of these ecosystems, it’s gone from about 4500 packages six years ago to about a million and a half today, and it continues to grow at a rate of about 1,000 net new packages every day. The consequence is that more packages are being pulled into projects than ever before, and more effectively anonymous authors are contributing code that’s being pulled in and used in these projects. Really, it’s all untrusted. And over the last couple of years there has been a steep increase in major incidents and security issues, like people inserting malware, backdoors, crypto miners, and what have you. So we figured the market was right for a product like this, and we launched last year.

Tony Zayas 4:21
Very cool.

Andy Halko 4:23
So, just for those out there who may not be familiar, and to help package this (you can help me as well): if you’re building an application, you’re going out and finding a lot of different pieces of software to bring in, what you’re calling packages. And it’s really about making sure that the people who wrote that software were legitimate, that the code is legitimate, and that there isn’t a security risk in these other pieces. It’s really, like you said, the supply chain piece of software, which I don’t think people think about very often.

Aaron Bray 4:59
Absolutely. And just to quantify this, we looked at React at the very beginning to see how bad the issue was for a specific piece of software. What we found was that if you download all the dependencies for React, and all of those dependencies’ dependencies, and the dependencies of those dependencies (because every package you import has packages it depends on, and those packages in turn have additional packages they depend on), you end up with something like 7000 pieces of software that come along with React. The really interesting consequence of this is that you don’t actually get all 7000 of those packages at once; you get different subsets of those 7000 depending on what other packages you have installed.
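The fan-out Aaron describes is the transitive closure of a dependency graph. As a rough illustration, and only a minimal sketch (not how Phylum or any real package manager resolves dependencies), the breadth-first walk below collects everything reachable from one root package. The graph and package names are hypothetical; real resolvers also choose versions, which is one reason different installs can end up with different subsets.

```python
from __future__ import annotations

from collections import deque


def transitive_dependencies(graph: dict[str, list[str]], root: str) -> set[str]:
    """Return every package reachable from `root`, excluding `root` itself.

    Breadth-first traversal: direct dependencies first, then their
    dependencies, and so on. A package shared by several parents is
    counted once, and cycles cannot cause an infinite loop.
    """
    seen: set[str] = set()
    queue = deque(graph.get(root, []))
    while queue:
        pkg = queue.popleft()
        if pkg in seen:
            continue  # already visited via another parent
        seen.add(pkg)
        queue.extend(graph.get(pkg, []))
    seen.discard(root)  # a cycle back to the root should not count it
    return seen


# Hypothetical dependency graph: each key maps to its direct dependencies.
example_graph = {
    "my-app": ["ui-lib", "http-client"],
    "ui-lib": ["dom-utils", "scheduler"],
    "http-client": ["dom-utils", "url-parse"],
    "url-parse": ["querystring-lite"],
}

if __name__ == "__main__":
    deps = transitive_dependencies(example_graph, "my-app")
    # Two direct dependencies expand to six packages in total.
    print(len(deps), sorted(deps))
```

Even in this toy graph, two direct dependencies become six installed packages; at the scale of a real ecosystem, the same walk is how a single import turns into thousands of upstream authors.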

Andy Halko 5:47
So I’m kind of curious to dive into this: for someone who’s building an application, what’s the real-world risk? Are there scenarios you can describe that have happened to organizations that aren’t paying attention to where their packages are coming from, what’s being wrapped into them, how they were built, etc.?

Aaron Bray 6:10
Absolutely. So over the last couple of years, a ton of packages have been taken down every year for exhibiting malicious behaviors. A good example of that is event-stream, which happened maybe two or three years ago. It was a very popular package in the JavaScript world, and it was upstream from a lot of major projects. What happened is that an account compromise of one of the maintainers led to malicious code that would steal cryptocurrency being inserted upstream from some of those major packages. So essentially all of the developers who were using that package, or packages that depended on it, for software development were effectively compromised. There’s also a good chunk of packages taken down for doing this: normally, I’d say, probably 1,000 to 2,000 a year, but just in the last month we saw about 5,000 packages get taken offline across a couple of different ecosystems. A lot of attackers exploit something called typosquatting. They’ll take a popular package that’s used by a lot of people across a pretty broad spread of code bases, make a copy of it, and add a little something extra: it might be malware, it might be a backdoor. Then they re-upload it with a very similar name. We’ve seen them transpose characters, or just name it something very similar that would confuse a developer, or that a developer might inadvertently download by mistyping the name. What happens is they’ll start getting hundreds to thousands of downloads a week for the package after re-uploading it, because there’s such a large volume of people building software that a few of them will inadvertently pull in the wrong one.
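One simple, commonly used heuristic for catching the typosquats Aaron describes is an edit distance that counts a swapped pair of adjacent characters as a single edit (optimal string alignment distance). The sketch below is a hypothetical illustration, not Phylum’s detector: the `POPULAR` list stands in for real download statistics, and the threshold of one edit is an assumption.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: insertions, deletions,
    substitutions, and adjacent transpositions each cost 1."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            # Adjacent transposition, e.g. "ue" typed as "eu".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def possible_typosquats(candidate: str, popular: list[str], max_distance: int = 1) -> list[str]:
    """Return popular package names that `candidate` is suspiciously close to."""
    return [
        name
        for name in popular
        if name != candidate and osa_distance(candidate, name) <= max_distance
    ]


# Hypothetical list of high-download package names to protect.
POPULAR = ["requests", "lodash", "express", "react"]

if __name__ == "__main__":
    for name in ["reqeusts", "lodahs", "expresss", "react", "numpy"]:
        print(name, possible_typosquats(name, POPULAR))
```

For example, `possible_typosquats("reqeusts", POPULAR)` returns `["requests"]`, while an exact match such as `"react"` is not flagged. A single distance check like this produces false positives on legitimately similar names, so in practice it would be one weak signal among several rather than a verdict.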

Andy Halko 8:09
Yeah, I remember hearing a story a couple of years ago about a developer who became unhappy with the community and pulled their package, and it affected millions of different software applications out there.

Aaron Bray
Absolutely, there have been two or three incidents of that.

Andy Halko
For a scenario like that, is there a way you’re constantly monitoring, so that people not only know the background but can also see what’s coming?

Aaron Bray 8:43
Absolutely. So what we essentially do is, instead of looking at just vulnerabilities and license issues, we’ve split the risk we consider more broadly into about five different domains. We look at vulnerabilities and license issues; those are certainly components of the supply chain and the risk it represents. But we also look at authors and author reputations. Some of that might be: does this package have only a single maintainer? If that maintainer gets mad, they might pull it offline, just as you mentioned. And there have been at least a couple of cases over the last few months where a major package had only one maintainer, and that maintainer stepped away, so the package is now effectively abandoned in place. We also consider malicious code, as I mentioned before, so we have heuristics and machine learning models that help us identify the typosquatting packages I mentioned earlier, as well as other situations like malware droppers and patterns of behavior that malware has historically exhibited. And we also examine the risk that the authors themselves bring: for example, does it appear that an author’s account has been compromised, based on their commit behavior and deviations from that normal behavior? And there are some other factors we also take into account.

Andy Halko 10:11
So this definitely isn’t a manual process. Like you said, you’ve developed machine learning and AI that really take signals from these pieces.

Aaron Bray 10:22
Absolutely. And a lot of that really boils down to the fact that the ecosystem has grown so large, and continues to evolve so rapidly, that there’s no way to manually police it anymore. Even if you managed to validate every package in the entire ecosystem, which now amounts to tens to hundreds of millions if you count all the different versions of every package, you’ve already lost the game as soon as you finish that audit, because a number of those packages have already been updated.

Andy Halko 10:57
Yeah, I’m always curious with machine learning: how do you start that in a product like this? Where do you start the build? I think a lot of people think about AI and machine learning as a big topic, and I’ve talked with a number of founders about it: when you’re starting at zero and you don’t have anything, how do you start building a machine learning solution with no data, as is the case a lot of the time?

Aaron Bray 11:29
Great question. So one of the great things is that we actually have some data in this case. There are at least a few academic articles published over the last few years that have curated sample data sets of previously discovered malicious packages in the open source ecosystem. The other thing that maybe makes our solution a little bit unique is that we’ve taken a slightly different approach from most of the other products in this space, in that we’re not so interested in giving you a thumbs up or thumbs down on the risks in your software supply chain, but rather in giving you a score, by looking for outliers and things of that nature. If you think about machine learning broadly, there are generally two ways of building such models. There are supervised models, which you have to train and tell, this is A, this is B, and this is C, so you need a labeled data set. There’s also unsupervised learning, which typically allows you to do things like identify outliers. We’re more focused on the latter, although there are certainly some cases where we’re doing the former. And there are other things we’re doing that really aren’t machine learning in the traditional sense: we’re tying together a collection of models and heuristics based on past bad behavior, or on insights we’ve gathered by looking at the broad data set of the entire software ecosystem, to be able to weigh in on things that look like they may be malicious or bad.
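To make the supervised/unsupervised distinction concrete, here is one very simple unsupervised outlier test, the median-based "modified z-score." It is an illustration under stated assumptions, not Phylum’s models: the feature (lines of code added per release), the sample values, and the 3.5 cutoff (a common rule of thumb for this score) are all hypothetical. No labels are needed; a release is flagged purely because it sits far from the package’s own history.

```python
from statistics import median


def modified_z_scores(values: list[float]) -> list[float]:
    """Score each value by its distance from the median, scaled by the
    median absolute deviation (MAD). Median-based statistics are used so
    that a single extreme value cannot hide itself by inflating the mean
    and standard deviation."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # Degenerate case: most values are identical. This sketch simply
        # reports no deviation rather than dividing by zero.
        return [0.0 for _ in values]
    return [0.6745 * (v - med) / mad for v in values]


def flag_outliers(values: list[float], threshold: float = 3.5) -> list[int]:
    """Return the indices of values whose modified z-score exceeds the threshold."""
    return [i for i, z in enumerate(modified_z_scores(values)) if abs(z) > threshold]


# Hypothetical feature: lines of code added in each release of one package.
lines_added_per_release = [120, 95, 130, 110, 101, 125, 4800, 115]

if __name__ == "__main__":
    print(flag_outliers(lines_added_per_release))
```

Here the seventh release (index 6) is the only one flagged. A real system would combine many such signals (author, commit timing, dependency changes) rather than trust any single feature, which matches the "score rather than thumbs up or down" framing above.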

Tony Zayas 13:11
Aaron, I’m really curious to hear your view, as one of the people out there advancing technology to prevent the malicious packages and the bad actors. What do you see happening on the other side? I feel like it’s a cat and mouse game where you’re always catching up to them. What are those malicious packages looking like? Are they getting more advanced, and how do you combat that? It seems like a pretty crazy challenge.

Aaron Bray 13:44
Great question. I think that is a very valid point; it’ll always be a cat and mouse game between the people building defensive tech and the people trying to break into user systems. At this stage, there effectively hasn’t been much automation in this space relative to, say, the endpoint world, where you have a lot of tech built around EDR solutions, cloud analysis, and similar components. So one of the things we’re really focused on is taking a lot of lessons learned from the endpoint and desktop world to this new frontier of packages in the software supply chain. Interestingly enough, there are some significant advantages to playing in this space versus that one, where all you’re really looking at is an opaque binary. You have a single snapshot in time of its functionality, and you have to reason very quickly about whether this thing is good or bad, or what its intent is, and that’s a very difficult problem. In our case, we not only have access to all the source code, author information, and everything else, but in most cases we also have the entire development history, from beginning to end, for all the packages we’re looking at. So we can make a lot more inferences about how things have changed over time and what the intent might have been, if that makes sense.

Andy Halko 15:20
Yeah, for sure. So how did you guys go from, you know, an idea to a product? What was the process of kind of ideating? And then creating, and then even getting into, you know, finding customers?

Aaron Bray 15:38
Great question. My co-founders and I have collectively spent a long time working in the security space. I’ve been in this space for probably about 15 years now, in various capacities. My co-founder Lewis, who I mentioned before, and I started out on the federal government side; I spent quite a few years there. I ended up working with my other co-founder, Pete, who came up on the commercial security research and consulting side, at another startup I worked at a couple of years back. At the very beginning, when we first got this idea and got exposed to the problem space, we weren’t really sure if the market was right for us. But as we saw the shift in how people were using open source, the overall increase in the number of attacks and issues in the open source ecosystem, and, more broadly, in all things related to software supply chains, we figured the time was right to launch. So we raised money, spending quite a bit of time putting together all of the market data and information necessary to do that, and we built out a team, and here we are.

Andy Halko 17:01
So you raised money before you had a product? Or did you already have a beta or starter product?

Aaron Bray
We had a small POC.

Andy Halko
Okay. That’s good. How about from a customer perspective? How did you get those first clients and customers on board?

Aaron Bray 17:20
Great question. So at this stage, a lot of those have come through organic connections, either within the founding team or through our investors. What we’re working to do now is really bring the product to market and broaden the scope of the folks we’re interfacing with, and who we’re able to reach and get in front of.

Andy Halko 17:45
Kind of curious to talk a little bit about open source. Years ago, when I was building my company, I taught myself how to program in PHP, Python, and some of these other languages. And I remember that 15 years ago, everybody wanted .NET because they were afraid of open source. How have you seen the open source world evolve? Today it feels like, at the enterprise level, it’s a much more comfortable conversation, and expected, but I’d love to hear how you view the world of open source, how it’s evolved, and how it stands today.

Aaron Bray 18:27
That’s a great question, and you’re absolutely right. Even a couple of years ago, it was a much different story. And it’s funny you mention PHP: it actually had its own supply chain attack recently. Someone backdoored PHP proper within the last few weeks. But over the last few years, organizations have broadly begun adopting open source and have become, as you mentioned, much more comfortable with it. I think I saw a survey that Synopsys did last year, and they found that across all market verticals, or at least all the ones they examined for the survey, everyone is using open source. In fact, the code bases they looked at were comprised primarily of open source: on average, something like 70% of these projects was open source software, with only the minority being code written in house.

Andy Halko 19:27
Why do you think that’s evolved and changed? What do you think’s gotten, you know, these enterprises more comfortable with that world?

Aaron Bray 19:35
I think it’s probably two factors. One is a better understanding of the risks of using it, at least in theory. As I mentioned before, the landscape has shifted a lot in the last few years because the ecosystems have grown so much and more people have been adopting open source. But prior to that, the ecosystems were relatively small and easy to reason about, and once people got past the license compliance hurdle, it became a lot more palatable to use open source. The other is that, once we got past that, it’s such an accelerant compared with trying to develop everything in house. Modern software projects are so complex that almost nothing is built from whole cloth today, so to speak. This is the logical outgrowth of a lot of trends that started back in the 90s with reusable code and encapsulation.

Andy Halko 20:33
I’m kind of curious how many companies you think today are either at risk or using packages that probably have some sort of compromise or malware built into them. What do you see as the state of the problem your product solves? How many people are living with issues right now?

Aaron Bray 20:55
So far in our examination, almost everyone is living with some number of issues, and if they’re not living with issues today, there were issues in the not very distant past. Even more than that, effectively everyone is now at risk, because by including a popular package, or really any package that’s non-trivial in size and functionality, you’re effectively giving indirect commit access to your codebase to thousands, or in some cases tens of thousands, of upstream authors.

Tony Zayas 21:35
So how do you communicate that message, Aaron? Obviously you can inject a lot of fear there, and that probably works for marketing, but how do you let your target audience know that this is a legitimate threat that’s there now, or will be there shortly down the road? How do you paint that picture?

Aaron Bray 21:59
Well, fortunately for us, and unfortunately for the world and the rest of the community, the news cycle has been doing a pretty good job of that over the last few months. Obviously there’s SolarWinds; we don’t know whether that was related to open source, but it’s a very similar attack to the type of thing we would observe through open source vectors, where something like a build server gets compromised. Between things like that, the Twilio SDK compromise just last year, and, as I mentioned, PHP itself getting backdoored maybe a week or two ago, there have been a number of other pretty major incidents, even if we look back only over the last two to three months. So things are coming to a head, and people are starting to pay a lot more attention to this space.

Andy Halko 22:53
Yeah, I was just thinking, your best marketing is just to stand outside a company with a placard that says, “There’s malware in your code,” because everybody’s got it. So how are you guys growing the company? What’s your go-to-market? How are you trying to reach people and scale?

Aaron Bray 23:16
Great question. Today our offering is somewhat limited in terms of the audience we’re targeting, as we mature the product and get ready for more of a general launch. Over the next few months, we’re going to be rolling out a more general release, and at that time we’re going to be trying to get our product, and components of our product, in front of a much broader audience, especially the technical implementers: developers, security engineers, DevOps and SRE-type personas, the people actually building DevSecOps and AppSec programs and making sure their code is secure and robust. And we’re also examining, I don’t want to call it technical debt, but the engineering risk that a lot of these packages bring to the table. How often does the API change over time? What is the test coverage? Things of that nature that software developers and people implementing solutions will certainly care a great deal about.

Tony Zayas 24:28
What does the team look like right now, Aaron?

Aaron Bray 24:31
So we are about 13, I believe, plus a couple of contractors right now.

Andy Halko 24:39
Great. What’s your role within the organization?

Aaron Bray 24:43
So I’m the CEO. Right now, my co-founder Peter and I are mostly focused on things like strategy, product, business development, and sales, and we’re pretty engineering-heavy right now. A lot of that is a consequence of our product being rather complex.

Andy Halko 25:06
Yeah. Now, are you involved in the engineering process? Or do you still really stay at that higher level of those things that you just mentioned?

Aaron Bray 25:16
Great question. I help out a little here and there, but I’ve mostly removed myself from that. One thing I’ve observed in other companies over time is that it’s usually harmful to some degree if you have somebody sitting in a CEO or president-type role who’s also trying to tackle the product, because in some cases you end up inadvertently speaking with a disproportionately loud voice when it comes to things like design decisions and similar attributes as the product develops and matures.

Tony Zayas 25:52
That’s certainly a smart observation, and probably a smart move on your part. Was that hard to do, to remove yourself from that?

Aaron Bray 26:02
It was a bit at first; it was obviously a significant paradigm shift. But it’s one of those things that I think was for the better. It also becomes difficult as we’ve gone from a very small team to not quite a small team, because now my time gets monopolized by a lot of other things. And I think it actually causes more harm than benefit if you have somebody trying to contribute in a technical fashion who puts in a little time and then gets pulled off for a week, then puts in a little more time and gets pulled off for another week, because now you’ve essentially created roadblocks for other developers and folks in the organization.

Andy Halko 26:48
Yeah, I think a lot of technical founders have a hard time moving away from the technical role. So it’s good that you were able to identify that early on and adjust things, but I’m sure it’s still a challenge. How about on the product planning side? How do you guys as a team talk about the roadmap? One of the biggest challenges I tend to see when I talk to people in the technical space is that there are a lot of directions you can go and a lot of great ideas, but you can only do so much at one time. So how do you work through your product roadmap?

Aaron Bray 27:34
Great question. I think there are a few different factors that influence that. One is that we work to get our product in front of customers as quickly as possible, and one of the reasons for that is to establish design partner relationships across a pretty broad spectrum of market verticals, use cases, and CI/CD setups. One of the big challenges from an AppSec product perspective is that there’s not really a gold standard for what an AppSec program looks like. There are probably thousands, if not tens of thousands, of different combinations of existing product solutions and workflows. Some organizations have an open source governance organization, where a few security engineers are responsible for handling what open source gets used in the organization. Others have very limited, rudimentary systems where they may include the output of scanning tools in pull requests or merge requests, so it’s just part of the code review process. And there’s everything in between. One of the really interesting things about this is that it doesn’t even seem to be strongly correlated with organization size; it’s really more about AppSec program maturity. So one of our goals was to make sure we were as flexible as we needed to be to fit into as broad a spectrum of use cases as possible.

Andy Halko 29:19
Now, is that purposeful? Some companies try to narrow down and say, okay, we’re going to be a great solution for this one thing. So you’re really thinking you want to be able to work with a wide range of products that are in different life cycles or scenarios?

Aaron Bray 29:38
Well, I’ll just caveat this by saying that I don’t think there’s really a silver bullet here. Every type of product is going to have a different sort of appeal, and there are going to be different trade-offs from the engineering perspective in terms of whether it makes more sense to go wide or to go narrow. In some cases, products might require more tailoring in order to fit a narrow use case, so you might be leaving a lot on the table by going really wide. In our case, interestingly enough, that doesn’t seem to be the case so far. We designed our product for flexibility. We started out with a very automation-friendly command line tool, and we’re in the process of rolling out a UI that speaks more to the economic purchaser of the product, like the CISO, AppSec director, or CIO. The consequence is that it fits into pretty much any scenario. We just want to make sure we have good use cases for all those things, and that the insights we provide back are useful and meaningful, if that makes sense.

Andy Halko 30:49
Yeah. What’s been one of the most challenging parts of building this product? You know, I mean, has there been something that was like a big roadblock or something that you and your team really have had to spend a lot of time figuring out how to, you know, get over that hurdle?

Aaron Bray 31:07
Great question. I think the answer really is that it requires a very broad spectrum of skill sets to execute on effectively. That’s because we’re dealing with something that is very centered around security and static analysis, but also, more broadly, statistical modeling, big data analysis, data management, and data engineering. So I think the biggest challenge is making sure we can fill the roles that cut across the spectrum of things we need to be concerned with. I’ve been very fortunate in my career to have a nice collection of people I could call on at the very beginning, when we launched. And we’ve had a few folks come on since then who have been extraordinarily talented and have really helped get us to where we’re at, and to where we’ll be in the coming months.

Tony Zayas 32:11
Oh, Aaron, did you say that there are two other co-founders?

Aaron Bray
Yes, there are.

Tony Zayas
So what is that relationship like amongst the three of you, and how do you guys balance responsibilities and things?

Aaron Bray 32:23
Great question. So Lewis, who I mentioned before, is extraordinarily technical, so he’s taken over as CTO. He’s essentially managing the engineering efforts and product development as we go on. Pete, my other co-founder, has a lot more commercial and enterprise sales experience, so he’s more focused on that side of things. He’s also extraordinarily technical, so he’s able to help out a lot on bridging the gap between the customers and the product roadmap and development: helping out in terms of [inaudible] and, essentially, being a [inaudible] for the product, almost.

Andy Halko 33:14
With someone like you on, I’m really interested in understanding the landscape and where it’s going in the future. Bucketing that as open source and security within applications: what do you think the next two to three years look like in terms of how those things are going to change and how they’re going to impact the market, growth, risk, and use? Both for open source and, again, for application security more broadly, what are you predicting?

Aaron Bray 34:02
Great question. We’ve seen a pretty dramatic shift already, just over the last year with the pandemic, and an increase in focus on application and cloud security. There’s been a lot more spend in this space even over the last year, and it seems like that’s going to continue to grow. As far as how our product is likely to evolve: we’re starting out in the open source analysis realm, but our goal is to grow into customers’ spaces and be able to provide what essentially amounts to the full spectrum of supply chain security. So we can take the same insights and the same analysis that we’re applying on the open source side and also apply them internally. I think that will give us a tremendous advantage, because we’ll be able not only to perform static analysis of the graph of software that’s being pulled in, but also to trace that line from within customer code all the way to the edge of the graph.

Andy Halko 35:08
Do you plan to stay focused on code, or will it get into infrastructure and other aspects of the space?

Aaron Bray 35:15
Great question. I think, to some degree, the infrastructure pieces, especially things like containers, are almost a logical outgrowth of what we’re doing today, because they’re very similar in nature to packages. They’re hierarchical, so often you’ll have one container layer inherit from others; they have the same collection of maintainers; and, frankly, they have a lot of the same concerns. I think it was just a few months ago that a bunch of Docker images were taken offline because they contained crypto miners.

Andy Halko 35:52
Yeah. What are some other really interesting scenarios you’ve seen in this space? I find these stories interesting. You’ve already mentioned probably three or four, but what other big things are out there that folks may not have heard of, or should be aware of, in terms of the types of threats out there?

Aaron Bray 36:14
Great question. So, even beyond the scope of vulnerabilities, at least in the traditional sense, we’ve seen a lot of attacks targeted at, I won’t call them flaws, but things related to design decisions around the package managers themselves. There was a pretty major event in the news, I think within the last month or two: dependency confusion. Essentially, a security researcher found a way to get their code to run inside the networks of some of the really big tech companies, Apple and Microsoft and a few others, and they were able to collect several hundred thousand dollars in bug bounties by doing it. It was an interesting flaw. They found that they could identify private packages by looking at open source software from some of these big companies, or at references on websites and things of that nature. Then they published a package with the same name as an internal package that had no public equivalent, and they bumped the version number up until they started getting downloads and installations. So effectively, they were able to get their code to run inside Apple’s or Microsoft’s networks by exploiting this. Certainly that’s not a vulnerability in the traditional sense; it’s an artifact of how dependency resolution happens. Another great example is something called repo jacking, which popped up within the last few months as well. Somebody discovered that a lot of package managers allow you to link directly to a version control service.
So instead of having a regular package that’s hosted in a place like PyPI, RubyGems, or npm, you actually have a package that is simply a Git repo on, let’s say, GitHub. What somebody discovered was that over time, many of these become abandoned. The author might delete their account, or they might move the package someplace else. So what you end up with is basically a stale pointer: a file someplace that references a Git repository that no longer exists. What they found was that if they went and created an account with that name, and then created a repository with the same name, they could provide whatever code they wanted. The package manager resolving dependencies would just go and blindly pull down the code they put in that new GitHub repo.
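
The dependency confusion attack described above can be modeled with a toy resolver. This is a minimal sketch under an assumption made purely for illustration: a resolver that merges the versions it finds in a private index and a public index and installs whichever is highest. The index contents, the package name `acme-internal-utils`, and the functions `naive_resolve` and `scoped_resolve` are all hypothetical and are not meant to reproduce any specific package manager’s behavior.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Turn a dotted version such as '1.4.0' into a comparable tuple (1, 4, 0)."""
    return tuple(int(part) for part in version.split("."))


def naive_resolve(name, private_index, public_index):
    """Pick the highest version of `name` found in either index.

    This mimics the vulnerable behavior: where the package comes from is
    ignored, so a public upload with a bigger version number wins.
    """
    candidates = [
        (parse_version(version), source, version)
        for source, index in (("private", private_index), ("public", public_index))
        for version in index.get(name, [])
    ]
    if not candidates:
        raise LookupError(f"{name!r} not found in any index")
    _, source, version = max(candidates)
    return source, version


def scoped_resolve(name, private_index, public_index):
    """Safer variant: a name the private index knows never falls through to public."""
    if name in private_index:
        return "private", max(private_index[name], key=parse_version)
    return naive_resolve(name, {}, public_index)


private_index = {"acme-internal-utils": ["1.3.2", "1.4.0"]}
# A hypothetical attacker learns the internal name and uploads it publicly
# with an inflated version number.
public_index = {"acme-internal-utils": ["99.0.0"]}

print(naive_resolve("acme-internal-utils", private_index, public_index))
# → ('public', '99.0.0')
print(scoped_resolve("acme-internal-utils", private_index, public_index))
# → ('private', '1.4.0')
```

The only difference between the two functions is whether the package’s origin is taken into account before versions are compared, which is the design decision the attack exploits.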

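Repo jacking hinges on dependencies that point straight at a Git URL rather than at a registry. The sketch below is only an illustration, assuming pip-style requirement lines of the form `name @ git+https://host/owner/repo.git@ref`: it lists those direct references so that someone can check whether each owner account still exists. The function name `find_vcs_dependencies` and the sample requirements are hypothetical, and the sketch deliberately stops short of querying the hosting service.

```python
from urllib.parse import urlparse


def find_vcs_dependencies(lines):
    """Return (host, owner, repo) for every dependency fetched straight from Git.

    Assumes pip-style requirement lines; only tokens starting with "git+"
    are considered, which is a simplification of the real requirement syntax.
    """
    found = []
    for line in lines:
        if line.lstrip().startswith("#"):  # skip comment lines
            continue
        for token in line.split():
            if not token.startswith("git+"):
                continue
            url = urlparse(token[len("git+"):])
            # The path looks like "/owner/repo.git@ref"; drop the ref part.
            path = url.path.split("@", 1)[0]
            parts = [part for part in path.split("/") if part]
            if len(parts) >= 2:
                owner, repo = parts[0], parts[1].removesuffix(".git")
                found.append((url.hostname, owner, repo))
    return found


requirements = [
    "requests==2.31.0",
    "internal-tool @ git+https://github.com/someuser/internal-tool.git@v1.2",
    "# git+https://github.com/commented/out.git",
]
print(find_vcs_dependencies(requirements))
# → [('github.com', 'someuser', 'internal-tool')]
```

Each flagged entry is a pointer that no registry vouches for: if the `someuser` account were deleted and later re-registered by someone else, the same line would fetch their code. Pinning to a full commit hash instead of a tag or branch is one common mitigation, since a re-created repository would not normally contain the original commit.
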
Andy Halko 39:17
How much responsibility do you think groups like Bitbucket and GitHub and some of these other organizations should have for the security side and for understanding this? Or is it that they do what they do, and people really need a product like yours as well?

Aaron Bray 39:41
Great question. I think it would be complicated for them to police this, because there are obviously some countries that have right-to-be-forgotten laws. And if you have a right-to-be-forgotten law, it seems like it would be difficult to say you’re no longer allowed to delete your GitHub account. But you can just delete your GitHub account, and there’s no real way for somebody like GitHub to know that there are still pointers out there for some of these packages, because, frankly, they’re not the information owner of all that data. Many of these language ecosystems are hosted separately, and the package itself might be hosted on GitLab or Bitbucket or someplace else. So there’s not really a good way for a centralized repository like GitHub to combat this in any meaningful way, if that makes sense. And so that’s... sorry, guys, I was just going to say that’s sort of where a product like ours essentially steps in.

Andy Halko 40:52
Yeah, and that’s part of the ecosystem: you have these tools doing that, but obviously you’d bring in something like yours as well. You mentioned bug bounties. I’m curious, does your methodology, or your machine learning, or your system, think around the idea of almost white-hat measures for detecting these things, where you’re trying to go out and infiltrate and find what the issues are?

Aaron Bray 41:29
Absolutely, and certainly that’s the long-term goal of ours. In the next few months, we’re actually looking to roll out an API that will allow customers, and potentially, if we open it up a bit for things like bug bounties, individual contributors to go and find bigger issues in the open source ecosystem.

Andy Halko 41:49
Mm-hmm. So do you think there will eventually be a community aspect to the product, where folks are helping your machine learning platform? Like you said previously, almost helping train it?

Aaron Bray 42:05
Absolutely. There are almost a million different directions we could go in terms of what insights we could look at, what things we could try to gather, and what information we could provide back. And I think opening the aperture more is only going to be beneficial to the broader community.

Tony Zayas 42:27
That’s great. So for the rest of 2021, what is in store for Phylum? What are the plans and goals that you have?

Aaron Bray 42:38
Great question. We’re planning a more general release in the next few months, and part of that is likely going to include our API rollout, as mentioned. So those are the big highlights. We’re looking at broadening the product’s appeal and improving the catalogue of heuristics and models that we have to date. I think that’s a big chunk of what the rest of this year will look like.

Andy Halko 43:13
Very cool. You know, I’m interested in hearing how you explain your very technical product. You mentioned selling to CISOs and some of these other roles, which tend to be a little bit technical. But for other founders out there whose product is very technical, how do you approach articulating it and talking to the marketplace? Do you try to dumb it down for people who are maybe on the business side of things and don’t quite understand all the aspects of packages? Or have you tried to stay really technical, to make sure that the folks who know what you’re talking about know that you know what you’re talking about?

Aaron Bray 44:03
Great question. I think there’s definitely a careful balance there, because you want to make sure that people know that you know what you’re talking about and that your product isn’t pure snake oil, especially if you say things like machine learning. There are a wide range of products that have a lot of buzzwords in them, but they don’t really mean anything; there’s no tangible, understandable application of the technology. So we try to strike a careful balance between going a little too deep and making sure that we accurately portray what our product does. One of the things we’ve found immensely helpful in telling the story of what our product does and how it works is that we put some time early on into building Jupyter notebooks that show, in action, how some of our heuristics work and how they run. We found that was really helpful when explaining, at a high level, how we would do something like stop a typosquatted package from being used in production, or how we would identify a malicious package by looking at patterns in the source code itself, if that makes sense.
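
Phylum’s actual heuristics aren’t described in this conversation, so the following is only a generic illustration of one common idea behind typosquatting detection: flag a requested package name that sits within a small edit distance of a well-known package without being that package. The `POPULAR` set, the threshold of 2, and the function `possible_typosquats` are arbitrary assumptions made for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


# Hypothetical allowlist of popular package names.
POPULAR = {"requests", "numpy", "pandas", "django"}


def possible_typosquats(name, popular=POPULAR, max_distance=2):
    """Return the popular names that `name` is suspiciously close to."""
    if name in popular:
        return []  # an exact match is the real package, not a squat
    return sorted(p for p in popular if levenshtein(name, p) <= max_distance)


print(possible_typosquats("reqeusts"))  # → ['requests']
print(possible_typosquats("numpy"))     # → []
```

A fixed threshold is crude: short names collide easily, and legitimate packages can be one edit apart, so in practice a check like this would likely be one signal among several rather than a reason to block an install on its own.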

Andy Halko 45:18
Yeah, it does. It’s interesting. One question I’ve been asking a lot of the tech founders is more of a philosophical question, if you will. You’re a very technical person, so I’m sure you’re tied into a lot of different technologies. Over the next 10 years, what are you most excited about? Is it things around blockchain or machine learning? Even automated driving, or other unique technologies? What has you excited for the general future of, we’ll even say, society?

Aaron Bray 46:01
Great question. There are so many things coming out right now that it’s sort of hard to pick just one. Obviously, across the board, there are a lot of advances happening in the machine learning world that are very exciting. I’m maybe not as excited about advances on the blockchain side, because I’ve seen a fairly limited number of real-world applications that I’d really dig into. Not to say that there aren’t some out there; I’m sure there are. But I think advances in quantum computing are pretty exciting, as that rolls forward and starts becoming more available and more useful. Rack-scale computing is also pretty exciting.

Andy Halko 47:03
I haven’t heard of rack-scale computing. Can you talk about that a little bit?

Aaron Bray 47:06
Sure, absolutely. So the idea, effectively, is that instead of having a single system with the core components a system would have, like its memory modules and CPUs and things of that nature, it’s all broken out into a much larger construct. So you can have many more resources than you would with just a single box. There are obviously some interesting scheduling and resourcing issues associated with that, because the distance between components is much greater.

Tony Zayas 47:49
Well, Aaron, for those who are tuning in and want to learn more about you and find Phylum, can you tell them where they can check you out?

Aaron Bray 47:57
Sure, absolutely. Our website and blog are a good first pass: phylum.io. We also have some of the heuristics that we’ve already built listed in our docs there. We’re not quite open for general use just yet, but we’d certainly love to hear from anyone who’s interested in being an early design partner or early adopter.

Andy Halko 48:27
That’s fantastic. That’s awesome. We really appreciate you taking the time to talk about not only your technology, but where the space is with software security and some of these other big challenges that I think almost all organizations are facing, whether they know it or not.

Aaron Bray
Absolutely. Well, thank you so much for having me.

Tony Zayas 48:52
Yeah. Thank you, Aaron. Have a great day. And thanks, everyone, for tuning in. We’ll see you again next week. Take care, everybody.