Alternative data isn’t a new idea, but what’s new in alternative data and how it’s being used always makes for an interesting conversation. It’s especially enjoyable if that conversation is with Laurent Laloux, Chief Product Officer, Capital Fund Management (CFM). Laloux has a knack for explaining technically complex ideas in an engaging and informative manner, and probably has the most concise and memorable definition of “big data” you’ll ever hear or read about. At a firm that bases every investment strategy on a quantitative and scientific foundation, Laloux also knows that no matter how good the data, what really matters is how it fits into and improves investment strategy performance. II recently spoke with Laloux about big data, alt data, and how natural language processing (NLP) and other AI techniques can help reveal hidden indicators in the market that lead to investment opportunities.
Is it accurate to say that in order to fully leverage alternative data you must first understand the concept of big data? And that big data doesn’t just mean “lots of data?”
Many people think big data is simply a huge data set, but the term is a misnomer. It’s more complex than sheer volume – big data is a continually evolving concept. At any point in time, a specific data set is “big” if it’s at the limit of what technology can do with it. Twenty years ago, analyzing tick-by-tick data from exchanges was big data because that was what the most advanced technology was capable of handling. Today, any modern computer system, whether in-house or cloud based, can efficiently deal with that type of data. So, even if the data set is huge in terms of size, it’s not big data if it’s not challenging the technology. The concept of big data has moved in concert over time with innovations in technology. As a quant manager, we’ve always been at the cutting edge of technology and the analytical techniques it enables along with the growing levels of data, so we’re moving in sync with both to heighten our capacity to measure or understand the economy and financial markets, which in turn helps shape our investment strategies.
What’s an example of the limits of technology and big data today?
We now have an internet of things – all kinds of devices have chips in them, and those devices are connected to networks and they generate a vast amount of data which is only available thanks to this technology. That’s a limit of what technology can manage, and that’s big data – new information that you can catch and analyze.
What’s the current-day relationship between big data and alternative data?
Alternative data is essentially data that is not directly related to financial markets, but which is a proxy or indicator about the economy. It can be data regarding ship or train cargo, a consumption aggregate from supermarkets, credit card information, and so on. The corpus of alternative data generated by businesses, humans, and the internet gives you an idea of the underlying activity of what companies, people, and machines are doing. If you are able to manage alt data and build a relationship between it and its impact on the economy and on assets, there’s the potential to build better asset management strategies.
It’s worth noting that alt data is not necessarily big data. Some alternative data can be pretty small and easy to manage, even in an Excel spreadsheet. Social media information, by contrast, is both alternative data and big data. For example, the volume of tweets exceeds that of all the structured financial news published since the beginning of the 1990s. Most of them are irrelevant to the economy, but you have to deal with a massive amount of data to find a needle in a haystack. Storing and managing the fire hose of data on Twitter requires market-leading data management, and extracting market-moving information requires top-notch NLP technology.
Why is it important that a quant firm like CFM has made the evolutionary journey along with technology and data?
It’s by looking at historical data that you can find statistical patterns and statistical stability which allow the opportunity to try and predict a little bit of the future. Without data, quant wouldn’t exist. Quant and the emergence of big-time scientists working in financial markets were the result of the readily available and quality financial data that emerged in the 1980s and continues to grow even today. The point is that we’ve advanced along with big data and technology in identifying the highest quality data sets and the technology that will allow us to find an interesting pattern in the data. The goal is to build models to predict what a financial asset might do in the future.
Along that journey with data and technology, we’ve learned how to stay on the cutting edge in using the latest statistical or IT technologies. You don’t just show up and say, “We’re on the cutting edge.” Staying on the cutting edge is a pursuit that never stops as long as there is new technology and new data that allows for a more refined perspective of the economy. As a consequence, you cannot stay put – you must constantly learn new methodologies and new technology, and hire new generations of IT experts and quant scientists so that you always stay on top of the newest developments from academia and big tech.
Presumably you’re always looking for that next generation talent?
Exactly, and the new generations are constantly challenging the older ones to push in new directions.
So, where are we at now in the evolution of technology as it pertains to leveraging big data and alt data?
We’ve been through a big wave of complete virtualization, meaning that today you can go to a cloud or use a virtual machine and you don’t have to think all that much about hardware. That’s good for us because we don’t want to be hardware specialists – we want to consume the best hardware for any given task. We’ve been moving on from hardware to higher level software, and the next stack is standard tools, libraries, and languages that you can leverage. In the past, you had to develop a lot of your own libraries to implement statistical ideas and concepts. It was straightforward for mathematicians and statisticians. But you still needed to code it.
These days, there’s a high-quality stack of open source, open library software that will essentially implement the building blocks. That allows us to focus on higher level concepts, i.e. “What do I understand? And what do I want to model?” As time goes by, technology is becoming closer and closer to what you would write on the blackboard as a quant. Access to this easy hardware management and standard open-source, high-quality code allows us to focus on what makes our specialty and expertise critical for investors – namely, to build models and try to predict risk.
What types of specialized knowledge and expertise are required to pull all of this together on behalf of investors?
You have the people who design models by looking at data and inferring how it could be used to build an indicator of whether the price of an existing asset might go up or down. Financial markets are very efficient, so there is very little information buried in a huge amount of noise. A key competence is understanding the signal-to-noise ratio while being aware that there’s always a risk that what you are doing contains a lot of information which might bias your results. Some say there is more art to this element than science.
A second kind of expertise that is vital in the quant model is trade management and this requires a highly technical mind. The data is intense, very structured, and in massive amounts – prices and quantities – and you need to understand how to interact with the market, and what trade execution best minimizes price impact and fees.
The third type of talent is the people combining all of this information through portfolio construction to optimize our strategies. They take the signals from the first group I described, and the trade cost information from the second group, and they mix that with their risk model and they try to build a portfolio which is optimal in terms of size, quality, and Sharpe ratio.
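As a rough illustration of the last step, the Sharpe ratio mentioned above can be computed from a series of periodic returns, and trading costs from the second group can be netted out before measuring it. This is a minimal sketch under stated assumptions, not CFM's method: the function names, the 252-period annualization, and the flat cost per unit of turnover are all hypothetical choices made for the example.

```python
import math
import statistics


def sharpe_ratio(returns, risk_free_rate=0.0, periods_per_year=252):
    """Annualized Sharpe ratio from per-period returns.

    Assumes risk_free_rate is an annual rate and that returns are
    sampled periods_per_year times a year (252 is an assumption).
    """
    per_period_rf = risk_free_rate / periods_per_year
    excess = [r - per_period_rf for r in returns]
    vol = statistics.stdev(excess)  # sample standard deviation
    if vol == 0:
        raise ValueError("zero volatility: Sharpe ratio is undefined")
    return statistics.mean(excess) / vol * math.sqrt(periods_per_year)


def net_of_cost_returns(gross_returns, turnover, cost_per_unit_turnover):
    """Subtract a flat, hypothetical trading cost proportional to turnover."""
    return [
        g - t * cost_per_unit_turnover
        for g, t in zip(gross_returns, turnover)
    ]


if __name__ == "__main__":
    # Made-up daily returns and turnover, for illustration only.
    gross = [0.010, -0.005, 0.007, 0.002]
    turnover = [1.0, 1.0, 1.0, 1.0]
    net = net_of_cost_returns(gross, turnover, cost_per_unit_turnover=0.001)
    print("gross Sharpe:", sharpe_ratio(gross))
    print("net Sharpe:  ", sharpe_ratio(net))
```

Comparing the gross and net figures shows why the trade cost information matters: a signal that looks attractive before costs can lose much of its Sharpe ratio once execution is accounted for.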
The end goal of all of what you’ve been describing is not only identifying the signals, but also identifying what has a material impact on price, yes?
What you want is to predict how the capital will flow, and if you are able to predict that you will know how it will impact prices and you can position your portfolio accordingly. So, from a high-level view, that’s the big goal.
Getting back to alt data for a moment – how are you incorporating NLP into what CFM does?
Natural language processing has been on our radar for nearly a decade but it has been a big push for us over the past three years, during which time there has been a massive evolution in the capacity of neural networks [a series of algorithms] to analyze human text and generate text like a human. We’ve been doing text analysis for a fairly long time using less advanced technology – looking at vocabulary, at syntax, at the effect of certain sentences. However, today, starting with generic off-the-shelf neural networks and leveraging our quant market expertise to thoroughly retrain them on the specific corpus of financial news, we can obtain results that are much better than what we used to obtain in a more manual way. It’s another example of leveraging cutting-edge technology and adapting it to the specific context – using our experience and training in how markets and portfolios behave allows us to implement that alt data from NLP in a very efficient way and improve our investment strategies.
What’s an example of how you might use NLP?
We’ve looked at earnings calls from corporations, when CEOs and CFOs are giving quarterly updates on how the company is doing, what it is doing, and so on. You can do human analysis, for instance, and try to guess what it all means. Or you can use the massive power of a new generation of neural networks, which are able to capture more subtle and refined information in a way that is extremely difficult for a human to do.
For example, one idea that has been discussed in financial circles is that when a CFO gives quarterly updates, over time he or she will develop a certain style of delivery, word usage, and so on. But imagine that this company is in the midst of an M&A deal. Typically, during M&A, the lawyers will tell executives what they can and cannot say, how they should phrase things, and so on. The type of algos we’ve been discussing can potentially capture the resulting slight changes in the way the CFO is talking about the company. Now we can grasp these subtle differences as an advance indicator that something is happening within this company. You can’t be sure what is happening, but it’s a possible risk indicator that might prompt you to look at other indicators, such as price dynamics, or other types of news.
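To make the idea of a change in speaking style concrete, here is a deliberately simple sketch. It is not the neural network approach described in the interview: it swaps in a plain bag-of-words comparison, scoring how far the wording of a new call drifts from the pooled wording of earlier calls using cosine similarity. The function names and the sample sentences are hypothetical.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Lowercase the text and count word tokens (letters and apostrophes)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity of two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def style_drift(past_calls, new_call):
    """Return 1 - similarity between the new call and the pooled past calls.

    Values near 0 mean familiar wording; values near 1 mean the
    vocabulary has changed a lot relative to the speaker's history.
    """
    baseline = Counter()
    for call in past_calls:
        baseline.update(word_counts(call))
    return 1.0 - cosine_similarity(baseline, word_counts(new_call))


if __name__ == "__main__":
    # Made-up snippets standing in for earnings call transcripts.
    history = [
        "we expect steady growth this quarter",
        "we expect steady growth next quarter",
    ]
    print(style_drift(history, "we expect steady growth this quarter"))
    print(style_drift(history, "counsel advises we cannot comment on pending matters"))
```

A score like this says only that the wording changed, not why, which matches the point above: it is a prompt to look at other indicators rather than a conclusion in itself.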
How do you see your use of NLP evolving?
I think it’s very promising for ESG. Typically, ESG data is self-reported and corporations like to have the greenest possible balance sheet. In truth, a lot of the green we see on balance sheets is sometimes more a reflection of the size of the marketing department than what the company is actually doing ESG-wise. That’s why it’s important to use external data sources instead of simply self-reported information.
In ESG there’s a lot of information in blogs, news, comments, and so on. That’s why there’s a real hope that by training the NLP system to look for specific carbon aspects, for example, or environmental and social aspects, you can get a more accurate real-time measure of the ESG quality of a company instead of relying purely on the self-reported balance sheet picture.
How do you see the use of alt data continuing to evolve?
This is a golden age for data emanating from all over the world. The big challenge currently is filtering all of it. One could possibly imagine that in the future we might hit a data lull where there is no new data or no new angle – a bit like an “AI winter.” That could be caused by economic recession or major changes in the world at large, or because technology development plateaus and new analysis of alt data slows. However, this all seems unlikely and right now there’s still plenty of information to keep us busy. One aspect which gives me great hope is that more and more public agencies are trying to collect and offer data to their citizens. Such open data initiatives allow us to aggregate and collect new information. There’s huge, untapped potential there because the details are messy – different standards, for example, and even within the same country each agency might have a different protocol and define things differently. But, if you’re able to collect this information, and rationalize and harmonize it, the potential is vast.
Data isn’t worth much if you don’t know how to optimize its use though, is it?
Right, the barriers to accessing data might be less severe than they once were, but knowing how to blend it with trading and portfolio construction still takes tremendous skill, knowledge, and experience. Simply having data doesn’t make you a data scientist any more than buying a piece of marble makes you a sculptor. That’s where our nearly 30 years of experience as a quant firm and team come into play. Alt data is important and interesting, but it’s not the only way we can improve our models. Relying on quality price data and other financial data, and leveraging models and statistical tools to find deeper relationships within the data is just as important. We do both. We cover new data and existing data using the latest technology, and revisit and review what we’ve been doing with a critical eye. In this way we continue to improve and evolve what we do, and provide the service and products our clients are looking for.
Any description or information involving investment process or allocations is provided for illustration purposes only. There can be no assurance that these statements are or will prove to be accurate or complete in any way. All figures are unaudited. This article does not constitute an offer or solicitation to subscribe for any security or interest.