Zato Novo engages in computational research and development to solve problems in alternative finance (P2P lending, hedge funds) and social media. We juxtapose human-centric interface design with data-centric system design As digital architects we model the mathematical flow and evolution of data through systems and logic just as physical architects model the flow of people through buildings and space. Our products and services focus on deriving Foresight From Insight. We accomplish this by applying a combination of mathematical models plus soft real-time systems to solve problems related to information overload and noise reduction, as well as decision support and optimization.
We use design in the broadest sense, representing an optimization process amongst competing constraints. In engineering design, the tradeoff is between efficiency and simplicity, whereas in product design it is between form and function. Computational engineers must design models that optimize between generalizability and accuracy.
Our home page is a symbol of our work ethic and design aesthetic. The illustration is dynamically generated using Lindenmayer systems to generate the landscape, background plants, and the bamboo. The overall design is reminiscent of a wood block print, which itself is a multi-step process of artistry, craftsmanship, and design tradeoffs.
We are uncompromising in asking questions and exploring ideas. We are students of history and culture and this breadth of perspective and understanding informs the way we look at the world and how we apply our scientific and technical wizardry to solve problems. We use artistic and creative exploration as a springboard to ask and solve complex mathematical and engineering problems.
Zato Novo was founded by Brian Lee Yung Rowe as a vehicle for novel research and development. In addtion to in-house research, our work for clients include:
Brian is also a professor at the CUNY School of Professional Studies, where he teaches various courses in mathematics, predictive analytics, machine learning, and big data for the M.S. in Data Analytics program. He is currently writing a book "Modeling Data With Functional Programming In R" to be published by Chapman & Hall/CRC Press.
Brian blogs at Cartesian Faith and is on Twitter as @cartesianfaith
Big data offers vast rewards to those able to harness it. But like a bucking bronco, big data is not easily tamed. Let Zato Novo guide you through the process, so you can get the most from your data.
Analytical models are at the heart of data science. With so many models to choose from, how can you lift the fog of uncertainty? Zato Novo is your strategic partner that can guide you through the model development process. We take a five step approach that includes:
What is a heart without arteries and veins? It's this intricate system that circulates blood to all the organs and muscles of the body. Your computational infrastructure plays a similar role, providing the data life-blood to all parts of your organization. These systems are powerful and also very complex, spanning hundreds if not thousands of machines. A poorly designed computational system can cost millions of dollars, whereas a well-designed system can become a revenue generator of your business. Let Zato Novo evaluate your infrastructure needs. We'll design a computational system that delivers immediate performance, while scaling for the future.
Contact us today to learn how we can help you get the most out of the data science revolution.
The Internet is alive with conversation. Everyone is sharing, giving their opinion, and making their voice heard. But whose voice do you listen to? Without proper tools, all these conversations can be an overwhelming cacophony of meaningless noise.
And yet, for those who know where and how to look, this sea of data is rich with insights that can optimize your marketing and user acquisition in real-time. Unlocking the vast treasures buried within the depths of social media requires the power of Panoptez.
Panoptez is a soft real-time analytics platform and data service for social media. Panoptez provides a collection of streaming models that reveal the secrets within social content.
Social media discussions can be hard to parse as many simultaneous conversations are flowing through the same stream. Whether it is a hashtag, search query, or a single user's timeline, there's a lot going on. This interlacing of many signals in a single feed is similar to what Electrical Engineers call multiplexing. While the advantages for multiplexiing are legion for signal propagation, usually there is a corresponding inverse process to demultiplex the stream to recover the individual signals. Via machine larning algorithms, Panoptez can demultiplex multiple simultaneous conversations into separate streams so you can focus on only the conversations that matter to you.
Isolating conversations is useful, but in a single demuxed thread there is still a lot of noise. This noise takes the form of multiple slightly modified tweets, automatic re-posting of the same link, or possibly just garbage messages with little intrinsic value. By removing these types of messages from a conversation, it is possible to increase the signal to noise ratio by focusing on data with known information content. This means no more missed conversations nor drowning in a sea of conversations.
When something happens that is unexpected it is considered an anomaly. This can take the form of a breaking news story, in which case the anomaly will eventually decay into a non-anomaly versus a hacked account where the behavior of a user suddenly changes. This is sometimes referred to as regime change and is an integral part of Panoptez.
Regimes can be time-based or value-based. In time, a regime might be the amount of activity of a particular topic on a social media channel. When a signficant event occurs, the activity will change (and eventually decay). This sudden change of interest is what drives the regime change. In value-based regimes, the process governing its value may suddenly change. The consequence of this is that a model optimized for one regime may no longer be valid in the new regime. How regime change is interpreted is up to the end user, but the mechanics of regime change are constant. Panopotez discovers regimes and detects their changes using methods developed at Zato Novo.
To learn more about how Panoptez can help you, contact us today.
Peer-to-peer lending is positioned to become one of the biggest disruptors to conventional finance. With well over $10 billion in loan originations in the U.S. alone, P2P lending can no longer ignored. Zato Novo offers quantitative credit models (Sarapos) that find favorable mispricings in P2P listings. This results in higher returns and a larger pool of investable securities with a low risk profile.
With yields so low, huge numbers of investors crave returns. This creates an unfortunate situation, where there is more supply of capital than demand for capital. Many investors use a combination of filters to look for "high-quality" loans. As the highest quality loans are taken, this leaves a pool of lower grade loans. For conventional models, this is problematic as there is no more notional left.
On the other hand, Sarapos credit models are resilient since they do not use filters. Instead, Zato Novo uses statistical methods to model credit risk and identify loans that have been mispriced by the originators. Combining traditional high-grade loans with mispriced loans increases overall returns while simultaneously increasing the available pool of investable securities.
To learn more about how Sarapos can help you, contact us today.
Model development has a curious twist: while creating a good model is the biggest intellectual challnege, working with data is usually the biggest operational challenge. Many model frameworks exist so that people don't have to create things from scratch. However, the interfaces between the models and the data are rarely compatible. This translates to a lot of data manipulation throughout the model development process.
Data comes in all shapes and sizes. Two core forms of representation are hierarchical and tabular structures. Hierarchical structures are essentially graphs and are also known as nested structures. Examples include formats like XML or JSON, where elements can represent either a node or a terminal element. This type of structure is very flexible in the sense that any arbitrary hierarchy can be represented in the same data set. In tabular data structures, data is represented as rows and columns. In this format, rows are discrete records of data and columns represent values for individual fields. Here the underlying principle is that all pieces of data have the same structure. Tabular structures are useful in data analysis since aggregation and many forms of analysis expect each record or sample has the same fields. From this perspective, tabular structures, and denormalized tabular structures in particular are efficient and convenient to work with for modeling. In general, Zato Novo advocates using denormalized tabular data as the standard structure for model development.
Obviously not all data comes prepackaged in a denormalized table structure. Most web APIs produce JSON, while many databases have normalized tables that must be joined and transformed to denormalized structures in memory. Models often require raw scalars or other primitive values, which requires further transformation.
Enter Odessa. The purpose of Odessa is to remove the headaches
associated with data interoperability, simplifying the process of
taking arbitrary datasets and intelligently denormalizing them.
JSON structures can quickly be converted into a data.frame
.
In addition, Odessa knows how to join datasets together based on
the fields contained in the datasets. With Odessa you can leave
the data munging swiss army knife at home and focus on discovery and insight.
The challenges associated with transforming hierarchical data into
a denormalized table structure is that there is no obvious mapping between
missing nodes and elements. Even more challenging is converting non-scalar
elements into a tabular structure. Odessa solves this with two
complementary approaches. The first is using reasonable defaults, such
as NA
for empty/missing data, assuming all associative arrays
are nodes, and arrays are elements. This works well for most cases.
Extraordinary cases require custom handling, which is the second approach.
This is similar to a SAX-like parser for XML that focuses on parsing
and node traversal while handing off actual transformation logic to
a callback function. In reality, SAX parsers don't manage the graph because
they pass events based on tags. Odessa tracks the node structure, as this
is used to provide context to the callback function.
The magic of Odessa appears in the form of automatic joining of tables. In an ideal world joining datasets together is a straight-forward process when each table has compatible primary keys. This is usually the case in a single database, but the moment disparate systems and datastores must come together, this vision of an idyllic world is shattered. In a decentralized world, it is rare that datasets will have perfectly matched keys. Setting aside when it is appropriate to join disparate datasets together for analysis, the mechanics of creating compatible keys is a messy process that requires parsing, regular expressions, casting, and even lookups.
Odessa makes the messiness of data interoperability a quaint anachronism.
Behind the scenes, Odessa is both a library and a repository that contains
metadata about datasets. This metadata specifies the structure of the
data and a declarative syntax for describing how keys are defined.
The structure essentially defines a graph that describes a compositional
relationship between keys.
This means that even when the relationship between two tables involves
a composite key or a transformation on a key, Odessa can infer the
common structure between two keys and join the sets together.
In most situations it is only necessary to tell Odessa that you want
to join two data.frames
together, and Odessa will take care
of the rest.
Explore the existing datasets and the metadata at the Odessa repository. To learn how Odessa can ease your data interoperability challenges, contact us at discover@zatonovo.com.
As part of our research and development, Zato Novo produces numerous tools to aid our research. Some of these end up in formal research and publications, while others become open source libraries and tools.