Zato Novo is a computational design studio that creates learning machines. We juxtapose human-centric interface design with data-centric system design. As digital architects we model the mathematical flow and evolution of data through systems and logic just as physical architects model the flow of people through buildings and space. Our products and services focus on deriving Foresight From Insight. We accomplish this by applying a combination of mathematical models plus soft real-time systems to solve problems related to information overload and noise reduction, as well as decision support and optimization.
We use design in the broadest sense, representing an optimization process amongst competing constraints. In engineering design, the tradeoff is between efficiency and simplicity, whereas in product design it is between form and function. Computational engineers must design models that optimize between generalizability and accuracy.
Our home page is a symbol of our work ethic and design aesthetic. We are uncompromising in asking questions and exploring ideas. We are students of history and culture and this breadth of perspective and understanding informs the way we look at the world and how we apply our scientific and technical wizardry to solve problems. We use artistic and creative exploration as a springboard to ask and solve mathematical and engineering problems.
The illustration is dynamically generated, using Lindenmayer systems to produce the landscape, background plants, and the bamboo. The overall design is reminiscent of a woodblock print, which itself is a multi-step process of artistry, craftsmanship, and design tradeoffs. The text on the bamboo is generated by a real-time Twitter feed powered by our flagship Panoptez product for search intelligence and Ocellus for streaming infrastructure.
Zato Novo was founded by Brian Lee Yung Rowe as a vehicle for novel research and development. In addition to in-house research, our work for clients includes creating models to forecast consumer spending for a mobile banking startup and identifying and classifying legal clauses within contracts for a law firm.
Brian is also a professor at the CUNY School of Professional Studies, where he teaches mathematics and data analysis for the M.S. in Data Analytics program as well as the B.S. in Information Systems. He is currently writing a book "Modeling Data With Functional Programming In R" to be published in late 2014/early 2015 by Chapman & Hall/CRC Press.
Zato Novo offers professional services for model development. Whether you wish to create a custom model to leverage our hosted models (such as Panoptez), or are interested in a standalone model, we can help. We take a strategic approach, ensuring that you understand the data needs of a model, how to validate models, and how to deal with regime change. We guide you through the model development process from problem definition to model specification through to deployment.
Zato Novo provides bespoke training covering analytical methods and also programming languages. We can conduct multi-day workshops as well as individually tailored mentoring and tutoring.
Learn the modeling process from hypothesis to data extraction to model building to validation. We look at the different types of problems that are addressed in data science and the common approaches. We review the fundamentals of linear algebra, probability and statistics. We then explore various machine learning techniques and discuss concepts such as feature extraction, variable importance, precision, and recall.
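As a small taste of the validation concepts mentioned above, precision and recall can be computed directly from predicted and true labels. This is a minimal sketch for illustration only; the function name and the sample labels are assumptions, not course material.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the class labelled `positive`."""
    pairs = list(zip(y_true, y_pred))
    # True positives: predicted positive and actually positive.
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    # False positives: predicted positive but actually negative.
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    # False negatives: predicted negative but actually positive.
    fn = sum(1 for t, p in pairs if p != positive and t == positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical labels: 4 actual positives, 3 predicted positives.
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 1, 1, 0, 0, 0]
precision, recall = precision_recall(y_true, y_pred)
```

Here two of the three positive predictions are correct (precision 2/3) and two of the four actual positives are found (recall 1/2).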
Erlang is a popular functional programming language known for its unprecedented handling of concurrency. Getting its start in telephony, Erlang now has broad acceptance for general server applications. For people new to functional programming, Erlang can be a mystery. Throw in a bit of concurrency, and it fast becomes mystical. Zato Novo can help you gain mastery in Erlang. We have published a number of Erlang libraries (e.g. bunny_farm) as well as contributed to some prominent open source projects.
R is a popular statistical programming language. Like Erlang, the syntax and semantics of the language can present a challenge to newcomers. Zato Novo can help you gain mastery in R. We have published numerous libraries in R ranging from general utilities like futile.logger (which is used in Bioconductor), to domain-specific applications like random matrix theory and portfolio optimization. We also publish lambda.r, which is a language extension that lets you write R programs with a functional syntax similar to Erlang and Haskell (with pattern matching, guards, and type declarations).
Contact email@example.com to discuss how we can propel you over the learning curve.
The Internet is alive with conversation. Everyone is sharing, giving their opinion, and making their voice heard. But whose voice do you listen to? Without proper tools, all these conversations can be an overwhelming cacophony of meaningless noise.
And yet, for those who know where and how to look, this sea of data is rich with insights that can optimize your marketing and user acquisition in real-time. Unlocking the vast treasures buried within the depths of social media requires the power of Panoptez.
Panoptez is a soft real-time analytics platform and data service for social media. Panoptez provides a collection of streaming models that reveal the secrets within social content.
We all wear multiple hats and play different roles depending on the situation. Our persona is defined in part by how we communicate as well as what we talk about. Panoptez identifies the roles people play on social media and how to best utilize that information for your needs. Whether for marketing, product feedback, outreach, or content discovery, a user's role is vital in determining their credibility and whether their voice should be heard.
Another driver of credibility is a user's influence. Like personas, a user's influence varies depending on their situation. Influence is a component of credibility but also useful in its own right for measuring expected impact of communication. Arguably a tweet from The New York Times has more influence than a tweet from Beyonce, despite the latter having 8 times the number of followers, unless you happen to be a die-hard fan. Panoptez exposes contextualized influence so you know with whom to engage and to whom to listen.
So far we've only looked at the users but not the content. Aside from the general topic of a message, it is important to know the sentiment associated with a message. Is someone talking favorably about your brand or not? Over time and in aggregate, this information acts as the pulse of your brand, campaign, or angle on a story.
Most companies stop at sentiment, but Panoptez goes further. How important is a negative viewpoint that doesn't have a lot of conviction? Probably not much, but usually this information is hidden. Conviction is an integral part of Panoptez, so you can know if someone loves you or just likes you.
Social media discussions can be hard to parse as many simultaneous conversations are flowing through the same stream. Whether it is a hashtag, search query, or a single user's timeline, there's a lot going on. This interlacing of many signals in a single feed is similar to what electrical engineers call multiplexing. While the advantages of multiplexing are legion for signal propagation, usually there is a corresponding inverse process to demultiplex the stream to recover the individual signals. Via machine learning algorithms, Panoptez can demultiplex multiple simultaneous conversations into separate streams so you can focus on only the conversations that matter to you.
Isolating conversations is useful, but in a single demuxed thread there is still a lot of noise. This noise takes the form of multiple slightly modified tweets, automatic re-posting of the same link, or possibly just garbage messages with little intrinsic value. By removing these types of messages from a conversation, it is possible to increase the signal-to-noise ratio by focusing on data with known information content. This means no more missed conversations and no more drowning in a sea of conversations.
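To make the idea of filtering slightly modified reposts concrete, here is a minimal sketch of one generic approach: normalize each message, then drop any message whose token set overlaps heavily (Jaccard similarity) with one already kept. This is not a description of how Panoptez works; the function names, the 0.8 threshold, and the sample messages are all illustrative assumptions.

```python
import re


def normalize(text):
    """Reduce a message to a set of content tokens."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # drop links (often re-shortened)
    text = re.sub(r"\brt\b", " ", text)        # drop the retweet marker
    text = re.sub(r"@\w+", " ", text)          # drop user mentions
    return set(re.findall(r"[a-z0-9#']+", text))


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def drop_near_duplicates(messages, threshold=0.8):
    """Keep the first message of each group of near-identical messages."""
    kept, kept_tokens = [], []
    for msg in messages:
        tokens = normalize(msg)
        if any(jaccard(tokens, seen) >= threshold for seen in kept_tokens):
            continue  # too similar to something already kept
        kept.append(msg)
        kept_tokens.append(tokens)
    return kept


# Hypothetical feed: the second message is a retweet of the first.
feed = [
    "Big news: Acme launches rocket http://t.co/abc",
    "RT @bob: Big news: Acme launches rocket http://t.co/xyz",
    "Lunch was great today",
]
clean = drop_near_duplicates(feed)
```

The pairwise comparison is quadratic in the number of kept messages, which is fine for a sketch but would need an approximate method such as hashing for a real stream.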
When something unexpected happens, it is considered an anomaly. This can take the form of a breaking news story, in which case the anomaly eventually decays into a non-anomaly, or a hacked account, where the behavior of a user suddenly changes. This is sometimes referred to as regime change and is an integral part of Panoptez.
Regimes can be time-based or value-based. A time-based regime might be the amount of activity around a particular topic on a social media channel. When a significant event occurs, the activity will change (and eventually decay). This sudden change of interest is what drives the regime change. In value-based regimes, the process governing a value may suddenly change. The consequence is that a model optimized for one regime may no longer be valid in the new regime. How regime change is interpreted is up to the end user, but the mechanics of regime change are constant. Panoptez discovers regimes and detects their changes using methods developed at Zato Novo.
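For readers who want a feel for what detecting a sudden change in activity involves, the following is a generic textbook-style sketch: each observation is compared against the mean and standard deviation of a trailing window, and flagged when it deviates by more than a threshold. It is not the method developed at Zato Novo; the function name, window size, threshold, and sample counts are assumptions chosen for illustration.

```python
from statistics import mean, pstdev


def detect_shifts(series, window=5, threshold=3.0):
    """Return indices whose value is far outside the trailing window."""
    flagged = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mu = mean(history)
        # Guard against a zero standard deviation on a flat history.
        sigma = max(pstdev(history), 1e-9)
        z = (series[i] - mu) / sigma
        if abs(z) > threshold:
            flagged.append(i)
    return flagged


# Hypothetical hourly mention counts: activity jumps at index 7.
activity = [10, 11, 9, 10, 10, 11, 10, 50, 52, 49, 51, 50, 50]
shifts = detect_shifts(activity)
```

Only the first observation of the new level is flagged: once the jump enters the trailing window it inflates the window's variance, and after a few more observations the new level becomes the baseline. A decaying spike, such as a news story, would behave differently from a persistent shift, which is one reason interpretation is left to the end user.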
To learn more about how Panoptez can help you, contact us at firstname.lastname@example.org.
Model development has a curious twist: while creating a good model is the biggest intellectual challenge, working with data is usually the biggest operational challenge. Many model frameworks exist so that people don't have to create things from scratch. However, the interfaces between the models and the data are rarely compatible. This translates to a lot of data manipulation throughout the model development process.
Data comes in all shapes and sizes. Two core forms of representation are hierarchical and tabular structures. Hierarchical structures are essentially graphs and are also known as nested structures. Examples include formats like XML or JSON, where elements can represent either a node or a terminal element. This type of structure is very flexible in the sense that any arbitrary hierarchy can be represented in the same data set. In tabular data structures, data is represented as rows and columns. In this format, rows are discrete records of data and columns represent values for individual fields. Here the underlying principle is that all pieces of data have the same structure. Tabular structures are useful in data analysis since aggregation and many forms of analysis expect each record or sample to have the same fields. From this perspective, tabular structures, and denormalized tabular structures in particular, are efficient and convenient to work with for modeling. In general, Zato Novo advocates using denormalized tabular data as the standard structure for model development.
Obviously not all data comes prepackaged in a denormalized table structure. Most web APIs produce JSON, while many databases have normalized tables that must be joined and transformed to denormalized structures in memory. Models often require raw scalars or other primitive values, which requires further transformation.
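To illustrate what moving from normalized tables to a denormalized structure in memory looks like, here is a minimal sketch using plain Python records. The table contents, field names, and the `denormalize` helper are hypothetical and exist only for this example.

```python
# Two normalized tables: orders reference users by user_id.
users = [
    {"user_id": 1, "name": "Ada"},
    {"user_id": 2, "name": "Grace"},
]
orders = [
    {"order_id": 100, "user_id": 1, "amount": 25.0},
    {"order_id": 101, "user_id": 1, "amount": 40.0},
    {"order_id": 102, "user_id": 2, "amount": 15.5},
]


def denormalize(left, right, key):
    """Inner-join two lists of records on `key`, one wide row per match."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    joined = []
    for row in left:
        for match in index.get(row[key], []):
            # Merge both records; fields from `left` win on a name clash.
            joined.append({**match, **row})
    return joined


# Every order row now carries the user's fields alongside its own.
wide = denormalize(orders, users, "user_id")
```

Each row of `wide` has the same four fields, which is the property that aggregation and most modeling code rely on.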
Enter Odessa. The purpose of Odessa is to remove the headaches
associated with data interoperability, simplifying the process of
taking arbitrary datasets and intelligently denormalizing them.
JSON structures can quickly be converted into a
In addition, Odessa knows how to join datasets together based on
the fields contained in the datasets. With Odessa you can leave
the data munging Swiss Army knife at home and focus on discovery and insight.
The challenge associated with transforming hierarchical data into
a denormalized table structure is that there is no obvious mapping between
missing nodes and elements. Even more challenging is converting non-scalar
elements into a tabular structure. Odessa solves this with two
complementary approaches. The first is using reasonable defaults, such
as NA for empty/missing data, assuming all associative arrays
are nodes, and arrays are elements. This works well for most cases.
Extraordinary cases require custom handling, which is the second approach.
This is similar to a SAX-like parser for XML that focuses on parsing
and node traversal while handing off actual transformation logic to
a callback function. In reality, SAX parsers don't manage the graph because
they pass events based on tags. Odessa tracks the node structure, as this
is used to provide context to the callback function.
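The two approaches described above can be sketched in a few lines. This is an illustration of the general idea only, not Odessa's API or implementation: the `flatten` and `to_table` functions, the dotted column names, the handler signature, and the sample records are all assumptions. Python's `None` stands in for NA.

```python
def flatten(node, handlers=None, path=()):
    """Flatten one nested record into a single row (a flat dict).

    Defaults: dicts are nodes (recursed into), everything else,
    including lists, is an element stored as-is. A handler registered
    for a dotted path overrides the default and receives the path as
    context.
    """
    handlers = handlers or {}
    row = {}
    for key, value in node.items():
        child = path + (key,)
        name = ".".join(child)
        if name in handlers:
            row.update(handlers[name](child, value))  # custom handling
        elif isinstance(value, dict):
            row.update(flatten(value, handlers, child))  # node
        else:
            row[name] = value  # element
    return row


def to_table(records, handlers=None):
    """Return (columns, rows), filling missing cells with None."""
    flat = [flatten(r, handlers) for r in records]
    columns = []
    for row in flat:
        for col in row:
            if col not in columns:
                columns.append(col)
    return columns, [[row.get(col) for col in columns] for row in flat]


# Hypothetical records: the second one is missing user.geo entirely.
records = [
    {"id": 1, "user": {"name": "ada", "geo": {"city": "London"}}, "tags": ["a", "b"]},
    {"id": 2, "user": {"name": "grace"}, "tags": []},
]
# Custom handling for a non-scalar element: reduce the list to a count.
handlers = {"tags": lambda path, value: {"tag_count": len(value)}}
columns, table = to_table(records, handlers)
```

The handler receives the full path to the node, which plays the role of the context that a purely event-based parser would not supply.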
The magic of Odessa appears in the form of automatic joining of tables. In an ideal world, joining datasets together is a straightforward process when each table has compatible primary keys. This is usually the case in a single database, but the moment disparate systems and datastores must come together, this vision of an idyllic world is shattered. In a decentralized world, it is rare that datasets will have perfectly matched keys. Setting aside when it is appropriate to join disparate datasets together for analysis, creating compatible keys is a messy process that requires parsing, regular expressions, casting, and even lookups.
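The kind of manual key cleanup described above typically looks like the following sketch, where one system stores customer identifiers as strings like "CUST-00042" and another as plain integers. The datasets, field names, and helper functions are hypothetical, and this is the hand-written work being described, not something Odessa exposes.

```python
import re


def customer_key(raw):
    """Parse 'CUST-00042', ' cust-42 ', or 42 into the integer 42."""
    match = re.search(r"(\d+)\s*$", str(raw))  # trailing run of digits
    if match is None:
        raise ValueError(f"no numeric id in {raw!r}")
    return int(match.group(1))  # cast so '00042' and 42 compare equal


def join_on(left, right, left_key, right_key):
    """Inner-join two record lists using a key function per side."""
    index = {right_key(row): row for row in right}
    return [
        {**row, **index[left_key(row)]}
        for row in left
        if left_key(row) in index
    ]


# Hypothetical datasets from two systems with incompatible keys.
crm = [
    {"customer": "CUST-00042", "segment": "retail"},
    {"customer": "CUST-00007", "segment": "sme"},
]
billing = [
    {"cust_id": 42, "balance": 10.5},
    {"cust_id": 99, "balance": 0.0},
]
joined = join_on(
    crm,
    billing,
    lambda r: customer_key(r["customer"]),
    lambda r: customer_key(r["cust_id"]),
)
```

Every new pair of datasets needs its own version of `customer_key`, which is where the mess accumulates.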
Odessa makes the messiness of data interoperability a quaint anachronism.
Behind the scenes, Odessa is both a library and a repository that contains
metadata about datasets. This metadata specifies the structure of the
data and a declarative syntax for describing how keys are defined.
The structure essentially defines a graph that describes a compositional
relationship between keys.
This means that even when the relationship between two tables involves
a composite key or a transformation on a key, Odessa can infer the
common structure between two keys and join the sets together.
In most situations it is only necessary to tell Odessa that you want
to join two data.frames together, and Odessa will take care
of the rest.
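As a rough illustration of how declarative key metadata can drive a join on a composite, transformed key, consider the sketch below. It does not reproduce Odessa's metadata format, syntax, or API, none of which are shown here; `KEY_SPECS`, `build_key`, `auto_join`, and the sample data are invented for this example.

```python
from datetime import datetime

# Hypothetical metadata: for each dataset, the fields that make up its
# key and the transformation that puts each field in a common form.
KEY_SPECS = {
    "prices": [
        ("date", lambda v: datetime.strptime(v, "%Y-%m-%d").date()),
        ("symbol", str.upper),
    ],
    "positions": [
        ("trade_date", lambda v: datetime.strptime(v, "%m/%d/%Y").date()),
        ("ticker", str.upper),
    ],
}


def build_key(row, spec):
    """Build a composite key tuple for a row from its dataset's spec."""
    return tuple(transform(row[field]) for field, transform in spec)


def auto_join(left_name, left, right_name, right, specs=KEY_SPECS):
    """Inner-join two named datasets using only their key metadata."""
    left_spec, right_spec = specs[left_name], specs[right_name]
    index = {build_key(row, right_spec): row for row in right}
    joined = []
    for row in left:
        match = index.get(build_key(row, left_spec))
        if match is not None:
            joined.append({**row, **match})
    return joined


# Hypothetical data with different field names and date formats.
prices = [{"date": "2014-03-01", "symbol": "aapl", "close": 75.0}]
positions = [
    {"trade_date": "03/01/2014", "ticker": "AAPL", "shares": 100},
    {"trade_date": "03/02/2014", "ticker": "AAPL", "shares": 50},
]
merged = auto_join("positions", positions, "prices", prices)
```

Once the key descriptions live in metadata, the caller only names the two datasets; the parsing and casting no longer appear at the call site.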
As part of our research and development, Zato Novo produces numerous tools to aid our research. Some of these end up in formal research and publications, while others become open source libraries and tools.