Every architect should know that when it comes to Big Data projects, distorted goals can lead to insufficient roadmaps.
I have repeatedly faced situations in which scalability or Big Data capacity was considered a key architecture driver. However, the real challenge on the project could be far from a “three Vs” (volume, variety, velocity) problem.
Common Misconceptions
At the beginning of a project, some stakeholders may treat scalability or capacity as key drivers, but such prioritization may be misleading. Likewise, if you are not familiar with Big Data challenges but have experience in other domains, you may treat “configurability” as a key driver when starting from scratch, which can turn out to be surprisingly inefficient.
For example, a US startup recently decided to introduce a new feature for their SaaS product: advanced analytics and reporting over events collected from online/web services. Based on the sales forecast for the next two years, they estimated the required capacity and concluded that they would face scalability issues. With that in mind, they decided to build a SaaS-based multi-tenant solution using a Big Data technology stack (Hadoop, Spark, etc.).
As it turned out, the initial sales forecast was very optimistic, which made high scalability a lower priority. Additionally, they signed a few customers who wanted on-premises deployment because the SaaS model didn’t work for them. While they now had a SaaS platform that could host new tenants, they admitted that flexibility and extensibility (the ability to add new features with little effort) ended up being more important than scalability had seemed at the early stages of the project. The distributed system also brought implementation and deployment overhead. As a result, the customer engagement process was constrained, and sponsors were not satisfied with terms and costs.
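To make the dependence on the forecast concrete, here is a minimal back-of-the-envelope sketch of this kind of capacity estimate. Every number and name in it (tenant counts, event rates, event size, retention, the 10 TB single-node threshold) is a hypothetical assumption chosen for illustration; none of them come from the startup described above.

```python
from dataclasses import dataclass


@dataclass
class Forecast:
    """Hypothetical sizing inputs; illustrative values only."""
    tenants: int                 # customers expected at the end of the forecast period
    events_per_tenant_day: int   # average events each tenant sends per day
    bytes_per_event: int         # average stored size of one event
    retention_days: int          # how long raw events are kept


def stored_terabytes(f: Forecast) -> float:
    """Raw storage needed to hold the full retention window, in decimal TB."""
    total_bytes = (
        f.tenants * f.events_per_tenant_day * f.bytes_per_event * f.retention_days
    )
    return total_bytes / 1e12


def needs_distributed_storage(f: Forecast, single_node_tb: float = 10.0) -> bool:
    """Crude rule of thumb: compare demand with one node's assumed capacity."""
    return stored_terabytes(f) > single_node_tb


# Same product, same per-tenant load; only the sales forecast differs.
optimistic = Forecast(tenants=500, events_per_tenant_day=2_000_000,
                      bytes_per_event=500, retention_days=365)
realistic = Forecast(tenants=25, events_per_tenant_day=2_000_000,
                     bytes_per_event=500, retention_days=365)

for name, forecast in (("optimistic", optimistic), ("realistic", realistic)):
    print(f"{name}: {stored_terabytes(forecast):.1f} TB, "
          f"distributed storage needed: {needs_distributed_storage(forecast)}")
```

Under these assumed inputs, the optimistic forecast lands far above the single-node threshold and the realistic one stays below it. The point is only that the "scalability driver" is as reliable as the sales forecast behind it.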
Finding the Appropriate Roadmap
The primary goal of the architect is not only to suggest a technical solution and implementation strategy, but also to figure out the driving requirements, assess the risks, and suggest a sufficient roadmap, taking into account all business aspects.
Let’s consider two project roadmaps:
Conclusion
I think the advice “seek the value in requested capabilities” from 97 Things Every Software Architect Should Know can be very helpful in this case. Instead of focusing on the technical solution for the end product, analyze what is most in demand at each project stage and define how you can get to the final goal, taking into account the driving requirements or objectives as well as the available resources.