In part one of this blog, we covered the first two stages of grief and left stage two having, from the data scientist’s perspective, inched forward in the battle to gain access to open source tools and functionalities on the corporate network.
However, even with success, the ‘free’ open source software requires installation and management by IT, as well as change control. All this requires budget – a difficult sell given the initial expectations of no capital expenditure requirement and instant gratification on the value achieved!
This leads us to our next stage in the grief process: depression.
Stage three: depression
Fighting for a solution that fits all needs at once – the latest and most flexible tools that give data scientists quick, up-to-date insights, yet still meet the requirements of the IT department – is time-intensive and exhausting. Unexpected events on the last mile, such as additional costs that had not been considered, are depressing for everyone involved. Often this leads to a stalemate between management and employees, and the process stops with no real solution.
In despair, short-term fixes, such as ValidR, may seem attractive to get the job done. However, data scientists should work with their IT department and business colleagues to implement a solution that can utilise the explosion of analytical tools and engines and address the many use cases required.
A longer-term solution has come with the emergence of analytical platforms, which are designed to empower the analytic community to build and operationalise analytics to drive business innovation.
For example, Teradata provides an integrated data and analytics environment that delivers analytic functions and engines at scale, allowing users to easily build and use analytics through support for their preferred analytic tools and languages.
But how can we get new software and tools like this into operation? This is where data scientists and IT professionals can work together, to deliver a viable business case.
Costs are easy to articulate, but what about benefits? Well, this all comes down to the use cases the data scientist is trying to address. So, for example, if fraud detection could be improved by 10% using a new suite of machine learning models – that has a value that can be quantified.
The typical approach is to include as many of these use cases as are required to meet the return on investment threshold for your organisation. All the value cases are much more compelling to approvers if you involve someone from finance in the calculation process. Then they are ‘their’ numbers too!
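To make that arithmetic concrete, here is a minimal sketch of how the use cases might be stacked up until the return on investment threshold is met. Every figure in it (the fraud losses, the other use cases, the cost and the threshold) is a made-up assumption for illustration, as are the names `UseCase`, `roi` and `cases_needed`; your finance colleague will supply the real ones.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    name: str
    annual_benefit: float  # estimated value, ideally agreed with finance


def roi(total_benefit: float, total_cost: float) -> float:
    """Simple return on investment: net benefit divided by cost."""
    return (total_benefit - total_cost) / total_cost


def cases_needed(use_cases, total_cost, roi_threshold):
    """Add use cases, biggest benefit first, until the threshold is met.

    Returns the selected use cases, or None if even the full list
    falls short of the threshold.
    """
    selected = []
    running_benefit = 0.0
    ranked = sorted(use_cases, key=lambda c: c.annual_benefit, reverse=True)
    for case in ranked:
        selected.append(case)
        running_benefit += case.annual_benefit
        if roi(running_benefit, total_cost) >= roi_threshold:
            return selected
    return None


# Hypothetical numbers: a 10% improvement on assumed fraud losses of 5,000,000.
candidates = [
    UseCase("fraud detection", 0.10 * 5_000_000),
    UseCase("churn reduction", 200_000),
    UseCase("stock forecasting", 150_000),
]

# Assumed platform cost of 400,000 and an assumed ROI threshold of 50%.
selected = cases_needed(candidates, total_cost=400_000, roi_threshold=0.5)
selected_names = [case.name for case in selected]
print(selected_names)
```

With these invented figures, fraud detection alone does not clear the threshold, so a second use case has to be added before the case stacks up.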
So, there is light at the end of the tunnel and the depression felt earlier should, hopefully, be lifting – which brings us to the final stage…
Stage four: acceptance
It’s important that data scientists and IT don’t rely on point solutions, but work together to establish a longer-term solution. This last stage looks at other solutions that a forward-looking organisation should include in its armoury.
Let’s assume we’ve solved the problem of getting open source tools into the corporate ecosystem. Once one issue is resolved, it will usually be followed up by the data scientist saying something like:
“It would be really great if we could add Python to the stack.”
And then: “We’d like to experiment with AI – can we have TensorFlow?”
It should be noted that any data scientist worth their salt (and pay!) will want a path to production for whatever they end up developing. So how does an organisation create a space for adventurous data scientists to try out new toys, as well as allowing IT to insulate the organisation from problems that may result, whilst also learning something about the demands of new open source capabilities?
Cloud solutions can help here by providing standalone data labs. Data scientists can create environments using the very latest versions of the software and the supported tool sets, while utilising the data structures and security profiles of our existing customer production systems.
Businesses can then import defined data sets into these data labs for any ad-hoc and experimental work that the data scientists need to do.
A major advantage of the cloud option is the ability to scale the analytic processing capability up or down on demand. The type of work performed by data scientists means that the elasticity of demand for resources is usually very high, so the flexibility of the cloud provides the perfect working model for data lab, test and development workloads.
This makes developing the business case for any lab instance much easier – you get a bigger bang for your buck when you need it. It’s a bit like renting a nice car for your holiday.
This is all aligned with the future architecture too – so that when we upgrade the enterprise environment, we can bring these users back in-house without them having to change a single line of code.
Security is often seen as the number one issue when considering a cloud offering. We feel that the emphasis by the cloud vendors on security means that this should not be an issue if their guidance is followed.
This does require a change in focus for IT teams though: moving away from managing on-premise platforms to policing data transfer and storage protocols, such as encryption or tokenisation (the process of substituting a sensitive data element with a non-sensitive equivalent that has no exploitable value).
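For readers who have not met tokenisation before, the idea can be sketched in a few lines. This is a toy, in-memory illustration only: the `TokenVault` class and the `tok_` prefix are hypothetical names, the card number is a made-up test value, and a real deployment would rely on a hardened, access-controlled tokenisation service rather than a Python dictionary.

```python
import secrets


class TokenVault:
    """Toy token vault: swaps sensitive values for random tokens.

    The token is random, so it reveals nothing about the original value;
    only the vault holds the mapping back.
    """

    def __init__(self):
        self._token_by_value = {}
        self._value_by_token = {}

    def tokenise(self, value: str) -> str:
        # Reuse the existing token so the same value always maps to the same token.
        if value in self._token_by_value:
            return self._token_by_value[value]
        token = "tok_" + secrets.token_hex(16)  # 32 random hex characters
        self._token_by_value[value] = token
        self._value_by_token[token] = value
        return token

    def detokenise(self, token: str) -> str:
        # Only a party with access to the vault can recover the original.
        return self._value_by_token[token]


vault = TokenVault()
card_number = "4111 1111 1111 1111"  # made-up test value, not real data
token = vault.tokenise(card_number)
print(token)
```

The data lab would then only ever see the token, while the mapping back to the real value stays with whoever controls the vault.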
Most on-premise platforms allow remote access – and we’re not sure that this confers magical security advantages over the major vendor offerings. This approach also allows managed isolation from production systems without fiddling about in data centres.
Going through these stages of grief, we’ve tried to suggest a mix of actions, projects and approaches to map a pathway through open-source grief to genuinely new capabilities for innovation and business transformation, for both data scientists and their colleagues in IT.
Will this work for you in your organisation? Hopefully some ideas will – if there were a perfect solution, we wouldn’t have needed to write this series! We’d be really interested to know how readers get on.
If you have found this interesting and want to explore further with Teradata, feel free to reach out to the authors – Stewart.Robbins@Teradata.com to talk about exciting use cases, or Greg.Loxton@Teradata.com for the boring technical stuff.