Sometimes we take a break from building cutting-edge AI redaction models to stretch our academic muscles and write about privacy and machine learning. Check back here regularly for our musings.
Anonymization vs de-identification vs redaction vs pseudonymization vs tokenization
There’s a saying that ‘the last 20% of the work takes 80% of the time’, and nowhere is that more true than in AI systems.
Say you’re looking for credit card numbers. It’s quite easy to set up a regex that matches 16-digit numbers or four groups of four digits separated by a ‘-’. A regex like this is highly effective in the perfect world of clean computer data, but unfortunately the real world is much messier.
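As a minimal sketch of the idea (the pattern and function names here are illustrative, not from any of our products), such a regex catches well-formatted numbers but silently misses common real-world variants:

```python
import re

# Naive credit-card pattern, as described above: either 16 consecutive
# digits or four groups of four digits separated by '-'.
CARD_PATTERN = re.compile(r"\b(?:\d{16}|\d{4}-\d{4}-\d{4}-\d{4})\b")

def find_card_numbers(text: str) -> list:
    """Return candidate card numbers found in free text."""
    return CARD_PATTERN.findall(text)

# Matches clean, well-formatted input:
find_card_numbers("Charge 4111-1111-1111-1111 please")
# → ['4111-1111-1111-1111']

# But misses real-world variants, e.g. space-separated groups:
find_card_numbers("Charge 4111 1111 1111 1111 please")
# → []
```

The second call returning nothing is exactly the ‘last 20%’ problem: handling OCR noise, odd separators, and surrounding context is where most of the effort goes.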
There exists a vibrant ecosystem of specialized security tools. The sad truth is that it is almost impossible to reach 100% invulnerability. What can we do to get closer?
In the past three years there has been a massive wake-up in customer awareness about privacy. Many customers are now rethinking how they buy, taking their business elsewhere if they don’t trust a company’s data practices.
Privacy Enhancing Technologies Decision Tree: for developers, managers, and founders looking to integrate privacy into their software pipelines
AI is rapidly being deployed around the world with few regulations to follow. Along with the complexity of creating the technology, there remain many unanswered legal questions.
The new TensorFlow Lite XNNPACK delegate enables best-in-class performance on x86 and ARM CPUs — over 10x faster than the default TensorFlow Lite backend in some cases.
Some techniques to improve DALI resource usage and create a completely CPU-based pipeline.
We introduce the four pillars required to achieve perfectly privacy-preserving AI and discuss various technologies that can help address each of the pillars.
We discuss a practical application of homomorphic encryption to privacy-preserving signal processing, particularly focusing on the Fourier transform.
We cover the basics of homomorphic encryption, followed by a brief overview of open source HE libraries and a tutorial on how to use one of those libraries (namely, PALISADE).
A number of people ask us why we bother creating NLP tools that preserve privacy. Apparently not everyone spends hours thinking about data breaches and privacy infringements.
A very brief overview of privacy-preserving technologies follows for anyone who’s interested in starting out in this area. I cover symmetric encryption, asymmetric encryption, homomorphic encryption, differential privacy, and secure multi-party computation.