Automating bioinformatic data to help end world hunger

kommit
Jan 29, 2023

Updated: Nov 12, 2025

At kommit, we supported the automation and data management strategy of a high-impact genetic research center. By optimizing complex bioinformatic processes, we saved a team of renowned scientists countless hours of work. Their goal? To make crops more resistant to weather and climate change and help end the food crisis. Ours? To help them do it more efficiently.

"One machine can do the work of fifty ordinary men. No machine can do the work of one extraordinary man," said Elbert Hubbard more than a century ago.

Although neither artificial intelligence nor software-driven data automation existed during the lifetime of this well-known American writer, he had a point.

At kommit, we've never pretended to be experts in the complexity of our clients' industries or products, or in bioinformatics (or bioinformatic pipelines, for that matter). Nor have we wanted to replace in any way the bioinformaticians who reached out to us for help about six months ago.

But we understood how data management automation could take their advanced genetic research project, a bioinformatic endeavor called Elastic Search Implementation for Rising Genome Analytics (ESIRGA), to a new level, with the following outcomes:

Faster data interpretation and visualization
Improved data reliability and utility

Indeed, software automation is everywhere. According to Bots & People, "the global Robotic Process Automation market is projected to hit 23,9 Million US Dollars in 2030". If any time-consuming process in a business or market is worth automating, never doubt it will be.

So we had a unique opportunity to leverage our developers' expertise in this field by owning the optimization of the bioinformatics analysis area of ESIRGA. Our main goal was to provide the underlying genetic data processes with more efficient storage, processing, and retrieval of data, files, and information.

How did we approach this challenge from a software engineering and automation perspective, and make scientists' lives easier in their quest to help crops withstand changing weather and other adverse conditions?

Find out about it below, in the first entry of kommit's new blog. We hope you enjoy it.

Data Management and Bioinformatics
Succeeding in Automation
How did we make it work?
Results
kommit's experience with ELK and data
kommit and bioinformatic projects

Bioinformatic Data Management

The good thing about bioinformaticians is that they know what they're doing. Their expertise lies in managing specialized raw reads to extract complex information for scientific analysis.

However, since they are not software experts, many processes in what they call bioinformatic data pipelines are done manually, consuming the time needed for other tasks like interpreting results, gaining insights into diseases, and devising treatment plans.

That's where kommit stepped in to help, assembling a software engineering team of five full-time dedicated developers to build a customized solution to automate the scientists' procedures in these pipelines.

The road to automation

Our six-month partnership with the client began by figuring out which types of data the research team extracted during the bioinformatic pipeline, starting with genome sequencing and assembly: the complete analysis of the DNA sets of a genome and their translation into FASTA files.
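
To make the FASTA format concrete, here is a minimal Python sketch that reads FASTA text into a mapping of sequence IDs to sequences. It is only an illustration, not the code used in ESIRGA (whose tooling was written in Ruby); the function name `parse_fasta` and the sample records are assumptions made up for this example.

```python
from typing import Dict, Iterable, List, Optional


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into {sequence_id: sequence}."""
    chunks: Dict[str, List[str]] = {}
    current_id: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Header line: use the first token after ">" as the ID.
            tokens = line[1:].split()
            current_id = tokens[0] if tokens else ""
            chunks[current_id] = []
        elif current_id is not None:
            # A sequence may wrap across several lines; collect the pieces.
            chunks[current_id].append(line)
    return {seq_id: "".join(parts) for seq_id, parts in chunks.items()}


# Hypothetical sample data, not taken from the project.
sample = """>chr1 example contig
ACGTAC
GTTA
>chr2
GGCC
"""
sequences = parse_fasta(sample.splitlines())
```

For genome-scale files, a streaming variant that yields one record at a time would likely be preferable to holding every sequence in memory.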

This leads scientists to what is known as genome annotation, a three-part process of manual tabulation of different DNA sequences and their corresponding files:

Transposable element (TE) annotation is used to identify and classify DNA sequences with complex patterns, stored in the .te file format.

Gene annotation is used to identify and classify the genes inside the genome, resulting in .gff files.

Variant calling is the process of identifying mutations in the genome. The final files in this step are in the .vcf format.
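
Two of the outputs above, .gff and .vcf, are tab-separated text formats, which is part of what makes them good candidates for automation. The sketch below parses one data line of each, assuming the standard GFF3 layout (nine columns) and the eight fixed VCF columns. The function names and sample lines are hypothetical, and the .te format is left out because its layout is not described here.

```python
from typing import Any, Dict

GFF3_COLUMNS = ["seqid", "source", "type", "start", "end",
                "score", "strand", "phase", "attributes"]
VCF_COLUMNS = ["chrom", "pos", "id", "ref", "alt", "qual", "filter", "info"]


def parse_gff3_line(line: str) -> Dict[str, Any]:
    """Parse one GFF3 feature line (nine tab-separated columns)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 GFF3 columns, got {len(fields)}")
    record: Dict[str, Any] = dict(zip(GFF3_COLUMNS, fields))
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    # Column 9 holds "key=value" pairs separated by semicolons.
    record["attributes"] = dict(
        pair.split("=", 1)
        for pair in record["attributes"].split(";")
        if "=" in pair
    )
    return record


def parse_vcf_line(line: str) -> Dict[str, Any]:
    """Parse the eight fixed columns of one VCF data line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError(f"expected at least 8 VCF columns, got {len(fields)}")
    record: Dict[str, Any] = dict(zip(VCF_COLUMNS, fields[:8]))
    record["pos"] = int(record["pos"])
    record["alt"] = record["alt"].split(",")  # several alternate alleles are allowed
    return record


# Hypothetical sample lines, not taken from the project.
gene = parse_gff3_line(
    "chr1\texample\tgene\t1000\t2000\t.\t+\t.\tID=gene0001;Name=hypothetical_gene"
)
variant = parse_vcf_line("chr1\t1500\t.\tA\tG,T\t50\tPASS\tDP=20")
```

Turning each line into a dictionary like this gives every file type the same shape, so records can be handled uniformly downstream.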

As you can see, these activities create several types of files, each with a different management system. This, in turn, challenges the researcher when trying to interpret results and then present them with data visualization.

We tackled such a complex and time-consuming process by automating three main activities:

Genome assembly and quality analysis
DNA data search in large files
Visual DNA comparison among the genes of several individuals of the same species

How did we make it work?

For genome assembly and quality analysis, kommit's team implemented a suite of orchestration scripts in Ruby that enabled bioinformaticians to drag and drop raw files into a folder where the corresponding pipeline executes automatically.

Afterward, we built an application in Ruby on Rails to extend script functionalities and ease pipeline execution. In the UI, the user enters a path to the file that will be processed, and the application copies it to another folder where a Linux cron job searches for new files and starts the pipeline execution.
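
The "copy to a folder, then let a scheduled job pick it up" flow can be sketched roughly as follows. This is a simplified Python illustration, not the project's Ruby code: the extension-to-pipeline mapping, the pipeline names, and the function names are all hypothetical, and a real cron-driven script would persist which files it has already handled (for example by moving them) rather than keeping a set in memory.

```python
import tempfile
from pathlib import Path
from typing import List, Optional, Set, Tuple

# Hypothetical mapping from file extension to pipeline name.
PIPELINES = {
    ".fasta": "assembly_quality",
    ".gff": "gene_annotation",
    ".vcf": "variant_calling",
}


def pipeline_for(path: Path) -> Optional[str]:
    """Pick a pipeline from the file extension, or None if unknown."""
    return PIPELINES.get(path.suffix.lower())


def scan_inbox(inbox: Path, processed: Set[str]) -> List[Tuple[str, str]]:
    """Return (file name, pipeline) pairs for files not seen in earlier scans."""
    jobs: List[Tuple[str, str]] = []
    for path in sorted(inbox.iterdir()):
        if not path.is_file() or path.name in processed:
            continue
        pipeline = pipeline_for(path)
        if pipeline is None:
            continue  # unknown file type: leave it for manual review
        processed.add(path.name)
        jobs.append((path.name, pipeline))
    return jobs


# Demo with a temporary folder standing in for the watched directory.
with tempfile.TemporaryDirectory() as tmp:
    inbox = Path(tmp)
    (inbox / "sample.fasta").write_text(">chr1\nACGT\n")
    (inbox / "notes.txt").write_text("not a pipeline input\n")
    processed: Set[str] = set()
    first_scan = scan_inbox(inbox, processed)   # picks up sample.fasta
    second_scan = scan_inbox(inbox, processed)  # nothing new to run
```

Polling on a schedule is simpler to operate than a file-system watcher, at the cost of a short delay between a file arriving and its pipeline starting.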

We also implemented a custom-built indexing system using the Elasticsearch, Logstash, Beats, and Kibana (ELK) stack to ease the queries in large files using non-relational databases.
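
To give a sense of what indexing such records involves, the sketch below serializes parsed records into the newline-delimited JSON body that Elasticsearch's bulk API accepts: one action line followed by one document line per record. The index name `variants` and the document fields are hypothetical, since the project's actual mappings and ingestion code are not described here.

```python
import json
from typing import Any, Dict, Iterable


def to_bulk_body(index: str, docs: Iterable[Dict[str, Any]]) -> str:
    """Build a bulk-API request body: an action line, then a source line, per doc."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    if not lines:
        return ""
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical variant records, not taken from the project.
docs = [
    {"chrom": "chr1", "pos": 1500, "ref": "A", "alt": ["G", "T"]},
    {"chrom": "chr2", "pos": 42, "ref": "C", "alt": ["T"]},
]
body = to_bulk_body("variants", docs)
```

Sending records in batches like this, rather than one request per record, is the usual way to load large files into an index.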

Delivering tangible results

The ESIRGA project is in its early stages, and the research team is still experimenting with automation. One of our most important results so far was reducing the bioinformaticians' workload by 20%.

Now, they can search for a gene or mutation by typing its name or some features, and customized visualizations of their data just pop up.
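
A search "by name or some features" maps naturally onto an Elasticsearch `bool` query: a full-text `match` on the name plus exact `term` filters for each feature. The sketch below only builds the query body as a Python dictionary. The field names (`name`, `chrom`, `type`) are hypothetical, and the `term` filters assume those fields are mapped as keywords.

```python
import json
from typing import Any, Dict, Optional


def build_gene_query(name: Optional[str] = None, **features: str) -> Dict[str, Any]:
    """Build a Query DSL body: match on the name, exact filters on the features."""
    must = [{"match": {"name": name}}] if name else []
    filters = [{"term": {field: value}} for field, value in sorted(features.items())]
    if not must and not filters:
        return {"query": {"match_all": {}}}  # no criteria: return everything
    bool_clause: Dict[str, Any] = {}
    if must:
        bool_clause["must"] = must
    if filters:
        bool_clause["filter"] = filters
    return {"query": {"bool": bool_clause}}


# Hypothetical search: a gene name plus two feature filters.
query = build_gene_query("hypothetical_gene", chrom="chr1", type="gene")
print(json.dumps(query, indent=2))
```

Putting the feature constraints under `filter` rather than `must` keeps them out of relevance scoring, so only the name match influences result ranking.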

Also, after our implementation, the scientific project showcased the tool at the 19th International Symposium on Rice Functional Genomics (ISRFG 2022), as "Rising Genome Analytics to Elastic Search Implementation for Rice Genome Analytics".

ELK expertise at kommit

The ELK stack is among the most widely used for real-time data extraction, collection, and visualization in enterprise solutions.

As a software engineering company that delivers tech and innovation, we already had experience in data management with ELK, which proved very powerful in solving the ESIRGA project's challenges, thanks to the following features:

High performance
Easily scalable
Distributed architecture
Document-oriented database
Schema free
API-driven
Real-time search engine
Multi-tenancy

The following diagram illustrates how we used the ELK stack to index, access, and retrieve data from the biological research performed by the ESIRGA team.

[Figure: architecture diagram of the data pipeline for scientific data processing, showing Docker, Logstash, Elasticsearch, and Kibana (ELK stack) ingesting, indexing, and visualizing data formats such as JSON and FASTA.]

Our future with bioinformatic projects

Now that you know how we automated the bioinformatic processes of ESIRGA, one of the world's main projects tackling world hunger through genetic engineering, we're glad to share that we'll continue moving in this direction.

We're already working on ESIRGA's second phase, which entails implementing one module that integrates the system with GenBank and another that integrates it with the high-performance machines that power the project.

Get in touch if you have any questions or wish to know more about kommit's commitment to enterprises with the potential to change the world as we know it!

And to say goodbye (for now), a famous phrase by Tom Preston-Werner: "You're either the one that creates the automation or you're getting automated."

Written by: kommit