
A lot of developers complain about Golang.
“Why do I have to type if err != nil every single time?” “It is too much typing. It is boring.”
[Read More]

In this tutorial we will explore ways to optimise loading partitioned JSON data in Spark.
I have used the SF Bay Area Bike Share dataset, which you can find here. The original data (status.csv) has gone through a few transformations. The result looks like:
[Read More]
In this tutorial, we will explore a couple of ways to add a consecutive row number to a dataframe.
For example, let this be our dataframe (taken from the Spark: The Definitive Guide GitHub repo):
[Read More]
Sometimes, we are required to compute the number of rows per partition. There are two ways to do this:
The first way is using DataFrame.mapPartitions().
The second way (the faster one, according to my observations) is using the spark_partition_id() function, followed by a grouping count aggregation.