
Remove Files From Directory After Uploading In Databricks Using Dbutils

A very clever person on StackOverflow helped me copy files to a directory in Databricks here: copyfiles. I am using the same principle to remove the files once they have been uploaded.

Solution 1:

If you want to delete all files from the following path: '/mnt/adls2/demo/target/', there is a simple command:

dbutils.fs.rm('/mnt/adls2/demo/target/', True)
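Note that a recursive `rm` removes the directory itself along with its contents, so you may want to recreate the directory afterwards. A minimal local-filesystem sketch of that behaviour, with `shutil.rmtree`/`os.makedirs` standing in for `dbutils.fs.rm`/`dbutils.fs.mkdirs` (which are only available inside Databricks):

```python
import os
import shutil
import tempfile

# Build a throwaway directory with a couple of files to stand in for
# '/mnt/adls2/demo/target/'.
target = os.path.join(tempfile.mkdtemp(), "target")
os.makedirs(target)
for name in ("a.csv", "b.csv"):
    open(os.path.join(target, name), "w").close()

# Local analogue of dbutils.fs.rm(target, True): a recursive delete that
# removes the directory itself as well as its contents.
shutil.rmtree(target)
print(os.path.exists(target))  # False

# Recreate the (now empty) directory if later steps expect it to exist,
# as dbutils.fs.mkdirs(target) would on Databricks.
os.makedirs(target)
```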

Anyway, if you want to use your code, take a look at dbutils doc:

rm(dir: String, recurse: boolean = false): boolean -> Removes a file or directory

The second argument of the function is expected to be a boolean, but your code passes a string containing a path:

dbutils.fs.rm(files[i].path, '/mnt/adls2/demo/target/' + file)

So your corrected code can be the following:

for i in range(0, len(files)):
    file = files[i].name
    if now in file:
        # files[i].path is already the full path, so there is no need
        # to append the file name again.
        dbutils.fs.rm(files[i].path, True)
        print('removed     ' + file)
    else:
        print('not removed ' + file)
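The `now in file` check in the loop above is a plain substring test on the file name. A dbutils-free sketch of that selection logic, assuming `now` holds a date string (the names below are hypothetical examples):

```python
# Only files whose names contain the date string are selected for removal.
now = "2020-01-15"
names = ["sales_2020-01-15.csv", "sales_2020-01-14.csv", "readme.txt"]

to_remove = [n for n in names if now in n]
to_keep = [n for n in names if now not in n]

print(to_remove)  # ['sales_2020-01-15.csv']
print(to_keep)    # ['sales_2020-01-14.csv', 'readme.txt']
```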

Solution 2:

If you have a huge number of files, deleting them this way can take a long time. You can utilize Spark parallelism to delete the files in parallel. The answer I am providing is in Scala, but it can be adapted to Python.
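As a rough Python counterpart, here is a local sketch that deletes files concurrently with `concurrent.futures.ThreadPoolExecutor`. It uses `os.remove` on temporary files in place of `dbutils.fs.rm` (this is an illustration of the parallel-deletion idea, not Spark itself):

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

# Local stand-in for a directory full of files; on Databricks the paths
# would come from dbutils.fs.ls(directory) instead.
directory = tempfile.mkdtemp()
paths = [os.path.join(directory, f"part-{i:04d}.csv") for i in range(20)]
for p in paths:
    open(p, "w").close()

def delete(path):
    # On Databricks this line would be dbutils.fs.rm(path, True).
    os.remove(path)
    return path

# Delete the files concurrently rather than one at a time.
with ThreadPoolExecutor(max_workers=8) as pool:
    deleted = list(pool.map(delete, paths))

print(len(deleted))           # 20
print(os.listdir(directory))  # []
```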

You can check whether the directory exists using the function below:

import java.io._

def CheckPathExists(path: String): Boolean =
{
  try
  {
    dbutils.fs.ls(path)
    return true
  }
  catch
  {
    case ioe: java.io.FileNotFoundException => return false
  }
}
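The same existence check translates naturally to Python. A local sketch, with `os.listdir` (which raises `FileNotFoundError` for a missing directory) standing in for `dbutils.fs.ls` and its `FileNotFoundException`:

```python
import os
import tempfile

def check_path_exists(path):
    # Python analogue of the Scala CheckPathExists above: attempt a
    # directory listing and treat "not found" as False.
    try:
        os.listdir(path)
        return True
    except FileNotFoundError:
        return False

existing = tempfile.mkdtemp()
print(check_path_exists(existing))                        # True
print(check_path_exists(os.path.join(existing, "nope")))  # False
```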

You can define a function that deletes the files. Create this function inside an object and extend that object from Serializable, as below:

object Helper extends Serializable
{
  def delete(directory: String): Unit = {
    dbutils.fs.ls(directory).map(_.path).toDF.foreach { filePath =>
      println(s"deleting file: $filePath")
      dbutils.fs.rm(filePath(0).toString, true)
    }
  }
}

Now you can first check whether the path exists; if it returns true, you can call the delete function to remove the files within the folder across multiple tasks.

val directoryPath = "<location>"
val directoryExists = CheckPathExists(directoryPath)
if (directoryExists)
{
  Helper.delete(directoryPath)
}

Solution 3:

In order to remove files from DBFS, you can run this in any notebook:

%fs rm -r dbfs:/user/sample_data.parquet
