0
How to modify the Dataframe

I have dataframe like this

Keyword                   devops       aws       ec2   python     java    dotnet
optional/mandatory        optional mandatory mandatory optional optional mandatory
Candidate Name                                                         
User1                       1          1          1        1        1         1
User2                       1          0          0        1        1         1
...                        ...        ...        ...      ...      ...       ...

User67                      1          1          1        0        0         0  

I want to calculate percentage match for each Candidate Name and add new Percentage match column also the row optional/mandatory play important role in calculating percentage. Like the keyword aws ec2 dotnet are mandatory if there count is 0 then set the percentage match for that Candidate to 0%. Means all mandatory keywords must have count 1 then only calculate the percentage of that cancidate.

Expected Output:

Keyword                   devops       aws       ec2   python     java    dotnet    Percentage
optional/mandatory        optional mandatory mandatory optional optional mandatory    Match
Candidate Name                                                         
User1                       1          1          1        1        1         1        100%
User2                       1          1          1        0        0         1        66.66%
...                        ...        ...        ...      ...      ...       ...

User67                      1          1          1        0        0         0         0%

for calculate percentage:

User1 is 100% as all mandatory have count 1

User2 is 66.66% as two optional have 0 count then Total Number of ons's/Total Keywords*100 (4/6)100

User3 is 0% as out of 3 mandatory one have zero count to directly the percentage is set to 0% irrespective of the optional keyword count

Keyword row and Candidate Name column keeps changing there can be more keywords and Candidate names in future. How to Implement the desire output.

My Code:

import numpy as np
import pandas as pd
read_csv = pd.read_csv('Sample.csv')
df = pd.DataFrame(read_csv)
df["Count"] = np.where(df["Count"] > 0, 1, 0)
print(df)
df = df.pivot_table(index="Candidate Name", columns=["optional/mandatory","Keyword"], values="Count")
print(df)

Output1:

Candidate Name    optional/mandatory Keyword  Count
      User1       optional           devops      1
      User1      mandatory              aws      0
      User1      mandatory              ec2      1
      User1       optional           python      1
      User1       optional             java      1
      User1      mandatory           dotnet      0
      User2       optional           devops      1
      User2      mandatory              aws      1
      User2      mandatory              ec2      0
      User3       optional           devops      1
      User3      mandatory              ec2      1
      User3      mandatory              aws      0
      User3       optional             java      1

optional/mandatory        mandatory             optional            
Keyword              aws dotnet  ec2   devops java python
Candidate Name                                           
    User1            0.0    0.0  1.0      1.0  1.0    1.0
    User2            1.0    NaN  0.0      1.0  NaN    NaN
    User3            0.0    NaN  1.0      1.0  1.0    NaN

Got the desired output by trying :

import pandas as pd

cols = pd.MultiIndex.from_arrays([
    ['optional', 'mandatory', 'mandatory', 'mandatory', 'optional', 'optional'],
    ['devops', 'aws', 'ec2', 'dotnet', 'python', 'java']
])

data = [
    ['User 1', 1,1,1,1,1,1],
    ['User 2',1,0,0,1,1,1],
    # ...,
    ['User N',0,1,1,1,0,0]
]

df = pd.DataFrame(
    columns=cols,
    data=[applicant[1:] for applicant in data],
    index=pd.Index([applicant[0] for applicant in data], name='Candidate Name')
)

df.sort_index(axis='columns', inplace=True)
And then, we can use apply to run a function against each of the rows:


def calculate_skill_coverage_percent(r):
    if sum(r['mandatory']) < len(r['mandatory']):
        return 0
    else:
        return sum(r) / len(r)
df['Percent_Match'] = df.apply(calculate_skill_coverage_percent, axis='columns')

print(df)

Output:

                 mandatory            optional             Percent_Match
                 aws dotnet ec2   devops java python              
Candidate Name                                                        
User 1                 1      1   1        1    1      1           1.0
User 2                 0      1   0        1    1      1           0.0
...
User N                 1      1   1        0    0      0           0.5

I am little confused because, I have hardcoded values inside pd.MultiIndex.from_arrays and data. How to get these values using the above Dataframe in Output1 or any other suggestion would work

cols = pd.MultiIndex.from_arrays([
    ['optional', 'mandatory', 'mandatory', 'mandatory', 'optional', 'optional'],
    ['devops', 'aws', 'ec2', 'dotnet', 'python', 'java']
])

data = [
    ['User 1', 1,1,1,1,1,1],
    ['User 2',1,0,0,1,1,1],
    # ...,
    ['User N',0,1,1,1,0,0]
]


Python 02-11-22, 4:06 p.m. veerablog
0
On your blog, programming help would be trusted by the people which is quite beneficial. I’m sure after the help on your blog beginners and programmers able to grab help. Continue with your recommendation best uk essay writing service stuff here are quite provoking suggestions.
26-11-22, 1:37 p.m. AdonisJones


0
I am new to this and currently doing an internship that I got through the help of IT CV writing service. I have asked one of my peers about the solution to this, as lately I am having difficulty at calling for the NumPy functions while writing my program. Gotta learn more about Pandas.
07-12-22, 3:17 p.m. elmadavis


0
People would rely on your blog for programming assistance, which is pretty valuable. Beginners and programmers will be able to kipas guys obtain assistance from your blog. Continue with your recommendation content; the next suggestions are highly thought-provoking.
12-12-22, 3:48 p.m. superrobe


0
Even numerous column names can be updated simultaneously gartic phone. You must do it by placing additional column names between the curly braces, separated by commas.
11-01-23, 12:23 p.m. leochad


Log-in to answer to this question.